Oracle7 Server Tuning

Library

Product

Contents

Index

Designing Data Warehouse Applications

Designing Data Warehouse Applications
Introduction
Oracle Data Warehousing Features
Backup and Recovery of the Data Warehouse

Designing Data Warehouse Applications

This chapter introduces integrated Oracle7 features for tuning enterprise scale data warehouses. By intelligently tuning the system, the data layout, and the application, you can build a high performance, scalable data warehouse.

Topics in this chapter include

Introduction

Data warehousing applications process a substantial amount of data by means of many CPU- and I/O-bound, data intensive tasks such as

loading, indexing, and summarizing tables
scanning, joining, sorting, aggregating, and fetching data

The resource required to complete the tasks on many gigabytes of data distinguishes data warehousing applications from other types of data processing. The bulk and complexity of your data may clearly indicate that you need to deploy multiple CPUs, investigate parallel processing, or consider specific data processing features that are directly relevant to the tasks at hand.

For example, in a typical data warehousing application, data intensive tasks might be performed on 100 gigabytes of data. At a processing speed of 0.2 G to 2 G of data per hour per CPU, a single CPU might need from 2 days to more than 2 weeks to perform a task. With more than a single gigabyte of data (certainly with upwards of 10G) you need to consider increasing the number of CPUs.

Similarly, if you need to copy 10 gigabytes of data, consider that using Export/Import might take a single CPU 10 hours. By contrast, using parallel CREATE TABLE AS SELECT on 10 CPUs might take only 1 hour.

Actual processing time depends on many factors, such as the complexity of the queries, the processing speed to which a particular hardware configuration can be tuned, and so on. Always run simple tests on your own system to find out its performance characteristics with regard to particular operations.

Oracle Data Warehousing Features

This section outlines the data warehousing functionality that is integrated into Oracle7 Server. It includes:

See Also: Oracle7 Server Concepts and Oracle7 SQL Reference

Parallel Load

When very large amounts of data must be loaded, the destination table may be unavailable for an unacceptable amount of time. The ability to load data in parallel can dramatically slash the amount of downtime necessary.

See Also: "Using Parallel Load" on page 18-19"

Parallel Index Creation

The ability to create indexes in parallel benefits both DSS and OLTP applications. On extremely large tables, rebuilding an index may take an unacceptable amount of time. Rather than making the table unavailable for long periods of time, DBAs sometimes decide to just live with an old index. However, with the ability to create indexes in parallel, you may be able to drop an index before a period of heavy updating activity, and recreate it before a period when the workload consists mainly of queries.

See Also: "Creating Indexes in Parallel" on page 18-23

Bitmap Indexes

Regular B tree indexes work best when each key or key range references only a few records, such as employee names. Bitmap indexes, by contrast, work best when each key references many records, such as employee gender.

Bitmap indexing provides the same functionality as regular indexes, but uses a different internal representation, which makes it very fast and space efficient. Bitmap indexing benefits data warehousing applications which have large amounts of data and ad hoc queries, but a low level of concurrent transactions. It provides reduced response time for many kinds of ad hoc queries; considerably reduced space usage compared to other indexing techniques; and dramatic performance gains even on very low end hardware. Bitmap indexes can be created in parallel and are completely integrated with cost-based optimization.

See Also: "Bitmap Indexing" on page 8-14"

Star Queries

One type of data warehouse design is known as a "star" schema. This typically consists of one or more very large "fact" tables and a number of much smaller "dimension" or reference tables. A star query is one that joins several of the dimension tables, usually by predicates in the query, to one of the fact tables.

Oracle cost-based optimization will recognize star queries and generate efficient execution plans for them; indeed, you must use cost-based optimization to get efficient star query execution. To enable cost-based optimization, simply ANALYZE your tables and be sure that the OPTIMIZER_MODE initialization parameter is set to its default value of CHOOSE.

See Also: "Optimizing Star Queries" on page -45

"STAR" on page 7-24

Parallel Query

The parallel query feature enables multiple processes to work together simultaneously to process a single query. By dividing the task among multiple server processes, Oracle can execute a SQL statement more quickly than if only one server process were used.

Parallel query can dramatically improve performance for data-intensive data warehousing operations. It helps systems scale in performance when adding hardware resources. The greatest performance benefits are on symmetric multiprocessing (SMP), clustered, or massively parallel systems where query processing can be effectively spread out among many CPUs on a single system.

See Also: Chapter 18, "Parallel Query Tuning"

Appendix C, "Parallel Query Concepts"

"The Parallel Query on OPS" in Oracle Parallel Server Concepts and Administration.

Parallel Aware Optimizer

Knowledge about parallelism is incorporated into the Oracle7 cost-based optimizer. Parallel execution considerations are thus a fundamental component in arriving at query execution plans. In addition, the user can control the trade-off of throughput for response time in plan selection.

The optimizer chooses intelligent defaults for the degree of parallelism based on available processors and the number of disk drives storing data the query will access. Access path choices (such as table scans vs. index access) take into account the degree of parallelism, resulting in plans that are optimized for parallel execution. Execution plans are more scalable, and there is improved correlation between optimizer cost and execution time for parallel query.

The initialization parameter OPTIMIZER_PERCENT_PARALLEL defines the weighting that the optimizer will use to minimize response time in its cost functions.

See Also: "OPTIMIZER_PERCENT_PARALLEL" on page 18-5

ANALYZE

You can analyze the storage characteristics of tables, indexes, and clusters to gather statistics which are then stored in the data dictionary. The optimizer uses these statistics in a cost-based approach to determine the most efficient execution plan for the SQL statements you issue. Note that statistics can be either computed or estimated, depending on the amount of overhead you are willing to allow for this purpose.

See Also: "Step 3: Analyzing Data" on page 18-24

Oracle7 Server Administrator's Guide

Fast Full Index Scan

Fast full index scan is a faster alternative to a full table scan when an existing index already contains all the columns that are needed for the query. It can use multiblock I/O and can be parallelized just like a table scan. The hint INDEX_FFS enforces fast full index scan.

See Also: "FAST FULL SCAN" on page 8-12

PARALLEL CREATE TABLE AS SELECT

The ability to CREATE TABLE AS SELECT in parallel enables you to reorganize extremely large tables efficiently. You might find it prohibitive to take a serial approach to reorganizing or reclaiming space in a table containing tens or thousands of Gigabytes of data. Using Export/Import to perform such a task might result in an unacceptable amount of downtime. If you have a lot of temporary space available, you can use CREATE TABLE AS SELECT to perform such tasks in parallel. With this approach, you must explicitly rebuild indexes and redefine integrity constraints.

See Also: "Creating and Populating Tables in Parallel" on page 18-50

"CREATE TABLE ... AS SELECT in Parallel" in "Parallel Query Concepts"

Partitions and Partition Views

You can avoid downtime with very large or mission-critical tables by using partition views. You create a partition view by dividing a large table into multiple physical tables using partitioning criteria. Then create a UNION-ALL view of the partitions to make the partitioning transparent for queries. Queries that use a key range to select from a partition view retrieve only the partitions that fall within that range. In this way partition views offer significant improvements in availability, administration and table scan performance.

See Also: Chapter 11, "Managing Partition Views"

"Partitioning Data" on page 18-17

Oracle Parallel Server Option

The Oracle Parallel Server option provides benefits important to both OLTP and DSS applications:

application failover
scalable performance
load balancing
multiuser scalability

OPS failover capability (the ability of the application to automatically reconnect if the connection to the database is broken) results in improved availability, a primary benefit for OLTP applications. Likewise, scalability in the number of users that can connect to the database is a major benefit in OLTP environments. OLTP performance on OPS can scale as well, if an application's data is isolated onto a single server.

For data warehousing applications, scalability of performance is a primary benefit of OPS. The architecture of OPS allows parallel query to perform excellent load balancing of work at runtime. If a node in an OPS cluster or MPP is temporarily slowed down, work that was originally assigned to parallel query servers on that node (but not yet commenced by those servers) may be performed by servers on other nodes, hence preventing that node from becoming a serious bottleneck. Even though OPS is a cornerstone of parallel query on clusters and MPPs, in a mostly query environment, the overhead on the distributed lock manager is minimal.

See Also: "Optimizing Parallel Query on Oracle Parallel Server" on page 18-35

Oracle Parallel Server Concepts and Administration

Backup and Recovery of the Data Warehouse

Recoverability has various levels: to recover from disk failure, human error, software failure, fire, and so on, requires different procedures. ORACLE is only part of the solution. Organizations must decide how much to spend on backup and recovery by considering the business cost of a long outage.

The UNRECOVERABLE option is better thought of as "no logging". Even without logging, you can avoid disk failure if you use disk mirroring or RAID technology. If you load your warehouse from tapes every day or week, you might satisfactorily recover from all failures by simply saving copies of the tape in several remote locations and reloading from tape when something goes wrong. At the other end of the spectrum, you could both mirror disks and take backups and archive logs, and maintain a remote standby system. The mirrored disks prevent loss of availability for disk failure, and also protect against total loss in the event of human error (such as dropping the wrong table) or software error (such as disk block corruption). In the event of fire, power failure, or other problems at the primary site, the backup site prevents long outages.

See Also: For more information on recovery and the UNRECOVERABLE option, see the Oracle7 Server Administrator's Guide and Oracle7 Server SQL Reference.

Prev Next

Library

Product

Contents

Index