8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » Misc » Here

Cost-Based Optimizer (CBO) And Database Statistics

When a valid SQL statement is sent to the server for the first time, Oracle produces an execution plan that describes how to retrieve the necessary data. In older versions of the database this execution plan could be generated using one of two optimizers:

In newer versions of the database the cost-based optimizer is the only option available. If new objects are created, the amount of data or the spread of data in the database changes the statistics will no longer represent the real state of the database so the CBO decision process may be seriously impaired. This article will focus on management of statistics using the DBMS_STATS package, although there will be some mention of legacy methods.

Related articles.

Introduction

If you put 10 Oracle performance gurus in the same room they will all say database statistics are vital for the cost-based optimizer to choose the correct execution plan for a query, but they will all have a different opinion on how to gather those statistics. A couple of quotes that stand out in my mind are:

Neither of these experts are suggesting you never update your stats, just pointing out that in doing so you are altering information the optimizer uses to determine which execution plan is the most efficient. In altering that information it is not unlikely the optimizer may make a different decision. Hopefully it will be the correct decision, but maybe it wont. If you gather statistics for all tables every night, your system will potentially act differently every day. This is the fundamental paradox of gathering statistics.

So what should our statistics strategy be? Here are some suggestions.

Which one of these approaches you take should be decided on a case-by-case basis. Whichever route you take, you will be using the DBMS_STATS package to manage your stats.

Regardless of the approach you take, you need to consider system and fixed object statistics for every database, as these are not gathered by the automatic job.

DBMS_STATS

The DBMS_STATS package was introduced in Oracle 8i and is Oracle's preferred method of gathering statistics. Oracle list a number of benefits to using it including parallel execution, long term storage of statistics and transfer of statistics between servers.

The functionality of the DBMS_STATS package varies greatly between database versions, as do the default parameter settings and the quality of the statistics they generate. It is worth spending some time checking the documentation relevant to your version.

Table and Index Stats

Table statistics can be gathered for the database, schema, table or partition.

EXEC DBMS_STATS.gather_database_stats;
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15);
EXEC DBMS_STATS.gather_database_stats(estimate_percent => 15, cascade => TRUE);

EXEC DBMS_STATS.gather_schema_stats('SCOTT');
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15);
EXEC DBMS_STATS.gather_schema_stats('SCOTT', estimate_percent => 15, cascade => TRUE);

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);
EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15, cascade => TRUE);

EXEC DBMS_STATS.gather_dictionary_stats;

The ESTIMATE_PERCENT parameter was often used when gathering stats from large segments to reduce the sample size and therefore the overhead of the operation. In Oracle 9i upwards, we also had the option of letting Oracle determine the sample size using the AUTO_SAMPLE_SIZE constant, but this got a bad reputation because the selected sample size was sometimes inappropriate, making the resulting statistics questionable.

In Oracle 11g, the AUTO_SAMPLE_SIZE constant is the preferred (and default) sample size as the mechanism for determining the actual sample size has been improved. In addition, the statistics estimate based on the auto sampling are near to 100% accurate and much faster to gather than in previous versions, as described here.

The CASCADE parameter determines if statistics should be gathered for all indexes on the table currently being analyzed. Prior to Oracle 10g, the default was FALSE, but in 10g upwards it defaults to AUTO_CASCADE, which means Oracle determines if index stats are necessary.

As a result of these modifications to the behavior in the stats gathering, in Oracle 11g upwards, the basic defaults for gathering table stats are satisfactory for most tables.

Index statistics can be gathered explicitly using the GATHER_INDEX_STATS procedure.

EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK');
EXEC DBMS_STATS.gather_index_stats('SCOTT', 'EMPLOYEES_PK', estimate_percent => 15);

The current statistics information is available from the data dictionary views for the specific objects (DBA, ALL and USER views). Some of these view were added in later releases.

Histogram information is available from the following views.

Table, column and index statistics can be deleted using the relevant delete procedures.

EXEC DBMS_STATS.delete_database_stats;
EXEC DBMS_STATS.delete_schema_stats('SCOTT');
EXEC DBMS_STATS.delete_table_stats('SCOTT', 'EMP');
EXEC DBMS_STATS.delete_column_stats('SCOTT', 'EMP', 'EMPNO');
EXEC DBMS_STATS.delete_index_stats('SCOTT', 'EMP_PK');

EXEC DBMS_STATS.delete_dictionary_stats;

System Stats

Introduced in Oracle 9iR1, the GATHER_SYSTEM_STATS procedure gathers statistics relating to the performance of your systems I/O and CPU. Giving the optimizer this information makes its choice of execution plan more accurate, since it is able to weigh the relative costs of operations using both the CPU and I/O profiles of the system.

There are two possible types of system statistics:

Your current system statistics can be displayed by querying the AUX_STATS$ table.

SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';

PNAME                               PVAL1
------------------------------ ----------
CPUSPEED
CPUSPEEDNW                           1074
IOSEEKTIM                              10
IOTFRSPEED                           4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM

9 rows selected.

SQL>

If you are running 11.2.0.1 or 11.2.0.2 then check out MOS Note: 9842771.8.

The DELETE_SYSTEM_STATS procedure will delete all workload stats and replace previously gathered noworkload stats with the default values.

EXEC DBMS_STATS.delete_system_stats;

You only need to update your system statistics when something major has happened to your systems hardware or workload profile.

There are two schools of thought about system stats. One side avoid the use of system statistics altogether, favoring the default noworkload stats. The other side suggests providing accurate system statistics. The problem with the latter, is it is very difficult to decide what represents an accurate set of system statistics. Most people seem to favor investigation of systems using a variety of methods, including gathering system stats into a stats table, then manually setting the system statistics using the SET_SYSTEM_STATS procedure.

EXEC DBMS_STATS.set_system_stats('iotfrspeed', 4096);

The available parameter names can be found here.

I would say, if in doubt, use the defaults.

Fixed Object Stats

Introduced in Oracle 10gR1, the GATHER_FIXED_OBJECTS_STATS procedure gathers statistics on the X$ tables, which sit underneath the V$ dynamic performance views. The X$ tables are not really tables at all, but a window on to the memory structures in the Oracle kernel. Fixed object stats are not gathered automatically, so you need to gather them manually at a time when the database is in a representative level of activity.

EXEC DBMS_STATS.gather_fixed_objects_stats;

Major changes to initialization parameters or system activity should signal you to gather fresh stats, but under normal running this does not need to be done on a regular basis.

The stats are removed using the DELETE_FIXED_OBJECTS_STATS procedure.

EXEC DBMS_STATS.delete_fixed_objects_stats;

Locking Stats

To prevent statistics being overwritten, you can lock the stats at schema, table or partition level.

EXEC DBMS_STATS.lock_schema_stats('SCOTT');
EXEC DBMS_STATS.lock_table_stats('SCOTT', 'EMP');
EXEC DBMS_STATS.lock_partition_stats('SCOTT', 'EMP', 'EMP_PART1');

If you need to replace the stats, they must be unlocked.

EXEC DBMS_STATS.unlock_schema_stats('SCOTT');
EXEC DBMS_STATS.unlock_table_stats('SCOTT', 'EMP');
EXEC DBMS_STATS.unlock_partition_stats('SCOTT', 'EMP', 'EMP_PART1');

Locking stats can be very useful to prevent automated jobs from changing them. This is especially useful with tables used for ETL processes. If the stats are gathered when the tables are empty, they will not reflect the real quantity of data during the load process. Instead, either gather stats each time the data is loaded, or gather them once on a full table and lock them.

Transfering Stats

It is possible to transfer statistics between servers allowing consistent execution plans between servers with varying amounts of data. First the statistics must be collected into a statistics table. In the following examples the statistics for the APPSCHEMA user are collected into a new table, STATS_TABLE, which is owned by DBASCHEMA.

EXEC DBMS_STATS.create_stat_table('DBASCHEMA','STATS_TABLE');
EXEC DBMS_STATS.export_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA');

This table can then be transfered to another server using your preferred method (Export/Import, SQL*Plus COPY etc.) and the stats imported into the data dictionary as follows.

EXEC DBMS_STATS.import_schema_stats('APPSCHEMA','STATS_TABLE',NULL,'DBASCHEMA');
EXEC DBMS_STATS.drop_stat_table('DBASCHEMA','STATS_TABLE');

Setting Preferences

Since Oracle 10g, many of the default values of parameters for the DBMS_STATS procedures have changed from being hard coded to using preferences. In Oracle 10g, these preferences could be altered using the SET_PARAM procedure.

EXEC DBMS_STATS.set_param('DEGREE', '5');
In 11g, the SET_PARAM procedure was deprecated in favor of a layered approach to preferences. The four levels of preferences are amended with the following procedures.

The available preferences are listed below, along with the available scope (G=Global, D=Database, S=Schema, T=Table).

Preference Description Default (11gR2) Scope Version
CASCADE Determines if index stats should be gathered for the current table (TRUE, FALSE, AUTO_CASCADE). DBMS_STATS.AUTO_CASCADE G, D, S, T 10gR1+
DEGREE Degree of parallelism (integer or DEFAULT_DEGREE). DBMS_STATS.DEFAULT_DEGREE G, D, S, T 10gR1+
ESTIMATE_PERCENT Percentage of rows to sample when gathering stats (0.000001-100 or AUTO_SAMPLE_SIZE). DBMS_STATS.AUTO_SAMPLE_SIZE G, D, S, T 10gR1+
METHOD_OPT Controls column statistics collection and histogram creation. FOR ALL COLUMNS SIZE AUTO G, D, S, T 10gR1+
NO_INVALIDATE Determines if dependent cursors should be invalidated as a result of new stats on objects (TRUE, FALSE or AUTO_INVALIDATE). DBMS_STATS.AUTO_INVALIDATE G, D, S, T 10gR1+
AUTOSTATS_TARGET Determines which objects have stats gathered (ALL, ORACLE, AUTO). AUTO G 10gR2+
GRANULARITY The granularity of stats to be collected on partitioned objects (ALL, AUTO, DEFAULT, GLOBAL, 'GLOBAL AND PARTITION', PARTITION, SUBPARTITION). AUTO G, D, S, T 10gR2+
PUBLISH Determines if gathered stats should be published immediately or left in a pending state (TRUE, FALSE). TRUE G, D, S, T 11gR2+
INCREMENTAL Determines whether incremental stats will be used for global statistics on partitioned objects, rather than generated using table scans (TRUE, FALSE). FALSE G, D, S, T 11gR2+
CONCURRENT Should objects statistics be gathered on multiple objects at once, or one at a time (MANUAL, AUTOMATIC, ALL, OFF). OFF G 12cR1+
GLOBAL_TEMP_TABLE_STATS Should stats on global temporary tables be session-specific or shared between sessions (SHARED, SESSION). SESSION G, D, S 12cR1+
INCREMENTAL_LEVEL Which level of synopses should be collected for incremental partitioned statistics (TABLE, PARTITION). PARTITION G, D, S, T 12cR1+
INCREMENTAL_STALENESS How is staleness of partition statistics determined (USE_STALE_PERCENT, USE_LOCKED_STATS, NULL). NULL G, D, S, T 12cR1+
TABLE_CACHED_BLOCKS The number of blocks cached in the buffer cache during calculation of index cluster factor. Jonathan Lewis recommends "16" as a sensible value. 1 G, D, S, T 12cR1+
OPTIONS Used for the OPTIONS parameter of the GATHER_TABLE_STATS procedure (GATHER, GATHER AUTO). GATHER G, D, S, T 12cR1+

The following shows their basic usage.

EXEC DBMS_STATS.set_global_prefs('AUTOSTATS_TARGET', 'AUTO');
EXEC DBMS_STATS.set_database_prefs('STALE_PERCENT', '15');
EXEC DBMS_STATS.set_schema_prefs('SCOTT','DEGREE', '5');
EXEC DBMS_STATS.set_table_prefs('SCOTT', 'EMP', 'CASCADE', 'FALSE');

Global preferences can be reset and the other layers of preferences deleted using the following procedures.

EXEC DBMS_STATS.reset_global_pref_defaults;
EXEC DBMS_STATS.delete_database_prefs('CASCADE');
EXEC DBMS_STATS.delete_schema_prefs('SCOTT','DEGREE');
EXEC DBMS_STATS.delete_table_prefs('SCOTT', 'EMP', 'CASCADE');

Setting Stats Manually

The DBMS_STATS package provides several procedures for manually setting statistics.

The current stats can be returned using the following procedures.

Be careful when setting stats manually. Possibly the safest approach is to get the current values, amend them as required, then set them. An example of setting column statistics is shown below.

SET SERVEROUTPUT ON
DECLARE
  l_distcnt  NUMBER; 
  l_density  NUMBER;
  l_nullcnt  NUMBER; 
  l_srec     DBMS_STATS.StatRec;
  l_avgclen  NUMBER;
BEGIN
  -- Get current values.
  DBMS_STATS.get_column_stats (
    ownname => 'SCOTT', 
    tabname => 'EMP', 
    colname => 'EMPNO', 
    distcnt => l_distcnt, 
    density => l_density,
    nullcnt => l_nullcnt, 
    srec    => l_srec,
    avgclen => l_avgclen);

  -- Amend values.
  l_srec.minval := UTL_RAW.cast_from_number(7369);
  l_srec.maxval := UTL_RAW.cast_from_number(7934);

  -- Set new values.
  DBMS_STATS.set_column_stats (
    ownname => 'SCOTT', 
    tabname => 'EMP', 
    colname => 'EMPNO', 
    distcnt => l_distcnt, 
    density => l_density,
    nullcnt => l_nullcnt, 
    srec    => l_srec,
    avgclen => l_avgclen);
END;
/

Issues

Legacy Methods for Gathering Database Stats

The information in this section is purely for historical reasons. All statistics management should now be done using the DBMS_STATS package.

Analyze Statement

The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows.

ANALYZE TABLE employees COMPUTE STATISTICS;
ANALYZE INDEX employees_pk COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS;
ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;

DBMS_UTILITY

The DBMS_UTILITY package can be used to gather statistics for a whole schema or database. Both methods follow the same format as the analyze statement.

EXEC DBMS_UTILITY.analyze_schema('SCOTT','COMPUTE');
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_schema('SCOTT','ESTIMATE', estimate_percent => 15);

EXEC DBMS_UTILITY.analyze_database('COMPUTE');
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_rows => 100);
EXEC DBMS_UTILITY.analyze_database('ESTIMATE', estimate_percent => 15);

Refreshing Stale Stats

This involves monitoring the DML operations against individual tables so statistics are only gathered for those tables whose data has changed significantly. This is the default action for the automatic optimizer statistics collection in 10g and above, but if you are using an older version of the database, you may want to read more about this here.

Scheduling Stats

Prior to Oracle 10g, scheduling the gathering of statistics using the DBMS_JOB package ws the easiest way to make sure they were always up to date.

SET SERVEROUTPUT ON
DECLARE
  l_job  NUMBER;
BEGIN
  DBMS_JOB.submit(l_job,
                  'BEGIN DBMS_STATS.gather_schema_stats(''SCOTT''); END;',
                  SYSDATE,
                  'SYSDATE + 1');
  COMMIT;
  DBMS_OUTPUT.put_line('Job: ' || l_job);
END;
/

The above code sets up a job to gather statistics for SCOTT for the current time every day. You can list the current jobs on the server using the DBA_JOBS and DBA_JOBS_RUNNING views.

Existing jobs can be removed using the following.

EXEC DBMS_JOB.remove(X);
COMMIT;

Where 'X' is the number of the job to be removed.

For more information see:

Hope this helps. Regards Tim...

Back to the Top.