8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Approximate Query Processing in Oracle Database 12c Release 2 (12.2)
The APPROX_COUNT_DISTINCT
function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values (NDV) when gathering statistics using the DBMS_STATS
package. Oracle Database 12c Release 1 (12.1) documented it for the first time, making it a supported feature. Oracle Database 12c Release 2 (12.2) extends the concept of approximate query processing by the addition of new functions and transparent conversion to approximate query processing.
- Approximate Functions
- Convert Exact to Approximate
- Query Transformation
- Approximate Query Processing and Materialized Views
Related articles.
- Converting Exact to Approximate Query Processing in Oracle Database 12.2 Onward
- APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)
- Approximate Top-N Query Processing (APPROX_RANK, APPROX_SUM, APPROX_COUNT) In Oracle Database 18c
Approximate Functions
In Oracle Database 12c Release 2 (12.2) the following functions provide approximate results.
APPROX_COUNT_DISTINCT
APPROX_COUNT_DISTINCT_AGG
APPROX_COUNT_DISTINCT_DETAIL
APPROX_MEDIAN
APPROX_PERCENTILE
APPROX_PERCENTILE_AGG
APPROX_PERCENTILE_DETAIL
TO_APPROX_COUNT_DISTINCT
TO_APPROX_PERCENTILE
The documentation states they "obtain approximate results with negligible deviation from the exact result". If you are writing a new query or refactoring an existing query and approximate results are acceptable, you can use them explicitly.
Convert Exact to Approximate
Having the new approximate query processing is great, but what do you do about all the existing code you have that uses the original calls? You could refactor your code, or you could ask Oracle to convert your exact calls to approximate calls instead.
Oracle Database 12c Release 2 (12.2) includes three new parameters that control approximate query processing, which can be set at the system or session level.
APPROX_FOR_AGGREGATION
: Setting this toTRUE
is the equivalent of settingAPPROX_FOR_COUNT_DISTINCT
toTRUE
andAPPROX_FOR_PERCENTILE
toALL
.APPROX_FOR_COUNT_DISTINCT
: Setting this toTRUE
convertsCOUNT(DISTINCT ...)
calls toAPPROX_COUNT_DISTINCT
calls.APPROX_FOR_PERCENTILE
: This can be set toNONE
PERCENTILE_CONT
,PERCENTILE_CONT DETERMINISTIC
,PERCENTILE_DISC
,PERCENTILE_DISC DETERMINISTIC
,ALL
,ALL DETERMINISTIC
.
We can do a simple test to prove to ourselves this works. Remember, estimated execution plans are not always representative of the actual plans used by a query.
Build a large table.
CONN test/test@pdb1 DROP TABLE t1 PURGE; CREATE TABLE t1 AS SELECT level AS id, 'Description of ' || level as description, TRUNC(sysdate) - TRUNC(DBMS_RANDOM.value(1,10001)) AS created_date FROM dual CONNECT BY level <= 10000; INSERT /*+ APPEND */ INTO t1 SELECT rownum+10000, a.description, a.created_date FROM t1 a CROSS JOIN t1 b WHERE rownum <= 1000000; COMMIT; EXEC DBMS_STATS.gather_table_stats(USER,'T1');
Turn on the approximate query processing and get the estimated execution plan for a regular COUNT(DISTINCT ...)
query.
CONN test/test@pdb1 -- Check session settings. SELECT name, value FROM v$ses_optimizer_env WHERE sid = SYS_CONTEXT('USERENV','SID') AND name LIKE '%approx%'; NAME VALUE ---------------------------------------- ------------------------- approx_for_aggregation false approx_for_count_distinct false approx_for_percentile none SQL> -- Turn on approximate query processing. ALTER SESSION SET approx_for_aggregation = TRUE; -- Check session settings. SELECT name, value FROM v$ses_optimizer_env WHERE sid = SYS_CONTEXT('USERENV','SID') AND name LIKE '%approx%'; NAME VALUE ---------------------------------------- ------------------------- approx_for_aggregation true approx_for_count_distinct true approx_for_percentile all SQL> -- Get an estimated execution plan. SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(DISTINCT id) AS data_count FROM t1; Execution Plan ---------------------------------------------------------- Plan hash value: 3724264953 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 5 | 1390 (1)| 00:00:01 | | 1 | SORT AGGREGATE APPROX| | 1 | 5 | | | | 2 | TABLE ACCESS FULL | T1 | 1010K| 4931K| 1390 (1)| 00:00:01 | ------------------------------------------------------------------------------ SQL>
We can see from the output the estimated plan includes a SORT AGGREGATE APPROX
operation.
Create a new session and run the same test without enabling the approximate query processing.
CONN test/test@pdb1 -- Check session settings. SELECT name, value FROM v$ses_optimizer_env WHERE sid = SYS_CONTEXT('USERENV','SID') AND name LIKE '%approx%'; NAME VALUE ---------------------------------------- ------------------------- approx_for_aggregation false approx_for_count_distinct false approx_for_percentile none SQL> -- Get an estimated execution plan. SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(DISTINCT id) AS data_count FROM t1; Execution Plan ---------------------------------------------------------- Plan hash value: 405047221 ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 13 | | 4472 (1)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 13 | | | | | 2 | VIEW | VW_DAG_0 | 1010K| 12M| | 4472 (1)| 00:00:01 | | 3 | HASH GROUP BY | | 1010K| 4931K| 11M| 4472 (1)| 00:00:01 | | 4 | TABLE ACCESS FULL| T1 | 1010K| 4931K| | 1390 (1)| 00:00:01 | ----------------------------------------------------------------------------------------- SQL>
We can see from the output below the estimated execution plan no longer contains the approximate query processing.
Here are some examples of setting the parameters at the session and system level.
-- SESSION level. ALTER SESSION SET approx_for_aggregation = TRUE; ALTER SESSION SET approx_for_aggregation = FALSE; ALTER SESSION SET approx_for_count_distinct = TRUE; ALTER SESSION SET approx_for_count_distinct = FALSE; ALTER SESSION SET approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC'; ALTER SESSION SET approx_for_percentile = PERCENTILE_DISC; ALTER SESSION SET approx_for_percentile = NONE; -- SYSTEM level. ALTER SYSTEM SET approx_for_aggregation = TRUE; ALTER SYSTEM SET approx_for_aggregation = FALSE; ALTER SYSTEM RESET approx_for_aggregation; ALTER SYSTEM SET approx_for_count_distinct = TRUE; ALTER SYSTEM SET approx_for_count_distinct = FALSE; ALTER SYSTEM RESET approx_for_count_distinct; ALTER SYSTEM SET approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC'; ALTER SYSTEM SET approx_for_percentile = PERCENTILE_DISC; ALTER SYSTEM SET approx_for_percentile = NONE; ALTER SYSTEM RESET approx_for_percentile;
Query Transformation
If you want to see the associated query transformation you can perform a 10053 trace and look at the resulting trace file. As an example, run the following.
CONN test/test@pdb1 SELECT value FROM v$diag_info WHERE name = 'Default Trace File'; VALUE -------------------------------------------------------------------------------- /u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_15000.trc SQL> ALTER SESSION SET approx_for_aggregation = TRUE; ALTER SESSION SET EVENTS '10053 trace name context forever'; SELECT COUNT(DISTINCT id) FROM t1; ALTER SESSION SET EVENTS '10053 trace name context off';
Open the trace file displayed by the v$DIAG_INFO query and search for the term "Final query after transformations". You will see something like this.
Final query after transformations:******* UNPARSED QUERY IS ******* SELECT APPROX_COUNT_DISTINCT("T1"."ID") "COUNT(DISTINCTID)" FROM "TEST"."T1" "T1"
Approximate Query Processing and Materialized Views
You can use approximate query processing functions in materialized views, which can subsequently be used for query rewrites.
Create a materialized view based on the test table we used in the previous section, using a query containing the APPROX_COUNT_DISTINCT
function.
CONN test/test@pdb1 DROP MATERIALIZED VIEW data_ndv_mv; CREATE MATERIALIZED VIEW data_ndv_mv ENABLE QUERY REWRITE AS SELECT created_date, APPROX_COUNT_DISTINCT(id) AS data_count FROM t1 GROUP BY created_date; EXEC DBMS_STATS.gather_table_stats(USER, 'data_ndv_mv');
We check the approximate query processing and query rewrite parameters for the session, then check the estimated execution plan for a query against the base table using the APPROX_COUNT_DISTINCT
function.
-- Check session settings. SELECT name, value FROM v$ses_optimizer_env WHERE sid = SYS_CONTEXT('USERENV','SID') AND (name LIKE '%approx%' OR name LIKE '%rewrite%') ORDER BY 1; NAME VALUE ---------------------------------------- ------------------------- approx_for_aggregation false approx_for_count_distinct false approx_for_percentile none query_rewrite_enabled true query_rewrite_integrity enforced SQL> -- Get an estimated execution plan. SET AUTOTRACE TRACE EXPLAIN SELECT created_date, APPROX_COUNT_DISTINCT(id) AS data_count FROM t1 GROUP BY created_date; Execution Plan ---------------------------------------------------------- Plan hash value: 850394339 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6310 | 69410 | 7 (0)| 00:00:01 | | 1 | MAT_VIEW REWRITE ACCESS FULL| DATA_NDV_MV | 6310 | 69410 | 7 (0)| 00:00:01 | -------------------------------------------------------------------------------------------- SQL>
We can see a query rewrite was done to use the materialized view.
Let's see what happens if we use a conventional COUNT(DISTINCT ...)
query against the base table.
SET AUTOTRACE TRACE EXPLAIN SELECT created_date, COUNT(DISTINCT id) AS data_count FROM t1 GROUP BY created_date; Execution Plan ---------------------------------------------------------- Plan hash value: 3640378487 ------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 6310 | 129K| | 6189 (1)| 00:00:01 | | 1 | HASH GROUP BY | | 6310 | 129K| | 6189 (1)| 00:00:01 | | 2 | VIEW | VM_NWVW_1 | 1010K| 20M| | 6189 (1)| 00:00:01 | | 3 | HASH GROUP BY | | 1010K| 12M| 23M| 6189 (1)| 00:00:01 | | 4 | TABLE ACCESS FULL| T1 | 1010K| 12M| | 1392 (1)| 00:00:01 | ------------------------------------------------------------------------------------------ SQL>
We can see the estimated execution plan used the base table, rather than performing a rewrite to use the materialized view.
This time we will enable approximate query processing and try again.
CONN test/test@pdb1 ALTER SESSION SET approx_for_aggregation = TRUE; -- Check session settings. SELECT name, value FROM v$ses_optimizer_env WHERE sid = SYS_CONTEXT('USERENV','SID') AND (name LIKE '%approx%' OR name LIKE '%rewrite%') ORDER BY 1; NAME VALUE ---------------------------------------- ------------------------- approx_for_aggregation true approx_for_count_distinct true approx_for_percentile all query_rewrite_enabled true query_rewrite_integrity enforced SQL> -- Get an estimated execution plan. SET AUTOTRACE TRACE EXPLAIN SELECT created_date, COUNT(DISTINCT id) AS data_count FROM t1 GROUP BY created_date; Execution Plan ---------------------------------------------------------- Plan hash value: 850394339 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 6310 | 69410 | 7 (0)| 00:00:01 | | 1 | MAT_VIEW REWRITE ACCESS FULL| DATA_NDV_MV | 6310 | 69410 | 7 (0)| 00:00:01 | -------------------------------------------------------------------------------------------- SQL>
We can see the estimated execution plan used the materialized view, rather than the base table.
There are some restrictions associated with fast refreshes of materialized views containing approximate query processing functions listed here.
For more information see:
- About Approximate Query Processing
- Creating Materialized Views Based on Approximate Queries
- Rewriting Queries with Exact Functions to Use Materialized Views that Contain Approximate Functions
- Refreshing Materialized Views Based on Approximate Queries
- Converting Exact to Approximate Query Processing in Oracle Database 12.2 Onward
- APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)
- Approximate Top-N Query Processing (APPROX_RANK, APPROX_SUM, APPROX_COUNT) In Oracle Database 18c
Hope this helps. Regards Tim...