8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 12c » Here

Approximate Query Processing in Oracle Database 12c Release 2 (12.2)

The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values (NDV) when gathering statistics using the DBMS_STATS package. Oracle Database 12c Release 1 (12.1) documented it for the first time, making it a supported feature. Oracle Database 12c Release 2 (12.2) extends the concept of approximate query processing by the addition of new functions and transparent conversion to approximate query processing.

Related articles.

Approximate Functions

In Oracle Database 12c Release 2 (12.2) the following functions provide approximate results.

The documentation states they "obtain approximate results with negligible deviation from the exact result". If you are writing a new query or refactoring an existing query and approximate results are acceptable, you can use them explicitly.

Convert Exact to Approximate

Having the new approximate query processing is great, but what do you do about all the existing code you have that uses the original calls? You could refactor your code, or you could ask Oracle to convert your exact calls to approximate calls instead.

Oracle Database 12c Release 2 (12.2) includes three new parameters that control approximate query processing, which can be set at the system or session level.

We can do a simple test to prove to ourselves this works. Remember, estimated execution plans are not always representative of the actual plans used by a query.

Build a large table.

conn test/test@pdb1

drop table t1 purge;

create table t1 as
select level as id,
       'Description of ' || level as description,
       trunc(sysdate) - trunc(dbms_random.value(1,10001)) as created_date
from   dual
connect by level <= 10000;

insert /*+ append */ into t1
select rownum+10000,
       a.description,
       a.created_date
from   t1 a
cross join t1 b
where  rownum <= 1000000;

commit;

exec dbms_stats.gather_table_stats(user,'t1');

Turn on the approximate query processing and get the estimated execution plan for a regular COUNT(DISTINCT ...) query.

conn test/test@pdb1

-- Check session settings.
select name, value 
from   v$ses_optimizer_env 
where  sid = sys_context('userenv','sid') 
and    name like '%approx%';

NAME                                     VALUE
---------------------------------------- -------------------------
approx_for_aggregation                   false
approx_for_count_distinct                false
approx_for_percentile                    none

SQL>


-- Turn on approximate query processing.
alter session set approx_for_aggregation = true;


-- Check session settings.
select name, value 
from   v$ses_optimizer_env 
where  sid = sys_context('userenv','sid') 
and    name like '%approx%';

NAME                                     VALUE
---------------------------------------- -------------------------
approx_for_aggregation                   true
approx_for_count_distinct                true
approx_for_percentile                    all

SQL>


-- Get an estimated execution plan.
set autotrace trace explain

select count(distinct id) as data_count
from   t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3724264953

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |     1 |     5 |  1390   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE APPROX|      |     1 |     5 |            |          |
|   2 |   TABLE ACCESS FULL   | T1   |  1010K|  4931K|  1390   (1)| 00:00:01 |
------------------------------------------------------------------------------

SQL>

We can see from the output the estimated plan includes a SORT AGGREGATE APPROX operation.

Create a new session and run the same test without enabling the approximate query processing.

conn test/test@pdb1

-- Check session settings.
select name, value 
from   v$ses_optimizer_env 
where  sid = sys_context('userenv','sid') 
and    name like '%approx%';

NAME                                     VALUE
---------------------------------------- -------------------------
approx_for_aggregation                   false
approx_for_count_distinct                false
approx_for_percentile                    none

SQL>


-- Get an estimated execution plan.
set autotrace trace explain

select count(distinct id) as data_count
from   t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 405047221

-----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |     1 |    13 |       |  4472   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE      |          |     1 |    13 |       |            |          |
|   2 |   VIEW               | VW_DAG_0 |  1010K|    12M|       |  4472   (1)| 00:00:01 |
|   3 |    HASH GROUP BY     |          |  1010K|  4931K|    11M|  4472   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1       |  1010K|  4931K|       |  1390   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

SQL>

We can see from the output below the estimated execution plan no longer contains the approximate query processing.

Here are some examples of setting the parameters at the session and system level.

-- SESSION level.
alter session set approx_for_aggregation = true;
alter session set approx_for_aggregation = false;

alter session set approx_for_count_distinct = true;
alter session set approx_for_count_distinct = false;

alter session set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
alter session set approx_for_percentile = percentile_disc;
alter session set approx_for_percentile = none;


-- SYSTEM level.
alter system set approx_for_aggregation = true;
alter system set approx_for_aggregation = false;
alter system reset approx_for_aggregation;

alter system set approx_for_count_distinct = true;
alter system set approx_for_count_distinct = false;
alter system reset approx_for_count_distinct;

alter system set approx_for_percentile = 'PERCENTILE_CONT DETERMINISTIC';
ALTER SYSTEM SET approx_for_percentile = PERCENTILE_DISC;
alter system set approx_for_percentile = none;
alter system reset approx_for_percentile;

Query Transformation

If you want to see the associated query transformation you can perform a 10053 trace and look at the resulting trace file. As an example, run the following.

conn test/test@pdb1

select value from v$diag_info where name = 'Default Trace File';

VALUE
--------------------------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_15000.trc

SQL>


alter session set approx_for_aggregation = true;

alter session set events '10053 trace name context forever';

select count(distinct id)
from   t1;

alter session set events '10053 trace name context off';

Open the trace file displayed by the v$DIAG_INFO query and search for the term "Final query after transformations". You will see something like this.

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT APPROX_COUNT_DISTINCT("T1"."ID") "COUNT(DISTINCTID)" FROM "TEST"."T1" "T1"

Approximate Query Processing and Materialized Views

You can use approximate query processing functions in materialized views, which can subsequently be used for query rewrites.

Create a materialized view based on the test table we used in the previous section, using a query containing the APPROX_COUNT_DISTINCT function.

conn test/test@pdb1

drop materialized view data_ndv_mv;

create materialized view data_ndv_mv enable query rewrite as
select created_date, approx_count_distinct(id) as data_count
from   t1
group by created_date;

exec dbms_stats.gather_table_stats(user, 'data_ndv_mv');

We check the approximate query processing and query rewrite parameters for the session, then check the estimated execution plan for a query against the base table using the APPROX_COUNT_DISTINCT function.

-- Check session settings.
select name, value 
from   v$ses_optimizer_env 
where  sid = sys_context('userenv','sid') 
and    (name like '%approx%' or name like '%rewrite%')
order by 1;

NAME                                     VALUE
---------------------------------------- -------------------------
approx_for_aggregation                   false
approx_for_count_distinct                false
approx_for_percentile                    none
query_rewrite_enabled                    true
query_rewrite_integrity                  enforced

SQL>


-- Get an estimated execution plan.
set autotrace trace explain

select created_date, approx_count_distinct(id) as data_count
from   t1
group by created_date;

Execution Plan
----------------------------------------------------------
Plan hash value: 850394339

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  6310 | 69410 |     7   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| DATA_NDV_MV |  6310 | 69410 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

SQL>

We can see a query rewrite was done to use the materialized view.

Let's see what happens if we use a conventional COUNT(DISTINCT ...) query against the base table.

set autotrace trace explain

select created_date, count(distinct id) as data_count
from   t1
group by created_date;

Execution Plan
----------------------------------------------------------
Plan hash value: 3640378487

------------------------------------------------------------------------------------------
| Id  | Operation            | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |           |  6310 |   129K|       |  6189   (1)| 00:00:01 |
|   1 |  HASH GROUP BY       |           |  6310 |   129K|       |  6189   (1)| 00:00:01 |
|   2 |   VIEW               | VM_NWVW_1 |  1010K|    20M|       |  6189   (1)| 00:00:01 |
|   3 |    HASH GROUP BY     |           |  1010K|    12M|    23M|  6189   (1)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| T1        |  1010K|    12M|       |  1392   (1)| 00:00:01 |
------------------------------------------------------------------------------------------

SQL>

We can see the estimated execution plan used the base table, rather than performing a rewrite to use the materialized view.

This time we will enable approximate query processing and try again.

conn test/test@pdb1

alter session set approx_for_aggregation = true;

-- check session settings.
select name, value 
from   v$ses_optimizer_env 
where  sid = sys_context('userenv','sid') 
and    (name like '%approx%' or name like '%rewrite%')
order by 1;

NAME                                     VALUE
---------------------------------------- -------------------------
approx_for_aggregation                   true
approx_for_count_distinct                true
approx_for_percentile                    all
query_rewrite_enabled                    true
query_rewrite_integrity                  enforced

SQL>


-- Get an estimated execution plan.
set autotrace trace explain

select created_date, count(distinct id) as data_count
from   t1
group by created_date;

Execution Plan
----------------------------------------------------------
Plan hash value: 850394339

--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |  6310 | 69410 |     7   (0)| 00:00:01 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| DATA_NDV_MV |  6310 | 69410 |     7   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

SQL>

We can see the estimated execution plan used the materialized view, rather than the base table.

There are some restrictions associated with fast refreshes of materialized views containing approximate query processing functions listed here.

For more information see:

Hope this helps. Regards Tim...

Back to the Top.