8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)

The APPROX_COUNT_DISTINCT function was added, but not documented, in Oracle 11g to improve the speed of calculating the number of distinct values (NDV) when gathering statistics using the DBMS_STATS package. Oracle database 12c (12.1.0.2) now includes the function in the documentation, so we are free to use it in our applications as a supported SQL function.

Basic Usage
Performance

Basic Usage

In previous database versions, if we wanted to perform a count of distinct values, we would probably have done something like the following.

SELECT COUNT(DISTINCT object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     47171

1 row selected.

SQL>

This query gives us the exact count of the distinct values based on Oracle's read-consistency model. As such, we see all the committed data, along with any uncommitted changes made by the current session.

In contrast, the new APPROX_COUNT_DISTINCT function does not provide exact results, but should give "negligible deviation from the exact result" according to the documentation.

SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     46788

1 row selected.

SQL>

The function can also be used as part of a group query.

SELECT tablespace_name, APPROX_COUNT_DISTINCT(table_name) AS tab_count
FROM   all_tables
GROUP BY tablespace_name
ORDER BY tablespace_name;


TABLESPACE_NAME                 TAB_COUNT
------------------------------ ----------
SYSAUX                                 73
SYSTEM                                 36
USERS                                   7
                                       44

4 rows selected.

SQL>

Performance

Even in the following simple example we can see a repeatable difference in the speed of the two approaches, but the performance difference does not seem too dramatic.

SET TIMING ON

SELECT COUNT(DISTINCT object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     47171

1 row selected.

Elapsed: 00:00:02.39
SQL>


SELECT APPROX_COUNT_DISTINCT(object_name) AS obj_count
FROM   all_objects;

 OBJ_COUNT
----------
     46788

1 row selected.

Elapsed: 00:00:02.00
SQL>

In reality, the APPROX_COUNT_DISTINCT function is designed to process much bigger workloads than this, so we can create a bigger table to test.

DROP TABLE t1 PURGE;

CREATE TABLE t1 AS
SELECT level AS  data
FROM   dual
CONNECT BY level <= 10000;

INSERT /*+ APPEND */ INTO t1
SELECT a.data FROM t1 a
CROSS JOIN t1 b;

COMMIT;

EXEC DBMS_STATS.gather_table_stats(USER,'T1');

Now we have a table with a little over 100 million rows and 10,000 distinct values. Now we can see the difference in performance of the two methods is more significant.

SET TIMING ON

SELECT COUNT(DISTINCT data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10000

1 row selected.

Elapsed: 00:00:19.66
SQL>


SELECT APPROX_COUNT_DISTINCT(data) AS data_count
FROM   t1;

DATA_COUNT
----------
     10050

1 row selected.

Elapsed: 00:00:10.46
SQL>

Christian Antognini has an interesting blog post about the performance benefits of the APPROX_COUNT_DISTINCT function here, where he shows the elapsed time and memory usage of the APPROX_COUNT_DISTINCT function remains consistent regardless of the number of distinct values in the data set. In contrast, as the number of distinct values increase, the elapsed time and memory usage of the "COUNT(DISTINCT expr)" method increase drastically.

For more information see:

Hope this helps. Regards Tim...

Back to the Top.