8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Approximate Top-N Query Processing (APPROX_RANK, APPROX_SUM, APPROX_COUNT) In Oracle Database 18c
The APPROX_RANK
, APPROX_SUM
and APPROX_COUNT
functions were introduced in Oracle 18c to allow approximate top-n query processing. This extends the approximate query processing that was introduced in the previous two releases (12cR1 and 12cR2).
Related articles.
- Top-N Queries
- APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)
- Approximate Query Processing in Oracle Database 12c Release 2 (12.2)
- Analytic Functions : All Articles
Setup
The examples in this article require the following table. The table is populated with random data, so your results may look a little different than those presented here.
DROP TABLE t1 PURGE; CREATE TABLE t1 AS SELECT level AS id, TRUNC(DBMS_RANDOM.value(1, 5))*10 department, UPPER(TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.value(1, 6)),'j'), 'jsp')) AS record_type, ROUND(DBMS_RANDOM.value(1,51),2) AS record_value FROM dual CONNECT BY level <= 100000;
You can get an idea of your data set using the following query.
COLUMN record_type FORMAT A12 SELECT department, record_type, COUNT(*) AS record_count, SUM(record_value) AS sum_record_val FROM t1 GROUP BY department, record_type ORDER BY department, record_type; DEPARTMENT RECORD_TYPE RECORD_COUNT SUM_RECORD_VAL ---------- ------------ ------------ -------------- 10 FIVE 5012 130044.4 10 FOUR 5042 131431.59 10 ONE 4912 128534.48 10 THREE 4922 126185.08 10 TWO 5160 133586.67 20 FIVE 4816 124278.87 20 FOUR 4983 130158.06 20 ONE 4961 128156.44 20 THREE 5049 131219.07 20 TWO 5018 129587.6 30 FIVE 5059 131168.33 30 FOUR 4969 128337.92 30 ONE 5042 131607.88 30 THREE 5115 131455.11 30 TWO 4918 127673.23 40 FIVE 4965 128567.84 40 FOUR 5007 128968.23 40 ONE 5017 130735.96 40 THREE 4944 130084.71 40 TWO 5089 131310.33 SQL>
Introduction
In this section I'll keep showing the same query, highlighting different parts of the statement to illustrate the point being made. It's not important what the query actually does at this point. Just focus on the syntax. The examples are all based around the APPROX_SUM
function, but the points apply equally to the APPROX_COUNT
function.
The approximate top-n query processing currently focuses on scenarios where you are ranking groups of data based on a SUM
of values or a COUNT
of the number of rows in the group, hence the APPROX_SUM
and APPROX_COUNT
functions. These functions can only be used in combination with the APPROX_RANK
function, which must be present in the HAVING
clause of the query, but can also appear in the SELECT
list, if you want to display the rank of the data.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4;
The same approximate function used in the SELECT
list must also appear in the ORDER BY
clause of the APPROX_RANK
function.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4;
The ORDER BY
clause is mandatory and must always include the DESC
keyword, so this can't be used to perform bottom-n queries like most other methods.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4;
The PARTITION BY
clause must include all or a subset of the columns in the GROUP BY
clause. If there is more than one column in the GROUP BY
clause the PARTITION BY
clause is mandatory. The PARTITION BY
clause can be omitted if there is only a single column in the GROUP BY
clause, as this will automatically be the partition.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4;
The HAVING
clause limits the ranks returned by the query. This is the top-n.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4;
The APPROX_SUM
and APPROX_COUNT
functions can accept a second parameter of 'MAX_ERROR', which causes them to display the maximum error between the actual and approximate sum or count respectively. I've not seen this return anything but "0" in the data sets I've tried.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 5;
The following sections will provide some examples of the usage.
APPROX_SUM and APPROX_RANK
In this section we will show some examples of using the APPROX_SUM
and APPROX_RANK
functions together.
The following example displays the top-2 record types, based on the approximate sum of the values per record type across all departments.
SELECT record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me, APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY record_type HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 4; RECORD_TYPE APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL ------------ -------------- ----------------- --------------- ONE 519034.76 0 2 TWO 522157.83 0 1 SQL>
The following example displays the top-2 record types in each department, based on the approximate sum of the values per record type for each department.
SELECT department, record_type, APPROX_SUM(record_value) AS approx_sum_val, APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me, APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2 ORDER BY 1, 5; DEPARTMENT RECORD_TYPE APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL ---------- ------------ -------------- ----------------- --------------- 10 TWO 133631 0 1 10 FOUR 131452 0 2 20 THREE 131247 0 1 20 FOUR 130162 0 2 30 ONE 131646 0 1 30 THREE 131482 0 2 40 TWO 131315 0 1 40 ONE 130750 0 2 SQL>
The following example displays the top department, based on the approximate sum of the values per department for all record types.
SELECT department, APPROX_SUM(record_value) AS approx_sum_val, APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me, APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY department HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 1 ORDER BY 1, 4; DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL ---------- -------------- ----------------- --------------- 30 650242.47 0 1 SQL>
The following example displays the top department for each record type, based on the approximate sum of the values per department for each record type.
SELECT record_type, department, APPROX_SUM(record_value) AS approx_sum_val, APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me, APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val FROM t1 GROUP BY record_type, department HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1 ORDER BY 1, 5; RECORD_TYPE DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL ------------ ---------- -------------- ----------------- --------------- FIVE 30 131171 0 1 FOUR 10 131452 0 1 ONE 30 131646 0 1 THREE 30 131482 0 1 TWO 10 133631 0 1 SQL>
APPROX_COUNT and APPROX_RANK
In this section we will show some examples of using the APPROX_COUNT
and APPROX_RANK
functions together.
The following example displays the top-2 record types, based on the approximate number of rows per record type across all departments.
SELECT record_type, APPROX_COUNT(*) AS approx_count_val, APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me, APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val FROM t1 GROUP BY record_type HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 2 ORDER BY 1, 4; RECORD_TYPE APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL ------------ ---------------- ------------------- --------------- THREE 20030 0 2 TWO 20185 0 1 SQL>
The following example displays the top-2 record types in each department, based on the approximate number of rows per record type in each department.
SELECT department, record_type, APPROX_COUNT(*) AS approx_count_val, APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me, APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val FROM t1 GROUP BY department, record_type HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) <= 2 ORDER BY 1, 5; DEPARTMENT RECORD_TYPE APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL ---------- ------------ ---------------- ------------------- --------------- 10 TWO 5160 0 1 10 FOUR 5042 0 2 20 THREE 5049 0 1 20 TWO 5018 0 2 30 THREE 5115 0 1 30 FIVE 5059 0 2 40 TWO 5089 0 1 40 ONE 5017 0 2 SQL>
The following example displays the top department, based on the approximate number of rows of all record types in the department
SELECT department, APPROX_COUNT(*) AS approx_count_val, APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me, APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val FROM t1 GROUP BY department HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 1 ORDER BY 1, 4; DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL ---------- ---------------- ------------------- --------------- 30 25103 0 1 SQL>
The following example displays the top department for each record type, based on the approximate number of rows of that record type in the department
SELECT record_type, department, APPROX_COUNT(*) AS approx_count_val, APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me, APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val FROM t1 GROUP BY record_type, department HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) <= 1 ORDER BY 1, 5; RECORD_TYPE DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL ------------ ---------- ---------------- ------------------- --------------- FIVE 30 5059 0 1 FOUR 10 5042 0 1 ONE 30 5042 0 1 THREE 30 5115 0 1 TWO 10 5160 0 1 SQL>
For more information see:
- APPROX_RANK
- APPROX_COUNT
- APPROX_SUM
- Top-N Queries
- APPROX_COUNT_DISTINCT : Quick Distinct Count in Oracle Database 12cR1 (12.1.0.2)
- Approximate Query Processing in Oracle Database 12c Release 2 (12.2)
- Analytic Functions : All Articles
Hope this helps. Regards Tim...