8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 18c » Here

Approximate Top-N Query Processing (APPROX_RANK, APPROX_SUM, APPROX_COUNT) In Oracle Database 18c

The APPROX_RANK, APPROX_SUM and APPROX_COUNT functions were introduced in Oracle 18c to allow approximate top-n query processing. This extends the approximate query processing that was introduced in the previous two releases (12cR1 and 12cR2).

Related articles.

Setup

The examples in this article require the following table. The table is populated with random data, so your results may look a little different than those presented here.

DROP TABLE t1 PURGE;

CREATE TABLE t1 AS
SELECT level AS id,
       TRUNC(DBMS_RANDOM.value(1, 5))*10 department,
       UPPER(TO_CHAR(TO_DATE(TRUNC(DBMS_RANDOM.value(1, 6)),'j'), 'jsp')) AS record_type,
       ROUND(DBMS_RANDOM.value(1,51),2) AS record_value
FROM   dual
CONNECT BY level <= 100000;

You can get an idea of your data set using the following query.

COLUMN record_type FORMAT A12

SELECT department,
       record_type,
       COUNT(*) AS record_count,
       SUM(record_value) AS sum_record_val
FROM   t1
GROUP BY department, record_type
ORDER BY department, record_type;

DEPARTMENT RECORD_TYPE  RECORD_COUNT SUM_RECORD_VAL
---------- ------------ ------------ --------------
        10 FIVE                 5012       130044.4
        10 FOUR                 5042      131431.59
        10 ONE                  4912      128534.48
        10 THREE                4922      126185.08
        10 TWO                  5160      133586.67
        20 FIVE                 4816      124278.87
        20 FOUR                 4983      130158.06
        20 ONE                  4961      128156.44
        20 THREE                5049      131219.07
        20 TWO                  5018       129587.6
        30 FIVE                 5059      131168.33
        30 FOUR                 4969      128337.92
        30 ONE                  5042      131607.88
        30 THREE                5115      131455.11
        30 TWO                  4918      127673.23
        40 FIVE                 4965      128567.84
        40 FOUR                 5007      128968.23
        40 ONE                  5017      130735.96
        40 THREE                4944      130084.71
        40 TWO                  5089      131310.33

SQL>

Introduction

In this section I'll keep showing the same query, highlighting different parts of the statement to illustrate the point being made. It's not important what the query actually does at this point. Just focus on the syntax. The examples are all based around the APPROX_SUM function, but the points apply equally to the APPROX_COUNT function.

The approximate top-n query processing currently focuses on scenarios where you are ranking groups of data based on a SUM of values or a COUNT of the number of rows in the group, hence the APPROX_SUM and APPROX_COUNT functions. These functions can only be used in combination with the APPROX_RANK function, which must be present in the HAVING clause of the query, but can also appear in the SELECT list, if you want to display the rank of the data.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

The same approximate function used in the SELECT list must also appear in the ORDER BY clause of the APPROX_RANK function.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

The ORDER BY clause is mandatory and must always include the DESC keyword, so this can't be used to perform bottom-n queries like most other methods.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

The PARTITION BY clause must include all or a subset of the columns in the GROUP BY clause. If there is more than one column in the GROUP BY clause the PARTITION BY clause is mandatory. The PARTITION BY clause can be omitted if there is only a single column in the GROUP BY clause, as this will automatically be the partition.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

The HAVING clause limits the ranks returned by the query. This is the top-n.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

The APPROX_SUM and APPROX_COUNT functions can accept a second parameter of 'MAX_ERROR', which causes them to display the maximum error between the actual and approximate sum or count respectively. I've not seen this return anything but "0" in the data sets I've tried.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 5;

The following sections will provide some examples of the usage.

APPROX_SUM and APPROX_RANK

In this section we will show some examples of using the APPROX_SUM and APPROX_RANK functions together.

The following example displays the top-2 record types, based on the approximate sum of the values per record type across all departments.

SELECT record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
       APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 4;

RECORD_TYPE  APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
------------ -------------- ----------------- ---------------
ONE               519034.76                 0               2
TWO               522157.83                 0               1

SQL>

The following example displays the top-2 record types in each department, based on the approximate sum of the values per record type for each department.

SELECT department,
       record_type,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_SUM(record_value) DESC) <= 2
ORDER BY 1, 5;

DEPARTMENT RECORD_TYPE  APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
---------- ------------ -------------- ----------------- ---------------
        10 TWO                  133631                 0               1
        10 FOUR                 131452                 0               2
        20 THREE                131247                 0               1
        20 FOUR                 130162                 0               2
        30 ONE                  131646                 0               1
        30 THREE                131482                 0               2
        40 TWO                  131315                 0               1
        40 ONE                  130750                 0               2

SQL>

The following example displays the top department, based on the approximate sum of the values per department for all record types.

SELECT department,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
       APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 4;

DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
---------- -------------- ----------------- ---------------
        30      650242.47                 0               1

SQL>

The following example displays the top department for each record type, based on the approximate sum of the values per department for each record type.

SELECT record_type,
       department,
       APPROX_SUM(record_value) AS approx_sum_val,
       APPROX_SUM(record_value, 'MAX_ERROR') AS approx_sum_val_me,
       APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) AS approx_rank_val
FROM   t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_SUM(record_value) DESC) <= 1
ORDER BY 1, 5;

RECORD_TYPE  DEPARTMENT APPROX_SUM_VAL APPROX_SUM_VAL_ME APPROX_RANK_VAL
------------ ---------- -------------- ----------------- ---------------
FIVE                 30         131171                 0               1
FOUR                 10         131452                 0               1
ONE                  30         131646                 0               1
THREE                30         131482                 0               1
TWO                  10         133631                 0               1

SQL>

APPROX_COUNT and APPROX_RANK

In this section we will show some examples of using the APPROX_COUNT and APPROX_RANK functions together.

The following example displays the top-2 record types, based on the approximate number of rows per record type across all departments.

SELECT record_type,
       APPROX_COUNT(*) AS approx_count_val,
       APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
       APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM   t1
GROUP BY record_type
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 4;

RECORD_TYPE  APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
------------ ---------------- ------------------- ---------------
THREE                   20030                   0               2
TWO                     20185                   0               1

SQL>

The following example displays the top-2 record types in each department, based on the approximate number of rows per record type in each department.

SELECT department,
       record_type,
       APPROX_COUNT(*) AS approx_count_val,
       APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
       APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM   t1
GROUP BY department, record_type
HAVING APPROX_RANK(PARTITION BY department ORDER BY APPROX_COUNT(*) DESC) <= 2
ORDER BY 1, 5;

DEPARTMENT RECORD_TYPE  APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
---------- ------------ ---------------- ------------------- ---------------
        10 TWO                      5160                   0               1
        10 FOUR                     5042                   0               2
        20 THREE                    5049                   0               1
        20 TWO                      5018                   0               2
        30 THREE                    5115                   0               1
        30 FIVE                     5059                   0               2
        40 TWO                      5089                   0               1
        40 ONE                      5017                   0               2

SQL>

The following example displays the top department, based on the approximate number of rows of all record types in the department

SELECT department,
       APPROX_COUNT(*) AS approx_count_val,
       APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
       APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM   t1
GROUP BY department
HAVING APPROX_RANK(ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 4;

DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
---------- ---------------- ------------------- ---------------
        30            25103                   0               1

SQL>

The following example displays the top department for each record type, based on the approximate number of rows of that record type in the department

SELECT record_type,
       department,
       APPROX_COUNT(*) AS approx_count_val,
       APPROX_COUNT(*, 'MAX_ERROR') AS approx_count_val_me,
       APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) AS approx_rank_val
FROM   t1
GROUP BY record_type, department
HAVING APPROX_RANK(PARTITION BY record_type ORDER BY APPROX_COUNT(*) DESC) <= 1
ORDER BY 1, 5;

RECORD_TYPE  DEPARTMENT APPROX_COUNT_VAL APPROX_COUNT_VAL_ME APPROX_RANK_VAL
------------ ---------- ---------------- ------------------- ---------------
FIVE                 30             5059                   0               1
FOUR                 10             5042                   0               1
ONE                  30             5042                   0               1
THREE                30             5115                   0               1
TWO                  10             5160                   0               1

SQL>

For more information see:

Hope this helps. Regards Tim...

Back to the Top.