8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23ai | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Analytic Function Syntax Enhancements (WINDOW, GROUPS, EXCLUDE) in Oracle Database 21c

This article describes the syntax enhancements to analytic function introduced in Oracle database 21c.

Setup
WINDOW Clause
GROUPS Clause
EXCLUDE Clause
Quick Links

Setup

The examples in this article require the following tables.

--drop table emp purge;

create table emp (
  empno    number(4) constraint pk_emp primary key,
  ename    varchar2(10),
  job      varchar2(9),
  mgr      number(4),
  hiredate date,
  sal      number(7,2),
  comm     number(7,2),
  deptno   number(2) );

insert into emp values (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
insert into emp values (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
insert into emp values (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
insert into emp values (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
insert into emp values (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
insert into emp values (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
insert into emp values (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
insert into emp values (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
insert into emp values (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
insert into emp values (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
insert into emp values (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
insert into emp values (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
insert into emp values (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
insert into emp values (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
commit;


--drop table t1 purge;

create table t1 (id number, value number);

insert into t1 (id, value) values (1, 1);
insert into t1 (id, value) values (2, 2);
insert into t1 (id, value) values (3, 3);
insert into t1 (id, value) values (4, 3);
insert into t1 (id, value) values (5, 4);
insert into t1 (id, value) values (6, 6);
insert into t1 (id, value) values (7, 6);
insert into t1 (id, value) values (8, 7);
insert into t1 (id, value) values (9, 7);
insert into t1 (id, value) values (10, 8);
commit;

WINDOW Clause

In previous releases the window frame was defined as part the analytic function call. The following query uses the FIRST_VALUE analytic function to display the lowest salary in each department, along with the raw data about the employees in the department.

select empno,
       deptno,
       sal,
       first_value(sal)
         over (partition by deptno order by sal) as lowest_in_dept
from   emp;

     EMPNO     DEPTNO        SAL LOWEST_IN_DEPT
---------- ---------- ---------- --------------
      7934         10       1300           1300
      7782         10       2450           1300
      7839         10       5000           1300
      7369         20        800            800
      7876         20       1100            800
      7566         20       2975            800
      7788         20       3000            800
      7902         20       3000            800
      7900         30        950            950
      7654         30       1250            950
      7521         30       1250            950
      7844         30       1500            950
      7499         30       1600            950
      7698         30       2850            950

SQL>

From Oracle 21c onward the window frame can be defined using a separate window clause, which is referenced in the analytic function call. This example uses a separate window clause to define the window frame.

select empno,
       deptno,
       sal,
       first_value(sal) over w1 as lowest_in_dept
from   emp
window w1 as (partition by deptno order by sal);

     EMPNO     DEPTNO        SAL LOWEST_IN_DEPT
---------- ---------- ---------- --------------
      7934         10       1300           1300
      7782         10       2450           1300
      7839         10       5000           1300
      7369         20        800            800
      7876         20       1100            800
      7566         20       2975            800
      7788         20       3000            800
      7902         20       3000            800
      7900         30        950            950
      7654         30       1250            950
      7521         30       1250            950
      7844         30       1500            950
      7499         30       1600            950
      7698         30       2850            950

SQL>

This means multiple analytic functions can reference the same window clause. In this example we add the RANK analytic function to assign a rank to each person in the department based on their salary from lowest to highest.

select empno,
       deptno,
       sal,
       first_value(sal) over w1 as lowest_in_dept,
       rank() over w1 as sal_rank_in_dept
from   emp
window w1 as (partition by deptno order by sal);

     EMPNO     DEPTNO        SAL LOWEST_IN_DEPT SAL_RANK_IN_DEPT
---------- ---------- ---------- -------------- ----------------
      7934         10       1300           1300                1
      7782         10       2450           1300                2
      7839         10       5000           1300                3
      7369         20        800            800                1
      7876         20       1100            800                2
      7566         20       2975            800                3
      7788         20       3000            800                4
      7902         20       3000            800                4
      7900         30        950            950                1
      7654         30       1250            950                2
      7521         30       1250            950                2
      7844         30       1500            950                4
      7499         30       1600            950                5
      7698         30       2850            950                6

SQL>

Multiple window frames can be defined and referenced. The following query displays the lowest and highest salary in each department. It uses the FIRST_VALUE analytic function twice, each referencing a different window frame.

select empno,
       deptno,
       sal,
       first_value(sal) over w1 as lowest_in_dept,
       first_value(sal) over w2 as highest_in_dept
from   emp
window w1 as (partition by deptno order by sal),
       w2 as (partition by deptno order by sal desc);

     EMPNO     DEPTNO        SAL LOWEST_IN_DEPT HIGHEST_IN_DEPT
---------- ---------- ---------- -------------- ---------------
      7934         10       1300           1300            5000
      7782         10       2450           1300            5000
      7839         10       5000           1300            5000
      7369         20        800            800            3000
      7876         20       1100            800            3000
      7566         20       2975            800            3000
      7788         20       3000            800            3000
      7902         20       3000            800            3000
      7900         30        950            950            2850
      7521         30       1250            950            2850
      7654         30       1250            950            2850
      7844         30       1500            950            2850
      7499         30       1600            950            2850
      7698         30       2850            950            2850

SQL>

We can define a common WINDOW clause, and extend it in each analytic function call.

select row_number () over w1 as row_order,
       sal,
       avg(sal) over (w1 rows between unbounded preceding and current row) as avg_rolling,
       avg(sal) over (w1 rows between unbounded preceding and unbounded following) as avg_all
from   emp
window w1 as (order by sal);

 ROW_ORDER        SAL AVG_ROLLING    AVG_ALL
---------- ---------- ----------- ----------
         1        800         800 2073.21429
         2        950         875 2073.21429
         3       1100         950 2073.21429
         4       1250        1025 2073.21429
         5       1250        1070 2073.21429
         6       1300  1108.33333 2073.21429
         7       1500  1164.28571 2073.21429
         8       1600     1218.75 2073.21429
         9       2450  1355.55556 2073.21429
        10       2850        1505 2073.21429
        11       2975  1638.63636 2073.21429
        12       3000  1752.08333 2073.21429
        13       3000  1848.07692 2073.21429
        14       5000  2073.21429 2073.21429

SQL>

It's worth keeping in mind this new syntax is a query transformation. We can see this if we trace the session. Check the trace file for the session.

select value from v$diag_info where  name = 'Default Trace File';

VALUE
----------------------------------------------------------------
/u01/app/oracle/diag/rdbms/cdb1/cdb1/trace/cdb1_ora_23984.trc

1 row selected.

SQL>

Perform a 10053 trace of a statement using the new syntax.

alter session set events '10053 trace name context forever';

select empno,
       deptno,
       sal,
       first_value(sal) over w1 as lowest_in_dept,
       first_value(sal) over w2 as highest_in_dept
from   emp
window w1 as (partition by deptno order by sal),
       w2 as (partition by deptno order by sal desc);
       
alter session set events '10053 trace name context off';

The section of the trace file beginning with "Final query after transformations" shows the statement that was actually processed, after the query transformation.

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "EMP"."EMPNO" "EMPNO",
       "EMP"."DEPTNO" "DEPTNO",
       "EMP"."SAL" "SAL",
       FIRST_VALUE("EMP"."SAL") OVER ( PARTITION BY "EMP"."DEPTNO" ORDER BY "EMP"."SAL"
         RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW ) "LOWEST_IN_DEPT",
       FIRST_VALUE("EMP"."SAL") OVER ( PARTITION BY "EMP"."DEPTNO" ORDER BY "EMP"."SAL" DESC
         RANGE  BETWEEN  UNBOUNDED  PRECEDING  AND  CURRENT ROW ) "HIGHEST_IN_DEPT"
FROM "TESTUSER1"."EMP" "EMP"

As we can see, the statement has been rewritten to a form we might have used prior to 21c.

GROUPS Clause

The GROUPS keyword has been added to the ROWS and RANGE keywords, so now there are three options for defining a windowing clause.

ROWS : We are indicating a specific number of rows relative to the current row, either directly, or via an expression. There is no concept of tied rows. The cutoff is explicitly the row indicated. It can have any number of sort keys, of any ordered types.
RANGE : We are referring to a range of values in a specific column relative to the value in the current row. As a result, Oracle doesn't know how many rows are included in the range until the ordered set is created. There must be a single sort key, of a data type that allows addition and subtraction, such as a numeric, date or interval.
GROUPS : We are dividing the data into groups based on the ordered values. All ties are part of the same group. The current row is part of the current group, so references to preceding or following refer to the preceding or following group, not specific rows. Similar to RANGE, the number of rows in the window are not known until the data is ordered. It can have any number of sort keys, of any ordered types.

The following query compares an average salary "between 1 preceding and current row" using the ROWS, GROUPS and RANGE clauses.

select row_number () over (order by sal) as row_order,
       sal,
       avg(sal) over w1 as avg_rows,
       avg(sal) over w2 as avg_groups,
       avg(sal) over w3 as avg_range
from   emp
window w1 as (order by sal rows between 1 preceding and current row),
       w2 as (order by sal groups between 1 preceding and current row),
       w3 as (order by sal range between 1 preceding and current row);

 ROW_ORDER        SAL   AVG_ROWS AVG_GROUPS  AVG_RANGE
---------- ---------- ---------- ---------- ----------
         1        800        800        800        800
         2        950        875        875        950
         3       1100       1025       1025       1100
         4       1250       1175       1200       1250
         5       1250       1250       1200       1250
         6       1300       1275 1266.66667       1300
         7       1500       1400       1400       1500
         8       1600       1550       1550       1600
         9       2450       2025       2025       2450
        10       2850       2650       2650       2850
        11       2975     2912.5     2912.5       2975
        12       3000     2987.5 2991.66667       3000
        13       3000       3000 2991.66667       3000
        14       5000       4000 3666.66667       5000

SQL>

If we work down the AVG_ROWS column one row at a time we can see we always have the average for salary value of the current row and the row immediately preceding it.

The AVG_GROUPS column starts off the same, but once the current row is 4, we have a salary tie with row 5, so rows 4 and 5 are treated as a single group, with row 3 as the preceding group. Rather returning the average of the salaries from rows 3 and 4, we see the average of salaries from rows 3, 4 and 5, the current group and the preceding group.

For this data the AVG_RANGE column is essentially reporting the salary value. Remember RANGE is a reporting range value, not a row reference. A range of "1 preceding" is the "value-1", not the "row-1". Ties do matter, but you can't see the effect using this data set.

Let's try a different data set by writing a similar query against the T1 table. Now we have increments of "1" in the VALUE column, with a few ties and one gap, because value "5" is missing from the ordered set.

select row_number () over (order by value) as row_order,
       value,
       avg(value) over w1 as avg_rows,
       avg(value) over w2 as avg_groups,
       avg(value) over w3 as avg_range
from   t1
window w1 as (order by value rows between 1 preceding and current row),
       w2 as (order by value groups between 1 preceding and current row),
       w3 as (order by value range between 1 preceding and current row);

 ROW_ORDER      VALUE   AVG_ROWS AVG_GROUPS  AVG_RANGE
---------- ---------- ---------- ---------- ----------
         1          1          1          1          1
         2          2        1.5        1.5        1.5
         3          3        2.5 2.66666667 2.66666667
         4          3          3 2.66666667 2.66666667
         5          4        3.5 3.33333333 3.33333333
         6          6          5 5.33333333          6
         7          6          6 5.33333333          6
         8          7        6.5        6.5        6.5
         9          7          7        6.5        6.5
        10          8        7.5 7.33333333 7.33333333

SQL>

The AVG_ROWS column displays the average between the current row and the preceding row.

The AVG_GROUPS column looks different at row 3. Row 3 and 4 have a tie for the value of "3". This means row 3 and 4 are a single group and row 2 with the value "2" is the previous group. As a result we get the average for rows 2, 3 and 4 returned for rows 3 and 4. We have a similar situation with rows 6 and 7, which have a tie for the value "6" and are handled as a single group. The previous group jumps the missing value "5" and is a group with the value "4".

The AVG_RANGE column looks similar to the AVG_GROUPS column until we hit a gap in the value sequence. Rows 6 and 7 have a tie for the value "6". The range is from "6-1=5" to "6", but there is no value of "5" in the ordered set, so we are presented with the average of rows 6 and 7 only when the current row is either row 6 or 7.

EXCLUDE Clause

The EXCLUDE clause has several options to allow us to exclude various rows from to the window frame. The following options are available.

EXCLUDE NO OTHERS : Nothing is excluded. This is the equivalent not having an EXCLUDE clause, so this is the default behaviour.
EXCLUDE CURRENT ROW : Excludes only the current row.
EXCLUDE GROUP : Excludes the current group, the current row and all other rows with the same value.
EXCLUDE TIES : Remove all rows with the same value as the current row, but does not exclude the current row.

All these exclusions can be used with ROWS, RANGE and GROUPS window definitions.

The following query calculates the average of the values from one row preceding and one row following, but excludes the current row.

select row_number () over w1 as row_order,
       value,
       avg(value) over (w1 rows between 1 preceding and 1 following exclude current row) as ex_current_row
from t1
window w1 as (order by value);

 ROW_ORDER      VALUE EX_CURRENT_ROW
---------- ---------- --------------
         1          1              2
         2          2              2
         3          3            2.5
         4          3            3.5
         5          4            4.5
         6          6              5
         7          6            6.5
         8          7            6.5
         9          7            7.5
        10          8              7
SQL>

The following query calculates the average of the values from one row preceding and one row following, but excludes the current group. Using a row definition for the main window and a group exclusion leads to an interesting result.

select row_number () over w1 as row_order,
       value,
       avg(value) over (w1 rows between 1 preceding and 1 following exclude group) as ex_group
from t1
window w1 as (order by value);

 ROW_ORDER      VALUE   EX_GROUP
---------- ---------- ----------
         1          1          2
         2          2          2
         3          3          2
         4          3          4
         5          4        4.5
         6          6          4
         7          6          7
         8          7          6
         9          7          8
        10          8          7
SQL>

The following query calculates a rolling average using a range from unbounded preceding to current row, but excludes ties, so the reporting range doesn't do the normal processing of including all ties with the current row.

select row_number () over w1 as row_order,
       value,
       avg(value) over (w1 range between unbounded preceding and current row exclude ties) as ex_ties
from t1
window w1 as (order by value);

 ROW_ORDER      VALUE    EX_TIES
---------- ---------- ----------
         1          1          1
         2          2        1.5
         3          3          2
         4          3          2
         5          4        2.6
         6          6 3.16666667
         7          6 3.16666667
         8          7          4
         9          7          4
        10          8        4.7
SQL>

The following query calculates a rolling average using a range from unbounded preceding to current row, but excludes no other rows, which is the equivalent on not having an EXCLUDE clause.

select row_number () over w1 as row_order,
       value,
       avg(value) over (w1 range between unbounded preceding and current row
                           exclude no others) as ex_no_others
from t1
window w1 as (order by value);

 ROW_ORDER      VALUE EX_NO_OTHERS
---------- ---------- ------------
         1          1            1
         2          2          1.5
         3          3         2.25
         4          3         2.25
         5          4          2.6
         6          6   3.57142857
         7          6   3.57142857
         8          7   4.33333333
         9          7   4.33333333
        10          8          4.7
SQL>

Quick Links

The "*" indicates the function supports the full analytic syntax, including the windowing clause.

AVG *	BIT_AND_AGG *	BIT_OR_AGG *	BIT_XOR_AGG *	CHECKSUM *
CLUSTER_DETAILS	CLUSTER_DISTANCE	CLUSTER_ID	CLUSTER_SET	CORR *
COUNT *	COVAR_POP *	COVAR_SAMP *	CUME_DIST	DENSE_RANK
FEATURE_DETAILS	FEATURE_ID	FEATURE_SET	FEATURE_VALUE	FIRST
FIRST_VALUE *	KURTOSIS_POP *	KURTOSIS_SAMP *	LAG	LAST
LAST_VALUE *	LEAD	LISTAGG	MATCH_RECOGNIZE	MAX *
MEDIAN	MIN *	NTH_VALUE *	NTILE	PERCENT_RANK
PERCENTILE_CONT	PERCENTILE_DISC	PREDICTION	PREDICTION_COST	PREDICTION
PREDICTION_COST	PREDICTION_DETAILS	PREDICTION_PROBABILITY	PREDICTION_SET	RANK
RATIO_TO_REPORT	REGR_ (Linear Regression) Functions *	ROW_NUMBER	SKEWNESS_POP *	SKEWNESS_SAMP *
STDDEV *	STDDEV_POP *	STDDEV_SAMP *	SUM *	VAR_POP *
VAR_SAMP *	VARIANCE *	String Aggregation	Top-N Queries

For more information see:

Hope this helps. Regards Tim...

Back to the Top.