8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Multivalue Function-Based Indexes for JSON_EXISTS in Oracle Database 21c
Multivalue function-based indexes allow us to index multiple scalar values in JSON documents stored in the JSON data type. This can improve the performance of queries containing the JSON_EXISTS
condition.
- Setup
- Non-Composite Multivalue Indexes for Arrays
- Data Type Conversion Methods
- Composite Multivalue Indexes Using JSON_TABLE
- Considerations
Related articles.
- Multivalue Function-Based Indexes for JSON_EXISTS
- JSON Quick Links : Introduction, 12cR1, 12cR2, 18c, 19c, 21c, All Articles
Setup
Create and populate a test table with some JSON data.
-- drop table t1 purge; create table t1 ( id number generated always as identity, name varchar2(20), json_data json, constraint t1_pk primary key (id) ); begin insert into t1 (name, json_data) values ('book 1', json('{"words":[ {"word":"apple","pages":[5, 10, 15, 20, 25, 30]}, {"word":"orange","pages":[10, 20, 30, 40, 50, 60]} ]}')); insert into t1 (name, json_data) values ('book 2', json('{"words":[ {"word":"apple","pages":[1, 6, 11, 16, 21, 26]}, {"word":"orange","pages":[11, 21, 31, 41, 51, 61]} ]}')); commit; dbms_stats.gather_table_stats(null, 't1'); end; /
Each row represents a book, with an array of words. For each word there is an array of pages that word can be found on.
Between each test we will flush the shared pool to make sure we get a hard parse each time. We use SET AUTOTRACE TRACE EXPLAIN
to display the execution plan when we run queries, but not clutter the screen with the output rows or the statistics.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain
Non-Composite Multivalue Indexes for Arrays
We use the following query to check for a row which has a reference to page 40. It's a bit of a silly query, since we are not checking what word is referenced on that page, but it serves our purpose for this test.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 where json_exists(json_data, '$.words.pages?(@.number() == 40)'); Execution Plan ---------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 183 | 36 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 1 | 183 | 36 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_EXISTS2("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words.pages?(@.number() == 40)' FALSE ON ERROR)=1) SQL>
There are no indexes present, so we get a full table scan to retrieve the data.
We create an index on the pages array, but this doesn't use the MULTIVALUE
keyword. As a result the index is not used and the query still requires a full table scan.
drop index t1_mvi; create index t1_mvi on t1 t (t.json_data.words.pages.number());
We run the query again and it still uses a full table scan.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 where json_exists(json_data, '$.words.pages?(@.number() == 40)'); Execution Plan ---------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 183 | 36 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 1 | 183 | 36 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_EXISTS2("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words.pages?(@.number() == 40)' FALSE ON ERROR)=1) SQL>
We recreate the index using the MULTIVALUE
keyword, so now it can index the array properly. This is a non-composite multivalue index, so we must use a data type conversion method. In this case we use the "number()" method.
drop index t1_mvi; create multivalue index t1_mvi on t1 t (t.json_data.words.pages.number());
We run the same query again, but this time notice the INDEX RANGE SCAN (MULTI VALUE)
operation.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 where json_exists(json_data, '$.words.pages?(@.number() == 40)'); Execution Plan ---------------------------------------------------------- Plan hash value: 1854116654 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 185 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 185 | 2 (0)| 00:00:01 | | 2 | HASH UNIQUE | | 1 | 185 | | | |* 3 | INDEX RANGE SCAN (MULTI VALUE) | T1_MVI | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_QUERY("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words.pages.number()' RETURNING NUMBER ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH MULTIVALUE)=40) SQL>
The index can still be used if it only represents part of our search term. In the following query we are looking for books with the word "apple" on the 25th page.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple" && @.pages.number() == 25)'); Execution Plan ---------------------------------------------------------- Plan hash value: 1854116654 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 185 | 2 (0)| 00:00:01 | |* 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 185 | 2 (0)| 00:00:01 | | 2 | HASH UNIQUE | | 1 | 185 | | | |* 3 | INDEX RANGE SCAN (MULTI VALUE) | T1_MVI | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_EXISTS2("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words[*]?(@.word == "apple" && @.pages.number() == 25)' FALSE ON ERROR)=1) 3 - access(JSON_QUERY("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words.pages.number()' RETURNING NUMBER ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH MULTIVALUE)=25) SQL>
Data Type Conversion Methods
We mentioned previously that non-composite multivalue indexes must include a data type conversion method. The list of data type conversion methods are show below, but the full list of item methods and their descriptions can be found here.
- binary()
- boolean()
- booleanOnly()
- date()
- dateWithTime()
- double()
- dsInterval()
- float()
- number()
- numberOnly()
- string()
- stringOnly()
- timestamp()
- ymInterval()
When deciding on the conversion methods to use, remember the "Only" methods can be quite restrictive and reduce the chances of the index being used.
Our original multivalue index uses the "number()" method.
drop index t1_mvi; create multivalue index t1_mvi on t1 t (t.json_data.words.pages.number());
This can be used equally well by the following queries, using the "number()" and "numberOnly()" methods, and even when no conversion method is used.
-- number() select * from t1 where json_exists(json_data, '$.words.pages?(@.number() == 40)'); -- numberOnly() select * from t1 where json_exists(json_data, '$.words.pages?(@.numberOnly() == 40)'); -- No data type conversion method. select * from t1 where json_exists(json_data, '$.words.pages?(@ == 40)');
We recreate the index using the "numberOnly()" method, which is more restrictive.
drop index t1_mvi; create multivalue index t1_mvi on t1 t (t.json_data.words.pages.numberOnly());
Now only the query using the "numberOnly()" conversion method can use the index. The other two do a full table scan.
-- numberOnly() select * from t1 where json_exists(json_data, '$.words.pages?(@.numberOnly() == 40)');
Choose the data type conversion methods used in the index definition carefully, to make sure they are actually used by your application.
Composite Multivalue Indexes Using JSON_TABLE
We can create more complex composite multivalue indexes using the JSON_TABLE
syntax. I find it easier to plan these out using a JSON_TABLE
query before building the index itself. In this example we want to index all the pages that reference a specific word, so we use JSON_TABLE
to flatten out the nested data.
set autotrace off select id, name, word, page from t1, (json_table(json_data, '$.words[*]' error on error null on empty null on mismatch columns ( word varchar2(10) path '$.word', nested path '$.pages[*]' columns (page number(20) path '$')))); ID NAME WORD PAGE ---------- -------------------- ---------- ---------- 1 book 1 apple 5 1 book 1 apple 10 1 book 1 apple 15 1 book 1 apple 20 1 book 1 apple 25 1 book 1 apple 30 1 book 1 orange 10 1 book 1 orange 20 1 book 1 orange 30 1 book 1 orange 40 1 book 1 orange 50 1 book 1 orange 60 2 book 2 apple 1 2 book 2 apple 6 2 book 2 apple 11 2 book 2 apple 16 2 book 2 apple 21 2 book 2 apple 26 2 book 2 orange 11 2 book 2 orange 21 2 book 2 orange 31 2 book 2 orange 41 2 book 2 orange 51 2 book 2 orange 61 24 rows selected. SQL>
We used the ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH
error handler because it is mandatory for the multivalue indexes using JSON_TABLE
, so it makes sense to test the query using it.
We can now create the multivalue index using this same JSON_TABLE call.
drop index t1_mvi; create multivalue index t1_mvi on t1 (json_table(json_data, '$.words[*]' error on error null on empty null on mismatch columns ( word varchar2(10) path '$.word', nested path '$.pages[*]' columns (page number(20) path '$'))));
In the following example we search for books with the word "apple" on page 25, and we can see the index is used.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple" && @.pages == 25)'); Execution Plan ---------------------------------------------------------- Plan hash value: 1854116654 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 185 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 185 | 2 (0)| 00:00:01 | | 2 | HASH UNIQUE | | 1 | 185 | | | |* 3 | INDEX RANGE SCAN (MULTI VALUE) | T1_MVI | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_QUERY("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words[*].word' RETURNING VARCHAR2(10) ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH MULTIVALUE)='apple' AND JSON_QUERY("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words[*].pages[*]' RETURNING NUMBER(20,0) ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH MULTIVALUE)=25) SQL>
The index can also be used for a query referencing one of the values, provided it is on the leading edge of the index. For example searching for books containing the word apple, with "word" being the first element in the composite index.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple")'); Execution Plan ---------------------------------------------------------- Plan hash value: 1854116654 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 185 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 185 | 2 (0)| 00:00:01 | | 2 | HASH UNIQUE | | 1 | 185 | | | |* 3 | INDEX RANGE SCAN (MULTI VALUE) | T1_MVI | 1 | | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(JSON_QUERY("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words[*].word' RETURNING VARCHAR2(10) ASIS WITHOUT ARRAY WRAPPER ERROR ON ERROR NULL ON EMPTY NULL ON MISMATCH MULTIVALUE)='apple') SQL>
Notice we don't use the index if we don't reference the leading edge of the index. In this case we search for books with a reference to page 25, but do not reference to the "word" element. Notice the index is not used.
conn sys/SysPassword1@//localhost:1521/pdb1 as sysdba alter system flush shared_pool; conn testuser1/testuser1@//localhost:1521/pdb1 set autotrace trace explain select * from t1 t where json_exists(json_data,'$.words[*]?(@.pages.numberOnly() == 25)'); Execution Plan ---------------------------------------------------------- Plan hash value: 3617692013 -------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | -------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 183 | 36 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| T1 | 1 | 183 | 36 (0)| 00:00:01 | -------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_EXISTS2("JSON_DATA" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.words[*]?(@.pages.numberOnly() == 25)' FALSE ON ERROR)=1) SQL>
Similar to normal composite indexes, we need to be careful when deciding the order of the indexed elements.
The JSON_TABLE
approach is very tolerant of data type conversion methods. All the following queries are able to use the index.
-- No data type conversion methods. select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple" && @.pages == 25)'); -- number() select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple" && @.pages.number() == 25)'); -- numberOnly() select * from t1 t where json_exists(json_data,'$.words[*]?(@.word == "apple" && @.pages.numberOnly() == 25)'); -- string() select * from t1 t where json_exists(json_data,'$.words[*]?(@.word.string() == "apple" && @.pages == 25)'); -- stringOnly() select * from t1 t where json_exists(json_data,'$.words[*]?(@.word.stringOnly() == "apple" && @.pages == 25)'); -- stringOnly() and numberOnly() select * from t1 t where json_exists(json_data,'$.words[*]?(@.word.stringOnly() == "apple" && @.pages.numberOnly() == 25)');
Considerations
Here are some things to consider when using multivalue function-based indexes.
- Every index creates additional overhead on DML performance. This is especially true of function-based indexes, so pick your indexes carefully.
- For non-composite indexes, be careful in your choice of data type conversion methods. Make sure you are not reducing the usefulness of the index by being too restrictive.
- Like any composite index, make a careful decision about the element/column order in the composite multivalue index.
- Like any index, remember to check the impact on the performance of all statements that could be affected by the presence of the new index, not just the statement you are focusing on.
For more information see:
- Creating Multivalue Function-Based Indexes for JSON_EXISTS
- Multivalue Function-Based Indexes for JSON_EXISTS
- JSON Quick Links : Introduction, 12cR1, 12cR2, 18c, 19c, 21c, All Articles
Hope this helps. Regards Tim...