8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Indexing JSON Data in Oracle Database 12c Release 1 (12.1.0.2)
This article describes the server side support for indexing JSON data in Oracle Database 12c Release 1 (12.1.0.2). It is assumed you already have an understanding of JSON support in the database, as described here. After using this article to familiarise yourself with the concepts, you really need to spend some time working through the documentation to be able to make full use of the JSON support and indexing of JSON data.
- Considerations
- Setup
- Function-Based Indexes
- Composite B-Tree Indexes
- Bitmap Indexes
- Full-Text Search (JSON Search Indexes)
Related articles.
- Indexing JSON Data in Oracle Database 12c
- JSON Support in Oracle Database 12c
- JSON Quick Links : Introduction, 12cR1, 12cR2, 18c, 19c, 21c, All Articles
Considerations
Before you launch into indexing your JSON documents, there are a few things to consider.
- Many developers will have used document stores that are designed for extremely fast throughput. Adding indexes to any table will affect the performance of DML against it, as the indexes have to be maintained. You have to weigh up the overhead of indexing on DML performance against the improved query performance.
- The majority of the JSON indexes are function-based indexes, which means the maintenance overhead is event higher than a regular B*Tree index.
- If the JSON documents are very large, the overhead of indexing is likely to be even greater.
- During this article you will see the optimizer is very picky about using JSON indexes. Slight changes in parameters can make the index unavailable for a specific query. You will definitely not be able to index every possible combination.
- The same rules apply with respect to the maintenance of bitmap and full-text indexes. You will probably not want to consider these for highly volatile tables.
Setup
The examples in this article assume the following table as been created.
DROP TABLE json_documents PURGE; CREATE TABLE json_documents ( id RAW(16) NOT NULL, data CLOB, CONSTRAINT json_documents_pk PRIMARY KEY (id), CONSTRAINT json_documents_json_chk CHECK (data IS JSON) ); INSERT INTO json_documents (id, data) VALUES (SYS_GUID(), '{ "FirstName" : "John", "LastName" : "Doe", "Job" : "Clerk", "Address" : { "Street" : "99 My Street", "City" : "My City", "Country" : "UK", "Postcode" : "A12 34B" }, "ContactDetails" : { "Email" : "john.doe@example.com", "Phone" : "44 123 123456", "Twitter" : "@johndoe" }, "DateOfBirth" : "01-JAN-1980", "Active" : true }'); INSERT INTO json_documents (id, data) VALUES (SYS_GUID(), '{ "FirstName" : "Jayne", "LastName" : "Doe", "Job" : "Manager", "Address" : { "Street" : "100 My Street", "City" : "My City", "Country" : "UK", "Postcode" : "A12 34B" }, "ContactDetails" : { "Email" : "jayne.doe@example.com", "Phone" : "" }, "DateOfBirth" : "01-JAN-1982", "Active" : false }'); COMMIT;
Function-Based Indexes
The following example creates a function-based index on the email element of the JSON documents using the JSON_VALUE
function. The execution plan shows the index is used by a suitable query.
CREATE INDEX json_docs_email_idx ON json_documents (JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR)); SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR) = 'john.doe@example.com'; ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1499 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS | 1 | 1499 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JSON_DOCS_EMAIL_IDX | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ERROR ON ERROR)='john.doe@example.com')
The same index is also used if we query the table using the dot notation.
SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE a.data.ContactDetails.Email = 'john.doe@example.com'; ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1499 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS | 1 | 1499 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JSON_DOCS_EMAIL_IDX | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ERROR ON ERROR)='john.doe@example.com')
Notice that although we use the dot notation, the predicate is actually treated as a JSON_VALUE
call to enable it to use the index. Normally we see dot notation references transformed into JSON_QUERY
calls (as described here), but in this case, the optimizer has performed a query transformation to a JSON_VALUE
call to take advantage of the index.
We can see a typical example of a query transformation of dot notation in the following example, which uses the dot notation to create the index. Notice how the index is used when querying using the dot notation, but not when using the direct JSON_VALUE
call.
DROP INDEX json_docs_email_idx; CREATE INDEX json_docs_email_idx ON json_documents a (a.data.ContactDetails.Email); SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE a.data.ContactDetails.Email = 'john.doe@example.com'; ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1499 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS | 1 | 1499 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JSON_DOCS_EMAIL_IDX | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR)='john.doe@example.com') SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR) = 'john.doe@example.com'; ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 1499 | 4 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| JSON_DOCUMENTS | 1 | 1499 | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ERROR ON ERROR)='john.doe@example.com')
Looking at the predicate information in the execution plan produced by the query using dot notation, we can see a reference to the JSON_QUERY
function, which is what we would expect from a typical query transformation of dot notation. Since the index creation and the query both used dot notation, which was transformed to a JSON_QUERY
call, the index was used as expected. The query using the direct JSON_VALUE
call found no suitable function-based index using JSON_VALUE
, so no index was used.
If we rewrite the query to use a direct call to JSON_QUERY
, the index is used as expected, but only if we match all parameters of the function call. The example below compares two queries using a direct call to JSON_QUERY
. The first matches the index definition exactly and so the index is used. The second uses the JSON_QUERY
default settings, which are missing the ASIS
keyword, so the index is not used.
SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR) = 'john.doe@example.com'; ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1499 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS | 1 | 1499 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | JSON_DOCS_EMAIL_IDX | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR)='john.doe@example.com') SET AUTOTRACE TRACE EXPLAIN SELECT a.data.FirstName, a.data.LastName, a.data.ContactDetails.Email AS Email FROM json_documents a WHERE JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email') = 'john.doe@example.com'; ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 1 | 1499 | 4 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| JSON_DOCUMENTS | 1 | 1499 | 4 (0)| 00:00:01 | ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) WITHOUT ARRAY WRAPPER NULL ON ERROR)='john.doe@example.com')
This means your indexing choices have to be carefully considered and you should try to standardise your approach to using the JSON functionality in SQL to make sure you can take advantage of the indexes you create. The use of a relational view over the JSON data, with queries accessing the relation view, rather than the JSON directly, would force this standardisation, but with a loss of flexibility.
In the previous examples, we've seen index creation using the JSON_VALUE
function explicitly and the JSON_QUERY
function implicitly via dot notation. It is also possible to create function-based indexes using JSON_QUERY
and JSON_EXISTS
explicitly.
DROP INDEX json_docs_name_idx; CREATE INDEX json_docs_name_idx ON json_documents ( JSON_QUERY(data, '$.FirstName') ); DROP INDEX json_docs_name_idx; CREATE INDEX json_docs_name_idx ON json_documents ( JSON_EXISTS(data, '$.FirstName') );
With some restrictions, queries containing JSON_TABLE
can use JSON indexes if predicates and filters against columns projected by JSON_TABLE
match the index definition.
Composite B-Tree Indexes
Composite indexes can be created by defining virtual columns, with a conventional index against those columns. Internally, this is still a function-based index, but the definition looks much simpler and it gives you the option of querying the virtual columns directly.
-- Create the virtual columns and index. ALTER TABLE json_documents ADD (first_name VARCHAR2(50) GENERATED ALWAYS AS (JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)))); ALTER TABLE json_documents ADD (last_name VARCHAR2(50) GENERATED ALWAYS AS (JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)))); CREATE INDEX json_docs_name_idx ON json_documents (first_name, last_name); -- Test a query against the virtual columns, SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(*) FROM json_documents WHERE first_name = 'John' AND last_name = 'Doe'; ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 54 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 54 | | | |* 2 | INDEX RANGE SCAN| JSON_DOCS_NAME_IDX | 1 | 54 | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("FIRST_NAME"='John' AND "LAST_NAME"='Doe') -- Test a query using direct JSON_VALUE calls. SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(*) FROM json_documents WHERE JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)) = 'John' AND JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)) = 'Doe'; ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 54 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 54 | | | |* 2 | INDEX RANGE SCAN| JSON_DOCS_NAME_IDX | 1 | 54 | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("JSON_DOCUMENTS"."FIRST_NAME"='John' AND "JSON_DOCUMENTS"."LAST_NAME"='Doe')
Alternatively, you can define the composite function-based index directly.
-- Remove the previous index and virtual columns. DROP INDEX json_docs_name_idx; ALTER TABLE json_documents DROP COLUMN first_name; ALTER TABLE json_documents DROP COLUMN last_name; -- Create the composite index directly. CREATE INDEX json_docs_name_idx ON json_documents ( JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)), JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)) ); -- Test a query using direct JSON_VALUE calls. SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(*) FROM json_documents WHERE JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)) = 'John' AND JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)) = 'Doe'; ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 54 | 1 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 54 | | | |* 2 | INDEX RANGE SCAN| JSON_DOCS_NAME_IDX | 1 | 54 | 1 (0)| 00:00:01 | ---------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.FirstName' RETURNING VARCHAR2(50) NULL ON ERROR)='John' AND JSON_VALUE("DATA" FORMAT JSON , '$.LastName' RETURNING VARCHAR2(50) NULL ON ERROR)='Doe')
Bitmap Indexes
Much of the discussion from the previous section applies equally to bitmap indexes against JSON data. The examples below show you that JSON_VALUE
, JSON_QUERY
and JSON_EXISTS
can all be used to define bitmap indexes.
DROP INDEX json_docs_name_idx; CREATE BITMAP INDEX json_docs_name_idx ON json_documents ( JSON_VALUE(data, '$.FirstName') ); DROP INDEX json_docs_name_idx; CREATE BITMAP INDEX json_docs_name_idx ON json_documents ( JSON_QUERY(data, '$.FirstName') ); DROP INDEX json_docs_name_idx; CREATE BITMAP INDEX json_docs_name_idx ON json_documents ( JSON_EXISTS(data, '$.FirstName') );
The normal rules apply here. If the data is highly volatile, bitmap indexes may not be an ideal choice.
Full-Text Search (JSON Search Indexes)
A new simplified JSON search index syntax has been introduced in Oracle Database 12.2. There are no differences in the way the resulting JSON search index performs.
A JSON search index is a type of full text index specifically for JSON data. The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1, and only for JSON data in VARCHAR2
, BLOB
, or CLOB
columns, not NVARCHAR2
and NCLOB
columns.
-- 12.1 Syntax CREATE INDEX json_docs_search_idx ON json_documents (data) INDEXTYPE IS CTXSYS.CONTEXT PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP SYNC (ON COMMIT)'); -- 12.2 Syntax CREATE SEARCH INDEX json_docs_search_idx ON json_documents (data) FOR JSON; EXEC DBMS_STATS.gather_table_stats(USER, 'JSON_DOCUMENTS');
Once a JSON search index has been created, the JSON_TEXTCONTAINS
condition can be used in a query.
SET AUTOTRACE TRACE EXPLAIN SELECT COUNT(*) FROM json_documents WHERE JSON_TEXTCONTAINS(data, '$.ContactDetails.Email', 'john.doe@example.com'); ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 2014 | 4 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 2014 | | | |* 2 | DOMAIN INDEX | JSON_DOCS_SEARCH_IDX | 1 | 2014 | 4 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'{john.doe@example.com} INPATH(/ContactDetails/Email)')>0)
JSON search indexes can also be used to support queries other than those containing JSON_TEXTCONTAINS
conditions.
SELECT COUNT(*) FROM json_documents WHERE JSON_EXISTS(data, '$.ContactDetails'); ----------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1494 | 4 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 1494 | | | |* 2 | DOMAIN INDEX | JSON_DOCS_SEARCH_IDX | 1 | 1494 | 4 (0)| 00:00:01 | ----------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'HASPATH(/ContactDetail s)')>0) SELECT COUNT(*) FROM json_documents WHERE JSON_VALUE(data, '$.ContactDetails.Email') = 'john.doe@example.com'; ----------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 1494 | 4 (0)| 00:00:01 | | 1 | SORT AGGREGATE | | 1 | 1494 | | | |* 2 | TABLE ACCESS BY INDEX ROWID| JSON_DOCUMENTS | 1 | 1494 | 4 (0)| 00:00:01 | |* 3 | DOMAIN INDEX | JSON_DOCS_SEARCH_IDX | | | 4 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000) NULL ON ERROR)='john.doe@example.com') 3 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'{john.doe@example.com} INPATH(/ContactDetails/Email)')>0)
For more information see:
- Indexing JSON Data in Oracle Database 12c
- JSON Support in Oracle Database 12c
- JSON Quick Links : Introduction, 12cR1, 12cR2, 18c, 19c, 21c, All Articles
- JSON in Oracle Database
Hope this helps. Regards Tim...