8i | 9i | 10g | 11g | 12c | 13c | 18c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

Home » Articles » 12c » Here

Indexing JSON Data in Oracle Database 12c Release 1 (12.1.0.2)

This article describes the server side support for indexing JSON data in Oracle Database 12c Release 1 (12.1.0.2). It is assumed you already have an understanding of JSON support in the database, as described here. After using this article to familiarise yourself with the concepts, you really need to spend some time working through the documentation to be able to make full use of the JSON support and indexing of JSON data.

Related articles.

Considerations

Before you launch into indexing your JSON documents, there are a few things to consider.

Setup

The examples in this article assume the following table as been created.

DROP TABLE json_documents PURGE;

CREATE TABLE json_documents (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
);

INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
          "FirstName"      : "John",
          "LastName"       : "Doe",
          "Job"            : "Clerk",
          "Address"        : {
                              "Street"   : "99 My Street",
                              "City"     : "My City",
                              "Country"  : "UK",
                              "Postcode" : "A12 34B"
                             },
          "ContactDetails" : {
                              "Email"    : "john.doe@example.com",
                              "Phone"    : "44 123 123456",
                              "Twitter"  : "@johndoe"
                             },
          "DateOfBirth"    : "01-JAN-1980",
          "Active"         : true
         }');

INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
          "FirstName"      : "Jayne",
          "LastName"       : "Doe",
          "Job"            : "Manager",
          "Address"        : {
                              "Street"   : "100 My Street",
                              "City"     : "My City",
                              "Country"  : "UK",
                              "Postcode" : "A12 34B"
                             },
          "ContactDetails" : {
                              "Email"    : "jayne.doe@example.com",
                              "Phone"    : ""
                             },
          "DateOfBirth"    : "01-JAN-1982",
          "Active"         : false
         }');

COMMIT;

Function-Based Indexes

The following example creates a function-based index on the email element of the JSON documents using the JSON_VALUE function. The execution plan shows the index is used by a suitable query.

CREATE INDEX json_docs_email_idx
  ON json_documents (JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR));

SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR) = 'john.doe@example.com';

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     1 |  1499 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS      |     1 |  1499 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | JSON_DOCS_EMAIL_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000)
              ERROR ON ERROR)='john.doe@example.com')

The same index is also used if we query the table using the dot notation.

SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  a.data.ContactDetails.Email = 'john.doe@example.com';

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     1 |  1499 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS      |     1 |  1499 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | JSON_DOCS_EMAIL_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000)
              ERROR ON ERROR)='john.doe@example.com')

Notice that although we use the dot notation, the predicate is actually treated as a JSON_VALUE call to enable it to use the index. Normally we see dot notation references transformed into JSON_QUERY calls (as described here), but in this case, the optimizer has performed a query transformation to a JSON_VALUE call to take advantage of the index.

We can see a typical example of a query transformation of dot notation in the following example, which uses the dot notation to create the index. Notice how the index is used when querying using the dot notation, but not when using the direct JSON_VALUE call.

DROP INDEX json_docs_email_idx;

CREATE INDEX json_docs_email_idx
  ON json_documents a (a.data.ContactDetails.Email);


SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  a.data.ContactDetails.Email = 'john.doe@example.com';

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     1 |  1499 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS      |     1 |  1499 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | JSON_DOCS_EMAIL_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000)
              ASIS  WITHOUT ARRAY WRAPPER NULL ON ERROR)='john.doe@example.com')


SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  JSON_VALUE(data, '$.ContactDetails.Email' RETURNING VARCHAR2 ERROR ON ERROR) = 'john.doe@example.com';

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |  1499 |     4   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| JSON_DOCUMENTS |     1 |  1499 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email'
              RETURNING VARCHAR2(4000) ERROR ON ERROR)='john.doe@example.com')

Looking at the predicate information in the execution plan produced by the query using dot notation, we can see a reference to the JSON_QUERY function, which is what we would expect from a typical query transformation of dot notation. Since the index creation and the query both used dot notation, which was transformed to a JSON_QUERY call, the index was used as expected. The query using the direct JSON_VALUE call found no suitable function-based index using JSON_VALUE, so no index was used.

If we rewrite the query to use a direct call to JSON_QUERY, the index is used as expected, but only if we match all parameters of the function call. The example below compares two queries using a direct call to JSON_QUERY. The first matches the index definition exactly and so the index is used. The second uses the JSON_QUERY default settings, which are missing the ASIS keyword, so the index is not used.

SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000)
              ASIS  WITHOUT ARRAY WRAPPER NULL ON ERROR) = 'john.doe@example.com';

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     1 |  1499 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| JSON_DOCUMENTS      |     1 |  1499 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | JSON_DOCS_EMAIL_IDX |     1 |       |     1   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING VARCHAR2(4000)
              ASIS  WITHOUT ARRAY WRAPPER NULL ON ERROR)='john.doe@example.com')


SET AUTOTRACE TRACE EXPLAIN

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
WHERE  JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email') = 'john.doe@example.com';

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |  1499 |     4   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| JSON_DOCUMENTS |     1 |  1499 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(JSON_QUERY("DATA" FORMAT JSON , '$.ContactDetails.Email'
              RETURNING VARCHAR2(4000) WITHOUT ARRAY WRAPPER NULL ON
              ERROR)='john.doe@example.com')

This means your indexing choices have to be carefully considered and you should try to standardise your approach to using the JSON functionality in SQL to make sure you can take advantage of the indexes you create. The use of a relational view over the JSON data, with queries accessing the relation view, rather than the JSON directly, would force this standardisation, but with a loss of flexibility.

In the previous examples, we've seen index creation using the JSON_VALUE function explicitly and the JSON_QUERY function implicitly via dot notation. It is also possible to create function-based indexes using JSON_QUERY and JSON_EXISTS explicitly.

DROP INDEX json_docs_name_idx;
CREATE INDEX json_docs_name_idx ON json_documents (
  JSON_QUERY(data, '$.FirstName')
);

DROP INDEX json_docs_name_idx;
CREATE INDEX json_docs_name_idx ON json_documents (
  JSON_EXISTS(data, '$.FirstName')
);

With some restrictions, queries containing JSON_TABLE can use JSON indexes if predicates and filters against columns projected by JSON_TABLE match the index definition.

Composite B-Tree Indexes

Composite indexes can be created by defining virtual columns, with a conventional index against those columns. Internally, this is still a function-based index, but the definition looks much simpler and it gives you the option of querying the virtual columns directly.

-- Create the virtual columns and index.
ALTER TABLE json_documents ADD (first_name VARCHAR2(50)
  GENERATED ALWAYS AS (JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50))));

ALTER TABLE json_documents ADD (last_name VARCHAR2(50)
  GENERATED ALWAYS AS (JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50))));

CREATE INDEX json_docs_name_idx ON json_documents (first_name, last_name);


-- Test a query against the virtual columns,
SET AUTOTRACE TRACE EXPLAIN

SELECT COUNT(*)
FROM   json_documents
WHERE  first_name = 'John'
AND    last_name  = 'Doe';

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |     1 |    54 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                    |     1 |    54 |            |          |
|*  2 |   INDEX RANGE SCAN| JSON_DOCS_NAME_IDX |     1 |    54 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("FIRST_NAME"='John' AND "LAST_NAME"='Doe')


-- Test a query using direct JSON_VALUE calls.
SET AUTOTRACE TRACE EXPLAIN

SELECT COUNT(*)
FROM   json_documents
WHERE  JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)) = 'John'
AND    JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)) = 'Doe';

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |     1 |    54 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                    |     1 |    54 |            |          |
|*  2 |   INDEX RANGE SCAN| JSON_DOCS_NAME_IDX |     1 |    54 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("JSON_DOCUMENTS"."FIRST_NAME"='John' AND
              "JSON_DOCUMENTS"."LAST_NAME"='Doe')

Alternatively, you can define the composite function-based index directly.

-- Remove the previous index and virtual columns.
DROP INDEX json_docs_name_idx;
ALTER TABLE json_documents DROP COLUMN first_name;
ALTER TABLE json_documents DROP COLUMN last_name;

-- Create the composite index directly.
CREATE INDEX json_docs_name_idx ON json_documents (
  JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)),
  JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50))
);


-- Test a query using direct JSON_VALUE calls.
SET AUTOTRACE TRACE EXPLAIN

SELECT COUNT(*)
FROM   json_documents
WHERE  JSON_VALUE(data, '$.FirstName' RETURNING VARCHAR2(50)) = 'John'
AND    JSON_VALUE(data, '$.LastName' RETURNING VARCHAR2(50)) = 'Doe';

----------------------------------------------------------------------------------------
| Id  | Operation         | Name               | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                    |     1 |    54 |     1   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                    |     1 |    54 |            |          |
|*  2 |   INDEX RANGE SCAN| JSON_DOCS_NAME_IDX |     1 |    54 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(JSON_VALUE("DATA" FORMAT JSON , '$.FirstName' RETURNING
              VARCHAR2(50) NULL ON ERROR)='John' AND JSON_VALUE("DATA" FORMAT JSON ,
              '$.LastName' RETURNING VARCHAR2(50) NULL ON ERROR)='Doe')

Bitmap Indexes

Much of the discussion from the previous section applies equally to bitmap indexes against JSON data. The examples below show you that JSON_VALUE, JSON_QUERY and JSON_EXISTS can all be used to define bitmap indexes.

DROP INDEX json_docs_name_idx;
CREATE BITMAP INDEX json_docs_name_idx ON json_documents (
  JSON_VALUE(data, '$.FirstName')
);

DROP INDEX json_docs_name_idx;
CREATE BITMAP INDEX json_docs_name_idx ON json_documents (
  JSON_QUERY(data, '$.FirstName')
);

DROP INDEX json_docs_name_idx;
CREATE BITMAP INDEX json_docs_name_idx ON json_documents (
  JSON_EXISTS(data, '$.FirstName')
);

The normal rules apply here. If the data is highly volatile, bitmap indexes may not be an ideal choice.

Full-Text Search (JSON Search Indexes)

A new simplified JSON search index syntax has been introduced in Oracle Database 12.2. There are no differences in the way the resulting JSON search index performs.

A JSON search index is a type of full text index specifically for JSON data. The optimiser will only consider using the index if the database uses a character set of AL32UTF8 or WE8ISO8859P1, and only for JSON data in VARCHAR2, BLOB, or CLOB columns, not NVARCHAR2 and NCLOB columns.

-- 12.1 Syntax
CREATE INDEX json_docs_search_idx ON json_documents (data)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP SYNC (ON COMMIT)');

-- 12.2 Syntax
CREATE SEARCH INDEX json_docs_search_idx ON json_documents (data) FOR JSON;

EXEC DBMS_STATS.gather_table_stats(USER, 'JSON_DOCUMENTS');

Once a JSON search index has been created, the JSON_TEXTCONTAINS condition can be used in a query.

SET AUTOTRACE TRACE EXPLAIN

SELECT COUNT(*)
FROM   json_documents
WHERE  JSON_TEXTCONTAINS(data, '$.ContactDetails.Email', 'john.doe@example.com');

-----------------------------------------------------------------------------------------
| Id  | Operation        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                      |     1 |  2014 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                      |     1 |  2014 |            |          |
|*  2 |   DOMAIN INDEX   | JSON_DOCS_SEARCH_IDX |     1 |  2014 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'{john.doe@example.com}
               INPATH(/ContactDetails/Email)')>0)

JSON search indexes can also be used to support queries other than those containing JSON_TEXTCONTAINS conditions.

SELECT COUNT(*)
FROM   json_documents
WHERE  JSON_EXISTS(data, '$.ContactDetails');

-----------------------------------------------------------------------------------------
| Id  | Operation        | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |                      |     1 |  1494 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE  |                      |     1 |  1494 |            |          |
|*  2 |   DOMAIN INDEX   | JSON_DOCS_SEARCH_IDX |     1 |  1494 |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'HASPATH(/ContactDetail
              s)')>0)


SELECT COUNT(*)
FROM   json_documents
WHERE  JSON_VALUE(data, '$.ContactDetails.Email') = 'john.doe@example.com';

-----------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                      |     1 |  1494 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |                      |     1 |  1494 |            |          |
|*  2 |   TABLE ACCESS BY INDEX ROWID| JSON_DOCUMENTS       |     1 |  1494 |     4   (0)| 00:00:01 |
|*  3 |    DOMAIN INDEX              | JSON_DOCS_SEARCH_IDX |       |       |     4   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(JSON_VALUE("DATA" FORMAT JSON , '$.ContactDetails.Email' RETURNING
              VARCHAR2(4000) NULL ON ERROR)='john.doe@example.com')
   3 - access("CTXSYS"."CONTAINS"("JSON_DOCUMENTS"."DATA",'{john.doe@example.com}
              INPATH(/ContactDetails/Email)')>0)

For more information see:

Hope this helps. Regards Tim...

Back to the Top.