8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Column-Level Collation and Case-Insensitive Database in Oracle Database 12c Release 2 (12.2)
Collation determines how strings are compared, which has a direct impact on ordering (sorting) and equality tests between strings. Oracle Database 12c Release 2 (12.2) lets you specify the collation used for columns that hold string data, allowing you to easily perform case insensitive queries, as well as control the output order of queried data.
This article is only covering the basics of how to set the collation. The wider implications of using collations, as well as the restrictions associated with them can be found here.
- Assumptions
- Default Behaviour
- Setting Collation
- Linguistic Indexes
- Views and Materialized Views
- Views and Materialized Views
Related articles.
Assumptions
This article assumes the following.
- Your are using Oracle Database 12c Release 2 (12.2).
- Your
MAX_STRING_SIZE
parameter is set toEXTENDED
. You can see how to do this here. Without this you will see this error, "ORA-43929: Collation cannot be specified if parameter MAX_STRING_SIZE=STANDARD is set." - Your
COMPATIBLE
parameter is set to12.2
or higher. - The database character set is AL32UTF8,
NLS_LANGUAGE=AMERICAN
andNLS_TERRITORY=AMERICA
. Other setting can be used, but the output may not resemble that shown in this article.
Default Behaviour
To understand the impact of collation, it's important to understand the default behaviour of an Oracle database. Before we start using UTF data we need to make sure SQL*Plus will handle it properly. We can do this by setting the NLS_LANG
environment variable correctly before starting SQL*Plus.
# set NLS_LANG=AMERICAN_AMERICA.UTF8 export NLS_LANG=AMERICAN_AMERICA.UTF8 sqlplus test/test@pdb1
Create the following test table and populate it with some data containing special characters. Notice the variation in the first letter of the first name.
DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), location VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ); INSERT INTO t1 VALUES (1, 'Löwenbrauerei', 'Bräunlingen'); INSERT INTO t1 VALUES (2, 'LÖwenbrauerei', 'BrÄunlingen'); INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen'); INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen'); COMMIT;
The following query shows how Oracle treats special characters with regards to sort operations. We can see upper case comes before lower case in the sort order, and the regular characters come before the special characters.
COLUMN company FORMAT A15 COLUMN location FORMAT A15 SELECT * FROM t1 ORDER BY company; ID COMPANY LOCATION ---------- --------------- --------------- 4 LOwenbrauerei BrAunlingen 3 Lowenbrauerei Braunlingen 2 LÖwenbrauerei BrÄunlingen 1 Löwenbrauerei Bräunlingen SQL>
The following query shows how Oracle treats special characters in comparisons. We can see an exact match is made.
SELECT * FROM t1 WHERE company = 'Löwenbrauerei'; ID COMPANY LOCATION ---------- --------------- --------------- 1 Löwenbrauerei Bräunlingen SQL> SELECT * FROM t1 WHERE company LIKE '%ö%'; ID COMPANY LOCATION ---------- --------------- --------------- 1 Löwenbrauerei Bräunlingen SQL>
The following query shows how Oracle treats special characters in group operations. Once again, the variants on the first character are treated separately.
SELECT company, COUNT(*) AS amount FROM t1 GROUP BY company ORDER BY company; COMPANY AMOUNT --------------- ---------- LOwenbrauerei 1 Lowenbrauerei 1 LÖwenbrauerei 1 Löwenbrauerei 1 SQL>
In the following examples, think back to this default behaviour.
Setting Collation
Collation can be set at a number of different levels, which will be demonstrated below. Each section will not repeat all possible syntax variations or collations as that would be too repetitive.
There are two basic types of collation.
- Binary : Ordering and comparisons of string data are based on the numeric value of the characters in the strings.
- Linguistic : Ordering and comparisons of string data are based on the alphabetic sequence of the characters, regardless of their numeric values. The list of linguistic collations is available here.
When using collations there are three suffixes that alter the behaviour of sorts and comparisons.
- "_CI" : Case insensitive, but accent sensitive.
- "_AI" : Both case and accent insensitive.
- "_CS" : Both case and accent sensitive. This is default if no extension is used.
If no collation is specified, directly or via a default setting, the default USING_NLS_COMP
pseudo-collation is used, which means the NLS_SORT
and NLS_COMP
parameters are used to determine the actual collation used.
The only supported collation for CLOB
and NCLOB
columns is the USING_NLS_COMP
pseudo-collation.
Column-Level
The collation of a specific column can be defined when the table is created. In the following example we set the collation of the COMPANY
column to BINARY_CI
, which will make sorts and comparisons of that columns data case insensitive, but will still treat special characters as separate.
DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR) COLLATE BINARY_CI, CONSTRAINT t1_pk PRIMARY KEY (id) ); INSERT INTO t1 VALUES (1, 'Löwenbrauerei'); INSERT INTO t1 VALUES (2, 'LÖwenbrauerei'); INSERT INTO t1 VALUES (3, 'Lowenbrauerei'); INSERT INTO t1 VALUES (4, 'LOwenbrauerei'); COMMIT;
We can see the collation has made a difference if we query data in the column.
SELECT company, COUNT(*) AS amount FROM t1 GROUP BY company ORDER BY company; COMPANY AMOUNT --------------- ---------- Lowenbrauerei 2 Löwenbrauerei 2 SQL> SELECT * FROM t1 WHERE company LIKE '%ö%'; ID COMPANY ---------- --------------- 1 Löwenbrauerei 2 LÖwenbrauerei SQL>
We can also set the collation of new columns added to an existing table. In the following example we set the collation of the LOCATION
column to BINARY_AI
, which will make sorts and comparisons of that columns data both case insensitive and accent insensitive.
ALTER TABLE t1 ADD ( location VARCHAR2(15 CHAR) COLLATE BINARY_AI ); UPDATE t1 SET location = 'Bräunlingen' WHERE id = 1; UPDATE t1 SET location = 'BrÄunlingen' WHERE id = 2; UPDATE t1 SET location = 'Braunlingen' WHERE id = 3; UPDATE t1 SET location = 'BrAunlingen' WHERE id = 4; COMMIT;
The output below shows the impact of the collation on this new column.
SELECT location, COUNT(*) AS amount FROM t1 GROUP BY location ORDER BY location; LOCATION AMOUNT --------------- ---------- Bräunlingen 4 SQL> SELECT * FROM t1 WHERE location LIKE '%ä%'; ID COMPANY LOCATION ---------- --------------- --------------- 1 Löwenbrauerei Bräunlingen 2 LÖwenbrauerei BrÄunlingen 3 Lowenbrauerei Braunlingen 4 LOwenbrauerei BrAunlingen SQL>
We can see the collation of the columns using the {DBA|ALL|USER}_TAB_COLUMNS
views.
COLUMN column_name FORMAT A30 COLUMN collation FORMAT A20 SELECT column_id, column_name, collation FROM user_tab_columns WHERE table_name = 'T1' ORDER BY column_id; COLUMN_ID COLUMN_NAME COLLATION ---------- ------------------------------ -------------------- 1 ID 2 COMPANY BINARY_CI 3 LOCATION BINARY_AI SQL>
Table-Level
The default collation for the whole table can be defined when the table is created.
DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ) DEFAULT COLLATION BINARY_CI;
The table default collation can be changed using the ALTER TABLE
statement, but this only affects new columns added to table, not existing columns.
ALTER TABLE t1 DEFAULT COLLATION BINARY_AI; ALTER TABLE t1 ADD ( location VARCHAR2(15 CHAR) );
By displaying the column level collation we can see the existing column was not affected by the change in the default collation of the table.
COLUMN column_name FORMAT A30 COLUMN collation FORMAT A20 SELECT column_id, column_name, collation FROM user_tab_columns WHERE table_name = 'T1' ORDER BY column_id; COLUMN_ID COLUMN_NAME COLLATION ---------- ------------------------------ -------------------- 1 ID 2 COMPANY BINARY_CI 3 LOCATION BINARY_AI SQL>
The default table collation is displayed using the {DBA|ALL|USER}_TABLES
views.
SELECT default_collation FROM user_tables WHERE table_name = 'T1'; DEFAULT_COLLATION -------------------------------------------------------------------------------- BINARY_AI SQL>
Schema-Level
The default collation for a schema can be defined when the user is created.
CONN / AS SYSDBA ALTER SESSION SET CONTAINER = pdb1; CREATE USER test2 IDENTIFIED BY test2 DEFAULT TABLESPACE users QUOTA UNLIMITED ON users DEFAULT COLLATION BINARY_CI; GRANT CREATE SESSION, CREATE TABLE to test2;
If we connect to the user and create a table we can see the default setting is used to define the default table collation.
CONN test2/test2@pdb1 DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ); SELECT default_collation FROM user_tables WHERE table_name = 'T1'; DEFAULT_COLLATION -------------------------------------------------------------------------------- BINARY_CI SQL>
The default schema collation is changed using the ALTER USER
command, but this doesn't affect the default collation of any existing objects.
ALTER USER test2 DEFAULT COLLATION BINARY_AI;
If we add another column to the existing table you may think it will use the new default schema collation, but you would be wrong.
ALTER TABLE t1 ADD ( location VARCHAR2(15 CHAR) ); COLUMN column_name FORMAT A30 COLUMN collation FORMAT A20 SELECT column_id, column_name, collation FROM user_tab_columns WHERE table_name = 'T1' ORDER BY column_id; COLUMN_ID COLUMN_NAME COLLATION ---------- ------------------------------ -------------------- 1 ID 2 COMPANY BINARY_CI 3 LOCATION BINARY_CI SQL>
Remember, changes to the default schema collation do not affect existing objects, so the default table collation is unchanged.
SELECT default_collation FROM user_tables WHERE table_name = 'T1'; DEFAULT_COLLATION -------------------------------------------------------------------------------- BINARY_CI SQL>
Session-Level
There are two ways to set the default collation for a session. The DEFAULT_COLLATION
parameter can be set at session level.
CONN test/test@pdb1 ALTER SESSION SET DEFAULT_COLLATION=BINARY_CI; SELECT SYS_CONTEXT('USERENV', 'SESSION_DEFAULT_COLLATION') FROM DUAL; SYS_CONTEXT('USERENV','SESSION_DEFAULT_COLLATION') -------------------------------------------------------------------------------- BINARY_CI SQL> DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), location VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ); SELECT column_id, column_name, collation FROM user_tab_columns WHERE table_name = 'T1' ORDER BY column_id; COLUMN_ID COLUMN_NAME COLLATION ---------- ------------------------------ -------------------- 1 ID 2 COMPANY BINARY_CI 3 LOCATION BINARY_CI SQL> SELECT default_collation FROM user_tables WHERE table_name = 'T1'; DEFAULT_COLLATION -------------------------------------------------------------------------------- BINARY_CI SQL>
Alternatively the NLS_SORT
and NLS_COMP
parameters can be set at session level, making them the defaults for the session.
CONN test/test@pdb1 ALTER SESSION SET NLS_SORT=BINARY_CI; ALTER SESSION SET NLS_COMP=LINGUISTIC; DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(10), location VARCHAR2(10), CONSTRAINT t1_pk PRIMARY KEY (id) ); SELECT column_id, column_name, collation FROM user_tab_columns WHERE table_name = 'T1' ORDER BY column_id; COLUMN_ID COLUMN_NAME COLLATION ---------- ------------------------------ -------------------- 1 ID 2 COMPANY USING_NLS_COMP 3 LOCATION USING_NLS_COMP SQL> SELECT default_collation FROM user_tables WHERE table_name = 'T1'; DEFAULT_COLLATION -------------------------------------------------------------------------------- USING_NLS_COMP SQL>
Notice the default references to the USING_NLS_COMP
pseudo-collation, which means the NLS_SORT
and NLS_COMP
parameters are used to determine the actual collation used.
Database-Level
If the NLS_SORT
and NLS_SORT
parameters are set at the CDB level they represent the default values for the CDB and all associated PDBs.
CONN / AS SYSDBA -- Alter the CDB defaults. ALTER SYSTEM SET NLS_SORT=BINARY_CI SCOPE=SPFILE; ALTER SYSTEM SET NLS_COMP=LINGUISTIC SCOPE=SPFILE; SHUTDOWN IMMEDIATE; STARTUP; SHOW PARAMETER NLS_SORT NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_sort string BINARY_CI SQL> SHOW PARAMETER NLS_COMP NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_comp string LINGUISTIC SQL> -- Switch to the PDB. ALTER SESSION SET CONTAINER = pdb1; SHOW PARAMETER NLS_SORT NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_sort string BINARY_CI SQL> SHOW PARAMETER NLS_COMP NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_comp string LINGUISTIC SQL>
If the parameters are set at the PDB level they override the CDB settings.
CONN / AS SYSDBA ALTER SESSION SET CONTAINER = pdb1; -- Alter the PDB defaults. ALTER SYSTEM SET NLS_SORT=BINARY SCOPE=SPFILE; ALTER SYSTEM SET NLS_COMP=BINARY SCOPE=SPFILE; SHUTDOWN IMMEDIATE; STARTUP; SHOW PARAMETER NLS_SORT NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_sort string BINARY SQL> SHOW PARAMETER NLS_COMP NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ nls_sort string BINARY SQL>
Remember to reset these values if you don't want any unexpected behaviour in your instance.
CONN / AS SYSDBA ALTER SYSTEM RESET NLS_SORT SCOPE=SPFILE; ALTER SYSTEM RESET NLS_COMP SCOPE=SPFILE; ALTER SESSION SET CONTAINER = pdb1; ALTER SYSTEM RESET NLS_SORT SCOPE=SPFILE; ALTER SYSTEM RESET NLS_COMP SCOPE=SPFILE; CONN / AS SYSDBA SHUTDOWN IMMEDIATE; STARTUP;
Statement-Level
There are a number of ways to influence the results of statements, regardless of the collation associated with the columns. The following examples show how you might use the COLLATE
operator and NLSSORT
function.
First we need a new session and a new table with no collation defined.
CONN test/test@pdb1 DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), location VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ); INSERT INTO t1 VALUES (1, 'Löwenbrauerei', 'Bräunlingen'); INSERT INTO t1 VALUES (2, 'LÖwenbrauerei', 'BrÄunlingen'); INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen'); INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen'); COMMIT;
Now we can try a few examples of using the COLLATE
operator and NLSSORT
function.
SELECT * FROM t1 ORDER BY company COLLATE BINARY_CI; ID COMPANY LOCATION ---------- --------------- --------------- 3 Lowenbrauerei Braunlingen 4 LOwenbrauerei BrAunlingen 1 Löwenbrauerei Bräunlingen 2 LÖwenbrauerei BrÄunlingen SQL> SELECT * FROM t1 ORDER BY NLSSORT(company, 'NLS_SORT=BINARY_AI'); ID COMPANY LOCATION ---------- --------------- --------------- 1 Löwenbrauerei Bräunlingen 4 LOwenbrauerei BrAunlingen 3 Lowenbrauerei Braunlingen 2 LÖwenbrauerei BrÄunlingen SQL> COLUMN sorted_name FORMAT A15 SELECT company COLLATE BINARY_AI AS sorted_name, COUNT(*) AS amount FROM t1 GROUP BY company COLLATE BINARY_AI ORDER BY 1; SORTED_NAME AMOUNT --------------- ---------- Löwenbrauerei 4 SQL> COLUMN sorted_name FORMAT A15 SELECT NLSSORT(company, 'NLS_SORT=BINARY_AI') AS sorted_name, COUNT(*) AS amount FROM t1 GROUP BY NLSSORT(company, 'NLS_SORT=BINARY_AI') ORDER BY 1; SORTED_NAME AMOUNT --------------- ---------- 6C6F77656E62726 4 SQL> SELECT * FROM t1 WHERE SUBSTR(company,2,1) COLLATE BINARY_AI = 'ö' COLLATE BINARY_AI ORDER BY company; ID COMPANY LOCATION ---------- --------------- --------------- 4 LOwenbrauerei BrAunlingen 3 Lowenbrauerei Braunlingen 2 LÖwenbrauerei BrÄunlingen 1 Löwenbrauerei Bräunlingen SQL> SELECT * FROM t1 WHERE NLSSORT(SUBSTR(company,2,1), 'NLS_SORT=BINARY_AI') = NLSSORT('ö', 'NLS_SORT=BINARY_AI') ORDER BY company; ID COMPANY LOCATION ---------- --------------- --------------- 4 LOwenbrauerei BrAunlingen 3 Lowenbrauerei Braunlingen 2 LÖwenbrauerei BrÄunlingen 1 Löwenbrauerei Bräunlingen SQL>
Linguistic Indexes
Any index created on a column using a named collation is actually a function-based index, but you can create additional indexes with alternative collations.
CREATE INDEX t1_company_1_idx ON t1(NLSSORT(company, 'NLS_SORT=GERMAN_AI')); CREATE INDEX t1_company_2_idx ON t1(NLSSORT(company, 'NLS_SORT=FRENCH_AI')); CREATE INDEX t1_location_idx ON t1(location COLLATE BINARY_AI);
Views and Materialized Views
Views and materialized views can take advantage of collation. The following example creates a clean table, a view that uses collation, then queries the view to shown the collation in action.
CONN test/test@pdb1 DROP TABLE t1 PURGE; CREATE TABLE t1 ( id NUMBER, company VARCHAR2(15 CHAR), location VARCHAR2(15 CHAR), CONSTRAINT t1_pk PRIMARY KEY (id) ); INSERT INTO t1 VALUES (1, 'Löwenbrauerei', 'Bräunlingen'); INSERT INTO t1 VALUES (2, 'LÖwenbrauerei', 'BrÄunlingen'); INSERT INTO t1 VALUES (3, 'Lowenbrauerei', 'Braunlingen'); INSERT INTO t1 VALUES (4, 'LOwenbrauerei', 'BrAunlingen'); COMMIT; -- Create a view using collation. CREATE OR REPLACE VIEW t1_binary_ai_v ( id, company, location ) AS SELECT id, company COLLATE BINARY_AI, location COLLATE BINARY_AI FROM t1; --Query using collation-based sort. SELECT * FROM t1_binary_ai_v ORDER BY company; ID COMPANY LOCATION ---------- --------------- --------------- 1 Löwenbrauerei Bräunlingen 4 LOwenbrauerei BrAunlingen 3 Lowenbrauerei Braunlingen 2 LÖwenbrauerei BrÄunlingen SQL> -- Query using collation-based group by. SELECT company, COUNT(*) AS amount FROM t1_binary_ai_v GROUP BY company ORDER BY company; COMPANY AMOUNT --------------- ---------- Löwenbrauerei 4 SQL>
For more information see:
- Linguistic Sorting and Matching
- Collation and Case-Insensitive Queries
- Extended Data Types in Oracle Database 12c Release 1 (12.1)
Hope this helps. Regards Tim...