Oracle 8i | Oracle 9i | Oracle 10g | Oracle 11g | Oracle 12c | Miscellaneous | PL/SQL | SQL | Oracle RAC | Oracle Apps | Linux

Character Semantics and Globalization in Oracle

Historically database columns which hold alphanumeric data have been defined using the number of bytes they store. This approach was fine as the number of bytes equated to the number of characters when using single-byte character sets. With the increasing use of multibyte character sets to support globalized databases comes the problem of bytes no longer equating to characters.

Suppose we had a requirement for a table with an id and description column, where the description must hold up to a maximum of 20 characters. We might consider the following definition.

SQL> CREATE TABLE tab1 (
  2    id           NUMBER(10),
  3    description  VARCHAR2(20)
  4  );

Table created.

SQL> DESC tab1
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- ------------
 ID                                                             NUMBER(10)
 DESCRIPTION                                                    VARCHAR2(20)

We then decide to make a multilingual version of our application and use the same table definition in a new instance with a multibyte character set. Everything works fine until we try of fill the column with 20 two-byte characters. All of a sudden the column is trying to store twice the data it was before and we have a problem.

Oracle9i has solved this problem with the introduction of character and byte length semantics. When defining an alphanumeric column it is now possible to specify the length in 3 different ways:

If we now create a second table using character semantics we can see the difference when the table is described.

SQL> CREATE TABLE tab2 (
  2    id           NUMBER(10),
  3    description  VARCHAR2(20 CHAR)
  4  );

Table created.

SQL> DESC tab2
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- -----------------
 ID                                                             NUMBER(10)
 DESCRIPTION                                                    VARCHAR2(20 CHAR)

The default character semantics of the database or session can be altered using the NLS_LENGTH_SEMANTICS parameter.

ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=CHAR;
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=BYTE;

ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR;
ALTER SESSION SET NLS_LENGTH_SEMANTICS=BYTE;

If we alter the sessions length semantics to character and describe the tables we can see the change has taken affect by the way the column definitions are displayed.

SQL> ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR;

Session altered.

SQL> desc tab1
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- -----------------
 ID                                                             NUMBER(10)
 DESCRIPTION                                                    VARCHAR2(20 BYTE)

SQL> DESC tab2
 Name                                                  Null?    Type
 ----------------------------------------------------- -------- -----------------
 ID                                                             NUMBER(10)
 DESCRIPTION                                                    VARCHAR2(20)

SQL>

The INSTR, LENGTH and SUBSTR functions always deal with characters, regardless of column definitions and the character sets. For times when you specifically need to deal in bytes Oracle provides the INSTRB, LENGTHB and SUBSTRB functions.

Hope this helps. Regards Tim...

Back to the Top.