8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux
Character Semantics And Globalization in Oracle
Historically database columns which hold alphanumeric data have been defined using the number of bytes they store. This approach was fine as the number of bytes equated to the number of characters when using single-byte character sets. With the increasing use of multibyte character sets to support globalized databases comes the problem of bytes no longer equating to characters.
Suppose we had a requirement for a table with an id and description column, where the description must hold up to a maximum of 20 characters. We might consider the following definition.
SQL> CREATE TABLE tab1 ( 2 id NUMBER(10), 3 description VARCHAR2(20) 4 ); Table created. SQL> DESC tab1 Name Null? Type ----------------------------------------------------- -------- ------------ ID NUMBER(10) DESCRIPTION VARCHAR2(20)
We then decide to make a multilingual version of our application and use the same table definition in a new instance with a multibyte character set. Everything works fine until we try of fill the column with 20 two-byte characters. All of a sudden the column is trying to store twice the data it was before and we have a problem.
Oracle9i has solved this problem with the introduction of character and byte length semantics. When defining an alphanumeric column it is now possible to specify the length in 3 different ways:
VARCHAR2(20)
: Uses the default length semantics defined by theNLS_LENGTH_SEMANTICS
parameter which defaults toBYTE
.VARCHAR2(20 BYTE)
: Allows only the specified number of bytes to be stored in the column, regardless of how many characters this represents.VARCHAR2(20 CHAR)
: Allows the specified number of characters to be stored in the column regardless of the number of bytes this equates to.
If we now create a second table using character semantics we can see the difference when the table is described.
SQL> CREATE TABLE tab2 ( 2 id NUMBER(10), 3 description VARCHAR2(20 CHAR) 4 ); Table created. SQL> DESC tab2 Name Null? Type ----------------------------------------------------- -------- ----------------- ID NUMBER(10) DESCRIPTION VARCHAR2(20 CHAR)
The default character semantics of the database or session can be altered using the NLS_LENGTH_SEMANTICS
parameter.
ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=CHAR; ALTER SYSTEM SET NLS_LENGTH_SEMANTICS=BYTE; ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR; ALTER SESSION SET NLS_LENGTH_SEMANTICS=BYTE;
If we alter the sessions length semantics to character and describe the tables we can see the change has taken affect by the way the column definitions are displayed.
SQL> ALTER SESSION SET NLS_LENGTH_SEMANTICS=CHAR; Session altered. SQL> desc tab1 Name Null? Type ----------------------------------------------------- -------- ----------------- ID NUMBER(10) DESCRIPTION VARCHAR2(20 BYTE) SQL> DESC tab2 Name Null? Type ----------------------------------------------------- -------- ----------------- ID NUMBER(10) DESCRIPTION VARCHAR2(20) SQL>
The INSTR
, LENGTH
and SUBSTR
functions always deal with characters, regardless of column definitions and the character sets. For times when you specifically need to deal in bytes Oracle provides the INSTRB
, LENGTHB
and SUBSTRB
functions.
Hope this helps. Regards Tim...