8i | 9i | 10g | 11g | 12c | 13c | 18c | 19c | 21c | 23c | Misc | PL/SQL | SQL | RAC | WebLogic | Linux

One True Lookup Table (OTLT)

The One True Lookup Table (OTLT) design, also know as "one lookup table to rule them all", is a classic anti-pattern that looks so sensible at first glance, that you will defend your decision to use it, even as your database performance takes a nose dive, your data integrity is compromised, and your developers stick pins into a voodoo doll in your likeness!

What is One True Lookup Table (OTLT)?
So What's The Problem?
Ugly SQL?
Conclusion

What is One True Lookup Table (OTLT)?

Most systems have lots of basic lookup tables that look similar to this.

-- DROP TABLE order_types PURGE;

CREATE TABLE order_types (
  order_type_id    NUMBER         NOT NULL,
  order_type_code  VARCHAR2(10)   NOT NULL,
  order_type_desc  VARCHAR2(100)  NOT NULL,
  CONSTRAINT order_types_pk PRIMARY KEY (id),
  CONSTRAINT order_types_uk UNIQUE (code)
);

-- DROP TABLE order_statuses PURGE;

CREATE TABLE order_statuses (
  order_status_id    NUMBER         NOT NULL,
  order_status_code  VARCHAR2(10)   NOT NULL,
  order_status_desc  VARCHAR2(100)  NOT NULL,
  CONSTRAINT order_statuses_pk PRIMARY KEY (id),
  CONSTRAINT order_statuses_uk UNIQUE (code)
);

Having many similar tables seems like a messy solution to some designers, so they feel the need to consolidate them into a single table, under the guise of making the data model simpler, and reducing the amount of maintenance screens. This consolidation results in the One True Lookup Table (OTLT) anti-pattern. The starting point for our shared lookup table might look like this.

-- DROP TABLE lookups PURGE;

CREATE TABLE lookups (
  lookup_id     NUMBER         NOT NULL,
  lookup_key    VARCHAR2(100)  NOT NULL,
  lookup_value  VARCHAR2(500)  NOT NULL,
  CONSTRAINT lookups_pk PRIMARY KEY (lookup_id),
  CONSTRAINT lookups_uk UNIQUE (lookup_key)
);

You will need a way of distinguishing between the types of lookup data, so we might add a lookup type or domain column. Something to remember is we might have the same key for multiple lookup types, so we have to include the lookup type in the unique key. This might give us something like this.

-- DROP TABLE lookups PURGE;

CREATE TABLE lookups (
  lookup_id     NUMBER         NOT NULL,
  lookup_type   VARCHAR2(20)   NOT NULL,
  lookup_key    VARCHAR2(100)  NOT NULL,
  lookup_value  VARCHAR2(500)  NOT NULL,
  CONSTRAINT lookups_pk PRIMARY KEY (lookup_id),
  CONSTRAINT lookups_uk UNIQUE (lookup_type, lookup_key)
);

The lookup type is not very descriptive, so it might be better to have a proper description kept somewhere.

-- DROP TABLE lookups PURGE;
-- DROP TABLE lookup_types PURGE;

CREATE TABLE lookup_types (
  lookup_type_id     NUMBER         NOT NULL,
  lookup_type_code   VARCHAR2(20)   NOT NULL,
  lookup_description VARCHAR2(100)  NOT NULL,
  CONSTRAINT lookup_types_pk PRIMARY KEY (lookup_type_id),
  CONSTRAINT lookup_types_uk UNIQUE (lookup_type_code)
);

CREATE TABLE lookups (
  lookup_id         NUMBER         NOT NULL,
  lookup_type_code  VARCHAR2(20)   NOT NULL,
  lookup_key        VARCHAR2(100)  NOT NULL,
  lookup_value      VARCHAR2(500)  NOT NULL,
  CONSTRAINT lookups_pk PRIMARY KEY (lookup_id),
  CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key),
  CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code)
);

Note. I would not normally foreign key to a unique key like this, as things change over time, but it makes the SQL even tougher if we don't.

We pat ourselves on the back at our wonderful solution...

The problem with this solution is it is dealing with a single datatype for the key and the value. What happens if we need a variety of key-value pairs from a data type perspective? Either we ignore the data type variations and incur additional data type conversions, or we add more optional columns of differing types, like NUMBER, DATE and even string length variants. We tell yourself it's fine, as it doesn't waste a lot of space to have empty columns. The problem now is we have made an important bit of information optional, when it really shouldn't be.

CREATE TABLE lookups (
  lookup_id                  NUMBER         NOT NULL,
  lookup_type_code           VARCHAR2(20)   NOT NULL,
  lookup_key                 VARCHAR2(100)  NOT NULL,
  lookup_string_value        VARCHAR2(500),
  lookup_short_string_value  VARCHAR2(100),
  lookup_long_string_value   VARCHAR2(4000),
  lookup_number_value        NUMBER,
  lookup_date_value          DATE,
  lookup_ts_value            TIMESTAMP,
  CONSTRAINT lookups_pk PRIMARY KEY (lookup_id),
  CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key),
  CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code)
);

Then a couple of our basic lookups need a few additional fields to be kept with them. That means splitting them out of the generic lookup table, and revisiting all bits of the application code that reference it, or adding some additional columns to the lookup table, maybe as "flex fields", which can be used to store different things. Invariably we pick the "easy option" and end up with something like this.

CREATE TABLE lookups (
  lookup_id                  NUMBER         NOT NULL,
  lookup_type_code           VARCHAR2(20)   NOT NULL,
  lookup_key                 VARCHAR2(100)  NOT NULL,
  lookup_string_value        VARCHAR2(500),
  lookup_short_string_value  VARCHAR2(100),
  lookup_long_string_value   VARCHAR2(4000),
  lookup_number_value        NUMBER,
  lookup_date_value          DATE,
  lookup_ts_value            TIMESTAMP,
  flex_field_1               VARCHAR2(4000),
  flex_field_2               VARCHAR2(4000),
  flex_field_3               VARCHAR2(4000),
  flex_field_4               VARCHAR2(4000),
  flex_field_5               VARCHAR2(4000),
  CONSTRAINT lookups_pk PRIMARY KEY (lookup_id),
  CONSTRAINT lookups_uk UNIQUE (lookup_type_code, lookup_key),
  CONSTRAINT lookup_lookup_type_fk FOREIGN KEY (lookup_type_code) REFERENCES lookup_types (lookup_type_code)
);

Before we know it, our "elegant solution" to save on reference tables becomes a complete disaster area.

So What's The Problem?

There are so many problems with this approach. Here are a few.

It makes the SQL look ugly (see below).
Many statements will require multiple joins to the lookup table. The extra join columns make the statements look bigger and scarier. There will be the same number of joins when using separate lookup tables, but those joins will be simpler.
Multiple references to the same table can make it hard to determine what is happening in the execution plan, as you will see those repeated references there, and have to refer to the predicates to understand the context of table reference. If you were using separate lookup tables, it would be clear which table you were referring to at any point of the execution plan.
You can't foreign key to this type of table. Technically you can if you are willing to put both columns (lookup_type_code and lookup_key) in the table, but you won't because it is ugly. This means there is a good chance your data integrity will be compromised over time. It's really easy to foreign key to individual lookup tables, and therefore protect your data.
It's hard to control the contents of the table. It's a shared resource, so check constraints and triggers are problematic. If you need users to have different privileges, depending on which lookup they are dealing with, things are going to get messy. That would be really easy with separate lookup tables.
If you need to make a change for one reference type, like extending the size of the key or value, it affects all reference data. Using separate lookup tables isolates the change.
Over time, many reference tables take on additional data. To model that you would need to either split out that reference data from this shared lookup table, or start adding optional columns to cope with the "one-off" issues. A change like this is really simple for separate lookup tables.
Data types matter. You should always use the correct data type, as it will reduce the number of data type conversions needed. Implicit data type conversions are bugs waiting to happen!
Performance can be a problem with the OTLT approach as it's hard for the optimizer to make sound judgements about the data. The optimizer cares about cardinality, but it may be hard to make that decision if you are dealing with a large number of rows, most of which are irrelevant in any one specific context. The optimizer also cares about high/low values, but these are not be relevant to any one lookup, but shared. We've also mentioned you probably won't foreign key to this data, which will reduce the amount of information the optimizer has when making its decision. You may have artificially made columns optional, that are actually mandatory, a key must have a value, but which column? I think you get the message.

Ugly SQL?

We are probably going to join to this table multiple times from queries. Here is an example using ANSI and Non-ANSI join syntax.

-- ANSI Join
SELECT o.order_id,
       o.order_type_code,
       ot.lookup_value AS order_type_description,
       o.order_status_code,
       os.lookup_value AS order_status_description
FROM   orders o
       JOIN  lookups ot
         ON  o.order_type_code = ot.lookup_key
         AND ot.lookup_type_code = 'ORDTYP'
       JOIN  lookups os
         ON  o.order_status_code = os.lookup_key
         AND os.lookup_type_code = 'ORDSTAT'

-- Non-ANSI Join
SELECT o.order_id,
       o.order_type_code,
       ot.lookup_value AS order_type_description,
       o.order_status_code,
       os.lookup_value AS order_status_description
FROM   orders o,
       lookups ot,
       lookups os
WHERE  o.order_type_code = ot.lookup_key
AND    ot.lookup_type_code = 'ORDTYP'
AND    o.order_status_code = os.lookup_key
AND    os.lookup_type_code = 'ORDSTAT'

What we can see is every reference to the lookup table requires two columns, and we are going to see the same table repeated in the execution plan multiple times. The table name and the column names don't give us any clue about why we are using this table.

I've seen examples of this where the number of references to the lookup table were in double figures.

Alternatively, we can do the job properly and things look clean, with correct table and column references.

-- ANSI Join
SELECT o.order_id,
       ot.order_type_code,
       ot.order_type_desc,
       os.order_status_code,
       os.order_status_desc
FROM   orders o
       JOIN order_types ot ON o.order_type_id = ot.order_type_id
       JOIN order_statuses os ON o.order_status_id = order_status_id

-- Non-ANSI Join
SELECT o.order_id,
       ot.order_type_code,
       ot.order_type_desc,
       os.order_status_code,
       os.order_status_desc
FROM   orders o,
       order_types ot,
       order_statuses os
WHERE  o.order_type_id = ot.order_type_id
AND    o.order_status_id = os.order_status_id

Conclusion

The One True Lookup Tables (OTLT) approach seems to make sense at first. When designing a logical model it looks kind-of nice. Once you get down to implementing it with some real code, it starts to go wrong, and as the system evolves over time, that's just going to get worse.

Some people claim reasonable results where they stick to basic lookups, but I feel they've just not reached the pain point yet.

Please don't do this!

Hope this helps. Regards Tim...

Back to the Top.