OUG Ireland 2015 : The Journey Begins

The day started early, about 1 hour before my alarm in fact. I got up, lay in the bath for a while drinking a can of Monster and considering the day ahead, got out of the bath, puked, then got my shit together ready for the taxi. I’ve been ill this week. That combined with sleep deprivation, nerves and the Monster kinda turned my stomach.

The taxi ride to the airport was really good. The driver was a really cool bloke and I enjoyed talking to him.

I arrived at the airport with a couple of hours to spare. It’s a bit silly for such a short flight when I have to check in online and I only have a laptop as baggage, but I would rather be early than late for my £27 flight to Dublin. :)

We got seated in the plane and were told we had a 60 minute delay. One guy started to freak out. It was like an episode of the Jeremy Kyle show. I was waiting for someone to come in with the DNA results, to prove he was not the father of the baby etc. No punches were thrown…

We ended up taking off about 50 minutes late. My first session was after 12:00, so I wasn’t too worried. The girls next to me used the time to do their make-up. Apparently it takes about 40 minutes to achieve “the natural look”. The combination of that, the selfies and the giggling about stuff on Snapchat made me feel very old!

On arrival it was a quick taxi ride to Croke Park, then OUG Ireland started for me.

Cheers

Tim…


OUG Ireland 2015 : The Journey Begins was first posted on March 20, 2015 at 11:05 am.
©2012 "The ORACLE-BASE Blog". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement.

Time to Update the Feed

For those of you who enjoy our content via the feed (thank you), I have news.

Next week, I’ll be changing the feed’s name, so if you want to continue to receive AppsLab goodness in your feed reader of choice or in your inbox, you’ll need to come back here and subscribe again.

Or maybe it’s time to switch over to our Twitter (@theappslab) or Facebook Page, if that’s your thing. I did nuke the Google+ Page, but I doubt anyone will notice it’s gone.

Nothing else has changed.

SQLTXPLAIN and the AWR Warehouse, Part I

Yeah, so I did it-  I installed SQLTXPLAIN, (SQLT) on the AWR Warehouse!  From previous experience with this fantastic tool, I was pretty sure I’d have some great new adventures with the AWR Warehouse data, and I wanted to see if I could get it running against the warehouse.  This is a work in progress, so I’m going to split this blog into multiple parts to ensure we don’t miss anything.

Installation

I didn’t expect SQLT AWR features to work out of the box.  I still needed to install it, run a few reports and see what would trip it up from using the AWR Warehouse repository data.

The installation was pretty much standard-  no differences from the requirements on any other database, including a default tablespace, connection information for the AWR Warehouse repository and other pertinent data.  The one thing I did do, and you should do to work with the AWR Warehouse, is choose the “T” value, indicating you have the Tuning and Diagnostic Pack, so you can use its features with the AWR Warehouse.

  1. Licensed Oracle Pack. (T, D or N)

You can specify T for Oracle Tuning, D for Oracle Diagnostic or N for none. If T or D is selected, SQLT may include licensed content within the diagnostics files it produces. Default is T. If N is selected, SQLT installs with limited functionality.

The features that are currently supported with the AWR Warehouse from the command line are the following, per the Oracle Support Doc (Doc ID 1614107.1):

SQLT may use the Oracle Diagnostic and/or the Oracle Tuning Packs if your site has a license for them. These two provide enhanced functionality to the SQLT tool. During SQLT installation you can specify if one of these two packages is licensed by your site. If none, SQLT still provides some basic information that can be used for initial SQL diagnostics.

sqlt1

With the installation complete, now the fun starts, or that’s what I thought!

The DBID Conundrum

For my AWR Warehouse environment, (my tiny one that suffices for my testing currently… :))  I can gather information on the databases whose performance data I’m consolidating and identify the “source database” that is of interest.  I want to see if I can run AWR extracts and other processes from the warehouse or if they will fail.  My initial query to look at what exists in my AWR Warehouse goes to the DBSNMP.CAW_DBID_MAPPING table:

SQL> select new_dbid, target_name from dbsnmp.caw_dbid_mapping;

 

  NEW_DBID TARGET_NAME
---------- -----------------------------------------------------------------
3017376167 cawr
2710603395 SH      <-- This is the one we want to work with!
2973036624 repo

Which we can match up to the distinct DBIDs in one of the AWR objects that also contain SQL_IDs and PLAN_HASH_VALUE data:

SQL> select distinct DBID from <insert AWR Object Here>; 
      DBID
----------
2710603395
2973036624
3017376167

As this is a 12.1.0.2 database, (expected with an AWR Warehouse repository, we recommend 12.1.0.2), I’ve granted the inherit privileges that are required for SYS to execute SQLT on DB12c, but I’m still having issues even running the AWR-centric SQLT features.  I’m curious why, but I have some ideas where the problems might lie.
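For reference, the 12c inherit grants in question look something like this-  a sketch based on the SQLT installation notes, so treat the exact schema name as an assumption for your environment:

```sql
-- DB12c tightened invoker's rights execution; SQLT's admin schema
-- needs to be allowed to inherit privileges when its code is run by SYS.
GRANT INHERIT PRIVILEGES ON USER SYS TO SQLTXADMIN;
```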

If you run a SQL_ID-specific report, (or any AWR report), it will display which instance AWR will choose to run the report against:

sqlt2

 

As we can see, we have multiple DBIDs, instance names and host information, (although I’ve hidden that… :))

SQLT And The Quest For the DBID Value

SQLT, I first assumed, must do something very similar, bypassing the ability to utilize the other DBID data and coming back with an error stating that the SQL_ID doesn’t exist when you attempt to run reports against it:

SQL> START /u01/home/kellyn/sqlt/run/sqltxprext.sql 8ymbm9h6ndphq  <SQLT Password>

sqlt3

To verify the issue, I ran XPREXT with a SQL_ID from the repository DBID, which would show in the local AWR:

SQL> START /u01/home/kellyn/sqlt/run/sqltxprext.sql aycb49d3343xq <SQLT Password>

From the run with the successful SQL_ID, I was able to view in the sqlxtract.log the step that sets the DBID:

sqlt4

We can see in the above log that DEFINE 2 passes in the DBID.  For me to take advantage of SQLT, I need to find a way around the code that is setting the DBID and other instance level information.  The objects exist to support an AWR Warehouse design; just as with a local AWR, the DBID is then populated into the SQLT objects to produce reports:

sqlt5

Now if I can update the code that populates these tables to produce the reports, then SQLT becomes AWR Warehouse compliant.

Making SQLT AWR Warehouse Compliant

The code that needs to be updated is all coming from one location and, due to the “clean” coding practices from Carlos, Mauro and others, this is quite simple to perform.

We’ll be working with the package body, sqcpkga.pkb, and the package source, sqcpkga.pks.  We’ll identify the following areas that the AWR Warehouse will be dependent upon-

You’ll see the calls that are populated with the values:

  FUNCTION awr_report_html (
   p_dbid        IN NUMBER,
   p_inst_num    IN NUMBER,
   p_bid         IN NUMBER,
   p_eid         IN NUMBER,
   p_rpt_options IN NUMBER DEFAULT 0 )
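For context, this mirrors the signature of DBMS_WORKLOAD_REPOSITORY.AWR_REPORT_HTML, so once we know a source DBID from CAW_DBID_MAPPING we could, in principle, generate a report for it directly.  A sketch, with the snapshot IDs as assumed bind values:

```sql
-- Hypothetical direct call using the SH source DBID from the mapping table.
SELECT output
FROM   TABLE(dbms_workload_repository.awr_report_html(
         l_dbid     => 2710603395,
         l_inst_num => 1,
         l_bid      => :begin_snap,
         l_eid      => :end_snap));
```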

These values are then part of the package body:

  /* -------------------------
   *
   * public get_database_id
   *
   * ------------------------- */
  FUNCTION get_database_id
  RETURN NUMBER
  IS
    l_dbid v$database.dbid%TYPE;
  BEGIN
    EXECUTE IMMEDIATE 'SELECT dbid FROM v$database'||s_db_link INTO l_dbid;
    RETURN l_dbid;
  END get_database_id; 
  /*************************************************************************************/

Now this isn’t going to do me any good “as is” with the AWR Warehouse, where we have multiple DBIDs and instance IDs-  we need to pass the value in properly.

We’ll start going through the changes step by step.  The code is well written, but involved in what it produces, so we’ll take each change into consideration before updating it and making it AWR Warehouse compliant.

I foresee this as part of the installation someday, (this is for you, SQLT guys)-  if the installer states, as we demonstrated earlier, that the site has the Tuning Pack, then the install will know you are licensed to use the AWR and can switch from the existing code to the one I propose, making it AWR Warehouse compliant.  The AWR Warehouse, as we’ve discussed, retains the DBID and instance allocation for all databases added to the repository, so we just need to make sure we use that data if our licensing allows it.

I focused on the dba_hist_database_instance object, as it contains about 95% of the pertinent data that SQLT was getting from the v$database, v$instance, gv$** objects and so on.

SQL> desc dba_hist_database_instance;
 Name                                      Null?    Type
 ----------------------------------------- -------- --------------
 DBID                                      NOT NULL NUMBER
 INSTANCE_NUMBER                           NOT NULL NUMBER
 STARTUP_TIME                              NOT NULL TIMESTAMP(3)
 PARALLEL                                  NOT NULL VARCHAR2(3)
 VERSION                                   NOT NULL VARCHAR2(17)
 DB_NAME                                            VARCHAR2(9)
 INSTANCE_NAME                                      VARCHAR2(16)
 HOST_NAME                                          VARCHAR2(64)
 LAST_ASH_SAMPLE_ID                        NOT NULL NUMBER
 PLATFORM_NAME                                      VARCHAR2(101)
 CON_ID                                             NUMBER

Armed with this information, we can then make the first of many necessary changes-

  /* -------------------------
   *
   * public get_database_id
   *
   * Enhancement by KPGorman
   * AWRW compliant, now passes in DBNAME 
   * 03/09/2015
   * ------------------------- */
  FUNCTION get_database_id (p_dbname IN VARCHAR2)
  RETURN NUMBER
  IS
    l_dbid dba_hist_database_instance.dbid%TYPE;
  BEGIN
    EXECUTE IMMEDIATE 'SELECT DISTINCT dbid FROM dba_hist_database_instance'||s_db_link||' WHERE db_name = :b1' INTO l_dbid USING p_dbname;
    RETURN l_dbid;
  END get_database_id;

  /*************************************************************************************/

There were seven changes that were required to this code to get started-

  1. get_database_id
  2. get_database_name
  3. get_sid
  4. get_instance_number
  5. get_instance_name
  6. get_host_name
  7. get_host_name_short
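Each of these follows the same pattern as get_database_id.  For example, get_instance_number could be reworked against dba_hist_database_instance like this-  a sketch for the POC, not the shipped SQLT code:

```sql
FUNCTION get_instance_number (p_dbname IN VARCHAR2)
RETURN NUMBER
IS
  l_inst_num dba_hist_database_instance.instance_number%TYPE;
BEGIN
  -- Take one instance for the named source database; RAC sources
  -- would need extra handling to pick the right instance_number.
  EXECUTE IMMEDIATE 'SELECT instance_number FROM dba_hist_database_instance'
    || s_db_link || ' WHERE db_name = :b1 AND ROWNUM = 1'
    INTO l_inst_num USING p_dbname;
  RETURN l_inst_num;
END get_instance_number;
```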

Now there are more areas that need attention, like code that populates the database version, the OS platform, database properties, etc.  These, too, are pulled from the instance level and not from the AWR tables.

Luckily for me, this is all set in one place and not scattered throughout the code, (great development work is wonderful to see!)  There is some other code using the gv$** views to ensure it captures global data for RAC environments, too.  Again, I have to stress how well this is written and how easy they are making my job for my Proof of Concept, (POC).

After making the required changes, I recompile the one package and package body, along with the sqltxprext.sql in the SQLT/run directory that is involved.  The objects inside the database that I’m working with are in the SQLTXADMIN schema-  not the SQLTXPLAIN schema.  Keep in mind, this is just for the POC, but for a proper installation, I would expect the SQLTXPLAIN installer to check whether we have the Tuning Pack and, with that verification, switch to the correct SQLT$A package and executables so we are using the AWR data instead of the v$** objects.

SQL> alter package SQLTXADMIN.SQLT$A compile body;
Package body altered.

SQL> alter package SQLTXADMIN.SQLT$A compile;
Package altered.

Moment of Truth

Now that I’ve made the changes and everything has compiled successfully, it’s time to test it out with a SQL_ID from one of my source databases.  Now one thing to keep in mind, I left for the OUGN Norway conference between the start of this POC and the end, so I had to pick up where I left off.  It took me some time to verify that I’d left off in the right spot and I had to make a few more changes for the host data, etc., but we’re ready to run this now!

sqlt6

I now pass the DBID and was initially worried about a few of the data results on the screen, but after viewing the sqlxtract.log, I was less worried.  Most of the values that are required to pass to SQLT to ensure proper handling are correct, but it appears I have a couple more changes to implement before EVERYTHING is compliant.  The parameters for the call look very good upon first inspection, though:

sqlt7

The DEFINE 2 now shows the correct DBID, as does the DEFINE_CONNECT_IDENTIFIER.  The zip file is created as part of the XPREXT output, and I just need to dig into it to see if there is anything more I need to change that I might have missed, vs. making any assumptions from the sqltxtract.log, which is a bit high level and jumbled with execution errors from the demands I’m making on this command line tool.

Thanks for staying with me on this one and stay tuned for the output to see how successful I was! :)

 





Copyright © DBA Kevlar [SQLTXPLAIN and the AWR Warehouse, Part I], All Right Reserved. 2015.

SQL Developer 3 and Timestamp Conversion

I shouldn’t be using an old version of SQL Developer like v3 (3.2.10.09) anyway but… cut a long story short … I am.
I’ve also got v4, which doesn’t display the same problem.

Just pointing out an oddity whilst investigating something for someone:

with x as 
(select to_date('28-MAR-2015 01:30','DD-MON-YYYY HH24:MI') dt1
 ,      to_date('29-MAR-2015 01:30','DD-MON-YYYY HH24:MI') dt2
 from   dual)
select sessiontimezone
,      dt2 "date"
,      cast(dt2 as timestamp) "timestamp"
,      to_Char(cast(dt2 as timestamp),'DD-MON-YYYY HH24:MI') "string"
from   x;

Run as Script (F5):

SESSIONTIMEZONE date               timestamp                       string                   
--------------- ------------------ ------------------------------- -----------------
Europe/London   29-MAR-2015 01:30  29-MAR-2015 01.30.00.000000000  29-MAR-2015 01:30

Run Statement (F9):

Europe/London   29-MAR-2015 01:30   29-MAR-2015 02.30.00.000000000 29-MAR-2015 01:30

Spot the oddity in column “timestamp” in the “Run Statement (F9)” output.

There is something dodgy going on related to timestamp display.
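One way to take the client’s display formatting out of the equation, (a workaround sketch; it sidesteps rather than explains the bug), is to pin the session’s timestamp format explicitly and re-run the comparison:

```sql
-- Force a known timestamp format for the session, so any remaining
-- discrepancy must come from the tool/driver rather than NLS defaults.
ALTER SESSION SET NLS_TIMESTAMP_FORMAT = 'DD-MON-YYYY HH24:MI:SS.FF';
```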

Colleagues in other regions have not had the same problem so I wonder if some environment variable is/isn’t getting passed along for one of these tool/driver code paths.

I’ve seen dodgy timestamp behaviour before when trying to run dbms_xplan.display_cursor(null,null) in SQL Developer and getting the plan not of my target statement but of some timestamp lookup somewhere – either tool or driver.

For example, if I run the statement above together with a dbms_xplan select but this time via “Run As Script (F5)”, e.g.

with x as 
(select to_date('28-MAR-2015 01:30','DD-MON-YYYY HH24:MI') dt1
 ,      to_date('29-MAR-2015 01:30','DD-MON-YYYY HH24:MI') dt2
 from   dual)
select sessiontimezone
,      dt2 "date"
,      cast(dt2 as timestamp) "timestamp"
,      to_Char(cast(dt2 as timestamp),'DD-MON-YYYY HH24:MI') "string"
from   x;

select * from table(dbms_xplan.display_cursor(null,null));

Then apparently the last statement run in my session (the meaning of the first null parameter to display_cursor) is:

SQL_ID  0gzt83m5pxufx, child number 0 
------------------------------------- 
/*+ NO_SQL_TRANSLATION */SELECT VALUE FROM V$NLS_PARAMETERS WHERE PARAMETER = 'NLS_TIMESTAMP_FORMAT' 
 
Plan hash value: 1805486652 
 
-------------------------------------------------------------------------- 
| Id  | Operation        | Name             | Rows  | Bytes | Cost (%CPU)| 
-------------------------------------------------------------------------- 
|   0 | SELECT STATEMENT |                  |       |       |     1 (100)| 
|*  1 |  FIXED TABLE FULL| X$NLS_PARAMETERS |     1 |    31 |     0   (0)| 
-------------------------------------------------------------------------- 
 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
 
   1 - filter(("PARAMETER"='NLS_TIMESTAMP_FORMAT' AND 
              "PARAMETER"<>'NLS_SPECIAL_CHARS' AND "INST_ID"=USERENV('INSTANCE')))

Unfortunately, this could all largely be irrelevant to the OP’s question which was why does the following code return the wrong result:

select CAST(FROM_TZ( TO_TIMESTAMP('29-MAR-15 16:30:00','DD-MON-YY HH24:MI:SS') , TO_CHAR('Australia/Sydney')) AT TIME ZONE 'America/New_York' AS timestamp)
from dual;
29-MAR-2015 02.30.00.000000000

But if we run this:

select CAST(FROM_TZ( TO_TIMESTAMP('29-MAR-15 16:30:00','DD-MON-YY HH24:MI:SS') , TO_CHAR('Australia/Sydney')) AT TIME ZONE 'America/New_York' AS timestamp)
,      to_char(CAST(FROM_TZ( TO_TIMESTAMP('29-MAR-15 16:30:00','DD-MON-YY HH24:MI:SS') , TO_CHAR('Australia/Sydney')) AT TIME ZONE 'America/New_York' AS timestamp),'DD-MON-YYYY HH24:MI')
from dual;
29-MAR-2015 02.30.00.000000000	29-MAR-2015 01:30

Then perhaps it suggests that this is just a tool or JDBC error and not something related to DST timezone data files, etc.?

Also, there’s something about timestamps with (local) time zones which always causes trouble and which you’re forced to relearn every time… maybe it’s just me?


OAUX Tidbits

Here come some rapid fire tidbits about upcoming and recently past Oracle Applications User Experience (@usableapps) events.

Events of the Near Past

Laurie Pattison’s (@lsptahoe) team (@InnovateOracle) has been organizing events focused around stimulating and fostering innovation for quite some time now.

I’ve always been a big fan of group-think-and-work exercises, e.g. design jams, hackathons, ShipIts, code sprints, etc.

Our team frequently participates in and supports these events, e.g. Tony O was on a team that won a couple awards at the Future of Information design jam back in early February and John and Julia served as mentors at the Visualizing Information design jam a few weeks ago.

You may recall Julia’s visualization analysis and thinking; John has an equally informative presentation, not yet shared here, but we can hope.

Watch Laurie’s blog for information about more innovation events.

Events of the Near Future

It’s conference season again, and we’ll be bouncing around the globe spreading our emerging technologies user experience goodness.

Fresh off a hot session at UTOUG (h/t OG Friend of the ‘Lab Floyd) and gadget-hounding at SXSW Interactive, Noel (@noelportugal) will be in Eindhoven, the Netherlands for the Oracle Benelux User Group Experience Day, March 23 and 24.

Our fearless leader, Jeremy Ashley (@jrwashley) will be there as well giving the opening keynote. Bob Rhubart (@OTNArchBeat) recorded a video to tell you all about that. Check it out here:

While Noel enjoys Europe, I’ll be in Washington D.C. speaking at Oracle HCM World, along with Thao and Ben.

After that, we’ll have boots on the ground at Oracle Modern CX and Collaborate 15 in Las Vegas. Stay tuned for more, or if you’ll be at any conferences during Conference Season 2015 and wonder if OAUX will be there, check out our Events page.

One Year With Oracle

Team Oracle

Just over a year ago, it became public that I was leaving Enkitec for Oracle.  Most folks had similar questions surrounding my decision: my logic in giving up my ACE Director, an assumption that I would just disappear into the Oracle “machine” considering the sheer size of the company, and, given my prominent company work history, what had prompted the decision.

In this last year, my career has definitely surprised more than a few folks.  I, in no way, disappeared into Oracle.  In fact, I’ve been made more prominent and in doing so, assisted in shining a light on the Enterprise Manager group.  I’m continually offered great opportunities to be more and do more with the OEM group, which I appreciate greatly.  I interviewed my potential managers as much as my employer interviewed me.  I believe your job is only as good as your boss and my managers are incredibly important to all that I’m able to do in the technical arena, along with the Oracle community.

**Thanks to Philippe Fierens for the video

I presented at 11 conferences, a master class, three webinars and a number of local events.  During this time, I worked with customers and implemented a large number of Strategic Customer Program, (SCP) “Fast Track” engagements for Enterprise Manager.  These are projects where EM12c isn’t an after-thought-  it’s the FOCUS.  I knew if I wanted to gain more knowledge of Enterprise Manager, I was going to need to go back to the source, and that meant Oracle.  There’s been no disappointment here-  I get to be part of the first line of huge OEM projects that implement HA Enterprise Manager environments with advanced features, plug-ins and other enhanced architecture designs.

Inside Oracle, I’m working with some of the best of the best in the Enterprise Manager world.  The Architects, Product Managers and Consulting Members of the Technical Staff that make Enterprise Manager the great product it is are who I get to interact with every day.  I have insight into the framework, plug-ins and other features that I simply would not get outside Oracle.  I have access to systems and development vision that I wouldn’t have as a customer.  The Strategic Customer Program, (SCP) is the first line in directing the product with the vision it requires, as so often those developing the features may be very distant from the customers-  we are the ones interacting with them and are expected to come up with the answers.  This has shown in my blog posts, as I dive deeper into new EM12c features, plug-ins, etc.

Giving up my ACE Director, as it is a customer program, was in my mind not a loss, but just something I had already achieved, and I needed to go on to new challenges. I think my only concern was the small number of women involved in the program, but I found that I can still support the ACE program and am still working to support ACE submissions and mentoring.  That’s what’s really important to me. The only people I have seen viewing this as a mistake are commonly those that look down at others no matter what and are dealing with their own insecurities, so I’ve found myself pretty unconcerned with these few.

After joining Oracle, I was able to locate the bylaws for Oracle user group involvement as an employee and retain my Conference Director role at Rocky Mtn. Oracle User Group, (RMOUG), by becoming a non-voting Board Member Emeritus.  I’ve found new ways to involve myself with the user group community, also taking on the role of user group liaison for the Enterprise Manager group for all interaction.  I’ll be working with IOUG in the future to find ways to streamline and assist other user groups with consolidating our resources and helping each other be more successful-  doing more with less, as many of the regional user groups are non-profit, with small budgets and limited resources.

Oracle has been very supportive of my Women in Technology initiatives.  They know this is something very important to the world of technology and to my own hopes for a more balanced IT workforce.  To have my manager email me about something I’m working on and end it with, “btw, read your <insert WIT article title here> the other day, great read!” and then tell me about some of the things he’s doing with his kids is really great to hear. That he recognizes the importance of what I’m doing, the essential work of mentoring those around us, is impressive, too.

Some felt that my social media presence would be impacted-  assuming that Oracle’s Social Media Policy would suppress my personal brand after coming on board.  Quite the opposite is true.  I’d been aware for quite some time that if you had an issue with social media, it was due to a company NOT having a social media policy.  That leaves those who don’t understand social media, or are uncomfortable with it, to make assumptions about what is allowed or what they think SHOULD be allowed.  I have never had an issue surrounding social media when a company had a social media policy.  My social media value has done nothing but rise since joining Oracle, and I thank everyone for their continued support.

My first year has been one of growth, opportunities and greater depth of knowledge in the Oracle Enterprise Manager product.  I am thrilled with the new opportunities that are on the horizon and look forward to the upcoming year!


 





Copyright © DBA Kevlar [One Year With Oracle], All Right Reserved. 2015.

Heat Map Visualization for SystemTap with PyLatencyMap

Topic: PyLatencyMap v1.2 brings heat map visualization on the CLI for latency data collected by SystemTap.

 

VirtualBox 4.3.26

VirtualBox 4.3.26 has been released. Downloads and changelog are in the usual places.

It’s only a couple of weeks since the last release, so I guess there was something causing people significant headaches, which warranted a new release. Not that I noticed any problems. :)

Happy upgrading!

Cheers

Tim…


VirtualBox 4.3.26 was first posted on March 17, 2015 at 1:27 pm.

Working with multiple Trees on the same APEX Page (4.2/5.0)

Ever tried to put multiple Trees on the same page in APEX 4.2?

Here's my example:


On the left tree the "Selected Node Page Item" is set to hidden item P1_TREE1; on the right tree the value that sets the tree is P1_TREE2. At least it should work that way, but by default, if your values are not unique - in this case empno is the value behind both trees - it doesn't work...
Your first tree will be highlighted twice: once for ALLEN and once for SCOTT, instead of what it should be - ALLEN selected on the left and SCOTT selected on the right.

To fix this issue in APEX 4.2, you need to adapt your select statement so you have unique values for both trees. I typically put a meaningful character in front of the value; for example you can concatenate 'N' for new values (new tree) and 'O' for old values (old tree).
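As a sketch, assuming the classic SCOTT.EMP data and the standard APEX 4.2 tree query columns, the left (new) tree's query might look like this, with the right (old) tree identical except for an 'O' prefix on the value:

```sql
select case when connect_by_isleaf = 1 then 0
            when level = 1 then 1
            else -1
       end as status,
       level,
       ename        as title,
       null         as icon,
       'N' || empno as value,   -- prefix makes the value unique to this tree
       ename        as tooltip,
       null         as link
from   emp
start  with mgr is null
connect by prior empno = mgr;
```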


So that fixes the "Selected Node Page Item" issue with non-unique values.
Behind the tree implementation in APEX 4.2, jsTree is used. APEX is not using the most recent version and the way the JavaScript is coded doesn't really work that nicely. If you really like jsTree and for example want to search in it, or you have a very large dataset, check out Tom Petrus' tree plugin and explanation.

So what about APEX 5.0? The tree implementation in APEX 5.0 has the same functionalities, but is a lot better. Behind the scenes jsTree is not used anymore, instead the tree that is also used in the Page Designer is used. The issue with unique values is off the table, so no worries anymore in APEX 5.0.


MySQL Sounds Like Fun

I love finding out new things about MySQL. Last week, I stumbled on a query that had the phrase “SOUNDS LIKE” in it. Sounds made-up, right? Turns out MySQL is using a known “soundex” algorithm common to most databases, and popular in use cases in genealogy.

The basic idea is that words are encoded according to their consonants. Consonants that sound similar (like M and N) are given the same code. Here’s a simple example:

(“soundex” and “sounds like” are different ways of doing the same thing in these queries)

MariaDB> select soundex("boom");
+-----------------+
| soundex("boom") |
+-----------------+
| B500            |
+-----------------+

MariaDB> select soundex("bam");
+----------------+
| soundex("bam") |
+----------------+
| B500           |
+----------------+

MariaDB> select soundex("bin");
+----------------+
| soundex("bin") |
+----------------+
| B500           |
+----------------+
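As the MySQL docs note, “SOUNDS LIKE” is just shorthand for comparing soundex values, so these two predicates are equivalent:

```sql
SELECT 'boom' SOUNDS LIKE 'bam';          -- 1, i.e. true
SELECT SOUNDEX('boom') = SOUNDEX('bam');  -- also 1; the same comparison
```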

This simple example isn’t terribly useful, but if you were trying to find similar, but differently spelled, names across continents, it could be helpful:

MariaDB> select soundex("William");
+--------------------+
| soundex("William") |
+--------------------+
| W450               |
+--------------------+

MariaDB> select soundex("Walaam");
+-------------------+
| soundex("Walaam") |
+-------------------+
| W450              |
+-------------------+

MariaDB> select soundex("Willem");
+-------------------+
| soundex("Willem") |
+-------------------+
| W450              |
+-------------------+

MariaDB> select soundex("Williama");
+---------------------+
| soundex("Williama") |
+---------------------+
| W450                |
+---------------------+

And you could probably agree these variations match as well:

MariaDB> select soundex("Guillaume");
+----------------------+
| soundex("Guillaume") |
+----------------------+
| G450                 |
+----------------------+

MariaDB> select soundex("Uilleam");
+--------------------+
| soundex("Uilleam") |
+--------------------+
| U450               |
+--------------------+

MariaDB> select soundex("Melhem");
+-------------------+
| soundex("Melhem") |
+-------------------+
| M450              |
+-------------------+

MariaDB> select soundex("Uilliam");
+--------------------+
| soundex("Uilliam") |
+--------------------+
| U450               |
+--------------------+

Well, that’s pretty neat. Of course, I want to try the silliest word I can think of:

MariaDB> select soundex("supercalifragilisticexpealidocious");
+-----------------------------------------------+
| soundex("supercalifragilisticexpealidocious") |
+-----------------------------------------------+
| S162416242321432                              |
+-----------------------------------------------+

So the algorithm doesn’t stop at 3 digits; good to know.

What does the algorithm do? Luckily MySQL is open source, and so we can look in the source code:

This looks like the raw mapping. This map is then used by a function that loops through the characters in the word.

/* ABCDEFGHIJKLMNOPQRSTUVWXYZ */
/* :::::::::::::::::::::::::: */
const char *soundex_map= "01230120022455012623010202";

Note that even though it’s called “sounds like” it is really simply a character mapping based on an agreement by the developers’ ears as to which characters sound similar. That’s of course an oversimplification, and I see in the code comments the following:

/****************************************************************
* SOUNDEX ALGORITHM in C *
* *
* The basic Algorithm source is taken from EDN Nov. *
* 14, 1985 pg. 36. *

But despite hitting up several librarians, I can’t seem to get a copy of this. Someone out there has a copy sitting around, right?

As a side note, this is obviously specific to the English language. I found references to German and other languages having soundex mappings, and would be curious to see those and hear of any language-specific ways to do this.

Curiosity aside, here’s a real use.

I pulled down some government climate data. Let’s say the location field has some of my favorite misspellings of “Durham” in it:

MariaDB [weather]> select distinct(two), count(two) from weather.temps group by two;
+--------------------------------------------+------------+
| two                                        | count(two) |
+--------------------------------------------+------------+
| NULL                                       |          0 |
| DRM                                        |         51 |
| DURHAM                                     |    1101887 |
| DURM                                       |         71 |
| NCSU                                       |    1000000 |
| RALEIGH DURHAM INTERNATIONAL AIRPORT NC US |    1096195 |
| RDU AIRPORT                                |    1000000 |
+--------------------------------------------+------------+

A “LIKE” clause won’t work terribly well here.

I confirmed the misspellings would match as I expected:

MariaDB [weather]> select soundex("Durham"), soundex("Durm"), soundex("DRM");
+-------------------+-----------------+----------------+
| soundex("Durham") | soundex("Durm") | soundex("DRM") |
+-------------------+-----------------+----------------+
| D650              | D650            | D650           |
+-------------------+-----------------+----------------+

So instead of manually creating a query like:

MariaDB [weather]> select count(two) from weather.temps where two='DRM' or two='DURHAM' or two='DURM';
+------------+
| count(two) |
+------------+
|    1102009 |
+------------+

I can simply do this:

MariaDB [weather]> select count(two) from weather.temps where two sounds like 'Durham';
+------------+
| count(two) |
+------------+
|    1102009 |
+------------+

There are more than several ways to do string comparisons, but I enjoyed finding this one.

(Bonus points will be granted to the first person who comments that RDU is also Durham and submits a unique query to include it in the count.)