Top Linux processes and Oracle

Today, due to an application bug, I had to check the mapping between Linux PIDs and Oracle sessions several times. With a small BASH script this is now quite easy:

# ps doesn't show real-time CPU usage but an average since process start -
# skewed for long-running processes
#TOP5=`ps xo pid --sort pcpu | tail -5 | xargs echo | sed -e 's/ /,/g'`
TOP5=`top -b -d 1 -n 1 | head -12 | tail -5 | awk '{print $1;}' | xargs echo | sed -e 's/ /,/g'`

sqlplus / as sysdba
set linesize 200 pagesize 999
select spid, s.username, s.program, sql_id, event
  from v$session s, v$process p
 where s.paddr = p.addr and spid in ($TOP5);
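Wrapped up as one self-contained script, this might look as follows (a sketch, assuming the script runs on the database host as a user with SYSDBA access; the `if` guard is only there so the script degrades gracefully on a box without the Oracle client):

```shell
#!/bin/bash
# Top 5 PIDs by current CPU usage, joined with commas for the IN list.
# Lines 8-12 of one "top -b" batch iteration are the five busiest processes.
TOP5=$(top -b -d 1 -n 1 | head -12 | tail -5 | awk '{print $1}' | xargs echo | sed -e 's/ /,/g')

# Map those PIDs to Oracle sessions via v$process.spid.
if command -v sqlplus >/dev/null 2>&1; then
  sqlplus -s / as sysdba <<EOF
set linesize 200 pagesize 999
select p.spid, s.username, s.program, s.sql_id, s.event
  from v\$session s, v\$process p
 where s.paddr = p.addr
   and p.spid in ($TOP5);
EOF
fi
```

Note the space in the `sed` expression – `xargs echo` joins the PIDs with single spaces, and `sed 's/ /,/g'` turns them into the comma-separated list the IN clause needs.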


Oracle on AIX – where’s my cpu time ?

This story began a couple of weeks ago when I was asked to look into a performance problem with critical processes before go-live. There were differences in run times between different test systems, and between runs with 10046 trace enabled and disabled.

First of all, I started with the tracing issue. I checked server CPU and disk utilization using NMON (the tool name is important here). There wasn't any problem with CPU utilization – or at least that's what I thought at the time. Tracing was enabled at level 12 (binds and waits) and it generated a 2 GB trace file for 30 minutes of work. So I decided to limit the trace level to 8 and give it another try. In the meantime, I double-checked with the Oracle community on Twitter (thanks @CaryMillsap and @MartinBerx) that there are no known Oracle bugs related to tracing and performance. Changing the trace level from 12 to 8 solved the issue – I mean, there was no longer any difference between the process running with and without trace, and it also cut the file size from 2 GB to 130 MB (don't ask).

In the next step I focused on analyzing the differences in run times between the two systems. That is quite easy to do with trace files, and I was hoping for a quick solution.

My first surprise was that the profile of the SQL in both trace files was almost identical – there were no OS calls (waits) and the query spent its whole time on CPU. The execution plans were the same and the amount of data was the same as well. But the elapsed times of the two executions differed, and matched the wall clock times recorded in the process logs. To clarify my findings I will focus now on one UPDATE statement.
When I compared elapsed time with CPU time for the EXEC db call, I saw a huge gap between the two values – on the faster system the ratio of CPU time to elapsed time was around 60%, on the slower one around 37% – remember, there were no OS calls (wait events).
To the best of my knowledge, if there are no OS calls, CPU time should be equal to elapsed time, give or take a small difference due to measurement or rounding. If there is a gap, there is a good chance the Oracle process was pre-empted from the CPU and was waiting in the CPU run queue. But according to the NMON stats the system was only 30% utilized, so why would an Oracle process have to wait in the queue when plenty of CPUs were free? Or maybe NMON was reporting system utilization the wrong way and the server itself was overloaded? That question had to be answered.
I was lucky enough to get a chance to run the Oracle processes on an idle server with no one else connected. I got a small LPAR with one core and 4 logical CPUs, as AIX SMT was enabled. I created a small test:

declare
  a number;
begin
  for i in 1..100000 loop
    select count(*) into a from dba_users;
  end loop;
end;
/
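For completeness, wiring this up with the level-8 trace discussed earlier might look like this (a sketch; the `smt_test` tracefile identifier is just an arbitrary tag I made up so the resulting trace file is easy to find):

```shell
# Run the CPU burner with a level-8 10046 trace around it.
sqlplus -s / as sysdba <<'EOF'
alter session set tracefile_identifier = 'smt_test';
alter session set events '10046 trace name context forever, level 8';
declare
  a number;
begin
  for i in 1..100000 loop
    select count(*) into a from dba_users;
  end loop;
end;
/
alter session set events '10046 trace name context off';
exit
EOF
```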

and I was surprised by the results. I didn't trust the nmon output anymore, so I ran topas to observe CPU utilization – it showed around 63% utilization for my session! Why not 95–100%? There was nothing else running there.
I was even more surprised when I checked the trace file – here is the output formatted by tkprof for the query I was looking at:

SQL ID: 821zd9kr9k4yr Plan Hash: 3759210642


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute 100000      1.66       2.73          0          0          0          0
Fetch   100000     53.75      86.03          0    4200000          0     100000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   200001     55.41      88.77          0    4200000          0     100000

55.41 / 88.77 * 100% = 62.4%

Hey! Dude, where's my CPU time? Did my process spend 33 seconds in the CPU run queue? On an idle box?

I needed more data, and at that stage I had three hypotheses – an Oracle bug, a configuration problem, or the SMT option. I had a funny feeling that the SMT option was somehow related to these results, but I checked MOS looking for bugs and asked Twitter friends to run the same test on other systems (@MartinDBA confirmed my findings – thanks for your help).

The next step was to disable the SMT option, and I learned that AIX is nice enough to let you do this online. I reran my test – bingo – topas showed 95–98% CPU utilization for my Oracle process, and the trace file confirmed it:

SQL ID: 821zd9kr9k4yr Plan Hash: 3759210642


call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.00       0.00          0          0          0          0
Execute 100000      2.69       2.78          0          0          0          0
Fetch   100000     85.08      88.33          0    4200000          0     100000
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total   200001     87.78      91.12          0    4200000          0     100000

87.78 / 91.12 * 100% = 96.3%
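By the way, the online SMT toggle itself is done with `smtctl` as root (the syntax below is the one I recall from recent AIX levels – verify against your system's man page):

```shell
smtctl                  # show the current SMT mode and logical CPU count
smtctl -m off -w now    # disable SMT immediately, no reboot needed
# ... rerun the test, then restore SMT:
smtctl -m on -w now
```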

This is what I was looking for. Based on both observations I was sure that there are no bugs in the tracing functionality for this version of Oracle, and that an Oracle process can drive a CPU to almost 100%. One question remained – why is there a huge gap between CPU time and elapsed time with SMT enabled? I started digging through IBM documentation and bug reports, and after some time I found this page – Understanding Processor Utilization on Power Systems – AIX – and this picture (linked from the IBM wiki page):

So on the new POWER7, if your process runs for 1 second on a logical CPU (SMT thread), the reported CPU utilization (user plus system) is only 0.63 seconds. Compare this with SMT disabled (or with a previous generation of the CPU): a process running for 1 second on a logical CPU used to be reported as 1 second of CPU time in user and system mode.

How do tools calculate the CPU utilization of a single process over 1 second? You divide the CPU time used by the process by 1 second. So for pre-POWER7 chips, or for POWER7 with SMT disabled:

1 s on cpu / 1 s real time * 100 % = 100 %

For POWER7 with 4-way SMT:

0.63 s on cpu / 1 s real time * 100 % = 63 %

Have I seen this number before? Yes – it is the CPU time from the trace file divided by the elapsed time! My mystery was solved – 63% is now 100%, at least for that version of AIX and Oracle.
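This ratio is easy to pull straight out of a tkprof report – a small sketch that grabs the cpu and elapsed columns from the `total` line (assuming the standard tkprof column layout shown above; `tkprof_cpu_ratio` is just a helper name I made up):

```shell
# Print cpu/elapsed as a percentage for the "total" line of a tkprof report.
tkprof_cpu_ratio() {
  awk '/^total/ { printf "%.1f\n", $3 / $4 * 100 }' "$1"
}
```

Run against the two reports above it prints 62.4 and 96.3 – the SMT4 "ceiling" and the no-SMT one.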

Where is the problem?

If you are using Cary Millsap's Method R methodology to diagnose performance problems, you need to remember that unaccounted-for time in trace files will be affected by this feature/bug/you name it.

I will update this post when I have more details.