The Ad Hoc Reporting Myth

Empowering users! Giving users access to the information they need, when they need it! Allowing users to decide what they need! These are all great ideas and there are plenty of products out there that can be used to achieve this. The question must be, is it really necessary?

There will always be some users that need this functionality. They will need up-to-the-second ad hoc reporting and will invest their time into getting the most from the tools they are given. There is also a large portion of the user base that will quite happily use what they are given and will *never* invest in the tool set. They don’t see it as part of their job and basically just don’t care.

Back when I started in IT, most projects had some concept of a reporting function: a group of people who would discuss with the user base the type of reporting that was needed and identify what was *really needed* and what was just a never-ending wish list of things that would never really be used. They would build these reports and they would go through user acceptance and be signed off. It sounds like the bad old days, but what you were left with was a bunch of well defined reports, written by people who were “relatively speaking” skilled at reporting. What’s more, the reporting function could influence the application design. The quickest way to notice that “One True Lookup Table” is a bad design is to try and do some reporting queries. You will soon change your approach.

With the advent of ad hoc reporting, the skills base gradually eroded. We don’t need a reporting function any more! The users are in charge! All we need is this semantic layer and the users can do it all for themselves! Then the people building the semantic layers got lazy and just generated what amounts to a direct copy of the schema. Look at any database that sits behind one of these abominations and I can pretty much guarantee the most horrendous SQL in the system is generated by ad hoc reporting tools! You can blame the users for not investing more time in becoming an expert in the tool. You can blame the people who built the semantic layer for doing a poor job. You can blame the tools. What it really comes down to is the people who used ad hoc reporting as a “quick and easy” substitute for doing the right thing.

There will always be a concept of “standard reports” in any project. Stuff that is known from day one that the business relies on. These should be developed by experts who do it using efficient SQL. If they are not time-critical, they can be scheduled to spread out the load on the system, yet still be present when they are needed. This would relieve some of the sh*t-storm of badly formed queries hitting the database from ad hoc reporting.

I’m going to file this under #ThoseWereTheDays, #GrumpyOldMen and #ItProbablyWasntAsGoodAsIRemember…

Cheers

Tim…



Plan depth

A recent posting on OTN reminded me that I haven’t been poking Oracle 12c very hard to see which defects in reporting execution plans have been fixed. The last time I wrote something about the problem was about 20 months ago, referencing 11.2.0.3; but there are still oddities and irritations that make the nice easy “first child first” algorithm fail because the “depth” calculated by Oracle doesn’t match the “level” that you would get from a connect-by query on the underlying plan table. Here’s a simple fail in 12c:


create table t1
as
select
	rownum 			id,
	lpad(rownum,200)	padding
from	all_objects
where	rownum <= 2500
;

create table t2
as
select	* from t1
;

-- call dbms_stats to gather stats
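-- a sketch of the elided step (the original post doesn't show the exact call);
-- any basic stats gathering on the two tables would do, for example:

execute dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1')
execute dbms_stats.gather_table_stats(user, 't2', method_opt => 'for all columns size 1')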

explain plan for
select
	case mod(id,2) 
		when 1 then (select max(t1.id) from t1 where t1.id <= t2.id) 
		when 0 then (select max(t1.id) from t1 where t1.id >= t2.id) 
	end id  
from	t2
;

select * from table(dbms_xplan.display);

It ought to be fairly clear that the two inline scalar subqueries against t1 should be presented at the same level in the execution hierarchy; but here’s the execution plan you get from Oracle:

-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |  2500 | 10000 | 28039   (2)| 00:00:02 |
|   1 |  SORT AGGREGATE      |      |     1 |     4 |            |          |
|*  2 |   TABLE ACCESS FULL  | T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |     TABLE ACCESS FULL| T1   |   125 |   500 |    11   (0)| 00:00:01 |
|   5 |  TABLE ACCESS FULL   | T2   |  2500 | 10000 |    11   (0)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."ID"<=:B1)
   4 - filter("T1"."ID">=:B1)

As you can see, the immediate (default?) visual impression you get from the plan is that one of the subqueries is subordinate to the other. On the other hand, if you check the id and parent_id columns from the plan_table you’ll find that lines 1 and 3 are both direct descendants of line 0 – so they ought to have the same depth. The plan below is what you get if you run the 8i query from utlxpls.sql against the plan_table.


SQL> select id, parent_id from plan_table;

        ID  PARENT_ID
---------- ----------
         0
         1          0
         2          1
         3          0
         4          3
         5          0


--------------------------------------------------------------------------------
| Operation                 |  Name    |  Rows | Bytes|  Cost  | Pstart| Pstop |
--------------------------------------------------------------------------------
| SELECT STATEMENT          |          |     2K|    9K|  28039 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  SORT AGGREGATE           |          |     1 |    4 |        |       |       |
|   TABLE ACCESS FULL       |T1        |   125 |  500 |     11 |       |       |
|  TABLE ACCESS FULL        |T2        |     2K|    9K|     11 |       |       |
--------------------------------------------------------------------------------


So next time you see a plan and the indentation doesn’t quite seem to make sense, perhaps a quick query to select the id and parent_id will let you check whether you’ve found an example where the depth calculation produces a misleading result.
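A quick sketch of such a check (it assumes the plan you are interested in is still sitting in your plan_table): derive the depth yourself with a connect-by query and compare it with the depth column Oracle stored.

select
	id, parent_id, depth, level - 1 as derived_depth
from	plan_table
start with id = 0
connect by prior id = parent_id
order by
	id
;

In the example above the derived depth for operations 1 and 3 comes out the same, which is what the indentation ought to have shown.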


How to setup git as a daemon

This is a quick post on using git on a server. I use my Synology NAS as a fileserver, but also as a git repository server. The default git package for Synology enables git usage on the command line, which means via ssh, or via WebDAV. Both require a logon to do anything with the repository. That is not very handy if you want to clone and pull from the repository. Of course there are ways around that (basically setting up password-less authentication, probably via certificates), but I wanted simple, read-only access without authentication. If you install git on a Linux or UNIX server you get the binaries, but no daemon, which means you can only use ssh if you want to use that server for central git repositories.

Running git via inetd
What I did was use the inetd daemon to launch the git daemon. On any Linux or UNIX server with the inetd daemon, and on a Synology too, because it uses Linux under the covers, it’s easy to set up git as a server.

First, check /etc/services for the following lines:

git               9418/tcp                   # git pack transfer service
git               9418/udp                   # git pack transfer service

Next, add the following line to inetd.conf (which is /etc/inetd.conf on my Synology):

git stream tcp nowait gituser /usr/bin/git git daemon --inetd --verbose --export-all --base-path=/volume1/homes/gituser

What you should look for in your setup is:
- gituser: this is the user which is used to run the daemon. I created a user ‘gituser’ for this.
- /usr/bin/git: of course your git binary should be at that fully specified path, otherwise inetd can’t find it.
- git daemon:
  - --inetd: notify the git executable that it is running under inetd.
  - --export-all: all git repositories underneath the base directory will be available.
  - --base-path: this sets the git root directory. In my case, I wanted to have all the repositories in the home directory of the gituser, which is /volume1/homes/gituser.

And make the inetd daemon reload its configuration with kill -HUP:

# killall -HUP inetd
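With that in place, anonymous read-only access over the git protocol should just work. As a rough sketch (the host and repository names below are made up for illustration; yours will differ):

git clone git://mynas/myrepo.git     # read-only clone, no authentication needed
cd myrepo
git pull                             # later updates work the same way

Pushing still has to go via ssh (or WebDAV), since the git daemon as configured above only serves fetches and clones.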

Please mind this is a simple and limited setup; if you want to set it up with more granular security, you should look into gitolite, for example.


Tagged: daemon, git, howto, linux, synology

ReadyNAS 104

My old NAS went pop a little while ago and I’ve spent the last few weeks backing up to alternate servers while trying to decide what to get to replace it.

Reading the reviews on Amazon is a bit of a nightmare because there are always scare stories, regardless of how much you pay. In the end I decided to go for the “cheap and cheerful” option and bought a ReadyNAS 104. I got the diskless one and bought a couple of 3TB WD Red disks, which were pretty cheap. It supports the 4TB disks, but they are half the price again and I’m mean. Having just two disks means I’ve got a single 3TB RAID 1 volume. I can add a third and fourth disk, which will give me approximately 6 or 9 TB. It switches to RAID 5 by default with more than 2 disks.

The setup was all web based, so I didn’t have any OS compatibility issues. Installation was really simple. Slipped in the disks. Plugged the ethernet cable into my router and turned on the power. I went to the website (readycloud.netgear.com), discovered my device and ran through the setup wizard. Job done. I left it building my RAID 1 volume overnight, but I was able to store files almost immediately, while the volume was building.

The web interface for the device is really simple to use. I can define SMB/AFP/NFS shares in a couple of clicks. Security is really obvious. I can define iSCSI LUNs for use with my Linux machines and it has TimeMachine integration if you want that.

The cloud-related functionality is optional, so if you are worried about opening up a potential security hole, you can easily avoid it. I chose not to configure it during the setup wizard.

I was originally going to spend a lot more on a NAS, but I thought I would chance this unit. So far I’m glad I did. It’s small, solid and silent. Fingers crossed it won’t go pear-shaped. :)

I’ve got all the vital stuff on it now. I’ll start backing up some of my more useful VMs to it and see if I need to buy some more disks. I’ve got about 10TB of storage on servers, but most of it is taken up with old VMs I don’t care about, so I will probably be a bit selective.

Cheers

Tim…

PS. I think NetGear might be doing a revamp of their NAS lineup soon, so you might get one of these extra cheap in the near future. They already claim to be about 50% of RRP, but most RRPs are lies. :)

 



Yosemite : It’s like OS X, but more boring to look at!

I went on my MacBook last night and saw I had updates available on the App Store. I figured this was one of those Twitter updates that seem to happen every time you blink. Much to my surprise it was a new version of OS X. You can tell how little of an Apple fanboy I am. I didn’t even know this was due, let alone here already. :)

I figured, what the heck, and let it start. About 20 minutes later it was done and now I have Yosemite on my MacBook Pro (mid 2009). I wasn’t really timing, so that’s a guess.

First impressions.

  • It’s like OS X, but more boring to look at! Everything is flat and looks a little bland. I’m told this is the look and feel from the iPhone, but I don’t have one of those so I don’t know. I’m sure in a week I won’t remember the old look. The only reminder is the icons for all the non-Apple software I have installed, which still look like they are trying to fit in with the old look. :)
  • I asked one of my colleagues at work and he said it is meant to be faster. I don’t see that myself, but this is a 5 year old bit of kit.
  • Launchpad is straight out of GNOME3. I never use it anyway. Perhaps it always looked like this???
  • Mission Control and Dashboard are also things I never use, so I can’t tell if they have changed for the better or not. :)
  • The light colour background of the Application menu looks odd. Not bad, but different.

What’s broken? So far nothing. I can run VirtualBox, iTerm, Chrome and PowerPoint, so that is pretty much all I do with the laptop.

So in conclusion, Yosemite has completely changed my whole world and Apple are a bunch of geniuses right? Well, actually it’s a pretty mundane change as far as my usage is concerned. I’m sure it’s all terribly cloudy and someone will throw a “rewritten from the ground up” in there somewhere, kinda like Microsoft do when they release the same stuff year after year with a different skin…

By the way, it didn’t cost me anything to upgrade from pretty to bland!

Cheers

Tim…

 

 



Debugging High CPU Usage Using Perf Tool and vmcore Analysis

There are several tools and technologies available to dig deeper into high CPU utilization in a system: perf, sysrq, oprofile, vmcore, and more. In this post, I will narrate the course of debugging a CPU utilization issue using technologies like perf and vmcore.

The following sar output is from a system showing high %system usage.

[root@prod-smsgw1 ~]# sar 1 14
Linux 2.6.32-431.20.5.el6.x86_64 (xxxxx) 08/08/2014 _x86_64_ (8 CPU)

05:04:57 PM CPU %user %nice %system %iowait %steal %idle
05:04:58 PM all 2.90 0.00 15.01 0.38 0.00 81.72
05:04:59 PM all 2.02 0.00 10.83 0.13 0.00 87.03
05:05:00 PM all 3.27 0.00 13.98 0.76 0.00 81.99
05:05:01 PM all 9.32 0.00 16.62 0.25 0.00 73.80

From ‘man sar’.

%system
Percentage of CPU utilization that occurred while executing at the system level (kernel). Note
that this field includes time spent servicing hardware and software interrupts.

This means that the system is spending considerable time executing kernel code. The system runs a java application which is showing high CPU usage.

perf, the performance analysis tool for Linux, is a good place to start in this kind of scenario.

The ‘perf record’ command captures samples for all CPUs into a perf.data file; the -g option records call graphs and -p profiles a single process.

The ‘perf report’ command then displays the report.
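As a rough sketch of what those commands might look like (the exact options and sampling duration used in this investigation aren’t shown in the post; the PID below is the java thread that appears in the report):

perf record -a -g sleep 30       # sample all CPUs with call graphs for 30 seconds
perf record -g -p 3284 sleep 30  # or profile just one process by PID
perf report                      # browse the captured samples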

Samples: 18K of event ‘cpu-clock’, Event count (approx.): 18445, Thread: java(3284), DSO: [kernel.kallsyms]
58.66% java [k] _spin_lock
31.82% java [k] find_inode
2.66% java [k] _spin_unlock_irqrestore
2.44% java [k] mutex_spin_on_owner

Here we can see that considerable time is spent in the spinlock and find_inode code on behalf of the java application.

While the investigation was going on, the system crashed and dumped a vmcore. A vmcore is a memory dump of the system captured by tools like kdump.

I downloaded the debuginfo file and extracted the vmlinux to analyse the vmcore.

# wget http://debuginfo.centos.org/6/x86_64/kernel-debuginfo-2.6.32-431.20.5.el6.x86_64.rpm
# rpm2cpio kernel-debuginfo-2.6.32-431.20.5.el6.x86_64.rpm |cpio -idv ./usr/lib/debug/lib/modules/2.6.32-431.20.5.el6.x86_64/vmlinux

Then I ran the following command.

# crash ./usr/lib/debug/lib/modules/2.6.32-431.20.5.el6.x86_64/vmlinux /var/crash/127.0.0.1-2014-08-07-17:56:19/vmcore

KERNEL: ./usr/lib/debug/lib/modules/2.6.32-431.20.5.el6.x86_64/vmlinux
DUMPFILE: /var/crash/127.0.0.1-2014-08-07-17:56:19/vmcore [PARTIAL DUMP]
CPUS: 8
DATE: Thu Aug 7 17:56:17 2014
UPTIME: 1 days, 13:08:01
LOAD AVERAGE: 91.11, 91.54, 98.02
TASKS: 1417
NODENAME: xxxxx
RELEASE: 2.6.32-431.20.5.el6.x86_64
VERSION: #1 SMP Fri Jul 25 08:34:44 UTC 2014
MACHINE: x86_64 (2000 Mhz)
MEMORY: 12 GB
PANIC: “Oops: 0010 [#1] SMP ” (check log for details)
PID: 11233
COMMAND: “java”
TASK: ffff88019706b540 [THREAD_INFO: ffff880037a90000]
CPU: 6
STATE: TASK_RUNNING (PANIC)

From the vmcore I can see that the dtracedrv module was loaded and unloaded (possibly for running dtrace); this resulted in several warnings (the first warning, from ftrace, is expected) and then the kernel panicked as memory got corrupted. The instruction pointer is corrupted, which points to memory corruption. It looks like the panic was triggered by the dtrace module.

/tmp/dtrace/linux-master/build-2.6.32-431.20.5.el6.x86_64/driver/dtrace.c:dtrace_ioctl:16858: assertion failure buf->dtb_xamot != cached
Pid: 8442, comm: dtrace Tainted: P W --------------- 2.6.32-431.20.5.el6.x86_64 #1
Pid: 3481, comm: java Tainted: P W --------------- 2.6.32-431.20.5.el6.x86_64 #1
Call Trace:
[] ? dump_cpu_stack+0x3d/0x50 [dtracedrv]
[] ? generic_smp_call_function_interrupt+0x90/0x1b0
[] ? smp_call_function_interrupt+0x27/0x40
[] ? call_function_interrupt+0x13/0x20
[] ? _spin_lock+0x1e/0x30
[] ? __mark_inode_dirty+0x6c/0x160
[] ? __set_page_dirty_nobuffers+0xdd/0x160
[] ? nfs_mark_request_dirty+0x1a/0x40 [nfs]
[] ? nfs_updatepage+0x3d2/0x560 [nfs]
[] ? nfs_write_end+0x152/0x2b0 [nfs]
[] ? iov_iter_copy_from_user_atomic+0x92/0x130
[] ? generic_file_buffered_write+0x18a/0x2e0
[] ? nfs_refresh_inode_locked+0x3e1/0xbd0 [nfs]
[] ? __generic_file_aio_write+0x260/0x490
[] ? __put_nfs_open_context+0x58/0x110 [nfs]
[] ? dtrace_vcanload+0x20/0x1a0 [dtracedrv]
[..]
BUG: unable to handle kernel paging request at ffffc90014fb415e
IP: [] 0xffffc90014fb415e
PGD 33c2b5067 PUD 33c2b6067 PMD 3e688067 PTE 0
Oops: 0010 [#1] SMP
last sysfs file: /sys/devices/system/node/node0/meminfo
CPU 6
Modules linked in: cpufreq_stats freq_table nfs fscache nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs ipv6 ppdev parport_pc parport microcode vmware_balloon sg vmxnet3 i2c_piix4 i2c_core shpchp ext4 jbd2 mbcache sd_mod crc_t10dif vmw_pvscsi pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dtracedrv]
Pid: 11233, comm: java Tainted: P W --------------- 2.6.32-431.20.5.el6.x86_64 #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[] [] 0xffffc90014fb415e
RSP: 0018:ffff880037a91f70 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000219 RCX: ffff880037a91d40
RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000000000
RBP: 00007fba9a67f4c0 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000000 R11: 00000000000003ff R12: 000000000001d4c0
R13: 0000000000000219 R14: 00007fb96feb06e0 R15: 00007fb96feb06d8
FS: 00007fb96fec1700(0000) GS:ffff880028380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffc90014fb415e CR3: 000000031e49e000 CR4: 00000000000407e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process java (pid: 11233, threadinfo ffff880037a90000, task ffff88019706b540)
Stack:
0000000000000000 0000000000002be1 ffffffff8100b072 0000000000000293
000000000000ebe6 0000000000002be1 0000000000000000 0000000000000007
00000030692df333 000000000001d4c0 0000000000000001 00007fb96feb06d8
Call Trace:
[] ? system_call_fastpath+0x16/0x1b
Code: Bad RIP value.
RIP [] 0xffffc90014fb415e
RSP
CR2: ffffc90014fb415e
crash>

This allowed me to have a look at the CPU usage issue happening in the system. Another way to capture a vmcore is to manually panic the system using sysrq + c.
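For reference, a minimal sketch of triggering such a crash dump by hand (this assumes kdump is configured, and it really does panic the machine, so only do it deliberately):

echo 1 > /proc/sys/kernel/sysrq   # enable the magic SysRq interface
echo c > /proc/sysrq-trigger      # force a crash; kdump then writes the vmcore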

None of the runnable or uninterruptible-sleep processes had been running for a long time.

Looking at the oldest D-state process:

crash> bt 4776
PID: 4776 TASK: ffff88027f3daaa0 CPU: 6 COMMAND: “java”
#0 [ffff88027f3dfd88] schedule at ffffffff815287f0
#1 [ffff88027f3dfe50] __mutex_lock_killable_slowpath at ffffffff8152a0ee
#2 [ffff88027f3dfec0] mutex_lock_killable at ffffffff8152a1f8
#3 [ffff88027f3dfee0] vfs_readdir at ffffffff8119f834
#4 [ffff88027f3dff30] sys_getdents at ffffffff8119f9f9
#5 [ffff88027f3dff80] system_call_fastpath at ffffffff8100b072
RIP: 00000030692a90e5 RSP: 00007fa0586c51e0 RFLAGS: 00000206
RAX: 000000000000004e RBX: ffffffff8100b072 RCX: 00007fa0cd2cf000
RDX: 0000000000008000 RSI: 00007fa0bc0de9a8 RDI: 00000000000001f6
RBP: 00007fa0bc004cd0 R8: 00007fa0bc0de9a8 R9: 00007fa0cd2fce58
R10: 00007fa0cd2fcaa8 R11: 0000000000000246 R12: 00007fa0bc004cd0
R13: 00007fa0586c5460 R14: 00007fa0cd2cf1c8 R15: 00007fa0bc0de980
ORIG_RAX: 000000000000004e CS: 0033 SS: 002b

Looking at its stack..

crash> bt -f 4776
PID: 4776 TASK: ffff88027f3daaa0 CPU: 6 COMMAND: “java”
[..]
#2 [ffff88027f3dfec0] mutex_lock_killable at ffffffff8152a1f8
ffff88027f3dfec8: ffff88027f3dfed8 ffff8801401e1600
ffff88027f3dfed8: ffff88027f3dff28 ffffffff8119f834
#3 [ffff88027f3dfee0] vfs_readdir at ffffffff8119f834
ffff88027f3dfee8: ffff88027f3dff08 ffffffff81196826
ffff88027f3dfef8: 00000000000001f6 00007fa0bc0de9a8
ffff88027f3dff08: ffff8801401e1600 0000000000008000
ffff88027f3dff18: 00007fa0bc004cd0 ffffffffffffffa8
ffff88027f3dff28: ffff88027f3dff78 ffffffff8119f9f9
#4 [ffff88027f3dff30] sys_getdents at ffffffff8119f9f9
ffff88027f3dff38: 00007fa0bc0de9a8 0000000000000000
ffff88027f3dff48: 0000000000008000 0000000000000000
ffff88027f3dff58: 00007fa0bc0de980 00007fa0cd2cf1c8
ffff88027f3dff68: 00007fa0586c5460 00007fa0bc004cd0
ffff88027f3dff78: 00007fa0bc004cd0 ffffffff8100b072

crash> vfs_readdir
vfs_readdir = $4 =
{int (struct file *, filldir_t, void *)} 0xffffffff8119f7b0

crash> struct file 0xffff8801401e1600
struct file {
f_u = {
fu_list = {
next = 0xffff88033213fce8,
prev = 0xffff88031823d740
},
fu_rcuhead = {
next = 0xffff88033213fce8,
func = 0xffff88031823d740
}
},
f_path = {
mnt = 0xffff880332368080,
dentry = 0xffff8802e2aaae00
},

[..]

crash> mount|grep ffff880332368080
ffff880332368080 ffff88033213fc00 nfs nanas1a.m-qube.com:/vol/test /scratch/test/test.deploy/test/test-internal

The process was waiting while reading from the above NFS mount.

The following process seems to be the culprit.

crash> bt 9104
PID: 9104 TASK: ffff8803323c8ae0 CPU: 0 COMMAND: “java”
#0 [ffff880028207e90] crash_nmi_callback at ffffffff8102fee6
#1 [ffff880028207ea0] notifier_call_chain at ffffffff8152e435
#2 [ffff880028207ee0] atomic_notifier_call_chain at ffffffff8152e49a
#3 [ffff880028207ef0] notify_die at ffffffff810a11ce
#4 [ffff880028207f20] do_nmi at ffffffff8152c0fb
#5 [ffff880028207f50] nmi at ffffffff8152b9c0
[exception RIP: _spin_lock+30]
RIP: ffffffff8152b22e RSP: ffff88001d209b88 RFLAGS: 00000206
RAX: 0000000000000004 RBX: ffff88005823dd90 RCX: ffff88005823dd78
RDX: 0000000000000000 RSI: ffffffff81fd0820 RDI: ffffffff81fd0820
RBP: ffff88001d209b88 R8: ffff88017b9cfa90 R9: dead000000200200
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88005823dd48
R13: ffff88001d209c68 R14: ffff8803374ba4f8 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
— —
#6 [ffff88001d209b88] _spin_lock at ffffffff8152b22e
#7 [ffff88001d209b90] _atomic_dec_and_lock at ffffffff81283095
#8 [ffff88001d209bc0] iput at ffffffff811a5aa0
#9 [ffff88001d209be0] dentry_iput at ffffffff811a26c0
#10 [ffff88001d209c00] d_kill at ffffffff811a2821
#11 [ffff88001d209c20] __shrink_dcache_sb at ffffffff811a2bb6
#12 [ffff88001d209cc0] shrink_dcache_parent at ffffffff811a2f64
#13 [ffff88001d209d30] proc_flush_task at ffffffff811f9195
#14 [ffff88001d209dd0] release_task at ffffffff81074ec8
#15 [ffff88001d209e10] wait_consider_task at ffffffff81075cc6
#16 [ffff88001d209e80] do_wait at ffffffff810760f6
#17 [ffff88001d209ee0] sys_wait4 at ffffffff810762e3
#18 [ffff88001d209f80] system_call_fastpath at ffffffff8100b072

From the upstream kernel source:

/**
 * iput - put an inode
 * @inode: inode to put
 *
 * Puts an inode, dropping its usage count. If the inode use count hits
 * zero, the inode is then freed and may also be destroyed.
 *
 * Consequently, iput() can sleep.
 */
void iput(struct inode *inode)
{
	if (inode) {
		BUG_ON(inode->i_state & I_CLEAR);

		if (atomic_dec_and_lock(&inode->i_count, &inode->i_lock))
			iput_final(inode);
	}
}
EXPORT_SYMBOL(iput);

/**
 * atomic_dec_and_lock - lock on reaching reference count zero
 * @atomic: the atomic counter
 * @lock: the spinlock in question
 *
 * Decrements @atomic by 1. If the result is 0, returns true and locks
 * @lock. Returns false for all other cases.
 */
extern int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);
#define atomic_dec_and_lock(atomic, lock) \
		__cond_lock(lock, _atomic_dec_and_lock(atomic, lock))

#endif /* __LINUX_SPINLOCK_H */

It looks like the process was trying to drop the dentry cache and was holding the spinlock while dropping an inode associated with it. This resulted in other processes waiting on the spinlock, and hence the high %system utilization.

When the system again showed high %sys usage, I checked and found a large slab cache.

[root@xxxxx ~]# cat /proc/meminfo
[..]
Slab: 4505788 kB
SReclaimable: 4313672 kB
SUnreclaim: 192116 kB

Checking the slab usage on the running system with slabtop, I saw that nfs_inode_cache was the top consumer.

OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
[..]
2793624 2519618 90% 0.65K 465604 6 1862416K nfs_inode_cache

I ran ‘sync’ and then ‘echo 2 > /proc/sys/vm/drop_caches’ to drop the dcache, which fixed the high %sys usage in the system.

[root@xxxxx ~]# sar 1 10
Linux 3.10.50-1.el6.elrepo.x86_64 (prod-smsgw4.sav.mqube.us) 08/12/2014 _x86_64_ (8 CPU)

11:04:45 AM CPU %user %nice %system %iowait %steal %idle
11:04:46 AM all 1.51 0.00 13.22 0.50 0.00 84.76
11:04:47 AM all 1.25 0.00 12.55 0.13 0.00 86.07
11:04:48 AM all 1.26 0.00 8.83 0.25 0.00 89.66
11:04:49 AM all 1.63 0.00 11.93 0.63 0.00 85.80
^C
[root@xxxxx ~]# sync
[root@xxxxx ~]# sar 1 10
Linux 3.10.50-1.el6.elrepo.x86_64 (prod-smsgw4.sav.mqube.us) 08/12/2014 _x86_64_ (8 CPU)

11:05:23 AM CPU %user %nice %system %iowait %steal %idle
11:05:24 AM all 1.50 0.00 13.03 0.75 0.00 84.71
11:05:25 AM all 1.76 0.00 9.69 0.25 0.00 88.30
11:05:26 AM all 1.51 0.00 9.80 0.25 0.00 88.44
11:05:27 AM all 1.13 0.00 10.03 0.25 0.00 88.60
^C
[root@xxxxx ~]# echo 2 > /proc/sys/vm/drop_caches
[root@xxxxx ~]# cat /proc/meminfo
[..]
Slab: 67660 kB

[root@prod-smsgw4 ~]# sar 1 10
Linux 3.10.50-1.el6.elrepo.x86_64 (prod-smsgw4.sav.mqube.us) 08/12/2014 _x86_64_ (8 CPU)

11:05:58 AM CPU %user %nice %system %iowait %steal %idle
11:05:59 AM all 1.64 0.00 1.38 0.13 0.00 96.86
11:06:00 AM all 2.64 0.00 1.38 0.38 0.00 95.60
11:06:01 AM all 2.02 0.00 1.89 0.25 0.00 95.84
11:06:02 AM all 2.03 0.00 1.39 4.68 0.00 91.90
11:06:03 AM all 8.21 0.00 2.27 2.65 0.00 86.87
11:06:04 AM all 1.63 0.00 1.38 0.13 0.00 96.86
11:06:05 AM all 2.64 0.00 1.51 0.25 0.00 95.60

From kernel documentation,

drop_caches

Writing to this will cause the kernel to drop clean caches, dentries and
inodes from memory, causing that memory to become free.

To free pagecache:
echo 1 > /proc/sys/vm/drop_caches
To free dentries and inodes:
echo 2 > /proc/sys/vm/drop_caches
To free pagecache, dentries and inodes:
echo 3 > /proc/sys/vm/drop_caches

The java application was traversing an NFS mount and accessing a large number of files, resulting in a large number of nfs_inode_cache entries and, in turn, a large dcache.

Tuning vm.vfs_cache_pressure would be a persistent solution for this.

From kernel documentation,

vfs_cache_pressure
------------------

Controls the tendency of the kernel to reclaim the memory which is used for
caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a “fair” rate with respect to pagecache and
swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
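As an illustrative sketch (the value 200 below is just an example, not what was used on this system), the setting could be raised on the fly and made persistent like this:

sysctl -w vm.vfs_cache_pressure=200                        # prefer reclaiming dentries/inodes
echo 'vm.vfs_cache_pressure = 200' >> /etc/sysctl.conf     # keep the setting across reboots
sysctl -p                                                  # reload /etc/sysctl.conf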

NZOUG14 Beckons

New Zealand is famous for Kiwis, pristine landscape, and the New Zealand Oracle User Group (NZOUG) conference.  The location of choice is New Zealand when it comes to making Lord of the Rings and making Oracle Lord of the Databases.

NZOUG 2014 will be held 19–21 November in the Owen G. Glenn Building at the University of Auckland. The main conference will be held on the 20th and 21st, preceded by a day of workshops on the 19th. It’s one of the premier Oracle conferences in the Southern Hemisphere.

Where there is Oracle, there is Pythian. Pythian will be present in full force in NZOUG 2014.

The following are the Pythian sessions at NZOUG14:

12c Multi-Tenancy and Exadata IORM: An Ideal Cloud Based Resource Management
Fahd Mirza Chughtai

Everyone Talks About DR – But Why So Few Implement It
Francisco Munoz Alvarez

DBA 101: Calling All New Database Administrators
Gustavo Rene Antunez

My First 100 Days with an Exadata
Gustavo Rene Antunez

Do You Really Know the Index Structures?
Deiby Gómez

Oracle Exadata: Storage Indexes vs Conventional Indexes
Deiby Gómez

Oracle 12c Test Drive
Francisco Munoz Alvarez

Why Use OVM for Oracle Database
Francisco Munoz Alvarez

Please check the full agenda of NZOUG14 here.

Log Buffer #393, A Carnival of the Vanities for DBAs

Bloggers connect both the databases and their readers through their blogs; they act like a bridge between the two. Log Buffer extends this nexus through the Log Buffer Edition.

Oracle:

MS Sharepoint and Oracle APEX integration.

Just a couple of screenshots of sqlplus+rlwrap+cygwin+console.

Say “Big Data” One More Time (I dare you!)

Update OEM Harvester after 12.1.0.4 Upgrade

Insight in the Roadmap for Oracle Cloud Platform Services.

SQL Server:

Troubleshoot SQL P2P replication doesn’t replicate DDL schema change.

Set-based Constraint Violation Reporting in SQL Server.

Where do you start fixing a SQL Server crash when there isn’t a single clue?

A permission gives a principal access to an object to perform certain actions on or with the object.

When you can’t get to your data because another application has it locked, a thorough knowledge of SQL Server concurrency will give you the confidence to decide what to do.

MySQL:

MySQL 5.7.5- More variables in replication performance_schema tables.

Multi-source replication for MySQL has been released as a part of 5.7.5-labs-preview downloadable from labs.mysql.com.

How to install multiple MySQL instances on a single host using MyEnv?

Percona Toolkit for MySQL with MySQL-SSL Connections.

InnoDB: Supporting Page Sizes of 32k and 64k.

sreadtim

Here’s a question that appeared in my email a few days ago:

 

Based on the formula: “sreadtim = ioseektim + db_block_size/iotfrspeed”, sreadtim should always be bigger than ioseektim.

But I just did a query on my system and found otherwise, which got me confused:

SQL> SELECT * FROM SYS.AUX_STATS$;

SNAME                          PNAME                               PVAL1 PVAL2
------------------------------ ------------------------------ ---------- --------------------
SYSSTATS_INFO                  STATUS                                    COMPLETED
SYSSTATS_INFO                  DSTART                                    10-08-2014 10:45
SYSSTATS_INFO                  DSTOP                                     10-10-2014 10:42
SYSSTATS_INFO                  FLAGS                                   1
SYSSTATS_MAIN                  CPUSPEEDNW                     680.062427
SYSSTATS_MAIN                  IOSEEKTIM                              10
SYSSTATS_MAIN                  IOTFRSPEED                           4096
SYSSTATS_MAIN                  SREADTIM                            4.716
SYSSTATS_MAIN                  MREADTIM                            2.055
SYSSTATS_MAIN                  CPUSPEED                             1077
SYSSTATS_MAIN                  MBRC                                    4
SYSSTATS_MAIN                  MAXTHR                          956634112
SYSSTATS_MAIN                  SLAVETHR                           252928

How do we explain this?

 

This question highlights two points – one important, the other only slightly less so.

The really important point is one of interpretation. Broadly speaking we could reasonably say that the (typical) time required to perform a single block read is made up of the (typical) seek time plus the transfer time which, using the names of the statistics above, would indeed give us the relationship: sreadtim = ioseektim + db_block_size/iotfrspeed; but we have to remember that we are thinking of a simplified model of the world. The values that we capture for sreadtim include the time it takes for a request to get from Oracle to the O/S, through the various network software and hardware layers and back again; the formula ignores those components completely and, moreover, doesn’t allow for the fact that some “reads” could actually come from one of several caches without any physical disc access taking place. Similarly we should be aware that the time for an actual I/O seek would vary dramatically with the current position of the read head, the radial position of the target block, the speed and current direction of movement of the read head, and the rotational distance to the target block. The formula is not attempting to express a physical law, it is simply expressing an approximation that we might use in a first line estimate of performance.

In fact we can see in the figures above that multi-block reads (typically of 4 blocks)  were faster than single block reads on this hardware for the duration of the sampling period – and that clearly doesn’t fit with the simple view embedded in our formula of how disc drives work.  (It’s a fairly typical effect of SANs, of course, that large read requests make the SAN software start doing predictive read-ahead, so that the next read request from Oracle may find that the SAN has already loaded the required data into its cache.)

There is, however, the second point that these figures highlight – but you have to be in the know to spot the detail: whatever the complexities introduced by SAN caching, we’re not looking at the right figures. The ioseektim and iotfrspeed shown here are the default values used by Oracle. It looks as if the user has called dbms_stats.gather_system_stats() with a 48 hour workload (8th Oct to 10th Oct), but hasn’t yet executed the procedure using the ‘noworkload’ option. Perhaps the ioseektim and iotfrspeed figures from a noworkload call would look a little more reasonable when compared with the 4.716 milliseconds of the workload single block read. There may still be a large gap between the model and the reality, but until the two sets of figures we’re using come from the same place we shouldn’t even think about comparing them.
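To put rough numbers on the mismatch (assuming the common 8KB block size, which the email doesn’t actually state): the default values give sreadtim = 10 + 8192/4096 = 12 ms, comfortably above the measured 4.716 ms – exactly the sort of contradiction you get when the two sides of the comparison come from different sources. A minimal sketch of gathering the noworkload figures, so that ioseektim and iotfrspeed are measured rather than left at their defaults, might look like this:

execute dbms_stats.gather_system_stats('noworkload')

select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';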


Setting Up Chargeback in EM 12.1.0.4

Some time back, I posted a blog entry about Metering and Chargeback in Enterprise Manager, where I covered the basic capabilities of the Chargeback Application. It’s taken me some time to get back to the promised followup blog post on setting up Chargeback, what with Oracle Open World getting in the way of things, but that’s what I’m finally doing now.

Chargeback can be set up in EM12c by any user with the EM_CBA_ADMIN role, or of course as SYSMAN. Once you are logged in as a user with the relevant privileges, follow the path “Enterprise → Chargeback”:


The default currency symbol displayed in the Chargeback application is the dollar sign ($). Note this is just a symbol and does not have any impact on chargeback calculations, but you can change it if you want by clicking on the “Settings” tab:


You can enter a new currency symbol here, but for this environment we will leave it at the default and click “Change Plans”:


The first thing we want to do is set rates for the universal charge plan. This covers the CPU Usage, Memory Allocation and Storage Allocation metrics. Click “Set Rates”:


Set the rates appropriately and click “Save”:


While the universal charge plan is useful, there are situations where you want to apply charges on other entities. That’s where an extended charge plan can be used. To create an extended charge plan, click “Create” then “Plan”:


Provide a meaningful name for the charge plan, then click “Add” to select an entity type for which you can set rates:


You can add multiple entity types to the charge plan at once, so choose “Host” and “Oracle Pluggable Database” then click “OK”:


You can now setup specific configurations for each entity type, by selecting the entity then clicking “Setup Configurations …”:


Click the “Add” button:


In this example, we want to charge different rates for different machine architectures. This is the sort of thing you would do if you wanted to charge more for maintaining machines that are either outdated or where you have less skills and may need to bring in consultants. To do this, you can set the “Condition Item” to “Machine Architecture”, then click the “Search” button to search for different values you can choose:


In this scenario, we want to set up three different configurations:
• Intel x86_64 as our default
• Intel i686 as more expensive because it’s outdated
• PA-RISC 64-bit as more expensive since we have less skills in that area.
You can select each in turn from the list and click “OK”, like this:


Click “OK”:


On the “Setup Configurations” pop-up, click “OK” again:


You can now specify multipliers for each architecture. In this example, we’re leaving our preferred architecture (Intel x86-64) at 1, setting Intel i686 to 2x, PA-RISC 64 bit to 1.5x and other architectures to 3x. Next we can click “Oracle Pluggable Database” to setup configurations for that:


Click “Setup Configurations”:


Click “Add…”:


On the “Add Configuration” pop-up, select “Version” for the “Condition Item”:


In this scenario, we want to charge more for clients that have not switched to the latest version of the database as a way of encouraging migration so we set the “Condition Operator” and “Condition Value” to validate that. We could also have chosen to charge more when using paid options like Partitioning and so on. Click “OK”:


On the “Setup Configurations” pop-up, click “OK”:


Back on the “Create Plan: HR Chargeback Plan” page, we can increase the multiplier for configurations that aren’t on the latest version and click “Save”:


Now that we have the charge plans configured the way we want, we can also define cost centers that the charges will be assigned to. Cost centers are normally configured in a business hierarchy, so let’s walk through the process of building that. We start by clicking the “Cost Centers” tab:


Cost centers can either be imported from an existing LDAP configuration (done via the “Action” menu) or defined in the Chargeback application itself. In this example, we’ll walk through adding them to the Chargeback application. To do this, click “Add”:


On the “New Cost Center” pop-up, enter a cost center name and display name, then click “OK”:


You can do this as many times as you like. To add a cost center in a business hierarchy, first create a cost center at the top level (as shown above) then create a cost center using the “Member of” radio button and select the relevant top level cost center, as shown here:


Once you have defined all the cost centers you need, the next step is to add the relevant entities. An entity is either a target in Enterprise Manager, or a custom resource that has been added to Enterprise Manager. Custom resources can include target types for which there is no current out-of-the-box Chargeback support (covered in more detail in the documentation). To add entities, click the “Entities” tab:


Click “Add Entities”:


This will start the “Add Entities” wizard. Click “Add” to select targets for chargeback:


Firstly, we want to add hosts. To search for hosts to add, deselect “All” and select “Host” from the “Target Type” dropdown:


From the list of hosts, select one or more hosts to add as an entity and click “Select”:


On the Make Assignments wizard step, select the entity name and click “Assign Plan”:


Select “HR Chargeback Plan” and click “OK” to add the chargeback plan to this host:


To add a cost center to this host, click “Assign Cost Center”:


Select the relevant cost center and click “OK”:


Back on the “Make Assignments” step, click “Next”:


Review the settings and if they are correct, click “Submit”:


You can walk through the same process to add one of the container databases and a PaaS Infrastructure Zone as well. Once that has been done, you can schedule an on-demand data collection to start seeing the information for these entities in the Chargeback application. To do that, select “On-demand data collection” from the “Action” dropdown menu:


Click “Yes” on the “Confirmation” pop-up:


In the “Information” message, click “here” to view the job status:


Once the job status changes to “1 Succeeded”, click on “Enterprise” then “Chargeback”:


You should see information being collected both in the “Summary” and “Usage Trends” regions:


It might seem like a very long-winded setup, but in reality Chargeback only takes you a short time to configure. Of course, once it’s configured you then have all the joys of getting money out of the end clients for their resource usage – good luck with that one, I can’t help you there! :)