On Tuesday, October 16, 2012 at 5:21 AM, wrote:
I'm running Chef on AWS Ubuntu instances, and a few of them crashed. Looking at the logs, I found that the chef-client process, which runs as an init.d script, stumbles upon what seems to be a kernel bug.

Full details are here: https://gist.github.com/3885828

And here's the gist of the gist:

[119832.732086] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
[119832.732111] IP: [<ffffffff81053527>] pick_next_task_fair+0xa7/0x1a0
[119832.732124] PGD 1cee6f067 PUD 1cd5e7067 PMD 0
[119832.732132] Oops: 0000 [#1] SMP
[119832.732137] last sysfs file: /sys/devices/system/cpu/cpu1/cache/index2/shared_cpu_map
[119832.732145] CPU 1
[119832.732147] Modules linked in: acpiphp raid456 async_pq async_xor xor async_memcpy async_raid6_recov raid10 raid6_pq async_tx raid1 raid0 multipath linear
[119832.732172]
[119832.732177] Pid: 7896, comm: chef-client Not tainted 2.6.38-8-virtual #42-Ubuntu
...

$ uname -a
Linux hostname 2.6.38-8-virtual #42-Ubuntu SMP Mon Apr 11 04:06:34 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Has anyone seen that? I've been googling it, but nothing of note came up...
I see a few ways out:

1. Upgrade Ubuntu and hope I don't see this again. (It doesn't happen every day, but in the month I've been using Chef it has happened four times, on three different hosts out of 20.)
2. Instead of using the init script, use cron or some other method to run chef-client on the hour.
3. OK, move to CentOS or something else desperate...
I'd be happy to get to the bottom of it and see where the bug really comes from and whether it was fixed in newer versions of the kernel, but I shamefully admit that so far I've failed to do so.

What would you do?