[chef-dev] Bizarre inodes on RHEL6 after chef-client run


  • From: Alan Milligan < >
  • To:
  • Subject: [chef-dev] Bizarre inodes on RHEL6 after chef-client run
  • Date: Thu, 26 Apr 2012 15:23:45 +1000

I have come across a rather large (and repeatable) problem trying to use Chef on RHEL6 systems.  This works fine on Ubuntu, SLES11, and our own BastionLinux (a Fedora variant).  I strongly suspect a kernel bug in RHEL6, but would very much like to know if anyone else has come across it.

To exemplify the issue, we're using the SNMP recipe to write /etc/snmp/snmpd.conf, start snmpd, then attempt a walk.  Unfortunately, after the chef-client run, /etc/snmp/snmpd.conf has a suspiciously large inode number and the snmpd daemon, whilst up, isn't responding.

So - prior to the chef-client run (all good):

  snmp]# ls -il
total 8
132819 -rw-r--r--. 1 root root 2024 Apr 26 10:58 snmpd.conf
132810 -rw-r--r--. 1 root root  220 Nov 16 01:07 snmptrapd.conf

  snmp]# snmpwalk -v1 -cpublic 127.0.0.1 system
SNMPv2-MIB::sysDescr.0 = STRING: Linux rheltest.drives.xxxx 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1415) 0:00:14.15
SNMPv2-MIB::sysContact.0 = STRING: Root < >
SNMPv2-MIB::sysName.0 = STRING: rheltest.drives.xxxx
...


Now modify snmpd.conf to force Chef to overwrite it on the next run:


  snmp]# vi snmpd.conf 

  snmp]# /etc/rc.d/init.d/snmpd restart
Stopping snmpd:                                            [  OK  ]
Starting snmpd:                                            [  OK  ]

  snmp]# chef-client
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: *** Chef 0.10.8 ***
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Run List is [role[myxxx-debug]]
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Run List expands to [myxxx]
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Starting Chef Run for rheltest.drives.xxxxx
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Running start handlers
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Start handlers complete.
[Thu, 26 Apr 2012 11:00:09 +1000] INFO: Loading cookbooks [perl, snmp]
[Thu, 26 Apr 2012 11:00:09 +1000] INFO: Processing package[net-snmp] action install (snmp::default line 21)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing package[net-snmp-utils] action install (snmp::default line 21)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action start (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action enable (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing template[/etc/snmp/snmpd.conf] action create (snmp::default line 38)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] backed up to /var/chef/backup/etc/snmp/snmpd.conf.chef-20120426110010
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] mode changed to 644
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] updated content
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] sending restart action to service[snmpd] (delayed)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action restart (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: service[snmpd] restarted
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Chef Run complete in 3.802594 seconds
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Running report handlers
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Report handlers complete


  snmp]# ls -il
total 8
544050 -rw-r--r--. 1 root root 2024 Apr 26 11:00 snmpd.conf
132810 -rw-r--r--. 1 root root  220 Nov 16 01:07 snmptrapd.conf


  snmp]# snmpwalk -v1 -cpublic 127.0.0.1 system
Timeout: No Response from 127.0.0.1


The inode number assigned should be something around 132xxx, not 544xxx.  The file itself is still readable and appears to behave completely normally *except within* sysv init scripts (starting snmpd from the command line works).  If I copy/move the file such that it gets a 132xxx inode, then everything works fine out of init.d, all with exactly the same file content.  I also have similar issues with the OpenLDAP recipes and sysv startup, so I anticipate this being quite widespread.

Chef itself is entirely innocuous in its template generation: it simply does a pure-Ruby File.open('w+') and FileUtils.mv, which I expect rely on stdio and whatever underlying filesystem implementation is in use (in our case, ext4).  It is quite difficult to point at Chef or Ruby as being at fault, as it's all so simple at this level.
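
For anyone who wants to compare the behaviour on their own systems, the write-then-move pattern described above can be reproduced outside Chef with a few lines of Ruby.  This is only a sketch under assumptions: replace_via_move and the ".staged" suffix are made-up names, the staging file is assumed to live on the same filesystem as the target, and it is not Chef's actual code.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, NOT Chef's actual implementation: write the new
# content to a staging file, move it over the target, and return the
# target's inode number before and after the move.
def replace_via_move(target, content, staging_dir)
  inode_before = File.stat(target).ino
  staged = File.join(staging_dir, "#{File.basename(target)}.staged")
  File.open(staged, "w+") { |f| f.write(content) }
  FileUtils.mv(staged, target)
  [inode_before, File.stat(target).ino]
end

if __FILE__ == $PROGRAM_NAME
  # Demonstrate on a throwaway directory rather than on /etc/snmp.
  Dir.mktmpdir do |dir|
    target = File.join(dir, "snmpd.conf")
    File.open(target, "w") { |f| f.write("original\n") }
    before, after = replace_via_move(target, "rendered\n", dir)
    puts "inode before: #{before}, after: #{after}"
  end
end
```

When the staging file and the target are on the same filesystem, the move is a rename, so the target ends up with the staging file's inode rather than its old one.  Running this on both an affected and an unaffected box should show whether the two differ at this level.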

The chef/gems stack is *exactly* the same on our unaffected BastionLinux as RHEL6, chef 0.10.8 (http://linux.last-bastion.net/LBN/up2date/cloud/13 for full repo details).

The version of Ruby RHEL6 ships is 1.8.7.352.  I actually compiled and deployed BastionLinux's Ruby on the RHEL box (1.8.7.334 but with rb-readline support for irb command history), and still have the issue - with *everything* but the kernel identical.  The RHEL6 kernel version is 2.6.32-220, whereas we're at 2.6.34.8-68 on BastionLinux.

I'd love to know if anyone else has experienced similar issues as it basically makes chef useless for RHEL6 deployments.  I am happy to raise this with Red Hat, but RHEL6 is hardly beta, and I really want to make sure I'm not missing something ...

Alan


