I have come across a rather large (and repeatable) problem trying to
use Chef on RHEL6 systems. This works fine on Ubuntu, SLES11, and
our own BastionLinux (a Fedora variant) - I very much suspect it's a
kernel bug in RHEL6, but would very much like to know if anyone else
has come across it.

To exemplify the issue, we're using the SNMP recipe to write
/etc/snmp/snmpd.conf, start snmpd, then attempt a walk. Unfortunately,
after the chef-client run, /etc/snmp/snmpd.conf has a suspiciously
large inode number and the snmpd daemon, whilst up, isn't responding.

So - prior to the chef-client run (all good):

snmp]# ls -il
total 8
132819 -rw-r--r--. 1 root root 2024 Apr 26 10:58 snmpd.conf
132810 -rw-r--r--. 1 root root 220 Nov 16 01:07 snmptrapd.conf
snmp]# snmpwalk -v1 -cpublic 127.0.0.1 system
SNMPv2-MIB::sysDescr.0 = STRING: Linux rheltest.drives.xxxx 2.6.32-220.el6.x86_64 #1 SMP Wed Nov 9 08:03:13 EST 2011 x86_64
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1415) 0:00:14.15
SNMPv2-MIB::sysContact.0 = STRING: Root < >
SNMPv2-MIB::sysName.0 = STRING: rheltest.drives.xxxx
...

Now taint snmpd.conf to enforce overwriting on next chef run ....

snmp]# vi snmpd.conf
snmp]# /etc/rc.d/init.d/snmpd restart
Stopping snmpd: [ OK ]
Starting snmpd: [ OK ]
snmp]# chef-client
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: *** Chef 0.10.8 ***
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Run List is [role[myxxx-debug]]
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Run List expands to [myxxx]
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Starting Chef Run for rheltest.drives.xxxxx
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Running start handlers
[Thu, 26 Apr 2012 11:00:08 +1000] INFO: Start handlers complete.
[Thu, 26 Apr 2012 11:00:09 +1000] INFO: Loading cookbooks [perl, snmp]
[Thu, 26 Apr 2012 11:00:09 +1000] INFO: Processing package[net-snmp] action install (snmp::default line 21)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing package[net-snmp-utils] action install (snmp::default line 21)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action start (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action enable (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing template[/etc/snmp/snmpd.conf] action create (snmp::default line 38)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] backed up to /var/chef/backup/etc/snmp/snmpd.conf.chef-20120426110010
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] mode changed to 644
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] updated content
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: template[/etc/snmp/snmpd.conf] sending restart action to service[snmpd] (delayed)
[Thu, 26 Apr 2012 11:00:10 +1000] INFO: Processing service[snmpd] action restart (snmp::default line 34)
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: service[snmpd] restarted
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Chef Run complete in 3.802594 seconds
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Running report handlers
[Thu, 26 Apr 2012 11:00:12 +1000] INFO: Report handlers complete
snmp]#
snmp]#
snmp]# ls -il
total 8
544050 -rw-r--r--. 1 root root 2024 Apr 26 11:00 snmpd.conf
132810 -rw-r--r--. 1 root root 220 Nov 16 01:07 snmptrapd.conf
snmp]# snmpwalk -v1 -cpublic 127.0.0.1 system
Timeout: No Response from 127.0.0.1

The inode number assigned should be something circa 132xxx, not
544xxx. The file itself is still readable and appears to behave
completely normally *except within* sysv init scripts (starting snmpd
from the command line works).
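The inode observation can be reproduced outside Chef. Below is a
minimal Ruby sketch, assuming a write-a-new-file-then-move pattern
within a single directory. The helper name replace_via_move, the
.staging suffix and the sample file content are hypothetical, and this
is not Chef's actual code. It prints the target's inode number before
and after the replacement so the two can be compared on an affected
and an unaffected system.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: writes new_content to a staging file next to
# target, then moves the staging file over target.
# Returns [inode_before, inode_after] for the target path.
def replace_via_move(target, new_content)
  inode_before = File.stat(target).ino
  staging = "#{target}.staging" # hypothetical staging file name
  File.open(staging, 'w+') { |f| f.write(new_content) }
  FileUtils.mv(staging, target)
  [inode_before, File.stat(target).ino]
end

# Run against a throwaway directory rather than the real /etc/snmp.
Dir.mktmpdir do |dir|
  target = File.join(dir, 'snmpd.conf')
  File.open(target, 'w') { |f| f.write("rocommunity public\n") }
  before, after = replace_via_move(target, "rocommunity private\n")
  puts "inode before: #{before}, after: #{after}"
end
```

When the move stays on one filesystem it is a plain rename, so the
target path takes over the inode of the newly written file and the two
numbers are expected to differ.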
If I copy/move the file such that it gets a 132xxx inode, then
everything works fine out of init.d, all with exactly the same file
content. I also have similar issues with the OpenLDAP recipes and sysv
startup, so I do anticipate this being quite endemic.

Chef itself is entirely innocuous in its template generation: it
simply does pure Ruby File.open('w+') and FileUtils.mv, which I expect
rely on stdio and whatever underlying file implementation (in our
case, ext4). It is quite difficult to point at Chef or Ruby software
as being at fault as it's all so simple at this level.

The chef/gems stack is *exactly* the same on our unaffected
BastionLinux as on RHEL6, chef 0.10.8
(http://linux.last-bastion.net/LBN/up2date/cloud/13 for full repo
details). The version of Ruby RHEL6 ships is 1.8.7.352. I actually
compiled and deployed BastionLinux's Ruby on the RHEL box (1.8.7.334
but with rb-readline support for irb command history), and still have
the issue - with *everything* but the kernel identical. The RHEL6
kernel version is 2.6.32-220, whereas we're at 2.6.34.8-68 on
BastionLinux.

I'd love to know if anyone else has experienced similar issues as it
basically makes chef useless for RHEL6 deployments. I am happy to
raise this with Red Hat, but RHEL6 is hardly beta, and I really want
to make sure I'm not missing something ...

Alan