Yeah. At places I've been where managers were afraid of config management (CFengine at the time) running on a schedule, the result was an accretion of changes over time. Once enough changes had queued up that we had to run it on a server, the change window was scheduled, it was approved by our CRB, and appropriate offerings were burned to the gods of ITIL. The changes would then often wind up causing outages, because so many of them hit the server at once and it was hard to determine the impact ahead of time. But the outages were all contained to change windows and were approved, so I guess that makes it okay.

A tactic that I've used in the past is to run CM only once per day, with a 12-hour random splay, timed for 8pm-8am. Changes can be committed during the business day and they don't immediately take effect, so they can get tested or pushed out manually first. And if anything goes wrong, it only starts hitting servers at 8pm, so you have a longer window before it hits your entire infrastructure and more time to get monitoring alerts and stop the changes rolling out. If you just run Chef every 30 minutes with a 5-minute random splay, then it's likely that by the time your monitoring alerts you and you start taking action, the change has hit your entire infrastructure.

By only doing the "scheduled" runs once per day you still keep the deltas between runs small, you allow yourself some time to stop your CM tool before it all rolls out, and you also reduce the load on your Chef server infrastructure (or on our HEC infrastructure).

The other thing is that if you only run Chef once a week or once a month on-demand, then you're not getting the "self-repairing" and SOX/PCI-DSS "prevent control" features of configuration management. If you're running it nightly, then any junior SA or malicious attacker who logs into the server and manually changes the state of critical files will have those changes rolled back at the next run, within a day at most.
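
To make the once-a-day splay idea concrete, here is a minimal Python sketch. It is not how Chef or CFengine implement splay; the function names (`splay_offset`, `next_run`, `fraction_hit`) are hypothetical, and deriving the offset from a hash of the hostname (so each host keeps the same slot every night) is my own assumption in place of a truly random splay. `fraction_hit` is a deliberately crude model: it assumes every host becomes eligible at the same moment and that offsets are spread uniformly across the splay.

```python
import hashlib
from datetime import datetime, timedelta

WINDOW_START_HOUR = 20               # 8pm window start, as in the post
WINDOW_LENGTH = timedelta(hours=12)  # 12-hour splay, i.e. 8pm-8am


def splay_offset(hostname: str, window: timedelta = WINDOW_LENGTH) -> timedelta:
    """Stable pseudo-random offset in [0, window), derived from the hostname.

    Assumption: hashing the hostname stands in for a random splay so that
    a host lands in the same slot every night.
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    seconds = int.from_bytes(digest[:8], "big") % int(window.total_seconds())
    return timedelta(seconds=seconds)


def next_run(hostname: str, now: datetime) -> datetime:
    """Next scheduled run for this host, strictly after `now`."""
    # Start from yesterday's window so a slot in the early morning
    # (e.g. 5am today, when it is 2am now) is not skipped.
    start = now.replace(hour=WINDOW_START_HOUR, minute=0, second=0, microsecond=0)
    run = start - timedelta(days=1) + splay_offset(hostname)
    while run <= now:
        run += timedelta(days=1)
    return run


def fraction_hit(minutes_elapsed: float, splay_minutes: float) -> float:
    """Rough share of hosts that have already run, under a uniform splay."""
    return max(0.0, min(1.0, minutes_elapsed / splay_minutes))


if __name__ == "__main__":
    now = datetime(2014, 1, 13, 13, 32)
    for host in ("web01.example.com", "db01.example.com"):  # hypothetical hosts
        print(host, next_run(host, now))
    # Suppose it takes 20 minutes to notice a bad change and react:
    print("12-hour splay:", fraction_hit(20, 12 * 60))
    print("5-minute splay:", fraction_hit(20, 5))
```

Under that model, a 20-minute reaction time means the whole fleet has already run with a 5-minute splay, while under the 12-hour splay only about 1 host in 36 has. Real numbers will depend on how long your monitoring takes to notice and how your hosts' runs are actually distributed.
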
Those automatic rollbacks are the kind of prevent controls that auditors really like. They also train your junior SAs to not make with the typey-typey on the keyboard and to use the CM program. Otherwise they tend to fall back to old behaviors of making changes on the console, and then it's not their fault they did that; it's going to be Chef's fault when it eventually runs, reverts those changes, and the service crashes.

On 1/13/14 1:32 PM, David Petzel wrote: