[chef] Broader workflows around Chef


  • From: Dan Adams
  • Subject: [chef] Broader workflows around Chef
  • Date: Sun, 01 Jul 2012 20:19:49 +0100

Hi

I think the usage of Chef itself has been pretty well covered in terms of general good practice, and testing is starting to get a lot of attention too, with quite a few different approaches and tools springing up. However, I wanted to kick off a discussion of the other practices and workflows around Chef that people are using, and talk about some of the non-core issues and challenges of running Chef, to figure out what others are doing and how we can learn and adopt the best patterns. In particular:

1) How are you handling emergency changes?
We've been all by-the-book and got ourselves a nice little CI pipeline to push our infra code through. This does mean that if someone says "we released X feature and it's not working because Y configuration value isn't in place", our lead time to deliver that change to the business is longer if we stick with our standard workflow. So, either we stick with the workflow and the business loses patience, or we ditch the benefits of the CI pipeline/test and push it live straight away, possibly breaking something. I guess the issue here is that in a pre-CM world, you can make the change you want directly and have a high chance of making the right change quickly. Yes, it doesn't scale, and yes, it can lead to drift and every host being a snowflake, but it meets sharp timescales. Have you had to re-educate the wider business on why you do things differently now? Do you have an emergency change procedure inside or outside of Chef?

2) How are you handling upgrades of Chef itself?
Obviously Chef is now an integral part of your infrastructure... but how do you manage it? Manually? (wouldn't that be just as bad as how you were managing the rest of your infrastructure before?) Via Chef itself (how meta can you get?). And are you sticking with long release cycles and older versions or staying on the latest client and server build all the time, surfing the crest and rolling out fresh new bugs to your production CM system?

3) What monitoring of the chef server *and chef clients* do you use? What are you graphing and what are you alerting on?
This is probably a big one for devops types, and there's no really nice built-in console for this. What are you monitoring, graphing and alerting on? Because the system uses pull, not push, some failed runs may only be visible from the client, so you can't easily monitor them from the server (unless you put something custom in place). How are you handling this?
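
To make the "something custom" idea concrete, here is a minimal sketch of one possible server-side check: flag any node that has not checked in recently, on the assumption that a client whose runs are failing stops updating its record. The function name, the one-hour threshold and the input mapping are all hypothetical; where the timestamps come from is left open (the `ohai_time` node attribute would be one candidate, but that is an assumption rather than something I've verified for every setup).

```python
from datetime import datetime, timedelta, timezone


def find_stale_nodes(last_checkins, now, max_age=timedelta(hours=1)):
    """Return the names of nodes with no check-in within max_age.

    last_checkins is a hypothetical mapping of node name -> datetime of the
    last recorded check-in, or None if the node has never checked in.
    """
    stale = []
    for name, seen in sorted(last_checkins.items()):
        # A node that never checked in, or checked in too long ago, is stale.
        if seen is None or now - seen > max_age:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    now = datetime(2012, 7, 1, 20, 0, tzinfo=timezone.utc)
    checkins = {
        "web1": now - timedelta(minutes=10),
        "web2": now - timedelta(hours=3),
        "db1": None,
    }
    print(find_stale_nodes(checkins, now))
```

Something like this could feed an existing alerting system, though it only catches clients that have gone quiet, not the reason a run failed; for that you would still need something reporting from the client side.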

4) How are you handling training and other issues around Chef passing from the delivery team through AIS into a legacy/inherited system?
You probably put Chef in yourself or as part of a small team, but your team may grow, and as the product ages in your infrastructure it will become legacy, or at least new to incomers to your team or department. How are you handling this? Has it been a driver for hiring more "operations coders" than pure sysadmins, in that a core piece of tech now requires coding knowledge and practices? Or are you a dev who has found that you don't need a sysadmin at all, or that the devs you are hiring don't want to work on the platform stuff, or struggle to understand it?

Just interested to hear how people are handling these and other questions.

Many thanks

Dan


