- From: Dan Adams
- Subject: [chef] Broader workflows around Chef
- Date: Sun, 01 Jul 2012 20:19:49 +0100
Hi
I think the usage of Chef itself has been pretty well covered in terms
of general good practice, and testing is starting to get a lot of
attention too, with quite a few different approaches and tools springing
up. However, I wanted to kick off a discussion of the other
practices and workflows around Chef that people are using, and talk
about some of the non-core issues and challenges around running Chef, to
figure out what others are doing and how we can learn and adopt the best
patterns. In particular:
1) How are you handling emergency changes?
We've been all by-the-book and got ourselves a nice little CI pipeline
to push our infra code through. This does mean that if someone says "we
released X feature and it's not working because Y configuration value
isn't in place", our lead time to deliver that change to the
business is longer if we stick with our standard workflow. So either we
stick with the workflow and the business loses patience, or we ditch the
benefits of the CI pipeline/tests and push it live straight away,
possibly breaking something. I guess the issue here is that in a pre-CM
world, you can make the change you want directly and have a high chance
of making the right change quickly. Yes, it doesn't scale, and yes, it can
lead to drift and every host being a snowflake, but it meets tight
timescales. Have you had to re-educate the wider business on why you do
things differently now? Do you have an emergency change procedure inside
or outside of Chef?
2) How are you handling upgrades of Chef itself?
Obviously Chef is now an integral part of your infrastructure... but
how do you manage it? Manually? (Wouldn't that be just as bad as how you
were managing the rest of your infrastructure before?) Via Chef itself?
(How meta can you get?) And are you sticking with long release cycles
and older versions, or staying on the latest client and server build all
the time, surfing the crest and rolling out fresh new bugs to your
production CM system?
3) What monitoring of the Chef server *and Chef clients* do you use?
What are you graphing and what are you alerting on?
This is probably a big one for devops types, and there's no really nice
built-in console for this. Because the system uses pull, not push, some
failed runs may only be visible from the client, so you can't easily
monitor this from the server (unless you put something custom in place).
How are you handling this?
4) How are you handling training and other issues around Chef passing
from the delivery team through AIS into a legacy/inherited system?
You probably put Chef in yourself or as part of a small team, but your
team may grow, and as the product ages in your infrastructure it will
become legacy, or at least new to incomers to your team or department.
How are you handling this? Has it been a driver for hiring more
"operations coders" than pure sysadmins, in that a core piece of tech now
requires coding knowledge and practices? Or are you a dev who found that
you don't need a sysadmin at all, or that the devs you are hiring
don't want to work on the platform stuff, or struggle to understand it?
Just interested to hear how people are handling these and other
questions.
Many thanks
Dan