[chef] Re: Re: Re: Re: Re: One-Shot runlists with inheritance


  • From: Dan Nemec
  • Subject: [chef] Re: Re: Re: Re: Re: One-Shot runlists with inheritance
  • Date: Fri, 28 Jan 2011 15:18:31 -0500

Sorry for the really long post.

Here is our use case:

I agree that one-off runlists are a component of overall orchestration. Right now we use Control Tier for orchestration. It can handle the workflow [take server out of load, wait for connections to drain, deploy code to server, run smoke test, put server back in load]. We want to use Chef for the "Deploy Code" step. More precisely, we plan to use Chef to deploy configuration and all configuration dependencies, while Control Tier deploys just the code. (We don't have any Chef implemented yet, so these are only plans. We do have Control Tier running and have used it to orchestrate deployments for over a year.)

In our case the thought process is that Control Tier would dispatch a "chef-client -j <runlist>" or some such thing to the node that is being acted upon. We want that runlist to contain only what is important to that activity. For a code deployment the runlist would deploy application code. For system updates the runlist would update system things. Any runlist that runs on the node is going to need some shared set of attributes on the node. We need a whole lifecycle for keeping the node attributes up to date, so that all the new configuration for an upcoming deployment is loaded prior to the deployment.

To answer your second question: before we knew all the details about Chef, we had the concept of an "attribute runlist" and an "action runlist". The attribute runlist would be the one runlist used to manage all node attributes, and it would not contain any recipes that actually perform work on the node. We would then maintain a collection of activity runlists that perform sets of system actions, relying on the attributes already on the node.

Now, we plan on one more variation. I'll preface it with a disclaimer: we are an "old-school" shop learning new tricks. We have a 10+ year old code base and 10 years of process built around the caution that comes from countless painful deployments. We don't have the luxury of wiping the slate clean, so we have to make incremental improvements and build on each success. That being said, we plan to "pre-deploy" most of our changes. The day before the scheduled deployment, we plan to lay down all the code and configuration needed for the deployment in a location near the running code. The deployment itself then becomes more like [Stop, flip links, update database, Start]. In this case we would have one runlist that pre-deploys configurations and a separate one that activates them.
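The "flip links" step could look something like the sketch below. It is plain Python, not a Chef resource, and it assumes a layout in which a `current` symlink points at one of several release directories; the function name `flip_link` and the directory names are hypothetical.

```python
import os
import tempfile


def flip_link(link_path, new_target):
    """Point link_path at new_target by renaming a temporary symlink over it."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(link_dir, ".%s.tmp" % os.path.basename(link_path))
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear a leftover from an interrupted flip
    os.symlink(new_target, tmp_link)
    # The rename replaces the old link in one step on POSIX filesystems,
    # so readers see either the old target or the new one.
    os.replace(tmp_link, link_path)


# Demonstration in a throwaway directory (hypothetical release names).
with tempfile.TemporaryDirectory() as root:
    for release in ("v1", "v2"):
        os.mkdir(os.path.join(root, release))
    current = os.path.join(root, "current")
    os.symlink(os.path.join(root, "v1"), current)  # the running release
    flip_link(current, os.path.join(root, "v2"))   # the activation step
    print(os.readlink(current))
```

The pre-deployed release is laid down in advance, so the activation run only has to swap the link; stopping, updating the database, and starting are outside this sketch.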

Let me know if I am unaware of a feature here. Expanding on the notion of an "attribute runlist": node attributes should be a persistent feature of a node. If I set an attribute that holds my administrator's email address, I shouldn't need a role in every runlist to ensure that my admin email is always set. "chef-client -j" is destructive in that it only maintains the attributes from the runlist that it ran. Persistence does create its own problem: if you have persistent attributes, you need a method of removing them. It is challenging to build a process around your attribute definitions that can remove the ones you no longer need. Chef can provide the way to delete; the user must figure out what to delete.
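To make the "what to delete" half of the problem concrete, here is a minimal sketch in plain Python. It assumes the attributes persisted on the node and the attributes currently declared are both available as nested dictionaries; how they are obtained from Chef is left out, and the names `attribute_paths`, `stale_attributes`, and the sample keys are hypothetical.

```python
def attribute_paths(attrs, prefix=()):
    """Yield the path (a tuple of keys) to every leaf value in a nested dict."""
    for key, value in attrs.items():
        path = prefix + (key,)
        if isinstance(value, dict) and value:
            yield from attribute_paths(value, path)
        else:
            yield path


def stale_attributes(persisted, declared):
    """Return paths that exist on the node but are no longer declared."""
    return sorted(set(attribute_paths(persisted)) - set(attribute_paths(declared)))


# Hypothetical data: "legacy_flag" was dropped from the declared attributes.
persisted = {"ntp": {"servers": ["a", "b"]}, "app": {"port": 8080, "legacy_flag": True}}
declared = {"ntp": {"servers": ["a", "b"]}, "app": {"port": 8080}}
print(stale_attributes(persisted, declared))
```

This only reports candidates for removal; deciding whether a reported attribute is really unused, and deleting it, remains a separate step.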

Now, to answer your first question: I do not think that maintaining one node object per activity set would be practical in the long run.

Dan

On Fri, Jan 28, 2011 at 2:31 PM, Charles Duffy wrote:
This speaks more to orchestration than to one-off run lists, but let me comment --

The most interesting workflow I've been interested in modeling is along the lines of the following:

"If average load across all application servers is less than 1.0, no more than 1/5 of all app servers are out of the pool, and this node is flagged as having at least one recipe in the pending-downtime list:
 - remove this node from the load balancer's pool
 - wait for all requests to drain
 - run all recipes in the pending-downtime list, removing each from said list after successful completion
 - when pending-downtime list is empty, put this server back into the pool"

...where several different recipes have the ability to add their own entries to the pending-downtime list (which could be anything from a firewall reconfiguration to an application restart to a full-system reboot).

Of course, the "no more than 1/5 of all app servers are out of the pool" requirement calls for some care to avoid race conditions.
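The preconditions and the drain-the-list step in the quoted workflow can be sketched in plain Python as follows. The `AppServer` class, the function names, and the choice to count the node itself against the 1/5 limit are assumptions made for illustration; none of this is an existing Chef or orchestration API.

```python
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class AppServer:
    name: str
    load: float
    in_pool: bool = True
    pending_downtime: list = field(default_factory=list)


def may_take_downtime(node, servers, max_load=1.0, max_out=Fraction(1, 5)):
    """Check the three preconditions of the workflow for `node`."""
    if not node.pending_downtime:
        return False
    average_load = sum(s.load for s in servers) / len(servers)
    if average_load >= max_load:
        return False
    # Assumption: count this node as out too, so removing it cannot
    # push the pool past the limit.
    out_after = sum(1 for s in servers if not s.in_pool and s is not node) + 1
    return Fraction(out_after, len(servers)) <= max_out


def run_pending(node, run_recipe):
    """Run pending entries in order, removing each only after it succeeds."""
    while node.pending_downtime:
        if not run_recipe(node.pending_downtime[0]):
            return False  # the failed entry and the rest stay on the list
        node.pending_downtime.pop(0)
    return True
```

The check and the removal from the pool are separate steps here, so two nodes could both pass the check at the same moment. The sketch does not address that race; some form of shared lock or atomic counter would be needed.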

If y'all are working on an orchestration solution, I would be very interested to hear how it addresses this kind of use case.

On Fri, Jan 28, 2011 at 1:12 PM, Chris Walters wrote:
Dan,

Absolutely. One-off run lists are one of the most requested features. They also fit into some of the preliminary discussions we've had about orchestration models. We plan to get a design together for one-off run lists in the next few weeks to share with the community for feedback.

If you're willing to comment on your use case more, here are a few questions that I have.

For your use case, does the multi-node solution with a shared base run list work, or do you actually need to have only one node object for the purpose of searching?

Should run lists be first-class objects instead of just properties on nodes and roles? Should they be able to contain not only roles and recipes but also run list-containing entities (nodes and other disembodied run lists)?

If anyone else has opinions on any aspect of one-off run lists, please respond, as well.

Thank you for your input.

-chris


On Fri, Jan 28, 2011 at 7:31 AM, Dan Nemec wrote:
So, the obligatory next question is:

"Is this anywhere on the roadmap?"

Thanks for the suggestion about multiple nodes. We'll play with that and see whether it is a workable, if not ideal, solution.

Dan


On Wed, Jan 26, 2011 at 5:46 PM, Chris Walters wrote:
Hi Dan,

There isn't currently a way that I can think of to run one run list after another except to package up the main run list into a role and prepend that role to the one-off run list's items.

As for one-off run lists, there isn't currently a built-in solution. Since a single server can be managed by many chef nodes, one way to do it is to have different JSON files like you do, but run them as different nodes. Something like:

infrastructure maintenance runs:
"chef-client -j infra-maint.json -N node-XYZ-infra-maint"

deployment team runs:
"chef-client -j deployment.json -N node-XYZ-deployment"

etc.
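If an orchestrator were to dispatch these runs, the command for each activity could be assembled along these lines. This is a sketch in plain Python that uses chef-client's long options `--json-attributes` and `--node-name`; the function name `activity_command` is hypothetical, and the file and node naming scheme simply mirrors the examples above.

```python
def activity_command(server, activity):
    """Build the chef-client argument list for one activity, run as its own node."""
    return [
        "chef-client",
        "--json-attributes", "%s.json" % activity,
        "--node-name", "node-%s-%s" % (server, activity),
    ]


for activity in ("infra-maint", "deployment"):
    print(" ".join(activity_command("XYZ", activity)))
```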

Does that help?

-chris

On Wed, Jan 26, 2011 at 1:55 PM, Dan Nemec wrote:
We have run into an interesting problem. We want to segregate runlists by
activity (e.g. infrastructure maintenance, deployment, one-off, etc.). But we
want all the runlists to share some common role information about a node. We
have a node that has some roles (datacenter, servergroup, tier) that are
important identifiers and drive selection of certain attributes. We want
different groups to be able to do maintenance on their parts at different times
without impacting others. So if a sysadmin wants to update /etc/hosts he
shouldn’t have to worry if the application team has put in a new attribute
for a deployment later. The sysadmin can run a runlist that only affects the
parts of the system he is responsible for without worrying that an application
deployment recipe will run. Conversely in a software deployment the deployment
team should be able to update the applications without updating the operating
system (given the os changes are not part of the software deployment).

I thought “chef-client -j” would do this, but it didn’t. This is what I
did: I created a node and bootstrapped it with a runlist of its identity roles.
I then made a json file with a runlist for a set of activities and ran the
runlist via “chef-client -j <jsonfile>”. The problem is that the runlist
for the node that existed before chef-client gets wiped out and only the
runlist in the json file gets run thus wiping out its “identity” and
breaking the one-off runlist because certain attributes no longer exist.

I’d like to be able to append a runlist on the fly to an existing runlist on
the node where the new runlist exists on the node only for the duration of the
chef-client run. The node has a “base” runlist that should always be run,
but I want to run some other recipes and roles one at a time while keeping the
“base” runlist. I do not want to have to copy the base runlist into the
json file of the one-shot runlist that I am running as I’m trying to keep the
“activity” runlists environment independent.

Is there a way to run a one-off runlist on a node that is effectively appended
to the runlist that is already on the node and is removed after the run?
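One way to approximate the requested behaviour outside Chef is to compose the JSON file just before the run, so the activity runlist itself stays free of the base roles. The sketch below is plain Python and assumes the base runlist is already available to the script as a list of strings; the function names and the role and recipe names are hypothetical.

```python
import json


def compose_run_list(base, one_off):
    """Return base followed by one_off, dropping duplicates but keeping order."""
    seen = set()
    combined = []
    for item in list(base) + list(one_off):
        if item not in seen:
            seen.add(item)
            combined.append(item)
    return combined


def one_shot_json(base, one_off):
    """Build the text of a JSON file to pass to chef-client -j."""
    return json.dumps({"run_list": compose_run_list(base, one_off)}, indent=2)


# Hypothetical identity roles and a hypothetical deployment recipe.
base = ["role[datacenter]", "role[servergroup]", "role[tier]"]
one_off = ["recipe[app::deploy]", "role[tier]"]
print(one_shot_json(base, one_off))
```

Given the behaviour described above, where the runlist in the JSON file replaces the one on the node, this only covers the "append" half of the question. The one-off entries would still have to be removed from the node after the run.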

Thanks,

Dan







