chef - [chef] RE: Re: Re: Re: how to set up cluster that has dependencies?


Subject: [chef] RE: Re: Re: Re: how to set up cluster that has dependencies?
Date: Fri, 23 Sep 2011 09:59:49 +0200

Very interesting thoughts.

I have been using Chef for more than a year, and in my experience Chef is not a good option for runtime state management. I also think it is not intended for that.

By runtime state I mean, e.g., whether a service is up or down, serving clients or idle, underloaded or overloaded, etc.

This kind of runtime state is not only more dynamic, it also changes independently (usually in command/control style) of configuration artifacts, which are Chef's domain and which Chef manages well.

So I do think that runtime state management, which usually requires multi-node orchestration, needs a different tool.

Another question we have encountered is application deployment: should we use Chef to deploy applications? The arguments against it are that, unlike deployment of rpm packages, gems, etc., application deployment is more dynamic (in our case daily, vs. weekly/monthly for configuration and infrastructure software), it needs some kind of multi-node orchestration, and people are more comfortable with a certain level of control. Fully automatic application deployment in pull mode is not just difficult to implement, it is also hard to get the operations team to accept.

From: Kevin Nuckolls
Sent: Friday, 23 September 2011 1:15
Subject: [chef] Re: Re: Re: how to set up cluster that has dependencies?

Doozer is probably the most mature middle ground between Noah and ZooKeeper. It relies on logical time, implements a version of Paxos, and is similar in spirit to Google's Chubby and Apache ZooKeeper. If you don't require the notion of ephemeral nodes (which you might), then it's a really nice and lightweight solution for distributed configuration and distributed locks. It also happens to have a persistent data structure style interface, so you can look back and see what the entire state of the configuration was at a given point in time. I think the parts that have the most allure to me are that and the clarity of the API. A major design decision of Doozer was a clean API so that clients could be easily written. It's a major part of the Heroku architecture and they open sourced it recently. Your use case is one of the primary things they use it for.
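
To illustrate what a "persistent data structure style interface" buys you, here is a minimal in-memory sketch in Python. It is not Doozer's API: the class name `VersionedStore`, its methods, and the key paths are all hypothetical, and a real store would replicate revisions across machines through consensus rather than keep them in a list. The sketch only shows the idea that every write produces a new revision and any earlier revision can still be read.

```python
from typing import Dict, List, Optional


class VersionedStore:
    """Toy single-process store that keeps every revision (hypothetical API)."""

    def __init__(self) -> None:
        # _history[i] is the complete state as of revision i; revision 0 is empty.
        self._history: List[Dict[str, str]] = [{}]

    @property
    def rev(self) -> int:
        """Latest revision number (acts as a simple logical clock)."""
        return len(self._history) - 1

    def set(self, key: str, value: str) -> int:
        """Write a key, creating a new revision; returns the new revision number."""
        new_state = dict(self._history[-1])  # copy, so older revisions stay intact
        new_state[key] = value
        self._history.append(new_state)
        return self.rev

    def get(self, key: str, rev: Optional[int] = None) -> Optional[str]:
        """Read a key at the given revision (defaults to the latest)."""
        state = self._history[self.rev if rev is None else rev]
        return state.get(key)

    def snapshot(self, rev: int) -> Dict[str, str]:
        """Return the entire configuration as it was at revision `rev`."""
        return dict(self._history[rev])


store = VersionedStore()
r1 = store.set("/mongo/ip", "10.0.0.5")
r2 = store.set("/tomcat/ip", "10.0.0.6")
r3 = store.set("/mongo/ip", "10.0.0.9")  # mongo moved; old value is still readable
```

After these three writes, `store.get("/mongo/ip")` returns the new address, while `store.get("/mongo/ip", rev=r1)` and `store.snapshot(r2)` still show the configuration as it was before the move.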

Glu is also meant to solve the "last-mile" problem of dependent configuration management. It's built around ZooKeeper. It's meant to calculate the differences between what the data and configuration on your nodes should be and what they actually are. From speaking with some who have worked with it, I hear it's a less than optimal solution. But a solution nonetheless. This was recently open sourced by LinkedIn.
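
The "difference between what should be and what is" idea can be sketched in a few lines of Python. This is not Glu's model or API; the function name `compute_delta` and the flat key/value dictionaries are assumptions made purely for illustration.

```python
from typing import Any, Dict


def compute_delta(desired: Dict[str, Any], actual: Dict[str, Any]) -> Dict[str, Any]:
    """Compare desired vs. actual state and report what must change (hypothetical helper)."""
    # Present in the desired state but missing on the node.
    to_add = {k: v for k, v in desired.items() if k not in actual}
    # Present on both sides but with different values: (current, wanted).
    to_change = {
        k: (actual[k], v)
        for k, v in desired.items()
        if k in actual and actual[k] != v
    }
    # Present on the node but no longer wanted.
    to_remove = sorted(k for k in actual if k not in desired)
    return {"add": to_add, "change": to_change, "remove": to_remove}


desired = {"tomcat.version": "7.0.21", "mongo.ip": "10.0.0.9"}
actual = {"tomcat.version": "7.0.20", "varnish.backend": "10.0.0.6"}
delta = compute_delta(desired, actual)
```

An orchestrator would then turn each entry of `delta` into an action on the node and recompute the delta until it is empty.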

For simple purposes, a hybrid solution of fabric/capistrano/rundeck/azkaban with the Chef search API may be powerful enough, albeit complex. Beyond that you'll need to look into systems which provide you with distributed locks. Unfortunately there aren't many "canned" solutions for integration with Chef other than Noah, and it is still a non-distributed single point of failure.

Since I seem to have thoughts on the matter, I'll go ahead and outline my opinions on what chef is good at and what it is not (yet) good at. To me it appears that there are four kinds of configuration.

1. Base system packages, utilities, and applications

2. Data (database backups, read-only data, etc)

3. Configuration that expects data to be in the correct spot at the right time

4. Multi-Node configuration (routing, orchestration/locks/cache management, group membership, node states)

IMO in trivial use-cases Chef is very good at 1, good enough at 2 and 3, and can at least achieve 4 if you're very careful. In larger scenarios I believe it's only very good at #1 and perhaps the wrong abstraction entirely for 2, 3, and 4. I think these are important things to tell beginners because what different environments need from Chef varies widely. The tooling surrounding these problems is still rapidly growing and different teams will have different needs. Unfortunately, for moderately complex architectures, I feel a passing understanding of what the following tools can do is worth having when designing a new system that you wish to (fully or partially) automate with configuration management.

Papers worth reading:

Time, Clocks, and the Ordering of Events in a Distributed System (granddaddy of the Paxos papers)

http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf

The Chubby lock service for loosely coupled distributed systems

http://labs.google.com/papers/chubby.html

Open source implementations / approximations of paxos / chubby:

http://xph.us/2011/04/13/introducing-doozer.html

https://github.com/ha/doozerd

http://zookeeper.apache.org/

Fits the same mold as the others but is not distributed:

https://github.com/lusis/Noah

https://github.com/lusis-cookbooks/noah

Dependency management / workflow managers:

http://sna-projects.com/azkaban/

http://rundeck.org/

Deployment managers:

https://github.com/capistrano/capistrano

http://docs.fabfile.org/en/1.2.2/index.html

Other useful things:

https://github.com/linkedin/glu

http://sna-projects.com/norbert/

Hope that helps.

-Kevin

@kevinnuckolls

On Wed, Sep 21, 2011 at 9:28 PM, Noah Kantrowitz wrote:

On Sep 21, 2011, at 7:18 PM, Aaron Abramson wrote:

> The best thing to do would be to go through one of the getting-started tutorials: http://help.opscode.com/kb/otherhelp/build-a-lamp-stack
>
> Watch that, follow along and deploy it yourself (if you have access to EC2), or just read through it. It will give you a good idea of how nodes can query and search within templates.
>
> Look through the php-quick-start repo: the haproxy cookbook searches Chef for the apache nodes and updates the templates accordingly.
>
> ----- Original Message -----
> From: "jeff stroomer"
> Sent: Wednesday, September 21, 2011 7:00:17 PM GMT -06:00 US/Canada Central
> Subject: [chef] how to set up cluster that has dependencies?
>
> Chef folks,
>
> I have a question concerning the best way to use Chef to set up a cluster of
> nodes that have dependencies on one another. (Apologies in advance if this is a
> naïve question, but I’m new to Chef.)
>
> For concreteness, suppose I want node V to run varnish, node T to run tomcat,
> and node M to run mongo. And let’s say that T needs to know the IP address
> of M, and V needs to know the IP address of T. I believe that each node can
> register its IP address in a database maintained on the Chef server, and
> recipes run by each node can query this database. For things to work properly,
> I ought to set up M first, then T, and finally V.
>
> My question is this: How should I plan to use Chef so that the setup of various
> nodes happens in the right order? Do I write a recipe that sets up M first,
> then T, and finally V? If so, then what is that recipe associated with? Or
> should I instead have a recipe for V that sets up T, and also have the setup
> recipe for T begin by setting up M? Or should I write recipes for T, M, and V
> that query the database, and don’t do anything unless they can find the IP
> addresses they need?

Chef doesn't (yet) address this kind of multi-node orchestration issue. search() does make the integration parts easy, but sometimes that isn't enough to handle a highly fluid environment. One option is just careful recipe construction: making the Chef run abort early if a search for a needed component comes up empty can usually do the trick, as long as you run Chef in polling mode on a tight cycle. Another option is to use something like RunDeck or Fabric to execute chef-client in the correct order and not move on with the deployment until certain gate conditions are met. Beyond that you enter the world of tools like Noah and ZooKeeper, which are built very specifically for this. Noah is somewhat new, but is also less encumbered by legacy Java craziness compared to ZooKeeper. ZooKeeper is probably your best bet though, as it has a powerful and flexible set of distributed locking and configuration primitives. We (Opscode) are also very interested in exploring this space, as it is indeed a common problem, and while we don't want to end up with a poor reimplementation of one or all of these, something with tighter integration with Chef recipes would be awesome. Hope that helps!
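
The "run chef-client in the correct order and stop at gate conditions" option can be sketched as follows for the M, T, V example from the original question. This is only an outline in Python (3.9+ for the standard `graphlib` module) under stated assumptions: `converge` and `gate` are hypothetical callables you would supply yourself, for example one that triggers chef-client on a node through Fabric or RunDeck and one that checks whether the service answers. No real Chef, Fabric, or RunDeck calls are shown.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def setup_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return the nodes so that every node comes after the nodes it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


def converge_in_order(
    depends_on: Dict[str, Set[str]],
    converge: Callable[[str], None],  # hypothetical: trigger a Chef run on the node
    gate: Callable[[str], bool],      # hypothetical: is the node ready for its dependents?
) -> List[str]:
    """Converge nodes in dependency order, stopping at the first failed gate."""
    done: List[str] = []
    for node in setup_order(depends_on):
        converge(node)
        if not gate(node):
            raise RuntimeError(f"gate condition failed for {node}; stopping deployment")
        done.append(node)
    return done


# V (varnish) needs T (tomcat), and T needs M (mongo).
deps = {"V": {"T"}, "T": {"M"}, "M": set()}
order = setup_order(deps)
```

With these dependencies `order` puts M first, then T, then V. If the gate for T fails, `converge_in_order` raises before V is ever touched, which is the "don't move on" behaviour described above.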

--Noah

PS: Also a semi-related shout-out to AJ's new pylon tool for doing distributed master elections.


[chef] Re: how to set up cluster that has dependencies?, Aaron Abramson, 09/21/2011
- [chef] Re: Re: how to set up cluster that has dependencies?, Noah Kantrowitz, 09/21/2011
  - [chef] Re: Re: Re: how to set up cluster that has dependencies?, Kevin Nuckolls, 09/22/2011
    - [chef] RE: Re: Re: Re: how to set up cluster that has dependencies?, le.huy, 09/23/2011
    - [chef] Re: Re: Re: Re: how to set up cluster that has dependencies?, Peter Norton, 09/26/2011