chef - [chef] Re: Request for input on orchestration

From: AJ Christensen < >
To:
Subject: [chef] Re: Request for input on orchestration
Date: Thu, 23 Feb 2012 11:08:16 +1300

Yo,

I'm in the unfortunate position of having built *many* orchestrations
like this, around Chef, many of them in private organizations, not to
be open sourced.

Many of them were scoffed at by underwhelmed CxOs who have spent too
much time reading the Wikipedia definition of Orchestration or having
"process" or "workflow" managers force-fed to them by VCs and Big
Enterprise. (</rant>)

WOT:

On 29 January 2011 09:26, Chris Walters wrote:
> Ohai Chefs!
>
> We're in the preliminary stages of designing possible solutions for
> orchestration and would like to understand the community's
> requirements.
>
> I'm going to write down my thoughts and questions. Nothing is gospel,
> so please feel free to comment on everything, including the framing.
>
> Background:
>
> Chef, as currently conceived, does a great job of exposing a model for
> how to get a system from either an embryonic state or a slightly
> misconfigured state to the desired state, mainly via the mechanism of
> resource idempotence.
>
> What I think is not yet well-modeled is how to go from one
> well-configured state to a completely different well-configured
> state. It also doesn't yet model synchronization of actions across
> multiple boxes in that there isn't a first-class way to gate actions
> that are dependent on the completion of steps on other servers. For
> example, a complex migration or deployment might require bringing
> boxes up or down, copying data, cleanly removing artifacts or services
> installed by previous chef runs, not restarting load balancers until
> some quorum of webservers have re-started, etc.
>
> We'd like to collect the use cases, requirements, and thoughts that
> best serve the community.

It would be great to have something built in for Chef, and that is the
road I had been walking with Pylon, a gem for chef that has a DCell
substrate running in the background; then you get actors and
messaging, and you can just build shit.

Obviously this approach doesn't work for most people because you have
to ship code, it's moderately complex, etc., but it's what I've been
wanting to build to solve this.

> 1) What do you think the scope of orchestration is and is not?

I didn't read or write any books on this shit, so yeah, ymmv:

When I have built things to solve orchestration, the primary use case
is generally a directory service: the ability for a recipe to register
a service (with all of the parameters required to *connect* to the
service) in the directory. It's also the other half of that: client
recipes that need to use those components. They should either error and
relaunch with a fresh state, or block [if you like].
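To make the register / error-or-block idea concrete, here is a minimal sketch. It is not how Pylon or chef-discovery do it: ServiceDirectory, ServiceNotFound and the wait: parameter are names made up for illustration, and the directory is an in-process hash rather than anything shared between nodes.

```ruby
# Hypothetical in-memory stand-in for the directory. A real one would live in
# some shared store; everything here is an assumption for illustration.
class ServiceDirectory
  class ServiceNotFound < StandardError; end

  def initialize
    @services = {}
    @lock = Mutex.new
    @registered = ConditionVariable.new
  end

  # Provider side: record everything a client needs in order to connect.
  def register_service(name, options = {})
    @lock.synchronize do
      @services[name] = options
      @registered.broadcast # wake any clients blocked in discover_service
    end
  end

  # Client side: with no wait, raise if the service is missing (so the run
  # can fail and be relaunched); with wait: seconds, block until the service
  # is registered or the deadline passes.
  def discover_service(name, wait: nil)
    @lock.synchronize do
      deadline = wait && (Process.clock_gettime(Process::CLOCK_MONOTONIC) + wait)
      until @services.key?(name)
        raise ServiceNotFound, "#{name} is not registered" if deadline.nil?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        raise ServiceNotFound, "#{name} did not register within #{wait}s" if remaining <= 0
        @registered.wait(@lock, remaining)
      end
      @services[name]
    end
  end
end

directory = ServiceDirectory.new
provider = Thread.new do
  sleep 0.1 # pretend the database node is still converging
  directory.register_service(:database, { host: "10.0.0.5", port: 5432 })
end
db = directory.discover_service(:database, wait: 2) # blocks until registered
provider.join
```

Erroring is the simpler choice when something will relaunch the run anyway; blocking saves the relaunch but needs a deadline so a missing service can't hang a client forever.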

> 2) What are the use cases that you would like to see an orchestration
> system/DSL accommodate? The more specific and granular the steps of
> the orchestration, the better. (If you would not like your use case
> made public but would nonetheless like it considered during design,
> validation, and testing, please send it to me directly at
> )

2x loadbalancer
4x webserver all launched

requirement: webservers added to loadbalancer table *only* when the
deploy is complete, not just when the node has converged

jenkins (ci, deploy) -> publishes packages, deploy messages, from/to version

loadbalancer -> talks to all active webservers via substrate
loadbalancer

webserver
webserver
webserver
webserver

requirement: binary packaged asset published by jenkins system is
rolling deployed to webservers with 0 downtime at the loadbalancer
layer

webservers 1-4 receive a "deploy" message and reach consensus; a leader
is allocated for the deploy slot. The leader signals the other workers,
one by one, to perform the deploy, smoketest, and re-add to pool. No
outage is visible to the loadbalancer layer, as the connections are
presented to webservers through a consensus protocol FSM replicator
(e.g. Paxos). We could trigger an alert condition on one of the deploy
slots failing or even aggressively destroy and rebuild it.
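A rough sketch of just the one-by-one part, assuming a leader has already been chosen and is the one running the loop. The consensus and messaging layers are left out entirely; Pool, rolling_deploy and the deploy/smoketest callables are hypothetical names, not anything from Pylon.

```ruby
# Hypothetical stand-in for the loadbalancer table.
class Pool
  attr_reader :members

  def initialize(members)
    @members = members.dup
  end

  def remove(node)
    @members.delete(node)
  end

  def add(node)
    @members << node unless @members.include?(node)
  end
end

# Run by the (already elected) leader: take one webserver out at a time,
# deploy, smoketest, and only then put it back in the pool.
# Returns the nodes that failed their smoketest and were left out.
def rolling_deploy(pool, nodes, version, deploy:, smoketest:)
  failed = []
  nodes.each do |node|
    pool.remove(node)          # the remaining members keep serving traffic
    deploy.call(node, version)
    if smoketest.call(node)
      pool.add(node)           # re-added only after deploy + smoketest pass
    else
      failed << node           # left out of the pool: alert on it or rebuild it
    end
  end
  failed
end

pool = Pool.new(%w[web1 web2 web3 web4])
deployed = {}
failed = rolling_deploy(pool, %w[web1 web2 web3 web4], "1.2.0",
                        deploy: ->(node, version) { deployed[node] = version },
                        smoketest: ->(node) { node != "web3" })
```

Here web3 is made to fail its smoketest, so it ends up in failed and stays out of the pool while the other three are re-added.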

You could do an A/B-style cutover with this too; it would be another
signalling strategy locked down by a leader.

note: I'm currently trying to build this, I don't know what it will
look like or why I am trying to build it, but it's chock full of
science and shit: https://github.com/fujin/pylon/tree/feature/paxos --
the actor concurrency model has been great for prototyping
multi-decree paxos.

Here's the "search based" one we use for day to day, non crazy batman
shit: https://github.com/fujin/chef-discovery

>
> 3) What generic primitives do you think would be useful in such a
> system?

You probably want to have some hash values that the client, when
calling discover_service, can use to actually talk to it, right?
register_service :service, options = {}

Find the latest instantiation of this service? find the leader?
Restrict to environment? Get the ipaddress, get the options?
discover_service :service

How do you quantify which copy of the service you want, if multiple
are available? Where is the conflict resolution handled?
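One possible answer, purely as a sketch: do the conflict resolution on the client side, filter by environment, and let the newest registration win. The Registration fields and latest_registration are invented for this example; "newest wins" is an assumed policy, not a recommendation.

```ruby
# Hypothetical record shape for one registered copy of a service.
Registration = Struct.new(:service, :environment, :ipaddress, :options,
                          :registered_at, keyword_init: true)

# Pick one copy of a service: restrict to an environment if given, then let
# the most recent registration win. Returns nil when nothing matches.
def latest_registration(registrations, service, environment: nil)
  candidates = registrations.select do |r|
    r.service == service && (environment.nil? || r.environment == environment)
  end
  candidates.max_by(&:registered_at)
end

registrations = [
  Registration.new(service: :database, environment: "prod",
                   ipaddress: "10.0.0.5", options: { port: 5432 }, registered_at: 100),
  Registration.new(service: :database, environment: "prod",
                   ipaddress: "10.0.0.9", options: { port: 5432 }, registered_at: 250),
  Registration.new(service: :database, environment: "staging",
                   ipaddress: "10.1.0.3", options: { port: 5432 }, registered_at: 300)
]
chosen = latest_registration(registrations, :database, environment: "prod")
```

With the sample data, chosen is the 10.0.0.9 copy. Note this only moves the consistency question around: two clients reading the list at different moments can still pick different copies.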

Where is the state stored? What is the possibility that system
decisions will be made without consistent state?

I am super excited about this and would love to help out with
anything, feel free to ping at any time.

Robops mandates the creation of this software.

Cheers,

--AJ
>
> Thanks!
> Chris Walters
