- From: Eric Horne
- To:
- Subject: [chef] Re: Re: Push jobs vs SSH
- Date: Sat, 07 Feb 2015 05:47:41 -0800
Thank you for this. I didn't realize knife ssh doesn't scale well.
I'm familiar with Ansible; it does a pretty nice job wrapping ssh (I
only mention this because you said you weren't familiar with another
tool that wrapped ssh and worked at scale). But I'm not managing 25,000
nodes either. :)
-Eric
On Friday, February 06, 2015 9:18 AM, Lamont Granquist wrote:
Scaling is a big problem with ssh. Doing jobs over ssh really buckles
when you start hitting a thousand servers as the target for one job.
The protocol is slow and CPU hungry; at scale it can take hours to hit
all the servers just from the ssh client overhead on the central box,
which leads to having to build fan-out to distribute ssh connections
over multiple source servers. It's also unreliable. You have to wrap it
with your own failure checks and timeouts, and then sometimes it just
fails for no reason because it's designed as an interactive login
protocol first and foremost and its reliability as an "RPC" mechanism is
poor -- so you need to detect that and retry individual failed hosts or
else just re-run the job you're doing multiple times. The way that ssh
trust tends to grow in an organization also tends to result in really bad
security holes. I've seen bidirectional full meshes of ssh trust
constructed across 25,000 servers so that any compromise on any one
system would lead to login access on all the other servers (at a company
with otherwise really good security -- but ssh trust was way too 'easy'
and 'useful').
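
As a rough illustration of the "wrap it with your own failure checks and
timeouts, then retry individual failed hosts" pattern described above,
here is a minimal Python sketch. It is not how knife ssh works
internally; the function names, the retry count, and the worker count
are all assumptions made for the example. It shells out to the stock
`ssh` client with the OpenSSH `BatchMode` and `ConnectTimeout` options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_on_host(host, command, timeout=30):
    """Run one command on one host over ssh; True only on exit status 0."""
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10",
             host, command],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # hard limit in case the session hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def run_with_retries(hosts, command, runner=run_on_host, attempts=3,
                     workers=32):
    """Run command on every host, retrying only the hosts that failed.

    Returns the set of hosts that still failed after the last attempt.
    """
    pending = list(hosts)
    for _ in range(attempts):
        if not pending:
            break
        # Hypothetical fan-out: a local thread pool, one ssh client each.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            outcomes = list(pool.map(lambda h: runner(h, command), pending))
        pending = [h for h, ok in zip(pending, outcomes) if not ok]
    return set(pending)
```

Even with a wrapper like this, every connection is still driven from the
one central box, which is the overhead the message above is describing.
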
At a small scale of 400 servers or so it can work fine: you don't see
the anomalous failures often enough, the runs are short enough (and
probably faster with current horsepower and lots of cores), and it's
generally contained enough that you can stay on top of the security
issues. Scale it out, though, and you'll eventually find where it's
really not designed to do that. And 'knife ssh' would need a bunch more
work to make it more reliable (I don't know of any other tool that wraps
ssh that does it any better, though, since mostly people find that
inflection point where ssh starts to be a really poor tool for the job
and then ditch the protocol).
And having said all that, I don't know enough about what we've built
to comment on what layers we've added on top. The ability to set
a job to run and report back success if a quorum of servers succeed (but
failure if not enough servers succeed) is a vital higher layer that
knife ssh certainly doesn't provide and becomes important when you can
just about guarantee that one or two of your servers may be down, but
that's fine (if you're pushing software to 2,500 targets all at once,
you'll find that Linux itself often just isn't reliable enough to
guarantee that they're all up), yet you want to know if half your
deployments fail because that's really bad.
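
The quorum idea above can be sketched in a few lines of Python. This is
only an illustration of the decision rule, not the push jobs
implementation; `quorum_met`, its `quorum_percent` parameter, and the
shape of `results` are names invented for the example.

```python
def quorum_met(results, quorum_percent=90):
    """Return True if at least quorum_percent of the targets succeeded.

    `results` maps a host name to True (job succeeded) or False (job
    failed, or the host was down and never reported back).
    """
    total = len(results)
    if total == 0:
        return False
    # Integer ceiling of total * quorum_percent / 100.
    needed = (total * quorum_percent + 99) // 100
    succeeded = sum(1 for ok in results.values() if ok)
    return succeeded >= needed
```

With 2,500 targets and a 90 percent quorum, a couple of dead servers
still counts as success, while half the deployments failing does not.
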
" photoname="Eric Horne"
src="jpg8x6yPyDb4b.jpg"
name="compose-unknown-contact.jpg" height="25px" width="25px">
Friday, February
06, 2015 5:34 AM
What's the difference between chef push jobs and knife ssh?
I'm researching a use case for chef in which we want highly controlled
deployments to orchestrate across several different systems. The
traditional "pull" doesn't fit the bill well because it is difficult to
get that to coordinate properly across systems.
From what I've read so far, chef push jobs are essentially a daemon
running on the remote servers that allows arbitrary (but
pre-defined/whitelisted) execution of commands. Aside from perhaps a
cleaner white-listing concept (over forced-ssh), how is this different
(better) than just using knife ssh?
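
To make the whitelisting concept concrete, here is a small Python
sketch of a node-side lookup that only runs commands registered under a
job name. It illustrates the idea only and is not the push jobs client;
the `WHITELIST` entries, `resolve_job`, and `run_job` are hypothetical.

```python
import shlex
import subprocess

# Hypothetical whitelist: job name -> the exact command it may run.
WHITELIST = {
    "chef-client": "chef-client",
    "restart-web": "service nginx restart",
}


def resolve_job(job_name, whitelist=WHITELIST):
    """Return the argv for a whitelisted job, or raise if not allowed."""
    try:
        command = whitelist[job_name]
    except KeyError:
        raise PermissionError(
            f"job {job_name!r} is not whitelisted") from None
    return shlex.split(command)


def run_job(job_name, whitelist=WHITELIST):
    """Run a whitelisted job locally and return its exit status."""
    return subprocess.run(resolve_job(job_name, whitelist)).returncode
```

The caller can only name a job; the command line itself is fixed on the
node, which is the contrast with handing out a general ssh login.
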
I'm failing to see the benefits or use cases of chef push jobs over
knife ssh. The documentation is lacking in terms of how it is intended
to be used. Are push jobs better suited for different situations (and
what are those situations?)
Thanks for the help!
-Eric