- From: Adam Jacob <>
- To: Seth Chisamore <>
- Cc: AJ Christensen <>, Daniel DeLeo <>, Chef Dev <>
- Subject: [chef-dev] Re: [chef-dev] Re: [chef-dev] Re: [chef-dev] CHEF-2224
- Date: Thu, 14 Apr 2011 17:52:43 -0500
Works for me.
Adam

On Thu, Apr 14, 2011 at 5:51 PM, Seth Chisamore <> wrote:
> So as a compromise can we dump the attribute data out to /tmp like we do
> with the stack trace now (/tmp/chef-stacktrace.out)? Maybe we can just make
> the output on a failure contain node attribute data + exception, i.e.
> /tmp/chef-failure.dump.
>
> Seth
> --
> Opscode, Inc.
> Seth Chisamore, Technical Evangelist
> IRC, Skype, Twitter, Github: schisamo
>
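For illustration, a rough sketch of what such a failure dump could look like - hypothetical code, not shipped Chef behavior, assuming JSON-serializable attribute data:

    require 'json'

    # Hypothetical: on failure, write node attribute data plus the exception
    # to /tmp/chef-failure.dump, alongside today's /tmp/chef-stacktrace.out.
    def run_with_failure_dump(node_attributes)
      yield  # the actual chef-client run
    rescue Exception => e
      File.open("/tmp/chef-failure.dump", "w") do |f|
        f.puts(JSON.pretty_generate(node_attributes)) # node attribute data
        f.puts("#{e.class}: #{e.message}")            # the exception
        f.puts(e.backtrace.join("\n"))                # stack trace, as today
      end
      raise
    end

    # usage: run_with_failure_dump("platform" => "ubuntu") { raise "converge failed" }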
> On Thu, Apr 14, 2011 at 5:42 PM, AJ Christensen <> wrote:
> > Yo!
> >
> > On 15 April 2011 10:36, Daniel DeLeo <> wrote:
> > > On Thursday, April 14, 2011 at 3:17 PM, Adam Jacob wrote:
> > >
> > > On Thu, Apr 14, 2011 at 5:09 PM, AJ Christensen <> wrote:
> > >
> > > We've had failed first-run (but attributes-saved) nodes show up in
> > > monitoring (via search), so I'm -1 on this being a bad behavior
> > > change; it certainly is a change of behavior, but with some testing,
> > > I'd totally support it.
> > >
> > > I would argue this is exactly what you want - those nodes in fact
> > > should be triggering alarms - the services they were supposed to be
> > > running (given that your intent at bootstrap time was to have a
> > > working system with that run list, not a non-existent one) now fail.
> > > You don't want the situation where the now-stranded systems are not
> > > included in your monitoring because of a failed bootstrap, do you?
> > >
> > > I think that's exactly what I want. In the case that the chef run will
> > > succeed, I don't want chef-client to run on the load balancer and pick
> > > up a node that isn't running the application yet, and I don't want the
> > > monitoring system to expect nodes to be running a service that hasn't
> > > been configured yet. I only want these things to happen after the
> > > machine is in a working state. Currently I have to do a dance to
> > > disable alerts in the NMS before they start paging people when I'm
> > > bringing up a new node.
> >
> > (we do the same dance; add the nagios_notifications_disabled role,
> > converge, bring up nodes, verify, remove the
> > nagios_notifications_disabled role)
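Such a role can be a few lines of Chef's role DSL - a hypothetical sketch, assuming a nagios cookbook that honors a notifications_enabled attribute:

    # roles/nagios_notifications_disabled.rb (hypothetical)
    name "nagios_notifications_disabled"
    description "Mute NMS notifications while a node is being brought up"
    default_attributes("nagios" => { "notifications_enabled" => false })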
> >
> > I replied off list, but have been thinking about a layer above chef
> > that our NMS confers with to determine good nodes to monitor.
> >
> > I too am part of the camp that doesn't want bad nodes discovered, and
> > this is one of those cases.
> >
> > > In the failure case, I'm either going to be keeping an eye on the
> > > nodes and look at why they failed, or if I'm creating a lot of them
> > > in an automated way, I'll automate recovery or notification for the
> > > failure case.
> > >
> > > I think this could be a regression too, cause I seem to recall a time
> > > where the attributes weren't saved to the node prior to the
> > > application of the resource collection.
> > >
> > > It is a regression - there was a time when we didn't do this, and we
> > > put this behavior in specifically for cases like the above.
> > >
> > > In addition, this is a common early work pattern - you're tweaking
> > > recipes, you're testing, and then you're building new systems from
> > > scratch. The change away from storing the data early makes that loop
> > > less intuitive (I've had 3 different people today comment on it.)
> > >
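To make the ordering concrete, a toy sketch of the two behaviors under discussion (illustrative only, not chef-client's actual code):

    # A toy in-memory stand-in for the server's node index.
    SERVER = {}

    def run_saving_early(node)
      SERVER[node["name"]] = node  # saved first: searchable even if converge fails
      yield                        # converge the resource collection; may raise
    end

    def run_saving_late(node)
      yield                        # converge first...
      SERVER[node["name"]] = node  # ...so only working nodes become searchable
    end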
> > > I've tweaked recipes and I've tested and I've built systems from
> > > scratch. What I have not done is inspected the attribute data on the
> > > server to debug recipes. Can you give an example of why this is a
> > > necessary or superior debugging technique? The log output has always
> > > been much more useful to me, and there's always the log resource if
> > > I need more.
> > >
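The log resource mentioned here is the lightweight option - a small recipe example, with a hypothetical myapp attribute:

    # In a recipe: surface attribute data in the chef-client log output
    log "dump app attributes" do
      message "myapp attrs: #{node[:myapp].inspect}"  # hypothetical attribute
      level :info
    end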
> > The only times I've done this are when we're having trouble validating
> > that the on-disk changes are taking effect; most times we'd just use
> > increased logging & throw debug statements around.
> >
> > > The fewer decision points diagnosing node-bootstrap-failure rings true.
> > >
> > > I feel like this is a red herring - if it brings you joy to include -j
> > > /etc/chef/first-boot.json every time, go for it. :)
> > >
> > > The issue isn't whether you enjoy using the -j flag, it's that the way
> > > things are currently, you sometimes need it and sometimes don't. For
> > > example, if I'm bootstrapping a node, and I forget to set the node_name
> > > and I don't have a valid FQDN yet, chef will fail when it attempts to
> > > determine the node name from the FQDN, *before* creating and saving
> > > the node. In this case, I have to use -j after I fix the problem,
> > > because I never successfully contacted the server. If chef-client
> > > fails after the initial save, I don't. The only thing that works for
> > > every case is to re-run with -j.
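As an aside, the FQDN-derivation failure described above is avoidable by pinning the name in the client config - a minimal sketch, with a hypothetical node name:

    # /etc/chef/client.rb - set node_name explicitly instead of deriving
    # it from the FQDN ("web01" is a hypothetical name)
    node_name "web01"

Once the configuration is fixed, re-running with chef-client -j /etc/chef/first-boot.json replays the first-boot attributes as discussed above.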
> > >
> > > Adam
> > >
> > > --
> > > Opscode, Inc.
> > > Adam Jacob, Chief Product Officer
> > > T: (206) 619-7151 E:
> > >
> > > --
> > > Daniel DeLeo
> > > Software Design Engineer
> > > Opscode, Inc.
> > >
--
Opscode, Inc.
Adam Jacob, Chief Product Officer
T: (206) 619-7151 E: