[chef] Abusing Startup Handlers


  • From: oscar schneider
  • To: chef
  • Subject: [chef] Abusing Startup Handlers
  • Date: Wed, 28 Mar 2012 11:38:05 +0200

Hi,

I'm currently thinking about abusing the Chef startup handlers for more stability in a compute cluster.

We are running a compute cluster for scientific analysis with a resource management system where you have a master node responsible for scheduling analysis jobs to execution nodes. For each of the execution nodes I would like to do the following:

1. When they start a chef run, they send a "Don't send any new workloads to me" message to the master node.
2. When the chef run succeeds, they send an "Ok. I'm up for some more jobs now." message to the master node.

My goal is to have any node on which a chef run fails stay disabled in the resource management system. A failed chef run usually means the node is in a bad or undefined state, so new jobs scheduled there are likely to fail.

I could obviously have the node disabled with an execute resource in the very first recipe, but what if one of the ohai plugins gets stuck and chef never makes it to that first recipe?

Therefore I would like to disable the execution node at the very beginning of the chef run. Is this possible with the startup handlers?
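
To make the idea concrete, here is a rough sketch of the two handlers in Ruby. It is only an illustration under some assumptions: `scheduler-ctl` is a made-up placeholder for whatever command the resource management system really provides, and the class names `SchedulerToggle`, `DisableNodeHandler` and `EnableNodeHandler` are hypothetical. The classes subclass Chef::Handler only when Chef is loaded, so the logic can also be tried out in plain Ruby.

```ruby
# Hypothetical wrapper around the resource manager's command line.
# "scheduler-ctl" is a placeholder, not a real tool.
class SchedulerToggle
  def initialize(node_name, runner = ->(*args) { system(*args) })
    @node_name = node_name
    @runner = runner # injectable, so the commands can be faked in a dry run
  end

  # "Don't send any new workloads to me"
  def disable
    @runner.call("scheduler-ctl", "disable", @node_name)
  end

  # "Ok. I'm up for some more jobs now."
  def enable
    @runner.call("scheduler-ctl", "enable", @node_name)
  end
end

# Use Chef::Handler as the base class when Chef is loaded, plain Object otherwise.
HANDLER_BASE = defined?(Chef::Handler) ? Chef::Handler : Object

# Meant to be registered as a start handler: takes the node out of scheduling.
class DisableNodeHandler < HANDLER_BASE
  def initialize(toggle)
    @toggle = toggle
  end

  def report
    @toggle.disable
  end
end

# Meant to be registered as a report handler, which only runs after a
# successful chef run, so a failed run leaves the node disabled.
class EnableNodeHandler < HANDLER_BASE
  def initialize(toggle)
    @toggle = toggle
  end

  def report
    @toggle.enable
  end
end

# Assumed wiring in client.rb (not tested here):
#   toggle = SchedulerToggle.new("node01")
#   start_handlers  << DisableNodeHandler.new(toggle)
#   report_handlers << EnableNodeHandler.new(toggle)
```

Nothing is registered as an exception handler on purpose: if the run fails, the node simply keeps the disabled state set at the start.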

As I understand it, the startup handlers run after ohai. Wouldn't it be reasonable to run them before ohai starts, making the startup handlers the very first thing to run on a node?

Cheers,

Oscar


