Hi,
You can't do this easily with Chef, at least not with standard Chef components and within a single chef run. Chef cannot query or notify remote nodes in real time. So far I have used two approaches, with different levels of success, for setting up multi-node clusters/systems that require certain steps to happen in a certain order and that cannot be converged completely without the presence of certain other nodes.
1) Keep all the installation and setup logic separate from the core cluster config recipe, i.e. everything except the bare minimum config required to start the individual service. Install all of that in one phase, in parallel. The first-phase run lists should leave some footprint via node attributes.
In the second phase, alter the run lists of the nodes and add the config recipe (which should also start the main service). Invoke the chef runs in exactly the same order as you would when doing it manually. The config recipes should use the attributes (and searches based on them) to figure things out. Once both phases are working, you can shorten the chef run interval to set things up faster (I prefer a chef run every 5 minutes, at least for the first couple of runs).
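To make the second phase a bit more concrete, here is a rough sketch in plain Ruby of the kind of check a config recipe could do on the footprints left by phase one. The attribute names ('mycluster', 'installed', 'fqdn') and both helper names are made up for illustration; in a real recipe the list would come from a `search(:node, ...)` call rather than hard-coded hashes.

```ruby
# Hypothetical helpers for a phase-two config recipe. Plain hashes stand in
# for node objects so the logic can be run on its own, outside of Chef.

# Return the sorted hostnames of all nodes that left the phase-one footprint.
def installed_peers(nodes)
  nodes
    .select { |n| n.fetch('mycluster', {})['installed'] == true }
    .map { |n| n['fqdn'] }
    .compact
    .sort
end

# Only configure and start the service once enough peers are present.
def ready_to_configure?(nodes, required_count)
  installed_peers(nodes).size >= required_count
end

nodes = [
  { 'fqdn' => 'db2.example.com', 'mycluster' => { 'installed' => true } },
  { 'fqdn' => 'db1.example.com', 'mycluster' => { 'installed' => true } },
  { 'fqdn' => 'db3.example.com' } # phase one has not finished here yet
]

puts installed_peers(nodes).inspect
puts ready_to_configure?(nodes, 3)
```

If the check fails, the recipe simply skips the cluster config and the next chef run tries again, which is why short run intervals help during bootstrap.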
2) You can also use something like Flock of Chefs (note: it's highly experimental and requires Ruby 1.9, Celluloid etc.) to do remote notifications. This is far more convenient, but also more complex, and errors can be difficult to debug. But you'll be able to do everything from within the Chef recipes, i.e. you can set up the nodes and keep all the services stopped until the first dependency is resolved; the first dependency can then remote-notify the second, the second can remote-notify the third, and so on.
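The chaining idea can be illustrated with a small in-process simulation in plain Ruby. This is not the Flock of Chefs API (the `SimNode` class and its methods are invented for this sketch); it only shows the ordering you get when every service stays stopped until the node it depends on comes up and notifies it.

```ruby
# In-process stand-in for a node; NOT the Flock of Chefs API.
class SimNode
  attr_reader :name, :running

  def initialize(name, log)
    @name = name
    @log = log
    @running = false
    @dependents = []
  end

  # Register a node that should be started once this one is up.
  def notifies(other)
    @dependents << other
    self
  end

  # Start the service (at most once), then notify the dependents.
  def start
    return if @running

    @running = true
    @log << @name
    @dependents.each(&:start)
  end
end

log = []
primary = SimNode.new('db-primary', log)
replica = SimNode.new('db-replica', log)
app     = SimNode.new('app', log)

primary.notifies(replica)
replica.notifies(app)

primary.start
puts log.join(' -> ') # prints "db-primary -> db-replica -> app"
```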
Optionally, you can also use an external real-time dispatching system such as Ansible or MCollective (mco) together with the first approach.
My workflows are derived from build server setups (like setting up Jenkins, Go or TeamCity farms), which have similar characteristics to persistence-layer cluster solutions (like MySQL replication, MongoDB replica sets, Cassandra clusters etc.). I am now working more on failover, which has similar challenges, but the solution requires much faster reconfiguration. I have learned that the workflow described above does not work for this. So if you need to reconfigure your system within a few seconds (say, in case of failover), this won't work. Otherwise, if you can bear some delay, this is the simplest solution I could think of.
Best,
ranjib