[chef] Re: Not detecting service start failure


Chronological Thread 
  • From: Yoshi Spendiff < >
  • To: chef < >
  • Subject: [chef] Re: Not detecting service start failure
  • Date: Wed, 13 May 2015 10:23:31 -0700

Happens in chef client 12.3 and 12.0.3

Is there a global ignore_failure true that can be/is set that could be on accidentally?

Has this behaviour changed in Chef 12+?

On Tue, May 12, 2015 at 1:25 PM, Yoshi Spendiff < " target="_blank"> > wrote:
Looks like it's not just the pound service either. I changed my recipe to remove all Listen entries for the ports.conf file in apache configuration (so they can be pushed out via vhost), which generates an error when attempting to start httpd, and httpd failed silently also.


On Tue, May 12, 2015 at 1:01 PM, Yoshi Spendiff < " target="_blank"> > wrote:
This is doing my nut in.

The redhat service provider is used, I can see that from the debug output:

  * service[pound] action start[2015-05-12T16:35:51+00:00] INFO: Processing service[pound] action start (indochino-role::web-server line 94)
[2015-05-12T16:35:51+00:00] DEBUG: providers for generic service resource enabled on node include: [Chef::Provider::Service::Init, Chef::Provider::Service::Redhat, Chef::Provider::Service::Upstart]
[2015-05-12T16:35:51+00:00] DEBUG: providers that refused resource service[pound] were: [Chef::Provider::Service::Upstart]
[2015-05-12T16:35:51+00:00] DEBUG: providers that support resource service[pound] include: [Chef::Provider::Service::Init, Chef::Provider::Service::Redhat]
[2015-05-12T16:35:51+00:00] DEBUG: providers that survived replacement include: [Chef::Provider::Service::Redhat]
[2015-05-12T16:35:51+00:00] DEBUG: service[pound] you have specified a status command, running..
pound              0:off    1:off    2:on    3:on    4:on    5:on    6:off
Starting Pound: starting...
[FAILED]
POUND INIT COMMAND EXITED WITH CODE OF 1
Starting Pound: starting...
[FAILED]
POUND INIT COMMAND EXITED WITH CODE OF 1
Starting Pound: starting...
[FAILED]
POUND INIT COMMAND EXITED WITH CODE OF 1
Starting Pound: starting...
[FAILED]
POUND INIT COMMAND EXITED WITH CODE OF 1
Starting Pound: starting...
[FAILED]
POUND INIT COMMAND EXITED WITH CODE OF 1
[2015-05-12T16:35:53+00:00] INFO: service[pound] started


I've tried to follow the stack as best I could, with my admittedly meager ruby grokking skills, and as far as I can work out an exception should be raised if the service fails to start.

So from what I can tell:
1. The redhat provider (Chef::Provider::Service::Init::Redhat) is loaded, which sets the following parameter @init_command = "/sbin/service ", which is correct
2. The service is loaded using load_current_resource in Chef::Provider::Service::Simple. The status is then verified either by a custom command if specified or the default using the default_init_script from the provider above.
3. If the condition for starting is met (i.e. the service is not running) and the start action is specified then the action_start function in Chef::Provider::Service is run, which adds the start_service function to convergence using 'converge_by'. The start_service function comes from either Chef::Provider::Service::Simple or Chef::Provider::Service::Init depending on if a custom start command is specified. Either way the command is sent to a shell out.
4. The start command is execute via the shell_out_with_systems_locale! function in Chef::Mixin::ShellOut which calls shell_out_with_systems_locale, which in turn calls shell_out_command, which in turn creates an instance of Mixlib::ShellOut to run the command
5. The .error! method of Mixlib::ShellOut is called against the instance after the command is run, which should generate a ShellCommandFailed exception with the message invalid!("Expected process to exit with #{valid_exit_codes.inspect}, but received '#{exitstatus}'") if error?

The last is the message I see if I run the recipe by chef-apply (and what I'm pretty sure I normally see when a service fails). Adding some hacky debugging to the chef code on a test instance I can see that Mixlib::ShellOut picks up the non-zero exit code, confirms it's not a valid exit code and I can see the error status is set to generate an error if the .error! method is called, which it is.
I can only assume that whatever in the convergence causes the service start command to loop 5 times also catches the exception and handles it somehow, I can't work that bit out.

This is the debugging fwiw (unsure why action_start and start_service aren't called the first time on the loop):
  * service[pound] action start[2015-05-12T19:35:34+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:34+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:34+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:34+00:00] INFO: action_start method called
[2015-05-12T19:35:34+00:00] INFO: start_service method called
[2015-05-12T19:35:34+00:00] INFO: shell_out_with_systems_locale! method called
[2015-05-12T19:35:34+00:00] INFO: shell_out_with_systems_locale method called
[2015-05-12T19:35:34+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:35+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:35+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! error status is true
[2015-05-12T19:35:35+00:00] INFO: start_service method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:35+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:35+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! error status is true
[2015-05-12T19:35:35+00:00] INFO: start_service method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:35+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:35+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! error status is true
[2015-05-12T19:35:35+00:00] INFO: start_service method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale! method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_with_systems_locale method called
[2015-05-12T19:35:35+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:36+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:36+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:36+00:00] INFO: shell_out_with_systems_locale! error status is true
[2015-05-12T19:35:36+00:00] INFO: start_service method called
[2015-05-12T19:35:36+00:00] INFO: shell_out_with_systems_locale! method called
[2015-05-12T19:35:36+00:00] INFO: shell_out_with_systems_locale method called
[2015-05-12T19:35:36+00:00] INFO: shell_out_command method called
[2015-05-12T19:35:36+00:00] INFO: ShellOut exit status was 1. Allowed exit codes are [0]
[2015-05-12T19:35:36+00:00] INFO: shell_out_command error status is true
[2015-05-12T19:35:36+00:00] INFO: shell_out_with_systems_locale! error status is true
[2015-05-12T19:35:36+00:00] INFO: service[pound] started

    - start service service[pound]


This is where I need someone to tell me it's either working as intended and I'm doing it wrong or it's a bug, because I'm outta talent.

On Tue, May 12, 2015 at 8:52 AM, Yoshi Spendiff < " target="_blank"> > wrote:
Same thing happens if I boot up an instance and use chef-client against a Chef server. Says it's starting, says it supports status, says it's running, it's not.

chef-apply with the same service declaration works correctly on that instance as well.

[2015-05-12T15:44:11+00:00] DEBUG: service[pound] supports status, running
[2015-05-12T15:44:13+00:00] INFO: service[pound] started


But running service pound status returns an exit code of 3 and there are no processes called 'pound'

Is Chef actually using the init scripts? I thought if supports :status => true was set then essentially service pound status was run, and if supports :status => false then Chef checks for a process name (based on the declaration name?). In both cases it should fail?

On Tue, May 12, 2015 at 8:13 AM, Yoshi Spendiff < " target="_blank"> > wrote:
The init script already includes /etc/rc.d/init.d/functions, which is the same thing.

Why is chef-apply working but not Chef solo?

On Mon, May 11, 2015 at 9:12 PM, Fouts, Chris < " target="_blank"> > wrote:
On top of the script in /etc/init.d that starts the service, try adding the following line

. /etc/init.d/function

Chris

Sent from my iPad using Mail+ for Outlook

From: Yoshi Spendiff
Sent: 5/11/15, 9:57 PM
To: chef
Subject: [chef] Not detecting service start failure

Hi,

I'm having trouble understanding why Chef isn't detecting that a service is failing to start. I'm attempting to get Pound (proxy) to work. The package installs fine and I have a service declaration that says the service is started when it's not. I'm running via chef solo at the moment (it's a vagrant box) and running a chef-solo run with debug logging produces the following:

[2015-05-12T01:45:50+00:00] DEBUG: service[pound] you have specified a status command, running..
pound is stopped
pound              0:off    1:off    2:on    3:on    4:on    5:on    6:off
Starting Pound: starting...
[FAILED]
Starting Pound: starting...
[FAILED]
Starting Pound: starting...
[FAILED]
Starting Pound: starting...
[FAILED]
Starting Pound: starting...
[FAILED]
[2015-05-12T01:45:52+00:00] INFO: service[pound] started
- start service service[pound]


This are my recipe declaration:


# Install pound and deploy configuration file from template
package 'Pound' do
  options '--enablerepo=epel'
end

template '/etc/pound.cfg' do
  source 'pound.cfg.erb'
  mode 0644
end

service 'pound' do
  supports :restart => false, :reload => false, :status => true
  action [:start, :enable]
end

I have tried using the init_command and status_command options, as well as disabling and enabling the supports :status => true option

If I put the above service declaration into a test file and run it with chef-apply then it fails as it should:
[2015-05-12T01:56:28+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: service[pound] ((chef-apply cookbook)::(chef-apply recipe) line 5) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /sbin/service pound start ----
STDOUT: Starting Pound: [FAILED]
STDERR: starting...
---- End output of /sbin/service pound start ----
Ran /sbin/service pound start returned 1


The pound init script is set up to run correctly, a non-zero exit code is returned on unsuccessful start of pound.

There are no processes called pound running on the system and there are no other service declarations called pound in all of my cookbooks.

What am I missing?

Cheers



--



--



--



--



--
Yoshi Spendiff
Ops Engineer
Indochino
Mobile: +1 778 952 2025



Archive powered by MHonArc 2.6.16.

§