[chef] chef-server 11 hung forever until chef-server-ctl restart


Chronological Thread 
  • From: Hui Hu < >
  • To:
  • Subject: [chef] chef-server 11 hung forever until chef-server-ctl restart
  • Date: Mon, 18 Nov 2013 20:48:50 +0800

Hi guys,

I'm using chef-server-11.0.8-1.el5 on CentOS 5. My chef-server hungs forever after OS bootup.
Thanks in advance if any suggestion.

Below is my env:

tailf /etc/inittab :

SV:123456:respawn:/opt/chef-server/embedded/bin/runsvdir-start
SE:123456:once:/usr/sbin/mybootup.sh

/usr/sbin/mybootup.sh will run 'chef-server-ctl reconfigure' on every system boot and upload some cookbooks and roles.

My issue sometimes the chef-server hungs after /usr/sbin/mybootup.sh completes. Sometimes erchef hungs, sometimes bookshelf hungs.  This occurs seldom.

e.g. This command will hung forever:

wget http://127.0.0.1:8000/
--2013-11-18 12:41:53--  http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...

After run' chef-server-ctl restart' , the chef-server works well.

The chef logs doesn't show any error.   BTW how to set log level of erchef and bookshelf to debug ?

sudo chef-server-ctl tail
==> /var/log/chef-server/erchef/erchef.log.idx <==

==> /var/log/chef-server/erchef/erchef.log.siz <==
 
==> /var/log/chef-server/erchef/current <==
2013-11-18_08:57:40.35284 Loaded key default of type RSAPrivateKey from file /etc/chef-server/chef-webui.pem
2013-11-18_08:57:40.37516
2013-11-18_08:57:40.37517 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37517 starting chef_index_sup
2013-11-18_08:57:40.37575
2013-11-18_08:57:40.37576 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37577 Connecting to Rabbit at 127.0.0.1:5672/chef (exchange: <<>>)
2013-11-18_08:57:40.50012
2013-11-18_08:57:40.50013 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.50014 Negotiated maximums: (Channel = 0, Frame = 131072, Heartbeat = 0)

==> /var/log/chef-server/erchef/erchef.log.1 <==
2013-11-18T08:59:28Z "> INFO req_id=urcqSgbispND1AVfpvbeUA==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_jobtracker">>}; req_time=9; rdbms_time=4; rdbms_count=3
2013-11-18T08:59:32Z "> INFO req_id=n6nXCO3iagR34yZ388D9TA==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_zookeeper">>}; req_time=9; rdbms_time=3; rdbms_count=3
2013-11-18T08:59:32Z "> INFO req_id=h2OZJpyk2I4UOwgfE2hweg==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_hbase_client">>}; req_time=29; rdbms_time=5; rdbms_count=3
2013-11-18T08:59:32Z "> INFO req_id=W7tUM5z5HqUeD4RgphehVg==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_hbase_regionserver">>}; req_time=8; rdbms_time=3; rdbms_count=3

==> /var/log/chef-server/chef-expander/current <==
role[0000000000005295a31d2e5d5ee87dcc] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00047397613525390625,0.012217044830322266 |
2013-11-18_08:59:32.52864 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[000000000000ea2b2572dfa22ca1af0c] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.0004630088806152344,0.03265976905822754 |
2013-11-18_08:59:32.54012 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[0000000000006d8e8b23feacd9b38599] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00034499168395996094,0.017193078994750977 |
2013-11-18_08:59:32.91701 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[000000000000f9d444c1a03859122ec6] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00034499168395996094,0.00807332992553711 |

==> /var/log/chef-server/chef-solr/current <==
2013-11-18_08:59:35.58380 Nov 18, 2013 8:59:35 AM org.apache.solr.core.QuerySenderListener newSearcher
2013-11-18_08:59:35.58381 INFO: QuerySenderListener done.
2013-11-18_08:59:35.58469 Nov 18, 2013 8:59:35 AM org.apache.solr.core.SolrCore registerSearcher
2013-11-18_08:59:35.58470 INFO: [] Registered new searcher main
2013-11-18_08:59:35.58566 Nov 18, 2013 8:59:35 AM org.apache.solr.search.SolrIndexSearcher close
2013-11-18_08:59:35.58567 INFO: Closing main
2013-11-18_08:59:35.58567     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}
2013-11-18_08:59:35.58569     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

==> /var/log/chef-server/rabbitmq/current <==
2013-11-18_08:56:49.33375 starting exchange, queue and binding recovery                         ...done
2013-11-18_08:56:49.34651 starting mirror queue slave sup                                       ...done
2013-11-18_08:56:49.34700 starting adding mirrors to queues                                     ...done
2013-11-18_08:56:49.34751 -- message delivery logic ready
2013-11-18_08:56:49.34767 starting error log relay                                              ...done
2013-11-18_08:56:49.34852 starting networking                                                   ...done
2013-11-18_08:56:49.37482 starting direct_client                                                ...done
2013-11-18_08:56:49.37526 starting notify cluster nodes                                         ...done
2013-11-18_08:56:49.37544
2013-11-18_08:56:49.37545 broker running

==> /var/log/chef-server/bookshelf/current <==
2013-11-18_09:54:37.84878         badarg}}
2013-11-18_10:23:23.94760
2013-11-18_10:23:23.94763 =ERROR REPORT==== 18-Nov-2013::10:23:23 ===
2013-11-18_10:23:23.94763 {error,{{mini_s3,make_signed_url_authorization},
2013-11-18_10:23:23.94764         [<<"SECRETKEY">>,"get","/",undefined,
2013-11-18_10:23:23.94764          [{'Accept',"*/*"},
2013-11-18_10:23:23.94764           {'Connection',"Keep-Alive"},
2013-11-18_10:23:23.94765           {'Host',"127.0.0.1:4321"},
2013-11-18_10:23:23.94765           {'User-Agent',"Wget/1.11.4 Red Hat modified"}]],
2013-11-18_10:23:23.94765         badarg}}

==> /var/log/chef-server/postgresql/current <==
2013-11-18_08:57:00.82999 LOG:  database system was shut down at 2013-11-18 08:56:57 GMT
2013-11-18_08:57:00.84494 LOG:  autovacuum launcher started
2013-11-18_08:57:00.84620 LOG:  database system is ready to accept connections

==> /var/log/chef-server/nginx/rewrite-port-9080.log <==

==> /var/log/chef-server/nginx/access.log <==
10.111.88.50 - - [18/Nov/2013:10:00:04 +0000]  "GET //cookbooks?num_versions=1 HTTP/1.1" 504 "300.013" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.006" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T09:55:04Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 973
10.111.88.50 - - [18/Nov/2013:10:00:15 +0000]  "GET /roles HTTP/1.1" 504 "300.020" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.014" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T09:55:15Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 953
10.111.88.50 - - [18/Nov/2013:11:55:12 +0000]  "GET /roles HTTP/1.1" 504 "300.007" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.002" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T11:50:12Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 953

==> /var/log/chef-server/nginx/current <==

==> /var/log/chef-server/nginx/error.log <==
2013/11/18 09:34:17 [error] 4557#0: *1041 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.111.88.50, server: <hostname>, request: "GET /search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000 HTTP/1.1", upstream: "http://127.0.0.1:8000/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000", host: "<hostname>:9443"
2013/11/18 09:39:14 [error] 4557#0: *1043 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.111.88.50, server: <hostname>, request: "GET /roles HTTP/1.1", upstream: "http://127.0.0.1:8000/roles", host: "<hostname>:9443"

Thanks
Jesse Hu



Archive powered by MHonArc 2.6.16.

§