[chef] Re: chef-server 11 hung forever until chef-server-ctl restart


Chronological Thread 
  • From: Hui Hu < >
  • To:
  • Subject: [chef] Re: chef-server 11 hung forever until chef-server-ctl restart
  • Date: Mon, 18 Nov 2013 21:13:24 +0800

the CPU and Mem looks normal on my chef-server

top - 13:08:29 up  4:12,  3 users,  load average: 0.01, 0.02, 0.00
Tasks: 203 total,   1 running, 202 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni,100.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   8174384k total,  3595892k used,  4578492k free,    64628k buffers
Swap:  2096472k total,        0k used,  2096472k free,  1435928k cached

The strange thing is all my cookbook and roles are uploaded successfully, but later on the erchef hungs. Not sure erchef hungs before or after my chef-client tries to synchronizing cookbook.

The rabbitmq (5672) and bookshelf (4321) is responding, but erchef (8000) hungs.  Sometimes bookshelf hung too.  My issue should not be the same as https://tickets.opscode.com/browse/CHEF-3921 , all 'depend' in metadata.rb of my cookbook doesn't have version constraints.

wget http://127.0.0.1:8000/
--2013-11-18 13:08:45--  http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...

wget http://127.0.0.1:5672/
--2013-11-18 13:08:58--  http://127.0.0.1:5672/
Connecting to 127.0.0.1:5672... connected.
HTTP request sent, awaiting response... 200 No headers, assuming HTTP/0.9
Length: unspecified
Saving to: `index.html'

wget http://127.0.0.1:4321/bookshelf/
--2013-11-18 13:09:44--  http://127.0.0.1:4321/bookshelf/
Connecting to 127.0.0.1:4321... connected.
HTTP request sent, awaiting response... 403 Forbidden
2013-11-18 13:09:44 ERROR 403: Forbidden.

Thanks
Jesse Hu,  Project Serengeti


2013/11/18 Hui Hu < " target="_blank"> >
Hi guys,

I'm using chef-server-11.0.8-1.el5 on CentOS 5. My chef-server hungs forever after OS bootup.
Thanks in advance if any suggestion.

Below is my env:

tailf /etc/inittab :

SV:123456:respawn:/opt/chef-server/embedded/bin/runsvdir-start
SE:123456:once:/usr/sbin/mybootup.sh

/usr/sbin/mybootup.sh will run 'chef-server-ctl reconfigure' on every system boot and upload some cookbooks and roles.

My issue sometimes the chef-server hungs after /usr/sbin/mybootup.sh completes. Sometimes erchef hungs, sometimes bookshelf hungs.  This occurs seldom.

e.g. This command will hung forever:

wget http://127.0.0.1:8000/
--2013-11-18 12:41:53--  http://127.0.0.1:8000/
Connecting to 127.0.0.1:8000... connected.
HTTP request sent, awaiting response...

After run' chef-server-ctl restart' , the chef-server works well.

The chef logs doesn't show any error.   BTW how to set log level of erchef and bookshelf to debug ?

sudo chef-server-ctl tail
==> /var/log/chef-server/erchef/erchef.log.idx <==

==> /var/log/chef-server/erchef/erchef.log.siz <==
 
==> /var/log/chef-server/erchef/current <==
2013-11-18_08:57:40.35284 Loaded key default of type RSAPrivateKey from file /etc/chef-server/chef-webui.pem
2013-11-18_08:57:40.37516
2013-11-18_08:57:40.37517 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37517 starting chef_index_sup
2013-11-18_08:57:40.37575
2013-11-18_08:57:40.37576 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.37577 Connecting to Rabbit at 127.0.0.1:5672/chef (exchange: <<>>)
2013-11-18_08:57:40.50012
2013-11-18_08:57:40.50013 =INFO REPORT==== 18-Nov-2013::08:57:40 ===
2013-11-18_08:57:40.50014 Negotiated maximums: (Channel = 0, Frame = 131072, Heartbeat = 0)

==> /var/log/chef-server/erchef/erchef.log.1 <==
2013-11-18T08:59:28Z " target="_blank"> INFO req_id=urcqSgbispND1AVfpvbeUA==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_jobtracker">>}; req_time=9; rdbms_time=4; rdbms_count=3
2013-11-18T08:59:32Z " target="_blank"> INFO req_id=n6nXCO3iagR34yZ388D9TA==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_zookeeper">>}; req_time=9; rdbms_time=3; rdbms_count=3
2013-11-18T08:59:32Z " target="_blank"> INFO req_id=h2OZJpyk2I4UOwgfE2hweg==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_hbase_client">>}; req_time=29; rdbms_time=5; rdbms_count=3
2013-11-18T08:59:32Z " target="_blank"> INFO req_id=W7tUM5z5HqUeD4RgphehVg==; status=201; method=POST; path=/roles; user=myuser; msg={created,<<"mapr_hbase_regionserver">>}; req_time=8; rdbms_time=3; rdbms_count=3

==> /var/log/chef-server/chef-expander/current <==
role[0000000000005295a31d2e5d5ee87dcc] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00047397613525390625,0.012217044830322266 |
2013-11-18_08:59:32.52864 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[000000000000ea2b2572dfa22ca1af0c] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.0004630088806152344,0.03265976905822754 |
2013-11-18_08:59:32.54012 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[0000000000006d8e8b23feacd9b38599] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00034499168395996094,0.017193078994750977 |
2013-11-18_08:59:32.91701 [Mon, 18 Nov 2013 08:59:32 +0000] INFO: indexed role[000000000000f9d444c1a03859122ec6] database[chef_00000000000000000000000000000000] transit,xml,solr-post | 0,0.00034499168395996094,0.00807332992553711 |

==> /var/log/chef-server/chef-solr/current <==
2013-11-18_08:59:35.58380 Nov 18, 2013 8:59:35 AM org.apache.solr.core.QuerySenderListener newSearcher
2013-11-18_08:59:35.58381 INFO: QuerySenderListener done.
2013-11-18_08:59:35.58469 Nov 18, 2013 8:59:35 AM org.apache.solr.core.SolrCore registerSearcher
2013-11-18_08:59:35.58470 INFO: [] Registered new searcher main
2013-11-18_08:59:35.58566 Nov 18, 2013 8:59:35 AM org.apache.solr.search.SolrIndexSearcher close
2013-11-18_08:59:35.58567 INFO: Closing main
2013-11-18_08:59:35.58567     fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568     filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
2013-11-18_08:59:35.58568     queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}
2013-11-18_08:59:35.58569     documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

==> /var/log/chef-server/rabbitmq/current <==
2013-11-18_08:56:49.33375 starting exchange, queue and binding recovery                         ...done
2013-11-18_08:56:49.34651 starting mirror queue slave sup                                       ...done
2013-11-18_08:56:49.34700 starting adding mirrors to queues                                     ...done
2013-11-18_08:56:49.34751 -- message delivery logic ready
2013-11-18_08:56:49.34767 starting error log relay                                              ...done
2013-11-18_08:56:49.34852 starting networking                                                   ...done
2013-11-18_08:56:49.37482 starting direct_client                                                ...done
2013-11-18_08:56:49.37526 starting notify cluster nodes                                         ...done
2013-11-18_08:56:49.37544
2013-11-18_08:56:49.37545 broker running

==> /var/log/chef-server/bookshelf/current <==
2013-11-18_09:54:37.84878         badarg}}
2013-11-18_10:23:23.94760
2013-11-18_10:23:23.94763 =ERROR REPORT==== 18-Nov-2013::10:23:23 ===
2013-11-18_10:23:23.94763 {error,{{mini_s3,make_signed_url_authorization},
2013-11-18_10:23:23.94764         [<<"SECRETKEY">>,"get","/",undefined,
2013-11-18_10:23:23.94764          [{'Accept',"*/*"},
2013-11-18_10:23:23.94764           {'Connection',"Keep-Alive"},
2013-11-18_10:23:23.94765           {'Host',"127.0.0.1:4321"},
2013-11-18_10:23:23.94765           {'User-Agent',"Wget/1.11.4 Red Hat modified"}]],
2013-11-18_10:23:23.94765         badarg}}

==> /var/log/chef-server/postgresql/current <==
2013-11-18_08:57:00.82999 LOG:  database system was shut down at 2013-11-18 08:56:57 GMT
2013-11-18_08:57:00.84494 LOG:  autovacuum launcher started
2013-11-18_08:57:00.84620 LOG:  database system is ready to accept connections

==> /var/log/chef-server/nginx/rewrite-port-9080.log <==

==> /var/log/chef-server/nginx/access.log <==
10.111.88.50 - - [18/Nov/2013:10:00:04 +0000]  "GET //cookbooks?num_versions=1 HTTP/1.1" 504 "300.013" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.006" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T09:55:04Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 973
10.111.88.50 - - [18/Nov/2013:10:00:15 +0000]  "GET /roles HTTP/1.1" 504 "300.020" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.014" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T09:55:15Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 953
10.111.88.50 - - [18/Nov/2013:11:55:12 +0000]  "GET /roles HTTP/1.1" 504 "300.007" 182 "-" "Chef Knife/11.4.0 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.002" "11.4.0" "algorithm=sha1;version=1.0;" "myuser" "2013-11-18T11:50:12Z" "2jmj7l5rSw0yVb/vlWAYkK/YBwk=" 953

==> /var/log/chef-server/nginx/current <==

==> /var/log/chef-server/nginx/error.log <==
2013/11/18 09:34:17 [error] 4557#0: *1041 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.111.88.50, server: <hostname>, request: "GET /search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000 HTTP/1.1", upstream: "http://127.0.0.1:8000/search/node?q=cluster_name:test&sort=X_CHEF_id_CHEF_X%20asc&start=0&rows=1000", host: "<hostname>:9443"
2013/11/18 09:39:14 [error] 4557#0: *1043 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.111.88.50, server: <hostname>, request: "GET /roles HTTP/1.1", upstream: "http://127.0.0.1:8000/roles", host: "<hostname>:9443"

Thanks
Jesse Hu




Archive powered by MHonArc 2.6.16.

§