[chef] Re: Re: Re: Re: Chef Server timeouts


  • From: Cassiano Leal
  • Subject: [chef] Re: Re: Re: Re: Chef Server timeouts
  • Date: Wed, 24 Jul 2013 18:18:08 -0300

OK, so I “solved” this issue by deleting all cookbooks from the server and then re-uploading them, carefully checking each one against the versions pinned in the environment as I went.


Some of them had versions lower than the one currently pinned, and something must have gone awry because of that, but there was no way I could pinpoint the problem by looking at the logs.


I really wish more information were logged. For example, when a node can’t find a particular cookbook *version*, it shouldn’t log that it couldn’t find *the cookbook* on the server. Printing the version it wants along with the name would go a long way here. :)
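
The kind of check described above, comparing environment pins against what is actually uploaded and naming both the cookbook and the wanted version, can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries stand in for data you would collect yourself, the cookbook names and versions are made up, `unsatisfied_pins` is a hypothetical helper rather than anything Chef provides, and the pessimistic `~>` operator is left out for brevity.

```python
import operator

# Comparison operators handled by this sketch ("~>" is deliberately left out).
OPS = {"=": operator.eq, ">": operator.gt, ">=": operator.ge,
       "<": operator.lt, "<=": operator.le}


def parse_version(text):
    """Turn 'x.y' or 'x.y.z' into a tuple of three ints, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, constraint):
    """Return True if a version string meets a constraint such as '= 1.2.0'."""
    op, _, wanted = constraint.strip().partition(" ")
    if op not in OPS:
        raise ValueError(f"unsupported constraint: {constraint!r}")
    return OPS[op](parse_version(version), parse_version(wanted))


def unsatisfied_pins(pins, uploaded):
    """List every pin that no uploaded version can satisfy.

    pins:     {cookbook name: constraint}, as pinned in an environment
    uploaded: {cookbook name: [version strings present on the server]}
    """
    problems = []
    for name, constraint in sorted(pins.items()):
        versions = uploaded.get(name, [])
        if not versions:
            problems.append(f"{name}: cookbook not uploaded (pinned {constraint})")
        elif not any(satisfies(v, constraint) for v in versions):
            problems.append(
                f"{name}: no uploaded version satisfies {constraint} "
                f"(uploaded: {', '.join(versions)})"
            )
    return problems


# Hypothetical example data, not taken from the thread.
pins = {"apt": ">= 2.0.0", "mysql": "= 3.0.2", "nginx": "= 1.7.0"}
uploaded = {"apt": ["2.0.0"], "nginx": ["1.4.0", "1.6.0"]}
for line in unsatisfied_pins(pins, uploaded):
    print(line)
```

Each reported line carries the cookbook name and the constraint it failed, which is the detail the server logs in this thread did not show.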


--
Cassiano Leal
http://cassianoleal.com
http://twitter.com/cassianoleal

On July 24, 2013 at 16:17:46, Cassiano Leal wrote:

I got new ones:


==> error.log <==

2013/07/24 16:15:56 [error] 859#0: *1771 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.1.137, server: ip-10-0-1-10.sa-east-1.compute.internal, request: "POST /environments/production/cookbook_versions HTTP/1.1", upstream: "http://127.0.0.1:8000/environments/production/cookbook_versions", host: "10.0.1.10:443"


==> access.log <==

10.0.1.137 - - [24/Jul/2013:16:15:56 -0300]  "POST /environments/production/cookbook_versions HTTP/1.1" 504 "300.113" 182 "-" "Chef Client/11.4.4 (ruby-1.9.3-p286; ohai-6.16.0; x86_64-linux; +http://opscode.com)" "127.0.0.1:8000" "504" "300.065" "11.4.4" "algorithm=sha1;version=1.0;" "apps" "2013-07-24T19:11:11Z" "FkL+xFBHbcjWA94iv8c+Izoud/w=" 1059


Another thing I noticed is that beam.smp is consuming 100% of a CPU core, and it’s been that way for hours.


--
Cassiano Leal
http://cassianoleal.com
http://twitter.com/cassianoleal

On July 24, 2013 at 15:46:40, Stephen Delano wrote:

Hi there,

Can you find the corresponding POST request in /var/log/chef-server/nginx/access.log? The post will be to "/environments/production/cookbook_versions" and might be able to shed some more light on where the timeout occurred (e.g. connection timeout vs. read timeout).
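
The lookup suggested above could be scripted roughly as follows. This is a minimal sketch that assumes the access.log layout matches the single line quoted elsewhere in this thread; reading the quoted number after the status code as a request time in seconds is also an assumption, and `find_requests` is a hypothetical helper name.

```python
import re

# Assumed layout, inferred from the one access.log line quoted in this thread:
# client - - [time]  "METHOD path protocol" status "request_time" ...
ACCESS_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\]\s+'
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) "(?P<request_time>[\d.]+)"'
)


def find_requests(lines, method, path):
    """Return parsed entries whose method and path match exactly."""
    hits = []
    for line in lines:
        m = ACCESS_RE.match(line)
        if m and m.group("method") == method and m.group("path") == path:
            hit = m.groupdict()
            hit["status"] = int(hit["status"])
            hit["request_time"] = float(hit["request_time"])
            hits.append(hit)
    return hits


# Leading part of the access.log line from this thread (remaining fields omitted).
SAMPLE = (
    '10.0.1.137 - - [24/Jul/2013:16:15:56 -0300]  '
    '"POST /environments/production/cookbook_versions HTTP/1.1" 504 "300.113" 182 "-"'
)

for hit in find_requests([SAMPLE], "POST", "/environments/production/cookbook_versions"):
    print(hit["time"], hit["client"], hit["status"], hit["request_time"])
```

To run it against the real file, pass an open handle on /var/log/chef-server/nginx/access.log instead of the sample list.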

There is an outstanding bug in the depsolver erlang library that has been causing CPU hangs on the chef server: http://tickets.opscode.com/browse/CHEF-3921

Cheers!
Stephen


I ran chef-server-ctl test and got these failures:


http://pastie.org/private/6jqnpte37herj8jyomhmq


Any idea what’s happening? Some indexing gone wrong maybe?



On July 24, 2013 at 14:16:41, Cassiano Leal wrote:

The only things logged for erchef are:


2013-07-24T17:08:09Z INFO req_id=x+AfrOJ1AUtORncdS9mIfg==; status=200; method=GET; path=/nodes/apps; user=apps; msg=[]; req_time=504; rdbms_time=422; rdbms_count=2

2013-07-24T17:08:09Z INFO req_id=4QrUsBd/rJ77XlBB2hWogw==; status=200; method=GET; path=/roles/api; user=apps; msg=[]; req_time=479; rdbms_time=373; rdbms_count=2

2013-07-24T17:08:10Z INFO req_id=55zYGCExTCxAYJAaXdGMGA==; status=200; method=GET; path=/roles/manager; user=apps; msg=[]; req_time=488; rdbms_time=356; rdbms_count=2

2013-07-24T17:08:10Z INFO req_id=a8NzqLLMByrRSPSdbvazQw==; status=200; method=GET; path=/roles/web; user=apps; msg=[]; req_time=476; rdbms_time=349; rdbms_count=2

2013-07-24T17:08:11Z INFO req_id=Ec3C1KgepqDadQu/WJpTPA==; status=200; method=GET; path=/environments/production; user=apps; msg=[]; req_time=469; rdbms_time=342; rdbms_count=2


I’m not sure how this is helpful.
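
One way to get something out of these lines is to split the `key=value` pairs and compare `req_time` with `rdbms_time`. The sketch below assumes every erchef line carries a log level followed by `; `-separated pairs, as in the five lines above; the units of the timing fields are not stated in the thread, and `parse_erchef_line` is a hypothetical helper name. Note that all five lines are successful GETs, so the POST to /environments/production/cookbook_versions that nginx timed out on does not appear among them.

```python
import re

# Matches the log level followed by the "key=value; key=value" payload.
PAYLOAD_RE = re.compile(r"\b([A-Z]+) (\w+=.*)$")


def parse_erchef_line(line):
    """Return the fields of one erchef log line as a dict, or None if no match."""
    m = PAYLOAD_RE.search(line.strip())
    if m is None:
        return None
    level, payload = m.groups()
    fields = {"level": level}
    for pair in payload.split("; "):
        # Split on the first "=" only, since req_id values end in "==".
        key, _, value = pair.partition("=")
        fields[key] = value
    return fields


# First erchef line from this thread.
SAMPLE = (
    "2013-07-24T17:08:09Z INFO req_id=x+AfrOJ1AUtORncdS9mIfg==; status=200; "
    "method=GET; path=/nodes/apps; user=apps; msg=[]; req_time=504; "
    "rdbms_time=422; rdbms_count=2"
)

fields = parse_erchef_line(SAMPLE)
total = int(fields["req_time"])
database = int(fields["rdbms_time"])
print(fields["method"], fields["path"], "total:", total,
      "database:", database, "other:", total - database)
```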



On July 24, 2013 at 12:42:19, Chris wrote:

BTW, port 8000 should be the erChef process


On Wed, Jul 24, 2013 at 8:41 AM, Chris wrote:
I've run into this too. I just migrated to OSC 11 on Monday and less than 24 hrs later had this problem. It only happened once though. I don't have any more info on it either.


On Wed, Jul 24, 2013 at 8:34 AM, Cassiano Leal wrote:

Hi,


I’m having an issue with my OSS Chef Server. Clients time out while trying to connect to it.


I found this in /var/log/chef-server/nginx/error.log:


2013/07/24 12:29:44 [error] 859#0: *30 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 10.0.1.160, server: ip-10-0-1-10.sa-east-1.compute.internal, request: "POST /environments/production/cookbook_versions HTTP/1.1", upstream: "http://127.0.0.1:8000/environments/production/cookbook_versions", host: "10.0.1.10:443"


My guess is that the problem is in the “upstream” service that runs on port 8000, but I’m not sure which service that is. Where do I look?
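
The address nginx was proxying to can be read straight out of the error line. The following is a minimal sketch, assuming the `upstream: "..."` field always has the form shown in that line; `upstream_of` is a hypothetical helper name. It only extracts the host and port; identifying the process that listens there still needs a tool such as netstat or lsof on the server.

```python
import re
from urllib.parse import urlsplit


def upstream_of(error_line):
    """Return (host, port, path) of the upstream named in an nginx error line."""
    m = re.search(r'upstream: "([^"]+)"', error_line)
    if m is None:
        return None
    url = urlsplit(m.group(1))
    return url.hostname, url.port, url.path


# The error.log line from this thread.
SAMPLE = (
    '2013/07/24 12:29:44 [error] 859#0: *30 upstream timed out '
    '(110: Connection timed out) while reading response header from upstream, '
    'client: 10.0.1.160, server: ip-10-0-1-10.sa-east-1.compute.internal, '
    'request: "POST /environments/production/cookbook_versions HTTP/1.1", '
    'upstream: "http://127.0.0.1:8000/environments/production/cookbook_versions", '
    'host: "10.0.1.10:443"'
)

print(upstream_of(SAMPLE))
```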


Cheers,




--
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.





