On Fri, Aug 1, 2014 at 4:28 AM, Jesse Hu wrote:
> Hi Steven,
>
> I found that erchef['depsolver_worker_count'] is listed on http://docs.getchef.com/config_rb_chef_server_enterprise_optional_settings.html, but not on http://docs.getchef.com/config_rb_chef_server_optional_settings.html.
> Does this mean depsolver_worker_count is not applicable to Open Source Chef Server 11?
> If it works for Open Source Chef Server 11, how can I tell how many depsolver workers are running after I set it in /etc/chef-server/chef-server.rb?

Yes, this works in recent versions of Open Source Chef Server 11 (11.1+) as well, and the docs should be updated to reflect this. A quick way to verify the number of depsolver workers that are running is to execute `pgrep -fl depselector`. You'll see the Ruby processes that are waiting to receive cookbook dependency problems to solve.

> And erchef['depsolver_timeout'] = 5000 means the server-side timeout when resolving cookbook dependencies? What's the chef-client-side timeout when syncing cookbooks, and how can I increase it?

Correct: 5000 ms is the time that erchef gives depselector/gecode to reach a solution given the current cookbook universe, run list, and environment constraints.

What do you mean by "client-side timeouts"? What timeouts are you seeing?
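As a concrete sketch of the settings discussed above (the worker count of 8 is an illustrative value, not a recommendation; tune it to your CPU count):

```ruby
# /etc/chef-server/chef-server.rb -- Open Source Chef Server 11 (11.1+)
# Illustrative values only.
erchef['depsolver_worker_count'] = 8   # workers available for dependency solving
erchef['depsolver_timeout']      = 5000 # ms erchef gives depselector/gecode per request
```

After a `chef-server-ctl reconfigure`, `pgrep -fl depselector | wc -l` should report a process count matching the configured worker count.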
> Thanks
> Jesse Hu
> Steven Danna wrote on Thu, 31 Jul 2014 09:25:22 +0100:
>
> Hi,
>
> On Thu, Jul 31, 2014 at 6:05 AM, Jesse Hu wrote:
>
> <jesse> If I add more CPU cores, will increasing erchef['depsolver_worker_count'] and nginx['worker_processes'] solve the 503 issue? Are both of these attributes' default values related to the number of CPU cores?
>
> nginx['worker_processes'] should automatically scale based on CPU count if you run a chef-server-ctl reconfigure. You will, however, need to manually bump the depsolver_worker_count.
>
> <jesse> Yes, the chef-client exits after getting a 503. How can I tell the chef-client to retry syncing cookbooks or calling the Chef Node or Search APIs? Is there a setting in /etc/chef/client.rb? If not, I may re-spawn chef-client when getting a 503 error.
>
> By default, any API calls should be retried if they receive an HTTP response code in the 500-599 range. However, I believe that the calls that actually download the cookbook content from s3/bookshelf are not retried. I suspected depsolver because of the server-side error messages from erchef, but perhaps you are seeing other errors as well that just didn't show up in your tail command on the server. It's possible that bookshelf or nginx is also failing.
>
> For the server, the Chef support team wrote a handy script that I've uploaded to a gist here: https://gist.githubusercontent.com/stevendanna/279658e5fb3961f4b347/raw/1bf2afae25a05a0b3699ded6cb80139fa6250046/gistfile1.txt
> If you run it with the argument OSC, it will create a tarball of the last bits of all of your log files. So if you generate some failures and then run it, it should contain all the easily reachable evidence of what is going on. If you'd like, you could email me (not the whole list) the tarball and I can try to take a look. It would also be helpful if you sent along an example of a failing chef-client run with debug logging turned on.
>
> <jesse> Is there a limit for the number of concurrent chef clients served by a single Chef Server 11? The depsolver worker count seems to be a blocker, unless I give it more CPU cores.
>
> For any system there is a limit to how many concurrent connections it can handle based on the available resources (CPU, RAM, etc.) and various limits imposed by the operating system (max file handles, max processes, etc.). Since EC has a number of components, it is hard to say what the hard limit is for a given machine. Depsolving is limited by the number of depsolver workers, but in most cases the tasks the workers handle complete quickly. Further, chef-client *should* retry 503s when depsolving, so the number of concurrent /chef-client runs/ that can be handled typically far exceeds the number of depsolver workers. However, at some point, you do just need to feed depsolver more CPUs.
>
> <jesse> I tried 'no_lazy_load false' in my 300-node cluster; it still threw a 503 error. I see there is a commit from 7 days ago that makes no_lazy_load the default. As it describes, my chef-client might run for a long time (1-2 hours), so I think I'd better set no_lazy_load to true.
>
> Since you are running your own server, however, you can set your s3_url_ttl to anything you want, avoiding the problem of expired links. That said, I think it is probably best to definitively identify where your bottleneck is before tweaking more settings.
>
> Cheers,
> Steven
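To make the retry and URL-expiry knobs from the exchange above concrete, here is a hedged sketch. `http_retry_count`, `http_retry_delay`, and `no_lazy_load` are standard client.rb settings of that era; the `erchef['s3_url_ttl']` attribute path is an assumption based on the `erchef[...]` namespace used earlier in the thread, so check it against your server version's documentation before relying on it:

```ruby
# /etc/chef/client.rb -- client-side behavior (illustrative values)
http_retry_count 10    # API calls returning 5xx are retried this many times
http_retry_delay 5     # seconds to wait between retries
no_lazy_load true      # fetch all cookbook files up front, before the signed
                       # S3/bookshelf URLs can expire during a long (1-2 h) run

# /etc/chef-server/chef-server.rb -- server side (attribute path assumed;
# verify for your version). Longer TTL = signed cookbook URLs stay valid longer.
erchef['s3_url_ttl'] = 3600   # seconds
```

Either approach (front-loading downloads or lengthening the TTL) addresses the expired-link symptom; neither changes depsolver capacity, which is the suspected bottleneck above.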
--
Stephen Delano
Software Development Engineer
Opscode, Inc.
1008 Western Avenue
Suite 601
Seattle, WA 98104