Hi Stephen,
Thanks for your valuable tips. There is a Chef bug, CHEF-ISSUE-1904: chef-client doesn't retry when it gets HTTP 50X errors from the Chef Server. I made a patch for it: https://github.com/opscode/chef/pull/1912

With this patch, I'm able to provision a 500-node cluster using Chef 11 without any chef-client exceptions. According to the chef-client output, the chef-clients retried 100+ times in total, and no single chef-client retried more than 2 times.

Here are my Chef versions and configuration:

Open Source Chef Server 11.1.4 on CentOS 5.10 with a 4-core CPU and 8 GB of memory
Chef Client 11.14.2

==== /etc/chef-server/chef-server.rb ====
chef_server_webui['enable'] = false
nginx['ssl_port'] = 9443
nginx['non_ssl_port'] = 9080
chef_solr['heap_size'] = 3072
chef_solr['commit_interval'] = 3000
chef_solr['poll_seconds'] = 6
chef_expander['nodes'] = 4
erchef['s3_parallel_ops_timeout'] = 30000
erchef['s3_url_ttl'] = 3600
erchef['depsolver_timeout'] = 5000
erchef['depsolver_worker_count'] = 10 # default value is 5 (per 2 CPU cores), so I set it to 10 for my 4 CPU cores.
erchef['db_pool_size'] = 250 # this is because I have a few Chef Node Save/Search API calls in the cookbook during chef-client execution.
postgresql['max_connections'] = 350

==== /etc/chef/client.rb ====
log_location STDOUT
chef_server_url "https://hostname:9443"
validation_client_name "chef-validator"
node_name "mynode"
log_level :info
no_lazy_load true
ssl_verify_mode :verify_peer
ssl_ca_path "/etc/chef/trusted_certs"
# this is to reduce the chef node data size for speeding up Chef Search API calls
Ohai::Config[:disabled_plugins] = [:Azure, :Filesystem, :Cloudv2, :Virtualization,
  :Virtualizationinfo, :Dmi, :Zpools, :Blockdevice, :Lsb, :Nodejs, :Languages,
  :Php, :Lua, :Perl, :C, :Java, :Python, :Erlang, :Groovy, :Ruby, :Mono, :Os,
  :Openstack, :Cloud, :Rackspace, :Ps, :Command, :Initpackage, :Rootgroup, :Keys,
  :Sshhostkey, :Ohai, :Chef, :Ohaitime, :Passwd, :Gce, :Systemprofile, :Linode,
  :Ipscopes, :Eucalyptus, :Ec2]

Thanks
Jesse Hu

(In reply to Stephen Delano's message of Fri, 1 Aug 2014 09:30:12 -0700.)
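
For illustration, the general idea of retrying on HTTP 50X errors can be sketched in plain Ruby as below. This is only a rough sketch and not the code from the pull request or from chef-client itself: `ServerError`, `with_retries`, `flaky_request`, and the `max_retries` and `delay` parameters are all hypothetical names chosen for this example, and the "request" is simulated rather than sent to a real Chef Server.

```ruby
# Hypothetical error type standing in for an HTTP error response.
class ServerError < StandardError
  attr_reader :code

  def initialize(code)
    @code = code
    super("HTTP #{code}")
  end
end

# Runs the block and retries it when it raises a ServerError with a 5xx code.
# At most max_retries retries are made; other errors are re-raised at once.
# The current attempt number (starting at 1) is passed to the block.
def with_retries(max_retries: 5, delay: 0)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue ServerError => e
    raise unless (500..599).cover?(e.code) && attempts <= max_retries
    sleep(delay) # a real client would likely wait between attempts
    retry
  end
end

# Simulated request (assumption): fails twice with 503, then succeeds.
def flaky_request(attempt)
  raise ServerError.new(503) if attempt < 3
  "200 OK on attempt #{attempt}"
end

result = with_retries(max_retries: 5, delay: 0) { |attempt| flaky_request(attempt) }
puts result
```

In this sketch a transient 503 is absorbed after two retries, which matches the kind of behavior described in the message (a single chef-client retrying at most 2 times), while a persistent 50X error still surfaces once the retry budget is used up.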