[chef] Re: Re: Scaling erchef horizontally


  • From: DV
  • Subject: [chef] Re: Re: Scaling erchef horizontally
  • Date: Tue, 29 Apr 2014 23:11:30 -0700

Hmm, Stephen, when you say "indexable objects are stored on chef server", do you mean that a call comes in to the API (say, create a new node), goes to erchef, and is then passed on to chef-expander and rabbitmq? In that case, it seems that one chef-expander and one rabbitmq per erchef are appropriate, as long as each erchef talks to its own chef-expander and rabbitmq.

Here's how we've set up Chef 11 at my company: https://www.dropbox.com/s/q41172dtrth4yw5/chef11_layout.png

The Web UI / API hosts run [chef-expander chef-server-webui erchef rabbitmq nginx]; the Postgres, Bookshelf, and Solr hosts are each dedicated to their role. Everything is set up with the chef-server cookbook and custom roles, except Postgres (since the chef-server cookbook doesn't allow a master/slave config). Bookshelf is replicated at the filesystem level (the slave is read-only until replication is broken).

We've run this for a few months and haven't seen any issues yet.


On Thu, Apr 24, 2014 at 1:04 PM, Stephen Delano wrote:
There should be some more crash logs from the console telling you what's going on with erchef, but you're also going to have some other issues with the setup you've described. If you're running enough erchef servers, you might want to check that you're not exceeding the available connections of the PostgreSQL server.
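As a rough illustration of the connection check mentioned above: each erchef node keeps its own pool of database connections, so the total grows with the number of nodes. The sketch below is only arithmetic. The pool size of 20, the limit of 100, and the 3 reserved connections are illustrative values, not settings taken from this thread, and the function names are hypothetical.

```python
def total_db_connections(erchef_nodes: int, pool_size_per_node: int) -> int:
    """Connections opened if every erchef node fills its own pool."""
    return erchef_nodes * pool_size_per_node


def fits_within_limit(erchef_nodes: int, pool_size_per_node: int,
                      max_connections: int, reserved: int = 0) -> bool:
    """True if all pools together stay within the server's usable connections."""
    usable = max(max_connections - reserved, 0)
    return total_db_connections(erchef_nodes, pool_size_per_node) <= usable


def max_erchef_nodes(pool_size_per_node: int, max_connections: int,
                     reserved: int = 0) -> int:
    """Largest number of erchef nodes whose pools fit within the limit."""
    usable = max(max_connections - reserved, 0)
    return usable // pool_size_per_node


# Illustrative values only: pool of 20 per node, limit of 100, 3 reserved.
print(total_db_connections(2, 20))
print(fits_within_limit(5, 20, 100, reserved=3))
print(max_erchef_nodes(20, 100, reserved=3))
```

If a pooler such as pgpool sits in front of the database, the relevant limit is the one enforced by whichever component the erchef nodes actually connect to.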

Multiple Bookshelfs:
Bookshelf was not designed to be run on multiple nodes. It has local disk-based storage for the contents of your cookbooks.

Multiple Chef Expanders / RabbitMQ / Solr:
You also don't want to run multiple search stacks. When indexable objects are stored on the chef server, their contents are shuffled off to a RabbitMQ queue for which there is a chef-expander listener that's ready to consume that data, "expand" it, and send it to Solr for indexing. First, if you have multiple expanders as consumers of the rabbit queue, you're introducing the chance that the data is indexed out of order. This problem is exacerbated when you start to add multiple RabbitMQs (which erchefs talk to which queues) and multiple Solrs (which erchefs and expanders talk to which Solr).
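The out-of-order risk described above can be shown with a toy model. This is a sketch under stated assumptions, not how chef-expander is implemented: consumers take messages from one queue in publish order, each consumer has a fixed (hypothetical) processing delay, and the index keeps whichever write lands last. The function name and the delay values are made up for illustration.

```python
import heapq


def index_with_consumers(updates, delays):
    """Simulate consumers draining one queue and writing to a shared index.

    updates: list of (key, value) pairs in publish order.
    delays:  processing time for each consumer (one entry per consumer).
    Returns the final index as a dict; the last write to finish wins.
    """
    # Heap of (time the consumer becomes free, consumer id).
    free_at = [(0, cid) for cid in range(len(delays))]
    heapq.heapify(free_at)

    writes = []  # (finish time, publish sequence, key, value)
    for seq, (key, value) in enumerate(updates):
        start, cid = heapq.heappop(free_at)   # next free consumer takes the message
        finish = start + delays[cid]
        writes.append((finish, seq, key, value))
        heapq.heappush(free_at, (finish, cid))

    index = {}
    for _finish, _seq, key, value in sorted(writes):
        index[key] = value                     # later finish overwrites earlier
    return index


updates = [("node1", "v1"), ("node1", "v2")]  # v2 is the newer version

# One consumer: messages are indexed in publish order, so v2 ends up in the index.
print(index_with_consumers(updates, delays=[1]))

# Two consumers, the first one slow: v1 finishes after v2 and overwrites it.
print(index_with_consumers(updates, delays=[5, 1]))
```

With a single consumer the index ends with the newest value; with two consumers of different speeds the older value can land last and leave the index stale.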


On Thu, Apr 24, 2014 at 9:42 AM, Darío Ezequiel Nievas wrote:
Hi Guys,
I'm having a bit of a problem trying to scale erchef across several nodes.

First, let me give you guys an overview of my environment
-2 (there will be more) servers behind a load balancer, running the following services:
  -bookshelf
  -chef-expander
  -chef-server-webui
  -erchef
  -nginx

-2 servers behind a load balancer, running these services:
  -chef-solr
  -rabbitmq

-a PostgreSQL cluster (using pgpool) for the chefdb

Now, the problem

I can't seem to have erchef listening on port 8000 on both servers at the same time. When erchef starts on one of the servers, it starts crashing on the other one.

=CRASH REPORT==== 24-Apr-2014::12:35:15 ===
  crasher:
    initial call: sqerl_client:init/1
    pid: <0.131.0>
    registered_name: []
    exception exit: {stop,timeout}
      in function  gen_server:init_it/6 (gen_server.erl, line 320)
    ancestors: [<0.112.0>,pooler_pool_sup,pooler_sup,sqerl_sup,<0.107.0>]
    messages: []
    links: [<0.112.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4181
    stack_size: 24
    reductions: 22425
  neighbours:

=SUPERVISOR REPORT==== 24-Apr-2014::12:35:15 ===
     Supervisor: {<0.112.0>,pooler_pooled_worker_sup}
     Context:    child_terminated
     Reason:     {stop,timeout}
     Offender:   [{pid,<0.131.0>},
                  {name,sqerl_client},
                  {mfargs,{sqerl_client,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

-If I stop erchef on node1, the crash reports stop, and erchef starts listening on node2:8000
-Then, if I try to start erchef on node1, it won't work unless I stop it on node2
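The behaviour in the two points above can be confirmed from a third machine with a small probe. This is a minimal sketch that only checks whether a TCP connection to port 8000 succeeds; the host names node1 and node2 are the ones used in this message, and the helper names are hypothetical.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def report(hosts, port=8000):
    """Return a mapping of host name to whether the port accepts connections."""
    return {host: is_listening(host, port) for host in hosts}


# Example call (not run here, since it needs the real hosts):
#   report(["node1", "node2"])
```

A successful connection only shows that something accepts connections on the port; it does not prove that erchef itself is healthy.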



Is there a way to avoid this, in order to be able to scale as many erchef instances as needed?

Thanks in advance!



Dario Nievas (Snowie)
MercadoLibre Cloud Services
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 11-6370-6406
Tel : +54(11) 4640-8443



--
Stephen Delano
Software Development Engineer
Opscode, Inc.
1008 Western Avenue
Suite 601
Seattle, WA 98104



--
Best regards, Dmitriy V.


