[chef] Re: Scaling erchef horizontally


  • From: Stephen Delano
  • Subject: [chef] Re: Scaling erchef horizontally
  • Date: Thu, 24 Apr 2014 13:04:19 -0700

There should be some more crash logs from the console telling you what's going on with erchef, but you're also going to have some other issues with the setup you've described. If you're running enough erchef servers, check that their combined database connections don't exceed the connections available on the PostgreSQL server.
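
As a rough way to reason about that connection limit, here is a minimal sketch in Python. It assumes each erchef node opens a fixed-size pool of database connections; the function names, the pool size, and the limit used below are hypothetical illustrations, not values taken from this setup or from the Chef configuration.

```python
def total_pool_connections(erchef_nodes: int, pool_size_per_node: int) -> int:
    """Connections opened if every erchef node fills its pool (assumed fixed size)."""
    return erchef_nodes * pool_size_per_node


def fits_within_limit(erchef_nodes: int,
                      pool_size_per_node: int,
                      max_connections: int,
                      reserved: int = 0) -> bool:
    """True if all pools fit under the server limit, leaving `reserved`
    connections for other clients (admin sessions, pgpool, and so on)."""
    needed = total_pool_connections(erchef_nodes, pool_size_per_node)
    return needed <= max_connections - reserved


# Hypothetical numbers: 20 connections per node against a limit of 100.
print(fits_within_limit(2, 20, 100))  # two nodes need 40 connections
print(fits_within_limit(6, 20, 100))  # six nodes need 120 connections
```

With a pooler such as pgpool in front of the database, the relevant limit may be the pooler's rather than PostgreSQL's own, so both are worth checking.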

Multiple Bookshelfs:
Bookshelf was not designed to be run on multiple nodes. It uses local disk-based storage for the contents of your cookbooks, so content uploaded to one node is not visible to the others.
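
To illustrate why local disk storage and a load balancer don't mix, here is a toy model in Python. It is not Bookshelf's code: the `BookshelfNode` and `RoundRobinBalancer` classes and the round-robin policy are assumptions made up for the example.

```python
import itertools


class BookshelfNode:
    """Toy stand-in for one Bookshelf instance with its own local disk."""

    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # checksum -> file content, private to this node

    def put(self, checksum, content):
        self.local_disk[checksum] = content

    def get(self, checksum):
        # Returns None when this node never received the upload.
        return self.local_disk.get(checksum)


class RoundRobinBalancer:
    """Hypothetical load balancer that alternates between nodes."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def next_node(self):
        return next(self._cycle)


nodes = [BookshelfNode("node1"), BookshelfNode("node2")]
lb = RoundRobinBalancer(nodes)

lb.next_node().put("abc123", b"recipe contents")  # upload lands on node1
fetched = lb.next_node().get("abc123")            # download is served by node2
print(fetched)  # node2 has no copy of the file
```

Under these assumptions the upload succeeds and the next download fails, because the second request reaches a node whose disk never saw the file.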

Multiple Chef Expanders / RabbitMQ / Solr:
You also don't want to run multiple search stacks. When indexable objects are stored on the Chef server, their contents are shuffled off to a RabbitMQ queue, where a chef-expander listener consumes the data, "expands" it, and sends it to Solr for indexing. If you have multiple expanders consuming from the same Rabbit queue, you introduce the chance that the data is indexed out of order. This problem is exacerbated when you start to add multiple RabbitMQs (which erchefs talk to which queues?) and multiple Solrs (which erchefs and expanders talk to which Solr?).
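
The out-of-order risk can be shown with a small deterministic simulation in Python. This is a simplified model, not chef-expander's behavior: it assumes all messages are already queued, are handed to consumers round-robin, and that the index keeps whichever write arrives last. The `Message` type, the processing times, and `index_with_consumers` are hypothetical.

```python
from typing import NamedTuple


class Message(NamedTuple):
    object_id: str
    version: int            # order in which the object was saved
    processing_time: float  # hypothetical seconds needed to "expand" it


def index_with_consumers(messages, n_consumers):
    """Return the final index contents (object_id -> version indexed last)."""
    consumer_free_at = [0.0] * n_consumers
    completions = []
    for seq, msg in enumerate(messages):
        consumer = seq % n_consumers  # round-robin dispatch (assumed)
        finish = consumer_free_at[consumer] + msg.processing_time
        consumer_free_at[consumer] = finish
        completions.append((finish, seq, msg))

    index = {}
    # Writes reach the index in completion order; the last write wins.
    for _, _, msg in sorted(completions):
        index[msg.object_id] = msg.version
    return index


updates = [
    Message("node1", version=1, processing_time=5.0),  # slow to expand
    Message("node1", version=2, processing_time=1.0),  # quick to expand
]

print(index_with_consumers(updates, n_consumers=1))  # version 2 is indexed last
print(index_with_consumers(updates, n_consumers=2))  # version 1 overwrites version 2
```

With a single consumer the updates are applied in queue order. With two, the quick second update finishes first and is then overwritten by the slow first one, leaving stale data in the index.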


On Thu, Apr 24, 2014 at 9:42 AM, Darío Ezequiel Nievas wrote:
Hi Guys,
I'm having a bit of a problem trying to scale erchef across several nodes.

First, let me give you guys an overview of my environment:
-2 (there will be more) servers behind a load balancer, running the following services:
  -bookshelf
  -chef-expander
  -chef-server-webui
  -erchef
  -nginx

-2 servers behind a load balancer, running these services:
  -chef-solr
  -rabbitmq

-a PostgreSQL cluster (using pgpool) for the chefdb

Now, the problem:

I can't seem to have erchef listening on port 8000 on both servers at the same time. When erchef starts on one of the servers, it starts crashing on the other one:

=CRASH REPORT==== 24-Apr-2014::12:35:15 ===
  crasher:
    initial call: sqerl_client:init/1
    pid: <0.131.0>
    registered_name: []
    exception exit: {stop,timeout}
      in function  gen_server:init_it/6 (gen_server.erl, line 320)
    ancestors: [<0.112.0>,pooler_pool_sup,pooler_sup,sqerl_sup,<0.107.0>]
    messages: []
    links: [<0.112.0>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 4181
    stack_size: 24
    reductions: 22425
  neighbours:

=SUPERVISOR REPORT==== 24-Apr-2014::12:35:15 ===
     Supervisor: {<0.112.0>,pooler_pooled_worker_sup}
     Context:    child_terminated
     Reason:     {stop,timeout}
     Offender:   [{pid,<0.131.0>},
                  {name,sqerl_client},
                  {mfargs,{sqerl_client,start_link,undefined}},
                  {restart_type,temporary},
                  {shutdown,brutal_kill},
                  {child_type,worker}]

-If I stop erchef on node1, the crash reports stop, and erchef starts listening on node2:8000
-Then, if I try to start erchef on node1, it won't work unless I stop it on node2



Is there a way to avoid this, in order to be able to scale as many erchef instances as needed?

Thanks in advance!



Dario Nievas (Snowie)
MercadoLibre Cloud Services
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 11-6370-6406
Tel : +54(11) 4640-8443



--
Stephen Delano
Software Development Engineer
Opscode, Inc.
1008 Western Avenue
Suite 601
Seattle, WA 98104

