chef - [chef] Re: RE: Chef11 HA

From: Jesse Campbell < >
To: chef < >
Subject: [chef] Re: RE: Chef11 HA
Date: Fri, 15 Feb 2013 12:01:46 -0500

In the past, the official answer has been that the paid Private Chef offering comes with HA out of the box.
Have you worked with your company's money people to see if they'll pay for it? I'd love to say my company would, but it has been an uphill battle. Giving back to the community through patches is great, but giving back to Opscode by paying to keep them in the black would be nice too.

For HA, you'll need to look inside the current installer. There are multiple back-end store components (Solr, Bookshelf, PostgreSQL), all of which need replication or clustering. Then there are middle-tier services, like the chef expander (or whatever it is called now), the message queue (it used to be RabbitMQ), and the Chef Server API, all of which need to be deployed in multiple places, hitting those replicated back ends (the MQ might want to be treated like a back-end component).

Then you'll want load balancing across the server API endpoints, with the webui, knife, and chef-client all pointing at the load balancer.
If you heavily use the webui, deploy that in multiple places too, and load balance it.
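To make the load-balancing step concrete, here is a minimal Python sketch of health-checked round-robin selection across several API front ends. The endpoint URLs and the `is_healthy` callback are hypothetical placeholders; in a real deployment a dedicated load balancer (HAProxy, a hardware appliance, etc.) would do this job in front of the API hosts.

```python
from itertools import cycle
from typing import Callable, Iterator, List


class RoundRobinBalancer:
    """Hand out API endpoints in turn, skipping any that fail a health check."""

    def __init__(self, endpoints: List[str], is_healthy: Callable[[str], bool]) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._count = len(endpoints)
        self._ring: Iterator[str] = cycle(list(endpoints))
        self._is_healthy = is_healthy

    def next_endpoint(self) -> str:
        # Look at each endpoint at most once per call, then give up.
        for _ in range(self._count):
            candidate = next(self._ring)
            if self._is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy Chef API endpoints")


if __name__ == "__main__":
    # Hypothetical hostnames; chef-api-2 is pretending to be down.
    down = {"https://chef-api-2.example.internal"}
    lb = RoundRobinBalancer(
        [f"https://chef-api-{i}.example.internal" for i in (1, 2, 3)],
        is_healthy=lambda url: url not in down,
    )
    # Prints the api-1, api-3, api-1, api-3 URLs; api-2 is skipped.
    print([lb.next_endpoint() for _ in range(4)])
```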

For multiple datacenters, you'll want some kind of reliable replication for the back-end components (Solr, Bookshelf, PostgreSQL), and separate copies of the front and middle tiers in each DC, pointing at the replicated back end.
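One way to sanity-check a multi-datacenter layout along these lines is sketched below in Python. The component names, datacenter names, and minimum instance counts are illustrative assumptions, not an official Opscode requirement; the check only encodes the rule above: back-end stores replicated across DCs, front and middle tiers duplicated inside each DC.

```python
from typing import Dict, List, Set

# Back-end stores that need replication or clustering across datacenters.
# (The message queue could arguably be treated as one of these instead.)
BACKEND = {"solr", "bookshelf", "postgresql"}

# Assumed minimum instances of each front/middle-tier service per datacenter.
PER_DC_MIN = {"api": 2, "expander": 2, "message_queue": 1}


def plan_problems(plan: Dict[str, Dict[str, int]], replicated: Set[str]) -> List[str]:
    """plan maps datacenter -> {service: instance count}.

    replicated names the back-end stores that already have cross-DC replication.
    Returns a list of human-readable problems (empty if the plan looks OK).
    """
    problems = []
    for component in sorted(BACKEND - replicated):
        problems.append(f"backend {component} is not replicated across DCs")
    for dc in sorted(plan):
        for service, minimum in sorted(PER_DC_MIN.items()):
            have = plan[dc].get(service, 0)
            if have < minimum:
                problems.append(f"{dc}: {service} has {have}, want >= {minimum}")
    return problems


if __name__ == "__main__":
    plan = {
        "dc-east": {"api": 2, "expander": 2, "message_queue": 1},
        "dc-west": {"api": 1, "expander": 2, "message_queue": 1},
    }
    for problem in plan_problems(plan, replicated={"postgresql", "bookshelf"}):
        print(problem)
```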

It isn't an easy problem to solve, which is why Opscode is hoping you'll pay for it :)

-Jesse

On Feb 15, 2013 9:16 AM, "Baruch Shpirer" < > wrote:

How would you go about creating the HA pair?

Some docs/drafts/pointers?

From: Adam Jacob [mailto: ]
Sent: Thursday, February 14, 2013 20:13
To: < >
Subject: [chef] Re: RE: Re: Re: Re: RE: Re: Chef11 HA

Yes – have an HA pair (or at least HA Backends, with multiple API front-ends) in each failure domain. Make each failure domain highly available, and make the system partition tolerant by enforcing that no writes ever need to cross the boundary.
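A minimal Python sketch of the "no writes cross the boundary" rule, assuming two hypothetical failure domains with one HA Chef server URL each. In practice the same effect usually comes from pointing each node's `chef_server_url` in `client.rb` at its local server; the explicit check here just makes the rule visible.

```python
from typing import Dict, Optional

# Hypothetical failure domains, each fronted by its own HA Chef server.
DOMAIN_SERVERS: Dict[str, str] = {
    "dc-east": "https://chef.east.example.internal",
    "dc-west": "https://chef.west.example.internal",
}


class CrossDomainWrite(Exception):
    """Raised when a write would have to cross a failure-domain boundary."""


def server_for_write(node_domain: str, target_domain: Optional[str] = None) -> str:
    """Return the Chef server a node in node_domain should write to."""
    target = target_domain or node_domain
    if target != node_domain:
        # Refusing here means a partition between domains can never block
        # or split a write: each domain only ever writes to itself.
        raise CrossDomainWrite(f"{node_domain} must not write to {target}")
    return DOMAIN_SERVERS[node_domain]


if __name__ == "__main__":
    print(server_for_write("dc-west"))  # https://chef.west.example.internal
    try:
        server_for_write("dc-west", target_domain="dc-east")
    except CrossDomainWrite as err:
        print("rejected:", err)
```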

Adam

From: Baruch Shpirer < >
Reply-To: < >
Date: Thursday, February 14, 2013 5:10 PM
To: < >
Subject: [chef] RE: Re: Re: Re: RE: Re: Chef11 HA

Can you define “to treat each as an isolated failure domain, make them HA”?

From: Adam Jacob [mailto: ]
Sent: Thursday, February 14, 2013 20:05
To: < >
Subject: [chef] Re: Re: Re: RE: Re: Chef11 HA

We tend to recommend against this, as you are usually leaking both data and control across failure domains.

Think about it this way: when it fails, do you really want the added latency? What about data replication when you are split-brained? How do you fail back to running in multiple datacenters? Is one primary and the other passive?

The alternative is to treat each as an isolated failure domain, make them HA, and solve the consistency problem at the delivery of data level. It works much, much better.
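One reading of "solve the consistency problem at the delivery of data level" is to push the same cookbooks and roles from a common source (for example, a CI job running knife against each server) to every isolated Chef server, then verify that they all hold the same content. The Python sketch below assumes a hypothetical `upload(domain, files)` callable that stores a bundle in one domain and returns the digest that domain now holds.

```python
import hashlib
from typing import Callable, Dict, Iterable, List, Tuple

Bundle = List[Tuple[str, bytes]]  # (path, file content) pairs


def bundle_digest(files: Iterable[Tuple[str, bytes]]) -> str:
    """SHA-256 over all (path, content) pairs, independent of input order."""
    h = hashlib.sha256()
    for path, content in sorted(files):
        h.update(path.encode("utf-8"))
        h.update(b"\0")  # separator between the path and the content digest
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()


def deliver_everywhere(
    files: Bundle,
    domains: Iterable[str],
    upload: Callable[[str, Bundle], str],
) -> Dict[str, bool]:
    """Push one bundle to every domain independently; report which now match."""
    expected = bundle_digest(files)
    return {domain: upload(domain, files) == expected for domain in domains}


if __name__ == "__main__":
    # Stand-in for a real upload: just records what each "server" holds.
    held: Dict[str, str] = {}

    def fake_upload(domain: str, files: Bundle) -> str:
        held[domain] = bundle_digest(files)
        return held[domain]

    bundle = [("cookbooks/base/recipes/default.rb", b"package 'ntp'\n")]
    print(deliver_everywhere(bundle, ["dc-east", "dc-west"], fake_upload))
```

Because each domain receives its copy independently, a failure in one domain never blocks delivery to the others; the digest comparison is what surfaces any drift.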

Best,

Adam

From: Mark Pimentel < >
Reply-To: < >
Date: Thursday, February 14, 2013 9:56 AM
To: < >
Subject: [chef] Re: Re: RE: Re: Chef11 HA

Say this scenario is configured across sites, with each Chef server serving a different data center. Would the keys be the same for both servers?

This would be used in a scenario where a main deployment Chef server controls all objects, and the complementary servers replicate cookbook data as well as user and node information from it. The other servers would simply replicate their node information back.
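Mark's proposed flow can be written down as a list of (source, target, object type) replication edges. This Python sketch uses hypothetical server names; the helper also reports which object types would flow in both directions between the same two servers, which is where the split-brain concern about replication applies.

```python
from typing import List, Set, Tuple

Flow = Tuple[str, str, str]  # (source server, target server, object type)

# Direction of each object type, per the scenario described above.
FROM_MAIN = ("cookbooks", "users", "nodes")  # main -> complementary servers
TO_MAIN = ("nodes",)                         # complementary servers -> main


def replication_flows(main: str, complementary: List[str]) -> List[Flow]:
    flows: List[Flow] = []
    for server in complementary:
        flows += [(main, server, obj) for obj in FROM_MAIN]
        flows += [(server, main, obj) for obj in TO_MAIN]
    return flows


def two_way_types(flows: List[Flow]) -> Set[str]:
    """Object types replicated in both directions between the same two servers."""
    edges = set(flows)
    return {obj for src, dst, obj in edges if (dst, src, obj) in edges}


if __name__ == "__main__":
    flows = replication_flows("chef-main", ["chef-dc2", "chef-dc3"])
    print(two_way_types(flows))  # {'nodes'}
```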

On Thu, Feb 14, 2013 at 12:01 PM, Adam Jacob < > wrote:

Using DRBD for this is a good idea. If you share /var/opt/chef-server via
DRBD, you can use the normal mechanisms for starting/stopping the cluster,
and be certain you will have identical data.

Private Chef supports this configuration out of the box, fwiw, but it's
equally possible with Open Source Chef.
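For the DRBD approach, failover boils down to an ordered set of commands. The Python sketch below only builds and prints those commands by default; the DRBD resource name `chef` and device `/dev/drbd0` are assumptions, and in a real cluster a manager such as Pacemaker would normally run these steps. The order matters: a node must become DRBD primary before mounting `/var/opt/chef-server`, and services must stop before the volume is unmounted and demoted, so only one node ever has the data mounted.

```python
import shlex
import subprocess
from typing import List

# Assumed names: adjust to match your DRBD resource definition.
DRBD_RESOURCE = "chef"
DRBD_DEVICE = "/dev/drbd0"
DATA_DIR = "/var/opt/chef-server"


def promote_commands() -> List[List[str]]:
    """Make this node the active Chef server: claim the disk, mount, start."""
    return [
        ["drbdadm", "primary", DRBD_RESOURCE],
        ["mount", DRBD_DEVICE, DATA_DIR],
        ["chef-server-ctl", "start"],
    ]


def demote_commands() -> List[List[str]]:
    """Reverse of promote: stop services before releasing the shared disk."""
    return [
        ["chef-server-ctl", "stop"],
        ["umount", DATA_DIR],
        ["drbdadm", "secondary", DRBD_RESOURCE],
    ]


def run(steps: List[List[str]], dry_run: bool = True) -> None:
    for argv in steps:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(demote_commands())   # on the node giving up the role
    run(promote_commands())  # on the node taking over
```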

Best,
Adam

On 2/13/13 8:45 PM, "Baruch Shpirer" < > wrote:

>Is there any draft of the HA procedure/setup?
>
>Also, if I configure PostgreSQL for master-master replication
>and use DRBD for the bookshelf folder,
>does it mean I get two identical servers in async mode?
>Will clients use the same validation public key at both sites?
>
>Baruch
>
>-----Original Message-----
>From: Seth Falcon [mailto: ]
>Sent: Monday, February 11, 2013 17:35
>To: < >
>Subject: [chef] Re: Chef11 HA
>
>
>On Feb 11, 2013, at 1:29 PM, Vaidas Jablonskis wrote:
>
>> This might be slightly unrelated to this conversation, but I wonder
>>what is stored in the postgres database?
>
>All of the Chef object data is stored in the db. You can explore the
>schema a bit like this:
>
> :~# su - opscode-pgsql
>$ bash
> :~$ which psql
>/opt/chef-server/embedded/bin/psql
> :~$ psql opscode_chef
>psql (9.2.1)
>Type "help" for help.
>
>opscode_chef=# \d
>                         List of relations
> Schema |             Name              |   Type   |     Owner
>--------+-------------------------------+----------+---------------
> public | checksums                     | table    | opscode-pgsql
> public | clients                       | table    | opscode-pgsql
> public | cookbook_version_checksums    | table    | opscode-pgsql
> public | cookbook_version_dependencies | view     | opscode-pgsql
> public | cookbook_versions             | table    | opscode-pgsql
> public | cookbook_versions_by_rank     | view     | opscode-pgsql
> public | cookbooks                     | table    | opscode-pgsql
> public | cookbooks_id_seq              | sequence | opscode-pgsql
> public | data_bag_items                | table    | opscode-pgsql
> public | data_bags                     | table    | opscode-pgsql
> public | environments                  | table    | opscode-pgsql
> public | joined_cookbook_version       | view     | opscode-pgsql
> public | nodes                         | table    | opscode-pgsql
> public | osc_users                     | table    | opscode-pgsql
> public | roles                         | table    | opscode-pgsql
> public | sandboxed_checksums           | table    | opscode-pgsql
> public | schema_info                   | table    | opscode-pgsql
>(17 rows)
>
>And find the script used to initialize the schema here:
>https://github.com/opscode/chef_db/blob/master/priv/pgsql_schema.sql
>
>> What happens when this database gets corrupted or data is lost, for
>>instance?
>
>Bad things happen. If the db data is lost or corrupted, so is your Chef
>Server.
>
>+ seth
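Since losing the database means losing the Chef server, regular dumps are worth having. This Python sketch builds a `pg_dump` command for the `opscode_chef` database, run as the `opscode-pgsql` user shown in the session above; it assumes `pg_dump` is installed next to the embedded `psql` in `/opt/chef-server/embedded/bin`, and the output directory is a placeholder. A database dump alone does not cover the cookbook files kept in the bookshelf folder, which need their own copy.

```python
import shlex
from datetime import datetime, timezone
from typing import List, Optional

# Assumption: pg_dump is installed alongside the embedded psql.
PG_DUMP = "/opt/chef-server/embedded/bin/pg_dump"


def backup_command(out_dir: str, when: Optional[datetime] = None) -> List[str]:
    """pg_dump argv for a timestamped, custom-format dump of opscode_chef."""
    # Normalize to UTC so the trailing "Z" in the file name is accurate.
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    return [
        PG_DUMP,
        "--format=custom",  # restorable later with pg_restore
        f"--file={out_dir}/opscode_chef-{stamp}.dump",
        "opscode_chef",
    ]


def as_pgsql_user(argv: List[str]) -> List[str]:
    """Wrap a command so it runs as the opscode-pgsql system user."""
    return ["su", "-", "opscode-pgsql", "-c", shlex.join(argv)]


if __name__ == "__main__":
    print(shlex.join(as_pgsql_user(backup_command("/var/backups/chef"))))
```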

--
Thanks,

Mark
