[chef] Re: Re: Re: Single centralized git repo vs. git repo per cookbook


  • From: Igor Serebryany < >
  • To:
  • Cc: Ed Schwab < >, Wes Deviers < >
  • Subject: [chef] Re: Re: Re: Single centralized git repo vs. git repo per cookbook
  • Date: Mon, 10 Mar 2014 12:24:36 -0700

i've probably advocated for this approach too many times on this list, but Airbnb uses a single-repo approach and it has worked great and scaled even better. the advantages for the SREs supporting Airbnb are:

* we can see what changes are coming in various cookbooks all in one place
* we can ensure that our various cookbooks remain up to standard
* we don't need any additional tooling beyond git and a bash script (no berkshelf or chef-server for instance)
* we don't have to worry about versioning anything
* it's very easy to do rollbacks (reverts) if something goes wrong

this has scaled from the two engineers who first introduced chef to basically every engineer at the company. our chef repo has 50+ contributors and last week the number of contributors was 20:

Inline image 1

we've described our approach pretty comprehensively in a blog post here: http://nerds.airbnb.com/making-breakfast-chef-airbnb/

from a CI perspective, we've created a docker framework which can figure out which roles are affected based on which cookbooks have changed and then run tests on docker instances running those roles. a single-repo approach makes it much easier to build this dependency graph. this is not quite ready for prime-time, but we're working on open-sourcing it when we can.
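A minimal sketch of the "which roles are affected" step, not Airbnb's actual framework (which the message says is not yet released). It assumes cookbooks live under `cookbooks/<name>/` and that cookbook dependencies and role run lists are already available as plain dicts; the function names and sample data are hypothetical.

```python
from pathlib import PurePosixPath


def changed_cookbooks(changed_paths, cookbook_dir="cookbooks"):
    """Map changed file paths to the names of the cookbooks they belong to."""
    names = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Expect at least cookbooks/<name>/<file>; ignore everything else.
        if len(parts) >= 3 and parts[0] == cookbook_dir:
            names.add(parts[1])
    return names


def expand(cookbooks, depends):
    """Return the given cookbooks plus everything they depend on, transitively."""
    seen = set()
    stack = list(cookbooks)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(depends.get(name, ()))
    return seen


def affected_roles(changed_paths, roles, depends):
    """Roles whose expanded cookbook set overlaps the changed cookbooks."""
    changed = changed_cookbooks(changed_paths)
    return sorted(
        role for role, run_list in roles.items()
        if expand(run_list, depends) & changed
    )


# Hypothetical sample data: cookbook -> dependencies, role -> cookbooks.
depends = {"webapp": ["nginx", "base"], "nginx": ["base"], "db": ["base"]}
roles = {"web": ["webapp"], "database": ["db"], "bastion": ["base"]}

print(affected_roles(["cookbooks/nginx/recipes/default.rb"], roles, depends))
```

In practice the list of changed paths could come from something like `git diff --name-only` between two commits. Having every cookbook in one checkout is what makes the dependency map cheap to build.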

the biggest drawback for me has been working with open-source cookbooks; for instance, i have to constantly port changes between our internal version of the smartstack cookbook and the open-source version https://github.com/airbnb/smartstack-cookbook ; probably some tooling can help here, but this doesn't happen often enough so i haven't investigated making it better.

alas, we didn't get a chance to give a talk about this approach at the upcoming chefconf, but feel free to ping me directly if you have any questions.


On Mon, Mar 10, 2014 at 10:50 AM, steve . < > wrote:
We launched our big-company-wide Chef initiative with a single repo o' cookbooks.  The repo wasn't designed to any one particular purpose, but it contained a number of roles that were designed to exercise many of the cookbooks in it and yield a working service as a starting point.  We started with a couple dozen cookbooks.

Once the number of contributors to this repository went above five, managing contributions and releases became a big headache.

As soon as we were able to re-tune our CI to trigger off of individual repo changes as well as the master repo, we split the cookbooks out (using a shockingly short git filter-branch command and 'hub' to create the GHE repos in the cookbooks org) and left forks of the unified repo a Cheffile as an upgrade path to the split-repo layout.

(This was shortly before everyone decided Berkshelf was the future but procedurally generating a Berksfile is just as easy... the formats are quite similar! )
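As an illustration of how little it takes to generate a Berksfile procedurally, here is a minimal sketch. It is not the script described above; the `render_berksfile` name, the pin data, and the `ghe.example.com` URL are hypothetical. The `source` line is the Berkshelf 3+ form; Berkshelf 2.x used `site :opscode` instead.

```python
def render_berksfile(pins, org_url):
    """Render Berksfile text from a {cookbook_name: git_tag} mapping.

    Assumes each cookbook lives in its own repo at <org_url>/<name>.git.
    """
    lines = ['source "https://supermarket.chef.io"', ""]
    for name in sorted(pins):
        lines.append(
            f'cookbook "{name}", git: "{org_url}/{name}.git", tag: "{pins[name]}"'
        )
    return "\n".join(lines) + "\n"


# Hypothetical pinned versions for a monthly release.
pins = {"nginx": "v1.2.0", "base": "v0.3.1"}
print(render_berksfile(pins, "https://ghe.example.com/cookbooks"))
```

Keeping the pins in one small data structure means a release becomes an edit to that mapping followed by regenerating the file.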

The benefits of this approach to the maintainer(s) of an individual cookbook should be pretty obvious - you have one clearly-defined issue / pull request queue, you don't have to worry too much about rebasing against a fast-moving repository, etc. ...

This has also made the mechanics of a "release" much easier - we update the pinned versions in that central Cheffile, pull together a change log and send out an e-mail once a month.  People who want to stay bleeding edge on a cookbook can pull in off-cycle versions if they want.

It's also more straightforward from a CI approach - each potential dependency is in its own repository with its own trigger, so it's a bit easier to constrain the scope of integration to just what's changed ... though of course it's still possible to do this in a unified repository.  

One CI area we haven't really explored enough internally is getting successfully-tested/released changes in one cookbook to trigger CI runs in dependent cookbooks, though.  (In the meantime, we're triggering manual builds in the days/hours before release)
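One possible way to decide which dependent cookbooks to trigger is sketched below, assuming the dependency information from each cookbook's metadata has already been collected into a dict. The `downstream` function and the sample data are hypothetical, and actually kicking off CI jobs is left out.

```python
from collections import deque


def downstream(cookbook, depends):
    """Return every cookbook that depends on `cookbook`, directly or transitively.

    `depends` maps a cookbook name to the cookbooks it depends on.
    """
    # Invert the graph: dependency -> cookbooks that depend on it.
    reverse = {}
    for name, deps in depends.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(name)

    found = set()
    queue = deque([cookbook])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in found:
                found.add(dependent)
                queue.append(dependent)
    # A dependency cycle could lead back to the starting cookbook; drop it.
    found.discard(cookbook)
    return sorted(found)


# Hypothetical dependency data.
depends = {"webapp": ["nginx"], "nginx": ["base"], "db": ["base"]}
print(downstream("base", depends))
```

After a successful release of `base`, the returned names are the cookbooks whose CI jobs would be worth re-running.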

In summary, three years ago it might have made sense to have everything in the same bucket, but I don't think that approach scales up to larger teams and/or higher frequencies of contributions.



On Mon, Mar 10, 2014 at 8:03 AM, Morgan Blackthorne < > wrote:
Wanted to bump this thread and see if anyone else had further feedback on this...


On Friday, February 21, 2014, Morgan Blackthorne < > wrote:
This is actually something that we've been discussing at my workplace. Right now, we have one master repo for all of our cookbooks, each in their own subdirectory. Bamboo is polling this repo and will execute a Rake task on updates to push out new changes via Berkshelf, where the cookbooks are listed using the 'rel' tag (assuming it passes knife cookbook test on each of the cookbooks). We also have a secondary scheduled job that runs foodcritic/rubocop and reports on the results.

Given that we're only using this repo for our own internal cookbooks which are too specific to be of any use to anyone else (even if Legal would allow us to share them), what are the pros/cons of this approach? It seems like we would lower git contention between members of our team if we broke them out into different repos, but I'm not sure how we would then refactor the CI jobs. One thing I like about this approach is that the only thing we have to do in regards to CI is just to add the new cookbook to the Berksfile and it just works. If we set up Bamboo to monitor multiple repos, that increases the chance that someone will add a new cookbook and forget to monitor that new repo in both Bamboo jobs (pushing and linting). Not to mention that it complicates the jobs themselves which now have to pull in multiple repos-- Berks will handle that fine, but knife cookbook test will need them all checked out to execute, as will foodcritic/rubocop on the linting side, and I definitely like the acceptance criteria of passing knife before being pushed with berks. I don't like the thought of pushing it up to the chef server with potentially broken ruby code.
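The "forgot to register the new cookbook" risk can be reduced with a small CI check. The sketch below compares the cookbook directories in the repo against the names listed in the Berksfile. It assumes one `cookbook '<name>', ...` entry per line; the function name, the regex, and the sample Berksfile contents (including the git URL) are hypothetical.

```python
import re

# Matches lines such as: cookbook 'nginx', git: '...', rel: 'cookbooks/nginx'
COOKBOOK_LINE = re.compile(r"""^\s*cookbook\s+['"]([^'"]+)['"]""", re.MULTILINE)


def missing_from_berksfile(berksfile_text, cookbook_dirs):
    """Return cookbook directories that have no entry in the Berksfile."""
    listed = set(COOKBOOK_LINE.findall(berksfile_text))
    return sorted(set(cookbook_dirs) - listed)


# Hypothetical Berksfile contents and directory listing.
berksfile = """
cookbook 'base', git: 'git@git.example.com:ops/chef-repo.git', rel: 'cookbooks/base'
cookbook 'nginx', git: 'git@git.example.com:ops/chef-repo.git', rel: 'cookbooks/nginx'
"""
dirs = ["base", "nginx", "redis"]

print(missing_from_berksfile(berksfile, dirs))
```

In a real job the directory list would come from listing the `cookbooks/` directory, and the build would fail whenever the result is non-empty.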

Now, we could do per-repo Rakefile/Berksfile setups, but that increases the overhead of setting up a new cookbook. And the idea of having 20+ jobs in Bamboo, each for their own cookbook, seems wrong to me.

Thoughts?

--
~*~ StormeRider ~*~

"Every world needs its heroes [...] They inspire us to be better than we are. And they protect from the darkness that's just around the corner."

(from Smallville Season 6x1: "Zod")

On why I hate the phrase "that's so lame"... http://bit.ly/Ps3uSS


On Fri, Feb 21, 2014 at 2:46 PM, Pete Cheslock < > wrote:
Just like the choice of which configuration management tool to use, my vote is to pick one and go.  Starting with a single repo is the easiest way for beginners to get started.  And as you scale you can split it out into a separate repo per cookbook.




On Fri, Feb 21, 2014 at 5:36 PM, Booker Bense < > wrote:
I doubt there's a hard and fast rule to apply to all situations, but there has been a lot of experience with using a single repo for the entire set of chef cookbooks. That was more or less the default recommendation 3 years ago. Almost everyone that started there has changed to a repo per cookbook. 

At this point I think you have to have a really strong reason not to use a repo per cookbook. Or at least a repo per cookbook suite ( a set of related cookbooks that have interdependencies. )

Having a separate repo for each cookbook will make automated testing easier and it also imposes some discipline on creating dependencies. Automated config management is a powerful amplifier, but unfortunately it amplifies stupid just as fast as clever. The more testing you do the better, and at this point the tools are there to make TDD part of your Chef workflow.

- Booker C. Bense 



On Fri, Feb 21, 2014 at 2:14 PM, Alex Myasnikov < > wrote:

Ohai Chefs,

 

I am trying to understand what advantages (and disadvantages if any?) are there in having a git repo per each cookbook in the chef-repo as opposed to having all of one’s application cookbooks in a single git repo.

 

Up to this point I was thinking of a single repo containing all cookbooks (minus community ones managed by Berkshelf), however I came across a few references (below) that mentioned having a git repo per cookbook. It seems like the latter helps CI, but I am not sure how exactly, or what the tangible benefits and potential tradeoffs are. Does having a repo for each cookbook that’s developed constitute a best practice?

 

First reference is from last year’s ChefConf presentation in Getting More Chefs in the Kitchen - Andrew Gross  (Slide depicting master repo consisting of individual repos per cookbook)

 

And then Nathen Harvey’s blog post on MVT had this snippet:

  1. gem install foodcritic
  2. Go to Travis CI and follow the Sign In link at the top.
  3. Activate the GitHub Service Hook for your cookbook’s repository from your TravisCI profile page. Each of your cookbooks has its own repository, right?!

http://technology.customink.com/blog/2012/06/04/mvt-foodcritic-and-travis-ci/

 

Setup:

 

Chef Server 11

Berkshelf 2.X

 

Thanks in advance.








