[chef] Re: Re: Re: Single centralized git repo vs. git repo per cookbook


  • From: Jesse Nelson
  • To: chef
  • Cc: Ed Schwab, Wes Deviers
  • Subject: [chef] Re: Re: Re: Single centralized git repo vs. git repo per cookbook
  • Date: Mon, 10 Mar 2014 12:48:48 -0700

Similar to other stories, I started with a mono repo, and it quickly became maintenance hell: managing upstream forks, managing multiple committers, incompatible merges. All of these things happened at one time or another. We used braid for a while, then moved to Berkshelf. Although there are things about berks I am not happy with, overall it has been easier to manage.

IMO a cookbook is a software project with its own lifecycle. As such, it should be treated as its own project. Managing its versions, deps, and testing independently in its own repo makes this easier.

A major benefit of breaking out cookbooks was unearthing some hidden assumptions and tight coupling we had. Tooling for a cookbook repo is simple to automate with scripts. We use rake[1] tasks to keep a lot of the testing framework up to snuff by merging in a skeleton cookbook.

Another benefit was re-use across groups. Breaking cookbooks out of a chef-repo allows every group in our large org to manage its infrastructure independently, while all teams funnel work back into the cookbooks. This promotes collaboration, as opposed to everyone forking everything from the mono repo or one team owning all of the infra.

Our workflow is one in which our CI emits tested cooks into a 'cookbook' server, and everyone uses that as a source for their berks, so that we can have confidence in what we pull. Our CI versions[2] our cookbooks automatically, and in general we 'trust' versions. There are 'repo' and 'integration' tests that run based on the downstream cookbook jobs.
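
To illustrate the automatic versioning step, here is a minimal Ruby sketch, assuming the CI simply bumps the patch component of the `version` line in a cookbook's metadata.rb after a green build. The helper name `bump_patch` and the sample metadata are hypothetical; this is not the CI code referenced in [2].

```ruby
# Hypothetical helper (an assumption, not the poster's tooling):
# bump the patch component of the `version` line in metadata.rb text.
def bump_patch(metadata_text)
  metadata_text.sub(/^(\s*version\s+['"])(\d+)\.(\d+)\.(\d+)(['"])/) do
    # $1..$5 hold the captures of the match being replaced.
    "#{$1}#{$2}.#{$3}.#{$4.to_i + 1}#{$5}"
  end
end

# Made-up metadata for illustration only.
metadata = <<~METADATA
  name    'example_cookbook'
  version '1.4.2'
  depends 'apt'
METADATA

puts bump_patch(metadata)
```

A real pipeline would also have to tag the commit and publish the cookbook; the sketch only covers the version arithmetic.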

We are still actively trying to improve our pipeline, especially around integration and the release process.

Jesse Nelson



On Mon, Mar 10, 2014 at 10:50 AM, steve . wrote:
We launched our big-company-wide Chef initiative with a single repo o' cookbooks.  The repo wasn't designed to any one particular purpose, but it contained a number of roles that were designed to exercise many of the cookbooks in it and yield a working service as a starting point.  We started with a couple dozen cookbooks.

Once the number of contributors to this repository went above five, managing contributions and releases became a big headache.

As soon as we were able to re-tune our CI to trigger off of individual repo changes as well as the master repo, we split the cookbooks out (using a shockingly short git filter-branch command, plus 'hub' to create the GHE repos in the cookbooks org) and left forks of the unified repo a Cheffile as an upgrade path to split-repo.

(This was shortly before everyone decided Berkshelf was the future but procedurally generating a Berksfile is just as easy... the formats are quite similar! )
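
As a rough illustration of generating a Berksfile procedurally, here is a minimal Ruby sketch. It assumes the pinned cookbooks are available as a simple list; the cookbook names, versions, git URLs, and the `render_berksfile` helper are all made up. The `site :opscode` line and the `git:` option follow Berkshelf 2.x conventions and should be checked against the Berkshelf version in use.

```ruby
# Hypothetical pin list; in practice it might be derived from the central
# Cheffile or from a listing of the cookbooks org. All values are made up.
pins = [
  { name: 'nginx', version: '0.9.3', git: 'git@ghe.example.com:cookbooks/nginx.git' },
  { name: 'base',  version: '1.2.0', git: 'git@ghe.example.com:cookbooks/base.git' }
]

# Render one `cookbook` line per pin, sorted by name for stable diffs.
def render_berksfile(pins)
  lines = ['site :opscode', '']
  pins.sort_by { |pin| pin[:name] }.each do |pin|
    lines << "cookbook '#{pin[:name]}', '= #{pin[:version]}', git: '#{pin[:git]}'"
  end
  lines.join("\n") + "\n"
end

puts render_berksfile(pins)
```

Sorting the entries keeps the generated file stable between runs, so a monthly release shows up as a small diff of changed pins.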

The benefits of this approach to the maintainer(s) of an individual cookbook should be pretty obvious - you have one clearly-defined issue / pull request queue, you don't have to worry too much about rebasing against a fast-moving repository, etc. ...

This has also made the mechanics of a "release" much easier - we update the pinned versions in that central Cheffile, pull together a change log and send out an e-mail once a month.  People who want to stay bleeding edge on a cookbook can pull in off-cycle versions if they want.

It's also more straightforward from a CI approach - each potential dependency is in its own repository with its own trigger, so it's a bit easier to constrain the scope of integration to just what's changed ... though of course it's still possible to do this in a unified repository.  
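
For the unified-repository case, constraining scope to what changed could look roughly like the following Ruby sketch. It assumes a hypothetical `cookbooks/<name>/...` layout and takes a list of changed paths, such as the output of `git diff --name-only`; the helper name `changed_cookbooks` is made up.

```ruby
# Map changed file paths to cookbook names, assuming a hypothetical
# cookbooks/<name>/... layout. Paths outside that layout are ignored.
def changed_cookbooks(paths)
  paths.map { |path| path[%r{\Acookbooks/([^/]+)/}, 1] }.compact.uniq.sort
end

# Made-up sample of changed paths.
paths = [
  'cookbooks/nginx/recipes/default.rb',
  'cookbooks/nginx/metadata.rb',
  'cookbooks/base/attributes/default.rb',
  'README.md'
]

p changed_cookbooks(paths)
```

With one repository per cookbook this mapping is implicit in which repo's trigger fired, which is the simplification described above.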

One CI area we haven't really explored enough internally is getting successfully-tested/released changes in one cookbook to trigger CI runs in dependent cookbooks, though.  (In the meantime, we're triggering manual builds in the days/hours before release)
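
One way to work out which dependent cookbooks to rebuild is to walk the dependency map in reverse. The Ruby sketch below assumes a hypothetical map from each cookbook to the cookbooks it depends on (for example, collected from the `depends` lines in each metadata.rb). The map contents and the `downstream_of` helper are made up, and it does not trigger any real CI system.

```ruby
require 'set'

# Hypothetical dependency map: cookbook => cookbooks it depends on.
depends = {
  'app'   => ['nginx', 'base'],
  'nginx' => ['base'],
  'base'  => []
}

# Return every cookbook that depends, directly or transitively, on `changed`.
def downstream_of(changed, depends)
  found = Set.new
  queue = [changed]
  until queue.empty?
    current = queue.shift
    depends.each do |cookbook, deps|
      next unless deps.include?(current)
      # Set#add? returns nil when the cookbook was already recorded,
      # so each dependent is queued at most once.
      queue << cookbook if found.add?(cookbook)
    end
  end
  found.to_a.sort
end

p downstream_of('base', depends)
```

The returned list is the set of cookbook jobs a CI server would need to kick off after a successful release of the changed cookbook.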

In summary, three years ago it might have made sense to have everything in the same bucket, but I don't think that approach scales up to larger teams and/or higher frequencies of contributions. 



On Mon, Mar 10, 2014 at 8:03 AM, Morgan Blackthorne wrote:
Wanted to bump this thread and see if anyone else had further feedback on this...


On Friday, February 21, 2014, Morgan Blackthorne wrote:
This is actually something that we've been discussing at my workplace. Right now, we have one master repo for all of our cookbooks, each in their own subdirectory. Bamboo is polling this repo and will execute a Rake task on updates to push out new changes via Berkshelf, where the cookbooks are listed using the 'rel' tag (assuming it passes knife cookbook test on each of the cookbooks). We also have a secondary scheduled job that runs foodcritic/rubocop and reports on the results.

Given that we're only using this repo for our own internal cookbooks which are too specific to be of any use to anyone else (even if Legal would allow us to share them), what are the pros/cons of this approach? It seems like we would lower git contention between members of our team if we broke them out into different repos, but I'm not sure how we would then refactor the CI jobs. One thing I like about this approach is that the only thing we have to do in regards to CI is add the new cookbook to the Berksfile, and it just works. If we set up Bamboo to monitor multiple repos, that increases the chance that someone will add a new cookbook and forget to monitor that new repo in both Bamboo jobs (pushing and linting). Not to mention that it complicates the jobs themselves, which now have to pull in multiple repos. Berks will handle that fine, but knife cookbook test will need them all checked out to execute, as will foodcritic/rubocop on the linting side, and I definitely like the acceptance criterion of passing knife before being pushed with berks. I don't like the thought of pushing it up to the chef server with potentially broken ruby code.

Now, we could do per-repo Rakefile/Berksfile setups, but that increases the overhead of setting up a new cookbook. And the idea of having 20+ jobs in Bamboo, each for their own cookbook, seems wrong to me.

Thoughts?

--
~*~ StormeRider ~*~

"Every world needs its heroes [...] They inspire us to be better than we are. And they protect from the darkness that's just around the corner."

(from Smallville Season 6x1: "Zod")

On why I hate the phrase "that's so lame"... http://bit.ly/Ps3uSS


On Fri, Feb 21, 2014 at 2:46 PM, Pete Cheslock wrote:
It's just like choosing which configuration management tool to use: my vote is to pick one and go. Starting with a single repo is the easiest way for beginners to get started, and as you scale you can split the cookbooks out into separate repos.




On Fri, Feb 21, 2014 at 5:36 PM, Booker Bense wrote:
I doubt there's a hard and fast rule to apply to all situations, but there has been a lot of experience with using a single repo for the entire set of chef cookbooks. That was more or less the default recommendation 3 years ago. Almost everyone that started there has changed to a repo per cookbook. 

At this point I think you have to have a really strong reason not to use a repo per cookbook, or at least a repo per cookbook suite (a set of related cookbooks that have interdependencies).

Having a separate repo for each cookbook will make automated testing easier and it also imposes some discipline on creating dependencies. Automated config management is a powerful amplifier, but unfortunately it amplifies stupid just as fast as clever. The more testing you do the better, and at this point the tools are there to make TDD part of your Chef workflow.

- Booker C. Bense 



On Fri, Feb 21, 2014 at 2:14 PM, Alex Myasnikov wrote:

Ohai Chefs,

 

I am trying to understand what advantages (and disadvantages, if any) there are in having a git repo for each cookbook in the chef-repo, as opposed to having all of one’s application cookbooks in a single git repo.

 

Up to this point I was thinking of a single repo containing all cookbooks (minus community ones managed by Berkshelf). However, I came across a few references (below) that mention having a git repo per cookbook. It seems like the latter helps CI, but I am not sure how exactly, what the tangible benefits are, or what the potential tradeoffs are. Does having a repo for each cookbook under development constitute a best practice?

 

The first reference is from last year’s ChefConf presentation, Getting More Chefs in the Kitchen by Andrew Gross (the slide depicting a master repo consisting of individual repos per cookbook).

 

And then Nathen Harvey’s blog post on MVT had this snippet:

  1. gem install foodcritic
  2. Go to Travis CI and follow the Sign In link at the top.
  3. Activate the GitHub Service Hook for your cookbook’s repository from your TravisCI profile page. Each of your cookbooks has its own repository, right?!

http://technology.customink.com/blog/2012/06/04/mvt-foodcritic-and-travis-ci/

 

Setup:

 

Chef Server 11

Berkshelf 2.X

 

Thanks in advance.


