chef - Re: storing easily queried metadata


General discussion about Chef

Re: storing easily queried metadata - was Re: some questions

From: Ian Kallen <spidaman.list@gmail.com>
To: chef@lists.opscode.com
Subject: Re: storing easily queried metadata - was Re: some questions
Date: Fri, 24 Apr 2009 09:10:27 -0700

I dunno, "couch orm" seems like a misnomer to me: there's no "r" in couchdb (it's not relational). And yeah, multiple backends sounds like overkill; I'd much rather see new features in chef than a science project around a (seemingly gratuitous) abstraction for that.

In general, I think the most common case of access to the data is for composed objects (nodes, cookbooks, etc). The de/marshalling overhead of ORM'ing the data isn't necessary for those cases, so Adam's decision to use a document store makes a lot of sense. For simply querying by metadata elements, it seems to me that full text search (ferret, solr, etc) should be sufficient.

However, if it's really important to have relational, transactional access, I would suggest considering a minimal number of rdbms tables to support that, and treating the couch documents as the moral equivalent of a denormalized table for accessing the composed documents (similar to what friendfeed does http://bret.appspot.com/entry/how-friendfeed-uses-mysql though they explicitly relax transactional consistency to address web request latency concerns).

-Ian
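To make the "minimal index tables next to denormalized documents" idea concrete, here is a rough Ruby sketch. It is not how chef or friendfeed actually implement anything: the class name IndexedDocumentStore, its methods, and the sample attributes are all made up for illustration, and plain in-memory hashes stand in for both the document store and the rdbms index tables.

```ruby
require "json"
require "set"

# Hypothetical sketch: documents stay opaque JSON blobs (the "denormalized
# table"), while a small side index maps [attribute, value] pairs to ids.
class IndexedDocumentStore
  def initialize(indexed_attributes)
    @indexed_attributes = indexed_attributes
    @documents = {} # id => JSON string
    @index = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Save the whole document, then refresh its index entries.
  def put(id, document)
    remove_index_entries(id)
    @documents[id] = JSON.generate(document)
    @indexed_attributes.each do |attribute|
      value = document[attribute]
      @index[[attribute, value]] << id unless value.nil?
    end
  end

  # Look up ids through the index, then return the full parsed documents.
  def find_by(attribute, value)
    ids = @index.fetch([attribute, value], Set.new)
    ids.map { |id| JSON.parse(@documents[id]) }
  end

  private

  # Drop stale index entries before a document is rewritten.
  def remove_index_entries(id)
    @index.each_value { |ids| ids.delete(id) }
  end
end

# Usage with made-up node data.
store = IndexedDocumentStore.new(["platform", "environment"])
store.put("web1", { "platform" => "ubuntu", "environment" => "prod", "recipes" => ["apache2"] })
store.put("db1", { "platform" => "centos", "environment" => "prod", "recipes" => ["mysql"] })
prod_nodes = store.find_by("environment", "prod")
```

Only the attributes named up front are indexed; everything else is reachable only by loading the whole document, which is the trade-off being suggested.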

snacktime wrote:

On Thu, Apr 23, 2009 at 9:46 PM, David Lee <david.lee@kanji.com.au> wrote:

    Actually, before I go offering any more opinions, can I please ask
    what we're actually doing? In reading the responses, I realised I
    don't know enough about what's going on yet to offer very
    constructive input.

    *Please* correct me if I'm wrong, but it seems like we're
    basically wanting to:

    1) store node data records, which are a fairly large, arbitrarily
    structured nested hash (ruby / json), with string key/value pairs

    2) find node records which match some very simple criteria, and
    return the entire node data structure for matches

    3) store some additional metadata about nodes themselves, recipes,
    and other Chef classes / objects; these bits of metadata would be
    lightweight and possibly act like polymorphically associated
    ActiveRecord objects.

    Is this a reasonable summary?
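As a rough illustration of points 1 and 2 in the summary above, a minimal Ruby sketch could look like the following. The node data, the dotted-path convention for criteria, and the helper names fetch_path and find_nodes are all assumptions made for this example, not anything chef provides.

```ruby
# Hypothetical node records: nested hashes with string keys (point 1).
NODES = {
  "web1" => {
    "platform" => "ubuntu",
    "network" => { "hostname" => "web1", "domain" => "example.com" }
  },
  "db1" => {
    "platform" => "centos",
    "network" => { "hostname" => "db1", "domain" => "example.com" }
  }
}.freeze

# Walk a dotted path such as "network.hostname" through a nested hash.
# Returns nil as soon as a step is missing or is not a hash.
def fetch_path(record, path)
  path.split(".").reduce(record) do |value, key|
    value.is_a?(Hash) ? value[key] : nil
  end
end

# Return every full node record that matches all criteria (point 2).
def find_nodes(nodes, criteria)
  nodes.values.select do |node|
    criteria.all? { |path, expected| fetch_path(node, path) == expected }
  end
end

# Usage: matches come back as the entire node structure.
ubuntu_nodes = find_nodes(NODES, "platform" => "ubuntu")
example_nodes = find_nodes(NODES, "network.domain" => "example.com")
```

This scans every record, which is fine for a sketch but is exactly the part a real backend (couchdb views, full text search, or index tables) would replace.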

I think that eventually you will want to attach attributes to a variety of chef objects, such as cookbooks, recipes, etc.

    If it is, would a decent native Ruby object database be a pretty
    reasonable thing to use as a backend? Does such a thing exist?

Not that I know of.

The problem with multiple backend stores is the lowest common denominator. If you are going to mix sql/non sql backends, you need a reference implementation to go by. It wouldn't be much work to drop in activerecord, but the minute I start using activerecord functionality that doesn't exist in the other backends, I've just broken chef for everyone not using activerecord.

Off the top of my head, I would probably pick one of the full-featured ORMs as a reference, and then define a subset of that ORM's functionality that can be used in chef core. The goal is a reference implementation with enough functionality to carry you forward. I suspect activerecord, datamapper, or couchrest would be good candidates.
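One way to picture "a subset of functionality that chef core may use" is a small contract that every backend has to satisfy. The sketch below is only an illustration under that assumption: the MetadataStore module, its four method names, and the MemoryBackend class are invented here and do not correspond to activerecord, datamapper, or couchrest APIs.

```ruby
# Hypothetical contract: the only operations chef core would be allowed to call.
module MetadataStore
  class IncompleteBackend < StandardError; end

  REQUIRED_METHODS = %i[get put delete find_by].freeze

  # Raise unless the backend implements the whole agreed subset.
  def self.verify!(backend)
    missing = REQUIRED_METHODS.reject { |name| backend.respond_to?(name) }
    unless missing.empty?
      raise IncompleteBackend, "backend is missing: #{missing.join(', ')}"
    end
    backend
  end
end

# In-memory reference backend, standing in for whichever ORM gets chosen.
class MemoryBackend
  def initialize
    @records = {}
  end

  def get(id)
    @records[id]
  end

  def put(id, attributes)
    @records[id] = attributes
  end

  def delete(id)
    @records.delete(id)
  end

  # Return the attribute hashes of all records where key equals value.
  def find_by(key, value)
    @records.select { |_id, attributes| attributes[key] == value }.values
  end
end

# Usage: core code only ever talks to a verified backend.
backend = MetadataStore.verify!(MemoryBackend.new)
backend.put("apache2", { "type" => "cookbook", "maintainer" => "ops" })
cookbooks = backend.find_by("type", "cookbook")
```

Anything outside the required methods stays off limits to core code, which is what keeps one backend's extra features from breaking the others.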

The only other approach I can think of is to use couchdb OR sql, but not both, and then just pick the best orm and stick with it.

Since chef already uses couchdb, I think just using the best couchdb orm is the saner approach. Multiple backends of completely different types will lead to chaos.

If others care to chime in and come to some consensus, I'd be happy to start coding it up.
Chris

--
Ian Kallen
blog: http://www.arachna.com/roller/spidaman
tweetz: http://twitter.com/spidaman
vox: 415.505.5208
