[chef-dev] Re: Questions about Chef searching module (couchDB/solr)


  • From: Ranjib Dey
  • To: Xiaoding Bian Cloud Services RANDD - 72410
  • Subject: [chef-dev] Re: Questions about Chef searching module (couchDB/solr)
  • Date: Sun, 4 Aug 2013 14:07:18 -0700

Regarding "search(:node, "name:node01")" followed by "node[:name]="node01", node.save":
this will not work from recipes, because nodes run with read-only/non-admin API clients, which cannot save any node other than themselves. If a node tries to save another node, the request will fail (unless you run chef on that node with an admin API client, which is a bad idea). You can of course do these things from knife, which uses your (dev/ops) API client, which normally has admin privileges.

1) The Chef client does not read from couchdb. It only communicates with the Chef server over HTTP(S); the Chef server in turn talks to couchdb. When the Chef server puts an object into couchdb, it also publishes a job for indexing it. A separate daemon (chef-expander) picks up the job and posts it to solr (via another daemon called chef-solr, which is a wrapper around standard solr).
2) The whole data bag, IIRC, but I'm not very confident about that; you need to confirm it.
3) Yes, couchdb is pretty disk hungry. Also, older Chef servers ship with a solr setting that limits how much data you can index, which surfaces as silent failures.
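
To make the write path in (1) concrete, here is a toy model in plain Ruby. It is only a sketch: ToyServer and its methods are hypothetical names, a hash stands in for couchdb, an array for the indexing job queue, and another hash for solr. It is not Chef server code; it only shows that a saved object is invisible to search until the expander step has run.

```ruby
# Toy model of the save -> queue -> expander -> index path.
# All names are hypothetical; nothing here is Chef server code.
class ToyServer
  def initialize
    @store = {}         # stands in for couchdb
    @index_queue = []   # stands in for the indexing job queue
    @search_index = {}  # stands in for solr
  end

  # Saving writes the object to the store and publishes an indexing job.
  def save(name, data)
    @store[name] = data
    @index_queue << name
  end

  # Stands in for the expander daemon: drain pending jobs and post the
  # currently stored document for each one to the index.
  def run_expander
    until @index_queue.empty?
      name = @index_queue.shift
      @search_index[name] = @store[name]
    end
  end

  # Search only sees what has been indexed so far.
  def search(name)
    @search_index[name]
  end
end

server = ToyServer.new
server.save("node01", { "role" => "web" })
puts server.search("node01").inspect   # not indexed yet
server.run_expander
puts server.search("node01").inspect   # now visible to search
```

The gap between save and run_expander is the window in which a search returns old data or nothing at all.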

I'd strongly advise against using the couchdb stack; it's not maintained anymore. In case you are not aware, the Chef 11 server has been rewritten in Erlang, and the persistent store is now postgres. Together they provide much better performance than the couch/ruby-merb stack.

Even with the Chef 11 server, I don't think searching every 10 seconds is a great idea. This is less about performance and more about stale data: you can end up getting data that is invalid or does not contain the latest node information. You can easily design an eventually consistent infrastructure on top of Chef, but remember that the minimum window for possibly stale data is proportional to your Chef run interval. That is fine if you are dealing with load balancer configuration, database config generation, etc., but not ideal for failover. You should offload those things to a more fault-tolerant, real-time system. Ideally we would like to reuse the Chef scripts and its DSL (resources, providers, notifications, etc.) there, but I'm not aware of any such thing.
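
A rough back-of-the-envelope sketch of the staleness point, in plain Ruby. The assumptions are mine: the node saves itself exactly once per run at t = 0, RUN_INTERVAL, 2 * RUN_INTERVAL and so on, the run interval is 1800 seconds, and indexing is instantaneous (it is not, so real ages would be larger). The data_age helper is a hypothetical name for illustration.

```ruby
# Age (in seconds) of the node data a search would return at time t,
# assuming the node last saved itself at the most recent multiple of
# run_interval and that indexing takes no time.
def data_age(t, run_interval)
  t % run_interval
end

RUN_INTERVAL = 1800   # assumed chef-client run interval, in seconds
POLL_INTERVAL = 10    # the 10-second search loop from the question

# Ages observed by a poller over one full run interval.
ages = (0...RUN_INTERVAL).step(POLL_INTERVAL).map { |t| data_age(t, RUN_INTERVAL) }
puts "worst-case age seen by the poller: #{ages.max} s"
```

Under these assumptions, polling more often does not make the data any fresher; only a shorter run interval does.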

best
ranjib



On Sun, Aug 4, 2013 at 7:47 AM, Xiaoding Bian Cloud Services RANDD - 72410 wrote:
Hi chef-devs:
    I am an engineer at VMware. We recently ran into a problem when using Chef's search feature, so I'd like to know if anyone can share some details about the search module's implementation (we are using Chef 0.10.8).
   Basically, Chef provides two operations, such as "search(:node, "name:node01")" and "node[:name]="node01", node.save".
   I'd like to know exactly what the chef-server does when node.save or the search function is executed. For example:
   (1) When and how does Chef read data from couchdb and update the solr indexes?
   (2) When updating the solr indexes, does Chef fetch the whole data bag or just the data that changed recently?
   (3) If we run the search function very frequently, every 10 seconds for example, and the data bag holds hundreds of records, are there potential performance concerns for chef-server/couchdb/solr from your perspective?

Thanks a lot,
-bxd



