[chef] Re: ideas on testing a clustered application w/ Test-Kitchen


  • From: John Dewey < >
  • To:
  • Subject: [chef] Re: ideas on testing a clustered application w/ Test-Kitchen
  • Date: Tue, 7 May 2013 01:31:11 -0700

It would also be nice if the runner didn't use chef-solo.  I like where Vagabond is
going.  We need the ability to do true integration testing: spin up the dependent
systems/services, then test the cookbook in isolation.

I have abandoned test kitchen altogether in favor of chefspec, since I can stub/mock
dependencies that chef-solo cannot resolve.  Looking forward to your work, so we can
add test kitchen to the openstack cookbooks and get those much-needed integration
tests.

For example, I have a use case where a cookbook depends on rabbit, mysql, and memcached
running, with those roles registered in the chef server.  Once those prerequisites
are met, the cookbook can install recipes and perform assertions.  This is where I
like the Vagabond approach.

John

On Tuesday, May 7, 2013 at 1:10 AM, Bryan Berry wrote:

I have some ideas on extending test-kitchen to test clustered
applications and I would love some feedback before I go coding off in
a particular direction.

Problem: I deal primarily with distributed applications, and testing
the related cookbooks can be a pain. I also have to make sure these
cookbooks work across different Linux distros. Test-Kitchen was not
originally created with this use case in mind, though I don't see any
reason it couldn't support it.

Vagabond[1], written by Chris Roberts, extends .kitchen.yml to include
a clusters component, among other things. I would love to see
test-kitchen absorb some of that functionality or at least provide
extension points to make this more easily pluggable.

Testing a cluster works differently than testing an individual node. I
want to interrogate the state of the cluster as a whole, not look
inside each individual server. To do this I need to wait until all
servers in a cluster converge or at least a quorum of them do. Once a
quorum of nodes has converged, run a series of tests against the
cluster. These tests execute on the client executing `kitchen` rather
than inside the nodes of the cluster.

Here are the steps in brief:
1. Converge all nodes in a cluster
2. Wait for a quorum of nodes to converge
3. Execute tests against the cluster
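
The wait in step 2 could be sketched roughly as follows. This is only an illustration, not existing test-kitchen behaviour: `wait_for_quorum` and its `timeout`/`interval` parameters are hypothetical names, and the block passed in stands for whatever "has this node converged?" check the runner ends up providing.

```ruby
# Hypothetical helper (not part of test-kitchen): poll until at least
# `quorum` of the given nodes report that they have converged.
# The block receives a node and returns true once that node has converged.
def wait_for_quorum(nodes, quorum, timeout: 600, interval: 5)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    converged = nodes.select { |node| yield(node) }
    # Enough nodes are ready: hand back the converged subset.
    return converged if converged.size >= quorum
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "quorum of #{quorum} not reached within #{timeout}s"
    end
    sleep interval
  end
end
```

Returning the converged subset, rather than just true, would let the cluster tests in step 3 connect only to nodes that are known to be ready.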

I would put the tests for a cluster in my_cluster/test/cluster/cluster_name

Applications like Elasticsearch, Zookeeper, or Cassandra don't have a
master node, so each node has an identical run_list and attribute set:

clusters:
  default:
    - member: zk1
    - member: zk2
    - member: zk3

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]

To test the default cluster on CentOS, run
`kitchen test --cluster default --platform centos-6.3`.

Let's make this even DRYer

clusters:
  default:
    node_count: 3
    quorum: 2

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]
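
To make the relationship between the two cluster forms concrete, here is a rough sketch of how the `node_count` shorthand might be expanded into explicit members. Everything here is an assumption for illustration: `expand_cluster` is a hypothetical helper, the `<cluster>-<index>` naming scheme is invented, and defaulting to a simple majority when no quorum is given is just one possible choice.

```ruby
require 'yaml'

# Hypothetical helper, not part of test-kitchen: normalize either cluster
# form (explicit member list, or node_count/quorum) into one structure.
def expand_cluster(name, spec)
  if spec.is_a?(Array)
    members = spec
    quorum = nil
  else
    count = spec.fetch('node_count')
    # Assumed naming scheme: <cluster>-<index>
    members = (1..count).map { |i| { 'member' => "#{name}-#{i}" } }
    quorum = spec['quorum']
  end
  # Assumption: with no explicit quorum, require a simple majority.
  quorum ||= members.size / 2 + 1
  { 'members' => members, 'quorum' => quorum }
end

config = YAML.safe_load(<<~YAML_DOC)
  clusters:
    default:
      node_count: 3
      quorum: 2
YAML_DOC

clusters = config['clusters'].map { |name, spec| [name, expand_cluster(name, spec)] }.to_h
```

Normalizing both forms up front would mean the rest of the runner only ever has to deal with a list of members and a quorum number.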

The test for this example zookeeper cluster would connect to one
zookeeper node and make sure it sees the other zookeeper nodes. I am
paraphrasing the zookeeper code because I am not certain what the
actual call would be.

my_cluster/test/cluster/default/check_members.rb

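require 'zk'

describe "zookeeper cluster" do
  before(:all) do
    # some_ip is a placeholder for the address of any one cluster member
    @zk = ZK.new(some_ip)
  end

  it "sees the other members" do
    # paraphrased call; ACTUAL_PEERS is a placeholder for the expected members
    peers = @zk.get("system/peers")
    peers.should == ACTUAL_PEERS
  end

end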

The primary challenge here is resolving the names of the members of
the cluster. One way to do this is to access the statefiles for the
nodes in .kitchen/*.yml. Another would be to somehow access the
@instances array for the current Kitchen config. Yet another option
would be to stand up a chef-zero server.
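
The first option, reading the state files, might look roughly like the sketch below. It assumes each `.kitchen/<instance>.yml` file is a plain YAML mapping with a string `hostname` key; whether that key exists, and under what name, depends on the driver, so treat the key and the `cluster_hosts` helper as hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: map instance names to hostnames by reading the
# state files under .kitchen/. Assumes each file is a YAML mapping with a
# string 'hostname' key (driver-dependent); files without one are skipped.
def cluster_hosts(kitchen_dir = '.kitchen')
  Dir.glob(File.join(kitchen_dir, '*.yml')).sort.each_with_object({}) do |path, hosts|
    state = YAML.safe_load(File.read(path)) || {}
    hostname = state['hostname']
    # Instance name is taken from the file name, e.g. default-zk1.yml
    hosts[File.basename(path, '.yml')] = hostname if hostname
  end
end
```

A cluster test could then pick any value from `cluster_hosts` as the node to connect to, and use the full set as the expected peer list.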

What about a distributed app where each node does not have an
identical run_list? Here is how I would handle that for something like
HBase that has a "hmaster" that stores metadata for the whole cluster.

clusters:
  default:
    - member: head1
      run_list: [ "recipe[hbase::hmaster]" ]
    - member: store1
      run_list: [ "recipe[hbase::data_node]" ]
    - member: store2
      run_list: [ "recipe[hbase::data_node]" ]

platforms:
  - name: ubuntu-12.04
  - name: centos-6.3

suites:
  - name: default
    run_list: [ "recipe[zookeeper]" ]



I know that test-kitchen is an unopinionated tool and attempts to be
workflow agnostic, but I feel that what I have shown here is a fairly
simple workflow. The cluster definition I have presented is not
suitable for modeling failover in a cluster or specific cluster states.
For those cases one should use a custom workflow tool like chef-workflow
or simply custom rake tasks.

Depending on the feedback I get from this I plan to extend
test-kitchen or Vagabond to handle the workflow I have described here.
Thanks for reading!





