Report bugs | Sign in
Powered by Melange
Release 0-6-20100201
Last modified on 2009-04-14 01:33:22.259090 by Randall Leeds

Partitioning and Clustering Support for CouchDB

Proposal for Google Summer of Code 2009

Randall Leeds <randall.leeds at gmail>

 

Abstract

The CouchDB web site describes the project as "a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API." While distributed deployments have been achieved with the help of proxies and smart external scripting, the core of CouchDB itself does not currently support distributing the database across multiple machines. This proposal presents a plan for implementing changes to the CouchDB core and for writing additional modules to directly support distributed deployments.

 

Goals

  1. Increase read and write throughput
  2. Increase total storage capacity
  3. Maintain compatibility by keeping the external API of the system consistent with that for a single node deployment
  4. Allow configurable topology to let administrators optimize for their needs

 

Motivation

In the current era of internet-scale computing it is too much to expect that a single database server can adequately handle all the load of thousands of concurrent requests and updates. However, as more servers and disks are added the task of balancing the load and maintaining a consistent view of the data becomes more difficult, typically requiring costly synchronization between nodes.

 

CouchDB is uniquely positioned to take a strong lead in scalable, reliable database systems because it is developed in the Erlang[1] language and utilizes the Open Telecom Platform. Both the core Erlang language and the Open Telecom Platform libraries support the development of scalable, reliable systems using a clear on consistent paradigm for communication between nodes in a distributed environment.

 

Discussion

In recent months, the CouchDB community has been growing rapidly and one feature that is often requested is built in support for configuring CouchDB servers in a cluster. Blog posts have discussed ways to cluster CouchDB using a separate HTTP proxy such an nginx[2], including work recently released as open source as couchdb-lounge[3]. However, no supporting functionality exists in the core of CouchDB even as clustering and partitioning continue to be commonly requested features.

 

The proposed project aims to provide the initial support for these features. While the two terms are often used interchangeably, a distinction will be made here between the orthogonal features of 'clustering' and 'partitioning'. Clustering is the duplication of an entire CouchDB server to provide load-balancing or hot fail-over and increased reliability. Partitioning is the practice of partitioning the storage space into shards and distributing the shards across servers to provide increased throughput, e.g. when disk performance becomes a significant bottleneck. The project presented herein aims to support both features.

 

Deliverables

  1. Consistent hashing Erlang module
  2. Fast CouchDB proxy using Erlang message passing
  3. Quantitative evaluation of performance impact for common deployment topologies
  4. An API for inspecting per-node cluster health and statistics
  5. Refactoring of the core CouchDB server to cleanly support items 1-3

At the time of writing, the CouchDB community is not ready to settle on the details of implementation required to execute this proposal. As a result, much of the difficulty in this project will be to drive community discussion toward consensus on a model which best supports the diverse needs of the CouchDB community.

 

Example of Basic Tests

couchTests.partition_basics = function(debug) {
var result = JSON.parse(CouchDB.request("GET", "/").responseText);
T(result.couchdb == "Welcome");

var db = new CouchDB("test_suite_db");
...

//PUT _design doc with partition schema
//Say just two partitions

//Verify three partitions
T(db.info().partitions == 2);

//Store two new documents
var docA = {_id:"0",a:1}; //Assume for now we know this shards to node A
var docB = {_id:"1000",b:1}; //...and to node B
var resultA = db.save(docA);
... //verify
var resultB = db.save(docB);
... //verify

//Get the partition information
var uriA = [resultA.partition.node_uri, resultA.partition.dbname, resultA.id].join("/");
var uriB = [resultB.partition.node_uri, resultB.partition.dbname, resultB.id].join("/");
var BonA = [resultA.partition.node_uri, resultA.partition.dbname, resultB.id].join("/");

//Test bad store of B on A
docB.rev = resultB.rev;
xhr = CouchDB.request("PUT", BonA, {body: JSON.stringify(docB)});

T(xhr.status == 403); //Forbidden: Can't store it on the wrong partition
TEquals(uriB, xhr.responseJSON.redirect, "bad store should return correct url");

//Test bad load
xhr = CouchDB.request("GET", BonA);
T(xhr.status == 301); //Moved permanently
//Make sure it resides at the other node
T(xhr.getResponseHeader("Location") == uriB);

//...etc... <to be determined>

Approximate Timeline

  • April 20 - Community Bonding - High-Level Preliminary Discussion (has already begun)
  • May 23 - API polishing for consistent hash and proxy modules
  • June 1 - Work on consistent hash and proxy modules
  • June 22 - Profiling, experimenting and discussing implementation details of the clustering/partitioning components
  • June 29 - Identify testing plan and build tests to guide and stabalize feature development
  • July 1 - Begin core refactoring work
  • July - Rolling checkpoints, tested incrementally, with fluid, regular integration
  • August 17 - 'Pencils Down' date. Breathe, rejoice, keep contributing?

 

Other Obligations

I will be enrolled in one class outside of the computer science field from June 22 through August 7. I have no other obligations and can dedicate full-time effort towards the project goals.

 

Biography

I am currently a student in my fourth year of study at Brown University in Rhode Island, USA persuing a Bachelor of Science degree in Computer Science. My primary interests include highly parallel, distributed systems, real-time systems, and computer networking. Outside of academics (but within the field of computing), I am constantly inspired by the open source model of development and the commons-based peer production communities that drive and define it. I have been involved as a small contributor to various open source projects since 2000, including the Atom/RSS filtering site Melkjug[4], the Demeter terrain rendering engine[5], the DuskRPG game engine[6] and the Python client library for CouchDB[7]. I look forward to seeing the ideals of free software expanded to other industries to improve efficiency in collaboration, and particularly applications of free software to improve government transparency.

 

Outside of my field of study, I maintain an interest in digital signal processing, electronically mediated music composition, production and performance, alpine skiing and dancing. Further information about me, including a recent CV and links to my various public profiles can most reliably be found at http://claimid.com/randallleeds.

 

References

[1] http://erlang.org/

[2] http://nginx.net/

[3] http://code.google.com/p/couchdb-lounge/

[4] http://www.openplans.org/projects/melkjug/project-home

[5] http://sourceforge.net/projects/demeter/

[6] http://sourceforge.net/projects/duskrpg

[7] http://code.google.com/p/couchdb-python/