General
ClusterMate project implements basic components of a multi-node storage system using single-node storage components provided by StoreMate project. It is designed as a toolkit for building distributed storage systems, either for stand-alone systems or as base for other distributed systems (like message queues).
ClusterMate
adds hash-based partitioning of content to implement replicated storage. Although most work is done by clients (base client library implementation included), additional timestamp-based synchronization mechanism is provided for content repair. Repair mechanism is also used when replacing failed nodes or expanding/shrinking cluster. Conflict resolution is configurable; details depend on higher-level system (TransiStore
, for example, has very simple model due to immutable contents).
Project license is Apache 2.0.
Check out Project Wiki for more.
Status
Project is complete to the degree that other systems can be built on it.
Publicly available projects that build on ClusterMate include:
- TransiStore is a distributes temporary data storage: useful for things like storing intermediate results for Map/Reduce systems, or for buffering large database result sets.
Features
Default storage system has following features:
- Data store
- Immutable: that is, write-once, can not modify
- No size limitations, storage works for data of any size, although may not be optimal for smallest payloads
- Metadata/data separation: metadata stored in a database (by StoreMate), data on filesystem or inlined (small entries)
- Automatic on-the-fly compression of data (not metadata)
- Access
- Clustered CRUD (PUT, GET/HEAD, DELETE)
- Streaming (for large data)
- On-the-fly compression/uncompression
- Hash-based routing
- Range queries within specified prefixes (that is, routing can use path prefixes)
- Distributed operation
- Configurable N-way replication and sharding
- Automatic server-to-server data repair to handle partial updates and transient outages
- Clients can choose subset of N for successful operations with lower latency
Sub-modules
Project is a multi-module Maven project. Sub-modules are can be grouped in following categories:
api
: Basic datatypes shared by client and service modulesjson
: Jackson converters for core datatypes from 'api' module and StoreMate or client-server and server-server communicationclient
:- single-node access components ("call")
- clustered-access ("operation")
- Specific sub-modules for different HTTP clients:
client-ahc
for implementation using Async HTTP Clientclient-jdk
that uses basic JDK-providedjava.net.HttpURLConnection
service
:- Configurable implementation of distributed store and matching entry points
dropwizard
- Deployable container around
service
that exposes CRUD endpoints, entry listing, building on DropWizard framework (which runs on Jersey+Jetty)