14 December 2009

Running TDB on a cloud storage system

Project Voldemort is an open-source (Apache2 license) distributed, scalable, fault-tolerant, key-value storage system for large scale data. Being a key-value store the only operations it provides are:

  • get(key) -> value
  • put(key, value)
  • delete(key)

Key and value can be various custom types but at the lowest level, they are arrays of bytes. Serialization schemes on top of byte arrays given structure but access is only via the key (so no filters or joins as part of the store. It's built for scale and speed, and fault tolerance.

TDB has internal APIS so that difefrentindexing scheme or different stroage technologies can be plugged in. A key-value store can be used as the storage layer for TDB.

There are two areas of storage needed: the node table (a 2-way mapping between the data making up the RDF terms and the associated, fixed size NodeId) and the indexes, which provide the matching of triple patterns. See the TDB Design Notes.

But a key-value store isn't an ideal backend. The node table is a pair of key-value stores because all that is needed is lookup between RDF term and the NodeId. The issues that arise are the granularity of access. TDB heavily caches the mapping in the query engine.

The indexes don't naturally map to key-value access because looking up a triple pattern results in all matches. There are (at least) two ways of doing this. Either store something like all PO pairs and use S as a key (a bit like Jena's memory model), or use the key-value store to hold part of a datastructure and access it like a disk.

TDB uses threaded B+Trees with a pluggable disk block layer (this is used to switch between 32 and 64 bit modes) so the key-value store a block storage is a simple fit. Because B+Trees store the entries in sorted order, caching means that a block probably contains all the PO for a given S if a look up is by S so these two schemes end up being similar even though at the design level they are quite different.

Both apects are relying on the query engine doing caching to work sensibly to compensate for the mismatch in requirements (triple match for joins) and interface granularity (for node access).

See also:

Does Project Voldemort work as storage for TDB? Yes, and with only a small amount of code. Not surprisingly, the performance is limited in this experimental system (e.g. storing invidiual RDF terms in the node table needs better management to avoid latency and overhead in the round trip to the remote table). Truncating to only the used space then compressing would be useful on teh indexes (see RDF-3X for an interesting compression scheme). But it's a workable scheme and the style of using a key-value store shows TDB can be ported to a wide variety of environments because key-value stores are currently a very active area - project Voldemort provides a cloud-centric stiorage fabric.

I've started putting experimental systems on github. This experiment is available in the TDB/V repository. These are not released, supported systems; they are the source code and development setup (for Eclipse usually). I used Project Voldemort release v0.57.