Friday, February 25, 2011

Introducing ElephantDB: a distributed database specialized in exporting data from Hadoop

ElephantDB is a database that specializes in exporting key/value data from Hadoop. We have been running it in production at BackType for over half a year now and are excited to be open-sourcing it. In this post, I'll introduce ElephantDB, show how to use it, and then compare it to other databases out there. ElephantDB is hosted on GitHub here.

Unlike most other databases, ElephantDB decouples the creation of a database index from the serving of that index. ElephantDB consists of two components. The first is a library that is used in a MapReduce job to create an indexed key/value dataset that is stored on a distributed filesystem. The second component, ElephantDB server, is a daemon that downloads a subset of a dataset and serves it in a read-only, random-access fashion. A group of ElephantDB servers working together to serve a full dataset is called a ring. Both the creation and serving of a dataset are done in a fully distributed fashion.
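To make the split between creating and serving a dataset concrete, here is a minimal in-memory sketch in Python. It is not ElephantDB's code or API: every name in it (shard_for_key, build_shards, assign_shards, Server, Ring) is hypothetical, and it assumes that keys are assigned to shards by hashing and that shards are spread across hosts round-robin, neither of which is stated above. Plain dictionaries stand in for the indexed files on the distributed filesystem.

```python
import hashlib


def shard_for_key(key: bytes, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (an assumed scheme)."""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_shards(records, num_shards):
    """Creation side: stands in for the MapReduce job that writes
    one indexed key/value shard per partition."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in records:
        shards[shard_for_key(key, num_shards)][key] = value
    return shards


def assign_shards(num_shards, hosts):
    """Give each host a subset of the shards (round-robin, assumed)."""
    assignment = {host: [] for host in hosts}
    for shard_id in range(num_shards):
        assignment[hosts[shard_id % len(hosts)]].append(shard_id)
    return assignment


class Server:
    """Serving side: holds only its own shards and answers reads."""

    def __init__(self, shards, shard_ids):
        # Copying stands in for downloading shard files to local disk.
        self.local = {i: dict(shards[i]) for i in shard_ids}

    def get(self, shard_id, key):
        return self.local[shard_id].get(key)


class Ring:
    """A group of servers that together serve the full dataset."""

    def __init__(self, shards, hosts):
        self.num_shards = len(shards)
        assignment = assign_shards(self.num_shards, hosts)
        self.servers = {h: Server(shards, ids) for h, ids in assignment.items()}
        self.owner = {i: h for h, ids in assignment.items() for i in ids}

    def get(self, key):
        # Route the read to the server that holds the key's shard.
        shard_id = shard_for_key(key, self.num_shards)
        return self.servers[self.owner[shard_id]].get(shard_id, key)


records = [(b"user:1", b"alice"), (b"user:2", b"bob"), (b"user:3", b"carol")]
shards = build_shards(records, num_shards=4)
ring = Ring(shards, hosts=["host-a", "host-b"])
print(ring.get(b"user:2"))
```

The point of the sketch is that build_shards never talks to a Server, and a Server never writes: the dataset is produced in bulk, then handed over to be served read-only.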

Posted via email from miner49r
