General Overview

OlegDB is a concurrent, pretty fast K/V hash-table with an Erlang frontend. It uses the Murmur3 hashing algorithm to hash and index keys. We chose Erlang for the server because it's functional, uses the actor model and the pattern matching is ridiculous.

Installation

Installing OlegDB is pretty simple. You only need a POSIX-compliant system, make, gcc or clang (that's all we've tested) and Erlang. You'll also need the source code for Oleg.

Once you have your fanciful medley of computer science tools, you're ready to dive into a lengthy and complex process of program compilation. Sound foreboding? Have no fear, people have been doing this for at least a quarter of a century.

I'm going to assume you've extracted the source tarball into a folder called ~/src/olegdb and that you haven't cd'd into it yet. Let's smash some electrons together:

$ cd ~/src/olegdb
$ make
$ sudo make install

If you really wanted to, you could specify a different installation directory. The default is /usr/local. You can do this by setting PREFIX:

$ sudo make PREFIX=/usr/ install

Actually running OlegDB and getting it to do stuff after this point is trivial. If your installation prefix is in your PATH, you should just be able to run something like the following:

$ olegdb <data_directory>

...where <data_directory> is the place you want Oleg to store persistent data. Make it /dev/null if you want, I don't care. You can also specify IP/port information from the command line:

$ olegdb /tmp 1978 #Starts OlegDB listening on port 1978
$ olegdb /tmp 0.0.0.0 1337 #Starts OlegDB listening on the 0.0.0.0 IP, with port 1337
$ olegdb /tmp data.shithouse.tv 666 #Hostnames work too

Getting Started

Communicating with OlegDB is done via a pretty simple REST interface. You POST to create/update records, GET to retrieve them, DELETE to delete, and HEAD to get back some information about them. Probably.

For example, to store the value Raphael into the named database turtles under the key red you could use something like the following:

$ curl -X POST -d 'Raphael' http://localhost:8080/turtles/red

Retrieving data is just as simple:

$ curl http://localhost:8080/turtles/red

Deleting keys can be done by using DELETE:

$ curl -X DELETE http://localhost:8080/turtles/red

You can also tell Oleg what the Content-Type is:

$ curl -X POST -H "Content-Type: text/html" -d '<p>Raphael</p>' http://localhost:8080/turtles/red

OlegDB supports lazy key expiration. You can specify an expiration date by setting the X-OlegDB-use-by header to a UTC POSIX timestamp.


$ curl -v -X POST \
-H "X-OlegDB-use-by: $(date +%s)" \
-H "Content-Type: application/json" \
-d '{turtle: "Johnny", age: 34}' http://localhost:8080/turtles/Johnny
> POST /turtles/Johnny HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
> Host: localhost:8080
> Accept: */*
> X-OlegDB-use-by: 1394323192
> Content-Type: application/json
> Content-Length: 27
> 
* upload completely sent off: 27 out of 27 bytes
< HTTP/1.1 200 OK
< Server: OlegDB/fresh_cuts_n_jams
< Content-Type: text/plain
< Connection: close
< Content-Length: 7
<
無駄

$ curl -v http://localhost:8080/turtles/Johnny
> GET /turtles/Johnny HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Status: 404 Not Found
< Server: OlegDB/fresh_cuts_n_jams
< Content-Length: 26
< Connection: close
< Content-Type: text/plain
<
These aren't your ghosts.

As you can hopefully tell, the POST succeeds and a 200 OK is returned. We used the shell command `date +%s`, which prints the current time as a POSIX timestamp, so the key is due to expire the moment it is stored. Immediately trying to access the key again results in a 404, because the key has already expired.
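Lazy expiration usually means that nobody goes hunting for stale keys; a key is only checked when somebody asks for it. Here is a minimal sketch of that idea in C. The demo_entry struct and the demo_read function are made up for illustration and are not part of liboleg. Treating a key as expired as soon as the clock reaches use_by is an assumption, based on the 404 shown above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical record, for illustration only (not a liboleg type). */
typedef struct demo_entry {
    const char *value;      /* The stored value */
    bool        has_expiry; /* false means the key never expires */
    time_t      use_by;     /* POSIX timestamp, as sent in X-OlegDB-use-by */
} demo_entry;

/* Lazy check: expiration is only evaluated when the key is read.
 * Returns the value, or NULL if the key has expired (think 404). */
static const char *demo_read(const demo_entry *e, time_t now) {
    if (e->has_expiry && now >= e->use_by) {
        return NULL;
    }
    return e->value;
}
```

Calling demo_read with now equal to use_by returns NULL, which mirrors the POST-then-GET sequence above.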

If you want to retrieve the expiration date of a key, you can do so by sending HEAD:


$ curl -v -X HEAD http://localhost:8080/turtles/Johnny
> HEAD /turtles/Johnny HTTP/1.1
> User-Agent: curl/7.35.0
> Host: localhost:8080
> Accept: */*
>
< HTTP/1.1 200 OK
* Server OlegDB/fresh_cuts_n_jams is not blacklisted
< Server: OlegDB/fresh_cuts_n_jams
< Content-Length: 0
< Content-Type: application/json
< Expires: 1395368972
<

What the hell is up with your responses?

We have fun with our HTTP responses. Really all you need is the HTTP status code to see if something worked or not. 404 means not found, 200 means the operation completed successfully, 500 if something bad happened, etc.

liboleg

Macros

VERSION

#define VERSION "0.1.0"

The current version of OlegDB.

KEY_SIZE

#define KEY_SIZE 250

The hardcoded upper bound for key lengths.

HASH_MALLOC

#define HASH_MALLOC 65536

The size, in bytes, to allocate when initially creating the database. ol_bucket pointers are stored here.

PATH_LENGTH

#define PATH_LENGTH 256

The maximum length of a database's path.

DB_NAME_SIZE

#define DB_NAME_SIZE 64

Database maximum name length.

DEVILS_SEED

#define DEVILS_SEED 666

The seed to feed into the murmur3 algorithm.
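For the curious, here is a sketch of what hashing a key with this seed could look like. It assumes the 32-bit x86 variant of MurmurHash3, which fits the uint32_t hash field in ol_bucket but is not confirmed by anything in this document. The names demo_murmur3_32 and demo_hash_key are hypothetical and not part of liboleg.

```c
#include <stddef.h>
#include <stdint.h>

#define DEVILS_SEED 666

/* Rotate a 32-bit value left by r bits (0 < r < 32). */
static uint32_t demo_rotl32(uint32_t x, int r) {
    return (x << r) | (x >> (32 - r));
}

/* MurmurHash3, 32-bit x86 variant. Blocks are assembled byte by byte
 * in little-endian order so the result does not depend on the host. */
static uint32_t demo_murmur3_32(const void *key, size_t len, uint32_t seed) {
    const uint8_t *data = (const uint8_t *)key;
    const uint32_t c1 = 0xcc9e2d51, c2 = 0x1b873593;
    const size_t nblocks = len / 4;
    uint32_t h = seed;

    /* Body: mix in one 4-byte block at a time. */
    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *p = data + i * 4;
        uint32_t k = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        k *= c1; k = demo_rotl32(k, 15); k *= c2;
        h ^= k; h = demo_rotl32(h, 13); h = h * 5 + 0xe6546b64;
    }

    /* Tail: the remaining 0 to 3 bytes. */
    const uint8_t *tail = data + nblocks * 4;
    uint32_t k = 0;
    switch (len & 3) {
    case 3: k ^= (uint32_t)tail[2] << 16; /* fall through */
    case 2: k ^= (uint32_t)tail[1] << 8;  /* fall through */
    case 1: k ^= tail[0];
            k *= c1; k = demo_rotl32(k, 15); k *= c2; h ^= k;
    }

    /* Finalization: fold in the length and avalanche the bits. */
    h ^= (uint32_t)len;
    h ^= h >> 16; h *= 0x85ebca6b;
    h ^= h >> 13; h *= 0xc2b2ae35;
    h ^= h >> 16;
    return h;
}

/* Hash a key the way the seed description suggests: Murmur3 plus DEVILS_SEED. */
static uint32_t demo_hash_key(const char *key, size_t klen) {
    return demo_murmur3_32(key, klen, DEVILS_SEED);
}
```

Something like demo_hash_key("red", 3) would then give the value to store in a bucket's hash field.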

Type Definitions

ol_val

typedef unsigned char *ol_val;

Typedef for the values that can be stored inside the database.

Enums

ol_feature_flags

typedef enum {
    OL_F_APPENDONLY     = 1 << 0,
    OL_F_SEMIVOL        = 1 << 1,
    OL_F_REGDUMPS       = 1 << 2
} ol_feature_flags;

Feature flags tell the database what it should be doing.

OL_F_APPENDONLY: Enable the append only log

OL_F_SEMIVOL: Tell servers that it's okay to fsync every once in a while

OL_F_REGDUMPS: Tell servers to snapshot the data using ol_save() regularly
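Since each flag is a single bit, several features can be OR'd into one int and flipped on or off independently. The sketch below shows the usual bit-twiddling. The demo_ helpers are hypothetical; they borrow the (int, int*) shape of the enable, disable and is_enabled function pointers in ol_database, but they are not the library's implementation.

```c
#include <stdbool.h>

typedef enum {
    OL_F_APPENDONLY     = 1 << 0,
    OL_F_SEMIVOL        = 1 << 1,
    OL_F_REGDUMPS       = 1 << 2
} ol_feature_flags;

/* Set the bit for a feature in the feature set. */
static void demo_enable(int feature, int *feature_set) {
    *feature_set |= feature;
}

/* Clear the bit for a feature in the feature set. */
static void demo_disable(int feature, int *feature_set) {
    *feature_set &= ~feature;
}

/* Check whether the bit for a feature is set. */
static bool demo_is_enabled(int feature, int *feature_set) {
    return (*feature_set & feature) != 0;
}
```

For example, starting from OL_F_APPENDONLY | OL_F_REGDUMPS, calling demo_enable(OL_F_SEMIVOL, &features) leaves all three bits set.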

ol_state_flags

typedef enum {
    OL_S_STARTUP        = 0,
    OL_S_AOKAY          = 1
} ol_state_flags;

State flags describe what state the database is currently in.

OL_S_STARTUP: The DB is starting, duh.

OL_S_AOKAY: The database is a-okay

Structures

ol_bucket

typedef struct ol_bucket {
    char              key[KEY_SIZE]; /* The key used to reference the data */
    size_t            klen;
    char              *content_type;
    size_t            ctype_size;
    ol_val            data_ptr;
    size_t            data_size;
    uint32_t          hash;
    struct ol_bucket  *next; /* The next ol_bucket in this chain, if any */
    struct tm         *expiration;
} ol_bucket;

This is the object stored in the database's hashtable. Contains references to value, key, etc.

key[KEY_SIZE]: The key used for this bucket.

klen: Length of the key.

*content_type: The content-type of this object. Defaults to "application/octet-stream".

ctype_size: Length of the string representing content-type.

data_ptr: Location of this key's value.

data_size: Length of the value in bytes.

hash: Hashed value of this key.

next: Collisions are resolved via linked list. This contains the pointer to the next object in the chain, or NULL.

expiration: The time at which this key will expire, stored as a pointer to a struct tm.
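To make the chaining concrete, here is a rough sketch of how a lookup could walk the next pointers. It uses a cut-down demo_bucket rather than the real ol_bucket, a fixed DEMO_SLOTS table size, and picks the slot with hash modulo the slot count. All of those are assumptions for illustration, not liboleg's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_KEY_SIZE 250
#define DEMO_SLOTS    8   /* Hypothetical table size */

/* Cut-down stand-in for ol_bucket: just the fields the chain needs. */
typedef struct demo_bucket {
    char                key[DEMO_KEY_SIZE];
    size_t              klen;
    uint32_t            hash;
    struct demo_bucket *next; /* Next bucket in this chain, or NULL */
} demo_bucket;

/* Append a bucket to the end of the chain for its slot. */
static void demo_chain_insert(demo_bucket **slots, demo_bucket *b) {
    demo_bucket **cur = &slots[b->hash % DEMO_SLOTS];
    while (*cur != NULL) {
        cur = &(*cur)->next;
    }
    b->next = NULL;
    *cur = b;
}

/* Walk the chain for this hash and compare keys byte for byte. */
static demo_bucket *demo_chain_find(demo_bucket **slots, uint32_t hash,
                                    const char *key, size_t klen) {
    demo_bucket *b;
    for (b = slots[hash % DEMO_SLOTS]; b != NULL; b = b->next) {
        if (b->klen == klen && memcmp(b->key, key, klen) == 0) {
            return b;
        }
    }
    return NULL;
}
```

Two keys whose hashes land in the same slot end up in one chain, and the key comparison is what tells them apart.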

ol_database

typedef struct ol_database {
    void      (*get_db_file_name)(struct ol_database *db,const char *p,char*);
    void      (*enable)(int, int*);
    void      (*disable)(int, int*);
    bool      (*is_enabled)(int, int*);
    char      name[DB_NAME_SIZE];
    char      path[PATH_LENGTH];
    char      *dump_file;
    char      *aol_file;
    FILE      *aolfd;
    int       feature_set;
    short int state;
    int       rcrd_cnt;
    int       key_collisions;
    time_t    created;
    size_t    cur_ht_size;
    ol_bucket **hashes;
} ol_database;

The object representing a database.

get_db_file_name: A function pointer that produces the path/name.db filename, to reduce code duplication. Used for writing and reading of dump files.

enable: Helper function to enable a feature for the database instance passed in.

disable: Helper function to disable a database feature.

is_enabled: Helper function that checks whether or not a feature is enabled.

name: The name of the database.

path[PATH_LENGTH]: Path to the database's working directory.

dump_file: Path and filename of db dump.

aol_file: Path and filename of the append only log.

aolfd: Pointer of FILE type to append only log.

feature_set: Bitmask holding enabled/disabled status of various features. See ol_feature_flags.

state: Current state of the database. See ol_state_flags.

rcrd_cnt: Number of records in the database.

key_collisions: Number of key collisions this database has had since initialization.

created: Timestamp of when the database was initialized.

cur_ht_size: The current amount, in bytes, of space allocated for storing ol_bucket objects.

**hashes: The actual hashtable. Stores ol_bucket instances.

ol_meta

typedef struct ol_meta {
    time_t uptime;
} ol_meta;

Structure used to record meta-information about the database.

Functions

ol_open

ol_database *ol_open(char *path, char *name, int features);

Opens a database for use.

*path: The directory where the database will be stored.

*name: The name of the database. This is used to create the dumpfile, and keep track of the database.

features: Features to enable when the database is initialized, OR'd together.

Returns: A new database object.

ol_close

int ol_close(ol_database *database);

Closes a database cleanly, frees memory and makes sure everything is written.

*database: The database to close.

Returns: 0 on success, 1 if not everything could be freed.

ol_close_save

int ol_close_save(ol_database *database);

Dumps and closes a database cleanly, frees memory and makes sure everything is written.

*database: The database to close.

Returns: 0 on success, 1 if not everything could be freed.

ol_unjar

ol_val ol_unjar(ol_database *db, const char *key, size_t klen);

Unjar a value from the mayo. Calls ol_unjar_ds with a dsize of NULL.

*db: Database to retrieve value from.

*key: The key to use.

klen: The length of the key.

Returns: A pointer to an ol_val object, or NULL if the object was not found.

ol_unjar_ds

ol_val ol_unjar_ds(ol_database *db, const char *key, size_t klen, size_t *dsize);

Unjar a value from the mayo. Sets dsize to the size of the data returned.

*db: Database to retrieve value from.

*key: The key to use.

klen: The length of the key to use.

*dsize: Pointer that receives the size, in bytes, of the data returned.

Returns: A pointer to an ol_val object, or NULL if the object was not found.
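Values are raw bytes rather than NUL-terminated strings, so the size out-parameter is the only reliable way to know how much data came back. The sketch below illustrates the pattern of a sized getter plus a thin wrapper that passes NULL when the caller doesn't care. The demo_record type and both demo_ functions are hypothetical stand-ins working on a plain array, not the liboleg functions themselves.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record, for illustration only. */
typedef struct demo_record {
    const char          *key;
    size_t               klen;
    const unsigned char *data;
    size_t               data_size;
} demo_record;

/* Sized lookup: writes the value's size through dsize when dsize is not NULL. */
static const unsigned char *demo_unjar_ds(const demo_record *recs, size_t n,
                                          const char *key, size_t klen,
                                          size_t *dsize) {
    size_t i;
    for (i = 0; i < n; i++) {
        if (recs[i].klen == klen && memcmp(recs[i].key, key, klen) == 0) {
            if (dsize != NULL) {
                *dsize = recs[i].data_size;
            }
            return recs[i].data;
        }
    }
    return NULL; /* Not found */
}

/* Convenience wrapper: same lookup, size not requested. */
static const unsigned char *demo_unjar(const demo_record *recs, size_t n,
                                       const char *key, size_t klen) {
    return demo_unjar_ds(recs, n, key, klen, NULL);
}
```

A caller that needs to copy or print the value would use the sized version and read exactly dsize bytes.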

ol_jar

int ol_jar(ol_database *db, const char *key, size_t klen, unsigned char *value, size_t vsize);

Put a value into the mayo. It's easy to piss in a bucket, it's not easy to piss in 19 jars. Uses default content type.

*db: Database to store the value in.

*key: The key to use.

klen: The length of the key.

*value: The value to insert.

vsize: The size of the value in bytes.

Returns: 0 on success.

ol_jar_ct

int ol_jar_ct(ol_database *db, const char *key, size_t klen, unsigned char *value, size_t vsize,
        const char *content_type, const size_t content_type_size);

Put a value into the mayo. It's easy to piss in a bucket, it's not easy to piss in 19 jars. Allows you to specify content type.

*db: Database to store the value in.

*key: The key to use.

klen: The length of the key.

*value: The value to insert.

vsize: The size of the value in bytes.

*content_type: The content type to store, or really anything. Store your middle name if you want to.

content_type_size: The length of the content_type string.

Returns: 0 on success.

ol_content_type

char *ol_content_type(ol_database *db, const char *key, size_t klen);

Retrieves the content type for a given key from the database.

*db: Database to retrieve value from.

*key: The key to use.

klen: The length of the key.

Returns: Stored content type, or NULL if it was not found.

ol_expiration_time

struct tm *ol_expiration_time(ol_database *db, const char *key, size_t klen);

Retrieves the expiration time for a given key from the database.

*db: Database to retrieve value from.

*key: The key to use.

klen: The length of the key.

Returns: Stored struct tm * representing the time that this key will expire, or NULL if not found.

ol_scoop

int ol_scoop(ol_database *db, const char *key, size_t klen);

Removes an object from the database. Get that crap out of the mayo jar.

*db: Database to remove the object from.

*key: The key to use.

klen: The length of the key.

Returns: 0 on success, 2 if the object wasn't found.

ol_uptime

int ol_uptime(ol_database *db);

Gets the time, in seconds, that a database has been up.

*db: Database to get the uptime of.

Returns: Uptime in seconds since database initialization.

ol_spoil

int ol_spoil(ol_database *db, const char *key, size_t klen, struct tm *expiration_date);

Sets the expiration value of a key. Will fail if no bucket under the chosen key exists.

*db: Database containing the key.

*key: The key to use.

klen: The length of the key.

expiration_date: The UTC time to set the expiration to.

Returns: 0 upon success, -1 otherwise.
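Expiration dates arrive over HTTP as POSIX timestamps but ol_spoil wants a struct tm, so something has to convert between the two. A minimal sketch of that conversion using only standard C follows. The helper name demo_expiration_from_timestamp is hypothetical, and gmtime uses a shared static buffer, so this version is not thread-safe.

```c
#include <stddef.h>
#include <time.h>

/* Convert a POSIX timestamp into a broken-down UTC time.
 * Returns 0 on success, -1 if the timestamp can't be converted. */
static int demo_expiration_from_timestamp(time_t ts, struct tm *out) {
    struct tm *utc = gmtime(&ts); /* Points at a shared static buffer */
    if (utc == NULL) {
        return -1;
    }
    *out = *utc; /* Copy it before another call overwrites it */
    return 0;
}
```

Feeding it 1394323192, the X-OlegDB-use-by value from the transcript earlier, yields 2014-03-08 23:59:52 UTC. The resulting struct tm is the kind of value you would hand to ol_spoil as expiration_date.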

ol_ht_bucket_max

int ol_ht_bucket_max(size_t ht_size);

Does some sizeof witchery to return the maximum current size of the database.

ht_size: The size you want to divide by sizeof(ol_bucket).

Returns: The maximum possible bucket slots for db.
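The witchery boils down to one integer division: bytes allocated divided by bytes per slot. Here is a small sketch with a hypothetical demo_slot_count helper that takes the slot size as a parameter, since which sizeof the library really divides by is not something this sketch claims to know.

```c
#include <stddef.h>

#define DEMO_HASH_MALLOC 65536 /* Same value as HASH_MALLOC above */

/* Number of whole slots that fit in ht_size bytes.
 * Returns 0 for a zero slot size instead of dividing by zero. */
static int demo_slot_count(size_t ht_size, size_t slot_size) {
    if (slot_size == 0) {
        return 0;
    }
    return (int)(ht_size / slot_size);
}
```

If the table holds pointers, as the HASH_MALLOC description says, then demo_slot_count(DEMO_HASH_MALLOC, sizeof(void *)) gives the initial number of slots, which is 8192 on a system with 8-byte pointers.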