Handling Key-value Stores

Key-value Store vs. Sparse Array

A key-value store can be thought of as a dataframe with one or more "key" columns (typically of string type) and one or more "value" columns (of any type). The most important requirement is very efficient search on the key columns. Therefore, following the discussion in the Handling Dataframes section, you can represent your key-value store as a multi-dimensional sparse array, where the "key" columns are the dimensions and the "value" columns are the attributes.
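To make the mapping concrete, here is a minimal sketch in plain Python. It does not use the TileDB API; the class SparseKVArray and its methods are hypothetical names introduced only for illustration, and a single key column is assumed. Keys play the role of dimension coordinates kept in sorted order, and each value column is stored as a separate attribute.

```python
from bisect import bisect_left


class SparseKVArray:
    """Toy 1-D sparse array (hypothetical, not the TileDB API).

    The "key" column is the single dimension; each "value" column
    is an attribute stored in its own list, aligned with the keys.
    """

    def __init__(self, attributes):
        self.attributes = tuple(attributes)
        self._coords = []  # sorted dimension coordinates (the keys)
        self._cells = {name: [] for name in self.attributes}

    def write(self, key, **values):
        # Locate the key's position in the sorted coordinate list.
        i = bisect_left(self._coords, key)
        if i < len(self._coords) and self._coords[i] == key:
            # Existing key: overwrite its attribute values.
            for name in self.attributes:
                self._cells[name][i] = values[name]
        else:
            # New key: insert a cell while keeping coordinates sorted.
            self._coords.insert(i, key)
            for name in self.attributes:
                self._cells[name].insert(i, values[name])

    def read(self, key):
        # Binary search on the dimension gives the efficient key lookup.
        i = bisect_left(self._coords, key)
        if i == len(self._coords) or self._coords[i] != key:
            return None
        return {name: self._cells[name][i] for name in self.attributes}


store = SparseKVArray(attributes=["age", "city"])
store.write("bob", age=41, city="Athens")
store.write("alice", age=30, city="Boston")
print(store.read("alice"))
print(store.read("carol"))
```

Only the cells that were written occupy space, which is what makes the array sparse. With several key columns, each column would become its own dimension.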

Why Use TileDB as a Key-value Store?

TileDB is not designed to work as a special-purpose key-value store, and there are other excellent key-value store solutions out there. Nevertheless, you may want to use TileDB as a key-value store for the following reasons:

  • Cloud support: TileDB works very efficiently on AWS S3, Google Cloud Storage and Azure Blob Storage, implementing many low-level optimizations and making heavy use of parallel IO.

  • Multi-key parallelism: TileDB's very fast multi-range subarray support, which relies heavily on multi-threading, can be quite useful when you wish to efficiently retrieve many (e.g., thousands of) keys in a single query, without fetching and decompressing/decrypting the common tiles more than once.

  • Interoperability: You can leverage TileDB's integrations (e.g., Spark, Dask, MariaDB, PrestoDB) to run diverse queries on your key-value store. More importantly, you can use SQL queries to join your key-value store with any other TileDB array.
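The multi-key point in the list above can be illustrated with a small plain-Python sketch. It is not TileDB code and does not model TileDB's internals or its multi-threading; the functions build_tiles and multi_get, the TILE_CAPACITY constant and the JSON-plus-zlib tile layout are all assumptions made for illustration. The idea shown is that requested keys are first grouped by the tile that holds them, so a tile shared by several keys is decompressed only once.

```python
import json
import zlib
from bisect import bisect_right
from collections import defaultdict

TILE_CAPACITY = 2  # hypothetical number of cells per tile


def build_tiles(items, capacity=TILE_CAPACITY):
    """Sort cells by key and pack them into compressed tiles.

    Returns (first_keys, tiles), where first_keys[i] is the
    smallest key stored in tiles[i].
    """
    cells = sorted(items.items())
    first_keys, tiles = [], []
    for start in range(0, len(cells), capacity):
        chunk = cells[start:start + capacity]
        first_keys.append(chunk[0][0])
        tiles.append(zlib.compress(json.dumps(chunk).encode("utf-8")))
    return first_keys, tiles


def multi_get(keys, first_keys, tiles):
    """Fetch many keys, decompressing each needed tile exactly once.

    Returns (results, number_of_tiles_decompressed).
    """
    # Group the requested keys by the index of the tile that may hold them.
    wanted = defaultdict(list)
    for key in set(keys):
        idx = bisect_right(first_keys, key) - 1
        if idx >= 0:
            wanted[idx].append(key)

    results = {}
    for idx, tile_keys in wanted.items():
        # One decompression per tile, regardless of how many keys it serves.
        cells = dict(json.loads(zlib.decompress(tiles[idx])))
        for key in tile_keys:
            if key in cells:
                results[key] = cells[key]
    return results, len(wanted)


data = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5}
first_keys, tiles = build_tiles(data)
results, touched = multi_get(["a", "b", "d"], first_keys, tiles)
print(results, touched)
```

Here "a" and "b" live in the same tile, so three keys are served by two tile decompressions. Issuing three separate single-key lookups would decompress that shared tile twice.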