TileDB Embedded

TileDB Embedded is a powerful engine architected around multi-dimensional arrays that enables storing and accessing:

  • Dense arrays (e.g., satellite images)

  • Sparse arrays (e.g., LiDAR, genomics)

  • Dataframes (any data in tabular form, via dense or sparse arrays)

  • Key-values (mappings between keys and values, via sparse arrays)

You can use TileDB to store data for a variety of applications, such as genomics, geospatial analysis, and finance. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array, which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract away the pains of data storage and management while still accessing the data efficiently from your favorite data science tool via our numerous integrations.
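
To make the dense/sparse distinction concrete, here is a minimal conceptual sketch in plain Python. It does not use the TileDB API. The container layouts, the helper names `slice_dense` and `slice_sparse`, and the sample keys are assumptions chosen for illustration only.

```python
# Conceptual sketch only: plain Python containers, not the TileDB API.

# Dense array: every cell in the 2 x 3 domain holds a value (e.g., pixels).
dense = [
    [10, 11, 12],
    [20, 21, 22],
]

# Sparse array: only non-empty cells are stored, keyed by their coordinates
# (e.g., scattered point measurements in a large, mostly empty domain).
sparse = {
    (0, 2): 1.5,
    (5, 1): 2.5,
    (9, 9): 3.5,
}

# Key-value store: a 1-D sparse array whose single coordinate is the key.
# The keys below are hypothetical.
key_value = {("sample_a",): 42, ("sample_b",): 7}


def slice_dense(array, rows, cols):
    """Return the sub-array for inclusive (start, end) row and column ranges."""
    (r0, r1), (c0, c1) = rows, cols
    return [row[c0:c1 + 1] for row in array[r0:r1 + 1]]


def slice_sparse(cells, rows, cols):
    """Return the non-empty cells whose coordinates fall in the inclusive ranges."""
    (r0, r1), (c0, c1) = rows, cols
    return {
        (r, c): v
        for (r, c), v in cells.items()
        if r0 <= r <= r1 and c0 <= c <= c1
    }


print(slice_dense(dense, (0, 1), (1, 2)))    # [[11, 12], [21, 22]]
print(slice_sparse(sparse, (0, 5), (0, 2)))  # {(0, 2): 1.5, (5, 1): 2.5}
print(key_value[("sample_b",)])              # 7
```

A dense slice always returns a full block of cells, whereas a sparse slice returns only the cells that exist inside the requested range. A dataframe fits the same picture: rows become cells and columns become the values stored in each cell.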

TileDB Embedded has the following features:

  • Tiling (i.e., chunking) for fast slicing

  • Multiple compression, encryption and checksum filters

  • Cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage)

  • Fully multi-threaded implementation

  • Parallel IO

  • Data versioning (rapid updates, time traveling)

  • Array metadata

  • Array groups

  • Embeddable C++ library

  • Numerous APIs (C, C++, Python, Java, R, Go)

  • Numerous integrations (Spark, Dask, MariaDB, GDAL, etc.)
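
As a rough illustration of why tiling (the first feature above) speeds up slicing, the sketch below computes which tiles a 2-D slice overlaps. It is plain Python and does not reflect TileDB's actual on-disk format or API. The function name `tiles_for_slice`, the square tile shape, and the array dimensions are assumptions made for the example.

```python
# Conceptual sketch only: illustrates chunking, not TileDB's on-disk format.
from itertools import product


def tiles_for_slice(row_range, col_range, tile_extent):
    """List the (tile_row, tile_col) indices overlapping an inclusive slice."""
    (r0, r1), (c0, c1) = row_range, col_range
    tile_rows = range(r0 // tile_extent, r1 // tile_extent + 1)
    tile_cols = range(c0 // tile_extent, c1 // tile_extent + 1)
    return list(product(tile_rows, tile_cols))


# A 1000 x 1000 array split into 100 x 100 tiles has 10 * 10 = 100 tiles.
total_tiles = (1000 // 100) ** 2

# Slicing rows 150..249 and columns 0..99 touches only two of them.
needed = tiles_for_slice((150, 249), (0, 99), tile_extent=100)
print(needed)                          # [(1, 0), (2, 0)]
print(len(needed), "of", total_tiles)  # 2 of 100
```

Only the overlapping tiles need to be fetched and decompressed, so the cost of a read scales with the size of the slice rather than the size of the whole array.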

Code and APIs

The TileDB Embedded engine is built in C++. It exposes C and C++ APIs and comes with a Docker image.

We maintain a growing set of language APIs built on top of the C and C++ APIs, including Python, Java, R and Go.

We also maintain numerous integrations with SQL engines and popular data science tools using the above APIs.

How to get started

  1. Install TileDB Embedded for your favorite language and see how to use it.

  2. Run the various examples.

  3. Read the internals to understand how TileDB Embedded implements the array model and format.

  4. If you are using anything other than your local disk (e.g., S3), check out the backends section.

  5. Browse the API reference and API usage to find information quickly.

  6. To maximize performance, see the performance tips and configuration parameters.