TileDB Embedded is a powerful engine architected around multi-dimensional arrays that enables storing and accessing:
Dense arrays (e.g., satellite images)
Sparse arrays (e.g., LiDAR, genomics)
Dataframes (any data in tabular form, via dense or sparse arrays)
Key-values (mappings between keys and values, via sparse arrays)
You can use TileDB to store data in a variety of applications, such as genomics, geospatial, finance and more. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array, which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract away the pains of data storage and management, while accessing the data efficiently with your favorite data science tool via our numerous integrations.
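To make the four data models above concrete, here is a minimal conceptual sketch in plain Python. It deliberately does not use the TileDB API: the containers, the column names, and the helpers `read_dense` and `read_sparse` are hypothetical stand-ins chosen only to show how each kind of data maps onto a dense or sparse array.

```python
# Conceptual sketch only: plain Python stand-ins, not the TileDB API.

# Dense array: every cell in the domain holds a value (e.g., a tiny 2x3 image).
dense = [
    [10, 11, 12],
    [20, 21, 22],
]

# Sparse array: only non-empty cells are stored, keyed by their coordinates
# (e.g., a handful of points scattered over a large 2D domain).
sparse = {
    (3, 1000): 7.5,
    (42, 17): 1.25,
}

# Dataframe: table rows modeled as a 1D dense array indexed by row number,
# with one attribute per column (column names are hypothetical).
dataframe = [
    {"symbol": "AAA", "price": 10.0},
    {"symbol": "BBB", "price": 12.5},
]

# Key-value store: a sparse array whose single dimension is the key.
key_value = {
    ("sample_1",): "chr1",
    ("sample_2",): "chr2",
}


def read_dense(array, row, col):
    """Return the value of a dense cell; every in-domain cell exists."""
    return array[row][col]


def read_sparse(array, coords, fill=None):
    """Return the value stored at coords, or `fill` if the cell is empty."""
    return array.get(coords, fill)
```

For example, `read_sparse(sparse, (0, 0))` returns the fill value because that cell was never written, whereas every in-domain cell of `dense` always has a value. The same sparse lookup serves the key-value case, which is why key-values can be layered on sparse arrays.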
TileDB Embedded has the following features:
Tiling (i.e., chunking) for fast slicing
Multiple compression, encryption and checksum filters
Cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage)
Fully multi-threaded implementation
Data versioning (rapid updates, time traveling)
Embeddable C++ library
Numerous APIs (C, C++, Python, Java, R, Go)
Numerous integrations (Spark, Dask, MariaDB, GDAL, etc.)
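As a rough illustration of why tiling (chunking) helps slicing, the sketch below computes which tiles a rectangular slice overlaps. It is a simplified model under stated assumptions (regular, fixed-size tiles and half-open ranges), not a description of TileDB's internal implementation; the function names `tiles_for_slice` and `tiles_for_slice_2d` are hypothetical.

```python
# Conceptual sketch only: regular fixed-size tiles, not TileDB internals.

def tiles_for_slice(start, stop, tile_extent):
    """Return indices of 1D tiles overlapping the half-open range [start, stop)."""
    if stop <= start:
        return []
    first = start // tile_extent      # tile containing the first cell
    last = (stop - 1) // tile_extent  # tile containing the last cell
    return list(range(first, last + 1))


def tiles_for_slice_2d(row_range, col_range, tile_shape):
    """Return (tile_row, tile_col) pairs overlapping a 2D rectangular slice."""
    rows = tiles_for_slice(row_range[0], row_range[1], tile_shape[0])
    cols = tiles_for_slice(col_range[0], col_range[1], tile_shape[1])
    return [(r, c) for r in rows for c in cols]


# A 1000x1000 array split into 100x100 tiles has 100 tiles in total.
# Slicing rows [150, 250) and columns [0, 50) overlaps only two of them.
touched = tiles_for_slice_2d((150, 250), (0, 50), (100, 100))
```

In this model, a reader only needs to fetch and decompress the tiles in `touched` rather than the whole array, which is the intuition behind tiling for fast slicing.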
The TileDB Embedded engine is built in C++. It exposes C and C++ APIs and comes with a Docker image.
We maintain a growing set of language APIs built on top of the C and C++ APIs, including Python, Java, R and Go.
We also maintain numerous integrations with SQL engines and popular data science tools using the above APIs.