These docs contain detailed information about the basic TileDB concepts, format, and usage. All software included here is open-source under the MIT License.
TileDB is a powerful engine, architected around multi-dimensional arrays, that enables storing and accessing:
Dense arrays (e.g., satellite images)
Sparse arrays (e.g., LiDAR, genomics)
Dataframes (any data in tabular form)
Key-values (mappings between keys and values)
You can use TileDB to store data in a variety of applications, such as genomics, geospatial, finance and more. The power of TileDB stems from the fact that any data can be modeled efficiently as either a dense or a sparse multi-dimensional array (even dataframes and key-value stores), which is the format used internally by most data science tooling. By storing your data and metadata in TileDB arrays, you abstract away the pains of data storage and management while efficiently accessing the data with your favorite data science tool.
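To make the modeling idea concrete, here is a minimal conceptual sketch in plain Python. It is not the TileDB API: the `SparseArray` class and the `dataframe_to_sparse` helper are hypothetical names used only for illustration. It shows how a dataframe can map to a sparse array (some columns act as dimensions, the rest as attributes) and how a key-value store is simply a 1D sparse array whose dimension is the key.

```python
# Conceptual sketch in plain Python (NOT the TileDB API): a sparse array
# stores only non-empty cells, each keyed by its coordinates along the
# dimensions and holding one value per attribute.
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class SparseArray:
    dims: Tuple[str, ...]   # dimension names (coordinate "columns")
    attrs: Tuple[str, ...]  # attribute names (value "columns")
    cells: Dict[Tuple[Any, ...], Dict[str, Any]] = field(default_factory=dict)

    def write(self, coords: List[Any], **values: Any) -> None:
        # Reject cells that do not match the array schema.
        if len(coords) != len(self.dims) or set(values) != set(self.attrs):
            raise ValueError("cell does not match the array schema")
        self.cells[tuple(coords)] = values

    def read(self, coords: List[Any]) -> Any:
        # Empty cells are simply absent, so None means "no data here".
        return self.cells.get(tuple(coords))


def dataframe_to_sparse(rows: List[Dict[str, Any]], index_cols: List[str]) -> SparseArray:
    # The chosen index columns become dimensions; all other columns
    # become attributes of the cell addressed by those coordinates.
    attrs = tuple(c for c in rows[0] if c not in index_cols)
    arr = SparseArray(dims=tuple(index_cols), attrs=attrs)
    for row in rows:
        arr.write([row[c] for c in index_cols], **{a: row[a] for a in attrs})
    return arr


# A dataframe of prices indexed by (ticker, day).
prices = dataframe_to_sparse(
    [
        {"ticker": "AAPL", "day": 1, "price": 150.0},
        {"ticker": "MSFT", "day": 1, "price": 300.0},
    ],
    index_cols=["ticker", "day"],
)

# A key-value store is a 1D sparse array whose single dimension is the key.
kv = SparseArray(dims=("key",), attrs=("value",))
kv.write(["alpha"], value=1)
```

Choosing which columns become dimensions matters: in an array engine, slicing is fast along dimensions, so the columns you filter on most often are natural candidates.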
TileDB has the following features:
Cloud storage (AWS S3, Google Cloud Storage, Azure Blob Storage)
Tiling (i.e., chunking) for fast slicing
Multiple compression, encryption and checksum filters
Fully multi-threaded implementation
Data versioning (rapid updates, time traveling)
Embeddable C++ library
Numerous integrations (Spark, Dask, MariaDB, GDAL, etc.)
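The benefit of tiling for slicing can be illustrated with a small sketch. It assumes a 2D dense array split into fixed-size tiles; the extents below are arbitrary examples, and `tiles_for_slice` is a hypothetical helper, not TileDB code. The point is that a slice query only needs to fetch (and decompress) the tiles it overlaps, not the whole array.

```python
# Conceptual sketch, not TileDB code: map a 2D slice onto the tiles it
# intersects, assuming 0-based coordinates and fixed tile extents.
from itertools import product
from typing import List, Tuple


def tiles_for_slice(
    row_range: Tuple[int, int],    # inclusive [start, end] on dimension 1
    col_range: Tuple[int, int],    # inclusive [start, end] on dimension 2
    tile_extent: Tuple[int, int],  # tile size along each dimension
) -> List[Tuple[int, int]]:
    """Return the (tile_row, tile_col) indices that a slice overlaps."""
    (r0, r1), (c0, c1) = row_range, col_range
    tr, tc = tile_extent
    # Integer division maps a coordinate to the index of the tile containing it.
    tile_rows = range(r0 // tr, r1 // tr + 1)
    tile_cols = range(c0 // tc, c1 // tc + 1)
    return list(product(tile_rows, tile_cols))


# A 10,000 x 10,000 array with 1,000 x 1,000 tiles has 100 tiles in total,
# yet this slice touches only 2 of them.
touched = tiles_for_slice((1500, 2499), (0, 999), (1000, 1000))
print(touched)  # [(1, 0), (2, 0)]
```

Smaller tiles mean less wasted reading for small slices but more per-tile overhead, so tile extents are typically chosen to match the expected query shapes.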
To serve the growing community of data scientists who wish to share data and code with peers and to avoid the pains of deploying clusters to compute at scale, we built TileDB Cloud, a serverless platform on top of the open-source, open-spec TileDB engine.
The TileDB core engine is built in C++. It exposes C and C++ APIs and comes with a Docker image.
On top of the core engine, TileDB provides:
Efficient language bindings for TileDB core.
Connectors to distributed computing frameworks that allow you to scale out your computations on TileDB arrays.
Connectors to popular databases that allow you to perform efficient SQL queries on TileDB arrays, even when they are stored on cloud object stores like AWS S3.
A powerful variant store solution built on TileDB core with 2D sparse arrays.
Connectors to popular geospatial tooling that allow you to operate directly on data stored as TileDB dense or sparse arrays.
TileDB Cloud is a serverless platform where you can register your TileDB arrays stored on AWS S3 (without surrendering ownership: you continue to own your data in your S3 buckets) and share them with any other user, defining the desired access policies. The third party you share your data with does not have to download or host your data; they can access and compute on it directly in TileDB Cloud, with excellent performance.
Moreover, in TileDB Cloud you can spin up Jupyter notebooks and work directly in the UI console, avoiding tool deployments and installations. You simply sign up and get started in seconds.
Finally, TileDB Cloud offers a completely serverless computational experience. You can run SQL queries and user-defined functions, optionally organized in task dependency graphs (similar to dask.delayed), without managing clusters and paying only for what you use. This is beneficial if you want to avoid deploying clusters and paying for idle compute.
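The idea of a task dependency graph can be sketched in a few lines of standard-library Python (3.9+ for `graphlib`). This is a hypothetical illustration, not the TileDB Cloud API: `run_graph` and the task names are made up. Each task receives its dependencies' results as arguments, and tasks run in topological order.

```python
# Hypothetical sketch, NOT the TileDB Cloud API: execute a small task
# graph in dependency order, passing each task its inputs' results.
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List


def run_graph(
    tasks: Dict[str, Callable[..., Any]],
    deps: Dict[str, List[str]],
) -> Dict[str, Any]:
    """Run each task after its dependencies and return every task's result."""
    graph = {name: deps.get(name, []) for name in tasks}
    results: Dict[str, Any] = {}
    # static_order() yields every node after all of its predecessors
    # (and raises graphlib.CycleError if the graph has a cycle).
    for name in TopologicalSorter(graph).static_order():
        # Feed the dependencies' results in as positional arguments.
        args = [results[d] for d in deps.get(name, [])]
        results[name] = tasks[name](*args)
    return results


tasks = {
    "load_a": lambda: [1, 2, 3],
    "load_b": lambda: [10, 20],
    "sum_a": lambda xs: sum(xs),
    "sum_b": lambda xs: sum(xs),
    "total": lambda a, b: a + b,
}
deps = {"sum_a": ["load_a"], "sum_b": ["load_b"], "total": ["sum_a", "sum_b"]}

results = run_graph(tasks, deps)
print(results["total"])  # 36
```

This sketch runs tasks one after another for simplicity; a serverless scheduler could instead run independent branches (here `sum_a` and `sum_b`) in parallel, which is where a graph representation pays off.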
See the TileDB Cloud docs for more information.