Terminology

Array metadata

The array metadata are simple key-value pairs that the user can attach to an array. The key is a string and the value can be of any datatype. The array metadata is typically small. Time traveling applies to array metadata as well, i.e., opening an array at a timestamp will fetch only the array metadata created at or before the given timestamp.

Both the array schema and the array metadata store information about the array, and the user is responsible for setting and configuring them. The easiest way to remember the difference between the array metadata and the array schema is the following:

  • The array metadata stores user-specific data about the array that is arbitrary key-value pairs.

  • The array schema stores system-specific data about the array that has a fixed structure (e.g., a dimension name, domain and datatype).

Array schema

The array schema stores all the details about the array definition. Some of the data it holds are:

  • Attributes (name, datatype, filters)

  • Dimensions (name, datatype, domain, filters)

  • Tile extent and capacity

  • Tile and cell order

See the Data Format for more details.

Array sharing

TileDB Cloud offers a very simple way of sharing arrays with anyone on the planet. It effectively provides a way to defined access control on arrays, and log every single action for auditability purposes.

Attribute

A non-empty cell (in either a dense or sparse array) is not limited to storing a single value. Each cell stores a tuple with a structure that is common to all cells. Each tuple element corresponds to a value on a named attribute of a certain type. An attribute can be:

  • Fixed-sized: an attribute value in a cell may consist of one or a fixed number of values of the same datatype

  • Variable-sized: an attribute value in a cell may consist of a variable number of values of the same datatype, i.e., different cells may store a different number of values on this attribute.

The figure below shows an example of an array with 3 attributes; a1 of type int32, a2 of type char: var and a3 of type float32: 2. Every non-empty cell must store 1 int32 value on a1, any number of char values on a2 and exactly 2 float32 values on a3.

An array with 3 attributes (a1, a2, a3)

Note that the above is the logical representation of the attribute values in the cells. TileDB physically stores the values in a "columnar" manner, i.e., all the values along each of the attributes are stored in separate files. See Data Format for more details.

Cell

An ordered tuple of dimension domain values, called coordinates, identifies an array cell. The order of the coordinates must follow the order in which the array dimensions were specified. The figure below depicts an example of cell (3, 4) assuming that the dimension order is d1, d2.

An array cell with coordinates (3,4)

Consolidation

TileDB writes in immutable fragments, which are self-contained directories inside the array directory. If the number of fragments becomes excessive or if several fragments are small in size, it is beneficial to consolidate them into smaller ones. TileDB offers various ways to consolidate fragments. It also enables consolidating only the fragment metadata footers in order to boost performance on cloud object stores.

Coordinates

The coordinates of an array cell is an ordered tuple of dimension domain values that identifies it. In dense arrays, the coordinates of each cell are unique. In sparse arrays, the same coordinates may appear more than once.

Data tile

TileDB adopts the so-called columnar format and stores the (non-empty) cell values for each attribute separately. A data tile is a subset of cell values on a particular attribute. We explain the data tile separately for dense and sparse fragments, and its relationship to the space tile. The data tile is the atomic unit of compression and IO.

In a dense fragment, each space tile corresponds to N data tiles, where N is the number of attributes.

Example data tiles in a dense array with two attributes (int32 and char)

Contrary to dense fragments, there is no correspondence between space tiles and data tiles in sparse fragments. Consider the 8x8 fragment with 4x4 space tiles in the figure below. Assume for simplicity that the array stores a single int32 attribute. The non-empty cells are depicted in blue color. If we followed the data tiling technique of dense fragment, we would have to create 4 data tiles, one for each space tile. TileDB does not materialize empty cells, i.e., it stores only the values of the non-empty cells in the data files. Therefore, the space tiles would produce 4 data tiles with 3 (upper left), 12 (upper right), 1 (lower left) and 2 (lower right) non-empty cells.

A sparse array with imbalanced occupancy of space tiles

The physical tile size imbalance that may result from space tiling can lead to ineffective compression (if numerous data tiles contain only a handful of values), and inefficient reads (if the subarray you wish to read only partially intersects with a huge tile, which needs to be fetched in its entirety and potentially decompressed). Ideally, we wish every data tile to store to the same number of non-empty cells. Recall that this is achieved in the dense case by definition, since each space tile has the same shape (equal number of cells) and all cells in each space tile are non-empty. Finally, since the distribution of the non-empty cells in the array may be arbitrary, it is phenomenally difficult to fine-tune the space tiling in a way that leads to load-balanced data tiles storing an acceptable number of non-empty cells, or even completely unattainable.

TileDB solves this seemingly very challenging problem in a surprisingly simple manner. It first sorts the non-empty cells along the global cell order (defined by specifying the space tiles, tile and cell order as in the dense case), then separates the cell attribute values (as in the dense case), and then creates data tiles on each attribute by grouping adjacent cells based on a user-defined parameter called capacity.

Data tiles in sparse fragments with different capacities

In other words, the space tiles in sparse fragments are used to determine the global cell order that will dictate which cell values will be grouped together in the same data tile. Another difference to dense fragments is that sparse fragments create extra data tiles for the coordinates of the non-empty cells, which is important in reading.

Dense vs. sparse array

A dense array consists primarily of dense fragments, but also allows sparse fragments to support fast ad hoc cell updates. On the other hand, a sparse array consists only of sparse fragments.

There are mainly three differences between a dense and a sparse array:

  1. A dense array is used when the majority of the cells are non-empty (within any hyper-rectangular sub domain), whereas a sparse array when the majority of the cells are empty.

  2. The dimensions of a dense array must have the same datatype, whereas the dimensions of a sparse array may have different datatypes.

  3. The dimensions of a dense array can only be of integer data type, whereas the dimensions of a sparse array may be of any data type (even real or string).

  4. Every cell in a dense array is uniquely identified by its coordinates, whereas a sparse array can permit multiplicities, i.e., cells with the same coordinates but potentially different attribute values, as well as real (float32, float64) and string domains.

TileDB provides a unified API for both dense and sparse arrays.

Dimension

A multi-dimensional array consists of a set of ordered dimensions. A dimension has a name, a datatype and a domain. The figure below shows an example of two int32 dimensions, d1 with domain [1,4] and d2 with domain [2,6].

An array with two dimensions

Domain

The array domain (or simply domain) is the hyperspace defined by the domains of the array dimensions. In a dense array, all dimensions must have the same type (homogeneous dimensions) and can only be integers. In a sparse array, the dimensions may have different type (heterogenous dimensions) and can be of any data type (even real and string).

The non-empty domain is the tightest hyper-rectangle that contains all non-empty cells. An example is shown in the figure below.

The non-empty domain is [2,3] x [3,5]

The dimension domains can have negative, real and string values. An array cell is still identified by its coordinates, which take any value from the corresponding dimension domain.

Dimensions may have negative, real or string domains

In our examples, the orientation of each dimension domain is rather arbitrary and does not affect the array definition. It is just a matter of convention. For example, the lower values may be at the top or bottom of the vertical dimension.

Empty cell

Not all array cells may contain values. A cell that contains values is called non-empty cell, otherwise it is called empty.

Fragment

A fragment is a timestamped snapshot of a portion of the array, which is produced during writes. A fragment may be dense or sparse as shown in the figure below. In a dense fragment, the non-empty cells are contained in a full hyper-rectangle in the domain. This hyper-rectangle may cover the full domain or any subdomain. In a sparse fragment, the non-empty cells may be arbitrary, i.e., not necessarily comprise a full hyper-rectangle.

Dense and sparse fragment examples

An array may consist of multiple fragments. Those fragments are completely transparent to the user, who only sees the combined logical view of the array upon reading. This is produced by superimposing the more recent fragments on top of the older ones, with the more recently written cells overwriting the older ones. A dense array may consist of both dense and sparse fragments, but a sparse array may consist only of sparse fragments.

Logical view of array with 3 fragments

Fragment metadata

The fragment metadata is system-specific information about a fragment. Some of the information this metadata includes is:

  • Dense or sparse

  • Non-empty domain

  • Tile offsets

  • Tile sizes

  • R-Tree (for the sparse case)

See the Data Format for the detailed description of the fragment metadata.

Global cell order

The tile and cell order collectively determine the global cell order. The global cell order is essentially a mapping from the multi-dimensional cell space to the 1-dimensional physical storage space for the non-empty cells, i.e., it is the order in which TileDB stores the cell values on disk. The figure below shows the 4 possible global cell orders resulting from all combinations of tile/cell orders. The numbers indicate the relative positions of the non-empty cells along the global order.

Global cell order examples

R-tree

TileDB uses R-trees for multi-dimensional indexing in sparse fragments. This allows for fast pruning of irrelevant data tiles during reading.

TileDB uses R-tree for fast reading in sparse fragments

Serverless compute

TileDB Cloud offers the capability of array slicing as well as computing arbitrary UDFs in a serverless manner, i.e., without the need to manually spin up compute instances and clusters in the cloud. TileDB Cloud aims at massive scalability.

Space tile & tile extent

A space tile is defined by specifying a tile extent along each dimension. The domain of each dimension is partitioned into segments equal to the tile extent, and hyper-rectangular tiles are formed in the multi-dimensional array space. The space tile concept applies to both dense and sparse arrays (as well as real dimensions) and is independent of the actual data stored in the array.

Tile extents and space tiles

Subarray

A subarray is an array slice. A single-range subarray is defined by a domain range along each dimension. A multi-range subarray is defined by a multiple ranges per dimension. The resulting slice of a multi-range subarray is oriented by the cross-product of the ranges along all dimensions.

Single- and multi-range subarray examples

Task DAG

TileDB Cloud provides a way to execute task DAGs, i.e., directed acyclic graphs of UDFs, in a serverless manner. The DAG vertices represent the tasks and the edges specify dependencies between the tasks. More specifically, a task cannot begin its execution until all the tasks from its incoming edges complete their execution. TileDB Cloud is responsible for executing and monitoring the task DAGs. Task DAGs provide a powerful way to implement complex workflows and sophisticated distributed algorithms.

Tile & cell order

TileDB allows the user to specify an order for the space tiles, as well as the cells inside each space tile. The order can be:

  • Row-major: Assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, row-major means that the rightmost coordinate index “varies the fastest”.

  • Column-major: Assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, column-major means that the leftmost coordinate index “varies the fastest”.

Tile and cell order examples

Time traveling

TileDB writes in immutable fragments, each of which is timestamped. It then offers a way to open the array at user-specified time stamps and, therefore, read views of the array at different times.

User-defined function (UDF)

TileDB Cloud allows for serverless execution of user-defined functions (UDF) written in various languages. Those are essentially arbitrary computations that are dispatched to the TileDB Cloud platform, avoiding the need to manually spin up compute instances and clusters in the cloud. TileDB Cloud allows the execution of UDFs in parallel, enabling massive scalability.

Vacuuming

TileDB retains the fragments after a consolidation process in order to continue to offer fine-grained time traveling. TileDB offers a vacuuming process that deletes the consolidated fragments in order to save space.

Variable-sized attribute

Each data tile of a variable-sized attribute is always accompanied by an offset tile, which stores the starting byte position of every variable-sized value in the data tile. This allows TileDB to quickly locate the i-th cell value in a data tile in constant time.

Offset tile of a variable-sized attribute data tile