The array metadata are simple key-value pairs that the user can attach to an array. The key is a string and the value can be of any datatype. The array metadata is typically small. Time traveling applies to array metadata as well, i.e., opening an array at a timestamp will fetch only the array metadata created at or before the given timestamp.
Both the array schema and the array metadata store information about the array, and the user is responsible for setting and configuring them. The easiest way to remember the difference between the array metadata and the array schema is the following:
- The array metadata stores user-specific data about the array that is arbitrary key-value pairs.
- The array schema stores system-specific data about the array that has a fixed structure (e.g., a dimension name, domain and datatype).
The array schema stores all the details about the array definition. Some of the data it holds are:
- Attributes (name, datatype, filters)
- Dimensions (name, datatype, domain, filters)
- Tile extent and capacity
- Tile and cell order
TileDB Cloud offers a very simple way of sharing arrays with anyone on the planet. It effectively provides a way to defined access control on arrays, and log every single action for auditability purposes.
A non-empty cell (in either a dense or sparse array) is not limited to storing a single value. Each cell stores a tuple with a structure that is common to all cells. Each tuple element corresponds to a value on a named attribute of a certain type. An attribute can be:
- Fixed-sized: an attribute value in a cell may consist of one or a fixed number of values of the same datatype
- Variable-sized: an attribute value in a cell may consist of a variable number of values of the same datatype, i.e., different cells may store a different number of values on this attribute.
The figure below shows an example of an array with 3 attributes;
float32: 2. Every non-empty cell must store 1
a1, any number of
a2and exactly 2
An array with 3 attributes (a1, a2, a3)
Note that the above is the logical representation of the attribute values in the cells. TileDB physically stores the values in a "columnar" manner, i.e., all the values along each of the attributes are stored in separate files. See Data Format for more details.
An ordered tuple of dimension domain values, called coordinates, identifies an array cell. The order of the coordinates must follow the order in which the array dimensions were specified. The figure below depicts an example of cell
(3, 4)assuming that the dimension order is
An array cell with coordinates (3,4)
TileDB writes in immutable fragments, which are self-contained directories inside the array directory. If the number of fragments becomes excessive or if several fragments are small in size, it is beneficial to consolidate them into smaller ones. TileDB offers various ways to consolidate fragments. It also enables consolidating only the fragment metadata footers in order to boost performance on cloud object stores.
The coordinates of an array cell is an ordered tuple of dimension domain values that identifies it. In dense arrays, the coordinates of each cell are unique. In sparse arrays, the same coordinates may appear more than once.
TileDB adopts the so-called columnar format and stores the (non-empty) cell values for each attribute separately. A data tile is a subset of cell values on a particular attribute. We explain the data tile separately for dense and sparse fragments, and its relationship to the space tile. The data tile is the atomic unit of compression and IO.
In a dense fragment, each space tile corresponds to
Ndata tiles, where
Nis the number of attributes.
Example data tiles in a dense array with two attributes (int32 and char)
Contrary to dense fragments, there is no correspondence between space tiles and data tiles in sparse fragments. Consider the
4x4space tiles in the figure below. Assume for simplicity that the array stores a single
int32attribute. The non-empty cells are depicted in blue color. If we followed the data tiling technique of dense fragment, we would have to create 4 data tiles, one for each space tile. TileDB does not materialize empty cells, i.e., it stores only the values of the non-empty cells in the data files. Therefore, the space tiles would produce 4 data tiles with 3 (upper left), 12 (upper right), 1 (lower left) and 2 (lower right) non-empty cells.
A sparse array with imbalanced occupancy of space tiles
The physical tile size imbalance that may result from space tiling can lead to ineffective compression (if numerous data tiles contain only a handful of values), and inefficient reads (if the subarray you wish to read only partially intersects with a huge tile, which needs to be fetched in its entirety and potentially decompressed). Ideally, we wish every data tile to store to the same number of non-empty cells. Recall that this is achieved in the dense case by definition, since each space tile has the same shape (equal number of cells) and all cells in each space tile are non-empty. Finally, since the distribution of the non-empty cells in the array may be arbitrary, it is phenomenally difficult to fine-tune the space tiling in a way that leads to load-balanced data tiles storing an acceptable number of non-empty cells, or even completely unattainable.
TileDB solves this seemingly very challenging problem in a surprisingly simple manner. It first sorts the non-empty cells along the global cell order (defined by specifying the space tiles, tile and cell order as in the dense case), then separates the cell attribute values (as in the dense case), and then creates data tiles on each attribute by grouping adjacent cells based on a user-defined parameter called capacity.
Data tiles in sparse fragments with different capacities
In other words, the space tiles in sparse fragments are used to determine the global cell order that will dictate which cell values will be grouped together in the same data tile. Another difference to dense fragments is that sparse fragments create extra data tiles for the coordinates of the non-empty cells, which is important in reading.
There are three main differences between a dense and a sparse array:
- 1.A dense array is used when the majority of the cells are non-empty (within any hyper-rectangular sub domain), whereas a sparse array when the majority of the cells are empty.
- 2.The dimensions of a dense array must have the same datatype, whereas the dimensions of a sparse array may have different datatypes.
- 3.The dimensions of a dense array can only be of integer data type, whereas the dimensions of a sparse array may be of any data type (even real or string).
- 4.Every cell in a dense array is uniquely identified by its coordinates, whereas a sparse array can permit multiplicities, i.e., cells with the same coordinates but potentially different attribute values, as well as real (
float64) and string domains.
TileDB provides a unified API for both dense and sparse arrays.
A multi-dimensional array consists of a set of ordered dimensions. A dimension has a name, a datatype and a domain. The figure below shows an example of two
An array with two dimensions
The array domain (or simply domain) is the hyperspace defined by the domains of the array dimensions. In a dense array, all dimensions must have the same type (homogeneous dimensions) and can only be integers. In a sparse array, the dimensions may have different type (heterogenous dimensions) and can be of any data type (even real and string).
The non-empty domain is the tightest hyper-rectangle that contains all non-empty cells. An example is shown in the figure below.
The non-empty domain is [2,3] x [3,5]
The dimension domains can have negative, real and string values. An array cell is still identified by its coordinates, which take any value from the corresponding dimension domain.
Dimensions may have negative, real or string domains
In our examples, the orientation of each dimension domain is rather arbitrary and does not affect the array definition. It is just a matter of convention. For example, the lower values may be at the top or bottom of the vertical dimension.
Not all array cells may contain values. A cell that contains values is called non-empty cell, otherwise it is called empty.
In case the user slices some empty space from a dense array, the selected attributes are assigned special fill values (to represent "empty"). TileDB uses default values but also provides a way to specify the fill value during the creation of an attribute and the array schema.
TileDB offers a variety of compressors and other filters, which are applied to data tiles of the attributes and dimensions when writing to disk (called filtering) or when reading from disk (called unfiltering).
A fragment is a timestamped snapshot of a portion of the array, which is produced during writes. A fragment may be dense or sparse as shown in the figure below. In a dense fragment, the non-empty cells are contained in a full hyper-rectangle in the domain. This hyper-rectangle may cover the full domain or any subdomain. In a sparse fragment, the non-empty cells may be arbitrary, i.e., not necessarily comprise a full hyper-rectangle.
Dense and sparse fragment examples
An array may consist of multiple fragments. Those fragments are completely transparent to the user, who only sees the combined logical view of the array upon reading. This is produced by superimposing the more recent fragments on top of the older ones, with the more recently written cells overwriting the older ones. A dense array may consist of both dense and sparse fragments, but a sparse array may consist only of sparse fragments.
Logical view of array with 3 fragments
The fragment metadata is system-specific information about a fragment. Some of the information this metadata includes is:
- Dense or sparse
- Non-empty domain
- Tile offsets
- Tile sizes
- R-Tree (for the sparse case)
The tile and cell order collectively determine the global cell order. The global cell order is essentially a mapping from the multi-dimensional cell space to the 1-dimensional physical storage space for the non-empty cells, i.e., it is the order in which TileDB stores the cell values on disk. The figure below shows the 4 possible global cell orders resulting from all combinations of tile/cell orders. The numbers indicate the relative positions of the non-empty cells along the global order.
Global cell order examples
Groups allows hierarchically organizing arrays and other groups.
An incomplete query occurs when the result size of a subarray is larger than the allocated buffers that will hold the result. TileDB gracefully handles this case via result estimation and subarray partitioning.
The non-empty domain of an array is the minimum bounding hype-rectangle that tightly encompasses all non-empty cells in the array.
Each data tile of a nullable attribute is always accompanied by a validity tile, which stores the validity of each corresponding cell. Non-zero validity values represent non-null cells, while zero values represent null cells. This is applicable to both fixed-size and variable-sized attributes.
Validity tile of a fixed-sized attribute
Validity tile of a variable-sized attribute
TileDB uses R-trees for multi-dimensional indexing in sparse fragments. This allows for fast pruning of irrelevant data tiles during reading.
TileDB uses R-tree for fast reading in sparse fragments
A space tile is defined by specifying a tile extent along each dimension. The domain of each dimension is partitioned into segments equal to the tile extent, and hyper-rectangular tiles are formed in the multi-dimensional array space. The space tile concept applies to both dense and sparse arrays (as well as real dimensions) and is independent of the actual data stored in the array.
Tile extents and space tiles
A subarray is an array slice. A single-range subarray is defined by a domain range along each dimension. A multi-range subarray is defined by a multiple ranges per dimension. The resulting slice of a multi-range subarray is oriented by the cross-product of the ranges along all dimensions. Multi-range subarrays are applicable only to reads. Multi-range subarrays are applicable to both dense and sparse arrays.
Single- and multi-range subarray examples
TileDB allows the user to specify an order for the space tiles, as well as the cells inside each space tile. The order can be:
- Row-major: Assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, row-major means that the rightmost coordinate index “varies the fastest”.
- Column-major: Assuming each tile or cell can be identified by a set of coordinates in the multi-dimensional space, column-major means that the leftmost coordinate index “varies the fastest”.
Tile and cell order examples
TileDB writes in immutable fragments, each of which is timestamped. It then offers a way to open the array at user-specified time stamps and, therefore, read views of the array at different times.
TileDB Cloud allows for serverless execution of user-defined functions (UDF) written in various languages. Those are essentially arbitrary computations that are dispatched to the TileDB Cloud platform, avoiding the need to manually spin up compute instances and clusters in the cloud. TileDB Cloud allows the execution of UDFs in parallel, enabling massive scalability.
TileDB retains the fragments after a consolidation process in order to continue to offer fine-grained time traveling. TileDB offers a vacuuming process that deletes the consolidated fragments in order to save space.
Each data tile of a variable-sized attribute is always accompanied by an offset tile, which stores the starting byte position of every variable-sized value in the data tile. This allows TileDB to quickly locate the i-th cell value in a data tile in constant time.
Offset tile of a variable-sized attribute data tile