Writing

TileDB is architected to support parallel batch writes, i.e., writing collections of cells with multiple processes or threads. Each write operation creates one or more dense or sparse fragments. Updating an array is simply a new write operation, which may insert cells into unpopulated areas of the domain, overwrite existing cells, or both. TileDB handles each write separately and without any locking. Fragments are immutable: a write always creates new fragments and never alters existing ones.
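To make the fragment model concrete, here is a toy Python model. It is not TileDB code: `ToyArray`, `write` and `read` are hypothetical names, and the sketch assumes the usual rule that, where fragments overlap, the most recently written fragment wins. Each write appends a new fragment and never touches older ones.

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[int, int]


class ToyArray:
    """Hypothetical stand-in for an array made of immutable fragments."""

    def __init__(self) -> None:
        self.fragments: List[Dict[Coord, int]] = []

    def write(self, cells: Dict[Coord, int]) -> int:
        # Copy the input so a stored fragment is never modified afterwards.
        self.fragments.append(dict(cells))
        return len(self.fragments) - 1  # fragment index, acting as a timestamp

    def read(self, coord: Coord) -> Optional[int]:
        # Assumed rule: newer fragments take precedence, so scan newest first.
        for fragment in reversed(self.fragments):
            if coord in fragment:
                return fragment[coord]
        return None  # the cell was never written


arr = ToyArray()
arr.write({(1, 1): 10, (1, 2): 20})
arr.write({(1, 2): 99, (2, 2): 30})  # overwrites (1, 2), inserts (2, 2)
print(arr.read((1, 1)), arr.read((1, 2)), arr.read((2, 2)))  # 10 99 30
```

Note that the first fragment still holds 20 for cell (1, 2); the overwrite is visible only because the read resolves cells across fragments.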

Dense Writes

A dense write is applicable only to dense arrays and creates one or more dense fragments. In a dense write, the user provides:

  • The subarray to write into (it must be single-range).

  • The buffers that contain the attribute values of the cells that are being written.

  • The cell order within the subarray (which must be common across all attributes), so that TileDB knows which values correspond to which cells in the array domain. The cell order may be row-major, column-major, or global.
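The same buffer describes different cells depending on the cell order. The Python sketch below pairs each buffer value with a cell of a 2D subarray for row- and column-major order. The helper names, the `((row_lo, row_hi), (col_lo, col_hi))` subarray tuple and the layout strings are hypothetical, not TileDB's API; global order is left out because it also depends on the space tiling.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
# Hypothetical single-range 2D subarray: ((row_lo, row_hi), (col_lo, col_hi)), inclusive.
Subarray = Tuple[Tuple[int, int], Tuple[int, int]]


def cells_in_layout(subarray: Subarray, layout: str) -> List[Cell]:
    """List the subarray's cells in the order a user buffer enumerates them."""
    (r_lo, r_hi), (c_lo, c_hi) = subarray
    rows = range(r_lo, r_hi + 1)
    cols = range(c_lo, c_hi + 1)
    if layout == "row-major":  # last dimension varies fastest
        return [(r, c) for r in rows for c in cols]
    if layout == "col-major":  # first dimension varies fastest
        return [(r, c) for c in cols for r in rows]
    raise ValueError(f"unsupported layout: {layout}")


def pair_values(subarray: Subarray, layout: str, values: List[int]) -> Dict[Cell, int]:
    """Attach each buffer value to the cell it is written into."""
    cells = cells_in_layout(subarray, layout)
    if len(cells) != len(values):
        raise ValueError("buffer must hold exactly one value per subarray cell")
    return dict(zip(cells, values))


sub = ((1, 2), (2, 4))  # rows 1-2, columns 2-4: six cells
row = pair_values(sub, "row-major", [1, 2, 3, 4, 5, 6])
col = pair_values(sub, "col-major", [1, 2, 3, 4, 5, 6])
print(row[(1, 3)], col[(1, 3)])  # 2 3
```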

The example below illustrates writing into a subarray of an array with a single attribute. The figure depicts the order of the attribute values in the user buffers for row- and column-major cell order. TileDB re-organizes the user-provided values into the global cell order before storing them to disk. Moreover, TileDB always writes whole space tiles to disk, so it injects special empty values (depicted in grey below) into the user data to form a full data tile for each space tile.
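Conceptually, that re-organization regroups the written values by space tile and pads every tile with empty values. The Python sketch below is only an illustration, not TileDB's internal implementation: `to_global_tiles` is a hypothetical helper, the array is assumed to be 2D with row-major tile order and row-major cell order, and the fill value is assumed to be the `TILEDB_INT32` default.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
FILL = -(2**31)  # minimum int32, the default fill value for TILEDB_INT32


def to_global_tiles(domain_lo: Cell, tile_ext: Cell,
                    subarray: Tuple[Cell, Cell], row_major_values: List[int],
                    fill: int = FILL) -> List[List[int]]:
    """Regroup a row-major subarray buffer into full space tiles, listed in
    global order (assumed row-major tile order and row-major cell order)."""
    (r_lo, r_hi), (c_lo, c_hi) = subarray
    d_r, d_c = domain_lo
    t_r, t_c = tile_ext
    n_rows, n_cols = r_hi - r_lo + 1, c_hi - c_lo + 1
    if len(row_major_values) != n_rows * n_cols:
        raise ValueError("buffer must hold exactly one value per subarray cell")
    # Attach each user value to its cell (the user buffer is row-major).
    written: Dict[Cell, int] = {
        (r_lo + i // n_cols, c_lo + i % n_cols): v
        for i, v in enumerate(row_major_values)
    }
    # Indices of the first and last space tiles the subarray touches.
    first_tr, last_tr = (r_lo - d_r) // t_r, (r_hi - d_r) // t_r
    first_tc, last_tc = (c_lo - d_c) // t_c, (c_hi - d_c) // t_c
    tiles = []
    for tr in range(first_tr, last_tr + 1):  # row-major tile order
        for tc in range(first_tc, last_tc + 1):
            base_r, base_c = d_r + tr * t_r, d_c + tc * t_c
            # Row-major cell order inside the tile; unwritten cells get the fill value.
            tiles.append([written.get((base_r + i, base_c + j), fill)
                          for i in range(t_r) for j in range(t_c)])
    return tiles


# Domain starting at (1, 1), 2x2 space tiles, write rows 1-2 x columns 2-3.
tiles = to_global_tiles((1, 1), (2, 2), ((1, 2), (2, 3)), [1, 2, 3, 4])
print(len(tiles))  # 2: the subarray touches two space tiles
```

Each returned tile is complete: the four user values are spread over two tiles, and the remaining four cells hold `FILL`.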

A dense write in row- or column-major order

Writing in the array's global order requires more care. The subarray must coincide with space tile boundaries, even if the user wishes to write to a smaller area within it, and the user is responsible for manually adding any necessary empty cell values to her buffers. This is illustrated in the figure below, where the user wishes to write the blue cells but has to expand the subarray to coincide with the two space tiles and also provide the empty values for the grey cells. The user must provide all cell values in the global order, i.e., following the tile order across space tiles and the cell order within each space tile.
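The buffer preparation described above can be sketched in Python: expand the subarray outward to the enclosing tile boundaries, enumerate its cells in global order, and place a fill value wherever no real value is supplied. The helpers `expand_to_tiles` and `global_order_cells` are hypothetical; the sketch assumes row-major tile and cell order, the `TILEDB_INT32` default fill value, and a domain whose range is a multiple of the tile extent (so no clamping to the domain's upper bound is done).

```python
from typing import List, Tuple

Range = Tuple[int, int]
FILL = -(2**31)  # minimum int32, the default fill value for TILEDB_INT32


def expand_to_tiles(domain_lo: Tuple[int, ...], tile_ext: Tuple[int, ...],
                    subarray: Tuple[Range, ...]) -> Tuple[Range, ...]:
    """Grow each range of the subarray outward to the enclosing tile boundaries."""
    expanded = []
    for d_lo, t, (lo, hi) in zip(domain_lo, tile_ext, subarray):
        first_tile = (lo - d_lo) // t
        last_tile = (hi - d_lo) // t
        expanded.append((d_lo + first_tile * t, d_lo + (last_tile + 1) * t - 1))
    return tuple(expanded)


def global_order_cells(tile_ext: Tuple[int, int],
                       expanded: Tuple[Range, Range]) -> List[Tuple[int, int]]:
    """Enumerate a tile-aligned 2D subarray in global order
    (assumed row-major tile order and row-major cell order)."""
    (r_lo, r_hi), (c_lo, c_hi) = expanded
    t_r, t_c = tile_ext
    cells = []
    for tile_r in range(r_lo, r_hi + 1, t_r):      # tiles, row-major
        for tile_c in range(c_lo, c_hi + 1, t_c):
            for r in range(tile_r, tile_r + t_r):  # cells within a tile, row-major
                for c in range(tile_c, tile_c + t_c):
                    cells.append((r, c))
    return cells


# 2x2 tiles on a domain starting at (1, 1); the "blue" cells are (2, 2) and (2, 3).
wanted = {(2, 2): 7, (2, 3): 8}
expanded = expand_to_tiles((1, 1), (2, 2), ((2, 2), (2, 3)))
buffer = [wanted.get(cell, FILL) for cell in global_order_cells((2, 2), expanded)]
print(expanded)  # ((1, 2), (1, 4))
```

Two real values become an eight-value buffer here, which is why this mode suits data that already arrive tile by tile.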

A dense write in global order

Writing in global order requires knowledge of the space tiling and cell/tile order, and is rather cumbersome to use. However, this write mode leads to the best performance, because TileDB does not need to internally re-organize the cells along the global order. It is recommended for use cases where the data arrive already grouped according to the space tiling and global order (e.g., in geospatial applications).

TileDB uses the following default fill values for empty cells in dense writes; the user can specify a different fill value upon array creation:

Datatype               Default fill value
TILEDB_CHAR            Minimum char value
TILEDB_INT8            Minimum int8 value
TILEDB_UINT8           Maximum uint8 value
TILEDB_INT16           Minimum int16 value
TILEDB_UINT16          Maximum uint16 value
TILEDB_INT32           Minimum int32 value
TILEDB_UINT32          Maximum uint32 value
TILEDB_INT64           Minimum int64 value
TILEDB_UINT64          Maximum uint64 value
TILEDB_FLOAT32         NaN
TILEDB_FLOAT64         NaN
TILEDB_ASCII           0
TILEDB_UTF8            0
TILEDB_UTF16           0
TILEDB_UCS2            0
TILEDB_UCS4            0
TILEDB_ANY             0
TILEDB_DATETIME_*      Minimum int64 value
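The table follows a simple pattern: signed integers (and datetimes, stored as int64) use the type's minimum, unsigned integers its maximum, floats NaN, and string-like types 0. The Python sketch below encodes that pattern. `default_fill` and `is_fill` are hypothetical helpers, not TileDB calls, and `TILEDB_CHAR` is assumed to be a signed 8-bit char (the minimum char value is platform-dependent in C).

```python
import math
from typing import Union

_SIGNED_BITS = {"TILEDB_INT8": 8, "TILEDB_INT16": 16, "TILEDB_INT32": 32, "TILEDB_INT64": 64}
_UNSIGNED_BITS = {"TILEDB_UINT8": 8, "TILEDB_UINT16": 16, "TILEDB_UINT32": 32, "TILEDB_UINT64": 64}
_ZERO_FILL = {"TILEDB_ASCII", "TILEDB_UTF8", "TILEDB_UTF16",
              "TILEDB_UCS2", "TILEDB_UCS4", "TILEDB_ANY"}


def default_fill(datatype: str) -> Union[int, float]:
    """Return the default fill value listed in the table for a datatype name."""
    if datatype in _SIGNED_BITS:  # minimum value of a signed integer
        return -(2 ** (_SIGNED_BITS[datatype] - 1))
    if datatype in _UNSIGNED_BITS:  # maximum value of an unsigned integer
        return 2 ** _UNSIGNED_BITS[datatype] - 1
    if datatype == "TILEDB_CHAR":  # assumes a signed 8-bit char
        return -128
    if datatype in ("TILEDB_FLOAT32", "TILEDB_FLOAT64"):
        return math.nan
    if datatype.startswith("TILEDB_DATETIME_"):  # datetimes are stored as int64
        return -(2 ** 63)
    if datatype in _ZERO_FILL:
        return 0
    raise ValueError(f"no default fill value listed for {datatype}")


def is_fill(value: Union[int, float], datatype: str) -> bool:
    """Check a value against the default fill; NaN never equals itself."""
    fill = default_fill(datatype)
    if isinstance(fill, float) and math.isnan(fill):
        return math.isnan(value)
    return value == fill


print(default_fill("TILEDB_UINT8"), default_fill("TILEDB_INT16"))  # 255 -32768
```

The `is_fill` helper exists because a plain `==` comparison never detects a NaN fill value in float attributes.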

Sparse Writes

Sparse writes are applicable to both dense and sparse arrays and create one or more sparse fragments. The user must provide:

  • The attribute values to be written.

  • The coordinates of the cells to be written.

  • The cell layout of the attribute and coordinate values to be written (it must be the same across all attributes and dimensions). The cell layout may be unordered or global.
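Since each cell carries one coordinate per dimension and one value per attribute, the inputs of a sparse write can be pictured as parallel buffers of equal length under a single shared layout. The Python sketch below is a hypothetical container (`SparseWrite` is not a TileDB class) that checks exactly that; it covers fixed-size values only.

```python
from dataclasses import dataclass
from typing import Dict, List

LAYOUTS = ("unordered", "global")


@dataclass
class SparseWrite:
    """Hypothetical bundle of the inputs of one sparse write (fixed-size values only)."""
    coords: Dict[str, List[int]]   # one coordinate buffer per dimension
    attrs: Dict[str, List[float]]  # one value buffer per attribute
    layout: str = "unordered"      # a single layout shared by every buffer

    def num_cells(self) -> int:
        """Validate the buffers and return the number of cells being written."""
        if self.layout not in LAYOUTS:
            raise ValueError(f"layout must be one of {LAYOUTS}")
        buffers = list(self.coords.values()) + list(self.attrs.values())
        if not buffers:
            raise ValueError("a sparse write needs coordinate and attribute buffers")
        # Position i in every buffer describes the same cell, so lengths must agree.
        sizes = {len(b) for b in buffers}
        if len(sizes) != 1:
            raise ValueError("coordinate and attribute buffers must have equal lengths")
        return sizes.pop()


w = SparseWrite(coords={"rows": [1, 4, 2], "cols": [1, 2, 3]}, attrs={"a": [0.5, 1.5, 2.5]})
print(w.num_cells())  # 3
```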

Note that sparse writes do not need to be constrained to a subarray, since they contain the explicit coordinates of the cells to write into. The figure below shows a sparse write example with the two cell layouts. The unordered layout is the simplest and most common: TileDB re-organizes the cells along the global order internally before writing the values to disk. The global layout is once again more efficient but also more cumbersome, since the user must know the space tiling and the tile/cell order of the array, and manually sort the values before passing them to TileDB.
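Sorting into global order means comparing cells first by the space tile they fall into (following the tile order) and only then by position (following the cell order). The Python sketch below builds such a sort key and reorders coordinates together with their attribute values. `global_key` and `sort_to_global` are hypothetical helpers, and the example assumes a 2D domain starting at (1, 1) with 2x2 space tiles and row-major tile and cell order.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[int, ...]


def global_key(coord: Coord, domain_lo: Coord, tile_ext: Coord,
               tile_order: str = "row-major", cell_order: str = "row-major") -> Tuple[int, ...]:
    """Sort key placing cells in global order: tile first, then cell within the tile."""
    tile = tuple((x - lo) // t for x, lo, t in zip(coord, domain_lo, tile_ext))
    cell = tuple(coord)
    # Column-major: the first dimension varies fastest, so compare the last one first.
    if tile_order == "col-major":
        tile = tile[::-1]
    if cell_order == "col-major":
        cell = cell[::-1]
    return tile + cell


def sort_to_global(coords: Sequence[Coord], values: Sequence[int],
                   domain_lo: Coord, tile_ext: Coord,
                   tile_order: str = "row-major",
                   cell_order: str = "row-major") -> Tuple[List[Coord], List[int]]:
    """Reorder unordered cells, and their attribute values, into global order."""
    order = sorted(range(len(coords)),
                   key=lambda i: global_key(coords[i], domain_lo, tile_ext,
                                            tile_order, cell_order))
    return [coords[i] for i in order], [values[i] for i in order]


coords = [(4, 1), (1, 3), (2, 2), (1, 1), (3, 4)]
values = [41, 13, 22, 11, 34]
sorted_coords, sorted_values = sort_to_global(coords, values, (1, 1), (2, 2))
print(sorted_coords)  # [(1, 1), (2, 2), (1, 3), (4, 1), (3, 4)]
```

Cell (2, 2) precedes (1, 3) even though plain row-major order would put (1, 3) first: (2, 2) lies in the first space tile, while (1, 3) lies in the second.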