Physical Storage

File Structure

In this section we describe the file structure of a TileDB array. For details on the binary format of the various files, see the Format Description.

TileDB stores all the information about an array in a single directory, as shown in the figure below. The directory contains the array schema file, a lock file (used for file locking to provide process safety), potentially multiple fragment directories, and an array metadata directory.

File structure of TileDB arrays and groups

Each fragment directory corresponds to a fragment (see Writing). It contains a fragment metadata file, a file storing the coordinates if the fragment is sparse (this file does not appear in dense fragments), one file for each fixed-sized attribute, and two files for each variable-sized attribute. For the coordinates and the fixed-sized attributes, the corresponding files store the values in the global cell order, grouped into tiles and appropriately filtered (see Tile Filters). For variable-sized attributes, one file stores the actual values (also in the global cell order, as for fixed-sized attributes) and the other file stores the starting offsets of the variable-sized values in the first file.
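The two-file layout for variable-sized attributes can be illustrated with a minimal Python sketch. It is not TileDB's binary format (see the Format Description for that): tiling and filtering are ignored, and the helper name split_var_cells and the sample buffers are hypothetical.

```python
from typing import List


def split_var_cells(values: bytes, offsets: List[int]) -> List[bytes]:
    """Recover variable-sized cells from a values buffer and starting offsets."""
    cells = []
    for i, start in enumerate(offsets):
        # A cell ends where the next one starts; the last cell runs to the end.
        end = offsets[i + 1] if i + 1 < len(offsets) else len(values)
        cells.append(values[start:end])
    return cells


# Three string cells ("a", "bb", "ccc") concatenated in cell order,
# plus the starting offset of each cell within the values buffer.
values = b"abbccc"
offsets = [0, 1, 3]
cells = split_var_cells(values, offsets)
```

Storing only starting offsets is enough because each cell's length follows from the next offset (or from the end of the values buffer for the last cell).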

The array metadata directory stores a set of files, each corresponding to a single array metadata write operation. That is, each file here corresponds to an array metadata "fragment", similar to an array fragment. Every file is a blob containing the written key-value array metadata pairs (see Array Metadata).

Finally, a TileDB group is simply a directory that contains an empty file named __tiledb_group.tdb.
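Since a group is identified only by that marker file, a local directory can be checked with a few lines of Python. This is a sketch that assumes a local filesystem; the function name looks_like_group is hypothetical and is not part of the TileDB API.

```python
import os

GROUP_MARKER = "__tiledb_group.tdb"


def looks_like_group(path: str) -> bool:
    """Return True if the local directory contains the group marker file."""
    return os.path.isfile(os.path.join(path, GROUP_MARKER))
```

For example, looks_like_group("my_group") returns True only if my_group/__tiledb_group.tdb exists as a file.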

Array Names

The array name is the actual path to the physical array directory. It can be any URI, e.g., a local path (which can be absolute, relative, or starting with file://), an HDFS path (starting with hdfs://), an S3 path (starting with s3://), etc.
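As a rough illustration of how the URI prefix identifies the storage backend, the sketch below classifies an array name by its scheme using Python's standard urllib.parse. The function name storage_backend is hypothetical, and treating every scheme-less name as a local path is an assumption of this sketch (Windows drive paths, for instance, would need separate handling).

```python
from urllib.parse import urlparse


def storage_backend(array_name: str) -> str:
    """Classify an array name by its URI scheme (illustrative only)."""
    scheme = urlparse(array_name).scheme
    if scheme in ("", "file"):
        # Absolute paths, relative paths, and file:// URIs are all local.
        return "local"
    return scheme
```

With this sketch, "my_array", "/data/my_array", and "file:///data/my_array" all map to "local", while "hdfs://..." and "s3://..." names map to "hdfs" and "s3".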

Fragment Names

The fragment name has the format __t1_t2_uuid, where uuid is a unique identifier that prevents name collisions when fragments are created. When a fragment is first created, t1 == t2 == t, where t is the creation timestamp of the fragment. When a set of fragments (always adjacent in time) is consolidated, t1 and t2 are the timestamps of the first and last consolidated fragments in time order. Each timestamp is the number of milliseconds elapsed since 1970-01-01 00:00:00 +0000 (UTC).
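The naming scheme can be unpacked with a short Python sketch. The helper names and the sample fragment name are hypothetical, and the sketch assumes the uuid is whatever follows the second timestamp.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class FragmentName(NamedTuple):
    t1: int    # first timestamp, ms since the Unix epoch (UTC)
    t2: int    # last timestamp, ms since the Unix epoch (UTC)
    uuid: str


def parse_fragment_name(name: str) -> FragmentName:
    """Split a name of the form __t1_t2_uuid into its three parts."""
    if not name.startswith("__"):
        raise ValueError(f"not a fragment name: {name!r}")
    # Split at most twice so the uuid is kept whole.
    t1, t2, uuid = name[2:].split("_", 2)
    return FragmentName(int(t1), int(t2), uuid)


def to_utc(timestamp_ms: int) -> datetime:
    """Convert a millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


# Hypothetical name of a freshly written (not consolidated) fragment.
parsed = parse_fragment_name(
    "__1458759561320_1458759561320_6ba7b8109dad11d180b400c04fd430c8"
)
is_consolidated = parsed.t1 != parsed.t2
created = to_utc(parsed.t1)
```

Here is_consolidated is False because t1 == t2, which is the signature of a fragment that has not been produced by consolidation.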

TileDB uses the timestamp information in the fragment names to quickly prune fragments in time-traveling read queries.
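The idea behind this pruning can be sketched in Python using nothing but the names. This is not TileDB's implementation: the rule used here (keep a fragment only if its whole [t1, t2] range lies at or before the read timestamp) is a simplifying assumption, and the function name and sample names are hypothetical.

```python
from typing import List


def fragments_visible_at(names: List[str], timestamp_ms: int) -> List[str]:
    """Select fragments by name alone, without opening any fragment files."""
    visible = []
    for name in names:
        # Names have the form __t1_t2_uuid; strip "__" and split twice.
        t1, t2, _uuid = name[2:].split("_", 2)
        # Assumed rule: the fragment must end at or before the read time.
        if int(t2) <= timestamp_ms:
            visible.append((int(t1), name))
    # Return the surviving names in time order.
    return [name for _t1, name in sorted(visible)]


# Hypothetical names; the third fragment is a consolidation of times 3..5.
names = ["__2_2_b", "__1_1_a", "__3_5_c"]
at_4 = fragments_visible_at(names, 4)
at_5 = fragments_visible_at(names, 5)
```

Because the decision depends only on the directory listing, fragments outside the requested time range never need to be opened.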

Array Metadata File Names

The array metadata file names have the same format as the fragment names. Array metadata files can also be consolidated, similarly to fragments (see Consolidation). Therefore, the timestamps and UUID in their names serve the same purpose as those in fragment names.