https://github.com/TileDB-Inc/TileDB/blob/<version>/format_spec/FORMAT_SPEC.md, where <version> is the TileDB Embedded version you are looking for (e.g., 2.6.3).
aws s3 sync src_uri dest_uri
<timestamped_name> has format __timestamp_timestamp_uuid, where:
timestamp is the timestamp in milliseconds elapsed since 1970-01-01 00:00:00 +0000 (UTC)
uuid is a unique identifier
t1 and t2 > t1. The logical array view when the user reads the array at any timestamp t3 >= t2 contains all the cells written in the array before t3, with the more recent cells overwriting the older cells. In the special case of sparse arrays that accept duplicates (which can be specified in the array schema), if a cell was written more than once, all cell replicas are maintained and returned to the user upon reads.
<timestamped_name> has format __t1_t2_uuid_v, where:
t1 and t2 are timestamps in milliseconds elapsed since 1970-01-01 00:00:00 +0000 (UTC)
uuid is a unique identifier
v is the format version
__fragment_metadata.tdb
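The naming scheme and the "more recent cells overwrite older cells" rule described above can be sketched in a few lines of Python. This is a minimal illustration, not TileDB's implementation: the names `parse_timestamped_name` and `logical_view` are hypothetical, the UUID is assumed to contain no underscores, a write is assumed to be visible when its `t2` is at most `t3`, and the array is assumed not to allow duplicates.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class TimestampedName(NamedTuple):
    t1: int       # first timestamp, ms since 1970-01-01 00:00:00 UTC
    t2: int       # second timestamp, ms since 1970-01-01 00:00:00 UTC
    uuid: str     # unique identifier
    version: int  # format version


# Assumption: the UUID part contains no underscores.
_NAME_RE = re.compile(r"^__(\d+)_(\d+)_([^_]+)_(\d+)$")


def parse_timestamped_name(name: str) -> TimestampedName:
    """Split a name of the form __t1_t2_uuid_v into its parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a timestamped name: {name!r}")
    t1, t2, uuid, version = match.groups()
    return TimestampedName(int(t1), int(t2), uuid, int(version))


Cells = Dict[Tuple[int, ...], object]  # cell coordinates -> cell value


def logical_view(writes: List[Tuple[str, Cells]], t3: int) -> Cells:
    """Merge all writes visible at t3; later writes overwrite earlier ones.

    Assumption: a write is visible when its t2 <= t3, and the array
    does not allow duplicate cells.
    """
    parsed = [(parse_timestamped_name(name), cells) for name, cells in writes]
    visible = [(n, c) for n, c in parsed if n.t2 <= t3]
    visible.sort(key=lambda item: (item[0].t1, item[0].t2))
    view: Cells = {}
    for _, cells in visible:
        view.update(cells)  # newer cells replace older ones
    return view
```

Reading at a timestamp between two writes returns only the cells of the earlier write, which is the basis of the time traveling behavior the timestamped names enable.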
) stores important data about the fragment, such as the name of its array schema, its non-empty domain, indexes, and other information that facilitates fast reads.
a1.tdb, a2.tdb, ...
(a1.tdb, a1_var.tdb), (a2.tdb, a2_var.tdb), ... The second file of each pair (*_var.tdb) contains the var-sized values, whereas the first contains the starting byte offsets of each value in the second file.
d1.tdb, d2.tdb, ...
(d1.tdb, d1_var.tdb), (d2.tdb, d2_var.tdb), ... The second file of each pair (*_var.tdb) contains the var-sized values, whereas the first contains the starting byte offsets of each value in the second file.
a1_validity.tdb, a2_validity.tdb, ..., associated with attributes a1, a2, ..., respectively. The validity files are simple byte vectors that indicate whether a cell value is null or not. The validity files are applicable to both fixed- and var-sized attributes, but they are not applicable to dimensions. They are also optional; the user may or may not specify an attribute as nullable.
__fragment_metadata.tdb
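To make the offsets/values file pairs and the validity vectors described above concrete, here is a small Python sketch. It makes several assumptions that the text does not state: offsets are stored as little-endian unsigned 64-bit integers, a validity byte of 1 means "not null", and tiling and filters are ignored entirely. The function names are hypothetical.

```python
import struct
from typing import List, Optional, Tuple


def encode_var_sized(values: List[bytes]) -> Tuple[bytes, bytes]:
    """Return (offsets_bytes, values_bytes) for a var-sized attribute.

    offsets_bytes plays the role of a1.tdb and values_bytes the role
    of a1_var.tdb. Assumption: offsets are little-endian uint64.
    """
    offsets = []
    data = bytearray()
    for value in values:
        offsets.append(len(data))  # starting byte offset of this value
        data += value
    return struct.pack(f"<{len(offsets)}Q", *offsets), bytes(data)


def decode_var_sized(offsets_bytes: bytes, data: bytes) -> List[bytes]:
    """Recover the individual values from the two buffers."""
    count = len(offsets_bytes) // 8
    offsets = struct.unpack(f"<{count}Q", offsets_bytes)
    # Each value ends where the next one starts; the last ends at len(data).
    ends = offsets[1:] + (len(data),)
    return [data[start:end] for start, end in zip(offsets, ends)]


def validity_vector(values: List[Optional[object]]) -> bytes:
    """One byte per cell. Assumption: 1 means valid, 0 means null."""
    return bytes(0 if value is None else 1 for value in values)
```

Storing only starting offsets is enough because the length of each value follows from the next offset, which keeps the offsets buffer fixed-size per cell.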
) contains a small footer with lightweight indexing information, such as the non-empty domain of the fragment. This information can serve as another layer of indexing when issuing a slicing (read) query.
__fragment_meta as shown below.
<timestamped_name>.meta has format __t1_t2_uuid_v, where:
t1 and t2 are the timestamps, in milliseconds elapsed since 1970-01-01 00:00:00 +0000 (UTC), of the oldest and most recent fragment whose fragment metadata footer was consolidated in this file
uuid is a unique identifier
v is the format version
__fragment_meta/*.meta files to retrieve the footers, eliminating the need to get the individual footers from each fragment metadata file, resulting in a considerable boost in array opening performance.
<timestamped_name>.wrt is created, where <timestamped_name> is the same as the name of the corresponding fragment folder created in __fragments. Since there may be numerous fragments created in an array, TileDB allows for consolidating the commit files into a single file <timestamped_name>.con, which contains the names of the fragments whose commits are being consolidated. The name of the consolidated file contains the timestamps of the first and last commit file it consolidated. The consolidated commit file helps reduce the number of __commits/*.wrt files, which further boosts the performance of opening the array for reading.
__meta
inside the array directory, simply serialized in binary files. Those files are timestamped in the same manner as fragments for the same reasons (immutability, concurrent writes, and time traveling). The metadata file organization is shown below.
<timestamped_name> has format __t1_t2_uuid_v, where:
t1 and t2 are timestamps in milliseconds elapsed since 1970-01-01 00:00:00 +0000 (UTC)
uuid is a unique identifier
v is the format version
__tiledb_group.tdb is an empty file indicating that my_group is a TileDB group.
__meta stores key-value metadata associated with the group. This functionality is identical to that of array metadata.
__group contains timestamped files (with a similar structure to those described above for all other components, such as fragments, consolidated fragment metadata, etc.), which store the absolute paths of other arrays and groups.
__group. However, the physical location of the actual groups and arrays may be in paths other than inside the group location on storage. This provides a lot of flexibility in dynamically grouping various arrays and groups, especially on (cloud) object storage, without physically moving enormous quantities of data from one physical path to another.
__fragment_metadata.tdb
files in the fragment folders. If there are numerous fragments, then there will be numerous REST requests to the object store to retrieve this information.
__t1_t2_uuid_v.meta. The file contains the fragment URIs whose footers are included, along with the footers in serialized binary form. The footers contain only a small amount of information about the fragments, such as the non-empty domain and other light metadata. t1 is the first timestamp of the first fragment whose metadata is being consolidated, and t2 is the second timestamp of the last fragment. Upon opening an array, regardless of the number of fragments, TileDB can fetch this single small file into main memory with a single REST request. In other words, the TileDB format has mechanisms for making the communication protocol with the object store more lightweight.
__t1_t2_uuid3_v, where t1 is the first timestamp of the first fragment being consolidated, and t2 is the second timestamp of the last fragment.
__t1_t2_uuid3_v.vac
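The rule above (t1 from the first fragment, t2 from the last) can be sketched as follows. This is an illustrative sketch only: `consolidated_name` and `vacuum_file_name` are hypothetical helpers, fragments are assumed to be ordered by their timestamps, the UUID is assumed to contain no underscores, and the format version is passed in explicitly because the text does not say how it is chosen.

```python
import uuid as uuidlib
from typing import List, Optional, Tuple


def _timestamps(name: str) -> Tuple[int, int]:
    """Extract (t1, t2) from a name of the form __t1_t2_uuid_v."""
    if not name.startswith("__"):
        raise ValueError(f"not a timestamped name: {name!r}")
    parts = name[2:].split("_")
    if len(parts) != 4:
        raise ValueError(f"not a timestamped name: {name!r}")
    return int(parts[0]), int(parts[1])


def consolidated_name(fragment_names: List[str], version: int,
                      new_uuid: Optional[str] = None) -> str:
    """Name for the fragment produced by consolidating fragment_names.

    t1 is the first timestamp of the first fragment and t2 is the
    second timestamp of the last fragment (ordered by timestamps).
    """
    if not fragment_names:
        raise ValueError("nothing to consolidate")
    stamps = sorted(_timestamps(name) for name in fragment_names)
    t1 = stamps[0][0]
    t2 = stamps[-1][1]
    if new_uuid is None:
        new_uuid = uuidlib.uuid4().hex  # assumption: hex UUID, no dashes
    return f"__{t1}_{t2}_{new_uuid}_{version}"


def vacuum_file_name(consolidated: str) -> str:
    """The vacuum file shares the consolidated name plus a .vac suffix."""
    return consolidated + ".vac"
```

For example, consolidating fragments spanning `100..200`, `250..250`, and `300..400` yields a name starting with `__100_400_`, so the consolidated fragment covers the full time range of its inputs.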
, i.e., with the same name as the consolidated fragment with the added suffix .vac. This file contains the URIs of the fragments that were consolidated by fragment __t1_t2_uuid3_v, and it is stored in the __commits folder. The user can then vacuum them, i.e., permanently delete the consolidated fragments. The __commits/*.vac files are used in the vacuum process so that only consolidated fragments get deleted.
__t1_t2_uuid_v.con file in folder __commits, which stores the URIs of the fragments whose commits were consolidated in this file. The consolidated commits can then be vacuumed, leaving a single commit file in the __commits folder. That leads to a significant speedup when opening the array for reads in the presence of a large number of fragments.
__t1_t2_uuid3.vac
files are used in vacuuming to delete the array metadata files that participated in consolidation.
__fragment_meta/*.meta files. TileDB allows vacuuming those fragment metadata files by keeping the latest *.meta file and deleting the rest. An example is shown below.
__fragments folder, the two corresponding commit *.wrt files, and the *.vac file from the __commits folder. An example is shown below.
__fragments folder, and the *.vac file from the __commits folder as in the first case. But this time, it will also create a new file __commits/__t1_t2_uuid5_v.ign. This file indicates that the vacuumed fragment URIs should be ignored from the consolidated commit file __t1_t2_uuid4_v.con upon opening the array for reading.
__commits/*.com files). An example is shown below:
*.vac
files. An example is shown below.
1, 2, 3, which have the following little-endian representation when stored adjacent in memory:
100, 104, 108, 112, ..., then the resulting positive-delta encoded data would be 0, 4, 4, 4, .... This encoding is advantageous in that producing long runs of repeated values can result in better compression ratios, if a compression filter is added after positive-delta encoding.
100, 104, 108, 112 can easily arise in the offsets, if for example you have a variable-length attribute of 4-byte values with mostly single values per cell instead of a variable number.
uint64. Initially, each cell of data for that attribute requires 8 bytes of storage. However, if you know that the actual value of the attribute is often 255 or less, those cells can be stored using just a single byte in the form of a uint8, saving 7 bytes of storage per cell. The bit-width reduction filter performs this analysis and compression automatically over windows of attribute data.
300, 350, 400, the bit-width reduction filter would first determine that the minimum value in the window was 300, and the relative cell values were 0, 50, 100. These relative values are now less than 255 and can be represented by a uint8 value.
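The two filters discussed above can be illustrated with a short Python sketch that reproduces the worked numbers from the text. It is a simplified model under stated assumptions, not TileDB's filter code: each function handles a single window, the window's base or minimum value is returned alongside the encoded data, and the function names are hypothetical.

```python
from typing import List, Tuple


def positive_delta_encode(window: List[int]) -> Tuple[int, List[int]]:
    """Return (base, deltas) for one non-decreasing window of values.

    The first delta is taken against the base itself, so it is 0.
    """
    if not window:
        raise ValueError("empty window")
    deltas = [0]
    for previous, current in zip(window, window[1:]):
        if current < previous:
            raise ValueError("positive-delta needs non-decreasing input")
        deltas.append(current - previous)
    return window[0], deltas


def positive_delta_decode(base: int, deltas: List[int]) -> List[int]:
    """Invert positive_delta_encode with a running sum."""
    values = []
    running = base
    for delta in deltas:
        running += delta
        values.append(running)
    return values


def bit_width_reduce(window: List[int]) -> Tuple[int, int, List[int]]:
    """Return (minimum, bits, relative_values) for one window.

    bits is the smallest of 8, 16, 32, 64 that can hold the largest
    value relative to the window minimum (assumption: unsigned data).
    """
    if not window:
        raise ValueError("empty window")
    minimum = min(window)
    relative = [value - minimum for value in window]
    needed = max(relative).bit_length()
    bits = next(width for width in (8, 16, 32, 64) if needed <= width)
    return minimum, bits, relative
```

With the inputs from the text, `positive_delta_encode([100, 104, 108, 112])` gives base `100` and deltas `[0, 4, 4, 4]`, and `bit_width_reduce([300, 350, 400])` gives minimum `300`, a width of 8 bits, and relative values `[0, 50, 100]`.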