`fragment_meta`), which merges the fragment metadata footers of a subset of fragments into a single file with the suffix `.meta`, stored in the array folder. This file is named similarly to fragments, i.e., it carries a timestamp range that helps with time traveling. It also contains the URIs of all the fragments whose metadata footers are consolidated in that file. When an array is read, only this file is fetched from the backend, which is efficient because the file is typically very small (even for hundreds of thousands of fragments).
`fragments` is passed to the consolidation function, then the fragment consolidation algorithm is executed, which is explained in detail below.
`[1,4], [1,4]`, the second is dense and covers `[1,2], [1,2]`, the third is sparse as shown in the figure, and the fourth one is dense, covering `[1,2], [1,4]`. Observe that, if those four fragments were to be consolidated, the cells of the second and third fragments would be completely overwritten by the cells of the fourth fragment. Therefore, the existence of those two fragments would make no difference to the consolidation result. Deleting them altogether before the consolidation algorithm commences improves the algorithm's performance (since fewer cells will be read and checked for overwrites).
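The pruning idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TileDB's implementation: the `Fragment` class, the `covers` and `prune_overwritten` helpers, and the non-empty domain chosen for the sparse third fragment are all hypothetical. A fragment is dropped when a later dense fragment's non-empty domain fully contains its own.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) bounds along one dimension


@dataclass
class Fragment:
    # Hypothetical stand-in for a fragment; not a TileDB type.
    name: str
    dense: bool
    domain: Tuple[Range, ...]  # non-empty domain, one range per dimension


def covers(outer: Tuple[Range, ...], inner: Tuple[Range, ...]) -> bool:
    """True if `outer` contains `inner` along every dimension."""
    return all(
        o_lo <= i_lo and i_hi <= o_hi
        for (o_lo, o_hi), (i_lo, i_hi) in zip(outer, inner)
    )


def prune_overwritten(fragments: List[Fragment]) -> List[Fragment]:
    """Drop fragments fully covered by a later dense fragment.

    `fragments` must be ordered from oldest to newest.
    """
    kept = []
    for i, frag in enumerate(fragments):
        overwritten = any(
            later.dense and covers(later.domain, frag.domain)
            for later in fragments[i + 1:]
        )
        if not overwritten:
            kept.append(frag)
    return kept


# The four fragments from the example. The sparse fragment's non-empty
# domain is an assumption made only for this illustration.
fragments = [
    Fragment("f1", dense=True, domain=((1, 4), (1, 4))),
    Fragment("f2", dense=True, domain=((1, 2), (1, 2))),
    Fragment("f3", dense=False, domain=((1, 2), (3, 4))),
    Fragment("f4", dense=True, domain=((1, 2), (1, 4))),
]
survivors = [f.name for f in prune_overwritten(fragments)]
```

In this sketch only a dense fragment can overwrite earlier ones, because a sparse fragment does not necessarily write every cell inside its non-empty domain.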
`sm.consolidation.step_size_ratio`; if the size ratio of two adjacent fragments is larger than this parameter, then no fragment subset that contains those two fragments will be considered for consolidation.
`sm.consolidation.amplification`, which should not be exceeded for a fragment subset to be eligible for consolidation. The default value `1.0` means that the fragments will be consolidated if there is no amplification at all, i.e., if the size of the resulting consolidated fragment is smaller than or equal to the sum of the sizes of the original fragments. As an example, this happens when the non-empty domain of the consolidated fragment does not contain any empty cells.
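To make the amplification check concrete, here is a rough Python sketch. It assumes dense fragments whose size is proportional to the number of cells in their non-empty domain, and it assumes the consolidated fragment covers the bounding box of the input domains. The helper names (`cell_count`, `bounding_box`, `amplification`, `eligible`) are hypothetical and are not part of TileDB.

```python
from math import prod
from typing import Sequence, Tuple

Range = Tuple[int, int]       # inclusive (low, high) bounds
Domain = Tuple[Range, ...]    # one range per dimension


def cell_count(domain: Domain) -> int:
    """Number of cells in an inclusive, integer-indexed domain."""
    return prod(hi - lo + 1 for lo, hi in domain)


def bounding_box(domains: Sequence[Domain]) -> Domain:
    """Smallest domain that contains every input domain."""
    ndim = len(domains[0])
    return tuple(
        (min(d[k][0] for d in domains), max(d[k][1] for d in domains))
        for k in range(ndim)
    )


def amplification(domains: Sequence[Domain]) -> float:
    """Assumed size of the consolidated fragment over the summed input sizes."""
    merged = cell_count(bounding_box(domains))
    original = sum(cell_count(d) for d in domains)
    return merged / original


def eligible(domains: Sequence[Domain], max_amplification: float = 1.0) -> bool:
    """True if the subset does not exceed the allowed amplification."""
    return amplification(domains) <= max_amplification


# Two halves of a 4x4 domain: the bounding box has no empty cells.
adjacent = [((1, 2), (1, 4)), ((3, 4), (1, 4))]
# Two single cells in opposite corners: the bounding box is mostly empty.
corners = [((1, 1), (1, 1)), ((4, 4), (4, 4))]
```

Under these assumptions, `adjacent` has amplification 1.0 (16 cells from 8 + 8) and passes the default threshold, while `corners` has amplification 8.0 (16 cells from 1 + 1) and does not.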
`sm.consolidation.step_max_frags`; the algorithm will select the subset of fragments (complying with all the above criteria) that has the maximum cardinality smaller than or equal to `sm.consolidation.step_max_frags` and larger than or equal to `sm.consolidation.step_min_frags`. If no fragment subset is eligible with cardinality at least `sm.consolidation.step_min_frags`, then the consolidation algorithm terminates.
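The selection step can be sketched as a search over runs of adjacent fragments, longest first. This is a simplified illustration, not the actual TileDB code: it assumes candidate subsets are contiguous runs in fragment order, it breaks ties by taking the earliest run, and the `select_run` function and the toy eligibility predicate are hypothetical.

```python
from typing import Callable, Optional, Tuple


def select_run(
    num_frags: int,
    min_frags: int,
    max_frags: int,
    is_eligible: Callable[[int, int], bool],
) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) indices of the longest eligible run.

    Returns None when no run with at least `min_frags` fragments is
    eligible, which corresponds to the algorithm terminating.
    """
    longest = min(max_frags, num_frags)
    # Try larger cardinalities first so the first hit is a maximum.
    for length in range(longest, min_frags - 1, -1):
        for start in range(num_frags - length + 1):
            end = start + length - 1
            if is_eligible(start, end):
                return (start, end)
    return None


# Toy predicate standing in for the real criteria: every fragment in the
# run must be within a factor of 2 of every other one (an assumption).
sizes = [100, 1, 1, 1, 50]


def similar_sizes(start: int, end: int) -> bool:
    run = sizes[start:end + 1]
    return max(run) / min(run) <= 2


chosen = select_run(len(sizes), min_frags=2, max_frags=3, is_eligible=similar_sizes)
none_found = select_run(len(sizes), min_frags=4, max_frags=5, is_eligible=similar_sizes)
```

Here `chosen` is `(1, 3)`, the three small fragments in the middle, and `none_found` is `None` because no run of four or more fragments passes the toy predicate.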
`O(max_frags * total_frags)`, where `total_frags` is the total number of fragments considered in a given step, and `max_frags` is equal to the
`array_meta`). Since the array metadata is typically small and fits in main memory, consolidating it is rather simple. TileDB reads all the array metadata (from all the existing array metadata fragments) into main memory, creates an up-to-date view of the metadata, and then flushes it to a new array metadata file. The name of the new file carries a timestamp range determined by the first timestamp of the first array metadata file and the second timestamp of the last array metadata file that got consolidated.
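A minimal Python sketch of this merge follows. It assumes each array metadata file can be modeled as a `(start, end, key_values)` tuple and that later writes simply override earlier ones; the `MetaFile` alias and the `consolidate_array_meta` function are hypothetical, and the sketch does not reproduce TileDB's on-disk format or file naming.

```python
from typing import Any, Dict, List, Tuple

# (start timestamp, end timestamp, key/value pairs written by that file)
MetaFile = Tuple[int, int, Dict[str, Any]]


def consolidate_array_meta(files: List[MetaFile]) -> MetaFile:
    """Merge metadata files into one up-to-date view.

    The result carries the first timestamp of the earliest file and the
    second timestamp of the latest file.
    """
    if not files:
        raise ValueError("nothing to consolidate")
    ordered = sorted(files, key=lambda f: (f[0], f[1]))
    merged: Dict[str, Any] = {}
    for _, _, key_values in ordered:
        merged.update(key_values)  # newer values replace older ones
    return (ordered[0][0], ordered[-1][1], merged)


files = [
    (5, 5, {"units": "m"}),
    (1, 1, {"units": "cm", "author": "a"}),
    (9, 9, {"version": 2}),
]
consolidated = consolidate_array_meta(files)
```

With these inputs, `consolidated` spans timestamps 1 to 9 and holds `units` from the newer write at timestamp 5.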
`.vac` file is produced with all the fragment URIs that participated in consolidation. When the vacuuming function is called with mode `"fragments"`, all the fragment folders whose URI is in the `.vac` file get deleted.
`.vac` file is produced with all the array metadata URIs that participated in consolidation. When the vacuuming function is called with mode `"array_meta"`, all the array metadata files whose URI is in the `.vac` file get deleted.
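The vacuuming behavior described above can be imitated on a local filesystem as follows. This is a sketch under assumptions rather than TileDB's vacuum implementation: it treats a `.vac` file as plain text with one path per line, relative to the array folder, and it removes the `.vac` file once its entries are processed. The `vacuum` and `demo` functions and the folder names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def vacuum(array_dir: Path) -> List[str]:
    """Delete every entry listed in the .vac files under `array_dir`.

    Listed folders (fragments) are removed recursively and listed files
    (array metadata) are unlinked. Returns the names that were removed.
    """
    removed = []
    for vac_file in sorted(array_dir.glob("*.vac")):
        for line in vac_file.read_text().splitlines():
            name = line.strip()
            if not name:
                continue
            target = array_dir / name
            if target.is_dir():
                shutil.rmtree(target)
            elif target.exists():
                target.unlink()
            else:
                continue  # already gone, nothing to do
            removed.append(name)
        vac_file.unlink()  # assumed: the .vac file is no longer needed
    return removed


def demo() -> Tuple[List[str], List[str]]:
    """Build a toy array folder, vacuum it, and report what is left."""
    with tempfile.TemporaryDirectory() as tmp:
        array_dir = Path(tmp)
        for name in ("frag_1", "frag_2", "frag_1_2"):
            (array_dir / name).mkdir()
        # frag_1_2 is the consolidated result of frag_1 and frag_2.
        (array_dir / "frag_1_2.vac").write_text("frag_1\nfrag_2\n")
        removed = vacuum(array_dir)
        remaining = sorted(p.name for p in array_dir.iterdir())
        return removed, remaining
```

Calling `demo()` removes the two original fragment folders and leaves only the consolidated one in place.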
`.meta` files except for the last one.