Choosing Tiling & Cell Layout

Dense Arrays

Recall that in dense arrays (and, more precisely, dense fragments), there is a one-to-one correspondence between a space tile and a data tile (for each attribute), and the data tile is the atomic unit of IO. TileDB fetches all data tiles that correspond to any space tile overlapping the query subarray, even if the overlap is only partial. Therefore, the tile extents along each dimension should be set so that the resulting space tiles roughly follow the shape of your most typical subarray queries.
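To make the effect of tile shape concrete, the following is a minimal sketch in plain Python (it does not call the TileDB API). The helper names `overlapping_space_tiles` and `cells_fetched` are hypothetical, and the sketch assumes integer dimensions with inclusive subarray bounds.

```python
from math import prod

def overlapping_space_tiles(domain_start, tile_extents, subarray):
    """Count the space tiles touched by a subarray.

    domain_start: lower domain bound per dimension
    tile_extents: tile extent per dimension
    subarray:     (lo, hi) inclusive range per dimension
    """
    count = 1
    for start, extent, (lo, hi) in zip(domain_start, tile_extents, subarray):
        first_tile = (lo - start) // extent
        last_tile = (hi - start) // extent
        count *= last_tile - first_tile + 1
    return count

def cells_fetched(domain_start, tile_extents, subarray):
    """Cells read from storage: every overlapping tile is fetched whole."""
    tiles = overlapping_space_tiles(domain_start, tile_extents, subarray)
    return tiles * prod(tile_extents)

# A typical query that reads one full row of 1000 cells (domain starts at 1).
row_query = [(1, 1), (1, 1000)]

square_tiles = overlapping_space_tiles([1, 1], [100, 100], row_query)  # 10 tiles
square_cells = cells_fetched([1, 1], [100, 100], row_query)            # 100000 cells

row_tiles = overlapping_space_tiles([1, 1], [1, 1000], row_query)      # 1 tile
row_cells = cells_fetched([1, 1], [1, 1000], row_query)                # 1000 cells
```

With 100x100 tiles the row query touches 10 tiles and reads 100 times more cells than it needs, whereas a 1x1000 tile matches the query shape exactly.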

Furthermore, the size of the space tile affects performance. A larger tile may cause TileDB to fetch a lot of data that is irrelevant to the query, but it can also result in a better compression ratio and better parallelism (for both filtering and IO). It is recommended to define the space tile so that the corresponding data tile of each attribute is at least 64KB (a typical size for the L1 cache).
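The size guideline can be checked with simple arithmetic. The sketch below assumes a fixed-size attribute and measures the uncompressed data tile; `data_tile_bytes` and `MIN_TILE_BYTES` are hypothetical names introduced for illustration only.

```python
from math import prod

MIN_TILE_BYTES = 64 * 1024  # the 64KB guideline from the text

def data_tile_bytes(tile_extents, cell_size_bytes):
    """Uncompressed size of one dense data tile for a fixed-size attribute."""
    return prod(tile_extents) * cell_size_bytes

# A 4-byte attribute (for example int32) with two candidate tilings.
small = data_tile_bytes([100, 100], 4)  # 40000 bytes, below the guideline
large = data_tile_bytes([200, 200], 4)  # 160000 bytes, above the guideline

small_ok = small >= MIN_TILE_BYTES
large_ok = large >= MIN_TILE_BYTES
```

Because each attribute has its own data tile, the check should be repeated per attribute; a tiling that is large enough for an 8-byte attribute may be too small for a 1-byte one.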

Recall also that, in addition to the space tile, the tile and cell order determine the global cell order, i.e., the way the data tile values are laid out in physical storage. In general, both the tile and cell order should follow the layout in which you are expecting the read results. This maximizes your chances that the relevant data are concentrated in smaller byte regions in the physical files, which TileDB can exploit to fetch the data faster.
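As an illustration of how the space tile, tile order, and cell order combine into the global cell order, here is a small plain-Python sketch for a 2D dense array. The function `global_order` is hypothetical, uses 0-based coordinates, and assumes the domain is evenly divisible by the tile extents.

```python
def global_order(domain_shape, tile_extents,
                 tile_order="row-major", cell_order="row-major"):
    """Return the 2D cell coordinates in global order."""
    rows, cols = domain_shape
    tr, tc = tile_extents
    tile_rows = range(0, rows, tr)
    tile_cols = range(0, cols, tc)

    # First, order the space tiles according to the tile order.
    if tile_order == "row-major":
        tiles = [(r, c) for r in tile_rows for c in tile_cols]
    else:  # col-major
        tiles = [(r, c) for c in tile_cols for r in tile_rows]

    # Then, order the cells inside each tile according to the cell order.
    cells = []
    for r0, c0 in tiles:
        if cell_order == "row-major":
            inside = [(r0 + i, c0 + j) for i in range(tr) for j in range(tc)]
        else:  # col-major
            inside = [(r0 + i, c0 + j) for j in range(tc) for i in range(tr)]
        cells.extend(inside)
    return cells

# 4x4 array with 2x2 space tiles.
row_row = global_order((4, 4), (2, 2), "row-major", "row-major")
row_col = global_order((4, 4), (2, 2), "row-major", "col-major")
```

In `row_row` the first tile is stored as (0,0), (0,1), (1,0), (1,1), while in `row_col` the same tile is stored as (0,0), (1,0), (0,1), (1,1). A read that expects results in one of these layouts finds its cells contiguous on disk only when the array was written in the matching order.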

Sparse Arrays

In sparse arrays (and, more precisely, sparse fragments), there is no one-to-one correspondence between space tiles and data tiles. Instead, the space tiles help determine the global cell order and, therefore, can be used to preserve the spatial locality of the values in physical storage. By determining the global cell order, space tiles effectively determine the shape of the minimum bounding rectangles (MBRs) of the data tiles. Recall that TileDB fetches only the data tiles whose MBR intersects the query subarray. Therefore, the tile extents should once again be set so that the resulting space tiles have a shape similar to that of the typical subarray query.
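The role of the space tiles in ordering sparse cells can be sketched as a sort key. This is plain Python, not the TileDB implementation; `global_order_key` is a hypothetical helper that assumes 0-based integer coordinates and row-major tile and cell orders.

```python
def global_order_key(coord, tile_extents):
    """Sort key: space tile first (row-major), then the cell within it (row-major)."""
    tile = tuple(c // e for c, e in zip(coord, tile_extents))
    return (tile, tuple(coord))

# Non-empty cells of a sparse 4x4 array, in arbitrary insertion order.
cells = [(0, 0), (0, 3), (1, 1), (3, 0), (2, 2), (3, 3), (0, 2), (1, 0)]

# 2x2 space tiles keep the cells of each quadrant next to each other.
tiled = sorted(cells, key=lambda c: global_order_key(c, (2, 2)))

# A single 4x4 space tile degenerates to a plain row-major sort.
untiled = sorted(cells, key=lambda c: global_order_key(c, (4, 4)))
```

With 2x2 space tiles, the three cells in the top-left quadrant come first in `tiled`, so consecutive cells are spatially close and the data tiles built from them tend to have compact MBRs. In `untiled`, cells from the whole first row precede any cell of the second row, which stretches the MBRs horizontally.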

Recall also that the size of the data tile in sparse arrays is not determined by the tile extents, but rather by the tile capacity. Similar to the case of dense arrays, it must be set appropriately to balance fetching irrelevant data and maximizing compression ratio and parallelism.
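The capacity trade-off can be sketched by grouping cells that are already in global order into data tiles of `capacity` cells each, computing each tile's MBR, and counting the MBRs that intersect a query. The functions below are hypothetical plain-Python helpers with inclusive bounds, not TileDB API calls, and the coordinate list is made-up example data.

```python
def data_tile_mbrs(ordered_coords, capacity):
    """Split globally ordered coordinates into data tiles and return their MBRs."""
    mbrs = []
    for i in range(0, len(ordered_coords), capacity):
        chunk = ordered_coords[i:i + capacity]
        ndim = len(chunk[0])
        mbrs.append(tuple(
            (min(c[d] for c in chunk), max(c[d] for c in chunk))
            for d in range(ndim)
        ))
    return mbrs

def intersects(mbr, subarray):
    """True if the MBR and the subarray overlap in every dimension."""
    return all(lo <= q_hi and q_lo <= hi
               for (lo, hi), (q_lo, q_hi) in zip(mbr, subarray))

def tiles_fetched(ordered_coords, capacity, subarray):
    """Number of data tiles whose MBR intersects the subarray."""
    return sum(intersects(m, subarray)
               for m in data_tile_mbrs(ordered_coords, capacity))

# Eight non-empty cells, assumed to be already sorted in global order.
ordered = [(0, 0), (1, 0), (1, 1), (0, 2), (0, 3), (3, 0), (2, 2), (3, 3)]
query = [(0, 1), (0, 1)]  # top-left 2x2 region, which holds 3 of the cells

fetched_cap2 = tiles_fetched(ordered, 2, query)  # 3 of 4 tiles (6 cells read)
fetched_cap4 = tiles_fetched(ordered, 4, query)  # 2 of 2 tiles (8 cells read)
```

In this toy case, a capacity of 2 prunes one data tile and reads 6 cells, while a capacity of 4 reads all 8 cells but in fewer, larger tiles that would typically compress better.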

Finally, for similar reasons to the dense case, it is recommended that the tile and cell order be chosen according to the typical result layout.
