The following is a high-level overview of the read operations performed during a TileDB array query:
Array open:
List all array schema files in the __schemas prefix.
Download all array schema files from the __schemas prefix.
List all fragments in the __fragments prefix.
Download the __fragment_metadata files for each fragment within the timestamp bounds of the active array.
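The timestamp filtering in the last step can be sketched in plain Python. This is an illustration only, not TileDB's implementation: it assumes fragment names follow a `__<t_start>_<t_end>_<uuid>_<version>` layout and that a fragment qualifies when its whole timestamp range lies inside the bounds used to open the array. The helper names `parse_fragment_name` and `fragments_in_bounds` are hypothetical.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    name: str
    t_start: int
    t_end: int


def parse_fragment_name(name: str) -> Fragment:
    # Assumed layout (not taken from this page): __<t_start>_<t_end>_<uuid>_<version>
    parts = name.lstrip("_").split("_")
    return Fragment(name, int(parts[0]), int(parts[1]))


def fragments_in_bounds(names: List[str], open_start: int, open_end: int) -> List[str]:
    # Assumption: keep a fragment only if its full timestamp range is inside the bounds.
    selected = []
    for name in names:
        frag = parse_fragment_name(name)
        if frag.t_start >= open_start and frag.t_end <= open_end:
            selected.append(frag.name)
    return selected


listing = ["__1_1_aaa_21", "__2_4_bbb_21", "__5_9_ccc_21"]
needed = fragments_in_bounds(listing, open_start=0, open_end=4)
```

Under these assumptions, only the fragments in `needed` would have their metadata files downloaded; the third fragment is skipped because it ends after the upper bound.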
Cell selection:
(Sparse arrays only): Download all dimension (d*.tdb) tiles within the ranges requested by the query, based on the minimum bounding rectangles for each fragment.
(Dense and sparse arrays):
If the query includes a query condition, download all attribute tiles necessary to evaluate the query condition.
If the array has been consolidated with timestamps, download all timestamp tiles necessary for timestamp filtering.
Dense read:
Download all tiles in each requested attribute (a*.tdb files) intersecting the query ranges. Tiles will be filtered based on the query condition selector, if provided.
Sparse read:
Download all tiles in each requested attribute (a*.tdb files) with cell coordinates matching the query ranges. Tiles will be filtered based on the query condition selector, if provided.
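The tile selection in the steps above can be pictured as a rectangle-intersection test. The sketch below is a simplified model rather than TileDB's code: it assumes integer dimensions, inclusive ranges, and one minimum bounding rectangle (MBR) per tile. The names `intersects` and `tiles_to_download` are hypothetical.

```python
from typing import List, Sequence, Tuple

Range = Tuple[int, int]   # inclusive (low, high) on one dimension
Rect = Sequence[Range]    # one Range per dimension


def intersects(mbr: Rect, query: Rect) -> bool:
    # Two rectangles overlap only if their ranges overlap on every dimension.
    return all(lo <= q_hi and q_lo <= hi
               for (lo, hi), (q_lo, q_hi) in zip(mbr, query))


def tiles_to_download(mbrs: List[Rect], query: Rect) -> List[int]:
    # Return the indices of the tiles whose MBR overlaps the query rectangle.
    return [i for i, mbr in enumerate(mbrs) if intersects(mbr, query)]


mbrs = [
    [(1, 4), (1, 4)],   # tile 0
    [(5, 8), (1, 4)],   # tile 1
    [(1, 4), (5, 8)],   # tile 2
]
query = [(3, 6), (2, 3)]
selected = tiles_to_download(mbrs, query)
```

Here tiles 0 and 1 overlap the query on both dimensions, while tile 2 is skipped because its second-dimension range (5 to 8) lies outside the queried range (2 to 3).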
The subarray shape should follow the space tiling of the array as closely as possible. Read performance depends heavily on the array tiling; see Choosing Tiling and Cell Layout.
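To see why alignment matters, the following sketch counts how many space tiles a subarray touches. It is a plain-Python illustration that assumes integer dimensions and inclusive ranges; the function name `tiles_touched` and the example domain are hypothetical.

```python
from typing import Sequence, Tuple


def tiles_touched(subarray: Sequence[Tuple[int, int]],
                  domain_lo: Sequence[int],
                  extents: Sequence[int]) -> int:
    # Count the space tiles overlapped by an inclusive subarray.
    count = 1
    for (lo, hi), d_lo, ext in zip(subarray, domain_lo, extents):
        first = (lo - d_lo) // ext   # index of the first tile on this dimension
        last = (hi - d_lo) // ext    # index of the last tile on this dimension
        count *= last - first + 1
    return count


# Hypothetical 2D array whose domain starts at 1, with 100x100 space tiles.
aligned = tiles_touched([(1, 100), (1, 100)], domain_lo=[1, 1], extents=[100, 100])
shifted = tiles_touched([(51, 150), (51, 150)], domain_lo=[1, 1], extents=[100, 100])
```

Both subarrays cover 10,000 cells, but the aligned one touches a single tile while the shifted one touches four, so the shifted read has to fetch and filter roughly four times as much tile data for the same result size.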
The most efficient read layout is global order, because TileDB can largely avoid sorting and reorganizing the result cells internally. Row-major and column-major layouts require sorting and therefore more work for TileDB. In general, the read layout should coincide with the global order as much as possible.
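The difference between global order and row-major order can be illustrated with a small dense example. This sketch assumes a 4x4 array with 2x2 space tiles and row-major tile and cell order; the function name `global_order` is hypothetical and does not correspond to a TileDB API.

```python
from typing import List, Tuple


def global_order(n_rows: int, n_cols: int,
                 tile_rows: int, tile_cols: int) -> List[Tuple[int, int]]:
    # Visit tiles in row-major order, then the cells inside each tile in row-major order.
    cells = []
    for tr in range(0, n_rows, tile_rows):
        for tc in range(0, n_cols, tile_cols):
            for r in range(tr, min(tr + tile_rows, n_rows)):
                for c in range(tc, min(tc + tile_cols, n_cols)):
                    cells.append((r, c))
    return cells


on_disk = global_order(4, 4, 2, 2)
row_major = [(r, c) for r in range(4) for c in range(4)]

# Returning cells in row-major order means re-sorting the on-disk sequence.
needs_reordering = on_disk != row_major
```

In this model the stored sequence starts with the four cells of the first tile, (0, 0), (0, 1), (1, 0), (1, 1), whereas a row-major read expects (0, 0), (0, 1), (0, 2), (0, 3). A read in global order can hand cells back in the stored sequence, while a row-major read has to rearrange them first.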
Larger reads generally perform better, because TileDB can take advantage of parallelism to filter tiles and perform I/O in parallel.
The Read Parallelism section describes in detail how TileDB parallelizes reads internally. Consider fine-tuning the parameters explained there.