Tuning Reads

I/O Sequence Overview

The following is a high-level overview of the read operations performed during a TileDB array query:

  • Array open:

    • List all array schema files in the __schemas prefix.

    • Download all array schema files from the __schemas prefix.

    • List all fragments in the __fragments prefix.

    • Download __fragment_metadata files for each fragment within the timestamp bounds of the active array.

  • Cell selection:

    • (Sparse arrays only): download all dimension (d*.tdb) tiles within the ranges requested by the query, based on the minimum bounding rectangles for each fragment.

    • (Dense and sparse arrays):

      • If the query includes a query condition, download all attribute tiles necessary to evaluate the query condition.

      • If the array has been consolidated with timestamps, download all timestamp tiles necessary for timestamp filtering.

  • Dense read:

    • Download all tiles in each requested attribute (a*.tdb files) intersecting the query ranges. Tiles will be filtered based on the query condition selector, if provided.

  • Sparse read:

    • Download all tiles in each requested attribute (a*.tdb files) with cell coordinates matching the query ranges. Tiles will be filtered based on the query condition selector, if provided.
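The sparse cell-selection step above relies on comparing minimum bounding rectangles (MBRs) against the query ranges. The following is a minimal sketch of that overlap test in plain Python. It does not use the TileDB API; the `overlaps` and `tiles_to_fetch` helpers and the sample MBR values are hypothetical and only illustrate why tiles whose MBR misses the query ranges never need to be downloaded.

```python
from typing import List, Tuple

Range = Tuple[int, int]   # inclusive (low, high) bounds on one dimension
Rect = Tuple[Range, ...]  # one Range per dimension


def overlaps(mbr: Rect, query: Rect) -> bool:
    """Return True if the MBR and the query rectangle intersect on every dimension."""
    return all(
        m_lo <= q_hi and q_lo <= m_hi
        for (m_lo, m_hi), (q_lo, q_hi) in zip(mbr, query)
    )


def tiles_to_fetch(tile_mbrs: List[Rect], query: Rect) -> List[int]:
    """Return the indices of the tiles whose MBR intersects the query rectangle."""
    return [i for i, mbr in enumerate(tile_mbrs) if overlaps(mbr, query)]


# Hypothetical MBRs for four tiles of a 2D sparse fragment.
tile_mbrs = [
    ((0, 9), (0, 9)),
    ((10, 19), (0, 9)),
    ((0, 9), (10, 19)),
    ((15, 30), (15, 30)),
]

# Query rows 5..12 and columns 0..4.
query = ((5, 12), (0, 4))
selected = tiles_to_fetch(tile_mbrs, query)
print(selected)  # [0, 1]
```

Only the tiles at indices 0 and 1 intersect the query, so in this toy model the other two would be skipped entirely.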

Subarray Shape

The subarray shape should follow the space tiling of the array as closely as possible, because read performance is highly dependent on the array tiling. See Choosing Tiling and Cell Layout for details.
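To see why alignment with the space tiling matters, it can help to count how many space tiles a subarray touches. The sketch below is plain Python arithmetic, not a TileDB call; the `tiles_touched` helper is hypothetical and assumes an integer domain that starts at 0 unless told otherwise.

```python
from math import prod


def tiles_touched(subarray, tile_extents, domain_start=None):
    """Count the space tiles intersected by a subarray.

    subarray: inclusive (low, high) pair per dimension.
    tile_extents: tile extent per dimension.
    domain_start: first coordinate of the domain per dimension (default 0).
    """
    if domain_start is None:
        domain_start = [0] * len(tile_extents)
    per_dim = []
    for (lo, hi), extent, start in zip(subarray, tile_extents, domain_start):
        first_tile = (lo - start) // extent
        last_tile = (hi - start) // extent
        per_dim.append(last_tile - first_tile + 1)
    return prod(per_dim)


# Both subarrays cover 100 x 100 cells of an array with 100 x 100 tiles.
aligned = tiles_touched([(0, 99), (0, 99)], [100, 100])
shifted = tiles_touched([(50, 149), (50, 149)], [100, 100])
print(aligned, shifted)  # 1 4
```

Both subarrays select the same number of cells, but the shifted one intersects four tiles instead of one, so roughly four times as much tile data would have to be fetched and unfiltered for the same result size.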

Read Layout

The most efficient read layout is global order, because TileDB can then avoid most of the internal sorting and re-organizing of the result cells. Row-major and col-major layouts require sorting and therefore more work for TileDB. In general, the read layout should coincide with the global order as much as possible.
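The difference between global order and row-major order can be illustrated with a small enumeration. This is a plain-Python sketch, not TileDB code: the `global_order` helper is hypothetical and assumes a 2D dense array with row-major tile order and row-major cell order inside each tile.

```python
def global_order(rows, cols, tile_rows, tile_cols):
    """List cell coordinates tile by tile, row-major within and across tiles."""
    order = []
    for tile_r in range(0, rows, tile_rows):
        for tile_c in range(0, cols, tile_cols):
            for r in range(tile_r, min(tile_r + tile_rows, rows)):
                for c in range(tile_c, min(tile_c + tile_cols, cols)):
                    order.append((r, c))
    return order


# A 4 x 4 array split into 2 x 2 tiles.
cells = global_order(4, 4, 2, 2)
row_major = sorted(cells)

print(cells[:4])      # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(row_major[:4])  # [(0, 0), (0, 1), (0, 2), (0, 3)]
```

Under these assumptions the cells of one tile are contiguous in global order, while a row-major result interleaves cells from several tiles. That interleaving is the re-organization work a global-order read avoids.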

Read Size

Larger reads generally perform better, because a single large read gives TileDB more tiles to fetch and filter in parallel than many small reads do.

Parallelism

The Read Parallelism section describes in detail how TileDB internally parallelizes reads. Consider fine-tuning the parameters explained there.
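As a rough illustration of what such tuning can look like from Python, the sketch below collects a few concurrency-related settings in a dictionary and, if the TileDB-Py package is installed, passes them to a context. The parameter names are given as I understand them from the TileDB configuration reference, and the values are arbitrary placeholders rather than recommendations; check both against the documentation for your TileDB version.

```python
# Illustrative read-tuning settings. The values are placeholders, not recommendations.
read_tuning = {
    "sm.compute_concurrency_level": "8",  # threads for compute-bound work such as tile filtering
    "sm.io_concurrency_level": "8",       # threads for I/O-bound work
    "vfs.min_parallel_size": "10485760",  # minimum bytes per parallel VFS operation
    "vfs.s3.max_parallel_ops": "8",       # parallel operations per S3 request
}

try:
    import tiledb  # optional: only needed to actually apply the settings
except ImportError:
    tiledb = None

if tiledb is not None:
    # Assumption: tiledb.Config accepts a dict of string parameters.
    ctx = tiledb.Ctx(tiledb.Config(read_tuning))
    # Arrays opened with ctx=ctx would then use these settings.
```

Keeping the settings in one dictionary makes it easier to compare different combinations when benchmarking reads.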
