Using Performance Statistics

Much of the performance optimization for TileDB programs comes down to minimizing wasted work. TileDB includes an internal statistics reporting system that helps identify where your programs can be improved, including where work is being wasted.

The TileDB statistics can be enabled and disabled at runtime, and a report can be dumped at any point. A typical situation is to enable the statistics immediately before submitting a query, submit the query, and then immediately dump the report. This can be done like so:

C

tiledb_stats_enable();
// ... create some query here
tiledb_query_submit(ctx, query);
// Dump the statistics (the argument is a FILE*, e.g. stdout)
tiledb_stats_dump(stdout);
tiledb_stats_disable();
// You can also reset the stats as follows
tiledb_stats_reset();

C++

tiledb::Stats::enable();
// ... create some query here
// Submit the query
query.submit();
// Dump the statistics
tiledb::Stats::dump(stdout);
tiledb::Stats::disable();
// You can also reset the stats as follows
tiledb::Stats::reset();

Python

tiledb.stats_enable()
# Do some work
data = A[:]
# Dump the statistics
tiledb.stats_dump()
tiledb.stats_disable()
# You can also reset the stats as follows
tiledb.stats_reset()

R

tiledb_stats_enable()
# ... create some query here
A[1:4]
# Dump the statistics
tiledb_stats_print()
tiledb_stats_disable()
# You can also reset the stats as follows
tiledb_stats_reset()

Java

Stats.enable();
// ... create some query here
query.submit();
// Dump the statistics
Stats.dump();
Stats.disable();
// You can also reset the stats as follows
Stats.reset();

Go

tiledb.StatsEnable()
// ... create some query here
// Submit the query
query.Submit()
// Dump the statistics
tiledb.StatsDumpSTDOUT()
tiledb.StatsDisable()
// You can also reset the stats as follows
tiledb.StatsReset()
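
The enable, submit, dump, disable sequence can be wrapped so that statistics are always disabled again, even if the query fails. The following is a minimal Python sketch, not part of TileDB: `collect_stats` is a hypothetical helper, and it only assumes that the object passed in exposes the four `stats_*` functions used in the Python snippet above. Resetting the counters on entry is a choice made for this sketch.

```python
from contextlib import contextmanager


@contextmanager
def collect_stats(stats):
    """Enable statistics around a block of work, then dump and disable them.

    `stats` is any object exposing stats_enable(), stats_reset(),
    stats_dump() and stats_disable() (assumption: the `tiledb` module).
    """
    stats.stats_enable()
    stats.stats_reset()  # start from zeroed counters (a choice of this sketch)
    try:
        yield
        # Only reached if the body finished without raising.
        stats.stats_dump()
    finally:
        # Always stop gathering, even if the body or the dump raised.
        stats.stats_disable()
```

With TileDB-Py installed, this could be used as `with collect_stats(tiledb): data = A[:]`. The report is dumped only when the body succeeds, while statistics gathering is disabled in every case.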

With the dump call, a report containing the gathered statistics will be printed. The report prints values of many individual counters, followed by a high-level summary printed at the end of the report. Typically the summary contains the necessary information to make high-level performance tuning decisions. An example summary is shown below:

Summary:
--------
Hardware concurrency: 4
Reads:
Read query submits: 1
Tile cache hit ratio: 0 / 1 (0.0%)
Fixed-length tile data copy-to-read ratio: 4 / 1000 bytes (0.0%)
Var-length tile data copy-to-read ratio: 0 / 0 bytes
Total tile data copy-to-read ratio: 4 / 1000 bytes (0.0%)
Read compression ratio: 1245 / 1274 bytes (1.0x)
Writes:
Write query submits: 0
Tiles written: 0
Write compression ratio: 0 / 0 bytes
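
If the summary needs to be checked programmatically, for example in a regression test, the ratio lines can be pulled out of the dumped text. The sketch below is an illustration only: `parse_ratios` is a hypothetical helper, and it assumes the report keeps the `name: numerator / denominator` layout shown in the example summary above, which may differ between TileDB versions.

```python
import re

# Matches lines such as "Tile cache hit ratio: 0 / 1 (0.0%)".
RATIO_LINE = re.compile(r"^\s*(?P<name>[^:]+):\s*(?P<num>\d+)\s*/\s*(?P<den>\d+)")


def parse_ratios(summary):
    """Return {counter name: (numerator, denominator)} for every ratio line."""
    ratios = {}
    for line in summary.splitlines():
        match = RATIO_LINE.match(line)
        if match:  # plain counters and section headers do not match
            ratios[match["name"].strip()] = (int(match["num"]), int(match["den"]))
    return ratios


# A few lines copied from the example summary.
sample = """\
Reads:
Read query submits: 1
Tile cache hit ratio: 0 / 1 (0.0%)
Total tile data copy-to-read ratio: 4 / 1000 bytes (0.0%)
Read compression ratio: 1245 / 1274 bytes (1.0x)
"""
print(parse_ratios(sample))
```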

Each item is explained separately below.

Hardware concurrency

The amount of available hardware-level concurrency (cores plus hardware threads).

Read query submits

The number of times a read query submit call was made.

Tile cache hit ratio

Ratio expressing utilization of the tile cache. The denominator indicates how many tiles in total were requested to satisfy read queries; each requested tile is either served from the cache or fetched from disk. After a tile is fetched once, it is placed in the internal tile cache; the numerator indicates how many of the requested tiles were hits in the tile cache. In the above example summary, a single tile was fetched (which missed the cache because it was the first access to that tile). If the same tile were accessed again by a subsequent query, it could hit in the cache, increasing the ratio to 1/2. Higher ratios are better.
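
The way the ratio moves from 0/1 to 1/2 can be reproduced with a toy counter. This is a sketch of the bookkeeping only, under simplifying assumptions: `TileCacheCounter` is a hypothetical class, the cache here is unbounded, and it does not model the size limit or eviction behavior of the real tile cache.

```python
class TileCacheCounter:
    """Counts tile requests and how many of them were cache hits."""

    def __init__(self):
        self._cached = set()
        self.hits = 0
        self.requests = 0

    def request(self, tile_id):
        self.requests += 1
        if tile_id in self._cached:
            self.hits += 1  # served from the cache
        else:
            self._cached.add(tile_id)  # "fetched from disk", then cached

    def ratio(self):
        return f"{self.hits} / {self.requests}"


counter = TileCacheCounter()
counter.request("tile-0")  # first access misses
print(counter.ratio())     # 0 / 1
counter.request("tile-0")  # second access hits
print(counter.ratio())     # 1 / 2
```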

Fixed-length tile data copy-to-read ratio

Ratio expressing a measurement of wasted work for reads of fixed-length attributes. The denominator is the total number of (uncompressed) bytes of fixed-length attribute data fetched from disk. The numerator is the number of those bytes that were actually copied into the query buffers to satisfy a read query. In the above example, 1000 bytes of fixed-length data were read from disk and only 4 of those bytes were used to satisfy a read query, indicating a large amount of wasted work. Higher ratios are better.

Var-length tile data copy-to-read ratio

This ratio is the same as the previous, but for variable-length attribute data. This is tracked separately because variable-length attributes can often be read performance bottlenecks, as their size is by nature unpredictable. Higher ratios are better.

Total tile data copy-to-read ratio

The overall copy-to-read ratio, where the numerator is the sum of the fixed- and variable-length numerators, and the denominator is the sum of the fixed- and variable-length denominators.
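
As a worked example, the total ratio for the example summary can be recomputed from its fixed- and variable-length parts. The helper name `copy_to_read_fraction` is hypothetical, and returning `None` for an empty denominator is a choice made for this sketch.

```python
def copy_to_read_fraction(copied_bytes, read_bytes):
    """Fraction of fetched tile bytes that were copied into query buffers."""
    if read_bytes == 0:
        return None  # nothing was read, so the ratio is undefined
    return copied_bytes / read_bytes


# (copied, read) pairs taken from the example summary.
fixed = (4, 1000)
var = (0, 0)

# Numerators and denominators are summed separately.
total = (fixed[0] + var[0], fixed[1] + var[1])

fraction = copy_to_read_fraction(*total)
wasted = total[1] - total[0]
print(f"{total[0]} / {total[1]} bytes, fraction {fraction}, {wasted} bytes unused")
```

Here 996 of the 1000 fetched bytes were never copied to the query buffers, which is the wasted work the ratio is meant to expose.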

Read compression ratio

Ratio expressing the effective compression ratio of data read from disk. The numerator is the total number of bytes returned from disk reads after filtering. The denominator is the total number of bytes read from disk, whether filtered or not. This differs from the tile copy-to-read ratios because it also includes the extra read operations for array and fragment metadata. For simplicity, this ratio currently counts all filters as “compressors”, so it may not be exactly the compression ratio when filters other than compressors are involved.

Write query submits

The number of times a write query submit call was made.

Tiles written

The total number of tiles written, over all write queries.

Write compression ratio

Ratio expressing the effective compression ratio of data written to disk. The numerator is the total number of unfiltered bytes requested to be written to disk. The denominator is the total number of bytes written to disk, after filtering. Similarly to the read compression ratio, this value counts all filters as compressors.
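
The numerator and denominator can be illustrated with any compressor. The sketch below uses Python's standard `zlib` module as a stand-in for a compression filter; it does not use TileDB's filter pipeline, and the function names are hypothetical.

```python
import zlib


def write_compression_ratio(chunks, level=6):
    """Return (unfiltered bytes, filtered bytes) for a list of byte chunks."""
    unfiltered = sum(len(chunk) for chunk in chunks)
    # zlib stands in for the filter applied before data is written.
    filtered = sum(len(zlib.compress(chunk, level)) for chunk in chunks)
    return unfiltered, filtered


def format_ratio(unfiltered, filtered):
    """Format the pair in the style of the summary line."""
    if filtered == 0:
        return f"{unfiltered} / {filtered} bytes"
    return f"{unfiltered} / {filtered} bytes ({unfiltered / filtered:.1f}x)"


# Highly repetitive data compresses well, so the ratio is large.
unfiltered, filtered = write_compression_ratio([b"abcd" * 1000, b"\x00" * 4096])
print(format_ratio(unfiltered, filtered))
```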

The TileDB library is built by default with statistics enabled. You can disable statistics gathering with the -DTILEDB_STATS=OFF CMake variable.