Using Performance Statistics

Much of the performance optimization for TileDB programs comes down to minimizing wasted work. TileDB includes an internal statistics reporting system that can help you identify potential areas for performance improvement, including sources of wasted work.

TileDB statistics can be enabled and disabled at runtime, and a report can be dumped at any point. A typical workflow is to enable statistics immediately before submitting a query, submit the query, and then dump the report right away. For example:

C
tiledb_stats_enable();
// ... create some query here
tiledb_query_submit(ctx, query);
// Dump the statistics
tiledb_stats_dump(stdout);
tiledb_stats_disable();
// You can also reset the stats as follows
tiledb_stats_reset();
C++
tiledb::Stats::enable();
// ... create some query here
// Submit the query
query.submit();
// Dump the statistics
tiledb::Stats::dump(stdout);
tiledb::Stats::disable();
// You can also reset the stats as follows
tiledb::Stats::reset();
Python
tiledb.stats_enable()
# Do some work
data = A[:]
# Dump the statistics
tiledb.stats_dump()
tiledb.stats_disable()
# You can also reset the stats as follows
tiledb.stats_reset()
R
# Start collecting statistics
tiledb_stats_enable()
# ... create some query here
A[1:4]
# Stop collecting statistics
tiledb_stats_disable()
# Show the statistics on the console
tiledb_stats_print()
# Save the statistics to a file
tiledb_stats_dump(my_file_name)
# You can also reset the stats as follows
tiledb_stats_reset()
Java
Stats.enable();
// ... create some query here
query.submit();
// Dump the statistics
Stats.dump();
Stats.disable();
// You can also reset the stats as follows
Stats.reset();
Go
tiledb.StatsEnable()
// ... create some query here
// Submit the query
query.Submit()
// Dump the statistics
tiledb.StatsDumpSTDOUT()
tiledb.StatsDisable()
// You can also reset the stats as follows
tiledb.StatsReset()
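
The enable, work, dump, disable sequence shown above can be wrapped so that it is harder to forget the final steps. The following is a minimal Python sketch, not part of the TileDB API: `collect_stats` is a hypothetical helper, and it only assumes that the object passed in exposes `stats_enable()`, `stats_reset()`, `stats_dump()` and `stats_disable()`, as the `tiledb` module does in the Python example above.

```python
from contextlib import contextmanager


@contextmanager
def collect_stats(stats_api, reset_first=True):
    """Collect statistics for the enclosed block, then dump and disable.

    `stats_api` is any object exposing stats_enable(), stats_reset(),
    stats_dump() and stats_disable(). Passing the `tiledb` module itself
    is the intended use (an assumption of this sketch).
    """
    stats_api.stats_enable()
    if reset_first:
        # Clear counters left over from earlier work so the report
        # covers only the enclosed block.
        stats_api.stats_reset()
    try:
        yield
    finally:
        # Dump before disabling, mirroring the order used above.
        stats_api.stats_dump()
        stats_api.stats_disable()
```

With `tiledb` imported and an array `A` already open, usage would look like `with collect_stats(tiledb): data = A[:]`. The report is dumped even if the enclosed block raises, which can be useful when diagnosing a failing query.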

The dump call prints a report containing the gathered statistics. The report lists the values of many individual counters, but the summary typically contains enough information to make high-level performance tuning decisions. An example summary is shown below:

==== READ ====
- Number of read queries: 1
- Number of attempts until results are found: 1
- Number of attributes read: 17
  * Number of fixed-sized attributes read: 17
- Number of logical tiles overlapping the query: 65
- Number of physical tiles read: 1105
  * Number of physical fixed-sized tiles read: 1105
- Number of cells read: 6500000
- Number of result cells: 6405008
- Percentage of useful cells read: 98.5386%
...
- Read time: 1.37549 secs
  * Time to compute next partition: 0.000364561 secs
  * Time to compute tile coordinates: 0.000513924 secs
  * Time to compute result coordinates: 2.613e-06 secs
    > Time to compute sparse result tiles: 1.676e-06 secs
  * Time to compute dense result cell slabs: 0.00252568 secs
  * Time to copy result attribute values: 1.37141 secs
    > Time to read attribute tiles: 0.220855 secs
    > Time to unfilter attribute tiles: 0.369202 secs
    > Time to copy fixed-sized attribute values: 0.693668 secs
- Total read query time (array open + init state + read): 1.37555 secs
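
To show how the summary figures relate to each other, here is a small Python sketch that parses a few lines copied from the example above and recomputes some derived values. The parser is illustrative only: `parse_summary` is a hypothetical helper, and it assumes the `marker name: value` line layout seen in this example, which may differ between TileDB versions.

```python
import re

# A few lines copied from the example summary above.
SUMMARY = """\
- Number of attributes read: 17
- Number of logical tiles overlapping the query: 65
- Number of physical tiles read: 1105
- Number of cells read: 6500000
- Number of result cells: 6405008
- Read time: 1.37549 secs
> Time to read attribute tiles: 0.220855 secs
> Time to unfilter attribute tiles: 0.369202 secs
> Time to copy fixed-sized attribute values: 0.693668 secs
"""

# Optional indentation, a bullet marker (-, * or >), a name, a colon,
# then a number. Trailing units such as "secs" or "%" are ignored.
LINE = re.compile(r"^\s*[-*>]\s*(?P<name>[^:]+):\s*(?P<value>[-+0-9.eE]+)")


def parse_summary(text):
    """Map each counter name to its numeric value (units are dropped)."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            stats[match.group("name").strip()] = float(match.group("value"))
    return stats


stats = parse_summary(SUMMARY)

# Share of the cells read that ended up in the result.
useful_pct = 100.0 * stats["Number of result cells"] / stats["Number of cells read"]
print(f"Useful cells: {useful_pct:.4f}%")

# Physical tiles per attribute; in this example it equals the logical tile count.
tiles_per_attr = stats["Number of physical tiles read"] / stats["Number of attributes read"]
print(f"Physical tiles per attribute: {tiles_per_attr:.0f}")

# How the read time splits across the three copy sub-steps.
read_time = stats["Read time"]
for name in (
    "Time to read attribute tiles",
    "Time to unfilter attribute tiles",
    "Time to copy fixed-sized attribute values",
):
    print(f"{name}: {100.0 * stats[name] / read_time:.1f}% of read time")
```

In this example the recomputed percentage of useful cells matches the reported 98.5386%, and the 1105 physical tiles are consistent with 65 logical tiles for each of the 17 attributes. A low percentage of useful cells would indicate that the query reads many cells it does not need, which is the kind of wasted work the statistics are meant to expose.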

The TileDB library is built with statistics enabled by default. You can disable statistics gathering by passing -DTILEDB_STATS=OFF to CMake when building the library.