TileDB supports fast, parallel aggregation of query results. Currently, results can only be aggregated over the whole returned dataset, which this page calls the default channel. To add aggregates to a query, first get the default channel. Count is a nullary aggregate, so no operation needs to be created for it; a constant count operation is provided that can be applied to the channel directly. For every other aggregate, create an operation on the desired column, then apply that operation to the default channel, specifying the output field name for the result. Finally, set buffers to receive the aggregate results using the regular buffer APIs on the query (see Basic Reading).
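The steps above can be sketched as a small, self-contained Python model. Every name in it (`Query`, `DefaultChannel`, `Operation`, `COUNT`, `create_operation`, `apply_aggregate`, `set_result_buffer`, `submit`) is hypothetical. It is not the TileDB API and only mirrors the sequence: get the default channel, create an operation on a column (or reuse the constant count operation), apply it under an output field name, then read the result from a buffer registered under that name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


# Hypothetical model of the aggregate workflow; NOT the TileDB API.
@dataclass
class Operation:
    """An aggregate bound to one column, or to no column (count)."""
    name: str
    column: Optional[str]
    reduce: Callable[[list], object]


# Count is nullary, so one constant operation can be reused for any query.
COUNT = Operation("count", None, len)


def create_operation(name: str, column: str) -> Operation:
    """Bind a column aggregate ("sum", "min" or "max") to a column."""
    reducers = {"sum": sum, "min": min, "max": max}
    return Operation(name, column, reducers[name])


@dataclass
class DefaultChannel:
    """Aggregates over every row the query returns."""
    aggregates: Dict[str, Operation] = field(default_factory=dict)

    def apply_aggregate(self, output_field: str, op: Operation) -> None:
        # The output field name is how the result buffer is addressed later.
        self.aggregates[output_field] = op


@dataclass
class Query:
    rows: List[dict]
    channel: DefaultChannel = field(default_factory=DefaultChannel)
    buffers: Dict[str, list] = field(default_factory=dict)

    def get_default_channel(self) -> DefaultChannel:
        return self.channel

    def set_result_buffer(self, name: str, buf: list) -> None:
        self.buffers[name] = buf

    def submit(self) -> None:
        for out_field, op in self.channel.aggregates.items():
            # Count only needs the rows; other operations read their column.
            values = self.rows if op.column is None else [r[op.column] for r in self.rows]
            self.buffers[out_field][0] = op.reduce(values)


query = Query(rows=[{"a": 3}, {"a": 5}, {"a": 1}])
channel = query.get_default_channel()
channel.apply_aggregate("Count", COUNT)
channel.apply_aggregate("Sum", create_operation("sum", "a"))
count_buf, sum_buf = [None], [None]
query.set_result_buffer("Count", count_buf)
query.set_result_buffer("Sum", sum_buf)
query.submit()
print(count_buf[0], sum_buf[0])  # prints "3 9"
```

Keeping count as a single shared constant reflects that it has no input column, while every other aggregate must be bound to a column before it can be applied.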
Ranges and query conditions can still be used to limit the rows that are aggregated. TileDB also allows reading data and computing aggregates in the same query: set buffers for the desired columns alongside the buffers for the aggregate results. In that case, the aggregate result is available only once the query reaches the completed state (see Incomplete Queries).
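How a filter, returned data, and incomplete submissions interact can be illustrated with a hypothetical pure-Python model. `read_with_sum` is an illustrative name, not a TileDB function. It assumes data arrives in fixed-size batches, one per submission, that a single sum is computed over the rows passing the condition, and (as a further assumption) that a sum over zero rows is reported as 0.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def read_with_sum(
    values: List[int],
    batch_size: int,
    condition: Callable[[int], bool] = lambda v: True,
) -> Iterator[Tuple[List[int], str, Optional[int]]]:
    """Yield (data batch, status, sum) for each hypothetical submission."""
    # The condition restricts both the returned data and the aggregated rows.
    matching = [v for v in values if condition(v)]
    if not matching:
        # Assumption: an empty result completes at once with a sum of 0.
        yield [], "COMPLETED", 0
        return
    running = 0
    for start in range(0, len(matching), batch_size):
        batch = matching[start:start + batch_size]
        running += sum(batch)  # partial result, kept internal
        done = start + batch_size >= len(matching)
        # The aggregate is exposed only on the final, completed submission.
        yield batch, ("COMPLETED" if done else "INCOMPLETE"), (running if done else None)


for batch, status, total in read_with_sum([4, 7, 1, 9, 2], batch_size=2,
                                          condition=lambda v: v > 1):
    print(batch, status, total)
# [4, 7] INCOMPLETE None
# [9, 2] COMPLETED 22
```

The value 1 is excluded by the condition, so it appears neither in the returned batches nor in the sum.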
Finally, the following tables list the supported aggregate operations, the input field types each accepts, their output types, and the name used to refer to each operation.
Aggregate operation | Input field type | Output type |
---|---|---|
Count | N/A | UINT64 |
Sum | Numeric fields | Signed fields: INT64; unsigned fields: UINT64; floating-point fields: FLOAT64 |
Min/Max | Numeric/string fields | Same as input type |
Null count | Nullable fields | UINT64 |
Mean | Numeric fields | FLOAT64 |

Aggregate operation | Operation name |
---|---|
Count | "count" |
Null count | "null_count" |
Sum | "sum" |
Min/Max | "min", "max" |
Mean | "mean" |
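The type rules in the table above can be expressed as a small lookup function. This is a hypothetical helper, not part of TileDB: the datatype names are plain strings modeled loosely on TileDB's datatype names, and the type sets are an assumed subset of the numeric and string types TileDB actually defines.

```python
from typing import Optional

# Assumed subset of datatype names; TileDB defines more types than these.
SIGNED = {"INT8", "INT16", "INT32", "INT64"}
UNSIGNED = {"UINT8", "UINT16", "UINT32", "UINT64"}
FLOATING = {"FLOAT32", "FLOAT64"}
NUMERIC = SIGNED | UNSIGNED | FLOATING
STRING = {"STRING_ASCII", "STRING_UTF8"}


def output_type(op: str, input_type: Optional[str] = None, nullable: bool = False) -> str:
    """Return the result type of aggregate `op` per the table above."""
    if op == "count":
        return "UINT64"  # nullary: no input field
    if op == "null_count":
        if nullable:
            return "UINT64"
    elif op == "sum":
        # Sums widen to the 64-bit type of the same kind.
        if input_type in SIGNED:
            return "INT64"
        if input_type in UNSIGNED:
            return "UINT64"
        if input_type in FLOATING:
            return "FLOAT64"
    elif op in ("min", "max"):
        if input_type in NUMERIC or input_type in STRING:
            return input_type  # same as input type
    elif op == "mean":
        if input_type in NUMERIC:
            return "FLOAT64"
    raise ValueError(f"{op!r} is not supported on input type {input_type!r}")


print(output_type("sum", "INT16"))   # INT64
print(output_type("mean", "UINT8"))  # FLOAT64
```

Widening sums to 64 bits reduces the risk of overflow when many small values are added, while min and max can never leave the input's range and so keep its type.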