Tile Filters

TileDB allows the user to specify an ordered list of data transformations, such as compression, that can be applied to data tiles before they are written to disk. The user may define different filters for each attribute, the coordinates and the data tile offsets.

Available filters

TileDB supports a number of filters, and more will continue to be added in the future.

Compression filters

There are several filters performing generic compression, which are the following:

  • GZIP: Compresses with Gzip

  • ZSTD: Compresses with Zstandard

  • LZ4: Compresses with LZ4

  • RLE: Compresses with run-length encoding

  • BZIP2: Compresses with Bzip2

  • DOUBLE_DELTA: Compresses with double-delta encoding


This filter performs byte shuffling of data as a way to improve compression ratios. The byte shuffle implementation used by TileDB comes from the Blosc project.

The byte shuffling process rearranges the bytes of the input attribute cell values in a deterministic and reversible manner designed to result in long runs of similar bytes that can be compressed more effectively by a generic compressor than the original unshuffled elements. Typically this filter is not used on its own, but rather immediately followed by a compression filter in a filter list.

For example, consider three 32-bit unsigned integer values 1, 2, 3, which have the following little-endian representation when stored adjacent in memory:

0x01 0x00 0x00 0x00 0x02 0x00 0x00 0x00 0x03 0x00 0x00 0x00

The byte shuffle operation will rearrange the bytes of these integer elements in memory such that the resulting array of bytes will contain each element’s first byte, followed by each element’s second byte, etc. After shuffling the bytes would therefore be:

0x01 0x02 0x03 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00

Observe the longer run of zero-valued bytes, which will compress more efficiently.


This filter performs bit shuffling of data as a way to improve compression ratios. The bitshuffle implementation used in TileDB comes from https://github.com/kiyo-masui/bitshuffle.

Bitshuffling is conceptually very similar to byteshuffling, but operates on the bit granularity rather than the byte granularity. Shuffling at the bit level can increase compression ratios even further than the byteshuffle filter, at the cost of increased computation to perform the shuffle.

Typically this filter is not used on its own, but rather it is immediately followed by a compression filter in a filter list.

Positive-delta encoding

This filter performs positive-delta encoding. Positive-delta encoding is a form of delta encoding that only works when the delta value is positive. Positive-delta encoding can result in better compression ratios on the encoded data. Typically this filter is not used on its own, but rather immediately followed by a compression filter in a filter list.

For example, if the data being filtered was the sequence of integers 100, 104, 108, 112, ..., then the resulting positive-encoded data would be 0, 4, 4, 4, .... This encoding is advantageous in that producing long runs of repeated values can result in better compression ratios, if a compression filter is added after positive-delta encoding.

The filter operates on a “window” of values at a time, which can help in some cases to produce longer runs of repeated delta values.

Positive-delta encoding is particularly useful for the offsets of variable-length attribute data, which by definition will always have positive deltas. The above example of the form 100, 104, 108, 112 can easily arise in the offsets, if for example you have a variable-length attribute of 4-byte values with mostly single values per cell instead of a variable number.

Bit width reduction

This filter performs bit-width reduction, which is a form of compression.

Bit-width reduction examines a window of attribute values, and determines if all of the values in the window can be represented by a datatype of smaller byte size. If so, the values in the window are rewritten as values of the smaller datatype, potentially saving several bytes per cell.

For example, consider an attribute with datatype uint64. Initially, each cell of data for that attribute requires 8 bytes of storage. However, if you know that the actual value of the attribute is often 255 or less, those cells can be stored using just a single byte in the form of a uint8, saving 7 bytes of storage per cell. The bit-width reduction filter performs this analysis and compression automatically over windows of attribute data.

Additionally, each cell value in a window is treated relative to the minimum value in that window. For example, if the window size was 3 cells, which had the values 300, 350, 400, the bit-width reduction filter would first determine that the minimum value in the window was 300, and the relative cell values were 0, 50, 100. These relative values are now less than 255 and can be represented by a uint8 value.

If possible, it can be a good idea to apply positive-delta encoding before bit-width reduction, as the positive-delta encoding may further increase the opportunities to store windows of data with a narrower datatype.

Bit-width reduction only works on integral datatypes.

Tile chunks

Before filtering each data tile of an attribute, TileDB internally divides the tile into disjoint chunks. These chunks are then filtered individually.

Chunking tiles before filtering allows for better cache behavior in terms of temporal locality, as the chunk size can be chosen to fit within the L1 cache of your processor cores. This helps especially with multi-stage filter lists, as the output from the previous filter is likely to still be in L1 when used as input for the current filter.

Chunking tiles also increases the amount of parallel compute that TileDB can make effective use of. By breaking a tile into individual chunks, each chunk can then be filtered in parallel, which can result in excellent CPU utilization when combined with the cache-friendly size of the chunks.

The default chunk size used by TileDB is 64KB, which is the size of many common processor L1 caches.