TileDB-VCF is implemented as a high-performance library (libtiledbvcf.so) written in C++. On top of this library, we built several interfaces to ingest and read VCF data from TileDB-VCF arrays:
Command-line client (
C API (
Python API (
Spark API (
TileDB-VCF uses TileDB underneath to store and retrieve VCF data in a sparse 2D TileDB array (the data model is described in a subsequent section). The use of TileDB is largely hidden from TileDB-VCF users, although many concepts carry over from TileDB to TileDB-VCF, such as columnar "attribute" buffers and variable-length data. Familiarity with TileDB itself is not necessary to use TileDB-VCF.
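To make the idea of columnar "attribute" buffers and variable-length data more concrete, here is a minimal sketch in plain Python. It does not use the TileDB or TileDB-VCF APIs. The record values and the helper names `pack_var_length` and `unpack_var_length` are hypothetical and chosen only for illustration. The sketch assumes the common columnar layout in which a variable-length attribute is stored as one contiguous data buffer plus a buffer of start offsets.

```python
from typing import List, Tuple


def pack_var_length(values: List[str]) -> Tuple[bytes, List[int]]:
    """Pack strings into one contiguous data buffer plus per-value start offsets."""
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))          # where this value starts in the buffer
        data.extend(value.encode("ascii"))
    return bytes(data), offsets


def unpack_var_length(data: bytes, offsets: List[int]) -> List[str]:
    """Recover the original strings from the data buffer and its offsets."""
    ends = offsets[1:] + [len(data)]       # each value ends where the next begins
    return [data[s:e].decode("ascii") for s, e in zip(offsets, ends)]


# Hypothetical records, one entry per cell (illustrative values only).
start_positions = [13354, 13354, 13375]                          # fixed-length attribute
alleles = ["T,<NON_REF>", "T,<NON_REF>", "G,<NON_REF>"]          # variable-length attribute

allele_data, allele_offsets = pack_var_length(alleles)
```

Each attribute lives in its own buffer, so a reader that only needs `start_positions` never has to touch the allele bytes. The offsets buffer is what lets values of different lengths share a single flat data buffer.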
There are not yet any prepackaged versions of TileDB-VCF, so building from source is required. Currently only Linux and macOS systems are supported.
By default, the high-level APIs (Python and Spark) build TileDB-VCF automatically and bundle the resulting shared libraries into the final packages. However, the APIs can also be built separately.
TileDB-VCF has the following build dependencies. Please ensure these are installed on your system before building TileDB-VCF:
CMake >= 3.3
C++ compiler supporting C++11 (such as gcc 4.9 or newer)
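As a convenience, the dependency list above can be checked with a short script before starting a build. The following is a minimal sketch, not part of TileDB-VCF. The function names are hypothetical, and the compiler check only looks for a `c++`, `g++`, or `clang++` executable on the `PATH`; it does not verify C++11 support.

```python
import re
import shutil
import subprocess
from typing import Dict, Optional, Tuple

MIN_CMAKE = (3, 3)  # minimum CMake version from the dependency list


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number found in text, or None."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def cmake_is_new_enough(version_output: str) -> bool:
    """Compare the major.minor version in `cmake --version` output to MIN_CMAKE."""
    version = parse_version(version_output)
    return version is not None and version[:2] >= MIN_CMAKE


def check_build_tools() -> Dict[str, bool]:
    """Report whether CMake and a C++ compiler appear to be available."""
    results = {}
    cmake = shutil.which("cmake")
    if cmake is None:
        results["cmake"] = False
    else:
        out = subprocess.run(
            [cmake, "--version"], capture_output=True, text=True
        ).stdout
        results["cmake"] = cmake_is_new_enough(out)
    # Presence check only; this does not test for C++11 support.
    results["c++ compiler"] = any(
        shutil.which(name) for name in ("c++", "g++", "clang++")
    )
    return results


if __name__ == "__main__":
    for tool, ok in check_build_tools().items():
        print(f"{tool}: {'ok' if ok else 'missing or too old'}")
```

Keeping the version parsing separate from the subprocess call makes the comparison logic easy to test without CMake installed.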
If HTSlib is not installed on your system, TileDB-VCF will download and build a local copy automatically. For this to work, the following dependencies of HTSlib must be installed beforehand:
macOS (Homebrew):
brew install autoconf
Linux (Debian/Ubuntu, apt):
sudo apt install autoconf automake zlib1g-dev libbz2-dev liblzma-dev