Installation

TileDB-VCF is implemented as a high-performance C++ library (libtiledbvcf.so). On top of this library, we built several interfaces for ingesting VCF data into TileDB-VCF arrays and reading it back:

  • Command-line client (./tiledbvcf)

  • C API (#include <tiledbvcf/tiledbvcf.h>)

  • Python API (import tiledbvcf)

  • Spark API (spark.read.format("io.tiledb.vcf"))

TileDB-VCF uses TileDB to store and retrieve VCF data in a sparse 2D TileDB array (the data model is described in a subsequent section). The use of TileDB is largely hidden from users of TileDB-VCF, although some TileDB concepts carry over, such as columnar "attribute" buffers and variable-length data. Familiarity with TileDB is not necessary to use TileDB-VCF.

There are not yet any prepackaged versions of TileDB-VCF, so building from source is required. Currently only Linux and macOS systems are supported.

By default, building either high-level API (Python or Spark) also builds the TileDB-VCF library automatically and bundles the resulting shared libraries into the final package. However, the APIs can also be built separately from the library.

Dependencies

TileDB-VCF has the following build dependencies. Please ensure these are installed on your system before building TileDB-VCF:

  • CMake >= 3.3

  • C++ compiler supporting C++11 (such as gcc 4.9 or newer)

  • git

  • HTSlib 1.8
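As a quick pre-build check, the command-line tools in the list above can be probed from a shell. The following is a minimal sketch, not part of TileDB-VCF: the helper names check_tool and version_ge are hypothetical, the script assumes a sort that supports -V (version sort), and the compiler names c++, g++, and clang++ are only common defaults. It confirms that the tools are present and that CMake is recent enough; it does not verify C++11 support and does not look for HTSlib.

```shell
#!/bin/sh
# Sketch: report whether the build tools listed above are on PATH.

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes a sort(1) that supports -V (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Prints "found: NAME" or "missing: NAME"; fails when NAME is not on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

check_tool git || true

if check_tool cmake; then
  # The first line of `cmake --version` reads "cmake version X.Y.Z".
  cmake_version=$(cmake --version | awk 'NR==1 {print $3}')
  if version_ge "$cmake_version" 3.3; then
    echo "cmake $cmake_version satisfies >= 3.3"
  else
    echo "cmake $cmake_version is older than 3.3"
  fi
fi

# Any one C++ compiler is enough; these names are common defaults (assumption).
check_tool c++ || check_tool g++ || check_tool clang++ || true
```

Each tool is reported on its own line, so a "missing" line points directly at the package that still needs to be installed.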

If HTSlib is not installed on your system, TileDB-VCF will download and build a local copy automatically. For this to work, the following dependencies of HTSlib must be installed beforehand:

macOS:

  brew install autoconf

Ubuntu/Debian:

  sudo apt install autoconf automake zlib1g-dev libbz2-dev liblzma-dev
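
For setup scripts that run on both supported platforms, the choice between the two commands above can be made from the output of uname. This is a minimal sketch under stated assumptions: the function name htslib_deps_command is hypothetical, and every Linux system is assumed to be Debian-based with apt available. It only prints the matching command and does not run it.

```shell
#!/bin/sh
# Sketch: print the HTSlib dependency install command for a platform name.
# $1 is a kernel name as reported by `uname -s`.
htslib_deps_command() {
  case "$1" in
    Darwin)
      echo "brew install autoconf"
      ;;
    Linux)
      # Assumption: a Debian-based distribution that provides apt.
      echo "sudo apt install autoconf automake zlib1g-dev libbz2-dev liblzma-dev"
      ;;
    *)
      echo "unsupported platform: $1" >&2
      return 1
      ;;
  esac
}

# Show the command for the current machine without executing it.
htslib_deps_command "$(uname -s)" || true
```

Printing rather than executing keeps the step reviewable, since the apt command requires sudo.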