Create a Dataset

Before ingesting any VCF samples, you must create a dataset. Doing so creates a TileDB group that contains the appropriate empty arrays.

import tiledbvcf

uri = "my_vcf_dataset"
ds = tiledbvcf.Dataset(uri, mode="w")  # open the dataset in write mode
ds.create_dataset()                    # create the dataset; it stays in write mode

To store some of the INFO and FORMAT (FMT) fields as separate materialized attributes, pass their names in extra_attrs when creating the dataset. For a field named X, use info_X for an INFO field or fmt_X for a FORMAT field. Names are case sensitive.

import tiledbvcf

uri = "my_vcf_dataset"
ds = tiledbvcf.Dataset(uri, mode="w")
ds.create_dataset(extra_attrs=["info_AA"])
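When several fields are involved, it can help to build the attribute names programmatically so the info_X / fmt_X prefixes stay consistent. The following is a minimal sketch of that naming convention only. The helper extra_attr_names is hypothetical (it is not part of tiledbvcf), and the field names AA and DP are illustrative assumptions.

```python
def extra_attr_names(info_fields=(), fmt_fields=()):
    """Build attribute names following the info_X / fmt_X convention.

    Hypothetical helper, not part of tiledbvcf. Field names are used
    verbatim because attribute names are case sensitive.
    """
    names = [f"info_{field}" for field in info_fields]
    names += [f"fmt_{field}" for field in fmt_fields]
    return names


# Illustrative field names; substitute the fields present in your VCF files.
attrs = extra_attr_names(info_fields=["AA"], fmt_fields=["DP"])
print(attrs)  # ['info_AA', 'fmt_DP']
```

The resulting list can then be passed as the extra_attrs argument of create_dataset, in the same way as the literal list shown above.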
