Ingest Samples

Indexed files are required for ingestion. If your VCF/BCF files have not been indexed, you can use bcftools to do so:

for f in data/vcfs/*.vcf.gz; do bcftools index -c "$f"; done
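Before ingesting, it can help to confirm that every file actually has an index next to it. The following is a minimal sketch, not part of TileDB-VCF: the helper name missing_index is hypothetical, and it assumes that indexes sit beside the data files with a .csi suffix (as produced by bcftools index -c) or a .tbi suffix (as produced by tabix).

```python
from pathlib import Path


def missing_index(vcf_dir):
    """Return the *.vcf.gz and *.bcf files in vcf_dir that have no index beside them.

    Hypothetical helper: a file counts as indexed if "<file>.csi" or
    "<file>.tbi" exists in the same directory.
    """
    vcf_dir = Path(vcf_dir)
    candidates = sorted(list(vcf_dir.glob("*.vcf.gz")) + list(vcf_dir.glob("*.bcf")))
    return [
        path
        for path in candidates
        if not Path(str(path) + ".csi").exists()
        and not Path(str(path) + ".tbi").exists()
    ]
```

Calling missing_index("data/vcfs") returns the paths that still need indexing; an empty list suggests the directory is ready for ingestion.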

You can ingest samples into an already created dataset as follows:

import tiledbvcf

uri = "my_vcf_dataset"
ds = tiledbvcf.Dataset(uri, mode="w")
ds.ingest_samples(sample_uris=["sample_1", "sample_2"])

Incremental updates work the same way as the ingestion above; no special steps are needed. Ingestion is also thread- and process-safe, so it can be performed in parallel.
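As an illustration of incremental updates, the sketch below adds samples to an existing dataset in several passes, using only the Dataset(uri, mode="w") and ingest_samples(sample_uris=...) calls shown above. The helper names chunked and ingest_in_batches, the default batch size, and the choice to reopen the dataset for each batch are assumptions made for this example, not a prescribed workflow.

```python
def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def ingest_in_batches(dataset_uri, sample_uris, batch_size=10):
    """Ingest samples into an already created dataset, one batch at a time.

    Hypothetical helper: each batch is an ordinary ingestion call, which is
    all an incremental update requires.
    """
    # Imported here so the batching helper can be used without tiledbvcf installed.
    import tiledbvcf

    for batch in chunked(sample_uris, batch_size):
        ds = tiledbvcf.Dataset(dataset_uri, mode="w")
        ds.ingest_samples(sample_uris=batch)
```

For example, ingest_in_batches("my_vcf_dataset", ["sample_1", "sample_2", "sample_3"], batch_size=2) would run two ingestion passes. Because ingestion is process-safe, separate workers could each handle their own batch instead of running them one after another.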
