Links

CLI

Available Commands

  1. 1.
    createa new dataset
  2. 2.
    storespecified VCF files in a dataset
  3. 3.
    exportdata from a dataset
  4. 4.
    listall sample names present in a dataset
  5. 5.
    statprints high-level statistics about a dataset
  6. 6.
    utils utility functions for dataset

Create

Create an empty TileDB-VCF dataset.

Usage

tiledbvcf create -u <uri> [-a <fields>] [-c <N>] [-g <N>] [--tiledb-config <params>] [--checksum <checksum>] [-n]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
-a,--attributes
Info or format field names (comma-delimited) to store as separate attributes. Names should be fmt_X or info_X for a field name X (case sensitive).
-c,--tile-capacity
Tile capacity to use for the array schema [default 10000].
-g,--anchor-gap
Anchor gap size to use [default 1000].
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.
--checksum
Checksum to use for dataset validation on read and writes [default "sha256"].
-n,--no-duplicates
Do not allow records with duplicate end positions to be written to the array.

Store

Ingests registered samples into a TileDB-VCF dataset.

Usage

tiledbvcf store -u <uri> [-t <N>] [-p <MB>] [-d <path>] [-s <MB>] [-n <N>] [-k <N>] [-b <MB>] [-v] [--remove-sample-file] [--tiledb-config <params>] ([-f <path>] | <paths>...) [-e <N>] [--stats] [--stats-vcf-header-array]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
-t,--threads
Number of threads [default 16].
-p,--s3-part-size
[S3 only] Part size to use for writes (MB) [default 50].
-d,--scratch-dir
Directory used for local storage of downloaded remote samples.
-s,--scratch-mb
Amount of local storage (in MB) allocated for downloading remote VCF files prior to ingestion [default 0]. The you must configure enough scratch space to hold at least 20 samples. In general, you need 2 × the sample dimension's sample_bactch_size (which by default is 10). You can read more about the data model here.
-n, --max-record-buff
Max number of BCF records to buffer per file [default 50000].
-k, --thread-task-size
Max length (# columns) of an ingestion task. Affects load balancing of ingestion work across threads, and total memory consumption [default 5000000].
-b, --mem-budget-mb
The total memory budget (MB) used when submitting TileDB queries [default 1024].
-v, --verbose
Enable verbose output.
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.
-f, --samples-file
File with 1 VCF path to be ingested per line. The format can also include an explicit index path on each line, in the format <vcf-uri><TAB><index-uri>.
--remove-sample-file
If specified, the samples file (-f argument) is deleted after successful ingestion
-e, --sample-batch-size
Number of samples per batch for ingestion [default 10].
--stats
Enable TileDB stats
--stats-vcf-header-array
Enable TileDB stats for vcf header array usage.
--resume
Resume incomplete ingestion of sample batch.

Export

Exports data from a TileDB-VCF dataset.

Usage

Bash
tiledbvcf export -u <uri> [-O <format>] [-o <path>] [-t <fields>] ([-r <regions>] | [-R <path>]) [--sorted] [-n <N>] [-d <path>] [--sample-partition <I:N>] [--region-partition <I:N>] [--upload-dir <path>] [--tiledb-config <params>] [-v] [-c] [-b <MB>] ([-f <path>] | [-s <samples>]) [--stats] [--stats-vcf-header-array]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
-O,--output-format
Export format. Options are: b: bcf (compressed); u: bcf; z: vcf.gz; v: vcf; t: TSV. [default b] .
-o,--output-path
[TSV export only] The name of the output TSV file.
-t,--tsv-fields
[TSV export only] An ordered CSV list of fields to export in the TSV. A field name can be one of SAMPLE, ID, REF, ALT, QUAL, POS, CHR, FILTER. Additionally, INFO fields can be specified by I and FMT fields with S. To export the intersecting query region for each row in the output, use the field names Q:POS, Q:END, or Q:LINE.
-r,--regions
CSV list of regions to export in the format chr:min-max.
-R,--regions-file
File containing regions (BED format).
--sorted
Do not sort regions or regions file if they are pre-sorted.
-n,--limit
Only export the first N intersecting records.
-d,--output-dir
Directory used for local output of exported samples.
--sample-partition
Partitions the list of samples to be exported and causes this export to export only a specific partition of them. Specify in the format I:N where I is the partition index and N is the total number of partitions. Useful for batch exports.
--region-partition
Partitions the list of regions to be exported and causes this export to export only a specific partition of them. Specify in the format I:N where I is the partition index and N is the total number of partitions. Useful for batch exports.
--upload-dir
If set, all output file(s) from the export process will be copied to the given directory (or S3 prefix) upon completion.
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.
-v,--verbose
Enable verbose output.
-c,--count-only
Don't write output files, only print the count of the resulting number of intersecting records.
-b,--mem-budget-mb
The memory budget (MB) used when submitting TileDB queries [default 2048].
--mem-budget-buffer-percentage
The percentage of the memory budget to use for TileDB query buffers [default 25].
--mem-budget-tile-cache-percentage
The percentage of the memory budget to use for TileDB tile cache [default 10].
-f,--samples-file
File with 1 VCF path to be registered per line. The format can also include an explicit index path on each line, in the format in the format <vcf-uri><TAB><index-uri>.
--stats
Enable TileDB stats
--stats-vcf-header-array
Enable TileDB stats for vcf header array usage.
--disable-check-samples
Disable validating that samples passed exist in dataset before executing query and error if any sample requested is not in the dataset.
--disable-progress-estimation
Disable progress estimation in verbose mode. Progress estimation can sometimes cause a performance impact.

List

Lists all sample names present in a TileDB-VCF dataset.

Usage

tiledbvcf list -u <uri> [--tiledb-config ]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.

Stat

Prints high-level statistics about a TileDB-VCF dataset.

Usage

tiledbvcf stat -u <uri> [--tiledb-config ]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.

Utils

Utils for working with a TileDB-VCF dataset, such for consolidating and vacuuming fragments or fragment metadata.

Usage

tiledbvcf utils (consolidate|vacuum) (fragment_meta|fragments) -u <uri> [--tiledb-config ]

Options

Flag
Description
-u,--uri
TileDB dataset URI.
--tiledb-config
CSV string of the format 'param1=val1,param2=val2...' specifying optional TileDB configuration parameter settings.
Last modified 1yr ago