After creating the array schema, the array can be created as follows:
This will materialize the array directory and related files (e.g., the array schema) to persistent storage. Depending on the array URI, this can be on your local disk, on a distributed filesystem such as Lustre or HDFS, on AWS S3, etc.
Creating an attribute requires a datatype and, optionally, a name (attribute names starting with `__` are reserved). In the example below we create an `int32` attribute called `attr`.
An attribute can also store a fixed number of values (of the same datatype) in a single cell, or a variable number of values. You can specify this as follows:
An attribute may also be nullable, which allows each cell to be designated as valid or null. This applies to both fixed-sized and var-sized attributes.
Note: nullable Python attributes should be used with the `from_pandas` API or Pandas series with a Pandas extension dtype (e.g. `StringDtype`).
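One way to picture nullability is as a second buffer that travels alongside the values: a validity bytemap with one byte per cell, where 1 marks a valid cell and 0 marks a null one. The sketch below is plain Python, not the TileDB API; the helper name `split_nullable` and the placeholder value are hypothetical and chosen only for illustration.

```python
def split_nullable(cells, placeholder=0):
    """Split cells (None = null) into a values list and a validity bytemap.

    validity[i] is 1 when cell i holds a value and 0 when it is null.
    The placeholder stored for null cells is arbitrary (an assumption).
    """
    values = [placeholder if cell is None else cell for cell in cells]
    validity = bytes(0 if cell is None else 1 for cell in cells)
    return values, validity


values, validity = split_nullable([7, None, 9])
print(values)          # [7, 0, 9]
print(list(validity))  # [1, 0, 1]
```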
Supported Attribute Datatypes:
Crossed data types are deprecated.
Attributes accept filters such as compressors. This is described in detail here.
There are situations where you may read "empty spaces" from TileDB arrays. For those empty spaces, a read query will return some default fill values for the selected attributes. You can set your own fill values for these cases as follows:
Setting the number of values per cell for an attribute (see above) resets the attribute's fill value to its default. Therefore, set the fill value only after deciding how many values the attribute will hold in each cell.
For fixed-sized attributes, the input fill value size should be equal to the cell size.
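The size rule above can be checked with simple arithmetic: the cell size is the datatype size multiplied by the number of values per cell. The following is a minimal sketch using Python's `struct` module rather than the TileDB API; `make_fill_value` and `cell_size` are hypothetical helpers, and little-endian packing is an assumption.

```python
import struct


def cell_size(fmt, values_per_cell):
    """Size in bytes of one cell holding `values_per_cell` values of type `fmt`."""
    return struct.calcsize("<" + fmt) * values_per_cell


def make_fill_value(fmt, value, values_per_cell):
    """Pack `value` repeated once per value in the cell (little-endian)."""
    return struct.pack("<" + fmt * values_per_cell, *([value] * values_per_cell))


# An int32 attribute ("i") holding two values per cell needs an 8-byte fill value.
fill = make_fill_value("i", -1, 2)
print(len(fill), cell_size("i", 2))  # 8 8
```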
Array attributes and dimensions accept compressors and other filters.
The following example shows how to create a filter list with a GZIP compressor and compression level 10.
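Independently of the TileDB API, a filter list can be pictured as an ordered pipeline: filters run in list order when data is written and in reverse order when it is read back. The sketch below models that idea in plain Python. The classes `ByteShuffleFilter`, `ZlibFilter` and `FilterList` are hypothetical stand-ins, and the level 9 used here is simply the highest level Python's `zlib` accepts.

```python
import struct
import zlib


class ByteShuffleFilter:
    """Toy byte shuffle: groups the i-th byte of every element together."""

    def __init__(self, element_size):
        self.element_size = element_size

    def forward(self, data):
        n = self.element_size
        return b"".join(data[i::n] for i in range(n))

    def reverse(self, data):
        n = self.element_size
        count = len(data) // n  # number of elements (length must divide evenly)
        out = bytearray(len(data))
        for i in range(n):
            out[i::n] = data[i * count:(i + 1) * count]
        return bytes(out)


class ZlibFilter:
    """Stand-in for a GZIP-style compressor with a compression level."""

    def __init__(self, level):
        self.level = level

    def forward(self, data):
        return zlib.compress(data, self.level)

    def reverse(self, data):
        return zlib.decompress(data)


class FilterList:
    """Applies filters in order on write and in reverse order on read."""

    def __init__(self, filters):
        self.filters = list(filters)

    def write(self, data):
        for f in self.filters:
            data = f.forward(data)
        return data

    def read(self, data):
        for f in reversed(self.filters):
            data = f.reverse(data)
        return data


filters = FilterList([ByteShuffleFilter(4), ZlibFilter(level=9)])
raw = struct.pack("<1000i", *range(1000))  # 1000 int32 values
stored = filters.write(raw)
assert filters.read(stored) == raw  # the round trip restores the input
print(len(raw), len(stored))
```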
Supported compressors:
The compressors are members of the `FilterType` enum and the options of the `FilterOption` enum.
TileDB supports some more filters:
Supported filters (beyond compressors):
The default tile chunk size used by TileDB is 64KB, which is the size of many common processor L1 caches. You can control the chunk size by changing the option on a filter list:
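As a rough model of what the chunk size means, the sketch below splits a tile of a given byte size into consecutive chunks of at most 64KB. It is plain Python, not the TileDB API; `chunk_ranges` is a hypothetical helper, and it ignores details such as aligning chunks to cell boundaries.

```python
def chunk_ranges(tile_size, chunk_size=64 * 1024):
    """Return (start, end) byte ranges that cover a tile of `tile_size` bytes."""
    return [
        (start, min(start + chunk_size, tile_size))
        for start in range(0, tile_size, chunk_size)
    ]


for start, end in chunk_ranges(150_000):
    print(start, end)
# 0 65536
# 65536 131072
# 131072 150000
```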
You can set a filter list on an attribute as follows:
You can set a filter list on an individual dimension as follows:
If you wish all the dimensions to have the same filter list, you can set it once as follows:
If you do not specify a filter for a dimension separately, the dimension will inherit the filters set to all dimensions collectively as shown above.
You can set filter lists for the offsets of variable-sized attributes or dimensions as follows:
The offset filters are applied to all variable-sized attributes and dimensions.
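To see what the offsets are, recall that variable-sized cells are stored as one contiguous data buffer plus an offsets buffer recording the byte position where each cell starts. The sketch below builds both buffers in plain Python; `pack_var_cells` is a hypothetical helper, and packing the offsets as little-endian 64-bit unsigned integers is an assumption made for illustration.

```python
import struct


def pack_var_cells(cells):
    """Concatenate variable-length cells and record each cell's start offset."""
    data = bytearray()
    offsets = []
    for cell in cells:
        offsets.append(len(data))  # byte position where this cell starts
        data.extend(cell)
    return bytes(data), offsets


data, offsets = pack_var_cells([b"a", b"bb", b"ccc"])
offsets_buffer = struct.pack(f"<{len(offsets)}Q", *offsets)

print(data)                 # b'abbccc'
print(offsets)              # [0, 1, 3]
print(len(offsets_buffer))  # 24
```

The offsets buffer is a sequence of increasing integers, which is why it can benefit from its own filter list, separate from the one applied to the data.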
To create an encrypted array, pass your secret key when creating the array:
After creating some dimensions, you can create the array domain as follows:
The order of the dimensions as added to the domain is important later when slicing subarrays. Remember to give priority to more selective dimensions, in order to maximize the pruning power during slicing.
When creating the domain, the dimension names must be unique.
When creating the array schema, the dimension and attribute names must be unique.
You can set the data tile capacity (applicable to sparse fragments), as follows:
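Conceptually, the capacity is the maximum number of cells grouped into one data tile of a sparse fragment. The sketch below shows that grouping in plain Python, assuming the cells are already sorted in the array's global order; `data_tiles` is a hypothetical helper, not a TileDB function.

```python
def data_tiles(sorted_cells, capacity):
    """Group already-sorted sparse cells into data tiles of at most `capacity` cells."""
    return [
        sorted_cells[i:i + capacity]
        for i in range(0, len(sorted_cells), capacity)
    ]


cells = [(1, 1), (1, 3), (2, 2), (3, 4), (4, 1)]
print(data_tiles(cells, 2))
# [[(1, 1), (1, 3)], [(2, 2), (3, 4)], [(4, 1)]]
```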
Sparse arrays may allow multiple cells with the same coordinates to exist (dense arrays do not allow duplicates). By default, duplicates are not allowed. You can specify that a sparse array allows duplicates as follows:
When duplicates are allowed, checking for duplicates and deduplication are disabled.
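The duplicate check mentioned above amounts to looking for coordinate tuples that appear more than once in a write. The following plain-Python sketch illustrates the idea; `duplicate_coords` and `check_write` are hypothetical helpers, and raising an error on duplicates is an assumption of this sketch rather than a statement about TileDB's behavior.

```python
from collections import Counter


def duplicate_coords(coords):
    """Return the sorted coordinate tuples that occur more than once."""
    counts = Counter(coords)
    return sorted(c for c, n in counts.items() if n > 1)


def check_write(coords, allows_duplicates=False):
    """Reject a batch with repeated coordinates unless duplicates are allowed."""
    if allows_duplicates:
        return  # the check is skipped entirely
    dups = duplicate_coords(coords)
    if dups:
        raise ValueError(f"duplicate coordinates: {dups}")


batch = [(1, 1), (2, 3), (1, 1), (4, 4)]
print(duplicate_coords(batch))               # [(1, 1)]
check_write(batch, allows_duplicates=True)   # passes: no check is performed
```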
You can check if the array schema is set properly as follows:
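Two of the rules stated on this page, unique dimension and attribute names and the reserved `__` prefix for attribute names, are easy to express as a pre-flight check. The sketch below is a hypothetical helper in plain Python; it is not the TileDB schema check, which may validate considerably more.

```python
def check_names(dim_names, attr_names):
    """Return a list of naming problems (an empty list means the names are fine)."""
    problems = []
    seen = set()
    for name in list(dim_names) + list(attr_names):
        if name in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(name)
    for name in attr_names:
        if name.startswith("__"):
            problems.append(f"reserved attribute name: {name!r}")
    return problems


print(check_names(["rows", "cols"], ["a"]))
# []
print(check_names(["rows", "cols"], ["a", "rows", "__x"]))
# ["duplicate name: 'rows'", "reserved attribute name: '__x'"]
```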
Creating a dimension requires specifying a name, the dimension datatype, the dimension domain and the space tile extent. Below you can see an example of creating an `int32` dimension called `dim` with domain `[1,4]` and tile extent `2`.
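To make the role of the tile extent concrete, the sketch below splits an inclusive integer domain into consecutive space tiles. It is plain Python rather than the TileDB API; `space_tiles` is a hypothetical helper, and clipping the last tile to the domain is a simplification for domains the extent does not divide evenly.

```python
def space_tiles(domain, extent):
    """Split an inclusive integer domain (lo, hi) into tiles of `extent` values."""
    lo, hi = domain
    tiles = []
    start = lo
    while start <= hi:
        end = min(start + extent - 1, hi)  # clip the last tile to the domain
        tiles.append((start, end))
        start += extent
    return tiles


# Domain [1,4] with tile extent 2 yields two space tiles along this dimension.
print(space_tiles((1, 4), 2))  # [(1, 2), (3, 4)]
```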
Supported Dimension Datatypes:
In C#, the supported datatypes are members of the `TileDB.CSharp.DataType` enum.
After creating the domain and the attributes, you can create the array schema as follows:
You can set the tile and cell order as follows. The tile order may be set to row-major or column-major; the cell order may be set to row-major, column-major, or Hilbert.
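The difference between row-major and column-major order is easiest to see by listing the cells of a small 2D block in each order. The sketch below does this in plain Python; `cells_in_order` is a hypothetical helper and the order names are labels chosen for this example.

```python
def cells_in_order(rows, cols, order):
    """List the (row, col) cells of a 2D block in the requested order."""
    if order == "row-major":
        # The column index varies fastest.
        return [(r, c) for r in rows for c in cols]
    if order == "col-major":
        # The row index varies fastest.
        return [(r, c) for c in cols for r in rows]
    raise ValueError(f"unsupported order: {order!r}")


rows, cols = range(1, 3), range(1, 3)
print(cells_in_order(rows, cols, "row-major"))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(cells_in_order(rows, cols, "col-major"))  # [(1, 1), (2, 1), (1, 2), (2, 2)]
```

The same two orderings apply at both levels: the tile order arranges whole tiles, and the cell order arranges cells within each tile.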
Dimensions accept filters such as compressors. This is described in detail in the filters documentation.
| Datatype | Description |
|---|---|
| `TILEDB_BLOB` | Opaque bytes. Note: the `TILEDB_BLOB` datatype does not support query conditions. |
| `TILEDB_INT8` | 8-bit integer |
| `TILEDB_UINT8` | 8-bit unsigned integer |
| `TILEDB_INT16` | 16-bit integer |
| `TILEDB_UINT16` | 16-bit unsigned integer |
| `TILEDB_INT32` | 32-bit integer |
| `TILEDB_UINT32` | 32-bit unsigned integer |
| `TILEDB_INT64` | 64-bit integer |
| `TILEDB_UINT64` | 64-bit unsigned integer |
| `TILEDB_FLOAT32` | 32-bit floating point |
| `TILEDB_FLOAT64` | 64-bit floating point |
| `TILEDB_DATETIME_YEAR` | Years |
| `TILEDB_DATETIME_MONTH` | Months |
| `TILEDB_DATETIME_WEEK` | Weeks |
| `TILEDB_DATETIME_DAY` | Days |
| `TILEDB_DATETIME_HR` | Hours |
| `TILEDB_DATETIME_MIN` | Minutes |
| `TILEDB_DATETIME_SEC` | Seconds |
| `TILEDB_DATETIME_MS` | Milliseconds |
| `TILEDB_DATETIME_US` | Microseconds |
| `TILEDB_DATETIME_NS` | Nanoseconds |
| `TILEDB_DATETIME_PS` | Picoseconds |
| `TILEDB_DATETIME_FS` | Femtoseconds |
| `TILEDB_DATETIME_AS` | Attoseconds |
| `TILEDB_CHAR` | Single character |
| `TILEDB_STRING_ASCII` | ASCII string |
| `TILEDB_STRING_UTF8` | UTF-8 string |
| `TILEDB_STRING_UTF16` | UTF-16 string |
| `TILEDB_STRING_UTF32` | UTF-32 string |
| `TILEDB_STRING_UCS2` | UCS-2 string |
| `TILEDB_STRING_UCS4` | UCS-4 string |
| `TILEDB_ANY` | (datatype, bytelength, value) |

| Datatype | Description | Internal TILEDB Datatype Mapping |
|---|---|---|
| `np.int8` | 8-bit integer | `TILEDB_INT8` |
| `np.uint8` | 8-bit unsigned integer | `TILEDB_UINT8` |
| `np.int16` | 16-bit integer | `TILEDB_INT16` |
| `np.uint16` | 16-bit unsigned integer | `TILEDB_UINT16` |
| `np.int32` | 32-bit integer | `TILEDB_INT32` |
| `np.uint32` | 32-bit unsigned integer | `TILEDB_UINT32` |
| `np.int64` | 64-bit integer | `TILEDB_INT64` |
| `np.uint64` | 64-bit unsigned integer | `TILEDB_UINT64` |
| `np.float32` | 32-bit floating point | `TILEDB_FLOAT32` |
| `np.float64` | 64-bit floating point | `TILEDB_FLOAT64` |
| `"ascii"` | ASCII | `TILEDB_STRING_ASCII` |
| `np.dtype('S1')` | Character | `TILEDB_CHAR` |
| `np.dtype('U1')` | Unicode UTF-8 | `TILEDB_STRING_UTF8` |
| `"datetime64[Y]"` | Years | `TILEDB_DATETIME_YEAR` |
| `"datetime64[M]"` | Months | `TILEDB_DATETIME_MONTH` |
| `"datetime64[W]"` | Weeks | `TILEDB_DATETIME_WEEK` |
| `"datetime64[D]"` | Days | `TILEDB_DATETIME_DAY` |
| `"datetime64[h]"` | Hours | `TILEDB_DATETIME_HR` |
| `"datetime64[m]"` | Minutes | `TILEDB_DATETIME_MIN` |
| `"datetime64[s]"` | Seconds | `TILEDB_DATETIME_SEC` |
| `"datetime64[ms]"` | Milliseconds | `TILEDB_DATETIME_MS` |
| `"datetime64[us]"` | Microseconds | `TILEDB_DATETIME_US` |
| `"datetime64[ns]"` | Nanoseconds | `TILEDB_DATETIME_NS` |
| `"datetime64[ps]"` | Picoseconds | `TILEDB_DATETIME_PS` |
| `"datetime64[fs]"` | Femtoseconds | `TILEDB_DATETIME_FS` |
| `"datetime64[as]"` | Attoseconds | `TILEDB_DATETIME_AS` |

| Datatype | Description |
|---|---|
| `"CHAR"` | Single character |
| `"INT8"` | 8-bit integer |
| `"UINT8"` | 8-bit unsigned integer |
| `"INT16"` | 16-bit integer |
| `"UINT16"` | 16-bit unsigned integer |
| `"INT32"` | 32-bit integer |
| `"UINT32"` | 32-bit unsigned integer |
| `"INT64"` | 64-bit integer |
| `"UINT64"` | 64-bit unsigned integer |
| `"FLOAT32"` | 32-bit floating point |
| `"FLOAT64"` | 64-bit floating point |
| `"DATETIME_YEAR"` | Years |
| `"DATETIME_MONTH"` | Months |
| `"DATETIME_WEEK"` | Weeks |
| `"DATETIME_DAY"` | Days |
| `"DATETIME_HR"` | Hours |
| `"DATETIME_MIN"` | Minutes |
| `"DATETIME_SEC"` | Seconds |
| `"DATETIME_MS"` | Milliseconds |
| `"DATETIME_US"` | Microseconds |
| `"DATETIME_NS"` | Nanoseconds |
| `"DATETIME_PS"` | Picoseconds |
| `"DATETIME_FS"` | Femtoseconds |
| `"DATETIME_AS"` | Attoseconds |
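| Compressor | Description | Option (type) |
|---|---|---|
| `TILEDB_FILTER_GZIP` | GZIP | `TILEDB_COMPRESSION_LEVEL` (int) |
| `TILEDB_FILTER_ZSTD` | Zstandard | `TILEDB_COMPRESSION_LEVEL` (int) |
| `TILEDB_FILTER_LZ4` | LZ4 | `TILEDB_COMPRESSION_LEVEL` (int) |
| `TILEDB_FILTER_RLE` | RLE | `TILEDB_COMPRESSION_LEVEL` (int) |
| `TILEDB_FILTER_BZIP2` | BZIP2 | `TILEDB_COMPRESSION_LEVEL` (int) |
| `TILEDB_FILTER_DOUBLE_DELTA` | Double Delta | None |

| Compressor | Description | Option (type) |
|---|---|---|
| `tiledb.GzipFilter()` | GZIP | `level` (np.int32) |
| `tiledb.ZstdFilter()` | Zstandard | `level` (np.int32) |
| `tiledb.LZ4Filter()` | LZ4 | `level` (np.int32) |
| `tiledb.RleFilter()` | Run-length Encoding | `level` (np.int32) |
| `tiledb.Bzip2Filter()` | BZIP2 | `level` (np.int32) |
| `tiledb.DoubleDeltaFilter()` | Double Delta | None |

| Compressor | Description | Option (type) |
|---|---|---|
| `"GZIP"` | GZIP | `"COMPRESSION_LEVEL"` (int32) |
| `"ZSTD"` | Zstandard | `"COMPRESSION_LEVEL"` (int32) |
| `"LZ4"` | LZ4 | `"COMPRESSION_LEVEL"` (int32) |
| `"RLE"` | Run-length Encoding | `"COMPRESSION_LEVEL"` (int32) |
| `"BZIP2"` | BZIP2 | `"COMPRESSION_LEVEL"` (int32) |
| `"DOUBLE_DELTA"` | Double Delta | None |

| Compressor | Description | Option (type) |
|---|---|---|
| `GzipFilter` | GZIP | `Level` (int) |
| `ZstdFilter` | Zstandard | `Level` (int) |
| `LZ4Filter` | LZ4 | `Level` (int) |
| `RleFilter` | Run-length encoding | `Level` (int) |
| `Bzip2Filter` | BZIP2 | `Level` (int) |
| `DoubleDeltaFilter` | Double delta | None |

| Compressor | Description | Option (type) |
|---|---|---|
| `Gzip` | GZIP | `CompressionLevel` (int) |
| `Zstandard` | Zstandard | `CompressionLevel` (int) |
| `Lz4` | LZ4 | `CompressionLevel` (int) |
| `RunLengthEncoding` | RLE | `CompressionLevel` (int) |
| `Bzip2` | BZIP2 | `CompressionLevel` (int) |
| `DoubleDelta` | Double Delta | None |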
| Filter | Description | Option |
|---|---|---|
| `TILEDB_BIT_WIDTH_REDUCTION` | Bit width reduction | `TILEDB_BIT_WIDTH_MAX_WINDOW` (uint32) |
| `TILEDB_BITSHUFFLE` | Bit shuffle | None |
| `TILEDB_BYTESHUFFLE` | Byte shuffle | None |
| `TILEDB_POSITIVE_DELTA` | Positive Delta | `TILEDB_POSITIVE_DELTA_MAX_WINDOW` (uint32) |

| Filter | Description | Option |
|---|---|---|
| `tiledb.BitWidthReductionFilter()` | Bit width reduction | `window` (np.uint32) |
| `tiledb.BitShuffleFilter()` | Bit shuffle | None |
| `tiledb.ByteShuffleFilter()` | Byte shuffle | None |
| `tiledb.PositiveDeltaFilter()` | Positive delta | `window` (np.uint32) |

| Filter | Description | Option |
|---|---|---|
| `BIT_WIDTH_REDUCTION` | Bit width reduction | `BIT_WIDTH_MAX_WINDOW` (int) |
| `BITSHUFFLE` | Bit shuffle | None |
| `BYTESHUFFLE` | Byte shuffle | None |
| `POSITIVE_DELTA` | Positive Delta | `POSITIVE_DELTA_MAX_WINDOW` (int) |

| Filter | Description | Option |
|---|---|---|
| `BitWidthReductionFilter` | Bit width reduction | `window` (int) |
| `BitShuffleFilter` | Bit shuffle | None |
| `ByteShuffleFilter` | Byte shuffle | None |
| `PositiveDeltaFilter` | Positive Delta | `window` (int) |

| Filter | Description | Option |
|---|---|---|
| `BitWidthReduction` | Bit width reduction | `BitWidthMaxWindow` (uint32) |
| `BitShuffle` | Bit shuffle | None |
| `ByteShuffle` | Byte shuffle | None |
| `PositiveDelta` | Positive Delta | `PositiveDeltaMaxWindow` (uint32) |
| Description | Array Type |
|---|---|
| Variable length string | Sparse |
| 8-bit integer | Dense & Sparse |
| 8-bit unsigned integer | Dense & Sparse |
| 16-bit integer | Dense & Sparse |
| 16-bit unsigned integer | Dense & Sparse |
| 32-bit integer | Dense & Sparse |
| 32-bit unsigned integer | Dense & Sparse |
| 64-bit integer | Dense & Sparse |
| 64-bit unsigned integer | Dense & Sparse |
| 32-bit floating point | Sparse |
| 64-bit floating point | Sparse |
| Years | Dense & Sparse |
| Months | Dense & Sparse |
| Weeks | Dense & Sparse |
| Days | Dense & Sparse |
| Hours | Dense & Sparse |
| Minutes | Dense & Sparse |
| Seconds | Dense & Sparse |
| Milliseconds | Dense & Sparse |
| Microseconds | Dense & Sparse |
| Nanoseconds | Dense & Sparse |
| Picoseconds | Dense & Sparse |
| Femtoseconds | Dense & Sparse |
| Attoseconds | Dense & Sparse |