After creating some dimensions, you can create the array domain as follows:
The order of the dimensions as added to the domain is important later when slicing subarrays. Remember to give priority to more selective dimensions, in order to maximize the pruning power during slicing.
When creating the domain, the dimension names must be unique.
Creating an attribute requires specifying a name (optional; attribute names starting with __ are reserved) and a datatype (required). In the example below we create an int32 attribute called attr.
An attribute can also store a fixed number of values (of the same datatype) in a single cell, or a variable number of values. You can specify this as follows:
An attribute may also be nullable, which allows each cell to be designated as valid or null. Nullability applies to both fixed-sized and var-sized attributes.
Note: nullable Python attributes should be used with the from_pandas API or with a Pandas series that has a Pandas extension dtype (e.g. StringDtype).
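To make the idea of per-cell validity concrete, here is a minimal plain-Python sketch. It does not use the TileDB API; the function name `split_nullable` and the choice of 1 for valid and 0 for null are assumptions made for illustration only.

```python
def split_nullable(values, placeholder=0):
    """Split a list that may contain None into two parallel lists:
    the stored data and a validity map (1 = valid, 0 = null).

    `placeholder` is a hypothetical value stored in the data slot of a
    null cell; readers are expected to ignore it when validity is 0.
    """
    data = [placeholder if v is None else v for v in values]
    validity = [0 if v is None else 1 for v in values]
    return data, validity


data, validity = split_nullable([1, None, 3])
print(data)      # [1, 0, 3]
print(validity)  # [1, 0, 1]
```

The point of the sketch is that null is tracked alongside the data rather than encoded as a special data value, so any value of the datatype remains usable.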
Supported Attribute Datatypes:
Crossed data types are deprecated.
Attributes accept filters such as compressors. This is described in detail here.
There are situations where you may read "empty spaces" from TileDB arrays. For those empty spaces, a read query will return some default fill values for the selected attributes. You can set your own fill values for these cases as follows:
Setting the number of values per cell for an attribute (see above) resets the fill value of the attribute to its default. Therefore, make sure you set the fill value after deciding on the number of values this attribute will hold in each cell.
For fixed-sized attributes, the input fill value size should be equal to the cell size.
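The size rule for fixed-sized attributes can be checked with simple arithmetic. The sketch below is plain Python using the standard `struct` module, not the TileDB API; `cell_size` and `pack_fill_value` are hypothetical helper names, and the struct format characters (`"i"` for int32, `"d"` for float64) are this example's way of naming a datatype.

```python
import struct


def cell_size(type_format, values_per_cell):
    """Size in bytes of one fixed-sized cell.

    type_format is a struct format character ("i" = int32, "d" = float64);
    "<" selects standard sizes so the result does not depend on the platform.
    """
    return struct.calcsize("<" + type_format) * values_per_cell


def pack_fill_value(type_format, values_per_cell, fill):
    """Pack a fill value and verify that it covers exactly one cell."""
    if len(fill) != values_per_cell:
        raise ValueError(
            f"fill value has {len(fill)} values, cell holds {values_per_cell}"
        )
    packed = struct.pack("<" + type_format * values_per_cell, *fill)
    # The packed fill value must be exactly as large as one cell.
    assert len(packed) == cell_size(type_format, values_per_cell)
    return packed


# An int32 attribute holding 2 values per cell has an 8-byte cell,
# so its fill value must also be 8 bytes (two int32 values).
print(cell_size("i", 2))                      # 8
print(len(pack_fill_value("i", 2, [0, 1])))   # 8
```

This also shows why the order of operations matters: the required fill value size depends on the number of values per cell, so that number has to be settled first.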
Creating a dimension requires specifying a name, the dimension datatype, the dimension domain and the space tile extent. Below you can see an example of creating an int32 dimension called dim with domain [1,4] and tile extent 2.
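To illustrate how the domain and the tile extent relate, the following plain-Python sketch lists the space tiles for the example values above (domain [1,4], extent 2). It is not the TileDB API; `space_tiles` and `tile_index` are hypothetical helpers for integer dimensions only.

```python
def space_tiles(lo, hi, extent):
    """Return the inclusive (start, end) ranges of the space tiles that
    cover the integer domain [lo, hi] with the given tile extent.

    If the extent does not divide the domain evenly, this sketch simply
    lets the last tile run past `hi`; how a real engine treats that case
    is outside the scope of the example.
    """
    if extent <= 0:
        raise ValueError("tile extent must be positive")
    if hi < lo:
        raise ValueError("domain upper bound is below the lower bound")
    tiles = []
    start = lo
    while start <= hi:
        tiles.append((start, start + extent - 1))
        start += extent
    return tiles


def tile_index(coord, lo, extent):
    """Index of the space tile that contains coordinate `coord`."""
    return (coord - lo) // extent


print(space_tiles(1, 4, 2))   # [(1, 2), (3, 4)]
print(tile_index(3, 1, 2))    # 1
```

With domain [1,4] and extent 2, the dimension is split into two space tiles, [1,2] and [3,4].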
Supported Dimension Datatypes:
The following values are members of the TileDB.CSharp.DataType enum:
Dimensions accept filters such as compressors. This is described in detail here.
After creating the array schema, the array can be created as follows:
This will materialize the array directory and related files (e.g., the array schema) to persistent storage. Depending on the array URI, this can be on your local disk, on a distributed filesystem such as Lustre or HDFS, on AWS S3, etc.
To create an encrypted array, pass your secret key when creating the array:
When creating the array schema, the dimension and attribute names must be unique.
You can set the data tile capacity (applicable to sparse fragments), as follows:
Sparse arrays may allow multiple cells with the same coordinates to exist (dense arrays do not allow duplicates). By default, duplicates are not allowed. You can specify that a sparse array allows duplicates as follows:
When duplicates are allowed, checking for duplicates and deduplication are disabled.
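The effect of the duplicates setting can be modelled in a few lines. This is a plain-Python sketch of the behavior described above, not the TileDB API; `write_sparse_cells` is a hypothetical function, and raising an error on a duplicate is an assumption chosen to represent "checking for duplicates".

```python
def write_sparse_cells(cells, allows_duplicates=False):
    """Accept a list of (coords, value) pairs for a sparse array.

    When duplicates are not allowed, the coordinates are checked and a
    repeated coordinate tuple is rejected. When duplicates are allowed,
    the check is skipped entirely and every cell is kept.
    """
    if not allows_duplicates:
        seen = set()
        for coords, _ in cells:
            if coords in seen:
                raise ValueError(f"duplicate coordinates {coords}")
            seen.add(coords)
    return list(cells)


cells = [((1, 1), "a"), ((1, 1), "b"), ((2, 3), "c")]

# With duplicates allowed, both cells at (1, 1) are kept.
print(len(write_sparse_cells(cells, allows_duplicates=True)))  # 3
```

Calling `write_sparse_cells(cells)` with the default setting raises `ValueError` because the coordinates (1, 1) appear twice.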
You can check if the array schema is set properly as follows:
The following example shows how to create a filter list with a GZIP compressor and compression level 10.
Supported compressors:
The compressors are members of the FilterType enum and the options are members of the FilterOption enum.
TileDB supports some more filters:
Supported filters (beyond compressors):
The default tile chunk size used by TileDB is 64KB, which is the size of many common processor L1 caches. You can control the chunk size by changing the option on a filter list:
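As a language-neutral illustration of what a chunk size means, the sketch below splits a buffer into 64KB chunks and compresses each chunk independently. It is plain Python and does not use the TileDB API or its on-disk format; `filter_tile` and `unfilter_tile` are hypothetical names, and the standard `zlib` module (DEFLATE) stands in for a GZIP compressor.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # 64KB, the default chunk size mentioned above


def filter_tile(tile_bytes, chunk_size=CHUNK_SIZE, level=6):
    """Split a tile buffer into chunks and compress each one separately."""
    chunks = [
        tile_bytes[i:i + chunk_size]
        for i in range(0, len(tile_bytes), chunk_size)
    ]
    return [zlib.compress(chunk, level) for chunk in chunks]


def unfilter_tile(filtered_chunks):
    """Reverse filter_tile: decompress every chunk and concatenate."""
    return b"".join(zlib.decompress(chunk) for chunk in filtered_chunks)


tile = bytes(200 * 1024)            # a 200KB buffer of zero bytes
filtered = filter_tile(tile)
print(len(filtered))                # 4 (three full 64KB chunks plus one 8KB chunk)
print(unfilter_tile(filtered) == tile)  # True
```

A smaller chunk size produces more, smaller units of work; a larger one gives the compressor more context per chunk. Which trade-off is better depends on the data and hardware.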
You can set a filter list on an attribute as follows:
You can set a filter list on a dimension as follows:
If you wish all the dimensions to have the same filter list, you can set it once as follows:
You can set filter lists for the offsets of variable-sized attributes or dimensions as follows:
The offset filters are applied to all variable-sized attributes and dimensions.
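For readers unfamiliar with offsets, the sketch below shows one common way variable-sized cells are laid out: a single contiguous data buffer plus a list of starting offsets, one per cell. It is plain Python, not the TileDB API; `to_var_buffers` and `from_var_buffers` are hypothetical names, and byte offsets are an assumption of this example.

```python
def to_var_buffers(cells):
    """Pack variable-sized cells (bytes objects) into one data buffer
    and a list with the starting byte offset of every cell."""
    offsets = []
    data = bytearray()
    for cell in cells:
        offsets.append(len(data))  # where this cell starts
        data.extend(cell)
    return bytes(data), offsets


def from_var_buffers(data, offsets):
    """Recover the original cells from the data buffer and offsets."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end] for start, end in zip(offsets, ends)]


data, offsets = to_var_buffers([b"a", b"bb", b"ccc"])
print(data)     # b'abbccc'
print(offsets)  # [0, 1, 3]
print(from_var_buffers(data, offsets))  # [b'a', b'bb', b'ccc']
```

Offsets are non-decreasing integers, which is why they are often filtered separately from the cell values they point into.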
After creating the domain and the attributes, you can create the array schema as follows:
You can set the tile and cell order as follows. The tile order may be set to row-major or column-major; the cell order may be set to row-major, column-major, or Hilbert.
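The difference between row-major and column-major order is easiest to see by enumerating cells. The following plain-Python sketch lists the coordinates of a small 2D tile in each order; it is not the TileDB API, `cell_order` is a hypothetical helper, and only the two orders named in its signature are covered.

```python
def cell_order(rows, cols, order="row-major"):
    """Return the (row, col) coordinates of a rows x cols tile in the
    sequence in which they would be laid out for the given order."""
    if order == "row-major":
        # The last dimension varies fastest.
        return [(r, c) for r in range(rows) for c in range(cols)]
    if order == "col-major":
        # The first dimension varies fastest.
        return [(r, c) for c in range(cols) for r in range(rows)]
    raise ValueError(f"unsupported order: {order!r}")


print(cell_order(2, 2, "row-major"))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(cell_order(2, 2, "col-major"))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

The same idea applies one level up to the tile order, which decides the sequence of whole tiles rather than the cells within a tile.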
Array attributes and dimensions accept filters.
If you do not specify a filter list for a dimension, the dimension inherits the filters set for all dimensions collectively, as shown above.

Datatype | Description |
---|---|
TILEDB_BLOB | Opaque bytes. Note: the TILEDB_BLOB datatype does not support query conditions. |
TILEDB_INT8 | 8-bit integer |
TILEDB_UINT8 | 8-bit unsigned integer |
TILEDB_INT16 | 16-bit integer |
TILEDB_UINT16 | 16-bit unsigned integer |
TILEDB_INT32 | 32-bit integer |
TILEDB_UINT32 | 32-bit unsigned integer |
TILEDB_INT64 | 64-bit integer |
TILEDB_UINT64 | 64-bit unsigned integer |
TILEDB_FLOAT32 | 32-bit floating point |
TILEDB_FLOAT64 | 64-bit floating point |
TILEDB_DATETIME_YEAR | Years |
TILEDB_DATETIME_MONTH | Months |
TILEDB_DATETIME_WEEK | Weeks |
TILEDB_DATETIME_DAY | Days |
TILEDB_DATETIME_HR | Hours |
TILEDB_DATETIME_MIN | Minutes |
TILEDB_DATETIME_SEC | Seconds |
TILEDB_DATETIME_MS | Milliseconds |
TILEDB_DATETIME_US | Microseconds |
TILEDB_DATETIME_NS | Nanoseconds |
TILEDB_DATETIME_PS | Picoseconds |
TILEDB_DATETIME_FS | Femtoseconds |
TILEDB_DATETIME_AS | Attoseconds |
TILEDB_CHAR | Single character |
TILEDB_STRING_ASCII | ASCII string |
TILEDB_STRING_UTF8 | UTF-8 string |
TILEDB_STRING_UTF16 | UTF-16 string |
TILEDB_STRING_UTF32 | UTF-32 string |
TILEDB_STRING_UCS2 | UCS-2 string |
TILEDB_STRING_UCS4 | UCS-4 string |
TILEDB_ANY | (datatype, bytelength, value) |

Datatype | Description | Internal TILEDB Datatype Mapping |
---|---|---|
np.int8 | 8-bit integer | TILEDB_INT8 |
np.uint8 | 8-bit unsigned integer | TILEDB_UINT8 |
np.int16 | 16-bit integer | TILEDB_INT16 |
np.uint16 | 16-bit unsigned integer | TILEDB_UINT16 |
np.int32 | 32-bit integer | TILEDB_INT32 |
np.uint32 | 32-bit unsigned integer | TILEDB_UINT32 |
np.int64 | 64-bit integer | TILEDB_INT64 |
np.uint64 | 64-bit unsigned integer | TILEDB_UINT64 |
np.float32 | 32-bit floating point | TILEDB_FLOAT32 |
np.float64 | 64-bit floating point | TILEDB_FLOAT64 |
"ascii" | ASCII | TILEDB_STRING_ASCII |
np.dtype('S1') | Character | TILEDB_CHAR |
np.dtype('U1') | Unicode UTF-8 | TILEDB_STRING_UTF8 |
"datetime64[Y]" | Years | TILEDB_DATETIME_YEAR |
"datetime64[M]" | Months | TILEDB_DATETIME_MONTH |
"datetime64[W]" | Weeks | TILEDB_DATETIME_WEEK |
"datetime64[D]" | Days | TILEDB_DATETIME_DAY |
"datetime64[h]" | Hours | TILEDB_DATETIME_HR |
"datetime64[m]" | Minutes | TILEDB_DATETIME_MIN |
"datetime64[s]" | Seconds | TILEDB_DATETIME_SEC |
"datetime64[ms]" | Milliseconds | TILEDB_DATETIME_MS |
"datetime64[us]" | Microseconds | TILEDB_DATETIME_US |
"datetime64[ns]" | Nanoseconds | TILEDB_DATETIME_NS |
"datetime64[ps]" | Picoseconds | TILEDB_DATETIME_PS |
"datetime64[fs]" | Femtoseconds | TILEDB_DATETIME_FS |
"datetime64[as]" | Attoseconds | TILEDB_DATETIME_AS |

Datatype | Description |
---|---|
"CHAR" | Single character |
"INT8" | 8-bit integer |
"UINT8" | 8-bit unsigned integer |
"INT16" | 16-bit integer |
"UINT16" | 16-bit unsigned integer |
"INT32" | 32-bit integer |
"UINT32" | 32-bit unsigned integer |
"INT64" | 64-bit integer |
"UINT64" | 64-bit unsigned integer |
"FLOAT32" | 32-bit floating point |
"FLOAT64" | 64-bit floating point |
"DATETIME_YEAR" | Years |
"DATETIME_MONTH" | Months |
"DATETIME_WEEK" | Weeks |
"DATETIME_DAY" | Days |
"DATETIME_HR" | Hours |
"DATETIME_MIN" | Minutes |
"DATETIME_SEC" | Seconds |
"DATETIME_MS" | Milliseconds |
"DATETIME_US" | Microseconds |
"DATETIME_NS" | Nanoseconds |
"DATETIME_PS" | Picoseconds |
"DATETIME_FS" | Femtoseconds |
"DATETIME_AS" | Attoseconds |


Datatype | Description | Array Type |
---|---|---|
TILEDB_STRING_ASCII | Variable length string | Sparse |
TILEDB_INT8 | 8-bit integer | Dense & Sparse |
TILEDB_UINT8 | 8-bit unsigned integer | Dense & Sparse |
TILEDB_INT16 | 16-bit integer | Dense & Sparse |
TILEDB_UINT16 | 16-bit unsigned integer | Dense & Sparse |
TILEDB_INT32 | 32-bit integer | Dense & Sparse |
TILEDB_UINT32 | 32-bit unsigned integer | Dense & Sparse |
TILEDB_INT64 | 64-bit integer | Dense & Sparse |
TILEDB_UINT64 | 64-bit unsigned integer | Dense & Sparse |
TILEDB_FLOAT32 | 32-bit floating point | Sparse |
TILEDB_FLOAT64 | 64-bit floating point | Sparse |
TILEDB_DATETIME_YEAR | Years | Dense & Sparse |
TILEDB_DATETIME_MONTH | Months | Dense & Sparse |
TILEDB_DATETIME_WEEK | Weeks | Dense & Sparse |
TILEDB_DATETIME_DAY | Days | Dense & Sparse |
TILEDB_DATETIME_HR | Hours | Dense & Sparse |
TILEDB_DATETIME_MIN | Minutes | Dense & Sparse |
TILEDB_DATETIME_SEC | Seconds | Dense & Sparse |
TILEDB_DATETIME_MS | Milliseconds | Dense & Sparse |
TILEDB_DATETIME_US | Microseconds | Dense & Sparse |
TILEDB_DATETIME_NS | Nanoseconds | Dense & Sparse |
TILEDB_DATETIME_PS | Picoseconds | Dense & Sparse |
TILEDB_DATETIME_FS | Femtoseconds | Dense & Sparse |
TILEDB_DATETIME_AS | Attoseconds | Dense & Sparse |

Datatype | Description | Array Type | Internal TILEDB Datatype Mapping |
---|---|---|---|
"ascii" / np.bytes_ | Variable length string | Sparse | TILEDB_STRING_ASCII |
np.int8 | 8-bit integer | Dense & Sparse | TILEDB_INT8 |
np.uint8 | 8-bit unsigned integer | Dense & Sparse | TILEDB_UINT8 |
np.int16 | 16-bit integer | Dense & Sparse | TILEDB_INT16 |
np.uint16 | 16-bit unsigned integer | Dense & Sparse | TILEDB_UINT16 |
np.int32 | 32-bit integer | Dense & Sparse | TILEDB_INT32 |
np.uint32 | 32-bit unsigned integer | Dense & Sparse | TILEDB_UINT32 |
np.int64 | 64-bit integer | Dense & Sparse | TILEDB_INT64 |
np.uint64 | 64-bit unsigned integer | Dense & Sparse | TILEDB_UINT64 |
np.float32 | 32-bit floating point | Sparse | TILEDB_FLOAT32 |
np.float64 | 64-bit floating point | Sparse | TILEDB_FLOAT64 |
"datetime64[Y]" | Years | Dense & Sparse | TILEDB_DATETIME_YEAR |
"datetime64[M]" | Months | Dense & Sparse | TILEDB_DATETIME_MONTH |
"datetime64[W]" | Weeks | Dense & Sparse | TILEDB_DATETIME_WEEK |
"datetime64[D]" | Days | Dense & Sparse | TILEDB_DATETIME_DAY |
"datetime64[h]" | Hours | Dense & Sparse | TILEDB_DATETIME_HR |
"datetime64[m]" | Minutes | Dense & Sparse | TILEDB_DATETIME_MIN |
"datetime64[s]" | Seconds | Dense & Sparse | TILEDB_DATETIME_SEC |
"datetime64[ms]" | Milliseconds | Dense & Sparse | TILEDB_DATETIME_MS |
"datetime64[us]" | Microseconds | Dense & Sparse | TILEDB_DATETIME_US |
"datetime64[ns]" | Nanoseconds | Dense & Sparse | TILEDB_DATETIME_NS |
"datetime64[ps]" | Picoseconds | Dense & Sparse | TILEDB_DATETIME_PS |
"datetime64[fs]" | Femtoseconds | Dense & Sparse | TILEDB_DATETIME_FS |
"datetime64[as]" | Attoseconds | Dense & Sparse | TILEDB_DATETIME_AS |

Datatype | Description | Array Type |
---|---|---|
"ASCII" | Variable length string | Sparse |
"INT8" | 8-bit integer | Dense & Sparse |
"UINT8" | 8-bit unsigned integer | Dense & Sparse |
"INT16" | 16-bit integer | Dense & Sparse |
"UINT16" | 16-bit unsigned integer | Dense & Sparse |
"INT32" | 32-bit integer | Dense & Sparse |
"UINT32" | 32-bit unsigned integer | Dense & Sparse |
"INT64" | 64-bit integer | Dense & Sparse |
"UINT64" | 64-bit unsigned integer | Dense & Sparse |
"FLOAT32" | 32-bit floating point | Sparse |
"FLOAT64" | 64-bit floating point | Sparse |
"DATETIME_YEAR" | Years | Dense & Sparse |
"DATETIME_MONTH" | Months | Dense & Sparse |
"DATETIME_WEEK" | Weeks | Dense & Sparse |
"DATETIME_DAY" | Days | Dense & Sparse |
"DATETIME_HR" | Hours | Dense & Sparse |
"DATETIME_MIN" | Minutes | Dense & Sparse |
"DATETIME_SEC" | Seconds | Dense & Sparse |
"DATETIME_MS" | Milliseconds | Dense & Sparse |
"DATETIME_US" | Microseconds | Dense & Sparse |
"DATETIME_NS" | Nanoseconds | Dense & Sparse |
"DATETIME_PS" | Picoseconds | Dense & Sparse |
"DATETIME_FS" | Femtoseconds | Dense & Sparse |
"DATETIME_AS" | Attoseconds | Dense & Sparse |

Datatype | Description | Array Type |
---|---|---|
StringAscii | Variable length string | Sparse |
Int8 | 8-bit integer | Dense & Sparse |
UInt8 | 8-bit unsigned integer | Dense & Sparse |
Int16 | 16-bit integer | Dense & Sparse |
UInt16 | 16-bit unsigned integer | Dense & Sparse |
Int32 | 32-bit integer | Dense & Sparse |
UInt32 | 32-bit unsigned integer | Dense & Sparse |
Int64 | 64-bit integer | Dense & Sparse |
UInt64 | 64-bit unsigned integer | Dense & Sparse |
Float32 | 32-bit floating point | Sparse |
Float64 | 64-bit floating point | Sparse |
DateTimeYear | Years | Dense & Sparse |
DateTimeMonth | Months | Dense & Sparse |
DateTimeWeek | Weeks | Dense & Sparse |
DateTimeDay | Days | Dense & Sparse |
DateTimeHour | Hours | Dense & Sparse |
DateTimeMinute | Minutes | Dense & Sparse |
DateTimeSecond | Seconds | Dense & Sparse |
DateTimeMillisecond | Milliseconds | Dense & Sparse |
DateTimeMicrosecond | Microseconds | Dense & Sparse |
DateTimeNanosecond | Nanoseconds | Dense & Sparse |
DateTimePicosecond | Picoseconds | Dense & Sparse |
DateTimeFemtosecond | Femtoseconds | Dense & Sparse |
DateTimeAttosecond | Attoseconds | Dense & Sparse |


Compressor | Option |
---|---|
GZIP | Level |
Zstandard | Level |
LZ4 | Level |
Run-length encoding (RLE) | Level |
BZIP2 | Level |
Double delta | None |


Filter | Option |
---|---|
Bit width reduction | |
Bit shuffle | None |
Byte shuffle | None |
Positive delta | |