TileDB With Dask Array
TileDB arrays can be read and written directly with dask.array. The example below reads a dense array from S3 in which the attribute attr stores an int32 value per cell:
Python
import dask
import dask.array as da
array = da.from_tiledb('s3://my-bucket/my-dense-array',
                       attribute='attr',
                       storage_options={'vfs.s3.aws_access_key_id': 'mykeyid',
                                        'vfs.s3.aws_secret_access_key': 'mysecret'})
print(array.mean().compute())
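The example above writes the S3 credentials inline. As a minimal sketch, the same storage_options dictionary could be assembled in one place and reused for every call. The helper name build_storage_options, the use of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, and the vfs.s3.region value are assumptions made for illustration; none of them come from TileDB or Dask.

```python
import os


def build_storage_options(env=None, extra=None, encryption_key=None):
    """Assemble a storage_options dict for da.from_tiledb / to_tiledb.

    Hypothetical helper: reads credentials from a mapping (os.environ by
    default) instead of hard-coding them in the call.
    """
    env = os.environ if env is None else env
    options = {
        "vfs.s3.aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "vfs.s3.aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }
    # Any other TileDB configuration parameter is merged in unchanged.
    options.update(extra or {})
    # 'key' is the extra storage_options entry used for encrypted arrays.
    if encryption_key is not None:
        options["key"] = encryption_key
    return options


# Demonstration with a plain dict standing in for the environment.
demo_env = {"AWS_ACCESS_KEY_ID": "mykeyid", "AWS_SECRET_ACCESS_KEY": "mysecret"}
demo_options = build_storage_options(demo_env, extra={"vfs.s3.region": "us-east-1"})
print(demo_options)
```

The resulting dictionary can then be passed as storage_options=demo_options in place of the inline literal.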
You can pass any TileDB configuration parameter in storage_options. In addition, storage_options accepts a key option, where you can pass an encryption key if your array is encrypted (see Encryption). The example below also passes chunks to control how the array is split into Dask chunks:
Python
import dask
import dask.array as da
array = da.from_tiledb('s3://my-bucket/my-dense-array',
                       attribute='attr',
                       chunks=10, # or chunks=(10,)
                       storage_options={'vfs.s3.aws_access_key_id': 'mykeyid',
                                        'vfs.s3.aws_secret_access_key': 'mysecret'})
print(array.mean().compute())
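The chunks=10 argument in the example above sets the Dask chunk size. As a rough, dependency-free illustration of what a uniform chunk size means, the sketch below computes the block sizes that a given chunk size produces along each dimension, in the same tuple-of-tuples form that a Dask array reports in its chunks attribute. The function names chunk_sizes and chunk_layout are hypothetical, and this is a simplified model, not Dask's own implementation.

```python
def chunk_sizes(dim_length, chunk):
    """Block sizes along one dimension for a uniform chunk size."""
    if chunk <= 0:
        raise ValueError("chunk size must be positive")
    full, rest = divmod(dim_length, chunk)
    # `full` blocks of size `chunk`, plus one smaller trailing block if needed.
    return (chunk,) * full + ((rest,) if rest else ())


def chunk_layout(shape, chunks):
    """Block sizes per dimension; an int chunk size applies to every dimension."""
    if isinstance(chunks, int):
        chunks = (chunks,) * len(shape)
    if len(chunks) != len(shape):
        raise ValueError("need one chunk size per dimension")
    return tuple(chunk_sizes(n, c) for n, c in zip(shape, chunks))


print(chunk_layout((25,), 10))           # ((10, 10, 5),)
print(chunk_layout((25, 25), (10, 5)))   # ((10, 10, 5), (5, 5, 5, 5, 5))
```

Each block becomes one task when Dask computes a result such as the mean, so smaller chunks give more parallelism at the cost of more scheduling overhead.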
You can also write a Dask array into TileDB as follows:
Python
import tiledb
import numpy as np
import dask, dask.array as da
array = da.random.random((25,25))
array.to_tiledb("s3://my-bucket/my-uri",
                storage_options={'vfs.s3.aws_access_key_id': 'mykeyid',
                                 'vfs.s3.aws_secret_access_key': 'mysecret'})
Note that the TileDB array does not need to exist beforehand. If it does not, the call above creates it, inferring the schema from the Dask array. To write to an existing array, open it for writing and pass the open array to to_tiledb, which will create new fragment(s):
Python
import tiledb
import numpy as np
import dask, dask.array as da
array = da.random.random((25,25))
# open the existing array in write mode and write the Dask array into it
with tiledb.open("s3://my-bucket/my-uri", 'w') as A:
    array.to_tiledb(A)
Using an existing Array object allows extra customization of the array schema beyond what is possible with the automatic array creation shown earlier. For example, to create an array with a compression filter applied to the attribute, create the schema and array first, then write to the open Array:
Python
import tiledb, dask, dask.array as da, numpy as np
from tiledb import Domain, Dim, Attr
uri = "/path/to/array"
# create schema with Zstd compression applied to the attribute:
schema = tiledb.ArraySchema(
    Domain([Dim("x", domain=(0,99), tile=100, dtype=np.uint64),
            Dim("y", domain=(0,99), tile=100, dtype=np.uint64)]),
    attrs=[Attr("v", dtype=np.float64, filters=[tiledb.ZstdFilter(1)])],
    sparse=False
)
# create empty array from the schema above:
tiledb.Array.create(uri, schema)
# write a 100x100 array of 1s to array created above:
with tiledb.open(uri, "w") as tdb_array:
    da.ones((100,100)).to_tiledb(tdb_array)