TileDB integrates well with `dask.array`. We demonstrate with an example below, where attribute `attr` stores an `int32` value per cell:
```python
import dask.array as da

# Lazily read one attribute of a dense TileDB array stored on S3.
array = da.from_tiledb(
    's3://my-bucket/my-dense-array',
    attribute='attr',
    storage_options={
        'vfs.s3.aws_access_key_id': 'mykeyid',
        'vfs.s3.aws_secret_access_key': 'mysecret',
    },
)

print(array.mean().compute())
```
You can add any TileDB configuration parameter in `storage_options`. Moreover, `storage_options` accepts an additional `key` option, where you can pass an encryption key if your array is encrypted (see Encryption).
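To avoid hard-coding credentials, the options dictionary can be assembled from the environment. The following is a minimal sketch, not part of TileDB or Dask: `s3_storage_options` is a hypothetical helper, and it assumes the standard AWS environment variable names (`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`) and the TileDB parameter `vfs.s3.region` in addition to the two parameters shown above.

```python
import os


def s3_storage_options(env=None, encryption_key=None):
    """Build a storage_options dict for da.from_tiledb / to_tiledb.

    Hypothetical helper: reads credentials from `env` (defaults to
    os.environ) instead of embedding them in the source code.
    """
    env = os.environ if env is None else env

    # Assumed mapping from AWS environment variables to TileDB parameters.
    mapping = {
        "AWS_ACCESS_KEY_ID": "vfs.s3.aws_access_key_id",
        "AWS_SECRET_ACCESS_KEY": "vfs.s3.aws_secret_access_key",
        "AWS_DEFAULT_REGION": "vfs.s3.region",
    }

    options = {}
    for env_name, config_name in mapping.items():
        value = env.get(env_name)
        if value:  # skip variables that are unset or empty
            options[config_name] = value

    # The extra "key" option carries the encryption key, if any.
    if encryption_key is not None:
        options["key"] = encryption_key

    return options
```

The result can then be passed as `storage_options=s3_storage_options()` to `da.from_tiledb` or `to_tiledb`.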
You can also set the chunking of the resulting Dask array with the `chunks` argument, just as you would for any other Dask array. For example:
```python
import dask.array as da

array = da.from_tiledb(
    's3://my-bucket/my-dense-array',
    attribute='attr',
    chunks=10,  # or chunks=(10,)
    storage_options={
        'vfs.s3.aws_access_key_id': 'mykeyid',
        'vfs.s3.aws_secret_access_key': 'mysecret',
    },
)

print(array.mean().compute())
```
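To see what a uniform chunk size such as `chunks=10` means for one dimension, the sketch below reproduces the arithmetic in plain Python: full chunks of the requested size, followed by a shorter final chunk if the length is not a multiple of it. `chunk_sizes` is a hypothetical helper written for illustration only, not a Dask or TileDB function, and the dimension length of 25 is an assumed value.

```python
def chunk_sizes(dim_length, chunk):
    """Return the chunk lengths along one dimension for a uniform chunk size."""
    if chunk <= 0:
        raise ValueError("chunk must be positive")

    full, remainder = divmod(dim_length, chunk)
    sizes = (chunk,) * full       # as many full-size chunks as fit
    if remainder:
        sizes += (remainder,)     # one shorter chunk for the leftover cells
    return sizes


print(chunk_sizes(25, 10))  # (10, 10, 5)
```

Smaller chunks give Dask more tasks to run in parallel, while larger chunks reduce scheduling overhead, so the right value depends on the array and the workload.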
You can also write a Dask array into TileDB as follows:
```python
import dask.array as da

array = da.random.random((25, 25))

# Write the Dask array to a TileDB array on S3.
array.to_tiledb(
    "s3://my-bucket/my-uri",
    storage_options={
        'vfs.s3.aws_access_key_id': 'mykeyid',
        'vfs.s3.aws_secret_access_key': 'mysecret',
    },
)
```
Note that the TileDB array does not need to exist beforehand. If it does not, the above call creates it, inferring the schema from the Dask array. To write to an existing array, open the array for writing and pass the opened array to `to_tiledb`, as follows:
```python
import tiledb
import dask.array as da

array = da.random.random((25, 25))

# Open the existing TileDB array in write mode and pass it to to_tiledb.
with tiledb.open("s3://my-bucket/my-uri", 'w') as A:
    array.to_tiledb(A)
```