TileDB With Dask Delayed
dask.delayed is a powerful feature of Dask that lets you build arbitrary task graphs and submit them to Dask's scheduler for execution. With it you can implement sophisticated out-of-core computations (i.e., on datasets larger than RAM) and handle highly distributed workloads.
No special integration with TileDB is needed, because dask.delayed is generic and works with any user-defined task. The point here is simply that you can slice a TileDB array inside a delayed task, so each task reads only the slice it needs. This lets you process very large TileDB arrays on your laptop or on a large cluster.
We include a very simple example below. Much more complex algorithms can be implemented in the same way on arbitrarily large TileDB arrays.