Task Graphs 101
In this tutorial, you will learn:
How to use task graphs, specifically the Delayed API
How to scale your computation, significantly boosting performance, all serverless
How to eliminate egress costs
We will use a public TileDB Cloud array that stores a dataset for the year 2019. The original data is in CSV format, about 7 GB in total; it was converted into a 1D sparse TileDB array, which compresses it down to about 1 GB. The sparse dimension is tpep_pickup_datetime, which means that the array supports very fast range slicing (and, therefore, partitioning) on that column of the dataset.
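Because the array is sliced by a datetime range, partitioning the year amounts to splitting 2019 into consecutive time windows, one per task. The following is a minimal sketch using only the Python standard library; the helper name month_ranges and the choice of monthly windows are assumptions made for illustration, not part of the tutorial's code.

```python
from datetime import datetime


def month_ranges(year):
    """Return 12 consecutive (start, end) datetime pairs covering `year`.

    Each pair is half-open: `start` belongs to the window, `end` is the
    start of the next window. (Hypothetical helper for illustration.)
    """
    # First instant of every month, plus the first instant of the next year.
    starts = [datetime(year, month, 1) for month in range(1, 13)]
    starts.append(datetime(year + 1, 1, 1))
    # Pair each boundary with the one that follows it.
    return list(zip(starts[:-1], starts[1:]))


partitions = month_ranges(2019)
for start, end in partitions:
    print(start.isoformat(), "->", end.isoformat())
```

Each pair can then be handed to a separate task that slices the array on tpep_pickup_datetime. If the slicing call treats both bounds as inclusive, shrink each end bound by one time unit so that adjacent windows do not overlap.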
You can view this tutorial as a TileDB Cloud notebook (no login is needed). You can also easily launch it within the TileDB Cloud UI console, but you will need to sign up or log in to do so.
You can run all the commands of this notebook in your own client. The only changes required are:
Installing the TileDB Cloud client
Logging in using the TileDB Cloud client before running any notebook command
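On your own machine, these two changes might look like the sketch below. It assumes the client is installed from PyPI with pip install tiledb-cloud and that an API token is stored in an environment variable; the helper login_from_env and the variable name TILEDB_REST_TOKEN are assumptions of this sketch, so adapt them to your setup.

```python
import os


def login_from_env(env_var="TILEDB_REST_TOKEN"):
    """Log in to TileDB Cloud with a token read from the environment.

    Returns True if a login was attempted, False if no token was found.
    The helper and the default variable name are illustrative assumptions.
    """
    token = os.environ.get(env_var)
    if not token:
        # No token available: skip the login instead of failing.
        return False
    # Imported lazily so this file can be loaded without the client installed.
    import tiledb.cloud

    tiledb.cloud.login(token=token)
    return True
```

Call login_from_env() once, before any other notebook command. Reading the token from the environment keeps credentials out of the notebook itself.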