Task Graphs

TileDB Cloud allows you to build arbitrary task graphs (directed acyclic graphs) that combine any number of different tasks into one workflow. You can mix serverless UDFs, SQL queries, and array access with local execution of any function.

TileDB Cloud currently supports serverless task graphs only in Python, but support for more languages will be added soon.

The task graph is currently driven by the client. The client can be a hosted notebook, your local laptop, or even a serverless UDF itself. The client manages the graph and dispatches the execution of serverless jobs or local functions.

Currently, there is no node-to-node communication in a task graph; all results are serialized and returned to the client. If a task needs the results of previous tasks, those results are first returned to the client and then dispatched as parameters to the tasks that depend on them. Over the coming months we plan to eliminate this round trip and offer server-side handling of results between tasks.
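The client-driven model described above can be sketched with the Python standard library alone. This is an illustration of the pattern, not TileDB Cloud's implementation: `run_graph`, `tasks`, and `deps` are hypothetical names, and plain Python callables stand in for serverless tasks. It assumes Python 3.9 or later for `graphlib`.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_graph(tasks, deps, max_workers=4):
    """Run a DAG of callables from the client side.

    tasks: dict mapping task name -> callable (hypothetical stand-in
           for a serverless task or a local function).
    deps:  dict mapping task name -> list of upstream task names.
    Each callable receives its upstream results as positional
    arguments, in the order listed in deps.
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

    results = {}    # every result is collected here, on the client
    in_flight = {}  # future -> task name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Dispatch every task whose upstream results have arrived.
            for name in sorter.get_ready():
                args = [results[d] for d in deps.get(name, [])]
                in_flight[pool.submit(tasks[name], *args)] = name

            # Block until at least one dispatched task has finished.
            done, _ = wait(list(in_flight), return_when=FIRST_COMPLETED)
            for future in done:
                name = in_flight.pop(future)
                results[name] = future.result()
                sorter.done(name)  # unblocks tasks that depend on this one

    return results
```

For example, `run_graph({"a": lambda: 2, "b": lambda: 3, "sum": lambda x, y: x + y}, {"sum": ["a", "b"]})` runs `a` and `b` concurrently, brings both results back to the client, and only then dispatches `sum` with those results as parameters.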

The local driver uses the Python ThreadPoolExecutor by default to drive the tasks. The default number of workers is 4 times the number of cores on the client machine. Multiple serverless tasks can run concurrently from Python because they use asynchronous HTTP requests. Serverless tasks scale elastically: as you request more tasks to be run, TileDB Cloud launches more resources to accommodate them.
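A minimal sketch of this threading behavior, using only the standard library. The names `default_worker_count`, `fake_serverless_call`, and `run_concurrently` are hypothetical, and `time.sleep` stands in for the time a thread spends waiting on an HTTP response from a serverless task.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def default_worker_count():
    # Mirrors the documented default of 4 workers per client core.
    # os.cpu_count() can return None, so fall back to a single core.
    return 4 * (os.cpu_count() or 1)


def fake_serverless_call(x):
    # Hypothetical stand-in for a serverless task: the thread spends
    # its time waiting, much as it would on an HTTP response.
    time.sleep(0.2)
    return x * x


def run_concurrently(inputs, max_workers=None):
    workers = max_workers or default_worker_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(fake_serverless_call, inputs))
```

Because the simulated calls spend their time waiting rather than computing, a batch no larger than the worker count should finish in roughly the time of a single call instead of the sum of all of them.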

Local functions are subject to the Python GIL (global interpreter lock) if the task graph uses the ThreadPoolExecutor (the default). This limits the concurrency of local functions. Serverless functionality is minimally affected, because a thread waiting on an HTTP response releases the GIL.
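The effect of the GIL on local functions can be observed with a small experiment. This is a sketch under stated assumptions, not a benchmark: `cpu_bound`, `timed`, and `compare` are hypothetical helpers, and the timings depend on the machine and on the Python build.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    # Pure-Python arithmetic: the running thread holds the GIL throughout.
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed(fn):
    # Return fn's result together with its wall-clock duration in seconds.
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def compare(n=200_000, tasks=4):
    # Run the same CPU-bound work sequentially and then on a thread pool.
    sequential, t_seq = timed(lambda: [cpu_bound(n) for _ in range(tasks)])
    with ThreadPoolExecutor(max_workers=tasks) as pool:
        threaded, t_thr = timed(lambda: list(pool.map(cpu_bound, [n] * tasks)))
    return sequential, threaded, t_seq, t_thr
```

On a standard CPython build, the threaded timing from `compare()` is typically close to the sequential one, since only one thread executes Python bytecode at a time. Tasks that mostly wait on network responses do not hit this limit.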
