Task Graphs
TileDB Cloud lets you build arbitrary directed acyclic task graphs that combine any number of tasks into one workflow. A graph can mix serverless UDFs, SQL, and array access with local execution of any function.
TileDB Cloud currently supports serverless task graphs only in Python, but support for more languages will be added soon.
The task graph is currently driven by the client. The client can be a hosted notebook, your local laptop, or even a serverless UDF itself. The client manages the graph and dispatches the execution of serverless jobs or local functions.
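To make the idea of a client-driven graph concrete, here is a minimal sketch that uses only the Python standard library. It is not the TileDB Cloud API: `run_graph`, `tasks`, and `deps` are hypothetical names chosen for illustration. The client holds the graph, works out which nodes are ready, dispatches them to a thread pool, and feeds each node the results of its dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter  # requires Python 3.9+


def run_graph(tasks, deps, max_workers=4):
    """Run a DAG of callables, driven entirely from the calling process.

    tasks: name -> callable that receives its dependencies' results in order
    deps:  name -> list of dependency names
    Both structures are hypothetical, for illustration only.
    """
    sorter = TopologicalSorter({name: deps.get(name, []) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Dispatch every node whose dependencies have all finished.
            futures = {
                name: pool.submit(
                    tasks[name], *(results[d] for d in deps.get(name, []))
                )
                for name in sorter.get_ready()
            }
            # Simplification: wait for the whole batch before scheduling more.
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
    return results


tasks = {
    "load": lambda: [1, 2, 3, 4],
    "total": lambda xs: sum(xs),
    "count": lambda xs: len(xs),
    "mean": lambda total, count: total / count,
}
deps = {"total": ["load"], "count": ["load"], "mean": ["total", "count"]}

results = run_graph(tasks, deps)
print(results["mean"])
```

In this sketch "total" and "count" run concurrently because both depend only on "load", while "mean" waits for both of them.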
Currently, there is no node-to-node communication in a task graph. However, TileDB does offer server-side passing of inputs and outputs without a round trip to the client, which lets data move efficiently between stages of the task graph.
The local driver uses the Python ThreadPoolExecutor by default to drive the tasks. The default number of workers is 4 * #cores on the client machine. Python can run multiple serverless tasks concurrently because they use asynchronous HTTP requests. Serverless tasks scale elastically: as you request more tasks, TileDB Cloud launches more resources to accommodate them.
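The following sketch illustrates why a thread pool works well for tasks that mostly wait on the network. It assumes nothing about the TileDB Cloud client: `fake_serverless_call` is a hypothetical stand-in in which `time.sleep` plays the role of waiting for an HTTP response, and the worker count simply applies the 4 * #cores default stated above.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

# Worker count following the default described in the text: 4 * number of cores.
default_workers = 4 * (os.cpu_count() or 1)


def fake_serverless_call(task_id, latency=0.2):
    """Hypothetical stand-in for a blocking request to a serverless task."""
    time.sleep(latency)  # a sleeping thread releases the GIL, like a socket wait
    return task_id * 2


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=default_workers) as pool:
    # All four calls are in flight at once, so their waits overlap.
    outputs = list(pool.map(fake_serverless_call, range(4)))
elapsed = time.perf_counter() - start

print(outputs)
print(f"elapsed: {elapsed:.2f}s for 4 calls of 0.2s each")
```

Because the threads spend their time waiting rather than computing, the total elapsed time should be close to one call's latency instead of the sum of all four.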
Local functions are subject to the Python GIL (global interpreter lock) if the task graph uses the ThreadPoolExecutor (the default).
This limits the concurrency of local functions; serverless functionality, however, is minimally affected.
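The effect of the GIL on local functions can be observed with a small experiment. This is a sketch for illustration only: `cpu_bound` is a hypothetical pure-Python workload, and the timings depend on your machine and interpreter build. On a standard CPython build the threaded run is typically no faster than the serial one, because only one thread executes Python bytecode at a time.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def cpu_bound(n):
    """Pure-Python loop; on standard CPython it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i
    return total


work = [200_000] * 4

# Run the four jobs one after another.
start = time.perf_counter()
serial = [cpu_bound(n) for n in work]
serial_time = time.perf_counter() - start

# Run the same four jobs in a thread pool.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(cpu_bound, work))
threaded_time = time.perf_counter() - start

print(f"serial: {serial_time:.3f}s, threaded: {threaded_time:.3f}s")
```

Both runs produce identical results; only the wall-clock time is of interest here.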