Serverless UDFs

How It Works

TileDB Cloud lets you run Python user-defined functions (UDFs), similar in spirit to lambda functions, on array slices. You write the code on your laptop using the TileDB Cloud client (see Installation). The function is shipped to stateless TileDB Cloud workers and executed there, and you are charged only for the time the function took to run and the amount of data returned to your laptop (see Pricing for more details). You do not need to launch or manage any computational resources. You can run UDFs on any array you have access to.

TileDB Cloud runs your UDF in a container separate from the one that slices the array from S3 using AWS keys, and the two containers communicate only via REST. The UDF therefore cannot compromise security in TileDB Cloud.

Running UDFs is particularly useful if you want to perform reductions (such as a sum or an average), since the amount of data returned is very small regardless of how much data you process on TileDB Cloud.
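To make the size argument concrete, here is a minimal local sketch that does not call TileDB Cloud. It assumes the UDF receives a slice as an ordered dictionary of numpy arrays, as described under Usage. The attribute name a, the array shape, and the function name mean_of_a are illustrative assumptions.

```python
from collections import OrderedDict

import numpy as np


def mean_of_a(data):
    """Reduction UDF: collapse attribute "a" of a slice to a single float."""
    return float(np.mean(data["a"]))


# Stand-in for the slice a worker would hand to the UDF (hypothetical data).
slice_data = OrderedDict(
    a=np.arange(1_000_000, dtype=np.int64).reshape(1000, 1000)
)

result = mean_of_a(slice_data)
input_bytes = slice_data["a"].nbytes       # bytes the UDF had to process
output_bytes = np.float64(result).nbytes   # bytes that would be sent back

print(result, input_bytes, output_bytes)
```

The input here occupies several megabytes, while the returned value is a single 8-byte float. The same ratio only improves as the slice grows.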

TileDB Cloud currently supports only Python UDFs, but support for more languages will be added soon.

Each TileDB Cloud worker uses up to 2 GB of RAM for your function. You must therefore slice your arrays so that each slice fits in 2 GB of memory (see also Parallel Computing). In the future, TileDB Cloud will let you choose the types of resources the UDF runs on.
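One way to plan slices around the memory limit is to estimate the bytes per row of a dense slice and split the row range accordingly. The sketch below is plain Python and is not part of the TileDB Cloud client. The helper name row_chunks, the array dimensions, and the choice to budget only half of the 2 GB limit (leaving headroom for the function's own temporaries) are all assumptions made for illustration.

```python
def row_chunks(row_start, row_end, n_cols, bytes_per_cell, budget_bytes):
    """Split the inclusive row range [row_start, row_end] into inclusive
    sub-ranges whose dense slices each fit within budget_bytes."""
    bytes_per_row = n_cols * bytes_per_cell
    rows_per_chunk = budget_bytes // bytes_per_row
    if rows_per_chunk < 1:
        raise ValueError("a single row exceeds the budget; split columns too")
    chunks = []
    start = row_start
    while start <= row_end:
        end = min(start + rows_per_chunk - 1, row_end)
        chunks.append((start, end))
        start = end + 1
    return chunks


# Hypothetical array: 1,000,000 rows x 1,000 columns of 8-byte values.
# Budget 1 GiB per slice, i.e. half of the 2 GB worker limit (an assumption).
BUDGET = 1 * 1024**3
chunks = row_chunks(1, 1_000_000, n_cols=1_000, bytes_per_cell=8,
                    budget_bytes=BUDGET)
print(len(chunks), chunks[0], chunks[-1])
```

Each (start, end) pair can then serve as the row range of one UDF call, with the partial results combined on the client.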

Usage

The example below shows how to use Python UDFs on TileDB Cloud by computing the median of the values of attribute a over a slice of a 2D dense array. You write a function (median in this example) that takes as input an ordered dictionary of numpy arrays, i.e., of the form {"a" : <numpy-array>, "b" : <numpy-array>}, where the keys are attributes or dimensions of the array you are querying. The function takes this form because it is applied to an array slice; recall that the TileDB Python API returns an ordered dictionary with one numpy array per attribute and dimension upon a read. You then call the apply function of the TileDB Cloud client, which takes as input your function, a slice, and optionally a list of attributes (the default is all attributes). Note that only the selected attributes appear in the ordered dictionary passed to your function.

Python
import tiledb, tiledb.cloud, numpy

# UDF: receives an ordered dictionary of numpy arrays, returns the median of attribute "a".
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])

tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")

# Apply the UDF to the slice given by the range (1,4) on each dimension, reading only "a".
with tiledb.DenseArray("tiledb://TileDB-Inc/quickstart_dense", ctx=tiledb.cloud.Ctx()) as A:
    res = A.apply(median, [(1,4), (1,4)], attrs=["a"])
    print(res)
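Before shipping a UDF, it can help to exercise it locally on a hand-built dictionary of the same shape it will receive. The sketch below does this for the median function without contacting TileDB Cloud. The 4x4 values are made up for illustration and are not claimed to match the contents of the quickstart_dense array.

```python
from collections import OrderedDict

import numpy as np


def median(numpy_ordered_dictionary):
    # Same UDF as in the example: reduce attribute "a" to its median.
    return np.median(numpy_ordered_dictionary["a"])


# Hypothetical 4x4 slice holding the values 1..16 for attribute "a".
fake_slice = OrderedDict(a=np.arange(1, 17).reshape(4, 4))

local_result = median(fake_slice)
print(local_result)
```

If the function behaves as expected here, the same function object can be passed to apply unchanged.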