Serverless UDFs

How It Works

TileDB Cloud lets you run any Python lambda-like user-defined function (UDF). You write the code on your laptop using the TileDB Cloud client (see Installation), and your function is shipped to and executed on stateless TileDB Cloud workers. You are charged only for the time it takes to run the function and for the amount of data returned to your laptop (see Pricing for more details). You do not need to launch or manage any computational resources. You can also run UDFs on any array you have access to.
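To make the "shipped and executed" flow concrete, here is a minimal local sketch. The function and its arguments are serialized, run somewhere else, and only the serialized return value comes back, which is the part that counts toward returned data. `local_exec` is a hypothetical helper, not part of the TileDB Cloud client. Standard-library `pickle` stands in for whatever serialization the real service uses, and that choice is an assumption. Stdlib `pickle` stores functions by reference, so this only works within one process. Shipping code to another machine needs a serializer that captures the function body, such as the third-party cloudpickle package.

```python
import pickle


def local_exec(func, *args, **kwargs):
    """Hypothetical, fully local stand-in for a remote UDF call.

    Returns the result and the size in bytes of the serialized result,
    i.e. the data that would travel back to the client.
    """
    # Client side: bundle the function (by reference) and its arguments.
    payload = pickle.dumps((func, args, kwargs))

    # "Worker" side: unpack, run, and serialize only the return value.
    f, a, kw = pickle.loads(payload)
    result_bytes = pickle.dumps(f(*a, **kw))

    # Client side again: just the serialized result is received.
    return pickle.loads(result_bytes), len(result_bytes)
```

For example, `local_exec(sorted, [3, 1, 2], reverse=True)` returns `[3, 2, 1]` together with the size of its pickled form.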

TileDB Cloud runs your UDF in a separate dedicated container for security.

Running UDFs is particularly useful if you want to perform reductions (such as a sum or an average), since the amount of data returned is very small regardless of how much data you process on TileDB Cloud.
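The size difference is easy to measure locally. This sketch compares a hypothetical UDF body that reduces one million float64 values to their mean with one that returns the raw values. `pickle` is used only as a stand-in for the wire format, which is an assumption. Both function names are illustrative.

```python
import pickle

import numpy as np


def mean_udf(n=1_000_000):
    """Hypothetical UDF body: scan n float64 values, return one number."""
    data = np.arange(n, dtype=np.float64)
    return float(data.mean())


def raw_udf(n=1_000_000):
    """Hypothetical UDF body that returns every value it scanned instead."""
    return np.arange(n, dtype=np.float64)


# Serialized size of what each variant would send back to the client.
reduced_size = len(pickle.dumps(mean_udf()))
raw_size = len(pickle.dumps(raw_udf()))
print(f"reduction: {reduced_size} bytes, raw values: {raw_size} bytes")
```

The reduction returns a few dozen bytes, while the raw variant returns more than 8 MB, and that gap grows linearly with `n`.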

TileDB Cloud currently supports only Python UDFs, but support for more languages will be added soon.

Each TileDB Cloud worker uses up to 2 CPUs and 2 GB of RAM to run your function. In the future, TileDB Cloud will let you choose the types of resources to run the UDF on.
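When deciding how large a slice one UDF can hold in memory, the 2 GB limit can be turned into an element count. This back-of-the-envelope sketch assumes "2 GB" means 2 GiB (2 × 1024³ bytes). It ignores the memory taken by the Python interpreter and the preinstalled libraries, so real headroom is smaller. `max_elements` and its `headroom` parameter are hypothetical.

```python
import numpy as np

# Assumption: the worker limit is 2 GiB; interpreter overhead is ignored.
WORKER_MEMORY_BYTES = 2 * 1024**3


def max_elements(dtype, memory_bytes=WORKER_MEMORY_BYTES, headroom=1.0):
    """Upper bound on how many values of `dtype` fit in `memory_bytes`.

    `headroom` (0 < headroom <= 1) reserves part of the memory for
    temporaries created while computing.
    """
    itemsize = np.dtype(dtype).itemsize
    return int(memory_bytes * headroom) // itemsize
```

For float64 data this gives 268,435,456 values. A computation that makes one temporary copy of its input should plan on roughly half that (`headroom=0.5`).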

Cloud API Access inside UDF

For convenience and security, generic UDFs (like array UDFs and serverless SQL) run with a temporary access token that is passed to TileDB through environment variables. TileDB can read configuration parameters from the environment, so TILEDB_REST_TOKEN and TILEDB_REST_SERVER_ADDRESS are set in the UDF container. This lets you use any API functionality from inside the UDF, including slicing your own arrays, without passing in any API tokens or credentials.
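The environment-variable mechanism can be sketched with the standard library alone. This is a minimal sketch. The variable-to-parameter mapping follows TileDB's convention, as we understand it, of prefixing `TILEDB_` and upper-casing the parameter name with dots replaced by underscores. `rest_settings_from_env` and `my_udf` are hypothetical names, and the array URI in the comment is a placeholder.

```python
import os

# Environment variable -> TileDB config parameter (assumed naming convention).
ENV_TO_CONFIG = {
    "TILEDB_REST_TOKEN": "rest.token",
    "TILEDB_REST_SERVER_ADDRESS": "rest.server_address",
}


def rest_settings_from_env(environ=os.environ):
    """Return the REST config parameters that are present in `environ`."""
    return {param: environ[var] for var, param in ENV_TO_CONFIG.items() if var in environ}


def my_udf():
    # Hypothetical UDF body: confirm the credentials were injected.
    settings = rest_settings_from_env()
    missing = set(ENV_TO_CONFIG.values()) - set(settings)
    if missing:
        raise RuntimeError(f"missing REST settings: {sorted(missing)}")
    # At this point a call such as tiledb.open("tiledb://<namespace>/<array>")
    # would pick these settings up without any token being passed in.
    # Return only the parameter names, never the token itself.
    return sorted(settings)
```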

When the UDF finishes, the temporary token is revoked and deleted. For additional security, the token also expires after 30 minutes. If you need to run a UDF for longer than 30 minutes, please contact us.
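Because the token expires after 30 minutes, a long-running UDF may prefer to watch the clock and return partial results rather than fail partway through. This is a minimal sketch that assumes the work splits into independent chunks. `process_with_deadline`, the 5-minute safety margin, and the injectable `clock` are illustrative choices, not TileDB features. The budget is measured from when the function starts, which is slightly later than when the token is created.

```python
import time

TOKEN_LIFETIME_S = 30 * 60  # token timeout stated above
SAFETY_MARGIN_S = 5 * 60    # illustrative buffer, not a TileDB setting


def process_with_deadline(chunks, work, budget_s=TOKEN_LIFETIME_S - SAFETY_MARGIN_S,
                          clock=time.monotonic):
    """Apply `work` to each chunk until the time budget runs out.

    Returns (results, finished) so the caller can tell whether every
    chunk was processed or the UDF stopped early.
    """
    start = clock()
    results = []
    for chunk in chunks:
        if clock() - start >= budget_s:
            return results, False
        results.append(work(chunk))
    return results, True
```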

Packages included in UDF environment

The UDF environment includes the following Python 3.7 packages:

numpy, pandas, numexpr, xarray, tiledb-py, pyinotify, requests, scipy, boto3

If you would like additional packages added to the UDF environment, please vote for your packages on our feedback request board.
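To see which version of a package the UDF environment provides before relying on a feature, a UDF can report what it imports. This is a minimal sketch, and `package_versions` is a hypothetical helper. It reads each module's `__version__` attribute instead of using `importlib.metadata`, because the latter only arrived in Python 3.8 and the environment above is Python 3.7. Note that tiledb-py is imported as `tiledb`.

```python
import importlib


def package_versions(names):
    """Map each module name to its __version__, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Running it remotely should only need the argument-passing form shown in the Usage section, for example `tiledb.cloud.udf.exec(package_versions, ["numpy", "pandas", "tiledb"])`.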

Usage

Below we show how to use Python UDFs in TileDB Cloud, with an example that uses numpy to compute the median of random numbers.

```python
import numpy
import random
import tiledb.cloud

def median():
    vals = []
    for i in range(0, random.randrange(1, 50)):
        # randrange(0, i + 1) keeps the range non-empty when i == 0
        vals.append(random.randrange(0, i + 1))
    return numpy.median(vals)

tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
res = tiledb.cloud.udf.exec(median)
print(res)
```

Passing Arguments To UDFs

The UDF can receive any number of arguments, with keyword arguments supported as well.

```python
import tiledb.cloud

def multi_args(arg1, arg2, arg3=None, arg4=None):
    # These will print in the logs of the UDF
    print("type(arg1)={}, arg1={}\n".format(type(arg1), arg1))
    print("type(arg2)={}, arg2={}\n".format(type(arg2), arg2))
    print("type(arg3)={}, arg3={}\n".format(type(arg3), arg3))
    print("type(arg4)={}, arg4={}\n".format(type(arg4), arg4))
    return

tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
res = tiledb.cloud.udf.exec(multi_args,
                            [1, 2, 3],
                            {"dictionary": "arg2_test"},
                            False,
                            arg4=True)
print(res)  # None, since the function returned nothing
# View the logs
print(tiledb.cloud.last_udf_task().logs)
```

Asynchronous Execution

As with array UDFs and serverless SQL, an asynchronous version of UDF execution is available: tiledb.cloud.udf.exec_async returns a future instead of the result.

```python
import numpy
import random
import tiledb.cloud

def median():
    vals = []
    for i in range(0, random.randrange(1, 50)):
        # randrange(0, i + 1) keeps the range non-empty when i == 0
        vals.append(random.randrange(0, i + 1))
    return numpy.median(vals)

tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
# res is a future
res = tiledb.cloud.udf.exec_async(median)
# Call res.get() to block until the result is available
print(res.get())
```
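Because the asynchronous call returns immediately, several UDFs can be in flight at once. The sketch below shows the same fan-out/gather pattern locally, with Python's standard `concurrent.futures` thread pool standing in for TileDB Cloud workers. Standard futures use `.result()` where the TileDB future above uses `.get()`. `run_many` is a hypothetical helper.

```python
import concurrent.futures
import random
import statistics


def median():
    # Equivalent to the median UDF above, using only the standard library.
    vals = [random.randrange(0, i + 1) for i in range(random.randrange(1, 50))]
    return statistics.median(vals)


def run_many(fn, n, max_workers=4):
    """Submit `fn` n times, then block until every result is available."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fn) for _ in range(n)]
        # Results are collected in submission order, whatever order they finish in.
        return [f.result() for f in futures]
```

Against TileDB Cloud, the same pattern should only need the calls from the example above: build a list with `tiledb.cloud.udf.exec_async(median)` several times, then call `.get()` on each future.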