Serverless Array UDFs

Basic Usage

Below we show how to use Python UDFs in TileDB Cloud, with an example that computes the median of the values of attribute a on a slice of a 2D dense array. You write a function (median in this example) that takes as input an ordered dictionary of numpy arrays, i.e., of the form {"a" : <numpy-array>, "b" : <numpy-array>, ...}, where the keys are attribute or dimension names of the array you are querying. The reason is that the function is applied to an array slice; recall that, upon a read, the TileDB Python API returns an ordered dictionary with one numpy array per attribute and dimension. You then call the apply function of the TileDB Cloud client, which takes as input your function, a slice, and optionally a list of attributes (the default is all attributes). Note that only the selected attributes appear in the ordered dictionary passed to your function.

Python
import tiledb, tiledb.cloud, numpy
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
with tiledb.open("tiledb://TileDB-Inc/quickstart_dense", ctx=tiledb.cloud.Ctx()) as A:
    # apply on subarray [1,2]x[1,2]
    res = A.apply(median, [(1,2), (1,2)], attrs=["a"])
    print(res)

All slices provided as input to the apply function are inclusive.
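Because the UDF is an ordinary Python function over a dictionary of numpy arrays, it can be checked locally before it is sent to TileDB Cloud. The sketch below is illustrative only: fake_slice is a hypothetical stand-in for what a [1,2]x[1,2] slice with attrs=["a"] might contain, and its values are made up.

```python
from collections import OrderedDict

import numpy


def median(numpy_ordered_dictionary):
    # Same UDF as in the example above: reduce attribute "a" to one value.
    return numpy.median(numpy_ordered_dictionary["a"])


# Hypothetical stand-in for the ordered dictionary a 2x2 slice would yield.
# The values are invented for illustration; no array is read here.
fake_slice = OrderedDict(a=numpy.array([[1, 2], [5, 6]]))

local_result = median(fake_slice)
print(local_result)
```

Testing the function this way only exercises its logic; it does not involve login, slicing, or serverless execution.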

Multi-Index Usage

Multi-index queries are supported when applying an array UDF. You can pass any number of tuples or slices using a list-of-lists syntax, with one inner list per dimension.

Python
import tiledb, tiledb.cloud, numpy
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
with tiledb.open("tiledb://TileDB-Inc/quickstart_dense", ctx=tiledb.cloud.Ctx()) as A:
    # apply on subarrays [1,2]x[1,4] and [4,4]x[1,4]
    res = A.apply(median, [[(1,2), 4], [slice(1,4)]], attrs=["a"])
    print(res)

All slices provided as input to the apply function are inclusive.
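To make the inclusive semantics concrete, the following sketch expands one dimension's selection into the coordinates it covers. The helper expand_inclusive is hypothetical and written only for this illustration; it is not part of TileDB, and it assumes integer dimensions and slices without a step.

```python
def expand_inclusive(selection):
    """Expand one dimension's multi-index selection into coordinates.

    Hypothetical helper for illustration only; not a TileDB function.
    Tuples and slices are treated as inclusive on both ends.
    """
    coords = []
    for item in selection:
        if isinstance(item, slice):
            coords.extend(range(item.start, item.stop + 1))
        elif isinstance(item, tuple):
            start, stop = item
            coords.extend(range(start, stop + 1))
        else:
            # A bare scalar selects a single coordinate.
            coords.append(item)
    return coords


# The selection from the example above: [[(1,2), 4], [slice(1,4)]]
rows = expand_inclusive([(1, 2), 4])
cols = expand_inclusive([slice(1, 4)])
print(rows, cols)
```

Note the contrast with plain Python slicing, where slice(1, 4) would stop before 4.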

Apply Without Opening The Array

You do not always need to open the array locally to execute an array UDF. The tiledb.cloud.array.apply function applies the UDF directly to an array URI.

Python
import tiledb, tiledb.cloud, numpy
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
# apply on subarrays [1,2]x[1,4] and [4,4]x[1,4]
res = tiledb.cloud.array.apply("tiledb://TileDB-Inc/quickstart_dense", median, [[(1,2), 4], [slice(1,4)]], attrs=["a"])
print(res)

Asynchronous Execution

Array UDFs can also be executed asynchronously with apply_async, which returns a future instead of blocking until the result is ready.

Python
import tiledb, tiledb.cloud, numpy
# An array UDF receives the ordered dictionary of numpy arrays for the slice
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
with tiledb.open("tiledb://TileDB-Inc/quickstart_dense", ctx=tiledb.cloud.Ctx()) as A:
    # apply on subarrays [1,2]x[1,4] and [4,4]x[1,4]
    # res will be a future
    res = A.apply_async(median, [[(1,2), 4], [slice(1,4)]], attrs=["a"])
    # call res.get() to block on the results
    print(res.get())
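The submit-then-get pattern can be tried locally with the standard library. The sketch below uses multiprocessing.pool.ThreadPool, whose apply_async also returns an object with a blocking get() method. It is only an analogy for the calling pattern and makes no claim about the type of future TileDB Cloud returns; fake_slice is a made-up input.

```python
from multiprocessing.pool import ThreadPool

import numpy


def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])


# Made-up input standing in for an array slice.
fake_slice = {"a": numpy.array([1, 2, 3, 4])}

with ThreadPool(processes=1) as pool:
    # Submit the call; this returns immediately with a pending result.
    pending = pool.apply_async(median, (fake_slice,))
    # Other work could run here while the task executes.
    value = pending.get()  # blocks until the result is ready

print(value)
```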

Selecting Who to Charge

If you are a member of an organization, then by default the organization is charged for your array UDF. To charge the array UDF to yourself instead, pass the extra argument namespace with your username.

Python
import tiledb, tiledb.cloud, numpy
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
with tiledb.open("tiledb://TileDB-Inc/quickstart_dense", ctx=tiledb.cloud.Ctx()) as A:
    # apply on subarray [1,2]x[1,2]
    res = A.apply(median, [(1,2), (1,2)], attrs=["a"], namespace="my_username")
    print(res)

Registering Array UDFs

You can register an array UDF (similar to arrays) as follows:

Python
import tiledb, tiledb.cloud, numpy
def median(numpy_ordered_dictionary):
    return numpy.median(numpy_ordered_dictionary["a"])
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
tiledb.cloud.udf.register_single_array_udf(median, name="median_test", namespace="my_username")

To register a UDF, you first need to set up the default storage path for yourself and/or your organization.

Multi-Array UDFs

TileDB Cloud also provides multi-array UDFs, i.e., UDFs that are applied to more than one array.

import numpy as np
import tiledb
import tiledb.cloud
array_1 = "tiledb://TileDB-Inc/array_1"
array_2 = "tiledb://TileDB-Inc/array_2"
def median(numpy_ordered_dictionary_list):
    # When you have multiple arrays, the parameter
    # we pass in is actually a list of ordered dictionaries.
    # The list is in the order of the arrays you asked for.
    # The two slices below have different shapes, so their values are
    # flattened and pooled rather than added elementwise.
    return np.median(
        np.concatenate([
            numpy_ordered_dictionary_list[0]["a"].ravel(),
            numpy_ordered_dictionary_list[1]["a"].ravel(),
        ])
    )
# The following will create the list of arrays to take part
# in the multi-array UDF. Each has as input the array name,
# a multi-index for slicing and a list of attributes to subselect on.
array_list = tiledb.cloud.array.ArrayList()
array_list.add(array_1, [(1, 4), (1, 4)], ["a"])
array_list.add(array_2, [(1, 2), (1, 4)], ["a"])
# This will execute `median` using as input the result of the
# slicing and subselection for each of the arrays in `array_list`
res = tiledb.cloud.array.exec_multi_array_udf(median, array_list)
print("Median Multi-array UDF:\n{}\n".format(res))
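A multi-array UDF can likewise be exercised locally by passing it a hand-built list of ordered dictionaries. In the sketch below, fake_inputs is a hypothetical stand-in for the two slices above (a 4x4 and a 2x4 block of attribute a) with made-up values. Elementwise + on numpy arrays requires broadcastable shapes, which these two slices do not have, so this sketch pools the flattened values with np.concatenate; that choice is an assumption about the intended result.

```python
from collections import OrderedDict

import numpy as np


def median(numpy_ordered_dictionary_list):
    # One ordered dictionary per array, in the order the arrays were added.
    first, second = numpy_ordered_dictionary_list
    # Flatten both slices and pool their values before taking the median.
    pooled = np.concatenate([first["a"].ravel(), second["a"].ravel()])
    return np.median(pooled)


# Made-up data shaped like the two slices: 4x4 and 2x4.
fake_inputs = [
    OrderedDict(a=np.arange(1, 17).reshape(4, 4)),
    OrderedDict(a=np.arange(1, 9).reshape(2, 4)),
]

combined = median(fake_inputs)
print(combined)
```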

You can register a multi-array UDF simply as follows:

Python
import tiledb, tiledb.cloud
import numpy as np
def median(numpy_ordered_dictionary_list):
    # When you have multiple arrays, the parameter
    # we pass in is actually a list of ordered dictionaries.
    # The list is in the order of the arrays you asked for.
    # Flatten and pool the values so slices of different shapes work.
    return np.median(
        np.concatenate([
            numpy_ordered_dictionary_list[0]["a"].ravel(),
            numpy_ordered_dictionary_list[1]["a"].ravel(),
        ])
    )
tiledb.cloud.login(username="my_username", password="my_password")
# or tiledb.cloud.login(token="my_token")
tiledb.cloud.udf.register_multi_array_udf(median, name="median_multi_array", namespace="my_username")

Retry Settings

See Retry Settings.