Serverless UDFs

TileDB Cloud allows you to run any lambda-like user-defined function (UDF). More specifically, you write the code on your laptop using the TileDB Cloud client (see Installation), and your function is shipped to and executed on stateless TileDB Cloud REST workers. You are charged only for the time it takes to run the function and the amount of data returned to your laptop. You do not need to launch or manage any computational resources.
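As a minimal sketch of this workflow, assuming the Python client exposes `tiledb.cloud.login` and `tiledb.cloud.udf.exec` as described in its documentation, a generic UDF can be tried locally and then shipped. The names `mean_of_squares`, `run_remotely`, and `token` are illustrative and not part of TileDB Cloud.

```python
def mean_of_squares(n):
    """Generic UDF: plain Python that only depends on its arguments."""
    values = [i * i for i in range(n)]
    return sum(values) / len(values)


def run_remotely(token):
    """Ship the function to TileDB Cloud (needs an account and an API token)."""
    # Imported lazily so the local dry run below works without the client.
    import tiledb.cloud

    tiledb.cloud.login(token=token)
    # Positional arguments after the function are forwarded to the UDF.
    return tiledb.cloud.udf.exec(mean_of_squares, 1000)


# Local dry run before shipping: (0 + 1 + 4 + 9) / 4
local_result = mean_of_squares(4)
```

Only the return value of the function travels back to the laptop, which is why the billing above depends on run time and returned data.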

There are two types of supported UDFs:

  • Generic UDFs: These can include any code.

  • Array UDFs: These are UDFs that are applied to slices of one or more arrays.

Running UDFs is particularly useful if you want to perform reductions (such as a sum or an average), since the amount of data returned is very small regardless of how much data the UDF processes.
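The reduction case can be sketched as follows, assuming `tiledb.cloud.array.apply` accepts an array URI, a function, a list of per-dimension ranges, and an `attrs` list, as shown in the client documentation. The attribute name `"a"`, the ranges, and the helper names are hypothetical. The sketch also assumes the slice reaches the UDF as a mapping from attribute name to a one-dimensional sequence of values, as with a sparse array.

```python
def attribute_mean(data):
    """Array UDF: reduce the sliced values of attribute "a" to one number."""
    values = data["a"]
    return sum(values) / len(values)


def run_on_array(array_uri, token):
    """Apply the reduction to a slice of a registered array."""
    # Imported lazily so the local check below works without the client.
    import tiledb.cloud

    tiledb.cloud.login(token=token)
    # Slice [1, 2] on each of two dimensions, reading only attribute "a".
    return tiledb.cloud.array.apply(
        array_uri, attribute_mean, [(1, 2), (1, 2)], attrs=["a"]
    )


# Local check with a stand-in for the slice the UDF would receive.
sample_mean = attribute_mean({"a": [1, 2, 3, 6]})
```

However large the slice is, the UDF returns a single number, so the data transferred back stays small.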

TileDB Cloud currently supports only Python and R UDFs, but support for more languages will be added soon.

TileDB Cloud runs your UDF in a separate dedicated container for security. Any array access is executed in parallel on the same REST worker but in separate containers, and the results are sent to the UDF container using zero-copy techniques for performance.

We offer Python and R UDF images based on the following versions:

Python                 R
3.9.15                 4.3.2
3.7.12 (Deprecated)

Python 3.7 is deprecated in user-defined functions and is no longer updated as of January 31, 2024. UDFs registered under Python 3.7 will remain available for execution, with the packages listed on this page, until August 2024.

In the default environment in which the UDF runs, we include the following Python packages:

Package            Version (Python 3.9.15)   Version (Python 3.7.12)
numpy              1.23.5                    1.21.6
pandas             1.5.3                     1.3.5
tensorflow         2.11.0                    1.14.0
numexpr            2.8.7                     2.8.3
numba              0.59.1                    0.56.3
xarray             2024.3.0                  0.20.2
tiledb             0.30.2                    0.23.4
scipy              1.13.1                    1.7.3
boto3              1.34.106                  1.25.0
tiledbvcf          0.32.0                    0.26.6
tiledbsoma         1.12.3                    1.5.2
cellxgene-census   1.14.1                    1.9.0
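Because the installed versions depend on the image, a UDF can report them itself. The sketch below uses only the standard library. The function name `report_environment` and its default package list are illustrative.

```python
import sys
from importlib import metadata


def report_environment(packages=("numpy", "pandas", "tiledb")):
    """Return the Python version and the installed version of each package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {"python": sys.version.split()[0], "packages": versions}
```

Run as a generic UDF, this returns the versions seen inside the UDF container rather than on the laptop. `importlib.metadata` requires Python 3.8 or later, so the sketch does not apply to the deprecated Python 3.7 image.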

In the default environment in which the UDF runs, we include the following R packages:

Package                Version (R 4.3.2)
Rcpp                   1.0.12
tiledb                 0.28.2
tiledbsoma             1.12.3
curl                   5.2.1-1
RcppSpdlog             0.0.17-1
jsonlite               1.8.8-1
base64enc              0.1-3
R6                     2.5.1
httr                   1.4.7
mmap                   0.6-22
remotes                2.4.2.1
SeuratObject           5.0.2
BiocManager            1.30.22
SingleCellExperiment   1.26.0

Geospatial image (geo) is based on Python images and includes the following packages:

Package           Version
PDAL              3.4.3
rasterio          1.4.a1
fiona             1.9.5
geopandas         0.14.4
scikit-mobility   1.1.2
xarray            2024.2.0
tiledb-cf         0.9.1
tiledb-segy       0.3.0

Genomics image (genomics) is based on Python images and includes the following packages:

Package     Version (3.9.15)   Version (3.7.10)
bwa         0.7.18             0.7.17
java-jdk    8.0.112            1.8.0.112
picard      3.0.0              3.0.0
samtools    1.19.0             1.16.1
sra-tools   3.1.1              3.0.9
gatk4       4.3.0.0            N/A

Imaging image (imaging-dev) is based on Python images and includes the following packages:

Package            Version (3.9.15)   Version (3.7.10)
tiledb-bioimg      0.2.11             0.2.7
scikit-image       0.22.0             0.22.0
openslide          3.4.1              3.4.1
openslide-python   1.3.1              1.3.1
simpleitk          1.19.0             1.16.1

Vector search image (vectorsearch) is based on Python images and includes the following packages:

Package                Version (3.9.15)
tiledb-vector-search   0.7.0
langchain              0.1.20
langchain-openai       0.0.8
huggingface_hub        0.23.4
openai                 1.14.3
pypdf                  3.17.4
beautifulsoup4         4.12.3
tiktoken               0.5.2
PyMuPDF                1.24.7
transformers           4.42.4
orjsonl                1.0.0
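To run a UDF in one of these images instead of the default one, the image name has to be supplied at submission time. The sketch below assumes that the names in parentheses above (geo, genomics, imaging-dev, vectorsearch) are the values accepted by the `image_name` keyword of `tiledb.cloud.udf.exec`. The helpers `exec_options` and `run_in_image` are hypothetical.

```python
# Image names as they appear in parentheses in the sections above.
KNOWN_IMAGES = ("geo", "genomics", "imaging-dev", "vectorsearch")


def exec_options(image=None):
    """Build keyword arguments for a UDF submission."""
    if image is None:
        return {}  # fall back to the default image
    if image not in KNOWN_IMAGES:
        raise ValueError(f"unknown image {image!r}; expected one of {KNOWN_IMAGES}")
    return {"image_name": image}


def run_in_image(func, image, *args):
    """Submit `func` so that it runs in the chosen image."""
    # Imported lazily so the helper above can be checked without the client.
    import tiledb.cloud

    return tiledb.cloud.udf.exec(func, *args, **exec_options(image))
```

Validating the name locally catches a typo before a task is submitted.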

If you would like additional packages added to the UDF environment, please leave your suggestion on our feedback request board.

Each UDF can run with one of the following resource configurations:

Type                 CPU (max)   RAM (max)
standard (Default)   2           2 GB
large                8           8 GB
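As a rough sketch of how to choose between these configurations, the helper below picks the smallest type whose limits cover an estimated need. The helper name and its inputs are illustrative. The result is assumed to be passed to the submission call through a `resource_class` keyword, for example `tiledb.cloud.udf.exec(func, resource_class="large")`.

```python
# (type, max CPUs, max RAM in GB), smallest first, taken from the table above.
RESOURCE_CLASSES = (("standard", 2, 2), ("large", 8, 8))


def pick_resource_class(cpus, ram_gb):
    """Return the smallest type that satisfies the requested CPUs and RAM."""
    for name, max_cpus, max_ram_gb in RESOURCE_CLASSES:
        if cpus <= max_cpus and ram_gb <= max_ram_gb:
            return name
    raise ValueError(f"no configuration offers {cpus} CPUs and {ram_gb} GB of RAM")
```

For instance, a UDF that needs 2 CPUs but 4 GB of RAM exceeds the standard RAM limit, so the helper returns `"large"`.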

In the future, TileDB Cloud will offer more flexibility in choosing the types of resources to run the UDF on.

By default, all UDFs time out after 15 minutes. You can change this value when submitting a UDF by using the timeout parameter.
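A minimal sketch of setting the limit follows. It assumes the `timeout` keyword of `tiledb.cloud.udf.exec` is expressed in seconds, which should be checked against the client reference. The helper names are hypothetical.

```python
DEFAULT_TIMEOUT_MINUTES = 15  # default stated above


def timeout_seconds(minutes=DEFAULT_TIMEOUT_MINUTES):
    """Convert a limit in minutes to seconds (assumed unit of `timeout`)."""
    if minutes <= 0:
        raise ValueError("timeout must be positive")
    return int(minutes * 60)


def run_with_timeout(func, minutes, *args):
    """Submit `func` with a custom time limit."""
    # Imported lazily so the conversion can be checked without the client.
    import tiledb.cloud

    return tiledb.cloud.udf.exec(func, *args, timeout=timeout_seconds(minutes))
```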
