1 of 8

TileDB Cloud Internals

This page group provides details on the TileDB Cloud internal mechanics. You can navigate from the menu on the left, or through the following links:

Architecture

This page describes the architecture of our TileDB Cloud SaaS offering.

Currently, TileDB Cloud (SaaS) runs on AWS, but in the future it will be deployed on other cloud providers. The principles around multiple cloud regions and cloud storage described in the architecture below are directly extendible to other settings (on the cloud or on premises).

Do you wish to run TileDB Cloud under your full control on premises or on the cloud? See .

The following figure outlines the TileDB Cloud architecture, which is comprised of the following components:

Automatic Redirection
Orchestration
UI Console
System State
REST Workers
Jupyter Notebooks

We explain each of those components below.

Automatic Redirection

TileDB Cloud maintains compute clusters in multiple cloud regions, geographically distributed across the globe. The reason is that users may store their data in cloud buckets located in different regions, and it is always faster and more economical to send the compute to the data; that eliminates egress costs, reduces latency and increases network speeds. However, users may not know which region the array they are accessing is located.

To facilitate sending the compute to the appropriate region, TileDB Cloud supports automatic redirection using the Cloudflare Workers service. This provides a scalable and serverless way to lookup the region of the array being accessed (maintaining a fast key-value store that is always in sync with the System State) and issue a 302 temporary redirect to the HTTP request. TileDB Open Source and the TileDB Cloud client will honor the redirection and send the the request to the TileDB Cloud service in the proper region (see Orchestration).

If your array lives in a cloud region unsupported by TileDB Cloud, the request is sent to us-east-1. We plan a future improvement to redirect to the nearest region instead.

Currently, automatic redirection is enabled by default, and the behavior can be controlled by using a configuration parameter. The user can also always dispatch any query directly to a specific region.

Orchestration

In every cloud region, TileDB Cloud maintains a Kubernetes cluster that carries out all tasks, properly autoscaling and load balancing to match capacity with demand based upon several factors. We use the Kubernetes built in metrics and monitoring toolchain to ensure pod memory usage is monitored and we have an accurate picture of the real world workloads at all times.

Currently supported regions:

us-east-1
us-west-2
eu-west-2
ap-southeast-1

In each region we use a variety of compute EC2 instance types, predominantly from m5, c5 and r5classes.

UI Console

The TileDB Cloud user interface console (https://cloud.tiledb.com) is a web app written in React that uses the REST Workers API across the same procedures and protocols as the clients. Many of the same routes are also used directly from one of the many clients, such a TileDB-Cloud-Py or TileDB-Cloud-R. The console web app autoscales based on the load, but currently it runs only inside the us-east-1cluster.

System State

TileDB Cloud maintains persistent state about user records, arrays, UDFs, billing, activity and more by using an always encrypted MariaDB instance. This instance is maintained in the us-east-1 region. In addition, this state is replicated and synced at all times with a read-only MariaDB instance maintained in every other supported region, in order to reduce latency for the queries executed in those regions.

REST Workers

TileDB Cloud's architecture is centered around a REST API Service. The service is a Go based application which provides all of the base functionality such as user management, authentication and access control, billing and monetization (via integration with Stripe), UDF execution, and serverless SQL orchestration used in TileDB Cloud. The REST Service is deployed in Kubernetes with a stateless design that allows for distributed orchestration and execution without the need for centralized coordination or locking.

The REST Service monitors resource usage and does its own book keeping in order to determine if it can service a request or if it should inform the client to retry later. By allowing the client to manage retries and with the high availability of the REST service architecture. TileDB Cloud is able to gracefully load balance and distribute the work across multiple instances.

The REST service handles the following types of serverless tasks, building upon the TileDB Open Source library:

Access (read/write)
SQL
UDFs

Jupyter Notebooks

TileDB Cloud offers hosted Jupyter notebooks by using Jupyter Docker Stacks for the base conda environments, and Jupyterhub / Zero to Jupyterhub K8S for the runtime environment. The notebooks are spawned inside Kubernetes using kubespawner to offer an isolated environment for each user with their own dedicated and persisted storage.

Currently, Jupyter notebooks can be spawned in the us-east-1 region, but soon TileDB Cloud will support multiple regions for notebooks.

Connectivity (Firewall) Requirements

TileDB Cloud runs over standard http connectivity, using tcp ports 80 and443. Connection made on port 80 are automatically redirected to https over port 443.

Open ID Connect

TileDB Cloud provides Open ID Connect support that can be used with any Open ID Connect compatible service. TileDB Cloud provide a fixed set of IP address used for the outbound request as part of the Open ID Connect sequence.

eu-west-2

13.41.67.254
18.134.194.194
18.135.61.196

us-west-2

35.81.95.218
54.185.206.57
54.189.31.204

us-east-1

52.21.38.106
54.87.160.2
52.70.6.129

ap-southeast-1

13.213.235.67
54.255.255.186
52.76.199.70

See Corporate SSO with TileDB Cloud SaaS if you are interested in enabling OIDC support for TileDB Cloud SaaS in your own environment.

Array Access

Array access refers to any read or write operation to an array registered with TileDB Cloud and referenced via its tiledb:// URI. Each array access is directed to a particular Kubernetes cluster in a specific cloud region as explained in . Then this request is assigned to a REST worker pod in an elastic and load balanced manner. That worker uses 16 CPU cores and sets the total result buffer size for TileDB Open Source to 2GB RAM.

The REST worker performs authentication (looking up the system state), logs all activity, manages billing and monetization, and enforces the access policies. Most importantly, each REST worker is totally stateless, and requires no synchronization or locking, allowing TileDB Cloud to scale very gracefully and quickly recover from failure via retry policies.

Access Control and Logging

One of the most powerful feature of TileDB Cloud is that it allows users to share arrays, UDFs and notebooks at extreme scale, with anyone on the planet, and with diverse polices (e.g., read, write, read/write). There are no restrictions on the number of users data and code can be shared with.

Currently, TileDB Cloud supports access policies at the array level. However, soon it will support finer-grained access policies at the cell level.

TileDB Cloud also enables users to create organizations, in order to better manage access to their assets and manage billing. You can create any number of organizations.

TileDB Cloud maintains a global system state using MariaDB, recording all information required to know which assets belong to which users and who has access to the various assets.

TileDB Cloud logs everything: the task types, the users that initiated them, duration, cost, etc. All this information gets logged by the REST workers into the persistent and encrypted MariaDB instance. The activity can then be browsed on the TileDB Cloud UI console or retrieved programmatically using the TileDB Cloud client. Six months of logs are made available for instant retrieval. Contact us if you need longer retention or ways to perform offline audits of historical logs for your organization.

By default, sessions on TileDB Cloud will timeout after 8 hours. SSO session timeout is controlled by organizational policies.

Serverless SQL

TileDB Cloud allows you to perform any SQL query on your TileDB arrays in a serverless manner. No need to spin up or tear down any computational resources. You just need to have the TileDB Cloud client installed (see Installation). You get charged only for the time it took to run the SQL operation.

TileDB Cloud currently supports serverless SQL only through Python, R, and Java, but support for more languages will be added soon.

TileDB Cloud receives your SQL query and executes it on a stateless REST worker that runs a warm MariaDB instance using the MyTile storage engine. The results of the SQL query can be returned directly to you (when using TileDB-Cloud-Py version 0.4.0 or newer) or they can be written back to an S3 array of your choice (either existing or new). Any array access happens on the same REST instance running the SQL query to optimize performance.

When results are returned directly, they are sent to the client in either JSON or Apache Arrow format, and in Python they are converted into a Pandas dataframe. This is most suitable for small results, such as aggregations or limit queries.

Writing results to an S3 array is necessary to allow processing of SQL queries with large results, without overwhelming the user (who may be on their laptop). The user can always open the created TileDB array and fetch any desired data afterwards.

Each TileDB Cloud REST worker running a SQL query uses 16 CPUs and has a limit of 2GB RAM. Therefore, you must consider "sharding" a SQL query so that each result fits in 2GB of memory (see Task Graphs). In the future, TileDB Cloud will offer flexibility in choosing the types of resources to run SQL on.

All SQL queries will time out after 15 minutes.

Serverless UDFs

TileDB Cloud allows you to run any lambda-like user-defined function (UDF). More specifically, you write the code on your laptop using the TileDB Cloud client (see Installation), your function gets shipped and executed on stateless TileDB Cloud REST workers. You get charged only for the time it took to run the function and the amount of data that got returned to your laptop. You do not need to worry about launching or managing any computational resources.

There are two types of supported UDFs:

Generic: These can include any code.
Array UDFs: These are UDFs that are applied to slices of one or more arrays.

Running UDFs is particularly useful if you want to perform reductions (such as a sum or an average), since the amount of data returned is very small regardless of how much data the UDF processes.

TileDB Cloud currently supports only Python and R UDFs, but support for more languages will be added soon.

TileDB Cloud runs your UDF in a separate dedicated container for security. Any array access is executed in parallel on the same REST worker but separate containers, and the results are sent to the UDF container using zero-copy techniques for performance.

We offer Python and R UDF images based on the following versions:

Python

Python 3.7 is deprecated in User Defined Functions and is no longer updated as of January 31st, 2024. Registered User Defined Functions under python 3.7 will continue to be available for execution with the packages listed on this page until August, 2024.

In the default environment that the UDF runs, we include the following Python packages:

Package

Version (Python 3.9.15)

Version (Python 3.7.12)

In the default environment that the UDF runs, we include the following R packages:

Geospacial image (geo) is based on Python images and include the following packages:

Genomics image (genomics) is based on Python images and include the following packages:

Imaging image (imaging-dev) is based on Python images and include the following packages:

Vector search image (vectorsearch) is based on Python images and includes the following packages:

If you would like additional packages added to the UDF environment, please leave your suggestion on our feedback request board.

Each UDF allows for the following configurations to be used:

In the future, TileDB Cloud will offer more flexibility in choosing the types of resources to run the UDF on.

All UDFs will time out by default after 15 minutes, the value is configurable when submitting a UDF by using the timeout parameter.

Task Graphs

TileDB Cloud allows you to build arbitrary (directed acyclic) task graphs to combine any number of different tasks into one workflow. You can combine serverless UDFs, SQL and array access along with even local execution of any function.

TileDB Cloud currently supports serverless task graphs only in Python, but support for more languages will be added soon.

The task graph is currently driven by the client. The client can be in a hosted notebook, your local laptop, or even a serverless UDF itself. The client manages the graph, and dispatches the execution of severless jobs or local functions.

Currently, there is no node-to-node communication in a task graph. TileDB does offer server side passing of inputs and outputs without round tripping to a client. This provides the ability to efficiently pass data between stages of the task graph.

The local driver uses the Python ThreadPoolExecutor by default to drive the tasks. The default number of workers is 4 * #cores on the client machine. Python allows multiple serverless tasks to run as they use asynchronous HTTP requests. Serverless tasks will scale elastically. As you request more tasks to be run, TileDB Cloud launches more resources to accommodate the tasks.

Local functions are subject to the Python GIL (interpreter lock) if the task graphs use the ThreadPoolExecutor (default). This limits the concurrency of local functions, however serverless functionality is minimally effected.

Jupyter Notebooks

TileDB Cloud enables the user to launch Jupyter notebooks within the UI console. It spins up Jupyter notebook instances in the Kubernetes cluster in us-east-1. The user can install any extra packages in the notebook. The notebook server environment is destroyed on shutdown. Any extra packages installed will not persist across server instances.

Every user gets a 2GB persistent storage in an EBS volume (also in us-east-1). This is mounted as the home directory in the notebook server. All contents in the home directory will persist across server restarts. The user does not get charged for storage!

Currently, TileDB offers two notebook server sizes:

As explained in the Pricing and Billing section, notebooks are charged based on the size of the notebook server and duration it is run for.

Currently notebook usage is charged either to an organization a user belongs to or, if the user is not part of an organization, to the user themselves. We plan a future improvement to allow selecting who to charge for the notebook usage.

TileDB Cloud offers three notebook images, with the following installed packages:

Basic Data Science:tiledb, libtiledb-sql-py, plotly, ipywidgets, graphviz, pandas, pydot, trimesh, numpy, chardet, numba, tiledb-r, voila, opencv, tiledb-cloud, pybabylonjs, envbash, tiledb-ml
Genomics: Everything in the Basic Data Science notebook plus:snakemake, tiledb-vcf, htslib, bcftools, pybedtools
Geospatial: Everything in the Basic Data Science notebook plus:cartopy, datashader, descartes, folium, geos, geotiff, holoviews, imagemagick, laszip, libnetcdf, proj, shapely, scikit-build, proj, gdal, rasterio, mb-system, pdal, fiona, geopandas, scikit-mobility, xarray, tiledb-segy, capella-tools

Architecture

This page describes the architecture of our TileDB Cloud SaaS offering.

Do you wish to run TileDB Cloud under your full control on premises or on the cloud? See .

The following figure outlines the TileDB Cloud architecture, which is comprised of the following components:

Automatic Redirection
Orchestration
UI Console
System State
REST Workers
Jupyter Notebooks

We explain each of those components below.

Automatic Redirection

If your array lives in a cloud region unsupported by TileDB Cloud, the request is sent to us-east-1. We plan a future improvement to redirect to the nearest region instead.

Orchestration

Currently supported regions:

us-east-1
us-west-2
eu-west-2
ap-southeast-1

In each region we use a variety of compute EC2 instance types, predominantly from m5, c5 and r5classes.

UI Console

System State

REST Workers

The REST service handles the following types of serverless tasks, building upon the TileDB Open Source library:

Access (read/write)
SQL
UDFs

Jupyter Notebooks

Currently, Jupyter notebooks can be spawned in the us-east-1 region, but soon TileDB Cloud will support multiple regions for notebooks.

Connectivity (Firewall) Requirements

TileDB Cloud runs over standard http connectivity, using tcp ports 80 and443. Connection made on port 80 are automatically redirected to https over port 443.

Open ID Connect

eu-west-2

13.41.67.254
18.134.194.194
18.135.61.196

us-west-2

35.81.95.218
54.185.206.57
54.189.31.204

us-east-1

52.21.38.106
54.87.160.2
52.70.6.129

ap-southeast-1

13.213.235.67
54.255.255.186
52.76.199.70

See Corporate SSO with TileDB Cloud SaaS if you are interested in enabling OIDC support for TileDB Cloud SaaS in your own environment.