Serverless Compute

As explained in Planet-scale Sharing, slicing data from an array shared over TileDB Cloud is serverless and scales gracefully to accommodate any number of users on the planet. But what about other types of compute, such as advanced filtering, SQL queries, linear algebra, or any other advanced distributed operation?

Here are a few challenges with computing on the cloud at scale:

  • Spinning up and monitoring clusters on the cloud is cumbersome. Moreover, the user frequently does not know how many machines to provision in a cluster for a given workload. The result is either under-provisioning, which hurts performance, or over-provisioning, which wastes money on idle compute.

  • When you slice array data from TileDB Cloud only to further process it in your own compute environment, (1) you get charged for egress, and (2) your performance suffers from the extra network transfer between the TileDB Cloud machines and your own machines.

The above challenges motivated us to take advantage of the serverless infrastructure we had already built in TileDB Cloud for planet-scale sharing, and to offer a generic solution for building scalable distributed algorithms in a serverless and cost-effective manner.

The TileDB Cloud serverless platform offers the following benefits:

  • The user can define any user-defined function (UDF) or SQL query on arrays, which is executed within TileDB Cloud in a serverless manner, eliminating egress costs.

  • The user can define and execute task DAGs on TileDB Cloud in a serverless manner, which allows the implementation of arbitrarily sophisticated distributed algorithms.

  • The user can spin up Jupyter notebooks directly on the TileDB Cloud console for exploratory analysis.

  • The user gets charged in a pay-as-you-go manner, eliminating costs from idle compute.
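
To make the task DAG idea above concrete, here is a minimal local sketch in Python. It does not use the TileDB Cloud API: the `TaskGraph` class and its `add` and `run` methods are hypothetical names invented for illustration, and a local thread pool stands in for serverless workers. It only shows the general pattern of independent tasks running in parallel and a downstream task combining their results.

```python
from concurrent.futures import ThreadPoolExecutor


class TaskGraph:
    """Toy task DAG (hypothetical, not a TileDB Cloud class).

    Each node is a function plus the names of the nodes it depends on.
    """

    def __init__(self):
        self._nodes = {}  # name -> (func, tuple of dependency names)

    def add(self, name, func, deps=()):
        self._nodes[name] = (func, tuple(deps))

    def run(self, max_workers=4):
        results = {}
        pending = dict(self._nodes)
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            while pending:
                # A node is ready once all of its dependencies have results.
                ready = [
                    name
                    for name, (_, deps) in pending.items()
                    if all(d in results for d in deps)
                ]
                if not ready:
                    raise ValueError("cycle or missing dependency in task graph")
                # Submit every ready node at once so they run in parallel,
                # passing dependency results as positional arguments.
                futures = {
                    name: pool.submit(
                        pending[name][0],
                        *[results[d] for d in pending[name][1]],
                    )
                    for name in ready
                }
                for name, future in futures.items():
                    results[name] = future.result()
                    del pending[name]
        return results


# Split the numbers 1..100 into four chunks, sum each chunk in its own
# task, then combine the partial sums in a final task.
data = list(range(1, 101))
chunks = [data[i:i + 25] for i in range(0, len(data), 25)]

graph = TaskGraph()
for i, chunk in enumerate(chunks):
    graph.add(f"partial_{i}", lambda c=chunk: sum(c))
graph.add(
    "total",
    lambda *parts: sum(parts),
    deps=[f"partial_{i}" for i in range(len(chunks))],
)

results = graph.run()
print(results["total"])  # 5050
```

In a serverless setting, each node would run as a separate task on the platform rather than on a local thread, so nothing is provisioned in advance and nothing sits idle between tasks.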