Serverless Compute 101

In this tutorial, you will learn:

  1. How to access (slice) a public array.

  2. How to perform a SQL query on a public array.

  3. How to run serverless UDFs on a public array.

We will use the public TileDB Cloud array TileDB-Inc/nyc_tlc_yellow_trip_data_2019, which stores the NYC yellow taxi trip data for 2019. The original data is in CSV format, ~7GB in total, and was converted into a 1D sparse TileDB array that compresses down to ~1GB. The array's single sparse dimension is tpep_pickup_datetime, which means that the array supports very fast range slicing on that column of the dataset.
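As a rough illustration of what range slicing on tpep_pickup_datetime could look like, here is a minimal sketch. It assumes the tiledb and tiledb-cloud Python packages are installed and that you are already logged in. The helper names pickup_window and slice_pickups are hypothetical, and the conversion of the bounds to the dimension's datetime resolution is assumed to be handled by the library.

```python
from datetime import datetime, timedelta

# URI form assumed for TileDB Cloud arrays: tiledb://<namespace>/<array_name>
ARRAY_URI = "tiledb://TileDB-Inc/nyc_tlc_yellow_trip_data_2019"


def pickup_window(start_iso, hours=1):
    """Return inclusive (start, end) datetimes spanning `hours` from start_iso.

    The end bound is pulled back by one second because TileDB's
    multi-range indexers treat both ends of a slice as inclusive.
    """
    start = datetime.fromisoformat(start_iso)
    end = start + timedelta(hours=hours) - timedelta(seconds=1)
    return start, end


def slice_pickups(start_iso, hours=1, attrs=None):
    """Slice the public array on its pickup-time dimension (hypothetical helper).

    Third-party imports are kept inside the function so that the date
    arithmetic above can be used without TileDB installed.
    """
    import numpy as np
    import tiledb
    import tiledb.cloud

    start, end = pickup_window(start_iso, hours)
    # tiledb.cloud.Ctx() carries the credentials set up at login.
    with tiledb.open(ARRAY_URI, ctx=tiledb.cloud.Ctx()) as arr:
        # attrs=None selects all attributes; pass a list to fetch fewer columns.
        return arr.query(attrs=attrs).df[np.datetime64(start):np.datetime64(end)]
```

Calling slice_pickups("2019-03-01T08:00:00", hours=2) would then request only the trips picked up in that two-hour window, rather than scanning the whole array.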

Running on TileDB Cloud

You can preview this tutorial as a TileDB Cloud notebook (no login is needed). You can also launch it from the TileDB Cloud UI console, but you will need to sign up or log in to do so.

Running on your own client

You can run all the commands of this notebook in your own client. The only changes required are:

  1. Install the TileDB Cloud client

  2. Log in using the TileDB Cloud client before running any notebook command
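The two steps above can be sketched as follows. This is only one possible setup: it assumes the client was installed with pip install tiledb-cloud, that you created an API token in the TileDB Cloud console, and that you exported it in an environment variable (TILEDB_REST_TOKEN is used here as an assumption). The helper names get_api_token and login_to_tiledb_cloud are hypothetical.

```python
import os

# Assumed name of the environment variable that holds your API token.
TOKEN_ENV_VAR = "TILEDB_REST_TOKEN"


def get_api_token(env=None):
    """Read the API token from the environment, failing early if it is missing."""
    env = os.environ if env is None else env
    token = env.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {TOKEN_ENV_VAR} to a TileDB Cloud API token before logging in."
        )
    return token


def login_to_tiledb_cloud():
    """Log in once per client, before running any notebook command."""
    token = get_api_token()
    # Imported here so the token check above works without the client installed.
    import tiledb.cloud

    tiledb.cloud.login(token=token)
```

Reading the token from the environment keeps credentials out of the notebook itself; tiledb.cloud.login also accepts a username and password if you prefer not to create a token.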