TileDB Cloud allows you to perform any SQL query on your TileDB arrays in a serverless manner, with no need to spin up or tear down any computational resources. You just need to have the TileDB Cloud client installed (see Installation). You are charged only for the time the SQL operation takes to run.
TileDB Cloud currently supports serverless SQL only through Python, R, and Java, but support for more languages will be added soon.
TileDB Cloud receives your SQL query and executes it on a stateless REST worker that runs a warm MariaDB instance using the MyTile storage engine. The results of the SQL query can be returned directly to you (when using TileDB-Cloud-Py version 0.4.0 or newer) or they can be written back to an S3 array of your choice (either existing or new). Any array access happens on the same REST instance running the SQL query to optimize performance.
When results are returned directly, they are sent to the client in either JSON or Apache Arrow format, and in Python they are converted into a Pandas dataframe. This is most suitable for small results, such as aggregations or limit queries.
Writing results to an S3 array is necessary for SQL queries with large results, so that the results do not overwhelm the client machine (which may be a laptop). You can always open the created TileDB array and fetch any desired data afterwards.
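The two result modes described above could be driven from Python roughly as follows. This is a minimal sketch, not a definitive recipe: it assumes the TileDB Cloud Python client exposes `tiledb.cloud.sql.exec` with `query` and `output_uri` parameters, and that you are already logged in. The `run_sql` helper and its `exec_fn` argument are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Optional


def run_sql(
    query: str,
    output_uri: Optional[str] = None,
    exec_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Run a serverless SQL query, optionally writing results to an array.

    exec_fn is a hypothetical injection point (handy for dry runs). When it
    is omitted, this sketch assumes tiledb.cloud.sql.exec is available.
    """
    if exec_fn is None:
        # Assumption: the TileDB Cloud client is installed and logged in.
        from tiledb.cloud import sql
        exec_fn = sql.exec

    if output_uri is None:
        # Small result: returned directly to the client
        # (a Pandas dataframe in Python, per the text above).
        return exec_fn(query=query)

    # Large result: written to an array instead of being sent to the client.
    return exec_fn(query=query, output_uri=output_uri)
```

With a real account, a call would pass a query string that references an array URI of your own, and add `output_uri` only when the result is too large to return directly.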
Each TileDB Cloud REST worker running a SQL query uses 16 CPUs and has a limit of 2 GB of RAM. Therefore, you should consider "sharding" a large SQL query into smaller queries so that the result of each one fits in 2 GB of memory (see Task Graphs). In the future, TileDB Cloud will offer flexibility in choosing the types of resources to run SQL on.
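One way to picture sharding is to split a dimension's range into contiguous slices and issue one query per slice. The sketch below only builds the query strings. The helper names, the array URI, and the dimension name are hypothetical, and the number of shards needed to stay under the memory limit depends on your data, so it is left to the caller.

```python
from typing import List, Tuple


def shard_ranges(start: int, stop: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split the inclusive range [start, stop] into contiguous inclusive slices."""
    if num_shards < 1 or stop < start:
        raise ValueError("need num_shards >= 1 and stop >= start")
    total = stop - start + 1
    num_shards = min(num_shards, total)  # never create empty shards
    base, extra = divmod(total, num_shards)
    ranges = []
    lo = start
    for i in range(num_shards):
        # The first `extra` shards take one additional value each.
        size = base + (1 if i < extra else 0)
        ranges.append((lo, lo + size - 1))
        lo += size
    return ranges


def shard_queries(
    array_uri: str, dim: str, start: int, stop: int, num_shards: int
) -> List[str]:
    """Build one SELECT per shard, restricted to that shard's slice of `dim`."""
    return [
        f"SELECT * FROM `{array_uri}` WHERE `{dim}` BETWEEN {lo} AND {hi}"
        for lo, hi in shard_ranges(start, stop, num_shards)
    ]
```

Each generated query can then be submitted on its own, for example as a separate node in a task graph, so that no single result has to fit the whole dataset in memory.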
All SQL queries will time out after 15 minutes.