We have created two handy scripts for setting up TileDB-Spark and Apache Arrow on an EMR cluster. Arrow is optional but will increase performance if you use PySpark or SparkR.
EMR requires that the bootstrap scripts be copied to an S3 bucket. You can sync the scripts from our TileDB-Spark repo to S3 as follows:
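For example, using the AWS CLI (the bucket and paths are illustrative, assuming the scripts live under `emr_bootstrap/` in your checkout):

```bash
aws s3 sync /path/to/TileDB-Spark/emr_bootstrap/ s3://my_bucket/path/emr_bootstrap/
```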
Create the EMR cluster as follows:
From the AWS EMR console, click on "Create Cluster".
Click on link "Go to advanced options".
In Step 1, make sure Spark is selected.
In Step 3, click on "Bootstrap Actions", then select a custom action, and click on "Configure and add". For the "Script location", you will need to point to where you have uploaded the bootstrap scripts, e.g., `s3://my_bucket/path/emr_bootstrap/install-tiledb-spark.sh`.
Continue to create the cluster. It typically takes 10-20 minutes for the cluster to be ready.
Follow the same procedure as above, but in Step 3 add one more bootstrap action, providing the location of our CRAN packages script, e.g., `s3://my_bucket/path/emr_bootstrap/install-cran-packages.sh`. Moreover, under "Optional arguments" you must add `--packages arrow` (optionally adding any other CRAN packages of your choice).
TileDB-Spark provides a metric source to collect timing and input metric details. This can be helpful in tracking performance of TileDB and the TileDB-Spark driver.
In Step 1 of the EMR launch cluster console, there is a section "Edit software settings". Paste the following JSON config, which will enable the Spark metrics source from TileDB:
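A sketch of such a configuration, assuming the standard EMR `spark-metrics` classification (the class name is the metrics source shipped with TileDB-Spark; adjust as needed):

```json
[
  {
    "Classification": "spark-metrics",
    "Properties": {
      "*.source.tiledb.class": "org.apache.spark.metrics.TileDBMetricsSource"
    }
  }
]
```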
Spark is a very popular analytics engine for large-scale data processing. It allows users to create distributed arrays and dataframes, use machine learning libraries, perform SQL queries, etc. TileDB-Spark is TileDB's datasource driver for Spark, which allows the user to create distributed Spark dataframes from TileDB arrays and, thus, process TileDB data with familiar tooling at great scale with minimal code changes.
TileDB offers a prebuilt uber jar that contains all dependencies. This can be used on most Spark clusters to enable the TileDB-Spark datasource driver.
The latest jars can be downloaded from GitHub.
Compiling TileDB-Spark from source is simple:
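A sketch of the build, assuming a Gradle checkout (the `build/libs` output path below indicates Gradle):

```bash
git clone https://github.com/TileDB-Inc/TileDB-Spark.git
cd TileDB-Spark
./gradlew assemble   # produces build/libs/tiledb-spark-<version>.jar
```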
This will create a jar file `/path/to/TileDB-Spark/build/libs/tiledb-spark-<version>.jar`.
To launch a Spark shell with TileDB-Spark enabled, simply point Spark to the jar you have obtained:
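For example:

```bash
spark-shell --jars /path/to/TileDB-Spark/build/libs/tiledb-spark-<version>.jar
```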
Reporting metrics is supported via Dropwizard and the default Spark metrics setup. The metrics can be enabled by adding the following lines to your `/etc/spark/conf/metrics.properties` file:
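A sketch of the lines to add, registering the source for all instances (driver and executors); the class name is the one discussed below:

```properties
*.source.tiledb.class=org.apache.spark.metrics.TileDBMetricsSource
```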
When loading an application jar (i.e., via the `--jars` CLI flag when launching a Spark shell), the metrics are available to the master node, and the driver metrics will be reported. However, the executors will fail with a class-not-found error, because on each worker node a jar containing `org.apache.spark.metrics.TileDBMetricsSource` must be provided on the class path. To address this, you must copy our dedicated `path/to/TileDB-Spark/build/libs/tiledb-spark-metrics-<version>.jar` to `$SPARK_HOME/jars/`.
Spark has a large number of configuration parameters that can affect the performance of both the TileDB driver and the user application. On this page we provide some performance tuning tips.
TileDB-Spark uses TileDB-Java and the underlying C++ `libtiledb` for parallel I/O, compression and encryption. As such, for optimized read performance you should limit the Spark executors to one per machine and give that single executor all the resources of the machine.
Set the following Spark configuration parameters:
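A sketch for a hypothetical 16-core, 64GB machine (values are illustrative; adjust to your hardware):

```properties
spark.executor.instances  <number of machines>
spark.executor.cores      16
spark.task.cpus           16
spark.executor.memory     51g
```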
It is important to set the Spark task CPUs to be the same as the number of executor cores. This prevents Spark from putting more than one read partition on each machine. The executor memory is set to 80% of available memory to allow for overhead on the host itself.
If using Yarn, the above configuration parameters are likely not enough. You will need to also configure Yarn similarly.
There are two main TileDB driver options to tweak for optimizing reads: `partition_count` and `read_buffer_size`. The `partition_count` should be set to the number of executors, which is the number of machines in the cluster. The `read_buffer_size` should be set to at least 104857600 (100MB). Larger read buffers are critical to reduce the number of incomplete queries. The maximum size of the read buffers is limited by the available RAM. If you use Yarn, the maximum buffer is also constrained by `spark.yarn.executor.memoryOverhead`; TileDB read/write buffers are stored off-heap.
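For example, on a hypothetical 8-machine cluster (assuming the `io.tiledb.spark` data source name and an active SparkSession `spark`):

```python
df = (spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")   # hypothetical array URI
      .option("partition_count", 8)               # one partition per machine
      .option("read_buffer_size", 104857600)      # 100MB per attribute
      .load())
```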
There are applications that rely on using multiple Spark tasks for parallelism, constraining each task to run on a single thread. This is common for most PySpark and SparkR applications. Below we describe how to configure Spark and the TileDB data source for optimized performance in this case.
Set the following Spark configuration parameters:
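A sketch for a hypothetical 16-core machine, yielding one single-threaded task per core:

```properties
spark.executor.cores  16
spark.task.cpus       1
```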
If you use Yarn, the above configuration parameters are likely not enough. You will need to also configure Yarn similarly.
The `read_buffer_size` should be set to the largest value possible given the executor's available memory. TileDB typically has a memory overhead of 3x, so 3 * `read_buffer_size` should be less than Spark's off-heap maximum memory. If you use Yarn, this value is defined in `spark.yarn.executor.memoryOverhead`. The default `read_buffer_size` of 10MB is usually sufficient.
The `partition_count` should be set to the data size being read / `read_buffer_size`. If the data size is not known, then set the partition count to the number of executors. This might lead to over-partitioning, so you might want to try different values until you find an optimal partitioning for your dataset.
Finally, it is important to set several of the TileDB parallelism configuration parameters in the Spark `option()` dataframe commands upon reading:
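A sketch using the `tiledb.*` option pass-through (the array URI is hypothetical; `tiledb.vfs.num_threads` is the example parameter from the driver options list):

```python
df = (spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")
      .option("tiledb.vfs.num_threads", 1)   # single-threaded VFS I/O per task
      .load())
```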
The TileDB-Spark data source allows you to specify a partition count when reading a TileDB array into a distributed Spark dataframe, via the `partition_count` option. An example is shown below. This creates evenly sized partitions across all array dimensions, based on the volume of the subarray, in order to balance the computational load across the workers.
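A minimal sketch (the array URI is hypothetical):

```python
df = (spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")
      .option("partition_count", 4)
      .load())
```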
`dask.delayed` is a powerful feature of Dask that allows you to create arbitrary task graphs and submit them to Dask's scheduler for execution. You can be truly creative with this functionality and implement sophisticated out-of-core computations (i.e., on larger-than-RAM datasets) and handle highly distributed workloads.
There is no special integration needed with TileDB, as `dask.delayed` is quite generic and can work with any user-defined task. We just point out here that you can use TileDB array slicing in a delayed task, which allows you to process truly large TileDB arrays on your laptop or on a large cluster.
We include a very simple example below, stressing though that one can implement much more complex algorithms on arbitrarily large TileDB arrays.
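A minimal sketch, assuming a 2D dense array with an `int32` attribute named `attr` (the URI, names and shapes are illustrative):

```python
import dask
import numpy as np
import tiledb

@dask.delayed
def partial_sum(uri, slc):
    # Each task opens the array, slices a region, and reduces it locally
    with tiledb.open(uri) as A:
        return np.sum(A[slc]["attr"])

uri = "my_dense_array"  # hypothetical array URI
tasks = [partial_sum(uri, np.s_[i : i + 250, :]) for i in range(0, 1000, 250)]
total = dask.delayed(sum)(tasks).compute()  # executes the task graph in parallel
print(total)
```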
TileDB integrates very well with `dask.array`. We demonstrate with the example below, where attribute `attr` stores an `int32` value per cell:
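A sketch using `dask.array.from_tiledb` (the array URI is hypothetical):

```python
import dask.array as da

a = da.from_tiledb(
    "my_dense_array",                          # hypothetical array URI
    attribute="attr",                          # the int32 attribute to read
    storage_options={"vfs.num_threads": "4"},  # any TileDB config parameter
)
print(a.sum().compute())
```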
You can add any TileDB configuration parameter in `storage_options`. Moreover, `storage_options` accepts an additional `key` option, where you can pass an encryption key if your array is encrypted (see Encryption).
You can also set array chunking similar to Dask's chunking. For example, you can do the following:
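For example (the chunk shape is illustrative):

```python
a = da.from_tiledb("my_dense_array", attribute="attr", chunks=(1000, 1000))
```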
You can also write a Dask array into TileDB as follows:
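A minimal sketch (the target URI is hypothetical):

```python
import dask.array as da

d = da.random.random((1000, 1000), chunks=(100, 100))
d.to_tiledb("my_output_array")  # creates the array, inferring the schema
```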
Note that the TileDB array does not need to exist. The above function call will create it if it does not by inferring the schema from the Dask array. To write to an existing array, you should open the array for writing as follows, which will create new fragment(s):
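A sketch of writing to an existing array opened in write mode:

```python
import tiledb

with tiledb.open("my_output_array", mode="w") as A:
    d.to_tiledb(A)  # writes new fragment(s) into the existing array
```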
Using an existing `Array` object allows extra customization of the array schema beyond what is possible with the automatic array creation shown earlier. For example, to create an array with a compression filter applied to the attribute, create the schema and array first, then write to the open `Array`:
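A sketch, assuming a 2D dense double array with a gzip-compressed anonymous attribute (all names and sizes are illustrative):

```python
import dask.array as da
import numpy as np
import tiledb

d = da.random.random((100, 100), chunks=(10, 10))

dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 99), tile=10, dtype=np.int64),
    tiledb.Dim(name="cols", domain=(0, 99), tile=10, dtype=np.int64),
)
att = tiledb.Attr(dtype=np.float64,
                  filters=tiledb.FilterList([tiledb.GzipFilter(level=5)]))
schema = tiledb.ArraySchema(domain=dom, attrs=[att])
tiledb.Array.create("my_compressed_array", schema)  # hypothetical URI

with tiledb.open("my_compressed_array", mode="w") as A:
    d.to_tiledb(A)
```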
The example below demonstrates creation of a TileDB array through Presto. Note that some array schema options are not currently supported from Presto (see Limitations for more details).
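A sketch using the table and column properties documented later (column names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (uri = '<array-uri>', type = 'SPARSE');
```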
`<array-uri>` can be any path, local (e.g., `file://`) or remote (e.g., `s3://`).
You can see the array schema as follows:
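For example:

```sql
SHOW CREATE TABLE tiledb.tiledb."<array-uri>";
```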
A TileDB array created through PrestoDB is and behaves exactly like any other TileDB array. Therefore, it is accessible by all TileDB APIs (e.g., Python) and integrations (e.g., Spark).
PrestoDB can dynamically discover existing TileDB arrays, i.e., even if they were created and populated externally from PrestoDB. Therefore, you can just insert data into a TileDB array or query it as follows:
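For example (column names are hypothetical):

```sql
INSERT INTO tiledb.tiledb."<array-uri>" (x, a1) VALUES (1, 100);
SELECT * FROM tiledb.tiledb."<array-uri>";
```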
Presto uses the form `catalog.schema.<array-uri>` for querying. TileDB does not have a concept of a table schema, so any valid string can be used for the schema name when querying; `tiledb` is used only for convenience in the examples. `<array-uri>` is the array URI and can be local (`file://`) or remote (`s3://`).
Currently, the TileDB-Presto connector is built as a plugin. It must be packaged and installed on the PrestoDB instances. You can download the latest release or build the connector from source using the following command from the top-level directory of the TileDB-Presto repo.
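The repo is a Maven project (as Presto plugins typically are), so a build along these lines should produce the plugin folder; the exact flags may differ:

```bash
mvn clean package -DskipTests
```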
To install the plugin on an existing Presto instance, you need to copy the `path/to/TileDB-Presto/target/presto-tiledb-<version>` folder to a `tiledb` directory under the plugin directory on each Presto node. On AWS EMR, this directory is `/usr/lib/presto/plugin/tiledb/`.
TileDB-Presto is a data source connector for PrestoDB, which allows you to run SQL queries on TileDB arrays. The connector supports column subselection on attributes and predicate pushdown on dimension fields, leading to superb performance for projection and range queries.
The TileDB-Presto connector supports most SQL operations from PrestoDB. Arrays can be referenced dynamically and are not required to be "pre-registered" with Presto. No external service (such as Apache Hive) is required.
A docker image is provided to allow for quick testing of the TileDB-Presto connector. The docker image starts a single-node Presto cluster and opens the Presto CLI, where SQL can be run. The image includes two example TileDB arrays:

- `/opt/tiledb_example_arrays/dense_global` (dense array)
- `/opt/tiledb_example_arrays/sparse_global` (sparse array)
Simply run:
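Assuming the image is published as `tiledb/tiledb-presto` (the image name is an assumption):

```bash
docker run -it --rm tiledb/tiledb-presto
```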
You can run a quick example to see if it works:
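For example, against one of the bundled arrays:

```sql
SELECT * FROM tiledb.tiledb."file:///opt/tiledb_example_arrays/dense_global" LIMIT 5;
```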
It is possible to specify a file that contains SQL to be run from the docker image:
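A sketch, assuming the image's entrypoint forwards arguments to the Presto CLI (whose standard flag for a SQL file is `-f`):

```bash
docker run -it --rm -v /path/to/query.sql:/query.sql tiledb/tiledb-presto -f /query.sql
```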
You can also run a SQL statement directly:
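Similarly, with the CLI's standard `--execute` flag:

```bash
docker run -it --rm tiledb/tiledb-presto --execute 'SELECT 1'
```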
You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary on the options you can use.
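A sketch in PySpark, assuming an existing dataframe `df` (the URI and dimension mapping are illustrative):

```python
(df.write
   .format("io.tiledb.spark")
   .option("uri", "s3://my_bucket/new_array")   # hypothetical target URI
   .option("schema.dim.0.name", "id")           # use dataframe column "id" as dimension 0
   .save())
```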
You can write a Spark dataframe to an existing TileDB array by simply adding an "append" mode.
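For example:

```python
(df.write
   .format("io.tiledb.spark")
   .option("uri", "s3://my_bucket/existing_array")  # hypothetical existing array
   .mode("append")
   .save())
```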
You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary on the options you can use.
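For example:

```python
df = (spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")  # hypothetical array URI
      .load())
```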
You can run SQL queries with Spark on TileDB arrays as follows:
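For example, by registering the dataframe as a temporary view:

```python
df.createOrReplaceTempView("my_array")
spark.sql("SELECT COUNT(*) FROM my_array").show()
```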
Dask is a great library for parallel computing in Python. It can work on your laptop with multiple threads and processes, or on a large cluster. We will take advantage of two very appealing Dask features:

- Dynamic task scheduling. We can create arbitrarily complex task graphs using `dask.delayed` and let Dask execute them in parallel on our cluster.
- Parallel arrays and dataframes. `dask.array` and `dask.dataframe` work similarly to NumPy arrays and Pandas dataframes, respectively, but are extended to work on datasets larger than main memory and to perform computations in a distributed manner across multiple processes and machines.
TileDB currently integrates only with Dask arrays, but we are working on adding support for Dask dataframes. See our roadmap for updates.
Our examples focus only on a single machine, but will work on an arbitrary Dask cluster. Describing how to deploy a Dask cluster, though, is beyond the scope of these docs.
You can install TileDB and Dask as follows:
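For example, with pip:

```bash
pip install tiledb "dask[complete]"
```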
Below are various examples for querying data with the TileDB Presto connector.
Typical select statements work as expected. This includes predicate pushdown for dimension fields.
Select all columns and all data from an array:
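For example:

```sql
SELECT * FROM tiledb.tiledb."<array-uri>";
```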
Select subset of columns:
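For example (column names are hypothetical):

```sql
SELECT x, a1 FROM tiledb.tiledb."<array-uri>";
```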
Select with predicate pushdown:
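For example, a range predicate on a dimension (names are hypothetical):

```sql
SELECT * FROM tiledb.tiledb."<array-uri>" WHERE x BETWEEN 1 AND 10;
```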
Get the query plan without running the query:
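Using standard `EXPLAIN`:

```sql
EXPLAIN SELECT * FROM tiledb.tiledb."<array-uri>";
```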
Analyze the query by running and profiling it:
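Using standard `EXPLAIN ANALYZE`:

```sql
EXPLAIN ANALYZE SELECT * FROM tiledb.tiledb."<array-uri>";
```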
It is possible to create a TileDB array from Presto. Not all array schema options are currently supported from Presto, though (see Limitations for more details).
Minimum create table:
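A minimal sketch (column names are hypothetical; `uri` is the only required table property):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true),
    a1 INTEGER
) WITH (uri = '<array-uri>');
```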
Create table with all options specified:
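A sketch with all table and column properties from the options tables below (values are illustrative):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (
    uri = '<array-uri>',
    type = 'SPARSE',
    cell_order = 'ROW_MAJOR',
    tile_order = 'ROW_MAJOR',
    capacity = 10000
);
```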
Data can be inserted into TileDB arrays through Presto. Inserts can be from another table or individual values.
Copy data from one table to another:
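For example (the URIs are placeholders):

```sql
INSERT INTO tiledb.tiledb."<destination-array-uri>"
SELECT * FROM tiledb.tiledb."<source-array-uri>";
```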
Data can be inserted using the `VALUES` method for single-row inserts. This is not recommended, because each insert will create a new fragment and cause degraded read performance as the number of fragments increases.
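For example (column names are hypothetical):

```sql
INSERT INTO tiledb.tiledb."<array-uri>" (x, a1) VALUES (1, 100);
```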
TileDB-Trino is a data source connector for Trino, which allows you to run SQL queries on TileDB arrays. The connector supports column subselection on attributes and predicate pushdown on dimension fields, leading to superb performance for projection and range queries.
The TileDB-Trino connector supports most SQL operations from Trino. Arrays can be referenced dynamically and are not required to be "pre-registered" with Trino. No external service (such as Apache Hive) is required.
Below are various examples for querying data with the TileDB Trino connector.
Typical select statements work as expected. This includes predicate pushdown for dimension fields.
Select all columns and all data from an array:
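For example:

```sql
SELECT * FROM tiledb.tiledb."<array-uri>";
```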
Select subset of columns:
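For example (column names are hypothetical):

```sql
SELECT x, a1 FROM tiledb.tiledb."<array-uri>";
```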
Select with predicate pushdown:
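For example, a range predicate on a dimension (names are hypothetical):

```sql
SELECT * FROM tiledb.tiledb."<array-uri>" WHERE x BETWEEN 1 AND 10;
```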
Get the query plan without running the query:
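Using standard `EXPLAIN`:

```sql
EXPLAIN SELECT * FROM tiledb.tiledb."<array-uri>";
```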
Analyze the query by running and profiling it:
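Using standard `EXPLAIN ANALYZE`:

```sql
EXPLAIN ANALYZE SELECT * FROM tiledb.tiledb."<array-uri>";
```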
It is possible to create a TileDB array from Trino. Not all array schema options are currently supported from Trino, though (see Limitations for more details).
Minimum create table:
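A minimal sketch (column names are hypothetical; `uri` is the only required table property):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true),
    a1 INTEGER
) WITH (uri = '<array-uri>');
```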
Create table with all options specified:
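A sketch with all table and column properties from the options tables below (values are illustrative):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (
    uri = '<array-uri>',
    type = 'SPARSE',
    cell_order = 'ROW_MAJOR',
    tile_order = 'ROW_MAJOR',
    capacity = 10000
);
```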
Data can be inserted into TileDB arrays through Trino. Inserts can be from another table or individual values.
Copy data from one table to another:
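For example (the URIs are placeholders):

```sql
INSERT INTO tiledb.tiledb."<destination-array-uri>"
SELECT * FROM tiledb.tiledb."<source-array-uri>";
```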
Data can be inserted using the `VALUES` method for single-row inserts. This is not recommended, because each insert will create a new fragment and cause degraded read performance as the number of fragments increases.
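For example (column names are hypothetical):

```sql
INSERT INTO tiledb.tiledb."<array-uri>" (x, a1) VALUES (1, 100);
```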
The TileDB connector supports most Presto functionality. Below is a list of the features not currently supported.
The connector does not currently support creating/writing/reading encrypted arrays
The connector does not currently support the TileDB `openAt` functionality to open an array at a specific timestamp.
TileDB Presto connector supports the following SQL datatypes:
BOOLEAN
TINYINT
INTEGER
BIGINT
REAL
DOUBLE
DECIMAL (treated as doubles)
STRING*
VARCHAR*
CHAR*
VARBINARY
No other datatypes are supported.
The TileDB Presto connector does not have full support for unsigned values. Presto and all connectors are written in Java, and Java does not have unsigned values. As a result of this Java limitation, an unsigned 64-bit integer can overflow if it is larger than 2^63 - 1. Unsigned integers that are 8, 16 or 32 bits are treated as larger integer types; for instance, an unsigned 32-bit value is read into a Java `long`.
For the `varchar` and `char` datatypes, the special case of `char(1)` or `varchar(1)` is stored on disk as a fixed-sized attribute of size 1. Any `char`/`varchar` greater than 1 is stored as a variable-length attribute in TileDB. TileDB will not enforce the length parameter, but Presto will for inserts.
Decimal types are currently treated as doubles. TileDB does not enforce the precision or scale of the decimal types.
Create table is supported; however, only a limited subset of TileDB parameters is supported.
No support for creating encrypted arrays
No support for setting custom filters on attributes, coordinates or offsets
The current split implementation is naive and splits domains evenly based on user-defined predicates (the `WHERE` clause) or on the non-empty domain. This even splitting will likely produce suboptimal splits for sparse domains. Future work will move splitting into core TileDB, where better heuristics will be used to produce even splits.

For now, if splits are highly uneven, consider increasing the number of splits via the `tiledb.splits` session parameter, or add `WHERE` clauses to limit the data set to non-empty regions of the array.
Currently, the TileDB-Trino connector is built as a plugin. It must be packaged and installed on the Trino instances. You can download the latest release or build the connector from source using the following command from the top-level directory of the TileDB-Trino repo.
1. Clone Trino.
2. Install Trino.
3. Create a TileDB directory.
4. Build and copy the TileDB-Trino jars into the TileDB directory.
5. Create two nested directories, `etc/catalog`, place the `tiledb.properties` file inside, and move them into the Trino installation directory.
6. Launch the Trino server.
7. Launch the Trino CLI with the TileDB plugin.

A sketch of these steps is shown below.
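A rough sketch of the steps above; the repository layout, version numbers, and install paths are illustrative, and the launcher/CLI invocations assume a standard Trino server layout:

```bash
git clone https://github.com/trinodb/trino.git                            # 1. clone Trino
cd trino && ./mvnw clean install -DskipTests                              # 2. install Trino
mkdir -p plugin/tiledb                                                    # 3. create a TileDB directory
cp /path/to/TileDB-Trino/target/trino-tiledb-<version>/* plugin/tiledb/  # 4. copy the TileDB-Trino jars
mkdir -p etc/catalog && cp /path/to/tiledb.properties etc/catalog/       # 5. add the catalog configuration
bin/launcher run                                                          # 6. launch the Trino server
trino --catalog tiledb                                                    # 7. connect with the Trino CLI
```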
A single configuration file is needed. The config file should be placed in the catalog folder (e.g., `/etc/presto/conf/catalog` on EMR) and named `tiledb.properties`.
Sample file contents:
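A minimal sketch, assuming the connector is registered under the name `tiledb` (the buffer sizes shown are the defaults from the table below):

```properties
connector.name=tiledb
read-buffer-size=10485760
write-buffer-size=10485760
```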
The following parameters can be configured in the `tiledb.properties` file and are plugin-wide.
These can be set as follows:
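For example, with the standard `SET SESSION` syntax:

```sql
SET SESSION tiledb.read_buffer_size = 20971520;
```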
Unset session parameters inherit the plugin configuration defaults. The list of session parameters is summarized below:
These are set upon table creation as follows:
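For example, table properties go in the `WITH` clause (names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (x BIGINT WITH (dimension = true), a1 INTEGER)
WITH (uri = '<array-uri>', type = 'SPARSE', capacity = 10000);
```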
These are set upon table creation as follows:
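For example, column properties go in a `WITH` clause on the column definition (names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (uri = '<array-uri>');
```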
Spark and TileDB have slight variations in their supported datatypes. The table below shows a mapping between the (core) TileDB and Spark datatypes for easy reference.
This document contains all custom SQL options defined by the TileDB Presto connector.
The following properties can be configured for creating a TileDB array in Presto.
| Property | Description | Default Value | Possible Values | Required |
|---|---|---|---|---|
| `uri` | URI for array to be created at | "" | * | Yes |
| `type` | Array Type | SPARSE | SPARSE, DENSE | No |
| `cell_order` | Cell order for array | ROW_MAJOR | ROW_MAJOR, COL_MAJOR, GLOBAL_ORDER | No |
| `tile_order` | Tile order for array | ROW_MAJOR | ROW_MAJOR, COL_MAJOR, GLOBAL_ORDER | No |
| `capacity` | Capacity of sparse array | 10000L | >0 | No |

The following properties can be configured per column:

| Property | Description | Default Value | Possible Values | Required |
|---|---|---|---|---|
| `dimension` | Is column a dimension | False | True, False | No |
| `lower_bound` | Domain Lower Bound | 0L | Any Long Value | No |
| `upper_bound` | Domain Upper Bound | Long.MAX_VALUE | Any Long Value | No |
| `extent` | Dimension Extent | 10L | Any Long Value | No |
PrestoDB and TileDB have slight differences in their supported datatypes. This document serves as a mapping between the (core) TileDB datatypes and the PrestoDB datatypes for easy reference.

Presto and all connectors are written in Java, and Java does not have unsigned values. As a result, an unsigned 64-bit integer can overflow if it is larger than 2^63 - 1. Unsigned integers that are 8, 16 or 32 bits are treated as larger integer types; for instance, an unsigned 32-bit value is read into a Java `long`.
Special cases of `char(1)` or `varchar(1)` are stored on disk as fixed-sized attributes of size 1. Any `char`/`varchar` greater than 1 is stored as a variable-length attribute in TileDB. TileDB will not enforce the length parameter, but Presto will for inserts.
Decimal types are currently treated as doubles. TileDB does not enforce the precision or scale of the decimal types.
A single configuration file is needed. The config file should be placed in the catalog folder (e.g., `/etc/trino/conf/catalog` on EMR) and named `tiledb.properties`.
Sample file contents:
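A minimal sketch, assuming the connector is registered under the name `tiledb` (the buffer sizes shown are the defaults from the table below):

```properties
connector.name=tiledb
read-buffer-size=10485760
write-buffer-size=10485760
```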
The following parameters can be configured in the `tiledb.properties` file and are plugin-wide.
These can be set as follows:
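For example, with the standard `SET SESSION` syntax:

```sql
SET SESSION tiledb.read_buffer_size = 20971520;
```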
Unset session parameters inherit the plugin configuration defaults. The list of session parameters is summarized below:
These are set upon table creation as follows:
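For example, table properties go in the `WITH` clause (names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (x BIGINT WITH (dimension = true), a1 INTEGER)
WITH (uri = '<array-uri>', type = 'SPARSE', capacity = 10000);
```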
These are set upon table creation as follows:
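For example, column properties go in a `WITH` clause on the column definition (names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (uri = '<array-uri>');
```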
Trino and TileDB have slight differences in their supported datatypes. This document serves as a mapping between the (core) TileDB datatypes and the Trino datatypes for easy reference.

Trino and all connectors are written in Java, and Java does not have unsigned values. As a result, an unsigned 64-bit integer can overflow if it is larger than 2^63 - 1. Unsigned integers that are 8, 16 or 32 bits are treated as larger integer types; for instance, an unsigned 32-bit value is read into a Java `long`.
Special cases of `char(1)` or `varchar(1)` are stored on disk as fixed-sized attributes of size 1. Any `char`/`varchar` greater than 1 is stored as a variable-length attribute in TileDB. TileDB will not enforce the length parameter, but Trino will for inserts.
Decimal types are currently treated as doubles. TileDB does not enforce the precision or scale of the decimal types.
This document contains all custom SQL options defined by the TileDB Trino connector.
The following properties can be configured for creating a TileDB array in Trino.
| Property | Description | Default Value | Possible Values | Required |
|---|---|---|---|---|
| `uri` | URI for array to be created at | "" | * | Yes |
| `type` | Array Type | SPARSE | SPARSE, DENSE | No |
| `cell_order` | Cell order for array | ROW_MAJOR | ROW_MAJOR, COL_MAJOR, GLOBAL_ORDER | No |
| `tile_order` | Tile order for array | ROW_MAJOR | ROW_MAJOR, COL_MAJOR, GLOBAL_ORDER | No |
| `capacity` | Capacity of sparse array | 10000L | >0 | No |

The following properties can be configured per column:

| Property | Description | Default Value | Possible Values | Required |
|---|---|---|---|---|
| `dimension` | Is column a dimension | False | True, False | No |
| `lower_bound` | Domain Lower Bound | 0L | Any Long Value | No |
| `upper_bound` | Domain Upper Bound | Long.MAX_VALUE | Any Long Value | No |
| `extent` | Dimension Extent | 10L | Any Long Value | No |
It is possible to create a TileDB array from Trino. Not all array schema options are currently supported from Trino, though (see Limitations for more details). An example is shown below.
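A sketch (column names are hypothetical):

```sql
CREATE TABLE tiledb.tiledb."<array-uri>" (
    x BIGINT WITH (dimension = true, lower_bound = 0, upper_bound = 100, extent = 10),
    a1 INTEGER
) WITH (uri = '<array-uri>', type = 'SPARSE');
```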
Note that `<array-uri>` can be any path, local (e.g., `file://`) or remote (e.g., `s3://`).
You can see the array schema as follows:
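For example:

```sql
SHOW CREATE TABLE tiledb.tiledb."<array-uri>";
```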
A TileDB array created through Trino is and behaves exactly like any other TileDB array. Therefore, it is accessible by all TileDB APIs (e.g., Python) and integrations (e.g., Spark).
Trino can dynamically discover existing TileDB arrays, i.e., even if they were created and populated externally from Trino. Therefore, you can just insert data into a TileDB array or query it as follows:
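For example (column names are hypothetical):

```sql
INSERT INTO tiledb.tiledb."<array-uri>" (x, a1) VALUES (1, 100);
SELECT * FROM tiledb.tiledb."<array-uri>";
```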
Trino uses the form `catalog.schema.<array-uri>` for querying. TileDB does not have a concept of a table schema, so any valid string can be used for the schema name when querying; `tiledb` is used only for convenience in the examples. `<array-uri>` is the array URI and can be local (`file://`) or remote (`s3://`).
The TileDB connector supports most Trino functionality. Below is a list of the features not currently supported.
The connector does not currently support creating/writing/reading encrypted arrays
The connector does not currently support the TileDB `openAt` functionality to open an array at a specific timestamp.
TileDB Trino connector supports the following SQL datatypes:
BOOLEAN
TINYINT
INTEGER
BIGINT
REAL
DOUBLE
DECIMAL (treated as doubles)
STRING*
VARCHAR*
CHAR*
VARBINARY
No other datatypes are supported.
The TileDB Trino connector does not have full support for unsigned values. Trino and all connectors are written in Java, and Java does not have unsigned values. As a result of this Java limitation, an unsigned 64-bit integer can overflow if it is larger than 2^63 - 1. Unsigned integers that are 8, 16 or 32 bits are treated as larger integer types; for instance, an unsigned 32-bit value is read into a Java `long`.
For the `varchar` and `char` datatypes, the special case of `char(1)` or `varchar(1)` is stored on disk as a fixed-sized attribute of size 1. Any `char`/`varchar` greater than 1 is stored as a variable-length attribute in TileDB. TileDB will not enforce the length parameter, but Trino will for inserts.
Decimal types are currently treated as doubles. TileDB does not enforce the precision or scale of the decimal types.
Create table is supported; however, only a limited subset of TileDB parameters is supported.
No support for creating encrypted arrays
No support for setting custom filters on attributes, coordinates or offsets
The current split implementation is naive and splits domains evenly based on user-defined predicates (the `WHERE` clause) or on the non-empty domain. This even splitting will likely produce suboptimal splits for sparse domains. Future work will move splitting into core TileDB, where better heuristics will be used to produce even splits.

For now, if splits are highly uneven, consider increasing the number of splits via the `tiledb.splits` session parameter, or add `WHERE` clauses to limit the data set to non-empty regions of the array.
| Option | Required | Description |
|---|---|---|
| `uri` | Yes | URI of a TileDB sparse or dense array (required) |
| `tiledb.*` | No | Set a TileDB configuration parameter, e.g., `option("tiledb.vfs.num_threads", 4)` |
| Option | Required | Description |
|---|---|---|
| `write_buffer_size` | No | Set the TileDB write buffer size in bytes per attribute/coordinates. Defaults to 10MB |
| `schema.dim.<D>.name` | Yes | Specify which of the Spark dataframe columns will be dimension `D`. |
| `schema.dim.<D>.min` | No | Specify the lower bound for the TileDB domain on dimension `D`. |
| `schema.dim.<D>.max` | No | Specify the upper bound for the TileDB domain on dimension `D`. |
| `schema.dim.<D>.extent` | No | Specify the tile extent on dimension `D`. |
| `schema.attr.<A>.filter_list` | No | Specify a filter list for attribute `A`. The filter list is a list of tuples of the form (name, option), e.g., "(byteshuffle, -1), (gzip, 9)". |
| `schema.capacity` | No | Specify the tile capacity for sparse fragments. |
| `schema.tile_order` | No | Specify the tile order. |
| `schema.cell_order` | No | Specify the cell order. |
| `schema.coords_filter_list` | No | Specify the coordinates filter list, in the same (name, option) tuple form. |
| `schema.offsets_filter_list` | No | Specify the offsets filter list, in the same (name, option) tuple form. |
| Option | Required | Description |
|---|---|---|
| `order` | No | Result layout order: "row-major"/"TILEDB_ROW_MAJOR", "col-major"/"TILEDB_COL_MAJOR", "global-order"/"TILEDB_GLOBAL_ORDER", or "unordered"/"TILEDB_UNORDERED". Default: "unordered". |
| `read_buffer_size` | No | Set the TileDB read buffer size in bytes per attribute/coordinates. Defaults to 10MB |
| `allow_read_buffer_realloc` | No | If the read buffer size is too small, allow reallocation. Default: True |
| `partition_count` | No | Number of partitions. |
| Parameter | Default | Datatype | Description |
|---|---|---|---|
| `array-uris` | "" | String | CSV list of arrays to preload metadata on |
| `read-buffer-size` | 10485760 | Integer | Max read buffer size per attribute |
| `write-buffer-size` | 10485760 | Integer | Max write buffer size per attribute |
| `aws-access-key-id` | "" | String | AWS_ACCESS_KEY_ID for S3 access |
| `aws-secret-access-key` | "" | String | AWS_SECRET_ACCESS_KEY for S3 access |
| `tiledb-config` | "" | String | TileDB config parameters in key1=value1,key2=value2 form |
| Name | Default | Datatype | Description |
|---|---|---|---|
| `read_buffer_size` | Plugin | Integer | Max read buffer size per attribute |
| `write_buffer_size` | Plugin | Integer | Max write buffer size per attribute |
| `aws_access_key_id` | Plugin | String | AWS_ACCESS_KEY_ID for S3 access |
| `aws_secret_access_key` | Plugin | String | AWS_SECRET_ACCESS_KEY for S3 access |
| `splits` | -1 | Integer | Number of splits to use per query; -1 means splits will equal the number of workers |
| `split_only_predicates` | false | Boolean | Split only based on predicates pushed down from the WHERE clause |
| `enable_stats` | false | Boolean | Enable collecting and dumping connector stats to the Presto log |
| `tiledb_config` | "" | String | TileDB config parameters in key1=value1,key2=value2 form |
| Name | Description | Default | Possible Values | Required |
|---|---|---|---|---|
| `uri` | Array URI | "" | * | Yes |
| `type` | Array type | SPARSE | SPARSE, DENSE | No |
| `cell_order` | Cell order | ROW_MAJOR | ROW_MAJOR, COL_MAJOR | No |
| `tile_order` | Tile order | ROW_MAJOR | ROW_MAJOR, COL_MAJOR | No |
| `capacity` | Tile capacity | 10000L | >0 | No |
| Name | Description | Default | Possible Values | Required |
|---|---|---|---|---|
| `dimension` | Column is a dimension | False | True, False | No |
| `lower_bound` | Domain lower bound | 0L | Any Long Value | No |
| `upper_bound` | Domain upper bound | Long.MAX_VALUE | Any Long Value | No |
| `extent` | Tile extent | 10L | Any Long Value | No |
| TileDB Datatype | Spark SQL Datatype |
|---|---|
| TILEDB_INT8 | BYTE |
| TILEDB_UINT8 | SHORT |
| TILEDB_INT16 | SHORT |
| TILEDB_UINT16 | INTEGER |
| TILEDB_INT32 | INTEGER |
| TILEDB_UINT32 | LONG |
| TILEDB_INT64 | LONG |
| TILEDB_UINT64 | LONG |
| TILEDB_FLOAT32 | FLOAT |
| TILEDB_FLOAT64 | DOUBLE |
| TILEDB_DATETIME_DAY | DATE |
| TILEDB_DATETIME_MS | TIMESTAMP |
| TileDB Datatype | PrestoDB SQL Datatype |
|---|---|
| TILEDB_INT8 | BOOLEAN |
| TILEDB_INT16 | TINYINT |
| TILEDB_INT32 | INTEGER |
| TILEDB_INT64 | BIGINT |
| TILEDB_FLOAT64 | REAL |
| TILEDB_FLOAT64 | DOUBLE |
| TILEDB_FLOAT64 | DECIMAL (treated as DOUBLE) |
| TILEDB_CHAR (var) | STRING |
| TILEDB_CHAR (var) | VARCHAR |
| TILEDB_CHAR (var) | CHAR |
| TILEDB_CHAR (var) | VARBINARY |
| Parameter | Default | Datatype | Description |
|---|---|---|---|
| `array-uris` | "" | String | CSV list of arrays to preload metadata on |
| `read-buffer-size` | 10485760 | Integer | Max read buffer size per attribute |
| `write-buffer-size` | 10485760 | Integer | Max write buffer size per attribute |
| `aws-access-key-id` | "" | String | AWS_ACCESS_KEY_ID for S3 access |
| `aws-secret-access-key` | "" | String | AWS_SECRET_ACCESS_KEY for S3 access |
| `tiledb-config` | "" | String | TileDB config parameters in key1=value1,key2=value2 form |
| Name | Default | Datatype | Description |
|---|---|---|---|
| `read_buffer_size` | Plugin | Integer | Max read buffer size per attribute |
| `write_buffer_size` | Plugin | Integer | Max write buffer size per attribute |
| `aws_access_key_id` | Plugin | String | AWS_ACCESS_KEY_ID for S3 access |
| `aws_secret_access_key` | Plugin | String | AWS_SECRET_ACCESS_KEY for S3 access |
| `splits` | -1 | Integer | Number of splits to use per query; -1 means splits will equal the number of workers |
| `split_only_predicates` | false | Boolean | Split only based on predicates pushed down from the WHERE clause |
| `enable_stats` | false | Boolean | Enable collecting and dumping connector stats to the Trino log |
| `tiledb_config` | "" | String | TileDB config parameters in key1=value1,key2=value2 form |
| Name | Description | Default | Possible Values | Required |
|---|---|---|---|---|
| `uri` | Array URI | "" | * | Yes |
| `type` | Array type | SPARSE | SPARSE, DENSE | No |
| `cell_order` | Cell order | ROW_MAJOR | ROW_MAJOR, COL_MAJOR | No |
| `tile_order` | Tile order | ROW_MAJOR | ROW_MAJOR, COL_MAJOR | No |
| `capacity` | Tile capacity | 10000L | >0 | No |
| Name | Description | Default | Possible Values | Required |
|---|---|---|---|---|
| `dimension` | Column is a dimension | False | True, False | No |
| `lower_bound` | Domain lower bound | 0L | Any Long Value | No |
| `upper_bound` | Domain upper bound | Long.MAX_VALUE | Any Long Value | No |
| `extent` | Tile extent | 10L | Any Long Value | No |
| TileDB Datatype | Trino SQL Datatype |
|---|---|
| TILEDB_INT8 | BOOLEAN |
| TILEDB_INT16 | TINYINT |
| TILEDB_INT32 | INTEGER |
| TILEDB_INT64 | BIGINT |
| TILEDB_FLOAT64 | REAL |
| TILEDB_FLOAT64 | DOUBLE |
| TILEDB_FLOAT64 | DECIMAL (treated as DOUBLE) |
| TILEDB_CHAR (var) | STRING |
| TILEDB_CHAR (var) | VARCHAR |
| TILEDB_CHAR (var) | CHAR |
| TILEDB_CHAR (var) | VARBINARY |