Embedded SQL

It is possible for MariaDB to act as an embedded SQL library in TileDB. By using this embedded library, you can avoid installing and maintaining an active MariaDB server on your machine. MariaDB acts as any other library dependency to TileDB, and contains TileDB as a pluggable storage engine so that you can query any TileDB array via SQL. When embedding MariaDB, you should note that you are subject to the GPL license of MariaDB itself.

You can use embedded SQL either via a Python, or directly from the MariaDB REPL, explained below.

Usage in Python

For convenience, TileDB offers a python package TileDB-SQL-Py, which is a fork of mysqlclient-python. The changes in the fork include adding back support for embedded linking, and packaging with binary wheels in Conda, PyPi and a Docker image. This allows you to simply import the Python package and run SQL queries against TileDB Arrays using Pandas.

You can install the package as follows:

docker run -it --rm tiledb/tiledb-sql-py

You can run SQL queries against any TileDB array (on any backend) as follows:

import tiledb.sql, pandas

db = tiledb.sql.connect()
pandas.read_sql(sql="select * from `<array_uri>`", con=db)

To set TileDB configuration parameters, you just need an init_command argument in connect:

import tiledb.sql, pandas

db = tiledb.sql.connect(db="test",
        init_command="set mytile_tiledb_config='rest.token=yyy,rest.server_address=https://api.tiledb.com,...'")
pandas.read_sql(sql="select * from `<array_uri>`", con=db)

Note that, if your array URI is longer than 64 characters, you need to use MariaDB's assisted table discovery for long names by simply doing the following:

import tiledb.sql, pandas

# The following does not create an array, it just informs MariaDB about its existence
db = tiledb.sql.connect(db="test", init_command="create table short_name engine=mytile uri='<LONG_URI>'")

# Now you can query the array using the new short name
pandas.read_sql(sql="select * from `short_name`", con=db)

# or alternatively use a cursor

# db = tiledb.sql.connect()
# cur = db.cursor()
# cur.execute("create table short_name engine=mytile uri='<LONG_URI>'")
# cur.execute("SELECT * from `short_name`")
# row = cur.fetchone()
# while row:
#    print(row)
#    row = cur.fetchone()

When importing tiledb.sql you may get the following errors: Got ERROR: "Could not open mysql.plugin table. Some plugins may be not loaded" errno: 2000 Got ERROR: "Can't open and lock privilege tables: Table 'mysql.servers' doesn't exist" errno: 2000 Got ERROR: "Can't open the mysql.func table. Please run mysql_upgrade to create it." errno: 2000

You can safely ignore these, since they are an artifact of the way we embed MariaDB as a library; MariaDB is looking for tables that do not exist as we do not run it as a server. We will remove those errors soon.

Usage via MariaDB REPL

The MariaDB client REPL is also available to use from Conda for exploring arrays directly.

conda create -n tiledb
conda activate tiledb

conda install -c conda-forge libtiledb-sql

# Copy error messages to expected directory
cp ${CONDA_PREFIX}/share/english/errmsg.sys ${CONDA_PREFIX}/errmsg.sys

# Create empty test dir for database connection
mkdir ${CONDA_PREFIX}/test

mysql_embedded test

You can issue any SQL queries:

select * from `<array_uri>`;

Last updated