Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
TileDB integrates with MariaDB through a pluggable storage engine called MyTile, which can be found in the TileDB-MariaDB repo. MyTile is written to take advantage of query predicate pushdown to TileDB. MyTile allows a user to dynamically access any TileDB array, whether it is local to the machine, on S3 or on TileDB Cloud. This provides a powerful C++ based SQL engine for TileDB arrays.
A docker image is provided to allow for quick testing of the MyTile storage engine. The docker image starts a MariaDB server and connects to it from the shell for you.
MyTile requires TileDB to be installed on the system (see TileDB Installation). You can install all the other dependencies as follows.
Depending on your platform there may be other dependencies needed. See https://mariadb.com/kb/en/library/compiling-mariadb-from-source/ for details.
When compiling from source, you will need to fetch the MariaDB server and then build MariaDB with the MyTile source included. You must build MariaDB and MyTile together in one source tree, as MariaDB requires that all compilation flags and optimizations be the same between the server and the plugins.
For simplicity the cmake
command below disables many of the optional storage engines to reduce the build size. You can toggle each storage engine as you see fit.
The CMAKE_INSTALL_PREFIX
can also be set to switch locations of the installation. By default, we set it to $HOME/mytile_server
After the compilation has completed, in $HOME/mytile_server
you will have MariaDB 10.4 with a working MyTile plugin.
By default optional plugins are not loaded. You will need to edit the MariaDB configuration to load the plugin. Full configuration of MariaDB is beyond our scope as there is a plethora of options outside of MyTile settings.
Edit your ~/.my.cnf
, under the [mysqld]
section add plugin-load-add=mytile
and plugin-maturity=experimental
. If you do not /.my.cnf
file, a sample one is provided below:
Now that you have installed and configured MariaDB, you can run the server. First you need to run MariaDB's setup script to initialize the server:
After the database has been initialized, in a terminal start the server with:
This will launch the server in your terminal. After the server is successfully started, you can connect with:
These are set upon table creation as follows:
The column parameters are set upon table creation as follows:
For filter parameters either table or column the following are the list of supported filters. Any number of filters can be passed in a csv string. If a filter takes a parameters it can be set using key=value
. I.Efilters="GZIP=-1,LZ4=4"
mytile_read_buffer_size
Description: Size in bytes of the buffers to use for each attribute/coordinates when performing a read.
Command line: --mytile-read-buffer-size
Scope: Global
Dynamic: Yes
Data Type: long
Default Value: 100M
mytile_write_buffer_size
Description: Size in bytes of the buffers to use for each attribute/coordinates when performing a write.
Command line: --mytile-write-buffer-size
Scope: Global
Dynamic: Yes
Data Type: long
Default Value: 100M
mytile_delete_arrays
Description: Controls if a delete table
statement causes the array to be deleted on disk or just deregistered from MariaDB. A true value causes actual deletions of data.
Command line: --mytile-delete-arrays
Scope: Global
Dynamic: Yes
Data Type: boolean
Default Value: false
tiledb_config
Description: Set TileDB configuration parameters, this is in csv form of key1=value1,key2=value2
. Example: set mytile_tiledb_config = 'vfs.s3.aws_access_key_id=abc,vfs.s3.aws_secret_access_key=123';
Command line: --mytile-tiledb-config
Scope: Session
Dynamic: Yes
Data Type: string
Default Value: ""
reopen_for_every_query
Description: Closes and reopen the array for every query, this allows tiledb_config
parameters to take effect without forcing a table flush, but any TileDB caching is removed.
Command line: --mytile-reopen-for-every-query
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
ready_query_layout
Description: Set layout for ready queries, valid values are row-major
, col-major
, unordered
or global-order
Command line: --mytile-read-query-layout
Scope: Session
Dynamic: Yes
Data Type: enum
Default Value: unordered
dimensions_are_primary_keys
Description: Should dimensions be treated and registered as primary keys.
Command line: --mytile-dimensions-are-primary-keys
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
enable_pushdown
Description: Pushdown predicates where possible
Command line: --mytile-enable-pushdown
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
compute_table_records
Description: Compute size of table (record count) on table opening. This is useful when you are performing a large join or group by and knowing the table size can affect the optimizer. Computing the table records can take several seconds (worst case) so this should not be enabled for small queries.
Command line: --mytile-compute-table-records
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: false
MariaDB has . MyTile is able to use all four types. The default settings for MariaDB enable only Incremental BNL
. MyTile is able to take advantage of Batched Key Access along with algorithms and in most cases this produces significantly better join results. With Batched Key Access, MyTile is able to push all coordinates (keys) down to TileDB for performing more effective batching of data accesses.
You must enable the following join cache level and optimizer switches.
It is also important to adjust both join_buffer_space_limit
and join_buffer_size
to a size that is appropriate for your host. The size of the join buffer limits the amount of keys that are batched at one time. Too small of a value will yield bad performance for TileDB having to issue a large number of small queries. The size should be adjusted to a larger value based on the available memory size.
MariaDB and TileDB have slight differences in their supported datatypes. This document serves as a mapping between the (core) TileDB datatypes and the MariaDB datatypes for easy reference.
The following example shows you how to create a 2D sparse TileDB array with a single attribute. See for details on the MyTile and core TileDB options you can set. Note that filters are optional for creating a table.
If you wish to create an array on S3, the command is similar. Only the array name needs to change to an S3 URI:
You can see the array schema as follows:
A TileDB array created through MariaDB is and behaves exactly like any other TileDB array. Therefore, it is accessible by all TileDB APIs (e.g., Python) and integrations (e.g., Spark).
MariaDB can dynamically discover existing TileDB arrays, i.e., even if they were created and populated externally from MariaDB. Therefore, you can just insert data into a TileDB array or query it as follows:
Note that, if your array URI is longer than 64 characters, you need to use MariaDB's assisted table discovery for long names by simply doing the following:
MyTile supports querying encrypted TileDB arrays by passing the encryption key as a parameter to the assisted table discovery. Note that due to a limitation of MariaDB, if you pass an encryption key it will be shown in the create table statement for any MariaDB user which has permissions to view the created table.
Array metadata can be queried using a special suffix for the array URI. Adding @metadata
to the URI will trigger MyTile to query the array metadata instead of the array data. The results are returned in two columns, key
and value
, both with string (TEXT
) datatype in MariaDB. If a metadata value contains more than one value then a comma delimited string is returned.
Time traveling for TileDB arrays is supported in two methods.
You can use the @<timestamp>
keyword on the array URI to trigger MyTile to query the array at the given timestamp. This is equivalent to using open_at
in other APIs.
You can also use time traveling with the open_at=<timestamp>
parameter with create table discovery.
The examples below show the geospatial functions we push down to TileDB. These queries can run on arrays created by MyTile, by TileDB Open Source, or any other of our supported APIs and integrations.
In order for TileDB to identify the spatial data attribute, your array must either have an attribute named "wkb_geometry" or have a "GEOMETRY_ATTRIBUTE_NAME" metadata field that specifies the name of the spatial attribute.
You can set any of the as follows:
You can also use certain geospatial functions that leverage the performance benefits of TileDB. These are: ST_Intersects
, ST_Overlaps
, ST_Equals.
In addtion, all of the are available.
You can load data into your TileDB arrays using either the or the .
If your wkb_geometry
column is paired with a spatial index as created by the the spatial filter will be pushed down to take advantage of TileDB's powerful dimension slicing for a significant performance boost. Without the spatial index, queries will return the same results but using a full scan.
Property | Description | Datatype | Default | Possible Values | Required |
| Column is a dimension | boolean |
|
| No |
| Domain lower bound | string |
| Any value of datatype | No |
| Domain upper bound | string | Datatype MAX | Any value of datatype | No |
| Tile extent | string |
| Any Value of Datatype | No |
| Filters for attributes | string | NOOP | CSV list of filters (see below) | No |
Filter Name | Accepted Value |
| N/A |
| Integer |
| Integer |
| Integer |
| N/A |
| Integer |
| N/A |
| Integer |
| N/A |
| N/A |
| Integer |
TileDB Datatype | MariaDB SQL Datatype |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Property | Description | Default | Possible Values | Required |
| Array URI |
|
| Yes |
| Array type |
|
| No |
| Cell order |
|
| No |
| Tile order |
|
| No |
| Tile capacity |
|
| No |
| Filters to use for coordinates | NOOP | CSV list of filters (see below) | No |
| Filters to use for offsets | NOOP | CSV list of filters (see below) | No |
| Timestamp to open array at for time traveling | current unix timestamp | Any valid int64 | No |
| Key for encrypted arrays |
|
| No |