Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
Loading...
MariaDB and TileDB have slight differences in their supported datatypes. This document serves as a mapping between the (core) TileDB datatypes and the MariaDB datatypes for easy reference.
TileDB Datatype
MariaDB SQL Datatype
TILEDB_CHAR
VARCHAR
TILEDB_INT8
TINY
TILEDB_UINT8
TINY
TILEDB_INT16
SHORT
TILEDB_UINT16
SHORT
TILEDB_INT32
INTEGER
TILEDB_UINT32
LONG
TILEDB_INT64
LONG
TILEDB_UINT64
LONG
TILEDB_FLOAT32
FLOAT
TILEDB_FLOAT64
DOUBLE
TILEDB_FLOAT64
DECIMAL
TILEDB_FLOAT64
NEW DECIMAL
TILEDB_DATETIME_YEAR
YEAR
TILEDB_DATETIME_MONTH
DATE
TILEDB_DATETIME_WEEK
DATE
TILEDB_DATETIME_DAY
DATE
TILEDB_DATETIME_HR
DATETIME
TILEDB_DATETIME_MIN
DATETIME
TILEDB_DATETIME_SEC
DATETIME
TILEDB_DATETIME_MS
DATETIME
TILEDB_DATETIME_US
DATETIME
TILEDB_DATETIME_NS
DATETIME
(Precision is displayed to US)
TILEDB_DATETIME_PS
DATETIME
(Precision is displayed to US)
TILEDB_DATETIME_FS
DATETIME
(Precision is displayed to US)
TILEDB_DATETIME_AS
DATETIME
(Precision is displayed to US)
TileDB integrates with MariaDB through a pluggable storage engine called MyTile, which can be found in the TileDB-MariaDB repo. MyTile is written to take advantage of query predicate pushdown to TileDB. MyTile allows a user to dynamically access any TileDB array, whether it is local to the machine, on S3 or on TileDB Cloud. This provides a powerful C++ based SQL engine for TileDB arrays.
A docker image is provided to allow for quick testing of the MyTile storage engine. The docker image starts a MariaDB server and connects to it from the shell for you.
The following example shows you how to create a 2D sparse TileDB array with a single attribute. See Configuration Parameters for details on the MyTile and core TileDB options you can set. Note that filters are optional for creating a table.
If you wish to create an array on S3, the command is similar. Only the array name needs to change to an S3 URI:
You can see the array schema as follows:
A TileDB array created through MariaDB is and behaves exactly like any other TileDB array. Therefore, it is accessible by all TileDB APIs (e.g., Python) and integrations (e.g., Spark).
MariaDB can dynamically discover existing TileDB arrays, i.e., even if they were created and populated externally from MariaDB. Therefore, you can just insert data into a TileDB array or query it as follows:
Note that, if your array URI is longer than 64 characters, you need to use MariaDB's assisted table discovery for long names by simply doing the following:
MyTile supports querying encrypted TileDB arrays by passing the encryption key as a parameter to the assisted table discovery. Note that due to a limitation of MariaDB, if you pass an encryption key it will be shown in the create table statement for any MariaDB user which has permissions to view the created table.
Array metadata can be queried using a special suffix for the array URI. Adding @metadata
to the URI will trigger MyTile to query the array metadata instead of the array data. The results are returned in two columns, key
and value
, both with string (TEXT
) datatype in MariaDB. If a metadata value contains more than one value then a comma delimited string is returned.
Time traveling for TileDB arrays is supported in two methods.
You can use the @<timestamp>
keyword on the array URI to trigger MyTile to query the array at the given timestamp. This is equivalent to using open_at
in other APIs.
You can also use time traveling with the open_at=<timestamp>
parameter with create table discovery.
You can also use certain geospatial functions that leverage the performance benefits of TileDB. These are: ST_Intersects
, ST_Overlaps
, ST_Equals.
In addtion, all of the standard MariaDB spatial functions are available.
You can load data into your TileDB arrays using either the OGR TileDB driver or the TileDB-Py API.
If your wkb_geometry
column is paired with a spatial index as created by the OGR TileDB driver, the spatial filter will be pushed down to take advantage of TileDB's powerful dimension slicing for a significant performance boost. Without the spatial index, queries will return the same results but using a full scan.
The examples below show the geospatial functions we push down to TileDB. These queries can run on arrays created by MyTile, by TileDB Open Source, or any other of our supported APIs and integrations.
In order for TileDB to identify the spatial data attribute, your array must either have an attribute named "wkb_geometry" or have a "GEOMETRY_ATTRIBUTE_NAME" metadata field that specifies the name of the spatial attribute.
You must enable the following join cache level and optimizer switches.
It is also important to adjust both join_buffer_space_limit
and join_buffer_size
to a size that is appropriate for your host. The size of the join buffer limits the amount of keys that are batched at one time. Too small of a value will yield bad performance for TileDB having to issue a large number of small queries. The size should be adjusted to a larger value based on the available memory size.
MariaDB has . MyTile is able to use all four types. The default settings for MariaDB enable only Incremental BNL
. MyTile is able to take advantage of Batched Key Access along with algorithms and in most cases this produces significantly better join results. With Batched Key Access, MyTile is able to push all coordinates (keys) down to TileDB for performing more effective batching of data accesses.
MyTile requires TileDB to be installed on the system (see TileDB Installation). You can install all the other dependencies as follows.
Depending on your platform there may be other dependencies needed. See https://mariadb.com/kb/en/library/compiling-mariadb-from-source/ for details.
When compiling from source, you will need to fetch the MariaDB server and then build MariaDB with the MyTile source included. You must build MariaDB and MyTile together in one source tree, as MariaDB requires that all compilation flags and optimizations be the same between the server and the plugins.
For simplicity the cmake
command below disables many of the optional storage engines to reduce the build size. You can toggle each storage engine as you see fit.
The CMAKE_INSTALL_PREFIX
can also be set to switch locations of the installation. By default, we set it to $HOME/mytile_server
After the compilation has completed, in $HOME/mytile_server
you will have MariaDB 10.4 with a working MyTile plugin.
By default optional plugins are not loaded. You will need to edit the MariaDB configuration to load the plugin. Full configuration of MariaDB is beyond our scope as there is a plethora of options outside of MyTile settings.
Edit your ~/.my.cnf
, under the [mysqld]
section add plugin-load-add=mytile
and plugin-maturity=experimental
. If you do not /.my.cnf
file, a sample one is provided below:
Now that you have installed and configured MariaDB, you can run the server. First you need to run MariaDB's setup script to initialize the server:
After the database has been initialized, in a terminal start the server with:
This will launch the server in your terminal. After the server is successfully started, you can connect with:
These are set upon table creation as follows:
Property
Description
Default
Possible Values
Required
uri
Array URI
""
*
Yes
array_type
Array type
SPARSE
SPARSE
, DENSE
No
cell_order
Cell order
ROW_MAJOR
ROW_MAJOR
, COL_MAJOR
No
tile_order
Tile order
ROW_MAJOR
ROW_MAJOR
, COL_MAJOR
No
capacity
Tile capacity
10000L
>0
No
coordinate_filters
Filters to use for coordinates
NOOP
CSV list of filters (see below)
No
offset_filters
Filters to use for offsets
NOOP
CSV list of filters (see below)
No
open_at
Timestamp to open array at for time traveling
current unix timestamp
Any valid int64
No
encryption_key
Key for encrypted arrays
""
*
No
The column parameters are set upon table creation as follows:
Property
Description
Datatype
Default
Possible Values
Required
dimension
Column is a dimension
boolean
0
0, 1
No
lower_bound
Domain lower bound
string
"0"
Any value of datatype
No
upper_bound
Domain upper bound
string
Datatype MAX
Any value of datatype
No
tile_extent
Tile extent
string
"10"
Any Value of Datatype
No
filters
Filters for attributes
string
NOOP
CSV list of filters (see below)
No
For filter parameters either table or column the following are the list of supported filters. Any number of filters can be passed in a csv string. If a filter takes a parameters it can be set using key=value
. I.Efilters="GZIP=-1,LZ4=4"
Filter Name
Accepted Value
NOOP
N/A
GZIP
Integer
ZSTD
Integer
LZ4
Integer
RLE
N/A
BZIP2
Integer
DOUBLE_DELTA
N/A
BIT_WIDTH_REDUCTION
Integer
BITSHUFFLE
N/A
BYTESHHUFFLE
N/A
POSITIVE_DELTA
Integer
You can set any of the TileDB configuration parameters as follows:
mytile_read_buffer_size
Description: Size in bytes of the buffers to use for each attribute/coordinates when performing a read.
Command line: --mytile-read-buffer-size
Scope: Global
Dynamic: Yes
Data Type: long
Default Value: 100M
mytile_write_buffer_size
Description: Size in bytes of the buffers to use for each attribute/coordinates when performing a write.
Command line: --mytile-write-buffer-size
Scope: Global
Dynamic: Yes
Data Type: long
Default Value: 100M
mytile_delete_arrays
Description: Controls if a delete table
statement causes the array to be deleted on disk or just deregistered from MariaDB. A true value causes actual deletions of data.
Command line: --mytile-delete-arrays
Scope: Global
Dynamic: Yes
Data Type: boolean
Default Value: false
tiledb_config
Description: Set TileDB configuration parameters, this is in csv form of key1=value1,key2=value2
. Example: set mytile_tiledb_config = 'vfs.s3.aws_access_key_id=abc,vfs.s3.aws_secret_access_key=123';
Command line: --mytile-tiledb-config
Scope: Session
Dynamic: Yes
Data Type: string
Default Value: ""
reopen_for_every_query
Description: Closes and reopen the array for every query, this allows tiledb_config
parameters to take effect without forcing a table flush, but any TileDB caching is removed.
Command line: --mytile-reopen-for-every-query
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
ready_query_layout
Description: Set layout for ready queries, valid values are row-major
, col-major
, unordered
or global-order
Command line: --mytile-read-query-layout
Scope: Session
Dynamic: Yes
Data Type: enum
Default Value: unordered
dimensions_are_primary_keys
Description: Should dimensions be treated and registered as primary keys.
Command line: --mytile-dimensions-are-primary-keys
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
enable_pushdown
Description: Pushdown predicates where possible
Command line: --mytile-enable-pushdown
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: true
compute_table_records
Description: Compute size of table (record count) on table opening. This is useful when you are performing a large join or group by and knowing the table size can affect the optimizer. Computing the table records can take several seconds (worst case) so this should not be enabled for small queries.
Command line: --mytile-compute-table-records
Scope: Session
Dynamic: Yes
Data Type: boolean
Default Value: false