This guide shows how to use TileDB on HDFS. HDFS is a distributed, Java-based filesystem for storing large amounts of data, and it is the underlying distributed storage layer for the Hadoop stack.
TileDB integrates with HDFS through the libhdfs library (the HDFS C API). The HDFS backend is enabled by default, and libhdfs is loaded at runtime based on environment variables.
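As an illustration, the Python sketch below checks the environment before TileDB is used. The variable names (HADOOP_HOME, JAVA_HOME, CLASSPATH) are an assumption based on typical libhdfs setups, and missing_hdfs_env is a hypothetical helper, so verify both against your own installation.

```python
import os

# Assumed variable names (typical for libhdfs setups; verify for your install).
ASSUMED_HDFS_ENV_VARS = ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH")


def missing_hdfs_env(env=None):
    """Return the assumed HDFS-related variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ASSUMED_HDFS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_hdfs_env()
    if missing:
        print("Unset or empty:", ", ".join(missing))
    else:
        print("All assumed HDFS environment variables are set.")
```

Running a check like this before opening an array can make a later library-loading error easier to diagnose.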
If the library cannot be found, or if the Hadoop library cannot locate its library dependencies at runtime, an error is returned.
To use HDFS with TileDB, change the URI you use to an HDFS path:
For instance, if you are running a local HDFS namenode on port 9000:
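A minimal Python sketch of building such a URI follows. It assumes the usual hdfs://<host>:<port>/<path> shape; hdfs_uri is a hypothetical helper, and /path/to/array is a placeholder location.

```python
def hdfs_uri(host, port, path):
    """Build an hdfs://<host>:<port>/<path> URI for an explicit namenode."""
    return f"hdfs://{host}:{port}/{path.lstrip('/')}"


# Hypothetical array location on a local namenode listening on port 9000.
array_uri = hdfs_uri("localhost", 9000, "/path/to/array")
print(array_uri)  # hdfs://localhost:9000/path/to/array
```

The resulting string can then be passed wherever TileDB expects an array URI.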
If you want to use the namenode specified in your HDFS configuration files, then change the prefix to:
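The sketch below assumes the common Hadoop convention of leaving the authority (host and port) empty, which yields an hdfs:/// prefix and defers the choice of namenode to the HDFS configuration files. The helper default_namenode_uri and the path are hypothetical; confirm the accepted prefix for your TileDB version.

```python
def default_namenode_uri(path):
    """Build an hdfs:/// URI, leaving the authority (host:port) empty."""
    return "hdfs:///" + path.lstrip("/")


print(default_namenode_uri("/path/to/array"))  # hdfs:///path/to/array
```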
Most HDFS configuration variables are defined in Hadoop-specific XML files. TileDB allows the following configuration variables to be set at runtime through configuration parameters:
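As a rough sketch, such settings could be collected in Python as shown below. The parameter names (vfs.hdfs.name_node_uri, vfs.hdfs.username, vfs.hdfs.kerb_ticket_cache_path) are assumptions about TileDB's HDFS options and should be checked against the configuration reference for your version; hdfs_config and the example values are hypothetical.

```python
def hdfs_config(name_node_uri=None, username=None, kerb_ticket_cache_path=None):
    """Collect the HDFS settings that were given into a TileDB-style dict."""
    candidates = {
        "vfs.hdfs.name_node_uri": name_node_uri,
        "vfs.hdfs.username": username,
        "vfs.hdfs.kerb_ticket_cache_path": kerb_ticket_cache_path,
    }
    # Keep only the settings the caller actually supplied.
    return {key: value for key, value in candidates.items() if value is not None}


settings = hdfs_config(name_node_uri="hdfs://localhost:9000", username="alice")
print(settings)

# With the tiledb package installed, the dict could be applied like this:
#   import tiledb
#   ctx = tiledb.Ctx(tiledb.Config(settings))
```

Leaving unset values out of the dict means anything not overridden here still comes from the Hadoop XML files.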