Usage

Spark Dataframe to TileDB Array

You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")                          
  .option("schema.dim.0.name", "dimension0")
  // ... other options   
  .save()
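
For a more complete picture, here is a minimal, self-contained sketch that builds a small dataframe and writes it as a new array. The column names dimension0 and a1, the sample data, and the destination URI are placeholders for illustration; only uri and schema.dim.0.name are options taken from the example above.

// Minimal sketch: build a toy dataframe and persist it as a new TileDB array.
// "dimension0", "a1", and the URI are placeholders for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("tiledb-write-example")
  .getOrCreate()

import spark.implicits._

// A toy dataframe: one dimension column and one attribute column.
val df = Seq((1L, 10.5), (2L, 20.5), (3L, 30.5)).toDF("dimension0", "a1")

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0") // marks "dimension0" as the array dimension
  .save()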

You can write a Spark dataframe to an existing TileDB array by simply setting the save mode to "append".

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")                          
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options   
  .save()
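
Continuing the sketch above, appending newly arrived rows reuses the same options; the only difference is the save mode. The dataframe is assumed to have the same columns as the existing array.

// Sketch: append additional rows to the array created above.
// Assumes newRowsDf has the same columns ("dimension0", "a1") as the existing array.
val newRowsDf = Seq((4L, 40.5), (5L, 50.5)).toDF("dimension0", "a1")

newRowsDf.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // required because the target array already exists
  .save()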

TileDB Array to Spark Dataframe

You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.

val df = spark.read
              .format("io.tiledb.spark")
              .option("uri", "s3://my_bucket/my_array")
              .load()
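
The result is a regular Spark dataframe, so standard DataFrame operations apply. In the sketch below, the column names dimension0 and a1 are placeholders for whatever dimensions and attributes your array defines.

// Sketch: the loaded dataframe behaves like any other Spark dataframe.
// "dimension0" and "a1" are placeholder column names for this example.
import org.apache.spark.sql.functions.col

val df = spark.read
              .format("io.tiledb.spark")
              .option("uri", "s3://my_bucket/my_array")
              .load()

df.printSchema()                  // dimensions and attributes appear as columns
df.select("dimension0", "a1")
  .filter(col("dimension0") > 1L)
  .show()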

SparkSQL on TileDB Arrays

You can run SQL queries with Spark on TileDB arrays as follows:

// Create a dataframe from a TileDB array
val df = spark.read
              .format("io.tiledb.spark")
              .option("uri", "s3://my_bucket/my_array")
              .load()
              
// Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
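
Any SparkSQL statement that works on a temporary view works here as well. As a sketch, a grouped aggregation over the placeholder columns used earlier might look like this:

// Sketch: aggregate over the registered view.
// "dimension0" and "a1" are placeholder column names.
val agg_df = spark.sql(
  "SELECT dimension0, COUNT(*) AS cnt, AVG(a1) AS avg_a1 " +
  "FROM tiledbArray GROUP BY dimension0")
agg_df.show()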
