Usage
Spark Dataframe to TileDB Array
You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

```scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()
```

PySpark

```python
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    # ... other options
    .save())
```

SparkR

```r
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0"
)
```
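Each dimension of the target array is configured through indexed option keys (`schema.dim.0.name`, `schema.dim.1.name`, and so on). For arrays with several dimensions, these keys can be generated rather than typed out by hand; the sketch below is a plain-Python illustration of that key pattern (the `dim_options` helper is hypothetical, not part of the driver):

```python
# Build the indexed "schema.dim.N.name" option keys used by the
# TileDB-Spark writer. The helper name is hypothetical; only the
# key pattern itself comes from the driver's option naming scheme.
def dim_options(dim_names):
    return {f"schema.dim.{i}.name": name for i, name in enumerate(dim_names)}

options = dim_options(["rows", "cols"])
# options == {"schema.dim.0.name": "rows", "schema.dim.1.name": "cols"}
```

Each key/value pair would then be passed to the writer with a separate `.option(...)` call.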
You can write a Spark dataframe to an existing TileDB array by adding the "append" save mode:
Scala

```scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()
```

PySpark

```python
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    .mode("append")  # IMPORTANT
    # ... other options
    .save())
```

SparkR

```r
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append"
)
```
TileDB Array to Spark Dataframe
You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

```scala
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
```

PySpark

```python
df = (spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load())
```

SparkR

```r
df <- read.df(
  uri = "s3://my_bucket/array_new",
  source = "io.tiledb.spark"
)
```
SparkSQL on TileDB Arrays
You can run SQL queries with Spark on TileDB arrays as follows:
Scala

```scala
// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

// Create a temporary view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
```
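The same flow works from PySpark. The sketch below mirrors the Scala example above, assuming a running `SparkSession` named `spark` with the TileDB-Spark connector on the classpath:

```python
# Sketch: SparkSQL over a TileDB array from PySpark. Assumes an
# active SparkSession named `spark` and the TileDB-Spark connector
# available on the Spark classpath.
df = (spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load())

# Register a temporary view and query it with SQL
df.createOrReplaceTempView("tiledbArray")
sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
```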