Usage

Spark Dataframe to TileDB Array

You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala
PySpark
SparkR
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()
(df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  # ... other options
  .save())
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0")
You can write a Spark dataframe to an existing TileDB array by setting the save mode to "append".
Scala
PySpark
SparkR
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()
(df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append")  # IMPORTANT
  # ... other options
  .save())
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append")

TileDB Array to Spark Dataframe

You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala
PySpark
SparkR
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
df = (spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load())
df <- read.df(uri = "s3://my_bucket/my_array", source = "io.tiledb.spark")

SparkSQL on TileDB Arrays

You can run SQL queries with Spark on TileDB arrays as follows:
Scala
// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
// Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()