Usage

Spark Dataframe to TileDB Array

You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.

Scala

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()

PySpark

(
    df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    # ... other options
    .save()
)

SparkR

write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0")
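When an array needs several options, it can be easier to collect them in a dictionary and pass them in one call. The PySpark sketch below is a hypothetical helper, not part of the TileDB driver: `tiledb_write_options` is an invented name, and numbering further dimensions as `schema.dim.1.name`, `schema.dim.2.name`, and so on is an assumption extrapolated from the `schema.dim.0.name` option shown above, so check Driver Options before relying on it. The final comment uses PySpark's `DataFrameWriter.options(**options)`, which accepts dotted keys when they are unpacked from a dict.

```python
def tiledb_write_options(uri, dim_names):
    """Build the option dict for a TileDB write (hypothetical helper).

    Assumption: dimension i is named through the key "schema.dim.<i>.name",
    extrapolated from the "schema.dim.0.name" option in the example above.
    """
    if not dim_names:
        raise ValueError("at least one dimension name is required")
    options = {"uri": uri}
    for i, name in enumerate(dim_names):
        options[f"schema.dim.{i}.name"] = name
    return options


opts = tiledb_write_options("s3://my_bucket/array_new", ["dimension0"])
print(opts)

# With an existing Spark dataframe `df`, the dict can be applied in one call:
# df.write.format("io.tiledb.spark").options(**opts).save()
```

Keeping the options in plain data also makes it straightforward to log or reuse the same configuration across several writes.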

You can write a Spark dataframe to an existing TileDB array by setting the write mode to "append".

Scala

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()

PySpark

(
    df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    .mode("append")  # IMPORTANT
    # ... other options
    .save()
)

SparkR

write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append")
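Since the two write examples differ only in the save mode, both cases can share one code path. The following is a minimal PySpark sketch, assuming `df` is a Spark dataframe; `save_to_tiledb` and its `append` flag are hypothetical names introduced here for illustration, not part of the TileDB driver.

```python
def save_to_tiledb(df, uri, dim_name, append=False):
    """Write `df` to the TileDB array at `uri` (hypothetical helper).

    With append=False no mode is set, as in the array-creation example;
    with append=True the writer is switched to "append" mode.
    """
    writer = (
        df.write
        .format("io.tiledb.spark")
        .option("uri", uri)
        .option("schema.dim.0.name", dim_name)
    )
    if append:
        writer = writer.mode("append")
    writer.save()
```

Under these assumptions, `save_to_tiledb(df, "s3://my_bucket/array_new", "dimension0")` would create the array, and a later call with `append=True` would add rows to it.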

TileDB Array to Spark Dataframe

You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.

Scala

val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

PySpark

df = (
    spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load()
)

SparkR

df <- read.df(uri = "s3://my_bucket/array_new", source = "io.tiledb.spark")
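The loaded result is a regular Spark dataframe, so standard dataframe operations such as `filter` can be applied to it. The PySpark sketch below wraps the read and optionally applies a row filter; `read_tiledb` is a hypothetical helper, and the column name `dimension0` in the usage note is assumed to exist in the array. Whether the driver pushes such filters down to TileDB is not stated here, so treat the filter as a plain Spark operation.

```python
def read_tiledb(spark, uri, condition=None):
    """Load the TileDB array at `uri` into a Spark dataframe (hypothetical helper).

    `condition` is an optional SQL expression string passed to DataFrame.filter.
    """
    df = (
        spark.read
        .format("io.tiledb.spark")
        .option("uri", uri)
        .load()
    )
    if condition is not None:
        df = df.filter(condition)
    return df
```

For example, `read_tiledb(spark, "s3://my_bucket/my_array", "dimension0 >= 10").show()` would display only the rows matching the condition, assuming an active `spark` session.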

SparkSQL on TileDB Arrays

You can run SQL queries with Spark on TileDB arrays as follows:

Scala

// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

// Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()

PySpark

# Create a dataframe from a TileDB array
df = (
    spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load()
)

# Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()

SparkR

# Create a dataframe from a TileDB array
df <- read.df(uri = "s3://my_bucket/array_new", source = "io.tiledb.spark")

# Create a view and run SQL
createOrReplaceTempView(df, "tiledbArray")
sql_df <- sql("select * from tiledbArray")
head(sql_df)
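Once the view is registered, Spark SQL treats it like any other temporary view, so queries are not limited to `SELECT *`. As a PySpark sketch, the hypothetical helper below (the name `sql_on_tiledb` is introduced here for illustration) loads an array, registers it under a caller-chosen view name, and returns the result of an arbitrary query against that view.

```python
def sql_on_tiledb(spark, uri, view_name, query):
    """Run `query` against the TileDB array at `uri` (hypothetical helper).

    The array is exposed to Spark SQL as the temporary view `view_name`,
    so `query` should refer to the array by that name.
    """
    df = (
        spark.read
        .format("io.tiledb.spark")
        .option("uri", uri)
        .load()
    )
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)
```

Assuming an active `spark` session, `sql_on_tiledb(spark, "s3://my_bucket/my_array", "tiledbArray", "SELECT COUNT(*) AS n FROM tiledbArray").show()` would print the number of rows in the array.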