Usage

Spark Dataframe to TileDB Array

You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

    df.write
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/array_new")
      .option("schema.dim.0.name", "dimension0")
      // ... other options
      .save()

PySpark

    # Parentheses let the method chain span several lines
    (df.write
        .format("io.tiledb.spark")
        .option("uri", "s3://my_bucket/array_new")
        .option("schema.dim.0.name", "dimension0")
        # ... other options
        .save())

SparkR

    write.df(
      df,
      source = "io.tiledb.spark",
      uri = "s3://my_bucket/array_new",
      schema.dim.0.name = "dimension0")

You can write a Spark dataframe to an existing TileDB array by setting the save mode to "append".
Scala

    df.write
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/array_new")
      .option("schema.dim.0.name", "dimension0")
      .mode("append") // IMPORTANT
      // ... other options
      .save()

PySpark

    # Parentheses let the method chain span several lines
    (df.write
        .format("io.tiledb.spark")
        .option("uri", "s3://my_bucket/array_new")
        .option("schema.dim.0.name", "dimension0")
        .mode("append")  # IMPORTANT
        # ... other options
        .save())

SparkR

    write.df(
      df,
      source = "io.tiledb.spark",
      uri = "s3://my_bucket/array_new",
      schema.dim.0.name = "dimension0",
      mode = "append")
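
The two write patterns above differ only in the save mode, so they can be folded into one small PySpark helper. The following is a minimal sketch, not part of the TileDB-Spark API: the function name `write_to_tiledb` and its parameters are hypothetical, and it only uses the format name, the "uri" option, and the "append" mode shown above. Any further driver options are passed through unchanged.

```python
from typing import Any, Mapping, Optional


def write_to_tiledb(df: Any, uri: str,
                    options: Optional[Mapping[str, str]] = None,
                    append: bool = False) -> None:
    """Write a Spark dataframe to a TileDB array (hypothetical helper).

    `df` is expected to be a pyspark.sql.DataFrame. Entries in `options`
    are forwarded to the driver as-is.
    """
    writer = df.write.format("io.tiledb.spark").option("uri", uri)
    # Option names such as "schema.dim.0.name" contain dots, so they are
    # passed one by one as plain strings.
    for key, value in (options or {}).items():
        writer = writer.option(key, value)
    if append:
        # "append" is the mode used above for writing to an existing array.
        writer = writer.mode("append")
    writer.save()
```

For example, `write_to_tiledb(df, "s3://my_bucket/array_new", {"schema.dim.0.name": "dimension0"})` would mirror the first example, and adding `append=True` would mirror the second.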

TileDB Array to Spark Dataframe

You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

    val df = spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")
      .load()

PySpark

    # Parentheses let the method chain span several lines
    df = (spark.read
        .format("io.tiledb.spark")
        .option("uri", "s3://my_bucket/my_array")
        .load())

SparkR

    df <- read.df(uri = "s3://my_bucket/array_new", source = "io.tiledb.spark")

SparkSQL on TileDB Arrays

You can run SQL queries with Spark on TileDB arrays as follows:
Scala

    // Create a dataframe from a TileDB array
    val df = spark.read
      .format("io.tiledb.spark")
      .option("uri", "s3://my_bucket/my_array")
      .load()

    // Create a view and run SQL
    df.createOrReplaceTempView("tiledbArray")
    val sql_df = spark.sql("SELECT * FROM tiledbArray")
    sql_df.show()
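
The same read, register, and query steps should carry over to PySpark, since `createOrReplaceTempView` and `spark.sql` are standard Spark calls there too. Below is a minimal sketch under that assumption; the function name `query_tiledb_array` and its parameters are hypothetical and not part of the TileDB-Spark driver.

```python
from typing import Any


def query_tiledb_array(spark: Any, uri: str, view_name: str, query: str) -> Any:
    """Load a TileDB array, expose it as a temp view, and run `query` on it.

    `spark` is expected to be a pyspark.sql.SparkSession. The function name
    and parameters are hypothetical.
    """
    # Create a dataframe from the TileDB array
    df = spark.read.format("io.tiledb.spark").option("uri", uri).load()
    # Register the view, then run SQL against it
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)
```

A call such as `query_tiledb_array(spark, "s3://my_bucket/my_array", "tiledbArray", "SELECT * FROM tiledbArray").show()` would correspond to the Scala example above.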