Spark Dataframe to TileDB Array
You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala:
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()

PySpark:
# The outer parentheses let the method chain span several lines
(df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  # ... other options
  .save())

SparkR:
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0")
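When a dataframe has several dimension columns, the schema options can be generated rather than typed by hand. The sketch below is a minimal illustration: the helper name `tiledb_dim_options` is hypothetical, and it assumes that dimensions beyond the first follow the same `schema.dim.<index>.name` pattern shown above. Check Driver Options to confirm the exact keys.

```python
def tiledb_dim_options(dim_names):
    """Build writer options mapping each dimension index to a column name.

    Hypothetical helper. Assumes the key pattern 'schema.dim.<i>.name',
    extrapolated from the 'schema.dim.0.name' option used in this page.
    """
    return {f"schema.dim.{i}.name": name for i, name in enumerate(dim_names)}


# Only the single-dimension case is taken directly from the examples above.
options = tiledb_dim_options(["dimension0"])
print(options)
```

The resulting dictionary can be passed to the writer with `df.write.format("io.tiledb.spark").options(**options)`, since PySpark's `DataFrameWriter.options` accepts arbitrary keyword options.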
You can write a Spark dataframe to an existing TileDB array by setting the save mode to "append".
Scala:
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()

PySpark:
# The outer parentheses let the method chain span several lines
(df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append")  # IMPORTANT
  # ... other options
  .save())

SparkR:
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append")
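Because the create and append cases differ only in the save mode, they can share one function. The following is a sketch rather than part of the TileDB-Spark API: `write_to_tiledb` and its parameters are hypothetical names, and it assumes `df` is a PySpark dataframe with the TileDB-Spark data source available on the classpath.

```python
def write_to_tiledb(df, uri, dim_name, append=False):
    """Write `df` to the TileDB array at `uri` (hypothetical helper).

    With append=False the writer is left in its default save mode, as in
    the "create" example; with append=True the mode is set to "append".
    """
    writer = (
        df.write
        .format("io.tiledb.spark")
        .option("uri", uri)
        .option("schema.dim.0.name", dim_name)
    )
    if append:
        writer = writer.mode("append")
    writer.save()
```

A first call such as `write_to_tiledb(df, "s3://my_bucket/array_new", "dimension0")` would create the array, and later calls with `append=True` would add to it.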
TileDB Array to Spark Dataframe
You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala:
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

PySpark:
# The outer parentheses let the method chain span several lines
df = (spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load())

SparkR:
df <- read.df(uri = "s3://my_bucket/my_array", source = "io.tiledb.spark")
SparkSQL on TileDB Arrays
You can run SQL queries with Spark on TileDB arrays as follows:
Scala:
// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
// Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
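The same steps can be expressed in PySpark. The sketch below wraps them in a function; `query_tiledb_array` is a hypothetical name, and it assumes `spark` is an active SparkSession with the TileDB-Spark data source available.

```python
def query_tiledb_array(spark, uri, query, view_name="tiledbArray"):
    """Load a TileDB array, expose it as a temporary view, and run SQL on it.

    Hypothetical helper: `query` should refer to the array by `view_name`.
    """
    # Create a dataframe from the TileDB array
    df = (
        spark.read
        .format("io.tiledb.spark")
        .option("uri", uri)
        .load()
    )
    # Register the view, then run the SQL query against it
    df.createOrReplaceTempView(view_name)
    return spark.sql(query)
```

For example, `query_tiledb_array(spark, "s3://my_bucket/my_array", "SELECT * FROM tiledbArray").show()` would mirror the Scala example above.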