Usage
Spark Dataframe to TileDB Array
You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()
PySpark
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    # ... other options
    .save())
SparkR
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0")
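For context, here is a minimal end-to-end sketch in Scala that builds a small dataframe and writes it as a new TileDB array. The column names (rows, value) and the choice of rows as the array dimension are illustrative assumptions, not requirements of the driver.
Scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("tiledb-write-example").getOrCreate()
import spark.implicits._

// A small example dataframe; the column names are illustrative.
val df = Seq((1L, 10.5), (2L, 20.5), (3L, 30.5)).toDF("rows", "value")

df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "rows") // map the "rows" column to dimension 0
  .save()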
You can write a Spark dataframe to an existing TileDB array by adding the "append" save mode.
Scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()
PySpark
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    .mode("append")  # IMPORTANT
    # ... other options
    .save())
SparkR
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append")
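As a hedged sketch, appending a second batch of rows to the array created above might look like this in Scala; the dataframe contents are illustrative.
Scala
// A second batch with the same columns as the original dataframe (illustrative).
val moreRows = Seq((4L, 40.5), (5L, 50.5)).toDF("rows", "value")

moreRows.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .mode("append") // Spark's default SaveMode is ErrorIfExists, so "append" is required for an existing array
  .save()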
TileDB Array to Spark Dataframe
You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
PySpark
df = (spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load())
SparkR
df <- read.df(uri = "s3://my_bucket/array_new", source = "io.tiledb.spark")
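The loaded result is an ordinary Spark dataframe, so the usual dataframe operations apply. A brief Scala sketch continuing from the example above; the column name rows is an illustrative assumption.
Scala
// Inspect the inferred schema; dimensions and attributes appear as columns.
df.printSchema()

// Standard dataframe operations; "rows" is an illustrative column name.
df.filter(df("rows") > 2).show()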
SparkSQL on TileDB Arrays
You can run SQL queries with Spark on TileDB arrays as follows:
Scala
// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

// Create a view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
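Any SQL that Spark supports can then be run against the registered view. For example, a filtered projection in Scala (the column names are illustrative assumptions):
Scala
// Filter through SparkSQL; column names are illustrative.
val filtered = spark.sql("SELECT rows, value FROM tiledbArray WHERE rows > 2")
filtered.show()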