Usage
Spark Dataframe to TileDB Array
You can create a new TileDB array from an existing Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

```scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  // ... other options
  .save()
```

PySpark

```python
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    # ... other options
    .save())
```

SparkR

```r
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0"
)
```
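Each dimension of the target array is configured through indexed option keys (`schema.dim.0.name`, `schema.dim.1.name`, and so on). For arrays with several dimensions, these keys can be generated rather than typed out by hand; the sketch below is a plain-Python illustration of that key pattern (the `dim_options` helper is hypothetical, not part of the driver):

```python
# Build the indexed "schema.dim.N.name" option keys used by the
# TileDB-Spark writer. The helper name is hypothetical; only the
# key pattern itself comes from the driver's option naming scheme.
def dim_options(dim_names):
    return {f"schema.dim.{i}.name": name for i, name in enumerate(dim_names)}

options = dim_options(["rows", "cols"])
# options == {"schema.dim.0.name": "rows", "schema.dim.1.name": "cols"}
```

Each key/value pair would then be passed to the writer with a separate `.option(...)` call.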
You can write a Spark dataframe to an existing TileDB array by adding the "append" save mode:
Scala

```scala
df.write
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/array_new")
  .option("schema.dim.0.name", "dimension0")
  .mode("append") // IMPORTANT
  // ... other options
  .save()
```

PySpark

```python
(df.write
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/array_new")
    .option("schema.dim.0.name", "dimension0")
    .mode("append")  # IMPORTANT
    # ... other options
    .save())
```

SparkR

```r
write.df(
  df,
  source = "io.tiledb.spark",
  uri = "s3://my_bucket/array_new",
  schema.dim.0.name = "dimension0",
  mode = "append"
)
```
TileDB Array to Spark Dataframe
You can read a TileDB array into a Spark dataframe as follows. See Driver Options for a summary of the options you can use.
Scala

```scala
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()
```

PySpark

```python
df = (spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load())
```

SparkR

```r
df <- read.df(
  uri = "s3://my_bucket/array_new",
  source = "io.tiledb.spark"
)
```
SparkSQL on TileDB Arrays
You can run SQL queries with Spark on TileDB arrays as follows:
Scala

```scala
// Create a dataframe from a TileDB array
val df = spark.read
  .format("io.tiledb.spark")
  .option("uri", "s3://my_bucket/my_array")
  .load()

// Create a temporary view and run SQL
df.createOrReplaceTempView("tiledbArray")
val sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
```
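The same flow works from PySpark. The sketch below mirrors the Scala example above, assuming a running `SparkSession` named `spark` with the TileDB-Spark connector on the classpath:

```python
# Sketch: SparkSQL over a TileDB array from PySpark. Assumes an
# active SparkSession named `spark` and the TileDB-Spark connector
# available on the Spark classpath.
df = (spark.read
    .format("io.tiledb.spark")
    .option("uri", "s3://my_bucket/my_array")
    .load())

# Register a temporary view and query it with SQL
df.createOrReplaceTempView("tiledbArray")
sql_df = spark.sql("SELECT * FROM tiledbArray")
sql_df.show()
```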