Read Dataframes
A dataframe is a specialization of an array (see Use Cases). As such, any TileDB API works natively for writing to and reading from a dataframe modeled as an array. However, Python Pandas has a popular offering for dataframes in main memory and, therefore, TileDB offers special optimized reading functionality to read directly from an array into a Pandas dataframe. This How To guide describes this functionality.
Sections Create Dataframes and Write Dataframes describe how to ingest a dataframe into a 1D dense or a ND sparse array. This section covers how to read from the ingested dataframes directly into either Pandas or Arrow Table. Throughout the section, we call dense dataframe a dataframe modeled as a 1D dense array, and sparse dataframe a dataframe modeled as a ND sparse array.

Read the Dataframe Schema

Since the dataframe is an array, you can read the underlying schema in the same manner as for arrays as follows:
A = tiledb.open("my_array", mode="r")
A.schema

Read From A Dense Dataframe

Suppose you have ingested a CSV file into a 1D dense array.
To find out how many rows were ingested, you can take a look at the array non-empty domain:
A = tiledb.open("my_array", mode="r")
A.nonempty_domain()
# Example ((0, 7667791),)
To read data from an array into a Pandas dataframe, you can use the df operator:
A.df[:]
For dense arrays, this operator allows you to efficiently slice any subset of rows:
A.df[11:20]
TileDB is a columnar format and, therefore, allows you to efficiently subselect on columns / attributes as follows:
A.query(attrs=['attr1']).df[:]

Read From A Sparse Dataframe

Suppose you have ingested a CSV file into a 2D sparse array.
This array allows for efficient slicing on the two dimensions as follows:
# If both dimensions are integers
A.df[1:10, 1:100]
# Or, natively on the datatype of the dimensions (e.g., datetime)
A.df[slice(np.datetime64("2019-01-01 00:00:00"), np.datetime64("2019-01-02 23:59:59")), 0:10]
You can prevent the Pandas dataframe from materializing the index columns (which will boost up reading performance) as follows:
A.query(index_col=[]).df[1:100, 0:10]
You can check the non-empty domain on the two dimensions as follows:
A.nonempty_domain()
Being a columnar format, TileDB allows you to efficiently subselect on attributes and dimensions as follows:
A.query(attrs=['attr1'], dims=['dim1']).df[:]

Read into Arrow Tables

If you are using Apache Arrow, TileDB can return dataframe results directly as Arrow Tables with zero-copy as follows:
A.query(return_arrow=True).df[:]

Read Using SQL

TileDB supports SQL via its integration with MariaDB. A simple example is shown below, but for more details read section Embedded SQL.
import tiledb.sql, pandas
db = tiledb.sql.connect()
pandas.read_sql(sql="select * from `<array_uri>`", con=db)
Export as PDF
Copy link
Outline
Read the Dataframe Schema
Read From A Dense Dataframe
Read From A Sparse Dataframe
Read into Arrow Tables
Read Using SQL