Reading Into Dataframes
A dataframe is a specialization of an array (see Use Cases). As such, any TileDB API works natively for writing a dataframe modeled as an array and for reading from it. However, Pandas is a popular Python library for in-memory dataframes, so TileDB offers optimized functionality for reading directly from an array into a Pandas dataframe. This How To guide describes that functionality.
The CSV Ingestion section describes how to ingest a dataframe from a CSV file into a 1D dense or an ND sparse array.

Reading From A Dense Array

Suppose you have ingested a CSV file into a 1D dense array.
To find out how many rows were ingested, inspect the array's non-empty domain:
```python
import tiledb

A = tiledb.open("my_array", mode="r")
A.nonempty_domain()
# Example: ((0, 7667791),)
```
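The non-empty domain bounds are inclusive, so the row count of a 1D dense array is the upper bound minus the lower bound plus one. A minimal sketch of that arithmetic follows. The helper name row_count is hypothetical (it is not part of the TileDB API), and it assumes a single integer dimension.

```python
def row_count(nonempty_domain):
    """Rows in a 1D dense array, given its non-empty domain (hypothetical helper).

    TileDB reports one inclusive (low, high) pair per dimension.
    """
    if nonempty_domain is None:  # nonempty_domain() returns None for an empty array
        return 0
    ((low, high),) = nonempty_domain  # exactly one dimension expected
    return high - low + 1


# Using the example domain shown above:
print(row_count(((0, 7667791),)))  # 7667792
```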
To read data from an array into a Pandas dataframe, you can use the df operator:
```python
A.df[:]
```
For dense arrays, this operator allows you to efficiently slice any subset of rows. Ranges passed to df include both endpoints, so the following returns rows 11 through 20 (10 rows):
```python
A.df[11:20]
```
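Because df ranges include both endpoints, reading a large dense array in fixed-size batches calls for inclusive (start, end) pairs. A minimal sketch, assuming a single integer dimension; row_batches is a hypothetical helper, not a TileDB function, and the loop over a real array is shown only as a comment.

```python
def row_batches(nonempty_domain, batch_size):
    """Yield inclusive (start, end) row ranges covering a 1D non-empty domain."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    ((low, high),) = nonempty_domain
    start = low
    while start <= high:
        end = min(start + batch_size - 1, high)  # inclusive end of this batch
        yield start, end
        start = end + 1


print(list(row_batches(((0, 9),), 4)))  # [(0, 3), (4, 7), (8, 9)]

# With an open array A (not executed here), each batch maps onto a df slice:
# for start, end in row_batches(A.nonempty_domain(), 1_000_000):
#     chunk = A.df[start:end]
```

Keeping the end inclusive means consecutive batches neither overlap nor skip a row.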
TileDB is a columnar format, so you can efficiently select a subset of columns (attributes) as follows:
```python
A.query(attrs=['attr1']).df[:]
```

Reading From A Sparse Array

Suppose you have ingested a CSV file into a 2D sparse array.
This array allows for efficient slicing on the two dimensions as follows:
```python
import numpy as np

# If both dimensions are integers
A.df[1:10, 1:100]

# Or, natively on the datatype of the dimensions (e.g., datetime)
A.df[slice(np.datetime64("2019-01-01 00:00:00"), np.datetime64("2019-01-02 23:59:59")), 0:10]
```
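The datetime range in the example ends at 23:59:59 because df ranges are inclusive. The sketch below builds such whole-day bounds with the standard library. day_bounds is a hypothetical helper, and it assumes the datetime dimension has one-second resolution; with a finer resolution (e.g., nanoseconds) an end of 23:59:59 would miss the fractional part of the final second.

```python
from datetime import date, datetime, time, timedelta


def day_bounds(first_day, n_days):
    """Inclusive (start, end) strings spanning n_days whole days at 1-second resolution."""
    start = datetime.combine(first_day, time.min)
    end = start + timedelta(days=n_days) - timedelta(seconds=1)  # last second, inclusive
    fmt = "%Y-%m-%d %H:%M:%S"
    return start.strftime(fmt), end.strftime(fmt)


print(day_bounds(date(2019, 1, 1), 2))
# ('2019-01-01 00:00:00', '2019-01-02 23:59:59')

# With numpy imported as np and an open array A (not executed here):
# start, end = day_bounds(date(2019, 1, 1), 2)
# A.df[slice(np.datetime64(start), np.datetime64(end)), 0:10]
```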
You can prevent the Pandas dataframe from materializing the index columns, which can improve read performance, as follows:
```python
A.query(index_col=[]).df[1:100, 0:10]
```
You can check the non-empty domain on the two dimensions as follows:
```python
A.nonempty_domain()
```
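The non-empty domain uses the same inclusive bounds as df, so it can be turned directly into slices that cover every stored cell. For a sparse array these bounds describe the bounding box of the stored coordinates, not the number of rows. A minimal sketch; full_slices is a hypothetical helper, and the call on a real array appears only as a comment.

```python
def full_slices(nonempty_domain):
    """Turn per-dimension inclusive (low, high) bounds into a tuple of slices."""
    return tuple(slice(low, high) for low, high in nonempty_domain)


print(full_slices(((1, 10), (0, 99))))
# (slice(1, 10, None), slice(0, 99, None))

# With an open array A (not executed here); A.df[1:10, 0:99] passes
# exactly this kind of tuple of slices to the indexer:
# A.df[full_slices(A.nonempty_domain())]
```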
Being a columnar format, TileDB allows you to efficiently subselect on attributes and dimensions as follows:
```python
A.query(attrs=['attr1'], dims=['dim1']).df[:]
```

Reading into Arrow Tables

If you are using Apache Arrow, TileDB can return dataframe results directly as Arrow tables, with zero copy, as follows:
```python
A.query(return_arrow=True).df[:]
```