A dataframe is a specialization of an array (see Use Cases). As such, any TileDB API works natively for writing to and reading from a dataframe modeled as an array. However, because Pandas is a popular Python library for in-memory dataframes, TileDB offers optimized functionality for reading directly from an array into a Pandas dataframe. This How To guide describes that functionality.
The CSV Ingestion section describes how to ingest a dataframe from a CSV file into a 1D dense or an ND sparse array.
Reading From A Dense Array
Suppose you have ingested a CSV file into a 1D dense array.
To find out how many rows were ingested, inspect the array's non-empty domain:
import tiledb
A = tiledb.open("my_array", mode="r")
print(A.nonempty_domain())
# Example ((0, 7667791),)
To read data from an array into a Pandas dataframe, you can use the df operator:
For dense arrays, this operator allows you to efficiently slice any subset of rows:
TileDB is a columnar format and, therefore, allows you to efficiently subselect on columns / attributes as follows:
Reading From A Sparse Array
Suppose you have ingested a CSV file into a 2D sparse array.
This array allows for efficient slicing on the two dimensions as follows:
# If both dimensions are integers
# Or, natively on the datatype of the dimensions (e.g., datetime)