
Read Arrays

Reading the Array Schema

Inspecting the Array Schema

You can get the schema of an existing array as follows:

The following code snippet shows how to inspect the various array schema members.

Inspecting Domain

You can inspect the members of the domain as follows:

Inspecting Dimensions

You can inspect a dimension as follows:

Inspecting Attributes

You can inspect an attribute as follows:

Inspecting Filters

You can inspect filters as follows:

Basic Reading

To read either a dense or a sparse array, you typically open the array in read mode and provide a subarray, any subset of the attributes (potentially including the coordinates), and the layout in which the results should be returned (see for more details). You can read from an array as follows:
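The following pure-Python sketch models what these read parameters mean without calling TileDB. It treats a small 2D dense array as a dict keyed by `(row, col)` and selects an inclusive subarray. It then returns only the requested attributes, in row-major or column-major order. The `read` helper, the 1-based coordinates and the sample data are hypothetical and serve only to illustrate the semantics of subarray, attribute subset and layout.

```python
from itertools import product

# Toy 4x4 dense array with two attributes; coordinates start at 1 (assumption).
CELLS = {
    (r, c): {"a1": r * 10 + c, "a2": float(r + c)}
    for r, c in product(range(1, 5), range(1, 5))
}


def read(cells, rows, cols, attrs, layout="row-major"):
    """Return {attr: [values]} for the inclusive subarray rows x cols.

    rows and cols are (low, high) pairs, both ends inclusive.
    "row-major" lets the last dimension vary fastest,
    "col-major" lets the first dimension vary fastest.
    """
    row_range = range(rows[0], rows[1] + 1)
    col_range = range(cols[0], cols[1] + 1)
    if layout == "row-major":
        coords = [(r, c) for r in row_range for c in col_range]
    elif layout == "col-major":
        coords = [(r, c) for c in col_range for r in row_range]
    else:
        raise ValueError(f"unsupported layout: {layout}")
    # Only the requested attributes are materialized.
    return {a: [cells[rc][a] for rc in coords] for a in attrs}


# Slice rows 1-2 and cols 2-4, reading only attribute a1.
print(read(CELLS, (1, 2), (2, 4), ["a1"]))
# {'a1': [12, 13, 14, 22, 23, 24]}
print(read(CELLS, (1, 2), (2, 4), ["a1"], layout="col-major"))
# {'a1': [12, 22, 13, 23, 14, 24]}
```

The same six cells come back in both calls. Only their order in the result buffer changes with the layout.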

Variable-length Attributes

You can read variable-length attributes (as written by the earlier example) as follows:
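A variable-length attribute is read into two buffers. The data buffer holds all values back to back, and the offsets buffer marks where each cell's value starts. The plain-Python sketch below decodes such a pair into strings. It assumes the offsets are byte positions into the data buffer and that the values are UTF-8 strings. The `decode_var` helper and the sample buffers are hypothetical.

```python
def decode_var(offsets, data, data_size=None):
    """Split a var-length data buffer into per-cell strings.

    offsets[i] is the byte position where cell i starts in data.
    data_size is the number of bytes actually filled in data (the
    buffer may be larger than the result); defaults to len(data).
    """
    end = len(data) if data_size is None else data_size
    values = []
    for i, start in enumerate(offsets):
        # A cell ends where the next one starts, or at the end of the data.
        stop = offsets[i + 1] if i + 1 < len(offsets) else end
        values.append(data[start:stop].decode("utf-8"))
    return values


# Three cells "a", "bb", "ccc"; the buffer has 2 unused bytes at the end.
offsets = [0, 1, 3]
data = b"abbccc\x00\x00"
print(decode_var(offsets, data, data_size=6))  # ['a', 'bb', 'ccc']
```

Passing the filled data size matters: the last cell's length can only be derived from it, not from the offsets alone.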

Fixed-length, Nullable Attributes

You can read fixed-length, nullable attributes as follows:
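A nullable attribute comes with a validity buffer holding one byte per cell, next to its data buffer. The plain-Python sketch below assumes the convention that 0 marks a null cell and any non-zero byte marks a valid one. Whatever is stored in the data buffer for a null cell should not be interpreted. The `apply_validity` helper and the sample buffers are hypothetical.

```python
def apply_validity(values, validity):
    """Pair each fixed-size value with its validity byte.

    A validity byte of 0 marks the cell as null (returned as None);
    any other byte marks it as valid.
    """
    if len(values) != len(validity):
        raise ValueError("values and validity must have the same length")
    return [v if ok else None for v, ok in zip(values, validity)]


a1 = [10, 0, 30, 40]        # the 0 in slot 1 is not a real value
a1_validity = [1, 0, 1, 1]
print(apply_validity(a1, a1_validity))  # [10, None, 30, 40]
```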

Variable-length, Nullable Attributes

You can read variable-length, nullable attributes as follows:
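Variable-length, nullable attributes combine the offsets/data layout of variable-length attributes with a per-cell validity buffer. The plain-Python sketch below assumes byte offsets and the 0-means-null validity convention; `decode_var_nullable` and the sample buffers are hypothetical. It also shows why the validity buffer is needed: a null cell and an empty string can both occupy zero bytes in the data buffer.

```python
def decode_var_nullable(offsets, data, validity, data_size=None):
    """Decode var-length, nullable cells into strings or None.

    offsets[i] is the byte position where cell i starts in data, and
    validity[i] == 0 marks cell i as null (its bytes, if any, are ignored).
    """
    if len(offsets) != len(validity):
        raise ValueError("need one validity byte per offset")
    end = len(data) if data_size is None else data_size
    cells = []
    for i, start in enumerate(offsets):
        stop = offsets[i + 1] if i + 1 < len(offsets) else end
        cells.append(data[start:stop].decode("utf-8") if validity[i] else None)
    return cells


# Cell 1 is null and occupies zero bytes in the data buffer.
print(decode_var_nullable([0, 2, 2], b"abcde", [1, 0, 1]))  # ['ab', None, 'cde']
# Here cell 0 is a valid empty string, which also occupies zero bytes.
print(decode_var_nullable([0, 0], b"x", [1, 1]))  # ['', 'x']
```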

Getting the Non-empty Domain

You can get the non-empty domain of an array as follows:
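Conceptually, the non-empty domain is the tightest box, per dimension, that contains every cell written so far across all fragments. The plain-Python sketch below merges per-fragment bounding boxes into one such box. It is an illustration rather than the TileDB API, and the fragment boxes are hypothetical sample values. Because the result is a bounding box, it can include coordinates that were never written.

```python
def merge_nonempty_domains(fragment_domains):
    """Union per-fragment bounding boxes into one non-empty domain.

    Each entry is a tuple of inclusive (low, high) pairs, one per
    dimension. Returns None when no fragment has been written.
    """
    if not fragment_domains:
        return None
    ndim = len(fragment_domains[0])
    return tuple(
        (min(f[d][0] for f in fragment_domains),
         max(f[d][1] for f in fragment_domains))
        for d in range(ndim)
    )


# Two 2D fragments: rows 1-2 x cols 1-4 and rows 3-4 x cols 2-3.
print(merge_nonempty_domains([((1, 2), (1, 4)), ((3, 4), (2, 3))]))
# ((1, 4), (1, 4))
```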

Reopening Arrays

Assuming the array is already open, you can reopen it at the current timestamp. This is useful when writes may have happened since you last opened the array and you want the most up-to-date view of it. Reopening is also more efficient than closing and opening the array again, because it avoids refetching fragment metadata that is already loaded. You can reopen an array as follows:
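The efficiency argument can be pictured with a toy cache. A reopened handle keeps the fragment metadata it has already loaded and fetches metadata only for fragments that have appeared since. This is a simplified model of the idea described above, not TileDB's implementation. `ToyArrayHandle`, `list_fragments` and `load_metadata` are hypothetical, and the model ignores fragments removed by consolidation.

```python
class ToyArrayHandle:
    """Toy model of reopen: cache fragment metadata across reopens.

    list_fragments returns the fragment names currently on storage;
    load_metadata loads one fragment's metadata (the slow part).
    Both are hypothetical stand-ins, not TileDB APIs.
    """

    def __init__(self, list_fragments, load_metadata):
        self._list = list_fragments
        self._load = load_metadata
        self.metadata = {}
        self.reopen()  # the initial open loads everything

    def reopen(self):
        # Only fragments we have not seen before trigger a metadata load.
        for name in self._list():
            if name not in self.metadata:
                self.metadata[name] = self._load(name)


loads = []
fragments = ["frag_1", "frag_2"]


def load(name):
    loads.append(name)
    return {"name": name}


handle = ToyArrayHandle(lambda: list(fragments), load)
fragments.append("frag_3")  # a write happens after opening
handle.reopen()
print(loads)  # ['frag_1', 'frag_2', 'frag_3']: frag_1 and frag_2 loaded once
```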

Slicing Negative Domains

You can slice negative domains in Python as follows:
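When a dimension's domain includes negative values, the coordinates in a slice are positions in that domain. They are not Python-style "count from the end" indices. The plain-Python sketch below maps a domain slice onto positions in a 0-based result buffer. It assumes integer coordinates and bounds that are inclusive on both ends, as in the C and C++ subarray examples on this page. The helper `domain_slice_positions` is hypothetical.

```python
def domain_slice_positions(domain, low, high):
    """Map an inclusive slice [low, high] of a 1D integer domain to buffer
    positions, where position 0 holds the domain's lower bound."""
    dom_lo, dom_hi = domain
    if not (dom_lo <= low <= high <= dom_hi):
        raise ValueError(f"[{low}, {high}] is outside domain {domain}")
    return list(range(low - dom_lo, high - dom_lo + 1))


# Domain [-5, 5] holds 11 cells; the values here are just the coordinates.
domain = (-5, 5)
values = list(range(domain[0], domain[1] + 1))
pos = domain_slice_positions(domain, -2, 1)
print(pos)                       # [3, 4, 5, 6]
print([values[p] for p in pos])  # [-2, -1, 0, 1]
# By contrast, plain list slicing counts negatives from the end:
print(values[-2:1])              # []
```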

Reading Encrypted Arrays

To read from an encrypted array, you simply need to open it for reading using the encryption key you used to create it.

// ... create context ctx

// Open encrypted array for reading
const char key[] = "0123456789abcdeF0123456789abcdeF";
tiledb_error_t* error = NULL;
tiledb_config_t* config;
tiledb_config_alloc(&config, &error);
tiledb_config_set(config, "sm.encryption_type", "AES_256_GCM", &error);
tiledb_config_set(config, "sm.encryption_key", key, &error);

tiledb_array_t* array;
tiledb_array_alloc(ctx, "<array-uri>", &array);
tiledb_array_set_config(ctx, array, config);
tiledb_array_open(ctx, array, TILEDB_READ);

#include <limits>

// ... create context ctx

// Open encrypted array for reading
const char key[] = "0123456789abcdeF0123456789abcdeF";
Array array(ctx,
    "<array-uri>",
    TILEDB_READ,
    // Load all fragments available
    TemporalPolicy(TimestampStartEnd, 0, std::numeric_limits<uint64_t>::max()),
    EncryptionAlgorithm(AESGCM, key));

// Or, open at timestamp
uint64_t timestamp = 1561492235844; // In ms
Array array_at(ctx,
    "<array-uri>",
    TILEDB_READ,
    TemporalPolicy(TimeTravel, timestamp),
    EncryptionAlgorithm(AESGCM, key));

# All array and schema opening APIs support `key` as an
# optional keyword argument to open encrypted arrays:
import tiledb

uri = "<array-uri>"
key = "0123456789abcdeF0123456789abcdeF"

tiledb.DenseArray(uri, key=key)
tiledb.SparseArray(uri, key=key)
tiledb.open(uri, key=key)
tiledb.ArraySchema.load(uri, key=key)

ctx <- tiledb_ctx()
arrptr <- 
  tiledb:::libtiledb_array_open_with_key(ctx@ptr, uridensewkey, "READ",
                                         encryption_key)

# TileDB timestamps are milliseconds since the epoch; we use an
# R datetime (POSIXct) object to pass the value
tstamp <- as.POSIXct(1577955845.678, origin="1970-01-01")
arrptr <- 
  tiledb:::libtiledb_array_open_at_with_key(ctx@ptr, uridensewkey, "READ",
                                            encryption_key, tstamp)

// ... create context ctx

// Open encrypted array for reading
String key = "0123456789abcdeF0123456789abcdeF";
Array array = new Array(ctx, "<array-uri>", TILEDB_READ, TILEDB_AES_256_GCM, key.getBytes(StandardCharsets.UTF_8));

// Or, open at timestamp
BigInteger timestamp = BigInteger.valueOf(1561492235844L); // In ms
Array arrayAt = new Array(ctx, "<array-uri>", TILEDB_READ, TILEDB_AES_256_GCM, key.getBytes(StandardCharsets.UTF_8), timestamp);

// ... create context ctx

// Open encrypted array for reading
var encryption_key = "0123456789abcdeF0123456789abcdeF"
array, _ := tiledb.NewArray(ctx, "<array-uri>")
array.OpenWithKey(tiledb.TILEDB_READ, tiledb.TILEDB_AES_256_GCM, encryption_key)

// Or, open at timestamp
var timestamp uint64 = 1561492235844 // In ms
array.OpenAtWithKey(tiledb.TILEDB_READ, tiledb.TILEDB_AES_256_GCM, encryption_key, timestamp)

// ... create context ctx

// Open encrypted array for reading
string key = "0123456789abcdeF0123456789abcdeF";
using Config config = new Config();
config.Set("sm.encryption_type", "AES_256_GCM");
config.Set("sm.encryption_key", key);

using Array array = new Array(ctx, "<array-uri>");
array.SetConfig(config);
array.Open(QueryType.Read);

Multi-Range Subarrays

You can slice a multi-range subarray as follows (also see ):

You can also get the various ranges set to the query as follows:
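With several ranges per dimension, the selected cells are the cross product of the ranges across dimensions. The plain-Python sketch below enumerates the coordinates that such a multi-range subarray covers. It is a hypothetical helper, not a TileDB call, and it assumes inclusive bounds. The order shown is simply row-major over the ranges as listed, chosen for illustration, and is not a claim about TileDB's result order.

```python
from itertools import product


def multi_range_cells(ranges_per_dim):
    """Enumerate coordinates covered by a multi-range subarray.

    ranges_per_dim holds, for each dimension, a list of inclusive
    (low, high) ranges. A cell is selected when each of its coordinates
    falls in one of that dimension's ranges.
    """
    per_dim = [
        [x for lo, hi in ranges for x in range(lo, hi + 1)]
        for ranges in ranges_per_dim
    ]
    return list(product(*per_dim))


# rows: [1, 1] and [3, 4]; cols: [1, 2]
cells = multi_range_cells([[(1, 1), (3, 4)], [(1, 2)]])
print(len(cells))  # 6
print(cells)       # [(1, 1), (1, 2), (3, 1), (3, 2), (4, 1), (4, 2)]
```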

Query Conditions

Query conditions let a read return only the data that satisfies a given expression. Instead of filtering the results after the query completes, you push the condition down to TileDB, which returns only the subset of elements that satisfy it.
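To make the semantics concrete, the following plain-Python sketch models the pushed-down conditions used in the examples that follow over a toy column store: `(a1 != NULL || a2 > 10.5)` and `(a1 == NULL && a2 <= 10.5)`. It assumes the behavior those examples rely on. A sparse read returns only matching cells. A dense read returns every cell in the subarray and puts the attribute's fill value in non-matching cells. All names, data and the fill value `-1.0` are hypothetical.

```python
# Toy columns for six cells; None marks a null a1 value (hypothetical data).
A1 = [5, None, 7, None, None, 11]
A2 = [3.0, 12.0, 10.5, 9.0, 1.0, 15.5]


def a1_not_null(i):
    return A1[i] is not None


def a1_is_null(i):
    return A1[i] is None


def a2_gt(v):
    return lambda i: A2[i] > v


def a2_le(v):
    return lambda i: A2[i] <= v


def or_(p, q):   # models combining with TILEDB_OR
    return lambda i: p(i) or q(i)


def and_(p, q):  # models combining with TILEDB_AND
    return lambda i: p(i) and q(i)


def sparse_read(cond):
    """Only matching cells are returned, so the result can be shorter."""
    return [A2[i] for i in range(len(A2)) if cond(i)]


def dense_read(cond, fill_value):
    """Every cell gets a slot; non-matching cells hold the fill value."""
    return [A2[i] if cond(i) else fill_value for i in range(len(A2))]


c_or = or_(a1_not_null, a2_gt(10.5))   # a1 != NULL || a2 > 10.5
c_and = and_(a1_is_null, a2_le(10.5))  # a1 == NULL && a2 <= 10.5
print(sparse_read(c_or))               # [3.0, 12.0, 10.5, 15.5]
print(sparse_read(c_and))              # [9.0, 1.0]
print(dense_read(c_and, fill_value=-1.0))
# [-1.0, -1.0, -1.0, 9.0, 1.0, -1.0]
```

One caveat applies to the dense case: testing `value != fill_value` cannot tell a filtered-out cell from a real cell whose value happens to equal the fill value. Pick a fill value that cannot occur in your data.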

tiledb_ctx_t * ctx;
tiledb_ctx_alloc(NULL, &ctx);

tiledb_array_t* array_read;
tiledb_array_alloc(ctx, "<array_uri>", &array_read);
tiledb_array_open(ctx, array_read, TILEDB_READ);

// Query condition where a1 != NULL
tiledb_query_condition_t* queryCondition1;
tiledb_query_condition_alloc(ctx, &queryCondition1);
tiledb_query_condition_init(ctx, queryCondition1, "a1", NULL, 0, TILEDB_NE);

// Query condition where a2 > 10.5
float conditionVal = 10.5f;
tiledb_query_condition_t* queryCondition2;
tiledb_query_condition_alloc(ctx, &queryCondition2);
tiledb_query_condition_init(ctx, queryCondition2, "a2", &conditionVal, sizeof(float), TILEDB_GT);

// Query condition where (a1 != NULL || a2 > 10.5)
// tiledb_query_condition_combine allocates the combined condition itself
tiledb_query_condition_t* queryCondition;
tiledb_query_condition_combine(ctx, queryCondition1, queryCondition2, TILEDB_OR, &queryCondition);

// Slice rows 1, 2 and cols 2, 3, 4
int32_t subarray_ranges[] = {1, 2, 2, 4};
tiledb_subarray_t* subarray;
tiledb_subarray_alloc(ctx, array_read, &subarray);
tiledb_subarray_set_subarray(ctx, subarray, subarray_ranges);

// Allocate buffers for query
int32_t a1_read[6];
uint64_t a1_read_size = sizeof(a1_read);
uint8_t a1_read_validity[6];
uint64_t a1_read_validity_size = sizeof(a1_read_validity);
float a2_read[6];
uint64_t a2_read_size = sizeof(a2_read);

tiledb_query_t* query_read;
tiledb_query_alloc(ctx, array_read, TILEDB_READ, &query_read);
tiledb_query_set_layout(ctx, query_read, TILEDB_ROW_MAJOR);
tiledb_query_set_data_buffer(ctx, query_read, "a1", a1_read, &a1_read_size);
tiledb_query_set_validity_buffer(ctx, query_read, "a1", a1_read_validity, &a1_read_validity_size);
tiledb_query_set_data_buffer(ctx, query_read, "a2", a2_read, &a2_read_size);
tiledb_query_set_subarray_t(ctx, query_read, subarray);
tiledb_query_set_condition(ctx, query_read, queryCondition);
tiledb_query_submit(ctx, query_read);

// For sparse arrays, a1_read_size is set to the number of bytes read into the buffer.
// Values that don't meet the query condition are not returned, so dividing by the
// element size gives the number of results.
uint64_t a1_result_num = a1_read_size / sizeof(int32_t);
for (uint64_t i = 0; i < a1_result_num; i++) {
    // Print buffers from sparse array...
}

// For dense arrays, we can use the fill value of an attribute to check the element met our conditions
tiledb_array_schema_t* schema;
tiledb_array_get_schema(ctx, array_read, &schema);
tiledb_attribute_t* attr;
tiledb_array_schema_get_attribute_from_name(ctx, schema, "a2", &attr);
const float* fillVal;
uint64_t valSize;
tiledb_attribute_get_fill_value(ctx, attr, (const void**)&fillVal, &valSize);
for (size_t i = 0; i < a2_read_size / sizeof(float); i++) {
    if (a2_read[i] != *fillVal) {
        // Print buffers from dense array...
    }
}

// Free allocated objects
tiledb_attribute_free(&attr);
tiledb_array_schema_free(&schema);
tiledb_query_condition_free(&queryCondition1);
tiledb_query_condition_free(&queryCondition2);
tiledb_query_condition_free(&queryCondition);
tiledb_subarray_free(&subarray);
tiledb_query_free(&query_read);
tiledb_array_close(ctx, array_read);
tiledb_array_free(&array_read);
tiledb_ctx_free(&ctx);

Context ctx;
Array array_read(ctx, "<array_uri>", TILEDB_READ);

// QueryCondition for attribute values where a1 == nullptr
QueryCondition queryCondition1(ctx);
queryCondition1.init("a1", nullptr, 0, TILEDB_EQ);

// QueryCondition for attribute values where a2 <= 10.5
float qcVal = 10.5;
QueryCondition queryCondition2(ctx);
queryCondition2.init("a2", &qcVal, sizeof(float), TILEDB_LE);

// QueryCondition for (a1 == nullptr && a2 <= 10.5)
QueryCondition queryCondition = queryCondition1.combine(queryCondition2, TILEDB_AND);

// Slice rows 1, 2 and cols 2, 3, 4
Subarray subarray(ctx, array_read);
subarray.add_range("rows", 1, 2)
        .add_range("cols", 2, 4);

// Allocate buffers for query
std::vector<int32_t> a1_read(6);
std::vector<uint8_t> a1_read_validity(6);
std::vector<float> a2_read(6);

Query query_read(ctx, array_read);
query_read.set_layout(TILEDB_ROW_MAJOR)
        .set_subarray(subarray)
        .set_data_buffer("a1", a1_read)
        .set_validity_buffer("a1", a1_read_validity)
        .set_data_buffer("a2", a2_read)
        .set_condition(queryCondition);
query_read.submit();

// For sparse arrays, we can check query result buffers for number of elements read with our query condition
auto buffers = query_read.result_buffer_elements();
uint64_t a1_result_num = buffers["a1"].second;
uint64_t a2_result_num = buffers["a2"].second;
for (size_t i = 0; i < a1_result_num; i++) {
    // Print or consume buffers from sparse array...
}

// For dense arrays, we can use the fill value of an attribute to check the element met our conditions
const float* fillVal;
uint64_t fillValSize;
auto a2 = array_read.schema().attribute("a2");
a2.get_fill_value((const void**)&fillVal, &fillValSize);
for (size_t i = 0; i < a2_read.size(); i++) {
    if (a2_read[i] != *fillVal) {
        // Print or consume buffers from dense array...
    }
}

import tiledb

with tiledb.open(uri, mode="r") as A:
    # select cells where the attribute values for foo are greater than 5
    # and bar equals the string 'asdf'.

    # Create a QueryCondition by passing a string containing a valid Python
    # Boolean expression. Note that string values must be enclosed in quotes
    # (either single or double quotes), whereas attribute names are not. The
    # exception is an attribute name containing special characters, in which
    # case replace `namehere` with `attr("namehere")`.
    q = A.query(cond="foo > 5 and bar == 'asdf'")
    # Or:
    q = A.query(cond="attr('percent.mt') > 10.0")

    # output the results
    print(q.df[:])

## Example assumes current array is the standard Palmer Penguins data set

### Via qc creation API
qc <- tiledb_query_condition_init(attr = "bill_length_mm",
                                  value = 52,
                                  dtype = "FLOAT64",
                                  op = "GE")
res <- tiledb_array(uri, query_condition=qc)[]
dim(res)   # 344 -> 18 due to qc

### Via query parser
arr <- tiledb_array(uri)
qc <- parse_query_condition(bill_length_mm > 52, arr)
query_condition(arr) <- qc
res <- arr[]
dim(res)   # 344 -> 18 due to qc

### Or piped (for R 4.1.0 or later)
arr |>
    tdb_filter(bill_length_mm > 52) |>
    tdb_collect() |>
    dim()

// Create TileDB context and open the array
try(Context ctx = new Context(),
    Array array = new Array(ctx, "<array-uri>", TILEDB_READ)) {
  // Slice only rows 1, 2 and cols 2, 3, 4
  NativeArray subarray = new NativeArray(ctx, new int[] {1, 2, 2, 4}, Integer.class);
  // Prepare the query
  Query query = new Query(ctx, array, TILEDB_READ);
  
  // Prepare the vectors that will hold the results
  query.setBuffer(
        "d1", new NativeArray(ctx, 20, Integer.class));      
  query.setBuffer(
        "d2", new NativeArray(ctx, 20, Integer.class));      
  query.setBuffer(
        "a1", new NativeArray(ctx, 20, Integer.class));
  query.setBuffer(
        "a2", new NativeArray(ctx, 20, Float.class));
  
  query.setSubarray(subarray)
       .setLayout(TILEDB_ROW_MAJOR);

  // QueryCondition Equivalent to: a2 > 15.0f AND a1 == null;
  QueryCondition con1 = new QueryCondition(ctx, "a2", 15.0f, Float.class, TILEDB_GT);
  QueryCondition con2 = new QueryCondition(ctx, "a1", 0, null, TILEDB_EQ);
  // Combine the two conditions
  QueryCondition con3 = con1.combine(con2, TILEDB_AND);
  query.setCondition(con3);
  
  // Submit the query (the array is closed by try-with-resources)
  query.submit();
    
  // NOTE: although not recommended (for performance reasons), 
  // you can get the coordinates even when slicing dense arrays. 
  
  // NOTE: The layout could have also been TILEDB_COL_MAJOR or
  // TILEDB_GLOBAL_ORDER.
  
  // Get the results in native java arrays
  int[] d1 = (int[]) query.getBuffer("d1");
  int[] d2 = (int[]) query.getBuffer("d2");
  int[] a1 = (int[]) query.getBuffer("a1");
  float[] a2 = (float[]) query.getBuffer("a2");
  
  // Close the query
  query.close();
}


using TileDB.CSharp;

// Create TileDB context
using Context ctx = new Context();

// Prepare the array for reading
using Array array = new Array(ctx, "<array-uri>");
array.Open(QueryType.Read);

// QueryCondition for attribute values where a1 == null
using QueryCondition queryCondition1 =
    // TODO: Actually implement this API.
    QueryCondition.CreateIsNull(ctx, "a1");
// QueryCondition for attribute values where a2 <= 10.5
using QueryCondition queryCondition2 =
    QueryCondition.Create(ctx, "a2", 10.5f, QueryConditionOperatorType.LessThanOrEqual);

// QueryCondition for (a1 == null && a2 <= 10.5)
using QueryCondition queryCondition = queryCondition1 & queryCondition2;

// Slice rows 1, 2 and cols 2, 3, 4
using Subarray subarray = new Subarray(array);
subarray.AddRange("rows", 1, 2);
subarray.AddRange("cols", 2, 4);

int[] a1 = new int[6];
byte[] a1Validity = new byte[6];
float[] a2 = new float[6];

using Query query = new Query(ctx, array, QueryType.Read);
query.SetSubarray(subarray);
query.SetLayout(LayoutType.RowMajor);
query.SetDataBuffer("a1", a1);
query.SetValidityBuffer("a1", a1Validity);
query.SetDataBuffer("a2", a2);
query.SetCondition(queryCondition);

query.Submit();

// For sparse arrays, we can check query result buffers
// for number of elements read with our query condition
ulong a1Num = query.GetResultDataElements("a1");
ulong a2Num = query.GetResultDataElements("a2");
for (ulong i = 0; i < a1Num; i++)
{
    // Print or consume buffers from sparse array...
}

// For dense arrays, we can use the fill value of an
// attribute to check the element met our conditions
float fillVal;
using (ArraySchema schema = array.Schema())
using (Attribute attribute = schema.Attribute("a2"))
{
    fillVal = attribute.FillValue<float>()[0];
}
for (ulong i = 0; i < a2Num; i++)
{
    if (a2[i] != fillVal)
    {
        // Print or consume buffer from dense array...
    }
}

Aggregates

Aggregates can be requested on the data of a query so that the computation is pushed down to TileDB, rather than computed externally on the returned cells. The currently supported operations can be found in .

Here are some examples of using aggregates with TileDB:
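As a language-neutral illustration of what pushed-down aggregates compute, the plain-Python sketch below evaluates count, null count, sum, min, max and mean over one nullable attribute's selected cells. These operation names are assumptions about the supported set; consult the list of supported operations referenced above. The null handling is also an assumed convention: count includes null cells, and the other aggregates skip them. The `aggregate` helper and the sample data are hypothetical.

```python
def aggregate(values, validity=None):
    """Compute simple aggregates over one attribute's selected cells.

    validity follows the 0-means-null convention. Null cells are counted
    by "count" and "null_count" but skipped by the other aggregates.
    """
    if validity is None:
        validity = [1] * len(values)
    valid = [v for v, ok in zip(values, validity) if ok]
    return {
        "count": len(values),
        "null_count": len(values) - len(valid),
        "sum": sum(valid),
        "min": min(valid) if valid else None,
        "max": max(valid) if valid else None,
        "mean": sum(valid) / len(valid) if valid else None,
    }


print(aggregate([4, 0, 8, 6], validity=[1, 0, 1, 1]))
# {'count': 4, 'null_count': 1, 'sum': 18, 'min': 4, 'max': 8, 'mean': 6.0}
```

The benefit of pushdown is that only a handful of numbers like these cross from TileDB to the caller, instead of every selected cell.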

Incomplete Queries

What happens if the buffers you set on a read query are not big enough to hold the results? TileDB does not fail in that case. Instead, it fills your buffers with as many results as fit and reports through the query status whether the query completed or is incomplete. In the latter case, you can consume the results and resubmit the query (with the same buffers or newly set ones), and TileDB will pick up where it left off. TileDB also lets you get a result estimate, but even that does not guarantee that the buffers will be large enough to hold the actual result.

The example below shows how to typically submit a read query to account for the possible case of incomplete queries.

// ... create context ctx
// ... create query
// ... suppose you query attribute "a" with buffer size "a_size"

// Create a loop
tiledb_query_status_t status;
do {
  // Submit query and get status
  tiledb_query_submit(ctx, query);
  tiledb_query_get_status(ctx, query, &status);

  // IMPORTANT: check if there are any results, as your buffer
  // could have been too small to fit even a single result
  if (status == TILEDB_INCOMPLETE && a_size == 0) { // No results
    // You need to reallocate your buffers, otherwise 
    // you will get an infinite loop
  } else if (a_size > 0) {                          // There are results
    // Do something with the results
    // You could set new buffers to the query here
  }
} while (status == TILEDB_INCOMPLETE);

// Other statuses:  
// TILEDB_{FAILED, COMPLETED, INPROGRESS, UNINITIALIZED}

// ... create context ctx
// ... create query
// ... suppose you query attribute "a"

// Create a loop
Query::Status status;
do {
  // Submit query and get status
  query.submit();
  status = query.query_status();

  // IMPORTANT: check if there are any results, as your buffer
  // could have been too small to fit even a single result
  bool has_results = query.result_buffer_elements()["a"].second != 0;
  if (status == Query::Status::INCOMPLETE && !has_results) {
    // You need to reallocate your buffers, otherwise 
    // you will get an infinite loop
  } else if (has_results) {
    // Do something with the results
    // You could set new buffers to the query here
  }
} while (status == Query::Status::INCOMPLETE);

// Other statuses:  
// Query::Status::{FAILED, COMPLETED, INPROGRESS, UNINITIALIZED}

# Note that incomplete queries are only supported for
# sparse arrays in TileDB-Py at this time.
# Dense reads will internally reallocate buffers and resubmit the
# query until successful completion.
with tiledb.open(uri) as A:
    # iterate results as an OrderedDict
    iterable = A.query(return_incomplete=True).multi_index[:]
    
    # -- or --
    
    # iterate results as a dataframe
    iterable = A.query(return_incomplete=True).df[:]

    for result in iterable:
        # this loop will iterate until the query has
        # returned all results for the given range
        print(result)
        
# Querying estimated result size for query:
# Create query object as above:
#     iterable = A.query(return_incomplete=True).multi_index[:]
# then call `estimated_result_sizes`, which will return an
# OrderedDict of {'result name': estimate}
#     iterable.estimated_result_sizes()

# We deliberately reserve too little space for the full result. TileDB
# signals that the query is incomplete after filling as much of the
# buffer as it can; we print those results and resubmit the query to
# consume the remainder
ctx <- tiledb_ctx()
arrptr <- tiledb:::libtiledb_array_open(ctx@ptr, uridense,
                                        "READ")
qryptr <- tiledb:::libtiledb_query(ctx@ptr, arrptr, "READ")
subarr <- c(1L,4L, 1L,4L)
qryptr <- tiledb:::libtiledb_query_set_subarray(qryptr, subarr)
vec <- integer(4)  # reserve (insufficient) space
qryptr <- tiledb:::libtiledb_query_set_buffer(qryptr, "a", vec)
finished <- FALSE
while (!finished) {
  qryptr <- tiledb:::libtiledb_query_submit(qryptr)
  print(vec)
  finished <- 
      tiledb:::libtiledb_query_status(qryptr) == "COMPLETE"
}
res <- tiledb:::libtiledb_array_close(arrptr)

// ... create context ctx
// ... create query
// ... suppose you query attribute "a"

// Create a loop
QueryStatus status;
do {
  // Submit query and get status
  status = query.submit();

  // IMPORTANT: check if there are any results, as your buffer
  // could have been too small to fit even a single result
  boolean has_results = query.resultBufferElements().get("a").getSecond() != 0;
  if (status == TILEDB_INCOMPLETE && !has_results) {
    // You need to reallocate your buffers, otherwise 
    // you will get an infinite loop
  } else if (has_results) {
    // Do something with the results
    // You could set new buffers to the query here
  }
} while (status == TILEDB_INCOMPLETE);

// Other statuses:
// QueryStatus.TILEDB_{FAILED, COMPLETED, INPROGRESS, UNINITIALIZED}

// ... create context ctx
// ... create query
// ... suppose you query attribute "a"

var queryStatus tiledb.QueryStatus

for {
    // Submit the query
    query.Submit()

    queryStatus, _ = query.Status()

    // IMPORTANT: check if there are any results, as your buffer
    // could have been too small to fit even a single result
    elements, _ := query.ResultBufferElements()
    resultNum := elements["a"][1]
    if queryStatus == tiledb.TILEDB_INCOMPLETE && resultNum == 0 {
        // You need to reallocate your buffers, otherwise
        // you will get an infinite loop
    } else if resultNum > 0 {
        // Do something with the results
        // You could set new buffers to the query here
    }

    if queryStatus != tiledb.TILEDB_INCOMPLETE {
        break
    }
}

// Other statuses:
// tiledb.TILEDB_{FAILED, COMPLETED, INPROGRESS, UNINITIALIZED}

// ... create context ctx
// ... create query
// ... suppose you query attribute "a"

// Create a loop
QueryStatus status;
do
{
    // Submit query and get status
    query.Submit();
    status = query.Status();

    // IMPORTANT: check if there are any results, as your buffer
    // could have been too small to fit even a single result
    bool hasResults = query.GetResultDataElements("a") != 0;
    if (status == QueryStatus.Incomplete
        && query.GetStatusDetails().Reason == QueryStatusDetailsReason.UserBufferSize)
    {
        // You need to reallocate your buffers, otherwise 
        // you will get an infinite loop
    }
    else if (hasResults)
    {
        // Do something with the results
        // You could set new buffers to the query here
    }
} while (status == QueryStatus.Incomplete);

// Other statuses:
// QueryStatus.{Failed, Completed, InProgress, Uninitialized}
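The resubmit pattern in the examples above can be exercised without TileDB. In this plain-Python sketch, `ToyQuery` is a hypothetical stand-in for a read query over variable-length results. Each `submit()` copies only as many whole results as fit in `capacity` bytes, and it reports `"INCOMPLETE"` until everything has been delivered. The loop consumes each batch. When not even one result fits, it doubles the buffer, which avoids the infinite loop the comments warn about. The doubling policy is an illustrative choice, not TileDB behavior.

```python
class ToyQuery:
    """Hypothetical stand-in for a read query over var-length results."""

    def __init__(self, results):
        self._results = results
        self._pos = 0       # where the next submit resumes
        self.capacity = 0   # buffer size in bytes, set by the caller

    def submit(self):
        """Copy as many whole results as fit; return (status, batch)."""
        batch, used = [], 0
        while self._pos < len(self._results):
            item = self._results[self._pos]
            if used + len(item) > self.capacity:
                break
            batch.append(item)
            used += len(item)
            self._pos += 1
        done = self._pos == len(self._results)
        return ("COMPLETED" if done else "INCOMPLETE"), batch


def read_all(query, capacity):
    query.capacity = capacity
    out = []
    while True:
        status, batch = query.submit()
        if status == "INCOMPLETE" and not batch:
            # Not even one result fit: grow the buffer, or loop forever.
            query.capacity = max(1, query.capacity * 2)
            continue
        out.extend(batch)  # consume this batch of results
        if status != "INCOMPLETE":
            return out


q = ToyQuery([b"ab", b"cdefgh", b"i", b"jk"])
print(read_all(q, capacity=4))  # [b'ab', b'cdefgh', b'i', b'jk']
```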

Result Estimation

When reading from sparse arrays, or reading variable-length attributes from either dense or sparse arrays, there is no way to know how big the result will be without actually executing the query. How, then, should you allocate your buffers before passing them to TileDB? TileDB offers a way to get the estimated result size for any attribute. Because TileDB does not actually execute the query to produce the estimate, getting it is very fast. This speed comes at the cost of accuracy, however: buffers allocated based on the estimate may still lead to incomplete queries. Therefore, you should always check the query status, even if you allocate your buffers based on the result estimate.

You can get the result estimate as follows:

The number of bytes returned is an estimate and may not be divisible by the datatype size. Rounding up (ceiling division) when converting it to a number of elements is left to the user.
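For example, turning a byte estimate into a buffer of whole elements needs a ceiling division. The sketch below uses a hypothetical estimate of 30 bytes for an attribute with 8-byte elements (such as `float64`). Plain integer division would under-allocate by one element in this case. Even the rounded-up size is only an estimate, so the query status still has to be checked.

```python
def elements_for_estimate(estimate_bytes, element_size):
    """Smallest number of elements whose total size covers the estimate."""
    return -(-estimate_bytes // element_size)  # ceiling division on ints


n = elements_for_estimate(30, 8)
print(n)        # 4
print(30 // 8)  # 3 (floor division would be one element short)
```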

Time Traveling

You can open or reopen an array at a particular timestamp, for example if you'd like to see a view of the array as it was in the past. See for more details. You can do so as follows:
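Conceptually, opening at timestamp T makes visible only the writes made at or before T, and where writes overlap, the most recent visible one wins. The plain-Python sketch below models this with hypothetical fragments stamped in milliseconds since the Unix epoch, the unit the examples on this page use. It also converts a UTC datetime to such a timestamp with exact integer arithmetic. `to_ms`, `view_at` and the fragment data are illustrative assumptions, not TileDB APIs.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_ms(dt):
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    return (dt - EPOCH) // timedelta(milliseconds=1)


def view_at(fragments, timestamp_ms):
    """Merge fragments written at or before timestamp_ms.

    fragments is a list of (write_timestamp_ms, {coord: value}) pairs;
    later fragments overwrite earlier ones on the same coordinate.
    """
    view = {}
    for ts, cells in sorted(fragments, key=lambda f: f[0]):
        if ts <= timestamp_ms:
            view.update(cells)
    return view


t1 = to_ms(datetime(2019, 6, 25, 19, 50, 35, 844000, tzinfo=timezone.utc))
t2 = t1 + 60_000  # a second write one minute later
fragments = [(t1, {1: "a", 2: "b"}), (t2, {2: "B", 3: "c"})]
print(t1)                      # 1561492235844
print(view_at(fragments, t1))  # {1: 'a', 2: 'b'}
print(view_at(fragments, t2))  # {1: 'a', 2: 'B', 3: 'c'}
```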

Reading Fragment Info

You can get information about the fragments for any array as follows: