Concurrency

In addition to using parallelism internally, TileDB is designed with parallel programming in mind. Specifically, scientific computing users may be accustomed to multi-processing (e.g., via MPI, Dask, or Spark) or to writing multi-threaded programs to improve performance. TileDB enables such concurrency via thread- and process-safe read and write queries, following a multiple writers / multiple readers model.

Concurrent Writes

In TileDB, a write operation is the process of (i) creating a write query object, (ii) submitting the query (potentially multiple times in the case of global writes), and (iii) finalizing the query object (important only in global writes). Each such write operation is atomic, i.e., the set of function calls it comprises (which depends on the API) must be treated as a unit by each thread. For example, multiple threads should not submit the same query object. Instead, have each thread create a separate query object for the same array (even sharing the same context or array object) and prepare and submit it independently, in parallel with the other threads.
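The following is a minimal sketch of this lifecycle using the TileDB C++ API (method names follow recent 2.x releases), shown as a global-order write so that the multiple submissions and the finalize step are visible. The array URI my_dense_array, the attribute a, and the buffer contents are illustrative assumptions for a small 4x4 dense array whose full domain is written:

```cpp
#include <tiledb/tiledb>
#include <vector>

using namespace tiledb;

int main() {
  Context ctx;
  Array array(ctx, "my_dense_array", TILEDB_WRITE);

  // (i) Create the write query object.
  Query query(ctx, array, TILEDB_WRITE);
  query.set_layout(TILEDB_GLOBAL_ORDER);

  // (ii) Submit the query, potentially multiple times; in global
  // order, each submission appends the next batch of cells.
  std::vector<int> batch1 = {1, 2, 3, 4, 5, 6, 7, 8};
  query.set_data_buffer("a", batch1);
  query.submit();

  std::vector<int> batch2 = {9, 10, 11, 12, 13, 14, 15, 16};
  query.set_data_buffer("a", batch2);
  query.submit();

  // (iii) Finalize to flush any buffered partial tiles
  // (required for global-order writes).
  query.finalize();
  array.close();
  return 0;
}
```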

Concurrent writes are achieved by having each thread or process create one or more separate fragments per write operation. No synchronization is needed across processes, and no internal state is shared across threads among the write operations, so no locking is necessary. Concurrent creation of the fragments is itself thread- and process-safe because each thread/process creates a fragment with a unique name (composed of a unique UUID and the current timestamp); therefore, there are no conflicts even at the storage-backend level.
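As a concrete illustration, here is a sketch (again the C++ API, with hypothetical names) of two threads writing concurrently to the same array: each thread creates its own query object, and each write produces a separate, uniquely named fragment, so no locking is involved:

```cpp
#include <tiledb/tiledb>
#include <thread>
#include <vector>

using namespace tiledb;

// Each thread writes one row through its own query object.
// Every completed write query creates a separate fragment.
void write_row(Context& ctx, Array& array, int row) {
  std::vector<int> data = {row, row, row, row};
  Subarray subarray(ctx, array);
  subarray.add_range(0, row, row).add_range(1, 1, 4);
  Query query(ctx, array, TILEDB_WRITE);
  query.set_layout(TILEDB_ROW_MAJOR)
      .set_subarray(subarray)
      .set_data_buffer("a", data);
  query.submit();
}

int main() {
  Context ctx;
  Array array(ctx, "my_dense_array", TILEDB_WRITE);
  std::thread t1(write_row, std::ref(ctx), std::ref(array), 1);
  std::thread t2(write_row, std::ref(ctx), std::ref(array), 2);
  t1.join();
  t2.join();
  array.close();
  return 0;
}
```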

TileDB supports lock-free concurrent writes of array metadata as well. Each write creates a separate array metadata file with a unique name (also incorporating a unique UUID and a timestamp), and thus name collisions are prevented.
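A sketch of such a metadata write (C++ API; the key and URI are hypothetical). Multiple threads/processes may do this concurrently, since each writer's metadata lands in its own uniquely named file:

```cpp
#include <tiledb/tiledb>

using namespace tiledb;

int main() {
  Context ctx;
  Array array(ctx, "my_dense_array", TILEDB_WRITE);

  // Buffered in memory until the array is closed.
  int value = 100;
  array.put_metadata("row_count", TILEDB_INT32, 1, &value);

  array.close();  // flushes the metadata to a uniquely named file
  return 0;
}
```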

Concurrent Reads

A read operation is the process of (i) creating a read query object and (ii) submitting the query (potentially multiple times in the case of incomplete queries) until the query completes. Similar to writes, each such read operation is atomic.
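A sketch of this read loop with the C++ API (names illustrative): if the allocated buffer cannot hold all results, the query status remains incomplete, and the query is resubmitted after the partial results are consumed:

```cpp
#include <tiledb/tiledb>
#include <vector>

using namespace tiledb;

int main() {
  Context ctx;
  Array array(ctx, "my_dense_array", TILEDB_READ);

  // (i) Create the read query object over the full 4x4 domain.
  Subarray subarray(ctx, array);
  subarray.add_range(0, 1, 4).add_range(1, 1, 4);

  std::vector<int> data(4);  // deliberately small: forces incompleteness
  Query query(ctx, array, TILEDB_READ);
  query.set_layout(TILEDB_ROW_MAJOR)
      .set_subarray(subarray)
      .set_data_buffer("a", data);

  // (ii) Submit until the query completes.
  do {
    query.submit();
    uint64_t n = query.result_buffer_elements()["a"].second;
    // ... consume the n valid cells currently in `data` ...
    (void)n;
  } while (query.query_status() == Query::Status::INCOMPLETE);

  array.close();
  return 0;
}
```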

When an array is opened, TileDB loads the array schema and fragment metadata into main memory once and shares them across all array objects referring to the same array. Therefore, in the multi-threading case, it is highly recommended that you open the array once, before launching the threads, and have all threads create their queries on the same array object. This prevents the scenario where one thread opens the array, then closes it before another thread opens it again, and so on. TileDB internally employs reference counting and discards the array schema and fragment metadata whenever the array is closed and the reference count reaches zero (the schema and metadata are typically cached, but they would still need to be deserialized again in the above scenario). Having all concurrent queries use the same array object eliminates this problem.
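For example (C++ API, hypothetical names), the array below is opened once, and all threads create their read queries on the same array object, so the schema and fragment metadata are loaded and deserialized only once:

```cpp
#include <tiledb/tiledb>
#include <thread>
#include <vector>

using namespace tiledb;

// Each thread gets its own query object, but all queries share the
// same opened Array (and its in-memory schema/fragment metadata).
void read_row(Context& ctx, Array& array, int row) {
  std::vector<int> data(4);
  Subarray subarray(ctx, array);
  subarray.add_range(0, row, row).add_range(1, 1, 4);
  Query query(ctx, array, TILEDB_READ);
  query.set_layout(TILEDB_ROW_MAJOR)
      .set_subarray(subarray)
      .set_data_buffer("a", data);
  query.submit();
}

int main() {
  Context ctx;
  Array array(ctx, "my_dense_array", TILEDB_READ);  // opened once
  std::thread t1(read_row, std::ref(ctx), std::ref(array), 1);
  std::thread t2(read_row, std::ref(ctx), std::ref(array), 2);
  t1.join();
  t2.join();
  array.close();  // closed once, after all threads finish
  return 0;
}
```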

Reads in the multi-processing setting are completely independent, and no locking is required. In the multi-threading setting, locking (via mutexes) is employed only when queries access the tile cache, which incurs a very small overhead.

Mixing Reads and Writes

Concurrent reads and writes can be arbitrarily mixed. Fragments are not visible to reads until the write query that creates them has completed, so reads simply see the logical view of the array without the new (incomplete) fragment. This multiple writers / multiple readers concurrency model of TileDB is more powerful than competing approaches, such as HDF5's single writer / multiple readers (SWMR) model, but it comes with a more relaxed consistency model, which is described in the Consistency section.

When using multiple processes on the same machine, be careful with the level of parallelism you set on the TileDB context. By default, the TileDB library uses all available cores/threads in your system. Each process spawns its own TBB, VFS, and async threads in the numbers specified by the config parameters (see Configuration Parameters), so the total thread count grows with the number of processes and may oversubscribe the machine, adversely affecting the performance of your program.
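For instance, with several processes on one machine, you might cap each process's thread pools as sketched below. The parameter names follow the Configuration Parameters section for releases that expose TBB, VFS, and async thread counts; treat them as assumptions to verify against your TileDB version, as names have changed across releases:

```cpp
#include <tiledb/tiledb>

using namespace tiledb;

int main() {
  // Limit this process's thread pools so that, across all processes,
  // the machine's cores are not oversubscribed.
  Config config;
  config["sm.num_async_threads"] = "2";
  config["sm.num_tbb_threads"] = "2";
  config["vfs.num_threads"] = "2";

  Context ctx(config);  // use this context for all subsequent queries
  return 0;
}
```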

Consolidation

Consolidation can be performed in the background, in parallel with other reads and writes, and requires locking only for a very brief period. Specifically, consolidation proceeds independently of reads and writes, and the new fragment being created is not visible to reads before consolidation completes. Locking is required only after consolidation finishes, when the old fragments are deleted and the new fragment becomes visible (this happens by flushing the fragment metadata to persistent storage, which is a very lightweight operation). TileDB enforces locking at this point: after all current reads release their shared lock on the array, the consolidation function acquires an exclusive lock, deletes the old fragments, makes the new fragment visible, and releases the lock.
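Consolidation can thus be triggered from a separate thread or process while other reads and writes proceed, e.g. (C++ API, hypothetical URI):

```cpp
#include <tiledb/tiledb>

using namespace tiledb;

int main() {
  Context ctx;
  // Runs in the background relative to readers/writers; the exclusive
  // lock is taken internally only for the brief final step that
  // deletes the old fragments and makes the new one visible.
  Array::consolidate(ctx, "my_dense_array");
  return 0;
}
```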

Note that locking (wherever it is needed) is achieved via mutexes in multi-threading, and file locking in multi-processing (for those storage backends that support it).

All POSIX-compliant filesystems and Windows filesystems support file locking. Note that Lustre supports POSIX file locking semantics and exposes both local-level (mount with -o localflock) and cluster-level (mount with -o flock) locking. TileDB currently does not use file locking on HDFS and S3, as these storage backends do not provide such functionality (resource locking must instead be implemented as an external feature). On filesystems that do not support file locking, multi-processing programs are responsible for synchronizing their concurrent writes.

Particular care must be taken when consolidating arrays on AWS S3 and HDFS. Without file locking, TileDB has no way to prevent consolidation from deleting the old fragments while another process is still reading them; if that happens, the reading process is likely to error out or crash.

Array Creation

Array creation (i.e., storing the array schema on persistent storage) is neither thread- nor process-safe. We do not expect a practical scenario where multiple threads/processes attempt to create the same array in parallel. We suggest that a single thread/process create the array before multiple threads/processes start working on it concurrently for writes and reads.
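A sketch of this one-time creation step (C++ API; the 4x4 schema and names are illustrative), to be run by a single thread/process before any concurrent reads or writes begin:

```cpp
#include <tiledb/tiledb>

using namespace tiledb;

int main() {
  Context ctx;

  // Define a 4x4 dense array with a single int32 attribute "a".
  Domain domain(ctx);
  domain.add_dimension(Dimension::create<int>(ctx, "rows", {{1, 4}}, 4))
      .add_dimension(Dimension::create<int>(ctx, "cols", {{1, 4}}, 4));
  ArraySchema schema(ctx, TILEDB_DENSE);
  schema.set_domain(domain).add_attribute(Attribute::create<int>(ctx, "a"));

  // Not thread-/process-safe: call exactly once, before any
  // concurrent readers or writers touch the array.
  Array::create("my_dense_array", schema);
  return 0;
}
```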