TileDB's Python integration works well with Python's multiprocessing module and with the ThreadPoolExecutor and ProcessPoolExecutor classes from concurrent.futures. A larger usage example demonstrating parallel CSV ingestion is available here; it can be run in either threadpool or processpool mode.
Caution: the default multiprocessing start method used by ProcessPoolExecutor on Linux, fork, is not compatible with TileDB (nor with most other multi-threaded applications) due to complications with global process state after fork. To avoid unexpected behavior such as hangs and crashes, call multiprocessing.set_start_method("spawn") before creating a ProcessPoolExecutor.
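
As a rough illustration of the two modes and the spawn requirement, here is a minimal sketch that uses only the Python standard library. The names ingest_chunk and make_executor are hypothetical and are not part of TileDB; the worker body is a placeholder for the TileDB reads or writes a real worker would perform.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def ingest_chunk(chunk_id):
    # Hypothetical placeholder worker: a real worker would do its TileDB
    # reads or writes for one chunk here. It must be defined at module
    # level so that spawned worker processes can import it.
    return chunk_id * chunk_id


def make_executor(mode, max_workers=2):
    """Return a thread pool, or a process pool whose workers use "spawn"."""
    if mode == "threadpool":
        return ThreadPoolExecutor(max_workers=max_workers)
    if mode == "processpool":
        # An explicit "spawn" context ensures workers are never forked,
        # regardless of the platform default.
        ctx = multiprocessing.get_context("spawn")
        return ProcessPoolExecutor(max_workers=max_workers, mp_context=ctx)
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    # The global setting named in the text; call it once, before any
    # process pool is created.
    multiprocessing.set_start_method("spawn")
    with make_executor("processpool") as pool:
        results = list(pool.map(ingest_chunk, range(4)))
    print(results)
```

Passing mp_context to ProcessPoolExecutor is shown here as a per-pool alternative to the global set_start_method call; either one alone should be enough to keep workers from being forked. The main guard matters with spawn, since each worker process re-imports the main module on startup.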