Multiprocessing (Python)

TileDB's Python integration works well with the ThreadPoolExecutor and ProcessPoolExecutor classes from Python's concurrent.futures module. A larger usage example demonstrating parallel CSV ingestion is available here; it can be run in either threadpool or processpool mode.
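As a rough illustration of threadpool mode, the sketch below fans CSV chunks out to a ThreadPoolExecutor. It is not the ingestion example referred to above: the names CSV_CHUNKS, ingest_chunk, and ingest_all are hypothetical, and the worker only counts rows where a real one would write them to a TileDB array.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory CSV chunks standing in for real input files.
CSV_CHUNKS = [
    "id,value\n1,10\n2,20\n",
    "id,value\n3,30\n",
]


def ingest_chunk(text):
    # Placeholder worker: parse one CSV chunk and return its row count.
    # A real worker would write the parsed rows to a TileDB array here.
    rows = list(csv.DictReader(io.StringIO(text)))
    return len(rows)


def ingest_all(chunks, max_workers=2):
    # Executor.map returns results in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(ingest_chunk, chunks))


row_counts = ingest_all(CSV_CHUNKS)
```

Threads share one process, so no start method needs to be configured for this mode.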

Caution: the fork start method, the traditional default for multiprocessing (and therefore for ProcessPoolExecutor) on Linux, is not compatible with TileDB (nor with most other multi-threaded applications), because the global process state inherited by a forked child is not reliably consistent after fork. Call multiprocessing.set_start_method("spawn") before creating a ProcessPoolExecutor to avoid unexpected behavior such as hangs and crashes.
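A minimal sketch of the spawn setup follows, assuming the code is run as a script. The names process_item and run_in_processes are hypothetical, and the worker body is a stand-in for real work. With spawn, each worker starts a fresh interpreter and re-imports the main module, so the worker must be a module-level function and the pool must be created under the __main__ guard.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def process_item(x):
    # Placeholder worker (hypothetical). In real code, open any TileDB
    # arrays inside the worker rather than sharing handles from the parent.
    return x * x


def run_in_processes(values, max_workers=2):
    # Uses the globally configured start method; call only after
    # set_start_method("spawn") has been applied.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_item, values))


if __name__ == "__main__":
    # Must be called once, before any pool or process is created.
    multiprocessing.set_start_method("spawn")
    print(run_in_processes([1, 2, 3]))
```

If changing the global start method is undesirable, ProcessPoolExecutor also accepts an mp_context argument, for example mp_context=multiprocessing.get_context("spawn"), which limits the setting to that one pool.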
