Virtual Filesystem

TileDB is designed such that all IO to/from the storage backends is abstracted behind a Virtual Filesystem (VFS) module. This module supports simple operations, such as creating a file/directory, reading/writing to a file, etc. This abstraction enables us to easily plug in more storage backends in the future, effectively making the storage backend opaque to the user.
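
To make the idea of an opaque backend concrete, the following is a minimal sketch of how a single entry point could pick a backend from the URI alone. The names `backend_t` and `backend_for_uri` are hypothetical and invented for illustration; this is not TileDB's internal implementation, only a model of the behaviour described above (an `s3://` URI goes to S3, a plain path goes to the local filesystem).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical backend identifiers (illustrative, not TileDB's own names). */
typedef enum { BACKEND_LOCAL, BACKEND_S3 } backend_t;

/* Choose a backend from the URI scheme. Paths without a recognised
 * scheme are treated as local files in this sketch. */
backend_t backend_for_uri(const char* uri) {
  assert(uri != NULL);
  if (strncmp(uri, "s3://", 5) == 0)
    return BACKEND_S3;
  return BACKEND_LOCAL;
}
```

Under this model, caller code stays the same whether it is handed `"tiledb_vfs.bin"` or `"s3://my_bucket/tiledb_vfs.bin"`; only the dispatch result differs.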

A useful by-product of this architecture is that the basic virtual filesystem functionality can be exposed via the TileDB APIs. This provides a simplified interface for file IO and directory management (i.e., operations not related to TileDB objects such as arrays) on all the storage backends that TileDB supports.

This page covers most of the TileDB VFS functionality.

Writing

// Create TileDB context
tiledb_ctx_t* ctx;
tiledb_ctx_alloc(NULL, &ctx);

// Create TileDB VFS
tiledb_vfs_t* vfs;
tiledb_vfs_alloc(ctx, NULL, &vfs);

// Write binary data
tiledb_vfs_fh_t* fh;
tiledb_vfs_open(ctx, vfs, "tiledb_vfs.bin", TILEDB_VFS_WRITE, &fh);
float f1 = 153.0;
const char* s1 = "abcd";
tiledb_vfs_write(ctx, fh, &f1, sizeof(float));
tiledb_vfs_write(ctx, fh, s1, strlen(s1));
tiledb_vfs_close(ctx, fh);
tiledb_vfs_fh_free(&fh);

// Write binary data again - this will overwrite the previous file
tiledb_vfs_open(ctx, vfs, "tiledb_vfs.bin", TILEDB_VFS_WRITE, &fh);
const char* s2 = "abcdef";
f1 = 153.1;
tiledb_vfs_write(ctx, fh, &f1, sizeof(float));
tiledb_vfs_write(ctx, fh, s2, strlen(s2));
tiledb_vfs_close(ctx, fh);
tiledb_vfs_fh_free(&fh);

// Append binary data to existing file
tiledb_vfs_open(ctx, vfs, "tiledb_vfs.bin", TILEDB_VFS_APPEND, &fh);
const char* s3 = "ghijkl";
tiledb_vfs_write(ctx, fh, s3, strlen(s3));
tiledb_vfs_close(ctx, fh);
tiledb_vfs_fh_free(&fh);

// Clean up
tiledb_vfs_free(&vfs);
tiledb_ctx_free(&ctx);

Reading

// Create TileDB context
tiledb_ctx_t* ctx;
tiledb_ctx_alloc(NULL, &ctx);

// Create TileDB VFS
tiledb_vfs_t* vfs;
tiledb_vfs_alloc(ctx, NULL, &vfs);

// Read binary data
tiledb_vfs_fh_t* fh;
tiledb_vfs_open(ctx, vfs, "tiledb_vfs.bin", TILEDB_VFS_READ, &fh);
float f1;
char s1[13];
s1[12] = '\0';
tiledb_vfs_read(ctx, fh, 0, &f1, sizeof(float));
tiledb_vfs_read(ctx, fh, sizeof(float), s1, 12);
printf("Binary read:\n%.1f\n%s\n", f1, s1);

// Close the file handle and clean up
tiledb_vfs_close(ctx, fh);
tiledb_vfs_fh_free(&fh);
tiledb_vfs_free(&vfs);
tiledb_ctx_free(&ctx);

Managing

// Create TileDB context
tiledb_ctx_t* ctx;
tiledb_ctx_alloc(NULL, &ctx);

// Create TileDB VFS
tiledb_vfs_t* vfs;
tiledb_vfs_alloc(ctx, NULL, &vfs);

// Create directory
int is_dir = 0;
tiledb_vfs_is_dir(ctx, vfs, "dir_A", &is_dir);
if (!is_dir) {
  tiledb_vfs_create_dir(ctx, vfs, "dir_A");
  printf("Created 'dir_A'\n");
} else {
  printf("'dir_A' already exists\n");
}

// Creating an (empty) file
int is_file = 0;
tiledb_vfs_is_file(ctx, vfs, "dir_A/file_A", &is_file);
if (!is_file) {
  tiledb_vfs_touch(ctx, vfs, "dir_A/file_A");
  printf("Created empty file 'dir_A/file_A'\n");
} else {
  printf("'dir_A/file_A' already exists\n");
}

// Getting the file size
uint64_t file_size;
tiledb_vfs_file_size(ctx, vfs, "dir_A/file_A", &file_size);

// Moving files (moving directories is similar)
tiledb_vfs_move_file(ctx, vfs, "dir_A/file_A", "dir_A/file_B");

// Deleting files and directories. Note that, in the case of directories,
// the function will delete all the contents of the directory (i.e., it
// works even for non-empty directories).
tiledb_vfs_remove_file(ctx, vfs, "dir_A/file_B");
tiledb_vfs_remove_dir(ctx, vfs, "dir_A");

// Clean up
tiledb_vfs_free(&vfs);
tiledb_ctx_free(&ctx);

TileDB allows you to create and delete S3 buckets via its VFS functionality:

// ... create context ctx
// ... create VFS vfs

tiledb_vfs_create_bucket(ctx, vfs, "s3://my_bucket");
tiledb_vfs_remove_bucket(ctx, vfs, "s3://my_bucket");

However, extreme care must be taken when creating/deleting buckets on AWS S3. After its creation, a bucket may take some time to “appear” in the system. This will cause problems if the user creates the bucket and immediately tries to write a file in it. Similarly, deleting a bucket may not take effect immediately and, therefore, it may continue to “exist” for some time.
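
One way to cope with this delay is to poll until the bucket is visible before writing to it. The sketch below is a generic, hedged example: `probe_fn`, `wait_until`, and `probe_countdown` are hypothetical helpers written for this page, not part of the TileDB API. In real code the probe might wrap a call such as `tiledb_vfs_is_bucket`; here a stand-in probe is used so the example is self-contained. It assumes a POSIX system for `nanosleep`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical probe: returns true once the resource is visible. */
typedef bool (*probe_fn)(void* user_data);

/* Call `probe` up to `max_attempts` times, sleeping `delay_ms`
 * milliseconds between attempts. Returns true as soon as the probe
 * succeeds, or false if every attempt fails. */
bool wait_until(probe_fn probe, void* user_data, int max_attempts, long delay_ms) {
  assert(probe != NULL && max_attempts > 0 && delay_ms >= 0);
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    if (probe(user_data))
      return true;
    if (attempt + 1 < max_attempts) {
      struct timespec ts = {.tv_sec = delay_ms / 1000,
                            .tv_nsec = (delay_ms % 1000) * 1000000L};
      nanosleep(&ts, NULL);
    }
  }
  return false;
}

/* Stand-in probe for demonstration only: reports "not visible" until
 * the counter behind `user_data` reaches zero. */
bool probe_countdown(void* user_data) {
  int* remaining = (int*)user_data;
  if (*remaining > 0) {
    --*remaining;
    return false;
  }
  return true;
}
```

The attempt limit matters: without it, a bucket that never appears (for example, because creation failed) would block the caller forever. The same helper can be reused after deletion by inverting the probe.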

Configuring VFS

You can configure VFS by passing a configuration object upon its creation.

// Create TileDB context
tiledb_ctx_t* ctx;
tiledb_ctx_alloc(NULL, &ctx);

// Create a configuration object
tiledb_config_t* config;
tiledb_config_alloc(&config, NULL);
tiledb_config_set(config, "vfs.file.max_parallel_ops", "16", NULL);

// Create TileDB VFS with a config object
tiledb_vfs_t* vfs;
tiledb_vfs_alloc(ctx, config, &vfs);

// Clean up
tiledb_config_free(&config);
tiledb_vfs_free(&vfs);
tiledb_ctx_free(&ctx);

If you do not pass a configuration object to VFS, then VFS will inherit the (default or set) configuration of the context. Otherwise, the options set in the passed configuration object will override those of the context, and all remaining options will still be inherited from the context's configuration.
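
The precedence rule can be modelled as a two-level lookup. The following is a minimal sketch under stated assumptions: `kv_t`, `kv_get`, and `effective_value` are hypothetical names invented here, and the code only illustrates the override-then-inherit behaviour described above; it is not how TileDB stores or resolves its configuration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical key/value pair standing in for one config option. */
typedef struct {
  const char* key;
  const char* value;
} kv_t;

/* Linear lookup; returns NULL if `key` is not set in `kvs`. */
const char* kv_get(const kv_t* kvs, size_t n, const char* key) {
  assert(key != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (strcmp(kvs[i].key, key) == 0)
      return kvs[i].value;
  }
  return NULL;
}

/* Options set on the VFS config win; every other option falls back
 * to the context's config. Returns NULL if neither sets the key. */
const char* effective_value(const kv_t* vfs_cfg, size_t n_vfs,
                            const kv_t* ctx_cfg, size_t n_ctx,
                            const char* key) {
  const char* v = kv_get(vfs_cfg, n_vfs, key);
  return v != NULL ? v : kv_get(ctx_cfg, n_ctx, key);
}
```

For instance, if the context sets `vfs.file.max_parallel_ops` to an illustrative `"8"` and the VFS config sets it to `"16"`, the lookup yields `"16"`, while an option set only on the context is returned unchanged.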
