The subarray shape should follow the space tiling of the array as closely as possible. Read performance depends heavily on the array tiling; see Choosing Tiling and Cell Layout.
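As a rough illustration of what "following the space tiling" means along one dimension, the sketch below snaps an inclusive range to the enclosing tile boundaries and counts the tiles it touches. The helper names `align_to_tiles` and `tiles_touched` are hypothetical and not part of the TileDB API; the sketch assumes an integer dimension whose tiles start at the domain's lower bound.

```python
def tiles_touched(lo, hi, dom_lo, extent):
    """Count the tiles that the inclusive range [lo, hi] overlaps.

    Assumes tiles start at dom_lo and are `extent` cells wide.
    """
    return (hi - dom_lo) // extent - (lo - dom_lo) // extent + 1


def align_to_tiles(lo, hi, dom_lo, dom_hi, extent):
    """Expand the inclusive range [lo, hi] to the enclosing tile boundaries.

    The result is clipped to the domain [dom_lo, dom_hi].
    """
    if not (dom_lo <= lo <= hi <= dom_hi):
        raise ValueError("range must lie inside the domain")
    first_tile = (lo - dom_lo) // extent
    last_tile = (hi - dom_lo) // extent
    aligned_lo = dom_lo + first_tile * extent
    aligned_hi = min(dom_lo + (last_tile + 1) * extent - 1, dom_hi)
    return aligned_lo, aligned_hi


# Hypothetical dimension: domain [1, 100], tile extent 10.
# The range [8, 23] holds 16 cells but overlaps three tiles.
print(tiles_touched(8, 23, dom_lo=1, extent=10))
# The range [11, 20] holds 10 cells and matches exactly one tile.
print(tiles_touched(11, 20, dom_lo=1, extent=10))
print(align_to_tiles(8, 23, dom_lo=1, dom_hi=100, extent=10))
```

A subarray that straddles tile boundaries forces every overlapped tile to be fetched and filtered, even if only a few of its cells are needed, which is why tile-aligned subarrays tend to be cheaper per returned cell.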
Global order is the most efficient read layout, because TileDB then avoids sorting and re-organizing the result cells internally as much as possible. Row-major and column-major layouts require sorting, and therefore more work for TileDB. In general, choose a read layout that coincides with the global order as closely as possible.
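To see why a row-major read can require re-organization, the following sketch enumerates the cells of a small 2D dense array in global order and compares the result with plain row-major order. It is a conceptual model, not TileDB code: the function names are hypothetical, and it assumes row-major tile order and row-major cell order within each tile (both are configurable in a real array schema).

```python
def global_order(rows, cols, tile_rows, tile_cols):
    """List (row, col) cells tile by tile.

    Assumes row-major tile order and row-major cell order inside each tile.
    """
    cells = []
    for tr in range(0, rows, tile_rows):
        for tc in range(0, cols, tile_cols):
            for r in range(tr, min(tr + tile_rows, rows)):
                for c in range(tc, min(tc + tile_cols, cols)):
                    cells.append((r, c))
    return cells


def row_major(rows, cols):
    """List (row, col) cells row by row."""
    return [(r, c) for r in range(rows) for c in range(cols)]


# 4x4 array with 2x2 tiles: the two orders contain the same cells
# but in different positions, so a row-major read needs re-ordering.
stored = global_order(4, 4, 2, 2)
requested = row_major(4, 4)
print(stored[:4])           # first tile: (0, 0), (0, 1), (1, 0), (1, 1)
print(stored == requested)  # False

# With tiles that span a full row, global order equals row-major order.
print(global_order(4, 4, 1, 4) == requested)  # True
```

The last case shows the general point: the closer the requested layout is to the order in which cells are stored, the less re-organization a read needs.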
Larger reads generally perform better, because TileDB can take better advantage of parallelism in tile filtering and I/O.
The Read Parallelism section describes in detail how TileDB parallelizes reads internally. Consider fine-tuning the parameters explained there.
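As a starting point, the sketch below shows one way such parameters might be passed to TileDB-Py. The parameter names `sm.compute_concurrency_level` and `sm.io_concurrency_level` are assumed from TileDB 2.x releases, and the values are arbitrary placeholders; older releases use different names, so check the Read Parallelism section for the parameters that apply to your version. The helper `make_read_ctx` is hypothetical.

```python
# Assumed TileDB 2.x parameter names; the values are placeholders, not recommendations.
read_params = {
    "sm.compute_concurrency_level": "8",  # threads for compute-bound work such as filtering
    "sm.io_concurrency_level": "8",       # threads for I/O-bound work
}


def make_read_ctx(params):
    """Build a TileDB context from a dict of configuration parameters."""
    import tiledb  # imported lazily; requires the TileDB-Py package

    return tiledb.Ctx(tiledb.Config(params))
```

The resulting context would then be passed when opening the array for reading, for example `tiledb.open(uri, ctx=make_read_ctx(read_params))`, where `uri` stands for your array location.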