A simple memory bandwidth benchmark that executes the following statement in
a 1D compute shader:
a[i] = b[i] + 1
where i is the global invocation index.
The data is cycled through three different storage buffers, over many
iterations.
Loads and stores for data types that are not supported by the target device
are emulated using atomics.