The OpenCL part seems able to handle offsets. Also added an extra check for accel_gemm.