Efficient implementations, in both OpenMP and OpenCL, of the parallel sum reduction operator, which computes the sum of a large array of values.
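
To give a sense of what the OpenMP side of a sum reduction can look like, here is a minimal sketch. It is not taken from this project: the function name `parallel_sum`, its signature, and the choice of `double` elements are assumptions made for illustration only.

```c
/* Hypothetical helper (not from this project): sums n doubles with OpenMP. */
double parallel_sum(const double *values, long n)
{
    double total = 0.0;

    /* Each thread accumulates into its own private copy of `total`;
       OpenMP combines the private copies with + when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; ++i) {
        total += values[i];
    }

    return total;
}
```

With GCC or Clang this would typically be built with `-fopenmp`; without that flag the pragma is ignored and the loop runs serially, producing the same result. Because floating-point addition is not associative, the parallel result may differ from a serial sum in the last few bits for general inputs.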