Efficient 2D convolution utilities with a high compute-to-data-movement ratio, achieved through run-time indexing over maintained operation lineage
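
The description above does not spell out the mechanism, so the following is only one possible reading, sketched in NumPy under stated assumptions. It assumes "operation lineage" means remembering that the convolution input came from a zero-padding step, and that "run-time indexing" means mapping each output index back to the original, unpadded buffer instead of materializing the padded copy. The names `conv2d_lazy` and `conv2d_reference` are hypothetical and are not taken from the project.

```python
import numpy as np


def conv2d_reference(x, w, pad=1):
    """Baseline: materialize the zero-padded input, then slide the kernel."""
    xp = np.pad(x, pad)  # extra data movement: a full padded copy of x
    kh, kw = w.shape
    out_h = xp.shape[0] - kh + 1
    out_w = xp.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(x, w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out


def conv2d_lazy(x, w, pad=1):
    """Same result, but the padded array is never built.

    Assumed lineage: out[i, j] reads padded[i + di, j + dj], and
    padded[r, c] is x[r - pad, c - pad] when that index is in bounds,
    otherwise zero. The index mapping is resolved here at run time.
    """
    H, W = x.shape
    kh, kw = w.shape
    out_h = H + 2 * pad - kh + 1
    out_w = W + 2 * pad - kw + 1
    out = np.zeros((out_h, out_w), dtype=np.result_type(x, w))
    for di in range(kh):
        for dj in range(kw):
            # Output rows/cols whose source index lands inside x;
            # everything outside this range would read a padded zero.
            i0, i1 = max(0, pad - di), min(out_h, H + pad - di)
            j0, j1 = max(0, pad - dj), min(out_w, W + pad - dj)
            if i0 >= i1 or j0 >= j1:
                continue
            src = x[i0 + di - pad:i1 + di - pad, j0 + dj - pad:j1 + dj - pad]
            out[i0:i1, j0:j1] += w[di, dj] * src
    return out


if __name__ == "__main__":
    x = np.arange(20.0).reshape(4, 5)
    w = np.arange(9.0).reshape(3, 3)
    print(np.allclose(conv2d_lazy(x, w), conv2d_reference(x, w)))
```

In this sketch both functions perform the same multiply-accumulate work, but `conv2d_lazy` reads only the original buffer, which is the sense in which the ratio of compute to data movement goes up. Whether the project uses this exact scheme is an assumption.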