Tuesday, 14 June 2016

Parallel optimization of array iteration


I want to optimize the following code sample and implement it in CUDA, but so far I have not found a way to parallelize the operation and get below the O(n) sequential cost. It may not even be possible, since each element can only be calculated after all previous elements have gone through the operation. If that is truly the case, what is the fastest CUDA implementation for iterating over the array, given that the data type is 'unsigned int'?

for( i = 1; i < array_length; i++ )
        array[i] = array[i] + array[i-1];
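
The loop above computes an inclusive prefix sum (a "scan"): after it runs, array[i] holds the sum of the original elements 0 through i. Because unsigned addition is associative, a scan can be evaluated in parallel despite the apparent dependency on previous elements. The following is only a minimal sketch of one possible approach, assuming the Thrust library that ships with the CUDA Toolkit is available. The helper name `prefix_sum_gpu` and the use of `std::vector` for the host data are assumptions for illustration, not part of the original post.

```cuda
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Hypothetical helper (name not from the original post): replaces each
// element with the sum of itself and all preceding elements, on the GPU.
void prefix_sum_gpu(std::vector<unsigned int>& host_data)
{
    // Copy the input from host memory to device memory.
    thrust::device_vector<unsigned int> device_data(host_data.begin(),
                                                    host_data.end());

    // In-place inclusive scan: device_data[i] = input[0] + ... + input[i].
    // Unsigned overflow wraps around, as it does in the sequential loop.
    thrust::inclusive_scan(device_data.begin(), device_data.end(),
                           device_data.begin());

    // Copy the result back to the host.
    thrust::copy(device_data.begin(), device_data.end(), host_data.begin());
}
```

For example, passing a vector holding 1, 2, 3, 4, 5 should leave it holding 1, 3, 6, 10, 15, the same result as the sequential loop. The total amount of work is still proportional to the array length; the gain comes from spreading that work across many threads, so for small arrays the host-to-device copies may well cost more than the scan itself.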
