I need to perform a simple calculation (a sum and a product) over a large amount of data in a Java environment. I know the best solution would be SIMD-style hardware such as a CUDA GPU, but I do not have access to dedicated hardware.
Do you know of any map-reduce frameworks that run on a single machine and exploit multiple cores?
The fork/join framework is often recommended for these kinds of tasks, since by default it sizes its thread pool to the number of available cores. You can read more about it in the Java tutorial: http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html
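As a rough illustration of the fork/join approach, here is a minimal sketch that sums a long[] by splitting it recursively. The class name ForkJoinSum, the nested SumTask, and the THRESHOLD value are made up for this example and would need tuning for real data; only ForkJoinPool and RecursiveTask come from the JDK.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinSum {
    // Hypothetical cutoff: below this many elements, sum sequentially.
    private static final int THRESHOLD = 10_000;

    // Sums data[from..to) by splitting the range in half until it is small.
    static final class SumTask extends RecursiveTask<Long> {
        private final long[] data;
        private final int from;
        private final int to;

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i];
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                          // run the left half asynchronously
            return right.compute() + left.join(); // do the right half in this thread
        }
    }

    static long parallelSum(long[] data) {
        // The common pool is sized from the number of available processors.
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data)); // prints 500000500000
    }
}
```

A product would follow the same shape, with multiplication in place of addition and 1 as the starting value, although a long product overflows quickly on large inputs.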
On the other hand, if you are just summing and multiplying numbers, it may be simpler to choose a fixed number of threads, say 4, and have the first thread sum the numbers at indexes 0, 4, 8, ..., the second at indexes 1, 5, 9, ..., the third at 2, 6, 10, ..., and so on. The partial results are then combined once all threads have finished.
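The fixed-thread idea could look something like the sketch below, which uses a plain ExecutorService. The class name StridedSum and the method stridedSum are hypothetical names chosen for this example; the thread count is passed in rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StridedSum {
    // Thread t sums the elements at indexes t, t + threads, t + 2 * threads, ...
    static long stridedSum(long[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int offset = t;
                tasks.add(() -> {
                    long sum = 0;
                    for (int i = offset; i < data.length; i += threads) {
                        sum += data[i];
                    }
                    return sum;
                });
            }
            long total = 0;
            // invokeAll blocks until every task has completed.
            for (Future<Long> partial : pool.invokeAll(tasks)) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(stridedSum(data, 4)); // prints 500500
    }
}
```

Giving each thread one contiguous block instead of interleaved indexes is usually friendlier to the CPU cache, so that variant may be worth measuring as well.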