We have already shown before how to set up CUDA in both Windows and Linux environment and how to create a simple project. Today we will demonstrate how easy it is to perform a simple Monte Carlo simulation using Thrust library. We will do so by estimating $\pi$ with random numbers.
From the project’s homepage we see that
Thrust is a parallel algorithms library which resembles the C++ Standard Template Library (STL). Thrust’s high-level interface greatly enhances programmer productivity while enabling performance portability between GPUs and multicore CPUs.
And this is very true. When you use Thrust it is almost identical to using the STL. While it might not be the fastest GPU implementation for your particular problem, it is easy to use and provides nice tools for quick development. It takes care of many things behind the scene (
cudaFree, etc.) and makes the code cleaner and pleasant to read.
We will illustrate the $\pi$ estimation with three different approaches. Firstly, we will implement the algorithm outlined in this presentation. Secondly, we will implement a modified code from Thrust examples (with CuRand in kernel). Finally, we will implement the standard CuRand example utilizing Thrust.
The first version of $\pi$ estimation is using random number generator implemented in Thrust library. The code is very strightforward and has three main components. The functor
randomPoint responsible for generating pair of random numbers is
Note, that the original presentation is missing and important line of code
rng.discard(2 * n). We would geerate the same random numbers again and again without it.
Following functor is responsible for deciding whether the point is inside a unit circle or not
And the final piece of code for gluing these parts together is
The original presentation uses
reduce functions separately. For a small performance improvement we combined these two functions into one -
The second version follows an example from Thrust library but instead of using Thrust random number generator we use CuRand random number generator in kernel function.
The heavy-weighting is done by following functor
The main difference between first and second approach is that here we generate 100,000 random numbers within each thread, i.e. we perform more computations within each kernel function.
Following is code which calls the code above and estimates $\pi$
The only downside is that we found it rather cumbersome to specify a different random number generator, e.g. Sobol or Mersenne Twister.
The last version utilizes CuRand function called from host rather than device. The helper functor responsible for deciding whether point is within unit circle is
The core part of code of this version is
Note, that we use Sobol random number generator in this case. Also note, that we use structure of arrays (SoA) rather than array of structures (AoS) as described in this presentation.
We hope you enjoyed this post and do not forget to leave us a comment. Thank you.