Rewriting My Code to Run in Parallel (1)

As I mentioned in my previous post, I am about to make my code for finding co-integrated pairs run in parallel and more efficiently.

But before I touch the actual co-integration code, I would like to run some tests to see whether it improves the performance of the calculations (my guess is a big Yes, otherwise I wouldn't be doing it). There are several packages for parallel computing in R, and you have to find one that matches your system configuration; for example, I am running Linux, so I have chosen the doMC package. You can find an interesting read (although a bit outdated) about parallel computing packages in R here.

The main reason why I think my hypothesis about the speed-up is true comes from this image:

[Figure: CPUs usage 1]

I captured this screen while the main co-integration script was running. Notice the wild movements of CPU1 and CPU2. It's far from the ideal state where both cores run at 100%, and that's exactly where we are going to end up in this post. Please follow along.

First of all, notice that we have one main for-loop and one inner for-loop. The question is: which of them should we make parallel? Let's try the inner one first.

I have prepared a small piece of code that we are going to use in our testing. Here it is:

 

require(doMC)
require(fUnitRoots)
require(RcppArmadillo)
registerDoMC()
 
cols = 500
rows = 2000
 
# generate random matrix
x = rnorm(cols*rows)
x = matrix(data = x, ncol = cols, nrow = rows)

 

This will create a test matrix similar to our problem with co-integrated pairs in S&P 500 stocks. The function registerDoMC() spawns a number of workers equal to the number of your CPUs (in my case 2). You can also specify explicitly how many workers you want to spawn.
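If you want control over the worker count (for instance on a shared machine), you can pass it explicitly; a minimal sketch, assuming doMC and foreach are installed:

```r
library(doMC)

# Register exactly two workers instead of the auto-detected default.
registerDoMC(cores = 2)

# Confirm how many workers foreach will actually use.
getDoParWorkers()
```

getDoParWorkers() is a handy sanity check before launching a long benchmark.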

The next step is to measure how long it takes to finish the original version of the code:

 

stime0 = system.time({
    for (i in 1:(cols-1)) {
        cat(i, " ")
        for (j in ((i+1):cols)) {
            m = fastLm(x[,j], x[,i])
            o = adfTest(resid(m), type="nc")@test$p.value
        }
    }
})[3]

 

On my system configuration it took 1454.935 seconds to complete the task. Now, let's introduce the first parallel version of the code:

 

ptime = system.time({
    for (i in 1:(cols-1)) {
        cat(i, " ")
        foreach(j = ((i+1):cols)) %dopar% {
            m = fastLm(x[,j], x[,i])
            o = adfTest(resid(m), type="nc")@test$p.value
        }
    }
})[3]

 

Update: Please note that the only differences are the replacement of the inner for-loop with a foreach-loop and the addition of the %dopar% operator.

In this case it took my system 1069.815 seconds to finish. That's much better, but not exactly what I was hoping for. The image of the CPU usage tells us that we can certainly do better. We launch the %dopar% block 499 times, and the behind-the-scenes management done by the foreach package together with the doMC package takes too much time, so the CPUs are not used efficiently.

[Figure: CPUs usage 2]

The next option is to make the main for-loop run in parallel. Again, it's nothing difficult, and we need to modify just a few lines of code:

 

ptime2 = system.time({
    foreach(i = 1:(cols-1)) %dopar% {
        cat(i, " ")
        for (j in ((i+1):cols)) {
            m = fastLm(x[,j], x[,i])
            o = adfTest(resid(m), type="nc")@test$p.value
        }
    }
})[3]

 

This piece of code took my system 820.143 seconds to finish. With this version we have reduced the computation time by nearly 44%. That's quite amazing! The CPU usage now looks as follows:

[Figure: CPUs usage 3]

As you can see, both cores ran at almost 100% the whole time. It's nearly unbelievable how easy it is to use and implement these packages. However, there are still some open questions to be solved, e.g. how to get the output from this foreach-loop, and we will look at that shortly.
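As a teaser for collecting the output: foreach returns a list by default, and its .combine argument can flatten the per-iteration results. A minimal sketch of my own (not the benchmarked code above, and assuming the same x and cols from the setup snippet):

```r
# Collect the ADF p-values of all pairs (i, j) with j > i into one
# numeric vector; .combine = c concatenates each iteration's result.
pvalues = foreach(i = 1:(cols-1), .combine = c) %dopar% {
    sapply((i+1):cols, function(j) {
        m = fastLm(x[, j], x[, i])
        adfTest(resid(m), type = "nc")@test$p.value
    })
}
```

pvalues then holds one entry per stock pair, choose(cols, 2) values in total.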

 

P.S.: there is one humongous drawback that still makes our code run slowly: the adfTest() function. We will optimize it as well. See you next time.

 

 

  1. nikke:

    Hi!

    Really nice blog you have!
    I am also trying to modify the code to run in parallel.
    I have just changed the foreach loop and added the %dopar% compared to the code in this post http://blog.quanttrader.org/2011/04/optimizing-my-r-code/

    I'm running into a couple of issues:
    1. The cols variable is not set. I'm getting the error: object 'cols' not found.
    2. next doesn't seem to work in the foreach context.


  2. sebastien:

    Sounds cool, but you should show how to retrieve the outputs of the tests (the o and m objects) in both the normal (for-loop) and parallel (foreach %dopar%) versions. And show how this is quicker in parallel!

    Thanks


    1. QuantTrader:

      By default, the results of a parallel foreach-loop are returned in a list, so constructing the return variable is already implicit in the code. In the case of the serial for-loop I would use a preallocated memory space, i.e. it's almost free to change it.
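      The preallocation idea mentioned here could look like this (a hypothetical sketch, not code from the post, assuming the same x, cols, fastLm, and adfTest as above):

      ```r
      # Preallocate an NA-filled matrix up front; the loop only fills
      # cells, so no memory is grown inside the hot path.
      pvalues = matrix(NA_real_, nrow = cols, ncol = cols)
      for (i in 1:(cols-1)) {
          for (j in (i+1):cols) {
              m = fastLm(x[, j], x[, i])
              pvalues[i, j] = adfTest(resid(m), type = "nc")@test$p.value
          }
      }
      ```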


    2. harry:

      It'd be nice if you highlighted (or mentioned) that the only difference between the snippets of code is the addition and position of the %dopar% operator. Easy enough for me to understand, but this may benefit others.


      1. QuantTrader:

        Thank you, Harry, for the suggestion.

