Download Prices From Yahoo In Parallel

Following my previous post about rewriting my code to run in parallel, I have modified the code for downloading the S&P 500 prices from Yahoo to run in parallel as well. To be honest, I quite enjoy writing code that runs in parallel. It's fun for various reasons, but some theoretical background is highly recommended (e.g. here, here or here).

The good news is that it takes very little time (148 seconds) to download the data from Yahoo; the bad news is that the merge function still takes far too long. More specifically, on average 80% of the time is spent on the merge and 20% on downloading the actual data from Yahoo. That is faster than the original code, but I don't like the idea of spending so much time in the merge function.

 

rm(list = ls(all = TRUE))
require(doMC)
require(quantmod)
require(tseries)
require(timeDate)

# Use 20 parallel workers
registerDoMC(20)

# One ticker symbol per row, no header
symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = FALSE, stringsAsFactors = FALSE)
nrStocks = length(symbols[,1])

dateStart = "2007-01-01"

# Download each symbol in a worker; the results are combined pairwise with merge.zoo
z = foreach(i = 1:nrStocks, .combine = merge.zoo) %dopar% {
    cat("Downloading ", i, " out of ", nrStocks , "\n")
    x = get.hist.quote(instrument = symbols[i,1], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = TRUE)
    colnames(x) = symbols[i,1]
    x
}

# Convert the merged zoo object to xts
z = as.xts(z)

# Re-register with the default number of workers
registerDoMC()

 

One intuitive solution is to preallocate the memory and save the results there. However, I couldn't find a way to modify a variable outside the foreach scope when running in parallel. I understand that concurrent writes could corrupt the data, but locking each update would solve this (an update doesn't take much time). I tried to google/yahoo/duckduckgo/bing a solution, but without luck. Do you know the answer?

This solution has yet another drawback: missing data.
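On the question of writing to a preallocated variable from inside the parallel loop: doMC runs its workers as forked processes, so each worker gets its own copy of the parent's memory and its writes stay in that copy. The sketch below illustrates this with `parallel::mclapply` instead of foreach (an assumption made to keep the example free of extra packages; it relies on forking, so it needs Linux or macOS). The names `results` and `values` are made up for illustration. The usual workaround is to return the values from the workers and fill the preallocated buffer in the parent.

```r
library(parallel)

n <- 4
results <- numeric(n)  # preallocated in the parent process

# Each forked worker assigns into its own copy of `results`;
# the change never reaches the parent process.
invisible(mclapply(seq_len(n), function(i) {
  results[i] <<- i^2
  NULL
}, mc.cores = 2))
untouched <- all(results == 0)

# Working pattern: workers return their values,
# and the parent fills the preallocated buffer afterwards.
values <- mclapply(seq_len(n), function(i) i^2, mc.cores = 2)
results[] <- unlist(values)
```

No locking is needed in this pattern, because only the parent process ever writes to `results`.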

Then I noticed the line of code where I convert my data from type "zoo" to "xts". xts is written in C, whereas zoo is written in pure R (I have read about plans to merge these packages, but who knows when that will happen). So why not convert the variable to xts right after the download? Simple.

And the result? On average, 43 seconds!
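To look at the zoo versus xts merge cost in isolation, without any downloads, here is a minimal sketch on synthetic data. The series count, length, and column names (`S1`, `S2`, ...) are arbitrary assumptions, and the timings will vary with the machine and the installed package versions, so treat it as a way to measure rather than as a benchmark result.

```r
library(zoo)
library(xts)

set.seed(42)
n_series <- 50
n_days <- 1000
dates <- as.Date("2007-01-01") + seq_len(n_days) - 1

# Synthetic one-column zoo series standing in for downloaded prices
zoo_list <- lapply(seq_len(n_series), function(i) {
  m <- matrix(cumsum(rnorm(n_days)), ncol = 1,
              dimnames = list(NULL, paste0("S", i)))
  zoo(m, order.by = dates)
})

# The same series converted to xts, as done right after the download
xts_list <- lapply(zoo_list, as.xts)

# Pairwise merging, mirroring what .combine does inside foreach
time_zoo <- system.time(merged_zoo <- Reduce(merge, zoo_list))
time_xts <- system.time(merged_xts <- Reduce(merge, xts_list))

print(rbind(zoo = time_zoo, xts = time_xts))
```

Both merges produce the same values; only the time spent differs.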

 

rm(list = ls(all = TRUE))
require(doMC)
require(quantmod)
require(tseries)
require(timeDate)

# Use 20 parallel workers
registerDoMC(20)

# One ticker symbol per row, no header
symbols = read.csv("/home/robo/Dropbox/work/FX/PairTrading/sp500.csv", header = FALSE, stringsAsFactors = FALSE)
nrStocks = length(symbols[,1])

dateStart = "2007-01-01"

# Convert each series to xts inside the worker, so that results are combined with merge.xts
z = foreach(i = 1:nrStocks, .combine = merge.xts) %dopar% {
    cat("Downloading ", i, " out of ", nrStocks , "\n")
    x = get.hist.quote(instrument = symbols[i,1], start = dateStart, quote = "AdjClose", retclass = "zoo", quiet = TRUE)
    colnames(x) = symbols[i,1]
    as.xts(x)
}

# Re-register with the default number of workers
registerDoMC()

 

 


  1. al

    thanks! i got it to work on 2.10 32bit as well, just had to reduce the number of concurrent tasks (running this on a small laptop)


  2. Joshua Ulrich

    I haven't tested this, but it might be even faster if you stored the results of your foreach call in a list and only called merge.xts once (via do.call(merge, z)).

    Also, some of the xts C code has already been merged into zoo (coredata and lag).
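The single-merge idea from this comment can be sketched on synthetic data as follows. The assumption here is that the foreach call is written without `.combine`, in which case foreach returns a plain list, so the downloads still run in parallel and only the merge happens once at the end. The helper `make_series` and the tickers `AAA`, `BBB`, `CCC` are hypothetical stand-ins for the downloaded series.

```r
library(xts)

set.seed(1)

# Hypothetical stand-in for one downloaded price series
make_series <- function(name, n_days) {
  dates <- as.Date("2007-01-01") + seq_len(n_days) - 1
  x <- xts(cumsum(rnorm(n_days)), order.by = dates)
  colnames(x) <- name
  x
}

# What foreach would return without .combine: a list of xts objects,
# here with deliberately different lengths
series_list <- list(
  make_series("AAA", 5),
  make_series("BBB", 4),
  make_series("CCC", 3)
)

# Pairwise merging, one series at a time (what .combine = merge.xts does)
pairwise <- Reduce(merge, series_list)

# A single n-way merge over the whole list
all_at_once <- do.call(merge, series_list)
```

Dates missing from a shorter series are filled with `NA` in both variants, so the missing-data issue has to be handled separately either way.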


    1. QuantTrader

      Thanks for the suggestion. However, the tricky part in serial computation is that you have to wait for the response from the server and you can't do anything else in the meantime. That's the reason why I spawn 20 slaves: so I can wait for 20 responses from the server at the same time.


    2. al

      i suspect either my SP500.csv or my R version (2.10) is out of date as i get a ton of errors running this script -- can you share your R version and sp500.csv? Thanks in advance


      1. QuantTrader

        Hi al, the details of my R installation are:

        R version 2.15.0 (2012-03-30)
        Copyright (C) 2012 The R Foundation for Statistical Computing
        ISBN 3-900051-07-0
        Platform: x86_64-pc-linux-gnu (64-bit)

        and as I show in my previous post, the most recent version of sp500.csv file is located here:

        http://dl.dropbox.com/u/14584441/sp500.csv
