VLookup type method in R

Question

VLookup type method in R

I have df with thousands of tickers for different future contracts. They have an abbreviated name (which appears later) and a long name (which I want to have in another df)

full_list <- structure( list( Ticker = c("AC", "AIC", "BBS", "BO", "C", "DF"), Long_Name = c("Ethanol -- CBOT", "DJ UBS Commodity Index -- CBOT", "South American Soybeans -- CBOT", "Soybean Oil -- CBT", "Corn -- CBT", "Dow Jones Industrial Average -- CBT") ), .Names = c("Ticker", "Long_Name"), row.names = c(NA, 6L), class = "data.frame" )

This df has a list that I get daily. I have to go and find the abbreviated name and match it with the long name.

 replace <- structure( list( Type = c("F", "F", "F", "F", "F", "F"), Location = c("US", "US", "US", "US", "US", "US"), Symbol = c("BO", "C", "DF", "AIC", "AC", "BBS"), Month = c("V13", "U13", "U13", "U13", "U13", "U13") ), .Names = c("Type", "Location", "Symbol", "Month"), row.names = c(NA, 6L), class = "data.frame" )

I'm looking for R to do this, replace the $ Symbol column and find these values in the full_list column of $ Ticker and add the column, replace $ Long_Name, where the corresponding full_list $ Long_Name will be rewritten. Hope this makes sense. I understand that column names are hard to understand.

This will be a simple VLookup in excel, but I have a script that I will use on a daily basis almost in R.

+6

r match vlookup

Tim Sep 05 '13 at 20:21

source share

5 answers

You can use match - which gives an index where the first argument falls into the second argument. For instance:

 arg1 <- c("red","blue") arg2 <- c("blue","red") > match(arg1,arg2) [1] 2 1

Then just create a new column in the replacement frame (note - you should call it something else, because replacement is actually a function in r) using the full_list data frame with the corresponding characters.

 replace$Long_Name <- full_list$Long_Name[match(replace$Symbol,full_list$Ticker)] > replace Type Location Symbol Month Long_Name 1 F US BO V13 Soybean Oil -- CBT 2 F US C U13 Corn -- CBT 3 F US DF U13 Dow Jones Industrial Average -- CBT 4 F US AIC U13 DJ UBS Commodity Index -- CBOT 5 F US AC U13 Ethanol -- CBOT 6 F US BBS U13 South American Soybeans -- CBOT

+8

dayne Sep 05 '13 at 20:26

source share

If this is a large data set, you can use the environment search:

 library(qdap) replace$Long_Name <- lookup(replace$Symbol, full_list) ## > replace ## Type Location Symbol Month Long_Name ## 1 F US BO V13 Soybean Oil -- CBT ## 2 F US C U13 Corn -- CBT ## 3 F US DF U13 Dow Jones Industrial Average -- CBT ## 4 F US AIC U13 DJ UBS Commodity Index -- CBOT ## 5 F US AC U13 Ethanol -- CBOT ## 6 F US BBS U13 South American Soybeans -- CBOT

+6

Tyler rinker Sep 05 '13 at 20:33

source share

Required data.table response

 library(data.table) full_list <- data.table(full_list, key='Symbol') replace <- data.table(replace, key='Ticker') replace[full_list]

FWIW in the data set above about 1e5 rows of the key data.table will be much faster than the other approaches listed (except for the qdap version, I have not tried this). merge timers can be found here

+5

Justin Sep 05 '13 at 21:16

source share

If you are using a large dataset, you may encounter some time and memory problems; if this happens, try the following:

 require(plyr) colnames(replace)<-c("Type", "Location", "Ticker", "Month") Full<-join(full_list, replace, by = "Ticker", type = "left", match = "all") > Full Ticker Long_Name Type Location Month 1 AC Ethanol -- CBOT F US U13 2 AIC DJ UBS Commodity Index -- CBOT F US U13 3 BBS South American Soybeans -- CBOT F US U13 4 BO Soybean Oil -- CBT F US V13 5 C Corn -- CBT F US U13 6 DF Dow Jones Industrial Average -- CBT F US U13

Although this is more than a one-line solution, merging may take some time to process with large data frames. In addition, the plyr package can be your best friend.

+1

Jellio Sep 05 '13 at 20:45

source share

Joshua ulrich · Accepted Answer · 2013-09-05T20:26:19+0000

merge them:

 > merge(full_list, replace, by.x="Ticker", by.y="Symbol") Ticker Long_Name Type Location Month 1 AC Ethanol -- CBOT F US U13 2 AIC DJ UBS Commodity Index -- CBOT F US U13 3 BBS South American Soybeans -- CBOT F US U13 4 BO Soybean Oil -- CBT F US V13 5 C Corn -- CBT F US U13 6 DF Dow Jones Industrial Average -- CBT F US U13

VLookup type method in R

More articles: