A simple way to run a program in r. Two for 47,022,066 operations

here is my question: I have two databases. One of them ( BASE1 ) has some Mexican states and their respective municipalities ( 19,138 vol. ) For 2008-2009:

 1. **State            Municipality   Year      ID** 
 2. Aguascalientes   Calvillo       2008       _
 3. Aguascalientes   Calvillo       2009       _
 4. Baja California  Tecate         2008       _
 5. Baja California  Tecate         2009       _
 6. (...)

Another database ( BASE2 ) contains an identifier for all municipalities ( 2,457 vol. ):

 1. **State              Municipality   ID**
 2. Aguascalientes     Calvillo       1001
 3. Baja California    Tecate         2003 
 4. (...)

I want to assign an appropriate identifier (BASE2) for each case in BASE1.

I am creating the following code:

for (i in 1:2457){
  for (j in 1:19138) {
    if (BASE_2$MUNICIPALITY_NAME[i]==BASE1$MUNICIPALITY_NAME[j] && BASE2$STATE_NAME[i]==BASE1$STATE_NAME[j]) {
      BASE1$MUNICIPALITY_ID[j]=BASE2$MUNICIPALITY_ID[i]
    } 

  }

}

The reason I am creating the code as follows:

if (BASE_2$MUNICIPALITY_NAME[i]==BASE1$MUNICIPALITY_NAME[j] **&& BASE2$STATE_NAME[i]==BASE1$STATE_NAME[j]**) 

consists in the fact that in some states there are similarly named municipalities.

The code works (I tested it with a subset of the databases), but it takes too much time to execute it. The question is:

? , , 2 457 19,138 = 47,022,066 ( )

+4
2

merge

base1  <- data.frame(key1 = letters[1:6], key2 = letters[7:12], values = 1:6)

base1
  key1 key2 values
1    a    g      1
2    b    h      2
3    c    i      3
4    d    j      4
5    e    k      5
6    f    l      6

base2  <- data.frame(key1 = letters[1:6], key2 = letters[7:12], values = 12:17)

base2
  key1 key2 values
1    a    g     12
2    b    h     13
3    c    i     14
4    d    j     15
5    e    k     16
6    f    l     17

merge(base1, base2, by = c("key1", "key2"))
  key1 key2 values.x values.y
1    a    g        1       12
2    b    h        2       13
3    c    i        3       14
4    d    j        4       15
5    e    k        5       16
6    f    l        6       17
+3

@matt_k, data.table:

library(data.table)
base1  <- data.table(key1 = letters[1:6], key2 = letters[7:12], values1 = 1:6)
base2  <- data.table(key1 = letters[1:6], key2 = letters[7:12], values2 = 12:17)
setkeyv(base1, c("key1", "key2"))
setkeyv(base2, c("key1", "key2"))
base1[base2]
#    key1 key2 values1  values2
# 1:    a    g       1       12
# 2:    b    h       2       13
# 3:    c    i       3       14
# 4:    d    j       4       15
# 5:    e    k       5       16
# 6:    f    l       6       17

dplyr:

library(dplyr)
base1  <- data.frame(key1 = letters[1:6], key2 = letters[7:12], values1 = 1:6)
base2  <- data.frame(key1 = letters[1:6], key2 = letters[7:12], values2 = 12:17)
left_join(base1, base2)
#   key1 key2 values1 values2
# 1    a    g       1      12
# 2    b    h       2      13
# 3    c    i       3      14
# 4    d    j       4      15
# 5    e    k       5      16
# 6    f    l       6      17

.

+2

Source: https://habr.com/ru/post/1523987/


All Articles