R - Speed optimization and chess ratings

Hello!

I am trying to calculate chess-style ratings for several players in 6 different skills (C1, C2, ... C6). I have a large data frame (data) of games that looks like this (head(data)). In each game, one person (user) chooses which of two other people (p1 / p2) wins.

    row.names user p1 p2 skill win looser      time
    ------------------------------------------------
            2   KE CL HK    C1  CL     HK 433508371
           25   KE HK JT    c1  HK     JT 433508401
           35   KE AB JT    C1  AB     JT 433508444
          110   NF IP HE    C1  HE     IP 433508837
           78   NF IP AS    C1  AS     IP 433508848
           82   NF IT CV    C1  CV     IT 433508860

In another table (old_users), I keep track of every player's current rating in each of the 6 skills (head(old_users)):

      user   C1   C2   C3   C4   C5   C6
    1   BD 1200 1200 1200 1200 1200 1200
    2   NF 1200 1200 1200 1200 1200 1200
    3   CH 1200 1200 1200 1200 1200 1200
    4   AR 1200 1200 1200 1200 1200 1200
    5   AS 1200 1200 1200 1200 1200 1200
    6   MS 1200 1200 1200 1200 1200 1200

Algorithm

The algorithm walks through data one row at a time, looking at the i-th row on each iteration. For that row it looks up the winner and the looser in old_users and extracts both players' current ratings for the skill that was played. It then calculates their new ratings from who won and who lost, and writes the new ratings back into the corresponding cells of old_users.
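Read off the for loop further down, the update for a single game can be written as the formula below. The symbols are shorthand introduced here and do not appear in the code: R_w and R_l are the winner's and looser's ratings before the game, primes mark the ratings after it, and K = 32 is the constant hard-coded in the loop.

```latex
\begin{aligned}
R_w' &= R_w + \frac{K}{2}\left(1 - 0 + \frac{1}{2}\cdot\frac{R_l - R_w}{200}\right),\\
R_l' &= R_l + \frac{K}{2}\left(0 - 1 + \frac{1}{2}\cdot\frac{R_w - R_l}{200}\right),
\qquad K = 32.
\end{aligned}
```

For two players who both start at 1200, the difference terms vanish, so the winner moves to 1200 + 16 = 1216 and the looser to 1200 - 16 = 1184. The two changes are always equal and opposite, so the sum of the two ratings stays the same after every game.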

What I need to do

I need to do this as quickly as possible. With the data frame data, which currently has 6000+ rows for a total of 24 players, it takes quite some time.

I timed my current for loop, which gives the following times (in seconds). That is far too long.

       user  system elapsed
     104.72    0.28  118.02

Questions

  • Why is this algorithm taking so long? Are there commands in it that are poorly suited to loops, and so on?
  • How can I achieve what I want faster?

Current for loop

    for (i in 1:dim(data)[1]) {
      # Take the i'th row in data
      tmp_data <- data[i, ]
      # Find the old_users column that matches the skill played
      score_col <- which(colnames(old_users) == tmp_data$skill)
      # Fetch the winner's and the looser's old scores
      winners_old_data <- old_users[which(old_users$user == tmp_data$win), ]
      loosers_old_data <- old_users[which(old_users$user == tmp_data$looser), ]
      # Calculate the winner's new score
      winners_new_score = winners_old_data[score_col] +
        (32/2) * (1 - 0 + (1/2) * ((loosers_old_data[score_col] - winners_old_data[score_col]) / 200))
      # Calculate the looser's new score
      loosers_new_score = loosers_old_data[score_col] +
        (32/2) * (0 - 1 + (1/2) * ((winners_old_data[score_col] - loosers_old_data[score_col]) / 200))
      # Update the two cells in old_users
      old_users[old_users$user == winners_old_data[[1]], score_col] <- winners_new_score
      old_users[old_users$user == loosers_old_data[[1]], score_col] <- loosers_new_score
    }

Data to play with

https://drive.google.com/file/d/0BxE_CHLUGoS0WlczUkxLM3VtVjQ/edit?usp=sharing

Any help is much appreciated

Thanks!

// HK

1 answer

The data you posted is ridiculously small! To think that I needed to install something just to parse it...! If you can post much more data, I can check how useful my suggestion is.

I would recommend turning your user data into a matrix with the user identifiers as row names and the skills as column names. Why?

  • You can get a slight speed improvement by accessing the data through regular indexing, rather than using which( == ) everywhere. Or at least it will make your code more readable.

  • More importantly, values inside a matrix are modified in place, while with a data.frame I think your code keeps creating a whole new object on every assignment, which must take a lot of time.
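To make the first point concrete, here is a small sketch that puts the two lookup styles side by side. The three-row ratings table and its values are made up for illustration; only the column layout follows old_users from the question.

```r
# Toy ratings table (made-up values) in the data.frame layout of the question.
old_users <- data.frame(user = c("BD", "NF", "CH"),
                        C1 = c(1200, 1216, 1184),
                        C2 = c(1200, 1200, 1200),
                        stringsAsFactors = FALSE)

# Matrix version: user IDs become row names, skills become column names.
users <- as.matrix(old_users[, -1])
rownames(users) <- old_users$user

# data.frame lookup with which(), as in the question
score_col <- which(colnames(old_users) == "C1")
df_value <- old_users[which(old_users$user == "NF"), score_col]

# Matrix lookup directly by row name and column name
m_value <- users["NF", "C1"]

# Updating a single cell uses the same name-based indexing
users["NF", "C1"] <- m_value + 16
```

Both lookups return the same rating; the matrix form just drops the which() bookkeeping.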


    # read and transform your data
    # stringsAsFactors = FALSE keeps the IDs as character, so they index by name
    data <- read.csv("data.txt", header = FALSE, stringsAsFactors = FALSE)
    names(data) <- c("user", "p1", "p2", "skill", "win", "looser", "time")
    users <- data.matrix(read.csv("users.txt", header = FALSE, row.names = 1))
    # paste0 gives "C1".."C6"; paste would give "C 1" with a space
    colnames(users) <- paste0("C", 1:6)

    for (i in 1:nrow(data)) {
      game <- data[i, ]
      winner.old <- users[game$win, game$skill]
      looser.old <- users[game$looser, game$skill]
      winner.new <- winner.old + 32/2 * (1 - 0 + (1/2) * (looser.old - winner.old) / 200)
      looser.new <- looser.old + 32/2 * (0 - 1 + (1/2) * (winner.old - looser.old) / 200)
      users[game$win, game$skill] <- winner.new
      users[game$looser, game$skill] <- looser.new
    }

Isn't that a lot easier to read? I hope it will also be a bit faster; please check and let me know. Or provide a larger data set that we can play with. Thanks.
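Until a larger data set is available, one way to check the speed is to time the matrix approach on synthetic games of roughly the size mentioned in the question (6000 games, 24 players, 6 skills). The sketch below is only an assumption-laden test harness: the player IDs, the random games and the function name rate_matrix are all hypothetical. It also pulls the three columns out of the data frame before the loop, which is my own addition to avoid extracting a data frame row on every iteration.

```r
set.seed(1)  # reproducible synthetic data

players <- sprintf("P%02d", 1:24)   # 24 hypothetical player IDs
skills  <- paste0("C", 1:6)
n_games <- 6000

# Draw two distinct players per game; the first one is treated as the winner.
pairs <- t(replicate(n_games, sample(players, 2)))
games <- data.frame(win    = pairs[, 1],
                    looser = pairs[, 2],
                    skill  = sample(skills, n_games, replace = TRUE),
                    stringsAsFactors = FALSE)

# Same update rule as in the loop above, applied to a matrix indexed by name.
rate_matrix <- function(games, players, skills) {
  users <- matrix(1200, nrow = length(players), ncol = length(skills),
                  dimnames = list(players, skills))
  # Plain character vectors are cheaper to index than data frame rows.
  win    <- games$win
  looser <- games$looser
  skill  <- games$skill
  for (i in seq_len(nrow(games))) {
    w <- users[win[i], skill[i]]
    l <- users[looser[i], skill[i]]
    # The winner's gain equals the looser's loss under this rule.
    gain <- 32 / 2 * (1 + (1 / 2) * (l - w) / 200)
    users[win[i], skill[i]]    <- w + gain
    users[looser[i], skill[i]] <- l - gain
  }
  users
}

timing <- system.time(ratings <- rate_matrix(games, players, skills))
print(timing)
```

Because every game moves the same number of points from the looser to the winner, the total of all ratings should stay at 24 * 6 * 1200, which gives a quick sanity check on the result before comparing timings against the data.frame loop.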


Source: https://habr.com/ru/post/1203787/

