This should probably give you a good speed increase:
    library(data.table)
    DT <- data.table(data.df)
    DT[, c("Species", "SizeClass", "Infected") := as.list(strsplit(Class, "\\.")[[1]]), by=Class ]
Reasons for the increase:
- data.table pre-allocates memory for columns.
- Each column assignment in a data.frame reassigns the entirety of the data; data.table, in contrast, does not (`:=` assigns by reference).
- The `by` argument lets you run the strsplit task just once for each unique value of Class.
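To make the last point concrete, here is a minimal sketch on a made-up table (the values and the `n_calls` counter are purely illustrative, not from the question). It counts how often the split expression is evaluated when grouping by Class, so you can compare that number with the number of rows.

```r
library(data.table)

# Hypothetical toy data: 6 rows, but only 2 distinct Class values
DT <- data.table(
  Class      = rep(c("Sp1.Small.Yes", "Sp2.Large.No"), each = 3),
  Population = c(10, 12, 11, 7, 8, 9)
)

n_calls <- 0L  # illustrative counter, incremented each time j is evaluated

DT[, c("Species", "SizeClass", "Infected") := {
  n_calls <<- n_calls + 1L
  # Within each group, Class is a single value, so [[1]] picks its pieces
  as.list(strsplit(Class, "\\.")[[1]])
}, by = Class]

n_calls   # compare with nrow(DT) and with uniqueN(DT$Class)
DT
```

The split result for each group is recycled over all rows of that group, which is why the work scales with the number of distinct Class values rather than the number of rows.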
Here is a good quick method for the whole process.
    # Save the new col names as a character vector
    newCols <- c("Species", "SizeClass", "Infected")

    # split the string, then convert the new cols to factors
    DT[, c(newCols) := as.list(strsplit(as.character(Class), "\\.")[[1]]), by=Class ]
    DT[, c(newCols) := lapply(.SD, factor), .SDcols=newCols]

    # remove the old column. This is instantaneous.
    DT[, Class := NULL]

    ## Have a look:
    DT[, lapply(.SD, class)]
    #       Time Location Replicate Population Species SizeClass Infected
    # 1: integer  integer   integer    numeric  factor    factor   factor

    DT
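If you want to try this without the original data, the sketch below runs the same steps on a small stand-in for data.df. The column names are taken from the output above, but the values are invented for illustration. It then cross-checks the new columns against a plain row-wise split of the original strings.

```r
library(data.table)

# Hypothetical stand-in for data.df (values are made up)
data.df <- data.frame(
  Time       = 1:4,
  Location   = c(1L, 1L, 2L, 2L),
  Replicate  = c(1L, 2L, 1L, 2L),
  Population = c(10.5, 12.0, 7.25, 8.0),
  Class      = c("Sp1.Small.Yes", "Sp1.Small.Yes", "Sp2.Large.No", "Sp1.Large.No")
)

DT <- data.table(data.df)
newCols <- c("Species", "SizeClass", "Infected")

# as.character() keeps this working whether Class is a factor or a character column
DT[, c(newCols) := as.list(strsplit(as.character(Class), "\\.")[[1]]), by = Class]
DT[, c(newCols) := lapply(.SD, factor), .SDcols = newCols]
DT[, Class := NULL]

# Cross-check: split every original string row by row and compare
expected <- do.call(rbind, strsplit(as.character(data.df$Class), ".", fixed = TRUE))
stopifnot(
  identical(as.character(DT$Species),   expected[, 1]),
  identical(as.character(DT$SizeClass), expected[, 2]),
  identical(as.character(DT$Infected),  expected[, 3])
)
```

The check is only there for reassurance on small inputs; on a large table the row-wise split is exactly the slow part that the grouped version avoids.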