My data frame consists of people and the city in which they live at a particular point in time. I would like to generate one destination-origin matrix for each year that records the number of moves from one city to another. I'd like to know:
- How can I automatically generate destination tables for each year in my dataset?
- How can I generate all tables in the same 5x5 format, 5 is the number of cities in my example?
- Is there more efficient code than I suggest below? I intend to run it on a very large dataset.
Consider the following example:
#An example dataframe id=sample(1:5,50,T) year=sample(2005:2010,50,T) city=sample(paste(rep("City",5),1:5,sep=""),50,T) df=as.data.frame(cbind(id,year,city),stringsAsFactors=F) df$year=as.numeric(df$year) df=df[order(df$id,df$year),] rm(id,year,city)
My best attempt
#Creating variables for(i in 1:length(df$id)){ df$origin[i]=df$city[i] df$destination[i]=df$city[i+1] df$move[i]=ifelse(df$orig[i]!=df$dest[i] & df$id[i]==df$id[i+1],1,0)
Creating a source table for 2007
yr07=df[df$year_move==2007,] table(yr07$origin,yr07$destination)
Result
City1 City2 City3 City5 City1 0 0 1 2 City2 2 0 0 0 City5 1 1 0 0
source share