Why does the merge result have more rows than the original data frame?

When I merge two data frames, the result has more rows than the original data.

In this case, all has 104956 rows, koppen has 3968 rows, and alltest has 130335 rows. Normally, alltest should have the same number of rows as all, or fewer.

Why is this inflation happening? I'm not sure a reproducible example would help, because the same call worked in the previous cases where I used it.

alltest <- merge(all, koppen, by = "fips", sort = F)
1 answer

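First, from the documentation, ?merge:

"The rows in the two data frames that match on the specified columns are extracted, and joined together. If there is more than one match, all possible matches contribute one row each."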
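
When a key occurs once in one data frame and several times in the other, merge returns one row per matching pair, so the result can be longer than either input. A minimal sketch of this with two made-up data frames (the names x and y and all values are illustrative assumptions, not the asker's data):

```r
# Toy data: x has one row per FIPS, y repeats FIPS 2020 three times
x <- data.frame(FIPS = c(1001, 2020), value = c(10, 20))
y <- data.frame(FIPS = c(1001, 2020, 2020, 2020),
                CLS  = c("Cfa", "Dsc", "Dfc", "ET"))

merged <- merge(x, y, by = "FIPS")
nrow(x)       # 2
nrow(merged)  # 4: the single 2020 row in x matched three rows in y
```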

Taking the Köppen-Geiger US county data as an example:

url    <- "http://koeppen-geiger.vu-wien.ac.at/data/KoeppenGeiger.UScounty.txt"
koppen <- read.table(url, header=T, sep="\t")
nrow(koppen)
# [1] 3594
length(unique(koppen$FIPS))
# [1] 2789

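So, clearly, koppen has duplicated FIPS codes. Looking at the data set, many counties fall into more than one climate class; for example, Anchorage in Alaska has three: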

koppen[koppen$FIPS==2020,]
#     STATE    COUNTY FIPS CLS  PROP
# 73 Alaska Anchorage 2020 Dsc 0.010
# 74 Alaska Anchorage 2020 Dfc 0.961
# 75 Alaska Anchorage 2020  ET 0.029
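
One way to anticipate the inflation before merging is to count, for every row on the left, how many rows on the right share its key. The sketch below does this on small made-up data frames (x, y and their values are hypothetical); the sum of those counts should equal the row count of a default inner merge.

```r
# Toy data: FIPS 9999 has no match in y, FIPS 2020 has three
x <- data.frame(FIPS = c(1001, 2020, 9999))
y <- data.frame(FIPS = c(1001, 2020, 2020, 2020),
                CLS  = c("Cfa", "Dsc", "Dfc", "ET"))

n_dup <- sum(duplicated(y$FIPS))   # rows in y that repeat an earlier FIPS

# For each row of x, the number of rows in y with the same FIPS
matches <- vapply(x$FIPS, function(k) sum(y$FIPS == k), integer(1))
expected_rows <- sum(matches)

expected_rows == nrow(merge(x, y, by = "FIPS"))   # TRUE
```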

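So the solution depends on what you are trying to accomplish. If you want to extract all rows of all whose FIPS appears in koppen, either of these should work: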

merge(all,data.frame(FIPS=unique(koppen$FIPS)),by="FIPS")

all[all$FIPS %in% unique(koppen$FIPS),]

If you need to append the county and state names to all, use this:

merge(all,unique(koppen[c("STATE","COUNTY","FIPS")]),by="FIPS")

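EDIT:

Since koppen sometimes has several rows with the same FIPS but a different CLS, we need a way to decide which of those rows (that is, which CLS) to keep. Here are two ways: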

# this extracts the row with the largest value of PROP, for that FIPS
url        <- "http://koeppen-geiger.vu-wien.ac.at/data/KoeppenGeiger.UScounty.txt"
koppen     <- read.csv(url, header=T, sep="\t")
koppen     <- with(koppen,koppen[order(FIPS,-PROP),])
sub.koppen <- aggregate(koppen,by=list(koppen$FIPS),head,n=1)
result     <- merge(all, sub.koppen, by="FIPS")

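# this extracts one row at random for each FIPS: shuffle the rows,
# then keep the first occurrence of each FIPS so CLS and PROP stay paired
shuffled   <- koppen[sample(nrow(koppen)),]
sub.koppen <- shuffled[!duplicated(shuffled$FIPS),]
result     <- merge(all, sub.koppen, by="FIPS")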
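
Whichever rule is used to pick a row, it may be worth confirming that the lookup table really has one row per FIPS before merging, and that the merge no longer inflates. A rough sketch on toy data (all_toy, koppen_toy and their values are hypothetical, modeled on the Anchorage rows above); it uses order() with !duplicated() to keep the largest-PROP row per FIPS:

```r
all_toy <- data.frame(FIPS = c(2020, 2020, 1001), obs = c(1.5, 2.5, 3.5))
koppen_toy <- data.frame(FIPS = c(2020, 2020, 2020, 1001),
                         CLS  = c("Dsc", "Dfc", "ET", "Cfa"),
                         PROP = c(0.010, 0.961, 0.029, 1.000),
                         stringsAsFactors = FALSE)

# Sort so the largest PROP comes first within each FIPS, then keep that row
ordered_k <- koppen_toy[order(koppen_toy$FIPS, -koppen_toy$PROP), ]
sub_k     <- ordered_k[!duplicated(ordered_k$FIPS), ]

stopifnot(!anyDuplicated(sub_k$FIPS))           # one row per key
result_toy <- merge(all_toy, sub_k, by = "FIPS")
stopifnot(nrow(result_toy) <= nrow(all_toy))    # no inflation
```

Here both Anchorage rows of all_toy pick up the class with the largest PROP, and the merged result has the same three rows as all_toy.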

Source: https://habr.com/ru/post/1544054/

