I have two datasets
datf1 <- data.frame (name = c("regular", "kklmin", "notSo", "Jijoh", "Kish", "Lissp", "Kcn", "CCCa"), number1 = c(1, 8, 9, 2, 18, 25, 33, 8)) #----------- name number1 1 regular 1 2 kklmin 8 3 notSo 9 4 Jijoh 2 5 Kish 18 6 Lissp 25 7 Kcn 33 8 CCCa 8 datf2 <- data.frame (name = c("reGulr", "ntSo", "Jijoh", "sean", "LiSsp", "KcN", "CaPN"), number2 = c(2, 8, 12, 13, 20, 18, 13)) #------------- name number2 1 reGulr 2 2 ntSo 8 3 Jijoh 12 4 sean 13 5 LiSsp 20 6 KcN 18 7 CaPN 13
I want to combine them in a column of names, however a partial match is allowed (to avoid the difficulty of merging spelling errors in a large data set and even to detect such spelling errors) and, for example,
(1) If four consecutive letters (all, if the number of letters is less than 4) in any position - this corresponds to a fine
ABBCD = BBCDK = aBBCD = ramABBBCD = ABB
(2) ABBCD = aBbCd
sensitivity is disabled in the match, e.g. ABBCD = aBbCd
(3) The new dataset will contain both names (names from datf1 and datf2). Thus, this letter we can determine if the match is perfect (can a single column with the number of letters match)
Is such a merger possible?
Changes:
datf1 <- data.frame (name = c("xxregular", "kklmin", "notSo", "Jijoh", "Kish", "Lissp", "Kcn", "CCCa"), number1 = c(1, 8, 9, 2, 18, 25, 33, 8)) datf2 <- data.frame (name = c("reGulr", "ntSo", "Jijoh", "sean", "LiSsp", "KcN", "CaPN"), number2 = c(2, 8, 12, 13, 20, 18, 13)) uglyMerge(datf1, datf2) name1 name2 number1 number2 matches 1 xxregular <NA> 1 NA 0 2 kklmin <NA> 8 NA 0 3 notSo <NA> 9 NA 0 4 Jijoh Jijoh 2 12 5 5 Kish <NA> 18 NA 0 6 Lissp LiSsp 25 20 5 7 Kcn KcN 33 18 3 8 CCCa <NA> 8 NA 0 9 <NA> reGulr NA 2 0 10 <NA> ntSo NA 8 0 11 <NA> sean NA 13 0 12 <NA> CaPN NA 13 0