Ignore case with difflib.get_close_matches()

How can I tell difflib.get_close_matches() to ignore case? I have a dictionary whose entries follow a specific format, including capitalization. The test string, however, may be fully uppercase or fully lowercase, and it should still be treated as equivalent. The results must come back correctly capitalized, so I cannot simply use a modified dictionary.

```python
import difflib

names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
         'Acacia koa A.Gray var. waianaeensis H.St.John',
         'Acacia koaia Hillebr.',
         'Acacia kochii W.Fitzg. ex Ewart & Jean White',
         'Acacia kochii W.Fitzg.']
s = 'Acacia kochi W.Fitzg.'

# base case: proper capitalisation
print(difflib.get_close_matches(s, names, 1, 0.9))
# this should be equivalent from the perspective of my program
print(difflib.get_close_matches(s.upper(), names, 1, 0.9))
# this won't work because of the dictionary formatting
print(difflib.get_close_matches(s.upper().capitalize(), names, 1, 0.9))
```

Output:

```
['Acacia kochii W.Fitzg.']
[]
[]
```

Working code:

Based on Hugh Botwell's answer, I changed the code as follows to get a working solution (which should also work when more than one result is returned):

```python
import difflib

names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
         'Acacia koa A.Gray var. waianaeensis H.St.John',
         'Acacia koaia Hillebr.',
         'Acacia kochii W.Fitzg. ex Ewart & Jean White',
         'Acacia kochii W.Fitzg.']
# map lowercased name -> original, correctly capitalised name
test = {n.lower(): n for n in names}

s1 = 'Acacia kochi W.Fitzg.'  # base case
s2 = 'ACACIA KOCHI W.FITZG.'  # test case

# match against the lowercased keys, then look the original names back up
results = [test[r] for r in difflib.get_close_matches(s1.lower(), test, 1, 0.9)]
results += [test[r] for r in difflib.get_close_matches(s2.lower(), test, 1, 0.9)]
print(results)
```

Output:

```
['Acacia kochii W.Fitzg.', 'Acacia kochii W.Fitzg.']
```
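The same idea can be packaged as a reusable function. The sketch below is only an illustration: `get_close_matches_icase` is a hypothetical name, and it uses `str.casefold()` instead of `lower()`. It also maps each casefolded key to a *list* of originals, so that two names differing only in case are not silently collapsed (a plain `{n.lower(): n}` dict keeps only the last one).

```python
import difflib
from collections import defaultdict


def get_close_matches_icase(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical case-insensitive wrapper around difflib.get_close_matches.

    Matching is done on casefolded strings; the original spellings are returned.
    """
    # casefolded key -> every original spelling that produces that key
    originals = defaultdict(list)
    for p in possibilities:
        originals[p.casefold()].append(p)

    matches = difflib.get_close_matches(word.casefold(), list(originals), n, cutoff)

    # expand each matched key back to its original spelling(s)
    return [orig for key in matches for orig in originals[key]]


names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
         'Acacia koa A.Gray var. waianaeensis H.St.John',
         'Acacia koaia Hillebr.',
         'Acacia kochii W.Fitzg. ex Ewart & Jean White',
         'Acacia kochii W.Fitzg.']

print(get_close_matches_icase('ACACIA KOCHI W.FITZG.', names, 1, 0.9))
# → ['Acacia kochii W.Fitzg.']
```

Because one key can expand to several originals, the result may contain more than `n` items when case-only duplicates exist.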
1 answer

I don't see a quick way to make difflib case-insensitive.

A quick and dirty solution would be to:

  • write a function that converts a string to some canonical form (for example: all uppercase, single-spaced, no punctuation)

  • use that function to build a dict {canonical string: original string} and a list [canonical string]

  • run .get_close_matches() against the list of canonical strings, then look the results up in the dict to get back the original strings
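A minimal sketch of those three steps, assuming one possible canonical form: uppercase, ASCII punctuation (`string.punctuation`) replaced by spaces, and runs of whitespace collapsed to one space. The names `canonical` and `close_matches` are hypothetical, chosen for illustration.

```python
import difflib
import string

# translation table that turns every ASCII punctuation character into a space
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))


def canonical(s):
    """Hypothetical canonical form: uppercase, no punctuation, single-spaced."""
    s = s.upper().translate(_PUNCT_TO_SPACE)
    return ' '.join(s.split())  # collapse whitespace runs, trim the ends


def close_matches(word, possibilities, n=3, cutoff=0.6):
    # dict {canonical string: original string}
    # (if two originals share a canonical form, the last one wins)
    lookup = {canonical(p): p for p in possibilities}
    # list [canonical string]
    keys = list(lookup)
    # match on canonical forms, then map back to the original strings
    return [lookup[m] for m in difflib.get_close_matches(canonical(word), keys, n, cutoff)]


names = ['Acacia koa A.Gray var. latifolia (Benth.) H.St.John',
         'Acacia koa A.Gray var. waianaeensis H.St.John',
         'Acacia koaia Hillebr.',
         'Acacia kochii W.Fitzg. ex Ewart & Jean White',
         'Acacia kochii W.Fitzg.']

print(close_matches('acacia kochi w fitzg', names, 1, 0.9))
# → ['Acacia kochii W.Fitzg.']
```

Replacing punctuation with spaces rather than deleting it keeps abbreviations such as `W.Fitzg.` as separate tokens (`W FITZG`), so a query typed without the dots still lines up with the dictionary entry.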


Source: https://habr.com/ru/post/919979/
