Python: find the first occurrence of any substring in a string

Given a string and a list of substrings, I want the position of the first occurrence of any of the substrings in the string. If none of the substrings occurs, return 0. I want to ignore case.

Is there anything more pythonic than:

    given = 'Iamfoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']

    res = len(given)
    for t in targets:
        i = given.lower().find(t)
        if i > -1 and i < res:
            res = i

    if res == len(given):
        result = 0
    else:
        result = res

This code works, but seems inefficient.

5 answers

I would not return 0, since 0 can be a valid starting index. Use -1, None, or some other value that cannot be a real index. You can simply use try/except and return the index:

    def get_ind(s, targ):
        s = s.lower()
        for t in targ:
            try:
                return s.index(t.lower())
            except ValueError:
                pass
        return None  # or -1, False, ...

To ignore case in the input string as well, set s = s.lower() before the loop, as the function above does.

You can also do something like:

    def get_ind_next(s, targ):
        s = s.lower()
        return next((s.index(t) for t in map(str.lower, targ) if t in s), None)

But in the worst case this does two searches for each substring (the `in` test, then `index`), as opposed to one with try/except. It does at least also short-circuit on the first match.
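Both functions above stop at the first target, in list order, that occurs in the string. That is not necessarily the earliest position in the string. A minimal sketch of the difference, using the hypothetical helper names `first_listed_match` and `earliest_match` (not from the original answer):

```python
def first_listed_match(s, targets):
    """Index of the first target (in list order) found in s, or None."""
    s = s.lower()
    for t in targets:
        i = s.find(t.lower())
        if i != -1:
            return i
    return None


def earliest_match(s, targets):
    """Smallest index at which any target occurs in s, or None."""
    s = s.lower()
    hits = [i for i in (s.find(t.lower()) for t in targets) if i != -1]
    return min(hits) if hits else None


# 'grea' is listed first but occurs later in the string than 'foo'.
print(first_listed_match('Iamfoothegreat', ['grea', 'foo']))  # 9
print(earliest_match('Iamfoothegreat', ['grea', 'foo']))      # 3
```

Which behaviour is wanted depends on whether the order of the target list matters.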

If you actually want the minimum index over all targets, change it to:

    def get_ind(s, targ):
        s = s.lower()
        mn = float("inf")
        for t in targ:
            try:
                i = s.index(t.lower())
                if i < mn:
                    mn = i
            except ValueError:
                pass
        return mn

    def get_ind_next(s, targ):
        s = s.lower()
        return min((s.index(t) for t in map(str.lower, targ) if t in s), default=None)

The default argument to min only works in Python >= 3.4, so if you are using Python 2 you have to change the logic a bit.

Python 3 timings:
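One way the logic might be changed so it does not rely on `default=`: collect the indices in a list first and only call `min` when the list is non-empty. This is a sketch; the name `get_ind_min` is hypothetical and not part of the original answer.

```python
def get_ind_min(s, targ):
    """Smallest index of any target in s, ignoring case; None if no match."""
    s = s.lower()
    lowered = [t.lower() for t in targ]
    # Build the list eagerly so we can test for emptiness before calling min().
    hits = [s.index(t) for t in lowered if t in s]
    return min(hits) if hits else None


print(get_ind_min('Iamfoothegreat', ['foo', 'bar', 'grea', 'other']))  # 3
print(get_ind_min('no match here', ['foo', 'bar', 'grea', 'other']))   # None
```

Like `get_ind_next`, this still searches twice for every target that is present.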

    In [29]: s = "hello world" * 5000

    In [30]: s += "grea" + s

    In [25]: %%timeit
       ....: targ = [re.escape(x) for x in targets]
       ....: pattern = r"%(pattern)s" % {'pattern' : "|".join(targ)}
       ....: firstMatch = next(re.finditer(pattern, s, re.IGNORECASE),None)
       ....: if firstMatch:
       ....:     pass
       ....:
    100 loops, best of 3: 5.11 ms per loop

    In [18]: timeit get_ind_next(s, targets)
    1000 loops, best of 3: 691 µs per loop

    In [19]: timeit get_ind(s, targets)
    1000 loops, best of 3: 627 µs per loop

    In [20]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    1000 loops, best of 3: 1.03 ms per loop

    In [21]: s = 'Iamfoothegreat'

    In [22]: targets = ['bar', 'grea', 'other','foo']

    In [23]: get_ind_next(s, targets) == get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    Out[24]: True

Python 2:

    In [13]: s = "hello world" * 5000

    In [14]: s += "grea" + s

    In [15]: targets = ['foo', 'bar', 'grea', 'other']

    In [16]: timeit get_ind(s, targets)
    1000 loops, best of 3: 322 µs per loop

    In [17]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    1000 loops, best of 3: 710 µs per loop

    In [18]: get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    Out[18]: True

You can also combine the first approach with min by turning it into a generator:

    def get_ind(s, targ):
        s, mn = s.lower(), None
        for t in targ:
            try:
                mn = s.index(t.lower())
                yield mn
            except ValueError:
                pass
        yield mn

This does the same job, reads a little nicer, and can be slightly faster:

    In [45]: min(get_ind(s, targets))
    Out[45]: 55000

    In [46]: timeit min(get_ind(s, targets))
    1000 loops, best of 3: 317 µs per loop

Use regex

Another option is simply to use a regex, on the assumption that Python's regex implementation is very fast.

    import re

    given = 'IamFoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']
    # Escape each target so regex metacharacters are matched literally
    targets = [re.escape(x) for x in targets]
    pattern = "|".join(targets)
    firstMatch = next(re.finditer(pattern, given, re.IGNORECASE), None)
    if firstMatch:
        print(firstMatch.start())
        print(firstMatch.group())

Output

    3
    Foo

If nothing is found, nothing is printed. The no-match case (firstMatch is None) has to be handled explicitly.
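A minimal sketch of how the regex approach might be wrapped so that the no-match case is explicit. The function name `first_regex_match` is hypothetical, and `re.search` is used here in place of `next(re.finditer(...))` since only the first match is needed.

```python
import re


def first_regex_match(given, targets):
    """Return (start, matched_text) for the earliest match, or None."""
    if not targets:
        return None  # an empty alternation would match the empty string at 0
    pattern = "|".join(re.escape(t) for t in targets)
    m = re.search(pattern, given, re.IGNORECASE)
    return (m.start(), m.group()) if m else None


print(first_regex_match('IamFoothegreat', ['foo', 'bar', 'grea', 'other']))  # (3, 'Foo')
print(first_regex_match('nothing here', ['foo', 'bar']))                     # None
```

Note that the matched text keeps the casing of the input string, not of the target.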

A more conventional approach, not really Pythonic

It also gives you the matched substring along with its position:

    given = 'Iamfoothegreat'.lower()
    targets = ['foo', 'bar', 'grea', 'other']
    dct = {'pos': -1, 'string': None}
    for t in targets:
        i = given.find(t)
        # Keep the smallest index seen so far (-1 means nothing found yet)
        if i > -1 and (i < dct['pos'] or dct['pos'] == -1):
            dct['pos'] = i
            dct['string'] = t
    print(dct)

Output:

 {'pos': 3, 'string': 'foo'} 

If the item is not found:

 {'pos': -1, 'string': None} 

Performance comparison

With this string and these targets:

    given = "hello world" * 5000
    given += "grea" + given
    targets = ['foo', 'bar', 'grea', 'other']

1000 iterations with timeit:

    regex approach: 4.08629107475 sec for 1000
    normal approach: 1.80048894882 sec for 1000

10 iterations, now with a much larger target list (targets * 1000):

    normal approach: 4.06895017624 for 10
    regex approach: 34.8153910637 for 10
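A comparison like this could be reproduced with a small `timeit` harness along the following lines. This is a sketch, not the code that produced the numbers above: the function names `regex_approach` and `normal_approach` are assumed, and timings will differ between machines and Python versions.

```python
import re
import timeit

given = "hello world" * 5000
given += "grea" + given
targets = ['foo', 'bar', 'grea', 'other']


def regex_approach(given, targets):
    """Earliest match position via one alternation regex, or -1."""
    pattern = "|".join(re.escape(t) for t in targets)
    m = re.search(pattern, given, re.IGNORECASE)
    return m.start() if m else -1


def normal_approach(given, targets):
    """Earliest match position via str.find on a lowercased copy, or -1."""
    lowered = given.lower()
    hits = [i for i in (lowered.find(t) for t in targets) if i > -1]
    return min(hits) if hits else -1


for fn in (regex_approach, normal_approach):
    secs = timeit.timeit(lambda: fn(given, targets), number=20)
    print("%s: %.4f sec for 20" % (fn.__name__, secs))
```

Passing `targets * 1000` instead of `targets` gives the larger-target-list variant.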

You can use the following:

 answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0]) 

Demo 1

    given = 'Iamfoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']
    answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0])
    print(answer)

Output

 3 

Demo 2

    given = 'this is a different string'
    targets = ['foo', 'bar', 'grea', 'other']
    answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0])
    print(answer)

Output

 0 
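With the `or [0]` fallback, a result of 0 is ambiguous: it can mean "matched at index 0" or "no match". A short sketch of the two cases, wrapping the same expression in a hypothetical function `first_index_or_zero`:

```python
def first_index_or_zero(given, targets):
    """Same expression as above, with given lowercased once."""
    low = given.lower()
    return min([low.find(x.lower()) for x in targets if x.lower() in low] or [0])


print(first_index_or_zero('Foo is first', ['foo', 'bar']))   # 0 (real match at index 0)
print(first_index_or_zero('no match here', ['foo', 'bar']))  # 0 (fallback, no match)
```

If the caller needs to tell the two apart, a fallback such as `[-1]` or `[None]` in place of `[0]` would do it.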

I also believe that the following solution is quite readable:

    given = 'the string'
    targets = ('foo', 'bar', 'grea', 'other')
    given = given.lower()
    for i in range(len(given)):
        # str.startswith accepts a tuple of prefixes and a start offset
        if given.startswith(targets, i):
            print(i)
            break
    else:
        print(-1)

Your code is pretty good, but you can make it a little more efficient by moving the .lower() conversion out of the loop: there is no need to repeat it for each target substring. The code can also be condensed slightly with a list comprehension, although that does not necessarily speed it up. I use a nested list comprehension to avoid calling given.find(t) twice for each t.

I wrapped my code in a function for easier testing.

    def min_match(given, targets):
        given = given.lower()
        a = [i for i in [given.find(t) for t in targets] if i > -1]
        return min(a) if a else None

    targets = ['foo', 'bar', 'grea', 'othe']
    data = (
        'Iamfoothegreat',
        'IAMFOOTHEGREAT',
        'Iamfothgrease',
        'Iamfothgret',
    )

    for given in data:
        print(given, min_match(given, targets))

Output

    Iamfoothegreat 3
    IAMFOOTHEGREAT 3
    Iamfothgrease 7
    Iamfothgret None

Try the following:

    def getFirst(given, targets):
        try:
            return min([i for x in targets for i in [given.find(x)] if not i == -1])
        except ValueError:
            return 0

Source: https://habr.com/ru/post/1244447/

