Python: find first string in string

Question


Given a string and a list of substrings, I want the first position at which any of the substrings occurs in the string. If no substring occurs, return 0. I want to ignore case.

Is there anything more pythonic than:

    given = 'Iamfoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']
    res = len(given)
    for t in targets:
        i = given.lower().find(t)
        if i > -1 and i < res:
            res = i
    if res == len(given):
        result = 0
    else:
        result = res

This code works, but seems inefficient.


python string match

foosion Mar 04 '16 at 17:30


5 answers

Padraic cunningham · Answer 1 · 2016-03-04T17:39:53+0000

I would not return 0, since 0 can be a valid starting index. Use -1, None, or some other value that cannot be an index instead. You can simply use try/except and return the index:

    def get_ind(s, targ):
        s = s.lower()
        for t in targ:
            try:
                # index raises ValueError when t is absent
                return s.index(t.lower())
            except ValueError:
                pass
        return None  # or -1, False, ...

If you also want to ignore case in the input string, lowercase it once with s = s.lower() before the loop rather than inside it.

You can also do something like:

    def get_ind_next(s, targ):
        s = s.lower()
        return next((s.index(t) for t in map(str.lower, targ) if t in s), None)

But in the worst case that does two lookups for each substring (the in test and then index), as opposed to one with try/except. It does at least short-circuit on the first match as well.
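One thing worth spelling out: both functions above stop at the first target in list order that occurs at all, which is not necessarily the leftmost occurrence in the string. A small sketch of the difference, where first_listed and leftmost are names made up for this illustration:

```python
def first_listed(s, targ):
    """Index of the first target (in list order) that occurs in s, or None."""
    s = s.lower()
    return next((s.index(t) for t in map(str.lower, targ) if t in s), None)


def leftmost(s, targ):
    """Smallest index at which any target occurs in s, or None."""
    s = s.lower()
    hits = [s.index(t) for t in map(str.lower, targ) if t in s]
    return min(hits) if hits else None


s = 'Iamfoothegreat'
print(first_listed(s, ['grea', 'foo']))  # 9: 'grea' is listed first
print(leftmost(s, ['grea', 'foo']))      # 3: 'foo' starts further left
```

So the short-circuiting versions only answer the original question if the order of the targets happens to match their order in the string.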

If you actually want the minimum index over all targets, change it to:

    def get_ind(s, targ):
        s = s.lower()
        mn = float("inf")
        for t in targ:
            try:
                i = s.index(t.lower())
                if i < mn:
                    mn = i
            except ValueError:
                pass
        return mn

    def get_ind_next(s, targ):
        s = s.lower()
        return min((s.index(t) for t in map(str.lower, targ) if t in s), default=None)

The default=None argument to min only works in Python >= 3.4, so if you are using Python 2 you will have to change the logic a bit.
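For what it's worth, one way the logic could be changed so it does not rely on the default argument is to collect the hits first and guard the empty case by hand. This is only a sketch; get_ind_min is a hypothetical name, and it uses str.find rather than index so each target is searched once:

```python
def get_ind_min(s, targ):
    """Return the smallest index at which any target occurs in s, or None."""
    s = s.lower()
    # str.find returns -1 on a miss, so no try/except is needed
    hits = [i for i in (s.find(t.lower()) for t in targ) if i != -1]
    # min() raises ValueError on an empty sequence, so check explicitly
    return min(hits) if hits else None


print(get_ind_min('Iamfoothegreat', ['bar', 'grea', 'other', 'foo']))  # 3
print(get_ind_min('nothing here', ['foo', 'bar']))                     # None
```

The same code runs unchanged on Python 2 and Python 3.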

Python 3 timings:

    In [29]: s = "hello world" * 5000

    In [30]: s += "grea" + s

    In [25]: %%timeit
       ....: targ = [re.escape(x) for x in targets]
       ....: pattern = r"%(pattern)s" % {'pattern' : "|".join(targ)}
       ....: firstMatch = next(re.finditer(pattern, s, re.IGNORECASE),None)
       ....: if firstMatch:
       ....:     pass
       ....:
    100 loops, best of 3: 5.11 ms per loop

    In [18]: timeit get_ind_next(s, targets)
    1000 loops, best of 3: 691 µs per loop

    In [19]: timeit get_ind(s, targets)
    1000 loops, best of 3: 627 µs per loop

    In [20]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    1000 loops, best of 3: 1.03 ms per loop

    In [21]: s = 'Iamfoothegreat'

    In [22]: targets = ['bar', 'grea', 'other','foo']

    In [23]: get_ind_next(s, targets) == get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    Out[24]: True

Python 2:

    In [13]: s = "hello world" * 5000

    In [14]: s += "grea" + s

    In [15]: targets = ['foo', 'bar', 'grea', 'other']

    In [16]: timeit get_ind(s, targets)
    1000 loops, best of 3: 322 µs per loop

    In [17]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    1000 loops, best of 3: 710 µs per loop

    In [18]: get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
    Out[18]: True

You can also combine the first with min:

    def get_ind(s, targ):
        s, mn = s.lower(), None
        for t in targ:
            try:
                mn = s.index(t.lower())
                yield mn
            except ValueError:
                pass
        yield mn

This does the same job; it is just a bit neater and may be slightly faster:

    In [45]: min(get_ind(s, targets))
    Out[45]: 55000

    In [46]: timeit min(get_ind(s, targets))
    1000 loops, best of 3: 317 µs per loop

Kordi · Answer 2 · 2016-03-04T17:57:26+0000

Use regex

Another option is to just use a regex, on the assumption that Python's regex implementation is very fast. That turned out not to hold for my regex version; see the performance comparison below.

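    import re

    given = 'IamFoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']

    # Escape each target so it is matched literally, then join into one alternation
    targets = [re.escape(x) for x in targets]
    pattern = r"%(pattern)s" % {'pattern' : "|".join(targets)}
    # finditer yields matches left to right; next() takes the first one, or None
    firstMatch = next(re.finditer(pattern, given, re.IGNORECASE), None)
    if firstMatch:
        print(firstMatch.start())
        print(firstMatch.group())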

Output

    3
    Foo

If nothing is found, nothing is printed. In that case firstMatch is None, so checking for "nothing found" should be self-explanatory.
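If you want the regex version as a reusable function, something along these lines might do. It is a sketch under a couple of assumptions: first_match is a made-up name, and it swaps in re.search for the next(re.finditer(...)) idiom, since search also returns the leftmost match or None:

```python
import re


def first_match(given, targets):
    """Return (index, matched_text) for the leftmost target, or None."""
    if not targets:
        return None  # an empty pattern would match at index 0
    # Escape each target so regex metacharacters are matched literally
    pattern = "|".join(re.escape(t) for t in targets)
    m = re.search(pattern, given, re.IGNORECASE)
    return (m.start(), m.group()) if m else None


print(first_match('IamFoothegreat', ['foo', 'bar', 'grea', 'other']))  # (3, 'Foo')
print(first_match('no targets here', ['foo', 'bar']))                  # None
```

Note that group() returns the text as it appears in the input, so with re.IGNORECASE the case of the match can differ from the case of the target.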

A much more conventional approach, not really Pythonic

This also gives you the matched substring along with its position:

    given = 'Iamfoothegreat'.lower()
    targets = ['foo', 'bar', 'grea', 'other']
    dct = {'pos': -1, 'string': None}
    for t in targets:
        i = given.find(t)
        # keep the smallest index seen so far (-1 means nothing found yet)
        if i > -1 and (i < dct['pos'] or dct['pos'] == -1):
            dct['pos'] = i
            dct['string'] = t
    print(dct)

Output:

 {'pos': 3, 'string': 'foo'}

If the item is not found:

 {'pos': -1, 'string': None}

Performance comparison

With this string and these targets:

    given = "hello world" * 5000
    given += "grea" + given
    targets = ['foo', 'bar', 'grea', 'other']

1000 loops with timeit:

    regex approach: 4.08629107475 sec for 1000
    normal approach: 1.80048894882 sec for 1000

10 loops, now with a much larger list of targets (targets * 1000):

    normal approach: 4.06895017624 for 10
    regex approach: 34.8153910637 for 10
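The harness used for these numbers is not shown, so here is a rough sketch of how such a comparison could be run with the timeit module. The function names regex_approach and normal_approach are assumptions made for this example, the loop count is kept small, and the timings will of course differ by machine and Python version:

```python
import re
import timeit

given = "hello world" * 5000
given += "grea" + given
targets = ['foo', 'bar', 'grea', 'other']


def regex_approach():
    # One alternation pattern, searched case-insensitively
    pattern = "|".join(re.escape(t) for t in targets)
    m = re.search(pattern, given, re.IGNORECASE)
    return m.start() if m else -1


def normal_approach():
    # Lowercase once, then one find() per target
    s = given.lower()
    hits = [i for i in (s.find(t) for t in targets) if i > -1]
    return min(hits) if hits else -1


# Both approaches should agree before their speed is compared
assert regex_approach() == normal_approach() == 55000

for fn in (regex_approach, normal_approach):
    print(fn.__name__, timeit.timeit(fn, number=20))
```

To try the larger case, replace targets with targets * 1000 and lower the loop count further.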

gtlambert · Answer 3 · 2016-03-04T17:45:22+0000

You can use the following:

 answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0])

Demo 1

    given = 'Iamfoothegreat'
    targets = ['foo', 'bar', 'grea', 'other']
    answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0])
    print(answer)

Output

    3

Demo 2

    given = 'this is a different string'
    targets = ['foo', 'bar', 'grea', 'other']
    answer = min([given.lower().find(x.lower()) for x in targets if x.lower() in given.lower()] or [0])
    print(answer)

Output

    0

I also believe that the following solution is quite readable:

    given = 'the string'
    targets = ('foo', 'bar', 'grea', 'other')
    given = given.lower()
    for i in range(len(given)):
        # startswith accepts a tuple of prefixes and a start offset
        if given.startswith(targets, i):
            print(i)
            break
    else:
        print(-1)

PM 2Ring · Answer 4 · 2016-03-04T18:06:59+0000

Your code is pretty good, but you can make it a little more efficient by moving the .lower() conversion out of the loop: there is no need to repeat it for each target substring. The code can also be condensed a little with list comprehensions, although that does not necessarily make it faster. I use a nested list comprehension to avoid calling given.find(t) twice for each t.

I wrapped my code in a function for easier testing.

    def min_match(given, targets):
        given = given.lower()
        a = [i for i in [given.find(t) for t in targets] if i > -1]
        return min(a) if a else None

    targets = ['foo', 'bar', 'grea', 'othe']
    data = (
        'Iamfoothegreat',
        'IAMFOOTHEGREAT',
        'Iamfothgrease',
        'Iamfothgret',
    )

    for given in data:
        print(given, min_match(given, targets))

Output

    Iamfoothegreat 3
    IAMFOOTHEGREAT 3
    Iamfothgrease 7
    Iamfothgret None

AMACB · Answer 5 · 2016-03-04T17:34:54+0000

Try the following:

    def getFirst(given, targets):
        try:
            return min([i for x in targets for i in [given.find(x)] if not i == -1])
        except ValueError:
            return 0
