How do I find a currency value in a string?

I am writing a small tool to extract a handful of values from a string (usually a tweet).

The string can contain words and numbers, along with an amount prefixed by a currency symbol (£, $, €, etc.) and a number of hashtags (#foo #bar). The app runs on App Engine and uses tweepy to pull in the tweets.

My current code for finding these values is below:

import logging
import re

tagex = re.compile(r'#.*')
curex = re.compile(ur'[£].*')
for x in api.user_timeline(since_id=t.lastimport):
    tags = re.findall(tagex, x.text)
    amount = re.findall(curex, x.text)[0]
    logging.info("Text: " + x.text)
    logging.info("Tags: " + str(tags))
    logging.info("Amount: " + amount)

where x.text is, for example, "Taxi London £6.50 #projectfoo #clientmeeting"

tagex finds the hashtags fine, but I can't get curex to extract just the amount. I get: Amount: £6.50 #projectfoo #clientmeeting.

I also need to strip the currency symbol so I can get the amount as a float, but that should be simple enough once the match works.

>>> re.search(ur'([£$€])(\d+(?:\.\d{2})?)', s).groups()
(u'\xa3', u'6.50')
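  • [£$€] matches a single currency symbol
  • \d+(?:\.\d{2})? matches one or more digits, optionally followed by a decimal point and exactly two digits
  • () capture the symbol and the amount as separate groups

The problem with your regex is that .* matches any run of characters, so it consumes everything up to the end of the string.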
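For readers on Python 3, where the ur'' prefix is no longer valid, the same idea might look roughly like the sketch below. The names AMOUNT_RE, HASHTAG_RE and parse_tweet are made up for illustration, the \s* that tolerates a space after the symbol is an added assumption, and #\w+ is one possible way to get each hashtag as its own item rather than the approach used in the question.

```python
import re

# Python 3 sketch: str patterns are already Unicode, so no ur'' prefix is needed.
# Assumption: \s* allows an optional space between the symbol and the digits.
AMOUNT_RE = re.compile(r'([£$€])\s*(\d+(?:\.\d{2})?)')
# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r'#\w+')


def parse_tweet(text):
    """Return (symbol, amount, tags); symbol and amount are None if no amount is found."""
    tags = HASHTAG_RE.findall(text)
    match = AMOUNT_RE.search(text)
    if match is None:
        return None, None, tags
    symbol, digits = match.groups()
    return symbol, float(digits), tags


print(parse_tweet("Taxi London £6.50 #projectfoo #clientmeeting"))
# ('£', 6.5, ['#projectfoo', '#clientmeeting'])
```

If the amounts are going to be summed or compared exactly, decimal.Decimal(digits) could be swapped in for float(digits).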

Marcog's pattern, with the $ escaped:

    re.search(ur'([£\$€])(\d+(?:\.\d{2})?)', s).groups()
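Inside a character class, $ has no special meaning, so the escaped and unescaped forms of the pattern should behave identically. A quick Python 3 check along these lines can confirm it; the sample strings and the helper name groups are made up for illustration.

```python
import re

plain = re.compile(r'([£$€])(\d+(?:\.\d{2})?)')
escaped = re.compile(r'([£\$€])(\d+(?:\.\d{2})?)')

# Hypothetical sample inputs, including one with no amount at all.
samples = [
    "Taxi London £6.50 #projectfoo",
    "Coffee $3 #break",
    "Train €12.40",
    "no amount here",
]


def groups(pattern, text):
    """Return the captured (symbol, amount) pair, or None when nothing matches."""
    match = pattern.search(text)
    return match.groups() if match else None


for text in samples:
    # Both patterns should capture exactly the same thing for every sample.
    assert groups(plain, text) == groups(escaped, text)
    print(text, '->', groups(plain, text))
```

Escaping the $ is therefore a matter of style here: it does no harm and may make the intent clearer to someone skimming the pattern.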

Source: https://habr.com/ru/post/1789228/

