How do I parse a number (integer or float) out of a string containing text in JavaScript?

I am trying to create a JavaScript function that can parse a sentence and return a number.

Here is a jsFiddle I set up with the test cases below:

1. "I have 1 pound" → 1
2. "I have £ 3.50 to spend" → 3.50
3. "I have 23.00 pounds" → 23
4. '£ 27.33' → 27.33
5. '$ 4345.85' → 4345.85
6. '3.00' → 3
7. '7.0' → 7
8. 'Must have 2.0.' → 2
9. "Must be 15.20." → 15.20
10. '3.15' → 3.15
11. "I have only 5, not very good." → 5
12. '34.23' → 34.23
13. 'sdfg545.14sdfg' → 545.14
14. "Yesterday I spent $ 235,468.13. Today I want to spend less." → 235468.13
15. "Yesterday I spent 340pounds." → 340
16. "Today I spent £ 14.52, tomorrow - £ 17.30" → 14.52
17. "I have 0 trees, £ 11.33 tomorrow" → 0

Cases 16 and 17 show that the function must find the first number. I realize some of these test cases may be tough, but I welcome anything that gives me reasonable coverage.

Here is the signature I am using for my function:

    function parseSentenceForNumber(sentence){
        return number; //The number from the string
    }
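One way to keep the test cases above (including the ones added in the updates below) runnable is to store them as data and check any candidate function against them. This is a minimal sketch, not the original fiddle; `CASES` and `failingCases` are hypothetical names.

```javascript
// The question's test cases as data: [input, expected result].
// CASES and failingCases are hypothetical names used only for illustration.
const CASES = [
  ["I have 1 pound", 1],
  ["I have £ 3.50 to spend", 3.50],
  ["I have 23.00 pounds", 23],
  ["£ 27.33", 27.33],
  ["$ 4345.85", 4345.85],
  ["3.00", 3],
  ["7.0", 7],
  ["Must have 2.0.", 2],
  ["Must be 15.20.", 15.20],
  ["3.15", 3.15],
  ["I have only 5, not very good.", 5],
  ["34.23", 34.23],
  ["sdfg545.14sdfg", 545.14],
  ["Yesterday I spent $ 235,468.13. Today I want to spend less.", 235468.13],
  ["Yesterday I spent 340pounds.", 340],
  ["Today I spent £ 14.52, tomorrow - £ 17.30", 14.52],
  ["I have 0 trees, £ 11.33 tomorrow", 0],
  ["I have 1,000 pounds", 1000],
  [".5", 0.5],
  ["This sentence contains no numbers", null],
];

// Returns the inputs for which fn(input) is not strictly equal to the expected value.
function failingCases(fn) {
  return CASES
    .filter(([input, expected]) => fn(input) !== expected)
    .map(([input]) => input);
}
```

For example, `failingCases(() => null)` returns 19 inputs, since only the no-numbers case expects `null`. Because the check uses strict equality (`!==`), a function that returns the matched string, such as "3.50", is reported as failing; a loose comparison (`==`) would accept it.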

I think I can get 60-80% of the way there myself, but I suspect a regular expression may be the best solution here, and I have never been great with those. I hope I have enough test cases, but feel free to add any that are missing.

Your help is greatly appreciated.

** UPDATE **

Lots of working answers, and I need to spend some time looking at them in more detail. Mike Samuel mentioned commas and .5, which forces me to add a couple more test cases:

18. "I have 1,000 pounds" → 1000
19. '.5' → 0.5

And jsalonen suggested adding a test case with no numbers:

20. 'This sentence contains no numbers' → null

Here is the updated fiddle using jsalonen's solution. Without my changes to the specification I would be 100% there; with the changes I am at 95%. Can anyone suggest a solution for number 18, the one with the comma?

** UPDATE **

I added an expression to jsalonen's function that strips out the commas, and I am at 100%.

Here is the final function:

    function parseSentenceForNumber(sentence){
        var matches = sentence.replace(/,/g, '').match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
        return matches && matches[0] || null;
    }

And the final fiddle
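The final function returns the matched text (for example the string "23.00"), which equals 23 only under a loose (`==`) comparison. If an actual number is wanted, a small variant could convert the match with `Number()`. This is a sketch; the name `parseSentenceToNumber` is hypothetical.

```javascript
// Hypothetical variant of the final function that returns a number
// (or null) instead of the matched substring.
function parseSentenceToNumber(sentence) {
  // Remove every comma so "235,468.13" becomes "235468.13".
  const matches = sentence
    .replace(/,/g, "")
    .match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
  // Number("23.00") is 23 and Number(".5") is 0.5.
  return matches ? Number(matches[0]) : null;
}

console.log(parseSentenceToNumber("I have 23.00 pounds"));               // 23
console.log(parseSentenceToNumber("I have 1,000 pounds"));               // 1000
console.log(parseSentenceToNumber(".5"));                                // 0.5
console.log(parseSentenceToNumber("This sentence contains no numbers")); // null
```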

I really appreciate the help, and I improved my knowledge of regular expressions along the way. Thanks.

6 answers

An answer that matches all negative and positive numbers with any number of digits:

    function parseSentenceForNumber(sentence){
        var matches = sentence.match(/(\+|-)?((\d+(\.\d+)?)|(\.\d+))/);
        return matches && matches[0] || null;
    }
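Applied directly, this pattern shows why the comma case (number 18 in the question) needed extra work: the match stops at the first comma. It also picks up a leading sign and a bare fraction. A small sketch; `firstMatch` is a hypothetical helper name.

```javascript
// The pattern from this answer: optional sign, then either digits with an
// optional fraction, or a bare fraction such as ".5".
const PATTERN = /(\+|-)?((\d+(\.\d+)?)|(\.\d+))/;

// Hypothetical helper: return the first matching substring, or null.
function firstMatch(sentence) {
  const m = sentence.match(PATTERN);
  return m ? m[0] : null;
}

firstMatch("I have 1,000 pounds"); // returns "1": the match stops at the comma
firstMatch("It was -4.5 degrees"); // returns "-4.5"
firstMatch(".5");                  // returns ".5"
firstMatch("no digits here");      // returns null
```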

Consider also adding negative test cases, for example testing what happens when a string contains no numbers:

    test("Test parseSentenceForNumber('This sentence contains no numbers')", function() {
        equal( parseSentenceForNumber('This sentence contains no numbers'), null );
    });

Full fiddle: http://jsfiddle.net/cvw8g/6/


This regular expression should do it:

 \d+(?:\.\d+)? 

  • \d+ matches a sequence of digits.
  • \.\d+ matches the decimal point followed by digits.
  • (?:...)? makes this group optional
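A few sample inputs show what this pattern does and does not cover. This sketch is not from the answer; `firstNumberText` is a hypothetical helper name.

```javascript
// The pattern from this answer: digits, optionally followed by "." and more digits.
const NUMBER_TEXT = /\d+(?:\.\d+)?/;

// Hypothetical helper: return the first matching substring, or null if none.
function firstNumberText(sentence) {
  const m = sentence.match(NUMBER_TEXT);
  return m ? m[0] : null;
}

firstNumberText("sdfg545.14sdfg"); // returns "545.14"
firstNumberText("Must have 2.0."); // returns "2.0": the final period has no digits after it, so it is left out
firstNumberText(".5");             // returns "5": there is no alternative for a bare fraction
firstNumberText("$ 235,468.13");   // returns "235": the comma ends the match
```

The last two results correspond to the cases the question later handled with a leading-dot alternative and a comma-stripping step.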

This does not handle the special case where the fractional part is zero and you do not want it included in the result; that would be hard to do with a regular expression (I am not sure it can be done at all, though I am happy to be proven wrong). It should be easier to handle after matching the number, decimal point included.

After matching the number in the string, use parseFloat() to convert it to a number, and toFixed(2) to get 2 decimal places.
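Converting the match with `parseFloat()` also takes care of the zero-fraction case mentioned above, because the resulting number has no trailing zeros; `toFixed(2)` goes the other way and produces a string with exactly two decimals. A minimal sketch; `firstNumber` is a hypothetical name.

```javascript
// Hypothetical helper: match the first number and convert it with parseFloat().
function firstNumber(sentence) {
  const m = sentence.match(/\d+(?:\.\d+)?/);
  return m ? parseFloat(m[0]) : null;
}

console.log(firstNumber("I have 23.00 pounds"));               // 23 (the ".00" disappears)
console.log(firstNumber("Must be 15.20."));                    // 15.2
console.log(firstNumber("I have £ 3.50 to spend").toFixed(2)); // 3.50 (toFixed returns the string "3.50")
console.log(firstNumber("No numbers here"));                   // null
```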


The general form of a machine-readable number is:

 /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/ 

based on the grammar

    number            := optional_sign (integer optional_fraction | fraction) optional_exponent;
    optional_sign     := '+' | '-' | ε;
    integer           := decimal_digit optional_integer;
    optional_integer  := integer | ε;
    optional_fraction := '.' optional_integer | ε;
    fraction          := '.' integer;
    optional_exponent := ('e' | 'E') optional_sign integer | ε;

so you can do

    function parseSentenceForNumber(sentence){
        var match = sentence.match(
            /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/);
        return match ? +match[0] : null; //The number from the string
    }
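Because the match is converted with the unary `+`, the grammar's sign and exponent parts carry through to the numeric result. A few illustrative calls with the same regular expression; `firstMachineNumber` is a hypothetical name chosen so it does not clash with the function above.

```javascript
// Same regular expression as this answer.
const MACHINE_NUMBER = /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/;

// Hypothetical helper: first machine-readable number in the text, as a number, or null.
function firstMachineNumber(text) {
  const match = text.match(MACHINE_NUMBER);
  return match ? +match[0] : null;
}

console.log(firstMachineNumber("It was -40 degrees"));                  // -40
console.log(firstMachineNumber("Avogadro's number is about 6.022e23")); // 6.022e+23
console.log(firstMachineNumber("Must have 2."));                        // 2 ("2." matches because fraction digits are optional)
console.log(firstMachineNumber("Sold out"));                            // null
```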

but it does not handle:

  • Locales that use decimal separators other than '.', as in "π ≈ 3,14159..."
  • Commas used to separate groups of digits, as in 1,000,000
  • Fractions
  • Percentages
  • Natural-language descriptions, such as "a dozen" or "15 million pounds"
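Running the same expression over a few phrases makes these limitations concrete. A sketch; `firstMatch` is a hypothetical helper, and the sample phrases are made up for illustration.

```javascript
// Same regular expression as this answer.
const MACHINE_NUMBER = /[+\-]?((?:[1-9]\d*|0)(?:\.\d*)?|\.\d+)([eE][+-]?\d+)?/;

// Hypothetical helper: first matching substring, or null.
function firstMatch(text) {
  const m = text.match(MACHINE_NUMBER);
  return m ? m[0] : null;
}

firstMatch("Population: 1,000,000"); // returns "1": digit-group commas end the match
firstMatch("π ≈ 3,14159");           // returns "3": a comma used as the decimal separator
firstMatch("a 15% discount");        // returns "15": the percent sign is ignored
firstMatch("a dozen eggs");          // returns null: there are no digits at all

// Stripping commas first, as the question's final function does, fixes the
// digit-group case but silently changes the decimal-comma case:
firstMatch("1,000,000".replace(/,/g, "")); // returns "1000000"
firstMatch("3,14159".replace(/,/g, ""));   // returns "314159"
```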

To handle these cases, you might search for "entity extraction", since that is the field concerned with finding phrases that specify structured data within unstructured text.


Another possible regex:

 /\d+\.?\d{0,2}/ 

It means:

  • \d+ : one or more digits
  • \.? : zero or one period
  • \d{0,2} : up to two digits

http://jsfiddle.net/cvw8g/7/
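A few sample inputs show how the `{0,2}` limit behaves. A sketch; `firstAmountText` is a hypothetical helper name.

```javascript
// The pattern from this answer: digits, an optional period, then at most two digits.
const AMOUNT = /\d+\.?\d{0,2}/;

// Hypothetical helper: first matching substring, or null.
function firstAmountText(sentence) {
  const m = sentence.match(AMOUNT);
  return m ? m[0] : null;
}

firstAmountText("sdfg545.14sdfg"); // returns "545.14"
firstAmountText("Must have 2.0."); // returns "2.0"
firstAmountText("pi is 3.14159");  // returns "3.14": digits after the second decimal place are cut off
firstAmountText("Must have 2.");   // returns "2.": the period is kept even with no digits after it
firstAmountText(".5");             // returns "5": a leading period is not matched
```

Cutting the fraction at two digits suits currency amounts like the ones in the question, but it silently truncates other numbers.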


This one uses no regular expression; it relies on parsing instead (so it returns NaN if it does not find a number).
It finds the first digit in the string, then parses the number from that point.

It passes all your tests and returns a number rather than a string, so you can use it directly for comparisons or arithmetic.

    function parseSentenceForNumber(str) {
        // tacked on to support the new "1,000" -> 1000 case;
        // split/join removes every comma, whereas replace(',', '') would remove only the first
        str = str.split(',').join('');
        var index;
        // find the first digit
        for (index = 0; index < str.length; ++index) {
            if (str.charAt(index) >= '0' && str.charAt(index) <= '9')
                break;
        }
        // check for a minus sign or decimal point just before it (for '-5' or '.5')
        if (index > 0 && (str.charAt(index - 1) == '-' || str.charAt(index - 1) == '.'))
            --index; // go back one character
        // parse the rest of the string with the native parseFloat,
        // which stops at the first character that cannot be part of a number
        return parseFloat(str.substring(index));
    }
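This answer relies on the built-in `parseFloat()` reading the longest numeric prefix of a string and ignoring whatever follows, while returning NaN when the string (after leading whitespace) does not start with a number. That is why the function first scans forward to the first digit. A few calls illustrating this built-in behavior:

```javascript
// parseFloat() parses the longest numeric prefix and ignores the rest of the string.
console.log(parseFloat("545.14sdfg"));        // 545.14
console.log(parseFloat("2.0."));              // 2
console.log(parseFloat("5, not very good.")); // 5
console.log(parseFloat(".5"));                // 0.5
console.log(parseFloat("-3.5 degrees"));      // -3.5
// ...but the number has to come first, hence the scan to the first digit.
console.log(parseFloat("sdfg545.14"));        // NaN
```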

All tests pass and I think this is a lot more readable:

    function parseSentenceForNumber(sentence){
        return parseFloat(sentence.replace(/,(?=\d)/g, "").match(/-?\.?\d.*/g));
    }

...well, almost all tests pass: it returns NaN instead of null when there is no number in the sentence. But I think NaN is more informative than a plain null.

Here is jsFiddle: http://jsfiddle.net/55AXf/
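If `null` is preferred, as in the question's test 20, one option is to check the result of `match()` before calling `parseFloat()`. This is a sketch built on the same two regular expressions; the name `parseSentenceForNumberOrNull` is hypothetical, and it uses a non-global `match()` and its first element rather than passing the whole result to `parseFloat()`.

```javascript
// Hypothetical variant of this answer that returns null instead of NaN
// when the sentence contains no number.
function parseSentenceForNumberOrNull(sentence) {
  // Drop commas that sit directly before a digit ("235,468.13" -> "235468.13").
  const cleaned = sentence.replace(/,(?=\d)/g, "");
  // Everything from the first digit onward, including an optional "-" and/or "." just before it.
  const match = cleaned.match(/-?\.?\d.*/);
  // parseFloat reads the leading number and ignores the trailing text.
  return match ? parseFloat(match[0]) : null;
}

console.log(parseSentenceForNumberOrNull("Today I spent £ 14.52, tomorrow - £ 17.30")); // 14.52
console.log(parseSentenceForNumberOrNull("I have 1,000 pounds"));                      // 1000
console.log(parseSentenceForNumberOrNull("This sentence contains no numbers"));        // null
```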


Source: https://habr.com/ru/post/1493711/

