Parse numbers in FParsec

I started learning FParsec. It has a very flexible way of analyzing numbers; I can provide a set of number formats that I want to use:

type Number = | Numeral of int | Decimal of float | Hexadecimal of int | Binary of int let numberFormat = NumberLiteralOptions.AllowFraction ||| NumberLiteralOptions.AllowHexadecimal ||| NumberLiteralOptions.AllowBinary let pnumber = numberLiteral numberFormat "number" |>> fun num -> if num.IsHexadecimal then Hexadecimal (int num.String) elif num.IsBinary then Binary (int num.String) elif num.IsInteger then Numeral (int num.String) else Decimal (float num.String) 

However, the language I'm trying to make out is a bit strange. A number can be a number (non-negative int ), decimal (non-negative float ), hexadecimal (with the prefix #x ) or binary (with the prefix #b ):

 numeral: 0, 2 decimal: 0.2, 2.0 hexadecimal: #xA04, #x611ff binary: #b100, #b001 

Right now I have to do the parsing twice, replacing # with 0 (if necessary) in order to use pnumber :

 let number: Parser<_, unit> = let isDotOrDigit c = isDigit c || c = '.' let numOrDec = many1Satisfy2 isDigit isDotOrDigit let hexOrBin = skipChar '#' >>. manyChars (letter <|> digit) |>> sprintf "0%s" let str = spaces >>. numOrDec <|> hexOrBin str |>> fun s -> match run pnumber s with | Success(result, _, _) -> result | Failure(errorMsg, _, _) -> failwith errorMsg 

What is the best way to parse in this case? Or how can I modify FParsec CharStream to facilitate the conditional parsing process?

+6
source share
1 answer

Parsing numbers can be pretty messy if you want to generate good error messages and check for overflow correctly.

The following is a simple FParsec implementation of your number parser:

 let numeralOrDecimal : Parser<_, unit> = // note: doesn't parse a float exponent suffix numberLiteral NumberLiteralOptions.AllowFraction "number" |>> fun num -> // raises an exception on overflow if num.IsInteger then Numeral(int num.String) else Decimal(float num.String) let hexNumber = pstring "#x" >>. many1SatisfyL isHex "hex digit" |>> fun hexStr -> // raises an exception on overflow Hexadecimal(System.Convert.ToInt32(hexStr, 16)) let binaryNumber = pstring "#b" >>. many1SatisfyL (fun c -> c = '0' || c = '1') "binary digit" |>> fun hexStr -> // raises an exception on overflow Binary(System.Convert.ToInt32(hexStr, 2)) let number = choiceL [numeralOrDecimal hexNumber binaryNumber] "number literal" 

Generating good error messages on overflow can complicate this implementation a bit, because ideally you will also need to backtrack after the error, so the error position ends at the beginning of the number literal (see numberLiteral for an example).

An easy way to handle overflow exception gracefully is to use a small exception handling combinator, for example:

 let mayThrow (p: Parser<'t,'u>) : Parser<'t,'u> = fun stream -> let state = stream.State try p stream with e -> // catching all exceptions is somewhat dangerous stream.BacktrackTo(state) Reply(FatalError, messageError e.Message) 

Then you can write

 let number = mayThrow (choiceL [...] "number literal") 

I'm not sure what you wanted to say with โ€œalter FParsec CharStream to simplify the conditional parsing process,โ€ but the following example demonstrates how you could write a low-level implementation that uses only CharStream methods.

 type NumberStyles = System.Globalization.NumberStyles let invariantCulture = System.Globalization.CultureInfo.InvariantCulture let number: Parser<Number, unit> = let expectedNumber = expected "number" let inline isBinary c = c = '0' || c = '1' let inline hex2int c = (int c &&& 15) + (int c >>> 6)*9 let hexStringToInt (str: string) = // does no argument or overflow checking let mutable n = 0 for c in str do n <- n*16 + hex2int c n let binStringToInt (str: string) = // does no argument or overflow checking let mutable n = 0 for c in str do n <- n*2 + (int c - int '0') n let findIndexOfFirstNonNull (str: string) = let mutable i = 0 while i < str.Length && str.[i] = '0' do i <- i + 1 i let isHexFun = id isHex // tricks the compiler into caching the function object let isDigitFun = id isDigit let isBinaryFun = id isBinary fun stream -> let start = stream.IndexToken let cs = stream.Peek2() match cs.Char0, cs.Char1 with | '#', 'x' -> stream.Skip(2) let str = stream.ReadCharsOrNewlinesWhile(isHexFun, false) if str.Length <> 0 then let i = findIndexOfFirstNonNull str let length = str.Length - i if length < 8 || (length = 8 && str.[i] <= '7') then Reply(Hexadecimal(hexStringToInt str)) else stream.Seek(start) Reply(Error, messageError "hex number literal is too large for 32-bit int") else Reply(Error, expected "hex digit") | '#', 'b' -> stream.Skip(2) let str = stream.ReadCharsOrNewlinesWhile(isBinaryFun, false) if str.Length <> 0 then let i = findIndexOfFirstNonNull str let length = str.Length - i if length < 32 then Reply(Binary(binStringToInt str)) else stream.Seek(start) Reply(Error, messageError "binary number literal is too large for 32-bit int") else Reply(Error, expected "binary digit") | c, _ -> if not (isDigit c) then Reply(Error, expectedNumber) else stream.SkipCharsOrNewlinesWhile(isDigitFun) |> ignore if stream.Skip('.') then let n2 = stream.SkipCharsOrNewlinesWhile(isDigitFun) if n2 <> 0 then // we don't parse any exponent, as in the other example let mutable result = 0. if System.Double.TryParse(stream.ReadFrom(start), NumberStyles.AllowDecimalPoint, invariantCulture, &result) then Reply(Decimal(result)) else stream.Seek(start) Reply(Error, messageError "decimal literal is larger than System.Double.MaxValue") else Reply(Error, expected "digit") else let decimalString = stream.ReadFrom(start) let mutable result = 0 if System.Int32.TryParse(stream.ReadFrom(start), NumberStyles.None, invariantCulture, &result) then Reply(Numeral(result)) else stream.Seek(start) Reply(Error, messageError "decimal number literal is too large for 32-bit int") 

While this implementation parses hexadecimal and binary numbers without the aid of system methods, it ultimately delegates decimal analysis to Int32.TryParse and Double.TryParse.

As I said: it's dirty.

+9
source

Source: https://habr.com/ru/post/907762/


All Articles