Characters extracted by istream >> double

Sample code on Coliru:

#include <iostream>
#include <sstream>
#include <string>

int main()
{
    double d;
    std::string s;
    std::istringstream iss("234cdefipxngh");
    iss >> d;
    iss.clear();
    iss >> s;
    std::cout << d << ", '" << s << "'\n";
}

I am reading N3337 (presumably this is the same as C++11). In [istream.formatted.arithmetic] we have (paraphrased):

operator>>(double& val);

As in the case of the inserters, these extractors depend on the locale's num_get<> (22.4.2.1) object to parse the input stream data. These extractors behave as formatted input functions (as described in 27.7.2.2.1). After a sentry object is constructed, the conversion occurs as if performed by the following code fragment:

typedef num_get< charT,istreambuf_iterator<charT,traits> > numget;
iostate err = iostate::goodbit;
use_facet< numget >(loc).get(*this, 0, *this, err, val);
setstate(err);

Looking at 22.4.2.1:

The details of this operation occur in three stages.
- Stage 1: Determine a conversion specifier
- Stage 2: Extract characters from in and determine a corresponding char value for the format expected by the conversion specification determined in stage 1.
- Stage 3: Store results

The description of stage 2 is too long for me to paste in full here. However, it clearly says that all characters should be extracted before conversion is attempted; and, further, that exactly the following characters should be extracted:

  • any of 0123456789abcdefxABCDEFX+-
  • the locale's decimal_point()
  • the locale's thousands_sep()

Finally, the rules for stage 3 include:

- For a floating-point value, the function strtold.

The numeric value to be stored can be one of:

- zero, if the conversion function fails to convert the entire field.

All of this seems to clearly specify that the output of my code should be 0, 'ipxngh'. However, it actually outputs something else.

Is this a compiler/library bug? Is there some provision I am overlooking that lets a locale change the behaviour of stage 2? (In another question, someone posted an example of a system that actually does extract the characters, but also extracts ipxn, which are not in the list specified in N3337.)

Update

As perreal pointed out, this text from stage 2 is relevant:

If discard is true, then if '.' has not yet been accumulated, then the position of the character is remembered, but the character is otherwise ignored. Otherwise, if '.' has already been accumulated, the character is discarded and stage 2 terminates. If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so, it is accumulated.

If the character is either discarded or accumulated then in is advanced by ++in and processing returns to the beginning of stage 2.

So, stage 2 can terminate if the character is in the list of allowed characters but is not a valid character for %g. It doesn't say exactly, but presumably this refers to the definition of fscanf from C99, which allows:

  • a nonempty sequence of decimal digits optionally containing a decimal-point character, then an optional exponent part as defined in 6.4.4.2;
  • a 0x or 0X, then a nonempty sequence of hexadecimal digits optionally containing a decimal-point character, then an optional binary exponent part as defined in 6.4.4.2;
  • INF or INFINITY, ignoring case
  • NAN or NAN(n-char-sequence opt), ignoring case in the NAN part, where:

and

In other than the "C" locale, additional locale-specific subject sequence forms may be accepted.

So, actually, the Coliru output is correct; and in fact the processing must check, while extracting each character, that the sequence extracted so far is still valid input for %g.

Next question: is it permitted, as in the thread I linked to earlier, to accept i, n, p, etc. in stage 2?

These are valid characters for %g, however they are not in the list of atoms that stage 2 is allowed to read (i.e. c == 0 for my latest quote, so the character is neither discarded nor accumulated).

2 answers

This is a mess because it's likely that neither gcc/libstdc++'s nor clang/libc++'s implementation is conforming. It's unclear what "a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1" means, but I think that the use of the phrase "next character" indicates that the check should be context-sensitive (i.e., dependent on the characters that have already been accumulated), and so an attempt to parse, e.g., "21abc", should stop when 'a' is encountered. This is consistent with the discussion in LWG issue 2041, which added this sentence back to the standard after it had been deleted during the drafting of C++11. libc++'s failure to do so is bug 17782.

libstdc++, on the other hand, refuses to parse "0xABp-4" past the 0, which is clearly nonconforming based on the standard (it should parse "0xAB" as a hexfloat, as explicitly allowed by the C99 fscanf specification for %g).

Accepting i, p, and n is not allowed by the standard. See LWG issue 2381.

The standard describes the processing very precisely: it must be done "as if" by the specified code fragment, which does not accept those characters. Compare the resolution of LWG issue 221, in which they added x and X to the list of characters because num_get as then described would not otherwise parse 0x for integer inputs.

Clang/libc++ accepts "inf" and "nan" along with hexfloats, but not "infinity", as an extension. See bug 19611.


At the end of stage 2, it says:

If it is not discarded, then a check is made to determine if c is allowed as the next character of an input field of the conversion specifier returned by stage 1. If so, it is accumulated.

If the character is either discarded or accumulated then in is advanced by ++in and processing returns to the beginning of stage 2.

So, perhaps a is not allowed by the %g specifier, and it is therefore neither accumulated nor ignored.


Source: https://habr.com/ru/post/972061/
