Iterator invalidation with `std::string::begin()` / `std::string::end()`?

    #include <string>
    #include <iostream>

    int main() {
        std::string s = "abcdef";
        std::string s2 = s;
        auto begin = const_cast<std::string const &>(s2).begin();
        auto end = s2.end();
        std::cout << end - begin << '\n';
    }

This code mixes the result of begin() const with the result of the non-const end(). Neither of these functions is permitted to invalidate any iterators. However, I am curious whether the requirement that end() not invalidate the iterator variable begin actually means that begin can be used together with end.

Consider a C++98, copy-on-write implementation of std::string: the non-const begin() and end() functions cause the internal buffer to be copied, because the result of these functions can be used to modify the string. So begin above starts out valid for both s and s2, but the use of the non-const end() member means it is no longer valid for s2, the container that produced it.
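To make the mechanism concrete, here is a minimal sketch of a copy-on-write string. The class `CowString` and the function `begin_survives_nonconst_end` are hypothetical names invented for this illustration; this is not how libstdc++ or any other real library is written, only a toy under the assumption that const access shares the buffer and non-const access unshares it.

```cpp
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string (hypothetical, for illustration only).
class CowString {
public:
    explicit CowString(const char* text)
        : buf_(std::make_shared<std::string>(text)) {}

    // Const access never modifies anything, so the buffer stays shared.
    const char* begin() const { return buf_->data(); }
    const char* end() const { return buf_->data() + buf_->size(); }

    // Non-const access could be used to write, so take a private copy first.
    char* begin() { unshare(); return &(*buf_)[0]; }
    char* end() { unshare(); return &(*buf_)[0] + buf_->size(); }

    std::size_t size() const { return buf_->size(); }

    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    void unshare() {
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);
        }
    }

    std::shared_ptr<std::string> buf_;
};

// Mirrors the question: take begin() through a const view, then call the
// non-const end(), and report whether the old pointer still refers to the
// start of s2's own buffer.
bool begin_survives_nonconst_end() {
    CowString s("abcdef");
    CowString s2 = s;  // s and s2 share one buffer here
    const char* first = static_cast<const CowString&>(s2).begin();
    s2.end();          // non-const call: s2 gets its own buffer
    return first == static_cast<const CowString&>(s2).begin();
}
```

In this toy, `begin_survives_nonconst_end()` returns false: the pointer taken before the non-const call still points into the buffer that `s` owns, while `s2` has moved to a fresh copy.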

The above code produces "unexpected" results with a copy-on-write implementation, such as libstdc++. Instead of end - begin being equal to s2.size(), libstdc++ produces a different number.

  • Does causing begin to no longer be a valid iterator into s2, the container it was retrieved from, constitute "invalidation" of the iterator? If you look at the requirements on iterators, they all appear to hold for this iterator after .end() is called, so perhaps begin still qualifies as a valid iterator and thus has not been invalidated?

  • Is the above code well defined in C++98? In C++11, which prohibits copy-on-write implementations?

From my own brief reading of the specifications, this appears to be under-specified, so there may be no guarantee that the results of begin() and end() can ever be used together, even without mixing the const and non-const versions.

+6
4 answers

As you say, C++11 differs from earlier versions in this regard. In C++11 there is no problem, because all provisions allowing copy-on-write were removed. Before C++11, your code results in undefined behavior: the call to s2.end() is allowed to invalidate existing iterators (and did so in g++, and possibly still does).

Note that even if s2 were not a copy, the standard would allow it to invalidate iterators. In fact, the C++98 CD even made things like f( s.begin(), s.end() ) or s[i] == s[j] undefined behavior. This was only noticed at the last minute, and it was corrected so that only the first call to begin(), end() or [] can invalidate iterators.

+6

The code is fine: a CoW implementation is pretty much required to unshare whenever there is a danger that an iterator or a reference to an element is being held. That is, when something has accessed an element in one string and a copy of that string attempts to do the same, i.e., uses an iterator or the subscript operator, the buffer has to be unshared. The implementation could also keep track of its iterators and update them as needed.

Of course, in a concurrent system it is next to impossible to do all this without data races, but pre-C++11 the standard has no concept of data races.

+2

As of N3337 (which is essentially identical to C++11), the specification reads ([string.require]/4):

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:
[...]
- Calling non-const member functions, except operator[], at, front, back, begin, rbegin, end, and rend.

At least as I read this, it means that a call to begin or end is not allowed to invalidate any iterators. Although it is not stated explicitly, I would also take it to mean that no call to a const member function can invalidate any iterators.

This wording remains unchanged at least through N4296.
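Under that reading, one cautious way to write the computation from the question is to obtain both iterators through the same const reference, so that no non-const member function is called at all. The helper name `const_span_length` is made up for this sketch.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: distance between begin() and end(), using only
// const member functions of the string.
std::ptrdiff_t const_span_length(const std::string& str) {
    std::string::const_iterator first = str.begin();  // begin() const
    std::string::const_iterator last = str.end();     // end() const
    return last - first;
}
```

Called as `const_span_length(s2)`, this never touches the non-const overloads, so the mixing of const and non-const calls that the question asks about does not arise.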

+2

C++98 [lib.basic.string]/5:

References, pointers, and iterators referring to the elements of a basic_string sequence may be invalidated by the following uses of that basic_string object:

  • As an argument to the non-member functions swap(), operator>>(), and getline().

  • As an argument to basic_string::swap().

  • Calling the data() and c_str() member functions.

  • Calling non-const member functions, except operator[](), at(), begin(), rbegin(), end(), and rend().

  • Subsequent to any of the above uses, except the forms of insert() and erase() which return iterators, the first call to the non-const member functions operator[](), at(), begin(), rbegin(), end(), or rend().

Since the constructor of s2 is a "non-const member function", the call to the non-const s2.end(), which is the first such call in the sense of the last bullet above, is allowed to invalidate iterators. Therefore the program does not have defined behavior in C++98.
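By the same rule, a form that avoids the problem is to hold no iterators at the moment of the first non-const call. A minimal sketch, where `mutable_span_length` is a hypothetical helper name and the comments restate the quoted bullet rather than any particular implementation:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: both iterators come from non-const calls, and no
// iterator is held before the first of them.
std::ptrdiff_t mutable_span_length(std::string& str) {
    // This may be "the first call" after construction or modification;
    // it may invalidate older iterators, but none exist yet.
    std::string::iterator first = str.begin();
    // This is no longer the first such call, so `first` stays valid.
    std::string::iterator last = str.end();
    return last - first;
}
```

This is essentially the f( s.begin(), s.end() ) pattern: whichever call comes first, the second one cannot invalidate the iterator the first one returned.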

I will not comment on C++11 because I think the other answers clearly explain that the program has defined behavior in that context.

+1

Source: https://habr.com/ru/post/983055/

