C++11 internal representation of std::string (libstdc++)

How is std::string internally represented in C++11 (libstdc++)?

While digging into the implementation, I found:

/*  A string looks like this:
 *
 *                                        [_Rep]
 *                                        _M_length
 *   [basic_string<char_type>]            _M_capacity
 *   _M_dataplus                          _M_refcount
 *   _M_p ---------------->               unnamed array of char_type
 *
 *  Where the _M_p points to the first character in the string, and
 *  you cast it to a pointer-to-_Rep and subtract 1 to get a
 *  pointer to the header.
 *
 *  This approach has the enormous advantage that a string object
 *  requires only one allocation.  All the ugliness is confined
 *  within a single %pair of inline functions, which each compile to
 *  a single @a add instruction: _Rep::_M_data(), and
 *  string::_M_rep(); and the allocation function which gets a
 *  block of raw bytes and with room enough and constructs a _Rep
 *  object at the front.
 *
 *  The reason you want _M_data pointing to the character %array and
 *  not the _Rep is so that the debugger can see the string
 *  contents. (Probably we should add a non-inline member to get
 *  the _Rep for the debugger to use, so users can check the actual
 *  string length.)
 *
 *  Note that the _Rep object is a POD so that you can have a
 *  static <em>empty string</em> _Rep object already @a constructed before
 *  static constructors have run.  The reference-count encoding is
 *  chosen so that a 0 indicates one reference, so you never try to
 *  destroy the empty-string _Rep object.
 */

// _Rep: string representation
//   Invariants:
//   1. String really contains _M_length + 1 characters: due to 21.3.4
//      must be kept null-terminated.
//   2. _M_capacity >= _M_length
//      Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
//   3. _M_refcount has three states:
//      -1: leaked, one reference, no ref-copies allowed, non-const.
//       0: one reference, non-const.
//     n>0: n + 1 references, operations require a lock, const.
//   4. All fields==0 is an empty string, given the extra storage
//      beyond-the-end for a null terminator; thus, the shared
//      empty string representation needs no constructor.

struct _Rep_base
{
  size_type     _M_length;
  size_type     _M_capacity;
  _Atomic_word  _M_refcount;
};

I do not really understand these comments:

  • Is std::string reference counted? How? I mean, _M_refcount is not a pointer, so if one string modifies it, the other cannot see the change.
  • Does the buffer lie immediately after the header? If so, I really do not understand why.

GCC has moved away from the reference-counted string in order to conform to the C++11 standard, but note that your program may still be using it, because libstdc++ keeps the old implementation around for ABI compatibility.
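Which of the two representations a given build uses can be checked at compile time. The following is a minimal sketch, assuming libstdc++'s dual-ABI macro _GLIBCXX_USE_CXX11_ABI (present since GCC 5); the function name string_abi is made up for illustration.

```cpp
#include <string>   // any libstdc++ header defines the macros tested below

// Hypothetical helper: reports which std::string representation this
// translation unit was compiled against.
const char* string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    // New ABI: no reference counting, small-string optimization.
    return "libstdc++, new ABI (not reference counted)";
#elif defined(__GLIBCXX__)
    // Macro undefined (GCC before 5) or set to 0: the _Rep-based string.
    return "libstdc++, old ABI (reference-counted _Rep)";
#else
    return "not libstdc++";
#endif
}
```

Compiling with -D_GLIBCXX_USE_CXX11_ABI=0 on a recent GCC selects the old, reference-counted string, which is the one the quoted comment describes.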

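How is it reference counted?

std::string does not have a _Rep_base member. It holds only a pointer, _M_p, into a heap block that begins with a _Rep object (_Rep inherits from _Rep_base). Copies of a string share that block, so they all see the same _M_refcount.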

This is explained here:

  Where the _M_p points to the first character in the string, and
  you cast it to a pointer-to-_Rep and subtract 1 to get a
  pointer to the header.
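The pointer arithmetic in that comment can be sketched as follows. This is an illustration under simplifying assumptions, not the libstdc++ code: Header, make_string, header_of and destroy_string are hypothetical names, the character type is fixed to char, and malloc stands in for the allocator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for _Rep: three plain fields, no constructor.
struct Header {
    std::size_t length;
    std::size_t capacity;
    int         refcount;   // 0 means "one reference", as in the quoted comment

    // The character array starts right behind the header: this + 1.
    char* data() { return reinterpret_cast<char*>(this + 1); }
};

// One allocation holds the header, the characters and the trailing '\0'.
char* make_string(const char* text) {
    std::size_t n = std::strlen(text);
    void* raw = std::malloc(sizeof(Header) + n + 1);
    if (!raw) throw std::bad_alloc();
    Header* h = new (raw) Header{n, n, 0};
    std::memcpy(h->data(), text, n + 1);
    return h->data();        // this is the value _M_p would hold
}

// "Cast it to a pointer-to-header and subtract 1" to get back to the header.
Header* header_of(char* p) { return reinterpret_cast<Header*>(p) - 1; }

void destroy_string(char* p) { std::free(header_of(p)); }
```

Because the stored pointer addresses the characters rather than the header, a debugger shows the string contents directly, which is the reason the quoted comment gives for this arrangement.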

The buffer lies after the header ...

Yes, but it lies after the header of the heap-allocated _Rep object, and your string only holds a pointer to it.
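To see why a non-pointer counter is still visible to every copy, here is a minimal sketch of a handle whose only member is a pointer into a shared block. SharedStr and Header are hypothetical names, the counter is a plain int (the real _M_refcount is an _Atomic_word updated atomically), and copy-on-write and assignment are left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for _Rep_base.
struct Header { std::size_t length; std::size_t capacity; int refcount; };

class SharedStr {
    char* p_;   // the only data member, playing the role of _M_p

    // Step back over the header that precedes the characters.
    Header* rep() const { return reinterpret_cast<Header*>(p_) - 1; }

public:
    explicit SharedStr(const char* s) {
        std::size_t n = std::strlen(s);
        void* raw = std::malloc(sizeof(Header) + n + 1);
        if (!raw) throw std::bad_alloc();
        Header* h = new (raw) Header{n, n, 0};   // 0 == one reference
        p_ = reinterpret_cast<char*>(h + 1);
        std::memcpy(p_, s, n + 1);
    }

    // Copying shares the block and bumps the counter stored inside it.
    SharedStr(const SharedStr& other) : p_(other.p_) { ++rep()->refcount; }
    SharedStr& operator=(const SharedStr&) = delete;

    // The last owner (counter was 0 before the decrement) frees the block.
    ~SharedStr() { if (rep()->refcount-- == 0) std::free(rep()); }

    const char* c_str() const { return p_; }
    int extra_refs() const { return rep()->refcount; }
};
```

After `SharedStr b(a);` both objects report `extra_refs() == 1`, because each of them reaches the same header through its own copy of the pointer.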


Source: https://habr.com/ru/post/972458/

