Why does Swift's String.Index store an index value four times the real one?

I was implementing the Boyer-Moore algorithm in a Swift Playground and used String.Index a lot. One thing bothered me: the indices appear to be stored as four times the value I would expect.

For instance:

let why = "is s on 4th position not 1st".index(of: "s")

In the Swift Playground this code shows a _compoundOffset of 4, not 1. I am sure there is a reason for this, but I could not find an explanation anywhere.

This is not a duplicate of any question that explains how to get a character's index in Swift; I only used index(of:) to illustrate the question. I want to know why the value for the 2nd character is 4 rather than 1 when using String.Index.

I understand that String.Index keeps its internals private and that I don't need to know the implementation. Perhaps this has something to do with the UTF-16 and UTF-32 encodings.

1 answer

First of all, never assume that _compoundOffset is anything other than an implementation detail. _compoundOffset is an internal property of String.Index that uses bit masking to store two values in a single number:

  • encodedOffset, which is the index's offset measured in UTF-16 code units. It is publicly accessible and can be relied on. In your case encodedOffset is 1, because that is the offset of this character in UTF-16 code units. Note that the string's encoding in memory does not matter: encodedOffset is always expressed in UTF-16.

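  • transcodedOffset, which is the index's offset within the current UTF-16 code unit. This one is internal, so you cannot access it. It is 0 for most indices; it becomes nonzero only when the index refers to a position in the UTF-8 encoding that falls inside a single UTF-16 code unit. transcodedOffset is applied on top of encodedOffset.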
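To read an offset without touching the private field, a minimal sketch along these lines may help. It assumes a Swift 5 toolchain, where index(of:) is spelled firstIndex(of:) and utf16Offset(in:) is the supported replacement for the older encodedOffset property; the variable names are only illustrative.

```swift
let text = "is s on 4th position not 1st"

// The string is known to contain "s", so force-unwrapping is acceptable
// in this sketch; real code should handle the nil case.
let idx = text.firstIndex(of: "s")!

// Offset counted in Characters from the start of the string.
let characterOffset = text.distance(from: text.startIndex, to: idx)

// Offset counted in UTF-16 code units (Swift 5 and later).
let utf16Offset = idx.utf16Offset(in: text)

print(characterOffset, utf16Offset)  // prints "1 1"
```

Both values are 1 here because every character before the match is a single UTF-16 code unit; with emoji or other non-BMP characters in front, the two offsets would differ.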

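So why is _compoundOffset == 4? Because transcodedOffset is stored in the two least significant bits and encodedOffset in the 62 most significant bits. For an index with encodedOffset == 1 and transcodedOffset == 0, the bit pattern is therefore 0b100, which is 4.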
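The arithmetic can be checked with a small sketch. PackedIndex below is a hypothetical stand-in, not the real String.Index; it assumes the layout described in this answer, with transcodedOffset in the low 2 bits and encodedOffset in the remaining 62 bits of a 64-bit value.

```swift
// Hypothetical model of the bit packing; NOT the standard library type.
struct PackedIndex {
    static let transcodedBits: UInt64 = 2     // width of the low field
    static let transcodedMask: UInt64 = 0b11  // selects the low 2 bits

    let compoundOffset: UInt64

    init(encodedOffset: Int, transcodedOffset: Int = 0) {
        // Shift the UTF-16 offset past the low field, then merge the low field in.
        compoundOffset = (UInt64(encodedOffset) << PackedIndex.transcodedBits)
            | (UInt64(transcodedOffset) & PackedIndex.transcodedMask)
    }

    // Upper 62 bits: offset in UTF-16 code units.
    var encodedOffset: Int {
        return Int(compoundOffset >> PackedIndex.transcodedBits)
    }

    // Lower 2 bits: offset inside the current UTF-16 code unit.
    var transcodedOffset: Int {
        return Int(compoundOffset & PackedIndex.transcodedMask)
    }
}

let second = PackedIndex(encodedOffset: 1)
print(second.compoundOffset)                          // prints "4"
print(String(second.compoundOffset, radix: 2))        // prints "100"
print(second.encodedOffset, second.transcodedOffset)  // prints "1 0"
```

Under this layout every index with transcodedOffset == 0 shows up as four times its UTF-16 offset, which matches what the Playground displays.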

For the details, see the source code of String.Index.

Source: https://habr.com/ru/post/1688856/