I went looking through the C sources, but I cannot find this function, and I really do not want to write it myself, because it must already be there.
To clarify: Unicode code points are represented as U+########, and that part is easy to get. What I need is the format in which the character is written to a file (for example). In UTF-8, a code point below 128 is stored in a single byte using 7 bits; a larger code point is split into groups of 6 bits, each stored in a continuation byte, with the remaining high bits stored in a lead byte that comes first. Emacs, of course, knows how to do this, but I cannot find a way to get from it the UTF-8 encoding of a string as a sequence of bytes (each of which contains 8 bits).
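For reference, the bit layout described above can be sketched by hand in Emacs Lisp. This is only an illustration under stated assumptions: the name my-utf8-bytes-of-code-point is hypothetical (it is not an Emacs built-in), and the sketch does not reject surrogates or values above U+10FFFF.

```elisp
;; Hypothetical helper, not part of Emacs: encode one code point by hand.
(defun my-utf8-bytes-of-code-point (cp)
  "Return the UTF-8 bytes of code point CP as a list of integers 0..255."
  (cond
   ;; 1 byte: 0xxxxxxx (7 payload bits)
   ((< cp #x80) (list cp))
   ;; 2 bytes: 110xxxxx 10xxxxxx
   ((< cp #x800)
    (list (logior #xC0 (ash cp -6))
          (logior #x80 (logand cp #x3F))))
   ;; 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
   ((< cp #x10000)
    (list (logior #xE0 (ash cp -12))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))
   ;; 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
   (t
    (list (logior #xF0 (ash cp -18))
          (logior #x80 (logand (ash cp -12) #x3F))
          (logior #x80 (logand (ash cp -6) #x3F))
          (logior #x80 (logand cp #x3F))))))
```

With this sketch, U+00E9 should come out as the two bytes #xC3 #xA9, and U+20AC as #xE2 #x82 #xAC.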
Functions such as get-byte or multibyte-char-to-unibyte only work with characters that can be represented using a maximum of 8 bits. I need the same thing as get-byte, but for multibyte characters, so that instead of an integer 0..255, I would get either a vector of integers 0..255 or one long integer 0..2^32.
EDIT
Just in case someone needs this later:
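(defun haxe-string-to-x-string (s)
  "Return S with each byte of its UTF-8 encoding written as \\xNN."
  (with-output-to-string
    (let (current)
      (dotimes (i (length s))
        ;; multibyte-char-to-unibyte returns -1 for characters that
        ;; do not fit into a single byte.
        (if (> 0 (multibyte-char-to-unibyte (aref s i)))
            ;; Multibyte character: print every byte of its UTF-8 encoding.
            (progn
              (setq current (encode-coding-string
                             (char-to-string (aref s i)) 'utf-8))
              (dotimes (j (length current))
                (princ (format "\\x%02x" (aref current j)))))
          ;; Single-byte character: print it directly.
          (princ (format "\\x%02x" (aref s i))))))))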
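If the bytes themselves are wanted rather than a printed \xNN string, the same encode-coding-string call can be used directly. The following is a minimal sketch, assuming the goal is the list of integers or the single long integer asked for above; the names my-string-to-utf8-bytes and my-char-to-utf8-integer are hypothetical helpers, not Emacs built-ins.

```elisp
;; Hypothetical helpers built on the real `encode-coding-string'.
(defun my-string-to-utf8-bytes (s)
  "Return the UTF-8 encoding of string S as a list of integers 0..255."
  ;; `encode-coding-string' returns a unibyte string; appending nil
  ;; converts it to a list of its byte values.
  (append (encode-coding-string s 'utf-8) nil))

(defun my-char-to-utf8-integer (ch)
  "Return the UTF-8 bytes of character CH packed into one integer."
  (let ((n 0))
    ;; Shift the accumulator left by one byte and add the next byte.
    (dolist (b (my-string-to-utf8-bytes (char-to-string ch)) n)
      (setq n (logior (ash n 8) b)))))
```

For example, (my-string-to-utf8-bytes (string #xE9)) should give (195 169), and (my-char-to-utf8-integer #xE9) should give #xC3A9. Wrapping the list in vconcat turns it into a vector if that form is preferred.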
user797257