Test data set for automated testing of a UTF-8 string validator

I wrote a function that validates UTF-8 strings.

The function takes a byte buffer and a length in UTF-8 characters, and checks that the buffer consists of exactly the specified number of valid UTF-8 characters.

If the buffer is too short or too long, or if it contains invalid UTF-8 characters, the check fails.
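The contract described above can be written down as a small reference oracle to compare the real validator against. This is only a sketch: the name `is_valid_utf8` is hypothetical, and it assumes that "length in UTF-8 characters" means the number of Unicode code points. It relies on Python's built-in strict UTF-8 decoder rather than on the validator being tested.

```python
def is_valid_utf8(buf: bytes, n_chars: int) -> bool:
    """Hypothetical stand-in for the validator described in the question.

    Returns True only if buf is well-formed UTF-8 and decodes to exactly
    n_chars code points (assumption: "characters" means code points).
    """
    try:
        # The strict decoder rejects overlong forms, surrogates,
        # values above U+10FFFF, and truncated sequences.
        text = buf.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # len() of a Python 3 str counts code points.
    return len(text) == n_chars
```

An oracle like this is handy in tests because any disagreement between it and the hand-written validator points at a case worth inspecting.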

Now I want to write automated tests for my validator.

Is there a dataset that I can reuse?

I found this file: http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt , but it does not seem suitable for my purposes - as far as I understand, it is meant more for visual inspection.

Any clues?

2 answers
  • Valid UTF-8 data, to see it pass
    • Strings containing characters that need 1 code unit (byte), 2, 3, and 4! (Do not just test "ABC" or "café".)
  • Explicitly invalid data, such as ISO-8859-1 strings (ones that are not also valid UTF-8)
  • A string containing overlong forms (a character encoded with more bytes than it needs). These must not pass as UTF-8
  • A string containing code points above U+10FFFF
  • Everything listed here: http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

Depending on how strict your code is:

  • Catching a UTF-8 string that encodes anything from U+D800 to U+DFFF (surrogates, which should never be present in a UTF-8 string)
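The invalid categories in the lists above (overlong forms, code points above U+10FFFF, surrogates) can be generated rather than typed by hand. Below is a rough sketch of a helper that packs a code point into a chosen number of bytes without any validity checks; the name `encode_raw` is made up for illustration.

```python
def encode_raw(code_point: int, n_bytes: int) -> bytes:
    """Pack code_point into exactly n_bytes UTF-8-style bytes, unchecked.

    Unlike str.encode, this happily produces overlong forms, surrogates
    and values above U+10FFFF, which is what negative tests need.
    """
    if n_bytes == 1:
        if code_point > 0x7F:
            raise ValueError("code point does not fit in 1 byte")
        return bytes([code_point])
    if n_bytes not in (2, 3, 4):
        raise ValueError("n_bytes must be between 1 and 4")
    # Payload capacity: 11 bits for 2 bytes, 16 for 3, 21 for 4.
    payload_bits = (7 - n_bytes) + 6 * (n_bytes - 1)
    if code_point >= 1 << payload_bits:
        raise ValueError("code point does not fit in n_bytes")
    lead_prefix = {2: 0xC0, 3: 0xE0, 4: 0xF0}[n_bytes]
    out = []
    remaining = code_point
    # Fill continuation bytes (10xxxxxx) from the least significant end.
    for _ in range(n_bytes - 1):
        out.append(0x80 | (remaining & 0x3F))
        remaining >>= 6
    out.append(lead_prefix | remaining)
    return bytes(reversed(out))


overlong_c = encode_raw(0x43, 2)         # "C" in 2 bytes: C1 83
too_large = encode_raw(0x110000, 4)      # above U+10FFFF: F4 90 80 80
surrogate = encode_raw(0xD800, 3)        # surrogate: ED A0 80
```

When given the minimal byte count for a valid code point, the helper agrees with the standard encoder, so the same function can produce both positive and negative inputs.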

Some test cases:

    Should pass: "ABC"              41 42 43
    Should pass: "ABÇ"              41 42 C3 87
    Should pass: "ABḈ"              41 42 E1 B8 88
    Should pass: "AB𝜍"              41 42 F0 9D 9C 8D
    Should fail: Bad data           80 81 82 83
    Should fail: Bad data           C2 C3
    Should fail: Overlong           C0 43
    Should fail: encodes U+140000   F5 80 80 80
    Should fail: encodes U+110000   F4 90 80 80
    Should fail: encodes U+D800     ED A0 80
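These vectors drop naturally into a table-driven test. The sketch below assumes a validator with the signature `validate(buf, n_chars) -> bool`; `reference_validate` and `failing_cases` are illustrative names, and the declared character counts for the failing rows are arbitrary, since those rows must be rejected regardless of the count.

```python
# (label, hex bytes, declared character count, expected result)
CASES = [
    ('"ABC"', "41 42 43", 3, True),
    ('"ABÇ"', "41 42 C3 87", 3, True),
    ('"ABḈ"', "41 42 E1 B8 88", 3, True),
    ('"AB𝜍"', "41 42 F0 9D 9C 8D", 3, True),
    ("bad data", "80 81 82 83", 4, False),
    ("bad data", "C2 C3", 1, False),
    # C0 can never appear in well-formed UTF-8, so this fails either way.
    ("overlong", "C0 43", 1, False),
    ("encodes U+140000", "F5 80 80 80", 1, False),
    ("encodes U+110000", "F4 90 80 80", 1, False),
    ("encodes U+D800", "ED A0 80", 1, False),
]


def reference_validate(buf: bytes, n_chars: int) -> bool:
    """Stand-in validator built on Python's strict UTF-8 decoder."""
    try:
        return len(buf.decode("utf-8")) == n_chars
    except UnicodeDecodeError:
        return False


def failing_cases(validate):
    """Return (label, hex) for every case where validate disagrees."""
    mismatches = []
    for label, hex_bytes, n_chars, expected in CASES:
        buf = bytes.fromhex(hex_bytes)  # fromhex ignores the spaces
        if validate(buf, n_chars) != expected:
            mismatches.append((label, hex_bytes))
    return mismatches
```

Calling `failing_cases` with the real validator (wrapped to match the assumed signature) should return an empty list; any entries it returns identify the vectors to investigate.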

(I have only just checked these myself, so double- or triple-check them if you get unexpected results.)


I ended up loading UTF-8-test.txt line by line and comparing each result with the expected one (which I hard-coded in a map of line number -> ok/fail).
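A rough sketch of that line-by-line approach follows. The function names are hypothetical, the expected-results map has to be filled in by hand for the file being used, and for simplicity the validator here takes only the bytes of a line (no character count), which is an assumption that differs from the function in the question.

```python
from typing import Callable, Dict, List


def check_lines(
    data: bytes,
    expected: Dict[int, bool],
    validate: Callable[[bytes], bool],
) -> List[int]:
    """Return the 1-based line numbers where validate disagrees with expected.

    Lines that have no entry in the expected map are skipped.
    """
    mismatches = []
    # Split on raw LF bytes; the data must not be decoded first,
    # because many lines are deliberately malformed.
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        if line_no not in expected:
            continue
        if validate(line) != expected[line_no]:
            mismatches.append(line_no)
    return mismatches


def strict_decode_ok(line: bytes) -> bool:
    """Stand-in validator based on Python's strict UTF-8 decoder."""
    try:
        line.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

With the file read in binary mode, a call would look like `check_lines(open(path, "rb").read(), expected, my_validator)`, where `path`, `expected` and `my_validator` are supplied by the test.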

This works, but I would also like some cases for incomplete UTF-8 characters, buffer overruns, and so on. So, if you know of an existing test suite (even one that is not reusable but could serve as a source of inspiration), please post a link here.
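In the absence of a ready-made suite, incomplete-character cases can be derived from valid strings by cutting the encoded bytes in the middle of a multi-byte sequence. The following is a minimal sketch of that idea; `truncated_variants` is an illustrative name, not part of any existing suite.

```python
from typing import List


def truncated_variants(text: str) -> List[bytes]:
    """Return every prefix of text's UTF-8 encoding that ends mid-character."""
    encoded = text.encode("utf-8")
    variants = []
    for cut in range(1, len(encoded)):
        # If the first dropped byte is a continuation byte (10xxxxxx),
        # the cut falls inside a multi-byte character.
        if encoded[cut] & 0xC0 == 0x80:
            variants.append(encoded[:cut])
    return variants


# "AB𝜍" encodes as 41 42 F0 9D 9C 8D, so three cuts land mid-character.
samples = truncated_variants("AB\U0001d70d")
```

Every buffer returned this way should be rejected by the validator, whatever character count is passed alongside it. Pure ASCII input yields no variants, since it contains no multi-byte characters to split.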


Source: https://habr.com/ru/post/1346315/