The maximum allowed code point in Unicode is U+10FFFF, which makes it a 21-bit code set (though not every 21-bit integer is a valid Unicode code point; in particular, the values 0x110000 through 0x1FFFFF are not).
This is where the number 1,114,112 comes from: the range U+0000..U+10FFFF contains 1,114,112 values (17 planes of 65,536 code points each).
However, there is also a set of code points reserved as surrogates for UTF-16: the range U+D800..U+DFFF, i.e. 2,048 code points that can never represent characters on their own.
1,114,112 - 2,048 = 1,112,064
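As a quick sanity check of that arithmetic, here is a minimal Python sketch (the constants are just the figures quoted above):

```python
# Total code points: U+0000..U+10FFFF, i.e. 17 planes of 65,536 each.
total = 0x110000
assert total == 17 * 65536 == 1_114_112

# UTF-16 surrogates: U+D800..U+DFFF never encode a character on their own.
surrogates = 0xE000 - 0xD800
assert surrogates == 2_048

# What remains is what the standard calls "Unicode scalar values".
scalar_values = total - surrogates
assert scalar_values == 1_112_064
```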
There are also 66 noncharacters, whose use is clarified in Corrigendum #9: 34 values of the form U+nFFFE and U+nFFFF (where n is 0x00000, 0x10000, ..., 0xF0000, 0x100000, i.e. one pair per each of the 17 planes) and the 32 values U+FDD0..U+FDEF. Subtracting them as well, we get 1,111,998 characters that can be allocated.

Three ranges are reserved for private use: U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD.

The number of actually assigned values depends on the version of Unicode you are looking at. You can find information about the latest version on the Unicode Consortium website. Among other things, the introduction says:
The Unicode Standard, Version 7.0, contains 112,956 characters.
Thus, only about 10% of the available code points have been allocated.
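To double-check the noncharacter count and the 10% figure, here is a small Python sketch (the 112,956 value is simply the Unicode 7.0 count quoted above):

```python
# Noncharacters: U+FDD0..U+FDEF plus the last two code points of each plane.
noncharacters = set(range(0xFDD0, 0xFDF0))           # 32 values
for plane_base in range(0x0000, 0x110000, 0x10000):  # 17 planes
    noncharacters.update({plane_base + 0xFFFE, plane_base + 0xFFFF})
assert len(noncharacters) == 66

allocatable = 1_112_064 - 66
assert allocatable == 1_111_998

assigned_in_7_0 = 112_956  # figure quoted from the Unicode 7.0 introduction
print(f"{assigned_in_7_0 / allocatable:.1%}")  # -> 10.2%
```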
I can’t explain why you found 1,112,114 as the number of code points.
Incidentally, the upper limit of U+10FFFF was chosen so that every Unicode value can be represented in one or two 2-byte code units in UTF-16, using one high surrogate and one low surrogate to represent values outside the BMP (Basic Multilingual Plane, the range U+0000..U+FFFF).
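As an illustration of that encoding, here is a minimal Python sketch of the UTF-16 surrogate-pair arithmetic (U+1F600 is just an arbitrary non-BMP code point chosen as an example):

```python
def to_surrogate_pair(cp: int) -> tuple[int, int]:
    """Split a non-BMP code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    assert 0x10000 <= cp <= 0x10FFFF
    offset = cp - 0x10000            # a 20-bit value: fits exactly in 2 x 10 bits
    high = 0xD800 + (offset >> 10)   # high surrogate carries the top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # low surrogate carries the bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1F600)
print(hex(high), hex(low))  # 0xd83d 0xde00
```

The 20-bit offset is exactly why the limit is U+10FFFF: the largest encodable value is 0x10000 + 2**20 - 1 = 0x10FFFF.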