Scenario
You have many XML files stored as UTF-16 in a database or on a server where space is not a problem. You need to send a large majority of these files to other systems as XML files, and it is imperative that the transferred files use as little space as possible.
Question
In fact, only 10% of the files stored as UTF-16 need to be stored as UTF-16; the rest can safely be stored as UTF-8. If the files that need UTF-16 stay UTF-16 and the rest become UTF-8, we can use about 40% less space on the file system.
We tried strong data compression, and it is useful, but we find that we get the same compression ratio with UTF-8 as with UTF-16, and UTF-8 also compresses faster. So in the end, if as much of the data as possible is stored as UTF-8, we not only save space when the data is uncompressed, we still save space when it is compressed, and we even save time on the compression itself.
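The size claims above can be checked on your own data. The following is a minimal sketch in Python, not the poster's actual tooling: the helper name `sizes` and the generated `sample` document are made up for illustration, and `zlib` is only a stand-in for whatever compressor is really used.

```python
import zlib


def sizes(text: str) -> dict:
    """Return raw and zlib-compressed byte counts for both encodings."""
    raw8 = text.encode("utf-8")
    # Python's "utf-16" codec prepends a 2-byte byte order mark (BOM).
    raw16 = text.encode("utf-16")
    return {
        "utf8_raw": len(raw8),
        "utf16_raw": len(raw16),
        "utf8_zlib": len(zlib.compress(raw8, 9)),
        "utf16_zlib": len(zlib.compress(raw16, 9)),
    }


# Hypothetical ASCII-only XML document, used only as test input.
sample = (
    '<?xml version="1.0"?>\n<orders>\n'
    + "".join(f'  <order id="{i}" status="shipped"/>\n' for i in range(2000))
    + "</orders>\n"
)

for name, size in sizes(sample).items():
    print(f"{name}: {size} bytes")
```

For ASCII-only input, the raw UTF-16 size is exactly twice the UTF-8 size plus the BOM. The compressed sizes depend on the data and the compressor, so they are worth measuring on real files, not assuming.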
Goal
To detect when an XML file contains Unicode characters that require UTF-16, so that we use UTF-16 only when we have to.
Some information about the files and the XML
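One way to make this test concrete: both UTF-8 and UTF-16 can represent every Unicode character, so no character strictly requires UTF-16, and the practical question is which encoding is smaller for a given file. The sketch below is written in Python for brevity and is only an illustration; the function names `utf8_minus_utf16` and `smaller_encoding` are made up for this example. It relies on the per-character sizes of the two encodings: code points below U+0080 take 1 byte in UTF-8 and 2 in UTF-16, U+0800 through U+FFFF take 3 and 2, and all other code points take the same number of bytes in both.

```python
def utf8_minus_utf16(text: str) -> int:
    """Return (UTF-8 byte count) - (UTF-16 byte count, without BOM)."""
    diff = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            diff -= 1  # 1 byte in UTF-8, 2 bytes in UTF-16
        elif 0x800 <= cp <= 0xFFFF:
            diff += 1  # 3 bytes in UTF-8, 2 bytes in UTF-16
        # U+0080..U+07FF (2 vs 2) and U+10000 and above (4 vs 4) are a tie
    return diff


def smaller_encoding(text: str, bom_bytes: int = 2) -> str:
    """Pick the encoding that stores `text` in fewer bytes.

    Assumes a UTF-16 file carries a 2-byte BOM and a UTF-8 file carries none.
    """
    return "utf-8" if utf8_minus_utf16(text) <= bom_bytes else "utf-16"


print(smaller_encoding('<?xml version="1.0"?><root>hello</root>'))
print(smaller_encoding("<r>" + "日本語のテキスト" * 50 + "</r>"))
```

Because XML markup itself is ASCII, a file usually needs a large share of characters at or above U+0800 before UTF-16 wins. If a file is re-encoded, the `encoding` pseudo-attribute in its XML declaration has to be updated to match. In .NET, the same comparison could presumably be made with `Encoding.UTF8.GetByteCount` and `Encoding.Unicode.GetByteCount`.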
XML, , "" โโ , Unicode . , UTF-16 , 10% .
C# with .NET Framework 4.0.
EDIT:
- UTF-8.
UTF, , . !
UTF-8. UTF-8 UTF-16 , , XML-. , UTF-8 , UTF-16, , , BMP, (ASCII-spec, , US 104-) UTF-8 UTF-16.
UTF-8 2 U07FF ASCII; , UTF-8 UTF-16 (, , ) , , , , . 90% .
UTF-16, , , (), , (), "" ( ?), , , , . XML, , , , UTF-16, , , .
: , , , Unicode, UTF-8. . , , ( ) UTF-8.
, 10% UTF-16. XML , , , UTF-8, UTF-16, , , XML.
- " UTF-8 ". .
, XML , . , ... XML U + 0800 ( UTF-8), , U + 0080 ( UTF-8), UTF-16.
UTF-16 UTF-8, "". .
, , UTF-16. UTF-16 UTF-8. , , UTF-8, UTF-16, .
UTF-8 .
, UTF-16, UTF-8. UTF-8, UTF-16 ( UTF-32 ) UCS ( UTF).
, UTF-16, UTF-8. , . XML 0x20-0x7F .
- XML ( -thans more-thans) - , , . , UTF-16 , UTF-8, XML, , UTF-8 - .
, UTF-8 .
: , . , UTF-8 .
Source: https://habr.com/ru/post/1765205/