Non-ASCII character handling in C++

I am having problems with non-ASCII characters in C++. I have a file containing non-ASCII characters, which I read in C++ using file streams. After reading the file (say 1.txt), I store the data in a string stream and write it to another file (say 2.txt).

Suppose 1.txt contains:

ação 

In 2.txt I should get the same output, but the non-ASCII characters are printed as their hex values instead.

I am also fairly sure that C++ handles the ASCII characters as ASCII only.

Please advise how to write these characters correctly to 2.txt.

EDIT:

First, the pseudo-code for the whole process:

    1. A shell script reads one value from the DB and stores it in 11.txt
    2. C++ code (a.cpp) reads 11.txt and writes to f.txt

Data present in the database: Instalação

11.txt contains: Instalação

f.txt contains: Instalação

a.cpp output on screen: Instalação

a.cpp

    #include <iterator>
    #include <iostream>
    #include <algorithm>
    #include <sstream>
    #include <fstream>
    #include <iomanip>
    #include <string>
    using namespace std;

    int main()
    {
        ifstream myReadFile;
        ofstream f2;
        myReadFile.open("11.txt");
        f2.open("f2.txt");
        string output;
        if (myReadFile.is_open()) {
            // Test the extraction itself; looping on !eof() can process the last word twice.
            while (myReadFile >> output) {
                //cout<<output;
                cout << "\n";
                std::stringstream tempDummyLineItem;
                tempDummyLineItem << output;
                cout << tempDummyLineItem.str();
                f2 << tempDummyLineItem.str();
            }
        }
        myReadFile.close();
        return 0;
    }

The locale command reports the following:

    LANG=en_US.UTF-8
    LC_CTYPE="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_PAPER="en_US.UTF-8"
    LC_NAME="en_US.UTF-8"
    LC_ADDRESS="en_US.UTF-8"
    LC_TELEPHONE="en_US.UTF-8"
    LC_MEASUREMENT="en_US.UTF-8"
    LC_IDENTIFICATION="en_US.UTF-8"
    LC_ALL=
2 answers

At least if I understand what you need, I would do something like this:

    #include <iterator>
    #include <iostream>
    #include <algorithm>
    #include <sstream>
    #include <iomanip>

    std::string to_hex(char ch) {
        std::ostringstream b;
        b << "\\x" << std::setfill('0') << std::setw(2) << std::setprecision(2)
          << std::hex << static_cast<unsigned int>(ch & 0xff);
        return b.str();
    }

    int main(){
        // for test purposes, we'll use a stringstream for input
        std::stringstream infile("normal stuff. weird stuff:\x01\xee:back to normal");
        infile << std::noskipws;

        // copy input to output, converting non-ASCII to hex:
        std::transform(std::istream_iterator<char>(infile),
            std::istream_iterator<char>(),
            std::ostream_iterator<std::string>(std::cout),
            [](char ch) {
                return (ch >= ' ') && (ch < 127) ?
                    std::string(1, ch) :
                    to_hex(ch);
            });
    }

Sounds to me like a UTF-8 problem. Since you did not tag your question with C++11, here is an excellent article on Unicode and C++ streams.

From your updated code, let me explain what happens. You create a file stream to read your file. Internally, the file stream only deals in chars until you tell it otherwise. A char, on most machines, holds only 8 bits of data, but the non-ASCII characters in your file may need more than 8 bits. To read your file correctly, you MUST know how it is encoded. The most common encoding is UTF-8, which uses 1 to 4 chars for each character.
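To make the "1 to 4 chars per character" point concrete, here is a minimal sketch that counts the characters (code points) in a UTF-8 string. The helper name utf8_length is my own, not a standard function, and it assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string.
// Continuation bytes always look like 10xxxxxx, so every byte that does
// NOT match that pattern starts a new character. Assumes valid UTF-8.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the UTF-8 bytes of "ação" (written as "a\xC3\xA7\xC3\xA3o"), size() reports 6 chars while utf8_length reports 4 characters, because ç and ã take two chars each.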

Once you know the encoding, you can use wifstream (for UTF-16) or imbue() a locale for other encodings.

Update: If your file is ISO-8859-1 (per your comment above), try this:

    wifstream myReadFile;
    myReadFile.imbue(std::locale("en_US.iso88591"));
    myReadFile.open("11.txt");
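Locale names like the one above vary between systems, and std::locale throws if the name is not installed. As an alternative sketch that does not depend on any locale, ISO-8859-1 can be converted to UTF-8 by hand, since every ISO-8859-1 byte value equals its Unicode code point. The function name latin1_to_utf8 is my own, and this assumes the input really is ISO-8859-1.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts ISO-8859-1 bytes to UTF-8.
// Bytes below 0x80 are plain ASCII and are copied as-is; bytes from 0x80
// to 0xFF become the two-byte UTF-8 sequence 110000xx 10xxxxxx.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}
```

You could read each line with a plain ifstream, pass it through this function, and write the result to the output file; a UTF-8 terminal should then display the text correctly.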

Source: https://habr.com/ru/post/1491383/

