I am writing a small program that reads a UTF-8 encoded file character by character. After reading each character, it compares it against several other characters and, if there is a match, replaces that character with an underscore '_'.
(Well, it actually writes a copy of the file with the special letters replaced by underscores.)
I'm not sure exactly where I am going wrong here, but it is most likely everywhere.
Here is my code:
    FILE *fpi;
    FILE *fpo;
    char ifilename[FILENAME_MAX];
    char ofilename[FILENAME_MAX];
    wint_t sample;

    fpi = fopen(ifilename, "rb");
    fpo = fopen(ofilename, "wb");

    while (!feof(fpi)) {
        fread(&sample, sizeof(wchar_t*), 1, fpi);
        if ((wcscmp(L"ά", &sample) == 0) || (wcscmp(L"ε", &sample) == 0)) {
            fwrite(L"_", sizeof(wchar_t*), 1, fpo);
        } else {
            fwrite(&sample, sizeof(wchar_t*), 1, fpo);
        }
    }
I left out the code that builds the file names because it adds nothing to the question; it is just string manipulation.
If I run this program on a file containing the words γειά σου κόσμε. , I would like it to produce this: γ_ι_ σου κόσμ_.
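To make the intended behaviour concrete, here is a minimal sketch of one way it could work. It is not a fixed version of the code above: it avoids wchar_t entirely and compares raw bytes, relying on the fact that a UTF-8 character occupies 1 to 4 bytes and that its first byte tells you how many. The names utf8_len and copy_replacing are hypothetical, and the sketch assumes the input is valid UTF-8 and that ά is stored as the single precomposed code point U+03AC rather than α followed by a combining accent.

```c
#include <stdio.h>
#include <string.h>

/* Number of bytes in the UTF-8 sequence that starts with `lead`.
   Returns 1 for ASCII and also for an invalid lead byte, so that
   such a byte is simply copied through unchanged. */
size_t utf8_len(unsigned char lead)
{
    if (lead < 0x80)           return 1;  /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 1;                             /* stray continuation or invalid byte */
}

/* Hypothetical helper: copy `in` to `out`, writing '_' in place of
   every character that appears in `targets`. Assumes valid UTF-8. */
void copy_replacing(FILE *in, FILE *out)
{
    /* Byte escapes, so the encoding of this source file does not matter. */
    static const char *const targets[] = {
        "\xCE\xAC",  /* U+03AC, alpha with tonos */
        "\xCE\xB5",  /* U+03B5, epsilon */
    };
    int c;

    while ((c = fgetc(in)) != EOF) {      /* test the read itself, not feof() */
        char seq[5] = {0};
        size_t len = utf8_len((unsigned char)c);
        size_t got = 1;
        int match = 0;

        seq[0] = (char)c;
        /* Read the remaining bytes of this character. */
        while (got < len && (c = fgetc(in)) != EOF)
            seq[got++] = (char)c;

        /* A character matches when its whole byte sequence equals a target. */
        for (size_t i = 0; i < sizeof targets / sizeof targets[0]; i++)
            if (strlen(targets[i]) == got && memcmp(seq, targets[i], got) == 0)
                match = 1;

        if (match)
            fputc('_', out);
        else
            fwrite(seq, 1, got, out);
    }
}
```

With the two streams from my code opened in binary mode, the call would be copy_replacing(fpi, fpo). With ά and ε as the targets, the ε inside γειά matches as well, so the sample line comes out as γ_ι_ σου κόσμ_.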
Searching the Internet did not help much, since most of the results were either very general or about completely different aspects of UTF-8. It is as if no one ever needs to manipulate individual characters.
Anything that points me in the right direction is welcome. I'm not necessarily looking for a corrected version of the code I posted; I would be grateful for any insightful comments that help me understand how the wchar mechanism works. wbyte, wchar, L, no-L: it is all a mess to me.
Thank you in advance for your help.