This complements the other answers, but I will try to explain it from a slightly different angle.
Here is a version of Jonathan Leffler's code with three minor changes: (1) I made the individual bytes in the UTF-8 strings explicit; (2) I modified the sprintf format string's width specifier so that it hopefully does what you are actually trying to do; and, tangentially, (3) I used perror() to get a slightly more useful error message when something fails.
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SIZE 40

int main(void)
{
    char buf[SIZE + 1];
    char *pat = "\320\277\321\200\320\270\320\262\320\265\321\202"
        " \320\274\320\270\321\200";                                    /* "привет мир" ("hello world") */
    char str[SIZE + 3];             /* room for padding, newline and NUL terminator */
    FILE *f1 = fopen("\320\262\321\205\320\276\320\264", "r");          /* "вход" ("input") */
    FILE *f2 = fopen("\320\262\321\213\321\205\320\276\320\264", "w");  /* "выход" ("output") */

    if (f1 == 0 || f2 == 0)
    {
        perror("Failed to open one or both files");   /* use perror() */
        return(1);
    }

    size_t nbytes;
    if ((nbytes = fread(buf, 1, SIZE, f1)) > 0)
    {
        buf[nbytes] = 0;
        if (strncmp(buf, pat, nbytes) == 0)
        {
            sprintf(str, "%*s\n", 1+(int)nbytes, buf);  /* nbytes+1 width specifier */
            fwrite(str, 1, 1+nbytes, f2);               /* +1 here too */
        }
    }
    fclose(f1);
    fclose(f2);
    return(0);
}
```
The behavior of sprintf with a positive numeric width specifier is to pad with spaces on the left, so the space you tried to insert by hand is redundant. But you have to make sure that the target field is wider than the string you are printing, or there will be no padding at all.
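Here is a minimal standalone sketch of that padding behavior (the plain-ASCII string is just an illustration, not part of the original program):

```c
#include <stdio.h>

int main(void)
{
    const char *word = "abc";        /* 3 bytes long */

    printf("[%*s]\n", 5, word);      /* width 5 > 3: prints "[  abc]", padded on the left */
    printf("[%*s]\n", 3, word);      /* width 3 == 3: prints "[abc]", no padding at all   */
    printf("[%-*s]\n", 5, word);     /* the '-' flag pads on the right: prints "[abc  ]"  */
    return 0;
}
```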
To make this answer self-sufficient, I will repeat what others have already said. A traditional char is always exactly one byte, but one character in UTF-8 is usually not exactly one byte, unless all of your characters are actually ASCII. One of the attractions of UTF-8 is that legacy C code does not need to know anything about UTF-8 in order to keep working, but of course the assumption that one char is one glyph no longer holds. (As you can see, for example, the glyph п in "привет мир" ("hello world") is represented by two bytes, and therefore two chars: "\320\277".)
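As an illustration (not part of the original program), here is a minimal sketch that counts bytes versus code points in the UTF-8 string above by skipping continuation bytes, which always have the bit pattern 10xxxxxx:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *s = "\320\277\321\200\320\270\320\262\320\265\321\202"; /* "привет" */
    size_t bytes = strlen(s);   /* counts char units, i.e. bytes: 12 */
    size_t codepoints = 0;

    for (const char *p = s; *p; p++)
        if (((unsigned char)*p & 0xC0) != 0x80)   /* skip 10xxxxxx continuation bytes */
            codepoints++;

    printf("%zu bytes, %zu code points\n", bytes, codepoints);  /* 12 bytes, 6 code points */
    return 0;
}
```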
This is clearly less than ideal, but it demonstrates that you can treat UTF-8 as "just bytes" if your code doesn't really care about glyph semantics. If it does, you are better off switching to wchar_t, as outlined, for example, here: http://www.gnu.org/software/libc/manual/html_node/Extended-Char-Intro.html
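For instance, a minimal sketch along the lines of that manual, assuming the program runs under a UTF-8 locale (the buffer size of 64 is an arbitrary choice for this example):

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

int main(void)
{
    setlocale(LC_ALL, "");   /* assumes the environment selects a UTF-8 locale */

    const char *utf8 = "\320\277\321\200\320\270\320\262\320\265\321\202"
                       " \320\274\320\270\321\200";                  /* "привет мир" */
    wchar_t wide[64];
    size_t n = mbstowcs(wide, utf8, 64);   /* decode the multibyte string into wide characters */

    if (n == (size_t)-1) {
        perror("mbstowcs");
        return 1;
    }
    printf("%zu wide characters\n", n);    /* 10: one wchar_t per glyph */
    return 0;
}
```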
However, the standard wchar_t is less than ideal when the standard expectation is UTF-8. See, for example, the GNU libunistring documentation for a less intrusive alternative and a bit of background. With it, you can replace char with uint8_t and the various str* functions with their u8_str* replacements and be done. The assumption that one glyph equals one byte still needs to be revisited, but that will be a minor change in your sample program. An adaptation is available at http://ideone.com/p0VfXq (although, unfortunately, the library is not available on http://ideone.com/, so it could not be demonstrated there).
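As a rough sketch of what that looks like, assuming GNU libunistring is installed and the program is linked with -lunistring (u8_strlen and u8_mbsnlen are the libunistring counterparts of strlen and a character count, respectively):

```c
/* compile with: gcc example.c -lunistring */
#include <stdio.h>
#include <stdint.h>
#include <unistr.h>   /* u8_* functions from GNU libunistring */

int main(void)
{
    const uint8_t *pat = (const uint8_t *)
        "\320\277\321\200\320\270\320\262\320\265\321\202"
        " \320\274\320\270\321\200";               /* "привет мир" */

    size_t units = u8_strlen(pat);                 /* byte (unit) count, like strlen: 19 */
    size_t chars = u8_mbsnlen(pat, units);         /* Unicode character count: 10 */

    printf("%zu bytes, %zu characters\n", units, chars);
    return 0;
}
```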