How to replace multi-byte characters with others in C?

I have a UTF-8 text file containing several characters that I would like to replace with others (only those between |( and |)). The problem is that some of these characters are not treated as single characters but as multi-character sequences. (By this I mean that they cannot be written as a character constant like '∞', only as a string like "∞", so as char *?)

Here is my text file:

Text : |(abc∞∪v=|)

For instance:

∞ should be changed to ¤c

∪ to ¸!

= to "

Since some characters (∞ and ∪) are multi-character sequences, I decided to use fscanf to read the text word by word. The problem with this method is that I have to put a space between every character, so my file would have to look like this:

Text : |( a b c ∞ ∪ v = |)

fgetc cannot be used because a character like ∞ is not read as one single char. If I use it, I cannot strcmp the char I read against each sign (a char *). I tried converting my char to char *, but strcmp still returned a nonzero value.

Here is my C code to help you understand my problem:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(void){
    char *carac[]={"∞","=","∪"}; //array with our signs
    FILE *flot,*flot3;
    flot=fopen("fichierdeTest2.txt","r"); // input text file
    flot3=fopen("resultat.txt","w"); //output file
    int i=0,j=0;
    char a[1024]; //array that will contain each read word.
    while(!feof(flot))
    {
        fscanf(flot,"%s",&a[i]);
        if (strstr(&a[i], "|(") != NULL){ // if the word read contains |(  then j=1
            j=1;
            fprintf(flot3,"|(");
        }
        if (strcmp(&a[i], "|)") == 0)
            j=0;
        if(j==1) { //it means we are between |( and |) so the conversion can begin
            if (strcmp(carac[0], &a[i]) == 0) { fprintf(flot3, "¤c"); }
            else if (strcmp(carac[1], &a[i]) == 0) { fprintf(flot3,"\"" ); }
            else if (strcmp(carac[2], &a[i]) == 0) { fprintf(flot3, " ¸!"); }
            else fprintf(flot3,"%s",&a[i]); // when it a letter, number or sign that doesn't need to be converted
        }
        else { // when we are not between |( and |) just copy the word to the output file with a space after it
            fprintf(flot3, "%s", &a[i]);
            fprintf(flot3, " ");
        }
        i++;
    }
}

Thank you in advance for your help!

EDIT: Each character is converted correctly if I put a space between them, but not without the spaces, and that is what I am trying to solve.


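In C, a char is a single byte. An ASCII letter such as c fits in one char, but a character such as ¤ does not: in UTF-8 it is stored as several bytes (several chars).

How many bytes a character takes depends on the encoding of the file: UTF-8, UTF-16 big-endian, UTF-16 little endian, or some 8-bit encoding.

The C source file has an encoding too, so a literal such as "∞" becomes a C string holding whatever bytes the editor saved. strcmp compares those bytes, so it only works if the source file and the text file (the one being read) use the same encoding!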


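To be safe, do not rely on the encoding of the source file. Instead, spell out the bytes explicitly, here assuming UTF-8: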

char *carac[]={
    "\xe2\x88\x9e", // ∞
    "=",
    "\xe2\x88\xaa"}; // ∪

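As an illustration of why ∞ and ∪ cannot be handled one char at a time, here is a minimal sketch that reads the length of a UTF-8 sequence from its first byte. The helper name utf8_seq_len is hypothetical (it is not part of the answer), and it only inspects the high bits of the lead byte without validating the rest of the sequence.

```c
#include <stddef.h>

/* Hypothetical helper: length in bytes of the UTF-8 sequence announced by
   lead byte `c`, judged only from its high bits (no full validation).
   Returns 0 for a continuation byte or a byte that cannot lead a sequence. */
size_t utf8_seq_len(unsigned char c)
{
    if (c < 0x80)           return 1;  /* 0xxxxxxx: plain ASCII, e.g. '=' */
    if ((c & 0xE0) == 0xC0) return 2;  /* 110xxxxx: e.g. first byte of ¤  */
    if ((c & 0xF0) == 0xE0) return 3;  /* 1110xxxx: e.g. first byte of ∞  */
    if ((c & 0xF8) == 0xF0) return 4;  /* 11110xxx                        */
    return 0;                          /* 10xxxxxx or invalid lead byte   */
}
```

With the array above, utf8_seq_len((unsigned char)carac[0][0]) gives 3 for ∞, while the same call on carac[1] gives 1 for the = sign.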
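Next, the comparison itself: the word you read contains the sign followed by more characters, so it is never equal to the sign alone. strcmp will not work here! Compare only the first bytes with strncmp: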

if (strncmp(carac[0], &a[i], strlen(carac[0])) == 0)
{
    fprintf(flot3, "\xC2\xA4""c"); // ¤c
}
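The same prefix test can be run against every sign in turn. The following is only a sketch: match_sign is a hypothetical name, and the table is passed in so that it works with an array like carac.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the index of the first entry of `signs`
   that the string `s` starts with, or -1 if no entry matches. */
int match_sign(const char *s, const char *const signs[], size_t n)
{
    for (size_t k = 0; k < n; k++) {
        /* Compare only as many bytes as the sign itself has. */
        if (strncmp(s, signs[k], strlen(signs[k])) == 0)
            return (int)k;
    }
    return -1;
}
```

On a match, the caller would advance by strlen(signs[k]) bytes, so that a three-byte sign such as ∞ is skipped as a whole.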

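One more point (and the main one): fscanf with %s reads a whole word (everything up to the next whitespace), not a single character. So the word has to be walked through position by position, advancing by the length of whatever was matched. Roughly like this: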

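fscanf(flot,"%s",a);
for (i = 0; a[i] != '\0'; )
{
    if (strncmp(&a[i], "|(", 2) == 0) // start pattern (strncmp returns 0 on a match)
    {
        now_replacing = 1;
        i += 2;
        continue;
    }
    if (now_replacing && strncmp(&a[i], whatever, strlen(whatever)) == 0)
    {
        fprintf(...); // write the replacement for whatever
        i += strlen(whatever); // skip every byte of the matched sign
    }
    else
    {
        // no sign starts here: copy the byte so the loop always advances
        fputc(a[i], output);
        i += 1; // processed just one char
    }
}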
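Put together, the loop might look like the sketch below. It is not the answer's code: convert_line and append are hypothetical names, the replacement table is taken from the question (with ¸! written without a leading space), everything is assumed to be UTF-8, and the markers |( and |) are copied to the output as in the question's program.

```c
#include <stddef.h>
#include <string.h>

/* Signs and replacements from the question, spelled as UTF-8 bytes. */
static const char *const from[] = { "\xe2\x88\x9e", "=", "\xe2\x88\xaa" }; /* ∞ = ∪ */
static const char *const to[]   = { "\xc2\xa4" "c", "\"", "\xc2\xb8" "!" }; /* ¤c " ¸! */
#define NSIGNS (sizeof from / sizeof from[0])

/* Appends n bytes of s if they fit (silently drops them otherwise). */
static size_t append(char *out, size_t len, size_t cap, const char *s, size_t n)
{
    if (len + n < cap) {
        memcpy(out + len, s, n);
        len += n;
        out[len] = '\0';
    }
    return len;
}

/* Hypothetical: copies `in` to `out`, replacing signs only between "|(" and "|)".
   Returns the length of the result. */
size_t convert_line(const char *in, char *out, size_t cap)
{
    size_t len = 0, i = 0;
    int replacing = 0;

    if (cap == 0) return 0;
    out[0] = '\0';
    while (in[i] != '\0') {
        size_t k, n = 1;                       /* default: copy one byte */
        if (strncmp(&in[i], "|(", 2) == 0) { replacing = 1; n = 2; }
        else if (strncmp(&in[i], "|)", 2) == 0) { replacing = 0; n = 2; }
        else if (replacing) {
            for (k = 0; k < NSIGNS; k++) {
                size_t m = strlen(from[k]);
                if (strncmp(&in[i], from[k], m) == 0) {
                    len = append(out, len, cap, to[k], strlen(to[k]));
                    i += m;                    /* skip the whole sign */
                    break;
                }
            }
            if (k < NSIGNS) continue;          /* a sign was replaced */
        }
        len = append(out, len, cap, &in[i], n);
        i += n;
    }
    return len;
}
```

Because the function works on bytes and never splits a matched sign, no spaces are needed between the characters of the input.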

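First, look at what this comparison actually does:

strcmp(carac[0], &a[i])

Take the third word (i = 2). The call compares "∞" with the string that starts at &a[2]. But &a[2] is a whole word, not a single character, and strcmp reports equality only when the two strings match in full, byte for byte. So "∞" is never equal to a longer string such as "abc∞∪v=|)", which is what a holds.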

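Also, you need to know whether the file uses narrow (8-bit) or wide (16-bit) characters. For UTF-16 the test could look like this:

if (8734 == *((short *)&a[i])) { /* character is infinity */ }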

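8734 is the code of the infinity sign in UTF-16.

Note: byte order matters here. Depending on it, the same two bytes read as 8734 (0x221E) or as 7714 (0x1E22).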
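To keep the result independent of the machine's own byte order, the 16-bit value can be assembled from the two bytes by hand instead of going through a pointer cast. This is a minimal sketch; utf16_le and utf16_be are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helpers: build one UTF-16 code unit from two bytes,
   with the byte order stated explicitly. */
uint16_t utf16_le(const unsigned char *p)   /* low byte stored first  */
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint16_t utf16_be(const unsigned char *p)   /* high byte stored first */
{
    return (uint16_t)((p[0] << 8) | p[1]);
}
```

For the bytes 0x1E 0x22, utf16_le gives 0x221E (8734, the infinity sign) and utf16_be gives 0x1E22 (7714), which is exactly the mix-up described in the note.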

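One more thing about how the file is read. According to the documentation, %s matches a sequence of non-whitespace characters and stops at whitespace or at the maximum field width, whichever comes first. The end-of-file flag is set only once a read actually runs into the end of the file: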

//feof = false.
fscanf(flot,"%s",&a[i]);
//feof = true.
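Since feof only becomes true after a read has failed, a loop written as while(!feof(flot)) can run one extra time on a failed read. A common alternative is to test the return value of fscanf itself. The sketch below assumes a hypothetical helper count_words and a 1024-byte buffer like the one in the question; the field width 1023 keeps %s from overflowing it.

```c
#include <stdio.h>

/* Hypothetical helper: counts the whitespace-delimited words in `f`.
   The loop ends as soon as fscanf fails to read a word, so no iteration
   ever runs on stale data. */
long count_words(FILE *f)
{
    char word[1024];
    long n = 0;

    while (fscanf(f, "%1023s", word) == 1)  /* 1 = one item converted */
        n++;
    return n;
}
```

In the question's program, the same condition could replace while(!feof(flot)), with the processing of the word placed inside the loop body.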


Source: https://habr.com/ru/post/1664448/

