Reversing Arabic strings extracted with abcpdf.net

Hi

I used abcpdf.net to extract text from Arabic PDF files with the read(pdfpath) and gettext() functions. The resulting string is unreadable: Arabic is a right-to-left language, and the Arabic parts come out with their characters in reverse order. I need to reverse the Arabic parts of the string to make them readable, but I don't know how to do it. How can I extract only the Arabic parts and reverse them?

I am using C#, and here is an example of the lines extracted from my PDF with the abcpdf.net library:

0.00 KCCUSER1 6:17:19 PM28 / 10/2010 ةعابطلا خيرات

(200) لوادتملا زكارمو تاكرح

ةصاقملل ةيتيوكلا ةكرشلا

10/28/2010

RBKPI012

لمعلا خيرات

عمجم / ح - 88 لجلا عيبلل افيا ةيلودلا ةيلاملا تاراشتسلا ةكرش - 65646

C023

يحاتتفلا ديصرلا

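// Requires: using System.Linq; using System.Text;
// Copies the text and reverses each unbroken run of Arabic characters.
// Any other character (including a space) ends the current run and is copied as-is,
// so each Arabic word is reversed on its own.
private string Convert(string source)
{
    StringBuilder arabicWord = new StringBuilder();
    StringBuilder sbDestination = new StringBuilder();

    foreach (var ch in source)
    {
        if (IsArabic(ch))
        {
            arabicWord.Append(ch);
        }
        else
        {
            // A non-Arabic character closes the pending Arabic run
            if (arabicWord.Length > 0)
                sbDestination.Append(Reverse(arabicWord.ToString()));

            sbDestination.Append(ch);
            arabicWord.Clear();
        }
    }

    // Flush the pending run if the text ends with an Arabic word
    if (arabicWord.Length > 0)
        sbDestination.Append(Reverse(arabicWord.ToString()));

    return sbDestination.ToString();
}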

IsArabic

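// Returns true if the character falls in one of the Arabic ranges checked below
private bool IsArabic(char character)
{
    // Arabic block (U+0600..U+06FF)
    if (character >= 0x600 && character <= 0x6ff)
        return true;

    // Arabic Supplement (U+0750..U+077F)
    if (character >= 0x750 && character <= 0x77f)
        return true;

    // First part of Arabic Presentation Forms-A (the block itself runs to U+FDFF)
    if (character >= 0xfb50 && character <= 0xfc3f)
        return true;

    // Arabic Presentation Forms-B up to U+FEFC (the block itself runs to U+FEFF)
    if (character >= 0xfe70 && character <= 0xfefc)
        return true;

    return false;
}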

// Reverse the characters of a string (Reverse() and ToArray() come from System.Linq)
private string Reverse(string source)
{
    return new string(source.ToCharArray().Reverse().ToArray());
}
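
Note that Convert treats a space as the end of an Arabic run, so each word is reversed individually while the words themselves stay in their extracted order. The following is a minimal sketch of a variant that keeps the spaces between Arabic words inside the run, so that a multi-word phrase is reversed as a whole. The names RtlRunFixer and ReverseArabicRuns are hypothetical and not part of abcpdf.net; the character ranges are the same as in IsArabic above. It assumes the extracted text is in plain visual order and does nothing for digits or Latin text, which are copied unchanged.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of abcpdf.net.
public static class RtlRunFixer
{
    // Same ranges as the IsArabic method above.
    private static bool IsArabic(char c) =>
        (c >= 0x0600 && c <= 0x06FF) ||
        (c >= 0x0750 && c <= 0x077F) ||
        (c >= 0xFB50 && c <= 0xFC3F) ||
        (c >= 0xFE70 && c <= 0xFEFC);

    // Reverses each run of Arabic words, where a run may contain spaces
    // as long as they sit between two Arabic characters.
    public static string ReverseArabicRuns(string source)
    {
        var result = new StringBuilder(source.Length);
        int i = 0;

        while (i < source.Length)
        {
            if (!IsArabic(source[i]))
            {
                // Non-Arabic characters are copied unchanged.
                result.Append(source[i]);
                i++;
                continue;
            }

            // Find the last Arabic character of the run that starts at i.
            // Spaces are skipped over, but only count as part of the run
            // if another Arabic character follows them.
            int end = i;
            int j = i + 1;
            while (j < source.Length)
            {
                if (IsArabic(source[j])) { end = j; j++; }
                else if (source[j] == ' ') { j++; }
                else break;
            }

            // Append the run [i..end] back to front.
            for (int k = end; k >= i; k--)
                result.Append(source[k]);

            i = end + 1;
        }

        return result.ToString();
    }
}
```

For example, passing the extracted line لمعلا خيرات to RtlRunFixer.ReverseArabicRuns returns تاريخ العمل, whereas reversing word by word would leave the two words in the wrong order. On lines that mix Arabic with numbers or Latin text, the position of the Arabic segments relative to the other segments is not changed, so such lines may still need manual checking.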


Source: https://habr.com/ru/post/1774074/

