Parse some weird text format

I am trying to parse data returned by a third-party application (a TSV file). I have already split the data into individual fields (see TSV file parsing), but I don't know how to decode some of the fields.
Sometimes the data in the field is encapsulated as follows:

=T("[FIELD_DATA]")

(This resembles an Excel formula.)
When this happens, specific characters are escaped using CHAR(ASCII_NUM), and the rest of the string is split into pieces that are each wrapped as in the example above and joined with &. The = sign appears only once, at the beginning of the field.

So, does anyone have any idea how I can parse fields that look like this:

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")&CHAR(10)&T("- Toaster ?!")&CHAR(10)&T("")&CHAR(10)&T("")&CHAR(10)&T("None")&CHAR(10)&T("")&CHAR(10)&T("None")

(There can be any number of CHAR() / T() groups.)

I was thinking of using a regex or looping over the string, but I doubt that is practical. Can anyone help?

2 answers

I would take the same approach as Darin, but his regular expression didn't work for me. I would use this one:

(=T|&CHAR|&T)(\("*([A-Za-z?!0-9 -]*)"*\))+

You will find that Groups[3] holds the data inside the () and, if they are present, inside the "". (Groups[0] is the whole match, Groups[1] is the prefix, and Groups[2] is the parenthesized part including the parentheses and quotes.) For example, this will find:

- Merge User Interface of Global Xtra Alert and EMT Alert

in

=T("- Merge User Interface of Global Xtra Alert and EMT Alert")

and

10

in

&CHAR(10)

If you have:

&T("")

Groups[3] is an empty string.

Hope this helps.
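To make the group numbering concrete, here is a minimal sketch that runs the pattern above and collects the prefix and the inner text of each match. The names GroupDemo and Tokens are hypothetical, chosen only for this illustration. In .NET, groups are numbered by their opening parenthesis, so the prefix is Groups[1] and the innermost capture is Groups[3]. Note that the character class only allows letters, digits, spaces, ?, ! and -, so a piece containing any other character is silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // The answer's pattern as a C# verbatim string (each " is written as "").
    private const string Pattern =
        @"(=T|&CHAR|&T)(\(""*([A-Za-z?!0-9 -]*)""*\))+";

    // Hypothetical helper: returns one (prefix, inner) pair per match.
    // Groups[1] is the prefix (=T, &T or &CHAR); Groups[3] is the text
    // between the parentheses with the surrounding quotes removed.
    public static List<(string Prefix, string Inner)> Tokens(string field)
    {
        var tokens = new List<(string Prefix, string Inner)>();
        foreach (Match m in Regex.Matches(field, Pattern))
        {
            tokens.Add((m.Groups[1].Value, m.Groups[3].Value));
        }
        return tokens;
    }
}
```

Calling GroupDemo.Tokens on =T("None")&CHAR(10)&T("") yields three pairs: (=T, None), (&CHAR, 10) and (&T, empty string).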

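using System;
using System.Text.RegularExpressions;

class Program
{
    public static void Main(string[] args)
    {
        var input = @"=T(""- Merge User Interface of Global Xtra Alert and EMT Alert"")&CHAR(10)&T(""- Toaster ?!"")&CHAR(10)&T("""")&CHAR(10)&T("""")&CHAR(10)&T(""None"")&CHAR(10)&T("""")&CHAR(10)&T(""None"")";
        // Capture the text between the quotes of every T("...") group.
        // CHAR(...) groups are not matched by this pattern and are skipped.
        var matches = Regex.Matches(input, @"T\(""([^""]*)""\)");
        foreach (Match match in matches)
        {
            // Groups[1] is the captured text; print one piece per line.
            Console.WriteLine(match.Groups[1].Value);
        }
    }
}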
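The code above prints only the T("...") pieces. If the goal is to rebuild the original string, one possible extension is to add a second alternative for CHAR(n) and append the corresponding character. This is a sketch under assumptions, not a tested parser for the third-party format: the names ExcelTextField and Decode are hypothetical, the text inside T("...") is assumed to contain no double quotes, and each CHAR argument is assumed to be a small decimal character code.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class ExcelTextField
{
    // Group 1: the text inside T("...").
    // Group 2: the decimal code inside CHAR(...).
    private static readonly Regex Part =
        new Regex(@"T\(""([^""]*)""\)|CHAR\(([0-9]+)\)");

    // Hypothetical helper: concatenates the decoded pieces in order.
    public static string Decode(string field)
    {
        var sb = new StringBuilder();
        foreach (Match m in Part.Matches(field))
        {
            if (m.Groups[2].Success)
            {
                // CHAR(10) becomes a line feed, and so on.
                // Assumes the code fits in an int.
                sb.Append((char)int.Parse(m.Groups[2].Value));
            }
            else
            {
                sb.Append(m.Groups[1].Value);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, ExcelTextField.Decode applied to =T("- Toaster ?!")&CHAR(10)&T("")&CHAR(10)&T("None") returns the three pieces joined by line feeds. Because the quoted text is matched with [^"]*, it is not restricted to a fixed set of characters, unlike a whitelist character class.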

Source: https://habr.com/ru/post/1736359/

