C#: Parsing and matching patterns with a text file

I need some ideas on how to solve this problem. I have a template file that describes a line in a text file. For instance:

Template

[%f1%]|[%f2%]|[%f3%]"[%f4%]"[%f5%]"[%f6%] 

Text file

 1234|1234567|123"12345"12"123456 

Now I need to read the fields from the text file. In the template file, a field is written as [%some name%]. The template file also specifies the field delimiters, in this example | and ". The length of the fields can vary between files, but the delimiters stay the same. What would be the best way to read in the template and then use it to read in the text file?

EDIT: the text file has several lines, for example:

 1234|1234567|123"12345"12"123456"\r\n
 1234|field|123"12345"12"asdasd"\r\n
 123sd|1234567|123"asdsadf"12"123456"\r\n
 45gg|somedata|123"12345"12"somefield"\r\n

EDIT2: OK, to make it even harder: some fields may contain binary data, and I know the start and end positions of each binary field. I should be able to mark these fields in the template, so that the parser knows the field is binary. How can this be solved?

4 answers

I would build a regex from the template and then use it to parse the text file, like this:

    using System;
    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    class Parser
    {
        // Matches one "[%name%]" field plus the literal delimiter after it, if any.
        private static readonly Regex TemplateRegex =
            new Regex(@"\[%(?<field>[^]]+)%\](?<delim>[^[]+)?");

        private readonly List<string> m_fields = new List<string>();
        private readonly Regex m_textRegex;

        public Parser(string template)
        {
            var textRegexString = "^" + TemplateRegex.Replace(template, Evaluator) + "$";
            m_textRegex = new Regex(textRegexString);
        }

        private string Evaluator(Match match)
        {
            // add field name to collection and create regex for the field
            var fieldName = match.Groups["field"].Value;
            m_fields.Add(fieldName);
            string result = "(.*?)";

            // add delimiter to the regex, if it exists
            // TODO: check, that only last field doesn't have delimiter
            var delimGroup = match.Groups["delim"];
            if (delimGroup.Success)
            {
                string delim = delimGroup.Value;
                result += Regex.Escape(delim);
            }
            return result;
        }

        public IDictionary<string, string> Parse(string text)
        {
            var match = m_textRegex.Match(text);
            // Without this check a non-matching line would yield empty values.
            if (!match.Success)
                throw new FormatException("Text does not match the template.");

            var groups = match.Groups;
            var result = new Dictionary<string, string>(m_fields.Count);
            for (int i = 0; i < m_fields.Count; i++)
                result.Add(m_fields[i], groups[i + 1].Value);
            return result;
        }
    }
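
To apply the same idea to the multi-line file from the question's EDIT, one option is to build the regex once and run it against each line. The following is only a minimal sketch under some assumptions: the names `TemplateLines`, `BuildLineRegex` and `ParseAll` are made up for illustration, field names are assumed not to contain `%` or `]`, and lines that do not fit the template are simply skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

static class TemplateLines
{
    // Matches one "[%name%]" placeholder in the template.
    static readonly Regex Placeholder = new Regex(@"\[%(?<field>[^\]%]+)%\]");

    // Builds an anchored regex with one lazy group per field. The literal
    // text between placeholders is escaped and acts as the delimiter.
    public static Regex BuildLineRegex(string template, List<string> fieldNames)
    {
        var pattern = new StringBuilder("^");
        int pos = 0;
        foreach (Match m in Placeholder.Matches(template))
        {
            pattern.Append(Regex.Escape(template.Substring(pos, m.Index - pos)));
            fieldNames.Add(m.Groups["field"].Value);
            pattern.Append("(.*?)");
            pos = m.Index + m.Length;
        }
        pattern.Append(Regex.Escape(template.Substring(pos))).Append("$");
        return new Regex(pattern.ToString());
    }

    // Splits the text into lines and parses each line with the template.
    public static List<Dictionary<string, string>> ParseAll(string template, string text)
    {
        var names = new List<string>();
        Regex regex = BuildLineRegex(template, names);
        var rows = new List<Dictionary<string, string>>();
        string[] lines = text.Split(new[] { "\r\n", "\n" },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            Match match = regex.Match(line);
            if (!match.Success)
                continue; // assumption: silently skip lines that do not fit

            var row = new Dictionary<string, string>();
            for (int i = 0; i < names.Count; i++)
                row[names[i]] = match.Groups[i + 1].Value;
            rows.Add(row);
        }
        return rows;
    }
}
```

Note that the sample lines in the EDIT end with an extra `"` that the template does not have, so with the template as written that quote would end up inside the last field.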

You can parse the template with a regular expression. An expression like this one will match each field definition and separator:

    Match m = Regex.Match(template, @"^(\[%(?<name>.+?)%\](?<separator>.)?)+$");

The match will contain two named groups (name and separator), each holding one capture for every time it matched in the template string. In your example, the separator group will have one fewer capture than the name group.

You can then iterate over the captures and use the results to extract the fields from the input string and save the values, for example:

    if (m.Success)
    {
        Group name = m.Groups["name"];
        Group separator = m.Groups["separator"];
        int index = 0;
        Dictionary<string, string> fields = new Dictionary<string, string>();
        for (int x = 0; x < name.Captures.Count; ++x)
        {
            // A field without a separator runs to the end of the input.
            int separatorIndex = input.Length;
            if (x < separator.Captures.Count)
                separatorIndex = input.IndexOf(separator.Captures[x].Value, index);
            fields.Add(name.Captures[x].Value, input.Substring(index, separatorIndex - index));
            // Skip past the single-character separator.
            index = separatorIndex + 1;
        }
        // Do something with results.
    }

Obviously, in a real program you will have to consider invalid input, etc., which I did not do here.


I would do this with a few lines of code. Loop through the template line, grabbing all the text between "[" and "]" as the variable name and everything else as the terminator. Then read the input text up to the terminator, assign it to the variable name, and repeat.
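
A rough sketch of that loop, without regular expressions, might look like the following. The names `TemplateWalker`, `ReadTemplate` and `ReadLine` are hypothetical, and the sketch assumes that the template starts with a field and that a field with an empty terminator takes the rest of the line.

```csharp
using System;
using System.Collections.Generic;

static class TemplateWalker
{
    // Splits the template into (field name, terminator) pairs.
    public static List<KeyValuePair<string, string>> ReadTemplate(string template)
    {
        var parts = new List<KeyValuePair<string, string>>();
        int pos = 0;
        while (true)
        {
            int open = template.IndexOf("[%", pos, StringComparison.Ordinal);
            if (open < 0)
                break;
            int close = template.IndexOf("%]", open + 2, StringComparison.Ordinal);
            if (close < 0)
                throw new FormatException("Unclosed field in template.");

            string name = template.Substring(open + 2, close - open - 2);

            // The terminator is the literal text up to the next "[%",
            // or up to the end of the template for the last field.
            int next = template.IndexOf("[%", close + 2, StringComparison.Ordinal);
            int end = next < 0 ? template.Length : next;
            string terminator = template.Substring(close + 2, end - close - 2);

            parts.Add(new KeyValuePair<string, string>(name, terminator));
            pos = end;
        }
        return parts;
    }

    // Reads one line of text, field by field, up to each terminator.
    public static Dictionary<string, string> ReadLine(
        List<KeyValuePair<string, string>> parts, string line)
    {
        var fields = new Dictionary<string, string>();
        int pos = 0;
        foreach (var part in parts)
        {
            // An empty terminator means "take the rest of the line".
            int end = part.Value.Length == 0
                ? line.Length
                : line.IndexOf(part.Value, pos, StringComparison.Ordinal);
            if (end < 0)
                throw new FormatException("Terminator '" + part.Value + "' not found.");

            fields[part.Key] = line.Substring(pos, end - pos);
            pos = end + part.Value.Length;
        }
        return fields;
    }
}
```

Because the terminators are stored as plain strings, they may be longer than one character, and no regex escaping is needed.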


1- Use the sscanf(line, format, __arglist) API for this.

2- Use string splitting, like this:

    public IEnumerable<int> GetDataFromLines(string[] lines)
    {
        // holds the output data
        List<int> data = new List<int>();
        string[] separators = new string[] { "|", "\"" };
        foreach (string line in lines)
        {
            string[] results = line.Split(separators, StringSplitOptions.RemoveEmptyEntries);
            foreach (string result in results)
            {
                // int.Parse throws a FormatException for non-numeric fields
                data.Add(int.Parse(result));
            }
        }
        return data;
    }

Test it with the line:

    string line = "1234|1234567|123\"12345\"12\"123456";
    string[] lines = new string[] { line };
    GetDataFromLines(lines);
    // output list items are: 1234 1234567 123 12345 12 123456
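
The lines in the question's EDIT also contain non-numeric fields such as `field`, which `int.Parse` cannot handle. A small variation that keeps every field as a string could look like this. It is only a sketch: `LineSplitter` and `SplitLines` are hypothetical names, and it uses `StringSplitOptions.None` on the assumption that empty fields should keep their position instead of being dropped.

```csharp
using System;
using System.Collections.Generic;

static class LineSplitter
{
    // The two delimiters used by the template in the question.
    static readonly string[] Separators = { "|", "\"" };

    // Returns one string array of fields per input line.
    public static List<string[]> SplitLines(IEnumerable<string> lines)
    {
        var rows = new List<string[]>();
        foreach (string line in lines)
        {
            // None keeps empty entries, so field positions stay stable.
            rows.Add(line.Split(Separators, StringSplitOptions.None));
        }
        return rows;
    }
}
```

With this option, a line that ends in a delimiter produces one extra empty entry at the end of its array.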

Source: https://habr.com/ru/post/891438/

