Remove character from string

I am having problems with a fairly simple task; I feel like I have missed something very obvious.

I have a CSV file separated by semicolons. Several numbers in this file contain dots, like "1.300", but there are also dates like "2015.12.01". The task is to find and delete all dots, but only those in numbers, not those in dates. Dates and numbers are kept completely separate and never appear in the same position in the file.

My question now is: What is the best way to solve this problem?

From a programmer's point of view: is it a good solution to simply split the line on each semicolon, count the dots in each field and, if there is only one dot, delete it? This is the only way I could think of to solve the problem.


An example of the source file:

2015.12.01; 13.100; 500; 1.200; 100; 

Result:

 2015.12.01; 13100; 500; 1200; 100; 
3 answers

The source file looks like a valid file generated by a program running on a machine whose locale uses . as the thousands separator (as most European locales do) and as the date separator (only German locales do). Such locales also use ; as the list separator.

If the question were only how to parse such dates and numbers, the answer would be to pass the correct culture to the parsing function, for example: decimal.Parse("13.500",new CultureInfo("de-at")) returns 13500. The actual problem is that the data must be passed to another program that uses . as the decimal separator.
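A minimal sketch of that culture-aware parsing, assuming the exporter ran under de-DE (an assumption; the example above uses de-at), where . is the group separator. The helper name ToInvariantNumber is hypothetical. It parses the German-formatted text and writes the value back out with the invariant culture, which uses no grouping and . as the decimal point.

```csharp
using System;
using System.Globalization;

// Assumption: the exporter ran under de-DE, where '.' is the group (thousands) separator.
var exporterCulture = CultureInfo.GetCultureInfo("de-DE");

// Hypothetical helper: parse a German-formatted number and re-emit it
// in invariant form (no group separators, '.' as decimal point).
string ToInvariantNumber(string field) =>
    decimal.Parse(field, NumberStyles.Number, exporterCulture)
           .ToString(CultureInfo.InvariantCulture);

Console.WriteLine(ToInvariantNumber("13.100")); // 13100
Console.WriteLine(ToInvariantNumber("1.200"));  // 1200
```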

The safest option would be to change the locale used by the exporting program, for example by changing the thread's CultureInfo if the exporter is a .NET program, or the locale in the SSIS package, to an en-GB-type locale, so that it exports numbers with . and avoids the weird date format. This assumes that the next program in the pipeline does not expect German-style dates combined with English-style numbers.

Another option is to load the text, parse the fields using the appropriate culture, and then export them in the format that the next program requires.
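A sketch of that load, parse and re-export approach for a single line. It rests on assumptions that are not in the question: numbers were written under de-DE, dates always use the yyyy.MM.dd layout, and the next program wants invariant-culture numbers. ConvertField and ConvertLine are hypothetical names, and surrounding whitespace is trimmed from each field.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Assumptions: numbers were formatted under de-DE, dates as yyyy.MM.dd,
// and the consumer wants invariant numbers (no grouping, '.' as decimal point).
var source = CultureInfo.GetCultureInfo("de-DE");
var invariant = CultureInfo.InvariantCulture;

string ConvertField(string raw)
{
    string field = raw.Trim();
    // Check for a date first: a lenient numeric parse might otherwise accept it.
    if (DateTime.TryParseExact(field, "yyyy.MM.dd", invariant, DateTimeStyles.None, out _))
        return field; // leave dates untouched
    if (decimal.TryParse(field, NumberStyles.Number, source, out var number))
        return number.ToString(invariant); // re-emit the number in invariant form
    return field; // text or empty field: pass through unchanged
}

string ConvertLine(string line) =>
    string.Join(";", line.Split(';').Select(ConvertField));

Console.WriteLine(ConvertLine("2015.12.01; 13.100; 500; 1.200; 100; "));
// 2015.12.01;13100;500;1200;100;
```

Because each field is actually parsed, a decimal comma such as 1,50 is converted as well, to 1.50.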

Finally, a regular expression can be used to match only the numeric fields and remove the dot. This can get a little more complicated and depends on the actual content.

For example, (\d+)\.(\d{3}) can be used to match numbers if there is only one thousands separator. This can produce false matches if any text field contains similar values. Alternatively, ;(\d+)\.(\d{3}); matches only a complete field, except for the first and last fields, for example:

 Regex.Replace("1.457;2016.12.30;13.000;1,50;2015.12.04;13.456",@";(\d+)\.(\d{3});",@";$1$2;") 

produces:

 1.457;2016.12.30;13000;1,50;2015.12.04;13.456 

A regular expression that matches a number either between two ; or in the first or last field could be:

  (^|;)(\d+)\.(\d{3})(;|$) 

For example, the following code produces 1457;2016.12.30;13000;1,50;2015.12.04;13456:

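 using System.Text.RegularExpressions;

 var data = "1.457;2016.12.30;13.000;1,50;2015.12.04;13.456";
 // Groups 1 and 4 keep the field boundaries; groups 2 and 3 are the digits around the dot
 var pattern = @"(^|;)(\d+)\.(\d{3})(;|$)";
 var replacement = @"$1$2$3$4";
 var result = Regex.Replace(data, pattern, replacement);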
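The (^|;)...(;|$) pattern consumes the ; on both sides of a match, so when two numeric fields are adjacent (1.200;3.400) only the first is rewritten. A variant using lookbehind and lookahead checks the field boundaries without consuming them. This is a sketch of my own, not the pattern above: it assumes thousands groups are exactly three digits, tolerates spaces around fields (as in the question's sample), and handles numbers with several separators such as 1.234.567. StripThousands is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

// A field is a grouped number if it is 1-3 digits followed by one or more ".ddd" groups.
// The lookbehind/lookahead require a field boundary (start, end or ';', with optional
// spaces) but do not consume it, so adjacent numeric fields can both match.
var thousands = new Regex(@"(?<=(?:^|;)\s*)\d{1,3}(?:\.\d{3})+(?=\s*(?:;|$))");

// Hypothetical helper: remove the dots only inside matched numeric fields.
string StripThousands(string line) =>
    thousands.Replace(line, m => m.Value.Replace(".", string.Empty));

Console.WriteLine(StripThousands("1.457;2016.12.30;13.000;1,50;2015.12.04;13.456"));
// 1457;2016.12.30;13000;1,50;2015.12.04;13456
Console.WriteLine(StripThousands("1.200;3.400"));
// 1200;3400
```

A date like 2016.12.30 never matches, because its four-digit first part cannot be followed by a three-digit group.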

The advantage of regular expressions over splitting and replacing strings is that they are much faster and more efficient. Instead of creating temporary strings for every split and manipulation, Regex only computes indices into the source. A string object is only created when you request the final text. This results in significantly fewer allocations and garbage collections.

Even in medium-sized files, this can lead to up to 10 times better performance.


If you can rely on dates always having two dots and numbers only one, you can use this as a filter:

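 using System.Linq; // for Count on the string's characters

 string s = "123.45";
 // Exactly one dot: treat the value as a number and remove the separator
 if (s.Count(x => x == '.') == 1)
 {
     s = s.Replace(".", null);
 }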
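Applied to a whole line, the same filter could look like this. It is a sketch that assumes the same rule (dates have two dots, numbers at most one); StripSingleDots is a hypothetical name. Fields, including their surrounding spaces, are otherwise kept as they are.

```csharp
using System;
using System.Linq;

// Hypothetical helper: apply the "exactly one dot means a number" rule to every field.
string StripSingleDots(string line) =>
    string.Join(";", line.Split(';').Select(field =>
        field.Count(c => c == '.') == 1
            ? field.Replace(".", string.Empty) // one dot: a number, drop the dot
            : field));                         // zero or two+ dots: keep as-is

Console.WriteLine(StripSingleDots("2015.12.01; 13.100; 500; 1.200; 100; "));
// 2015.12.01; 13100; 500; 1200; 100; 
```

A number with two separators, such as 1.234.567, also has two dots and is left unchanged by this rule.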

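I would not rely on counting dots, as that is error-prone.

You can use double.TryParse to check safely whether a string is a number:

 using System;
 using System.Globalization;
 using System.Linq;

 var data = "2015.12.01;13.100;500;1.200;100;";
 // InvariantCulture treats '.' as the decimal point, so a two-dot date fails to parse
 var fields = data.Split(';').Select(s =>
     double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture, out _)
         ? s.Replace(".", string.Empty) // a number: drop the dot
         : s);                          // not a number (e.g. a date): keep it
 Console.WriteLine(string.Join(";", fields));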

Source: https://habr.com/ru/post/1246539/

