Massaging a fixed-width source file using C#

Problem

Current data

........Column 1....Column 2.......Column3....Column 4
Row1...........0...........0.............0...........Y
Row2.......3142.56...........500............0...........N
Row3.......3142.56...........500............0...........N

The source file has fixed-width columns. The program that exports it does not count the digits after the decimal point as part of the reserved fixed width.

  • Row 1 is the normal output and works fine.
  • Rows 2 and 3 have 2 decimal places, so columns 2, 3, 4 ... are shifted to the right by 2 places.

I created a C# script that rewrites the file and tries to solve this problem.

I found a way to read a row and split it into columns. Each column becomes a string variable. However, I need to determine whether the string matches the pattern "0-9" followed by ".". Then I need to count how many decimal places follow the pattern, and remove that many spaces from the start of the string.

So

Current Status [_ _ _ _3142.56]

What we want to see after [_ _3142.56]

Attempts so far

So far, I have found that Regex seems to do what I am after. The value of IndexOf(".") can then be used to count the number of positions after the decimal point.

So I came up with the code below:

    // Resolve Decimal Issues
    foreach (object Column in splitLine)
    {
        String CurrentColumn = Column.ToString();
        if (Regex.Match(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$").Success == true)
        {
            // Count how many numbers AFTER a decimal
            int decimalLength = CurrentColumn.Substring(CurrentColumn.IndexOf(".")).Length;
            if (decimalLength >= 1)
            {
                // Remove this amount of places from the start of the string
                CurrentColumn = CurrentColumn.Substring(CurrentColumn.Length - decimalLength);
            }
        }
        //Start re-joining the string
        newLine = newLine + CurrentColumn + "\t";
    }

The problem is that IndexOf returns -1 when it does not find a match, causing an error.

Error stack

    Error: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException: StartIndex cannot be less than zero.
    Parameter name: startIndex
       at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
       at ST_dd38f3d289db4495bf07257723356ed3.csproj.ScriptMain.Main()
       --- End of inner exception stack trace ---
       at System.RuntimeMethodHandle._InvokeMethodFast(Object target, Object[] arguments, SignatureStruct& sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
       at System.RuntimeMethodHandle.InvokeMethodFast(Object target, Object[] arguments, Signature sig, MethodAttributes methodAttributes, RuntimeTypeHandle typeOwner)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture, Boolean skipVisibilityChecks)
       at System.Reflection.RuntimeMethodInfo.Invoke(Object obj, BindingFlags invokeAttr, Binder binder, Object[] parameters, CultureInfo culture)
       at System.RuntimeType.InvokeMember(String name, BindingFlags bindingFlags, Binder binder, Object target, Object[] providedArgs, ParameterModifier[] modifiers, CultureInfo culture, String[] namedParams)
       at System.Type.InvokeMember(String name, BindingFlags invokeAttr, Binder binder, Object target, Object[] args, CultureInfo culture)
       at Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTATaskScriptingEngine.ExecuteScript()

So, I'm a little confused about what I can do to solve this problem. I think I am on the right track ... but this last error has thrown me a little.

+4
3 answers

I think your logic is wrong.

Given bbbb123.45 ( b is a space), your logic will give decimalLength of 3. CurrentColumn.Substring(CurrentColumn.Length - decimalLength) will return .45 .

What you really want is CurrentColumn.Substring(decimalLength) , which skips the first 3 characters and returns b123.45 .
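To make the arithmetic concrete, here is a small standalone sketch that runs both expressions on the example string. The class name SubstringDemo is made up for illustration and is not part of the question's script.

```csharp
using System;

public static class SubstringDemo
{
    public static void Main()
    {
        // Four leading spaces, as in the example above.
        string currentColumn = "    123.45";

        // The question's calculation: the length of everything from the "." onward.
        int decimalLength = currentColumn.Substring(currentColumn.IndexOf(".")).Length;
        Console.WriteLine(decimalLength); // 3

        // Keeps only the LAST decimalLength characters.
        Console.WriteLine("[" + currentColumn.Substring(currentColumn.Length - decimalLength) + "]"); // [.45]

        // Drops the FIRST decimalLength characters.
        Console.WriteLine("[" + currentColumn.Substring(decimalLength) + "]"); // [ 123.45]
    }
}
```

Note that decimalLength here counts the "." as well as the two digits after it.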

The corrected approach is very similar to yours:

    // Resolve Decimal Issues
    foreach (object Column in splitLine)
    {
        String CurrentColumn = Column.ToString();
        if (Regex.IsMatch(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$"))
        {
            // If there is a decimal point, remove characters from the front
            // of the string to compensate for the decimal portion.
            int decimalPos = CurrentColumn.IndexOf(".");
            if (decimalPos != -1)
            {
                CurrentColumn = CurrentColumn.Substring(CurrentColumn.Length - decimalPos);
            }
        }
        //Start re-joining the string
        newLine = newLine + CurrentColumn + "\t";
    }

This will break badly, by the way, if the length of the decimal part exceeds the number of spaces at the beginning of the string, because it will start cutting into the digits. From your description, I do not think this is a problem. But it is something to keep in mind.
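If that case ever needs guarding against, one option is to cap the number of removed characters at the number of leading spaces actually present. The following is only a sketch under that assumption: the class FixedWidthFixer and the method CompensateForDecimals are hypothetical names, and the method assumes the column has already been identified as numeric.

```csharp
using System;

public static class FixedWidthFixer
{
    // Hypothetical helper: removes one leading space for each character of the
    // decimal portion (the "." plus the digits after it), but never more than
    // the number of leading spaces actually present.
    public static string CompensateForDecimals(string column)
    {
        int decimalPos = column.IndexOf('.');
        if (decimalPos == -1)
        {
            return column; // no decimal point, nothing to compensate for
        }

        int decimalPortion = column.Length - decimalPos;
        int leadingSpaces = column.Length - column.TrimStart(' ').Length;
        int toRemove = Math.Min(decimalPortion, leadingSpaces);
        return column.Substring(toRemove);
    }

    public static void Main()
    {
        Console.WriteLine("[" + CompensateForDecimals("    123.45") + "]"); // [ 123.45]
        Console.WriteLine("[" + CompensateForDecimals(" 3142.56") + "]");   // [3142.56]
        Console.WriteLine("[" + CompensateForDecimals("       0") + "]");   // [       0]
    }
}
```

In the capped case (the second call) the digits survive, but the column is still wider than intended, so the following columns stay misaligned. The cap only prevents data loss.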

+2

Try the following:

    // Resolve Decimal Issues
    foreach (object Column in splitLine)
    {
        String CurrentColumn = Column.ToString();
        char[] s = {'.'};
        if (Regex.Match(CurrentColumn, @"^[0-9]+(\.[0-9]+)?$").Success && CurrentColumn.Contains("."))
        {
            // Count how many numbers AFTER a decimal
            int decimalLength = CurrentColumn.Split(s, StringSplitOptions.None)[1].Length;
            if (decimalLength >= 1)
            {
                // Remove this amount of places from the start of the string
                CurrentColumn = CurrentColumn.Substring(decimalLength);
            }
        }
        //Start re-joining the string
        newLine = newLine + CurrentColumn + "\t";
    }
0

A concise, tight, LINQ-based approach is shown below. There is no need to search for anything: just split, trim, pad and rebuild. This actually (I just noticed) works for any text file that needs to be made fixed-width.

    // "inputData" is assumed to contain the whole source file
    const int desiredFixedWidth = 12; // How wide do you want your columns ?
    const char paddingChar = ' ';     // What char do you want to pad your columns with?

    // Step 1: Split the lines
    var srcLines = inputData.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

    // Step 2: Split up each line, ditch extra chars, pad the values, rebuild the file
    var outLines = srcLines.Select(s =>
        string.Join(paddingChar.ToString(),
            s.Split(new string[] { paddingChar.ToString() }, StringSplitOptions.RemoveEmptyEntries)
             .Select(l => l.PadLeft(desiredFixedWidth, paddingChar))));

On a side note, the "generator" of your broken file should be fixed to adhere to the required width ...

0

Source: https://habr.com/ru/post/1486984/

