Fast and efficient way to read a whitespace-separated file of numbers into an array?

I need a fast, efficient way to read a file of space-separated numbers into an array. The files are formatted as follows:

4 6
1 2 3 4 5 6
2 5 4 3 21111 101
3 5 6234 1 2 3
4 2 33434 4 5 6

The first line gives the array dimensions as [rows columns]. The following lines contain the array data.

The data can also appear without line breaks:

4 6
1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6

I can read the first line and initialize the array with the row and column counts. Then I need to fill the array with the data values. My first idea was to read the file line by line and use the Split function. But the second format gives me pause, because then all of the data would be loaded into memory at once, and some of these files are around 100 MB. The alternative is to read the file in chunks and parse each chunk. Does anyone have a better way to do this?

+3
7 answers

What about:

    static void Main()
    {
        // sample data
        File.WriteAllText("my.data", @"4 6
1 2 3 4 5 6
2 5 4 3 21111 101
3 5 6234 1 2 3
4 2 33434 4 5 6");

        using (Stream s = new BufferedStream(File.OpenRead("my.data")))
        {
            int rows = ReadInt32(s), cols = ReadInt32(s);
            int[,] arr = new int[rows, cols];
            for(int y = 0 ; y < rows ; y++)
                for (int x = 0; x < cols; x++)
                {
                    arr[y, x] = ReadInt32(s);
                }
        }
    }

    private static int ReadInt32(Stream s)
    { // edited to improve handling of multiple spaces etc.
        int b;
        // skip any preceding non-digit characters (spaces, line breaks, etc.)
        while ((b = s.ReadByte()) >= 0 && (b < '0' || b > '9')) {  }
        if (b < 0) throw new EndOfStreamException();

        int result = b - '0';
        while ((b = s.ReadByte()) >= '0' && b <= '9')
        {
            result = result * 10 + (b - '0');
        }
        return result;
    }

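This reads raw bytes and treats only '0' to '9' as digits, so it assumes ASCII (or an ASCII-compatible encoding such as UTF-8) and does not handle negative numbers.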
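To check that this byte-level parser copes with both layouts from the question, here is a self-contained sketch (top-level statements, C# 9 or later). It copies the ReadInt32 logic above and wraps the loop from Main in a hypothetical helper, ReadMatrix, then feeds both sample inputs through a MemoryStream. Because every non-digit byte is skipped, line breaks behave exactly like spaces.

```csharp
using System;
using System.IO;
using System.Text;

// Same logic as ReadInt32 above, copied so the sketch is self-contained.
// Skips any non-digit bytes, so spaces and line breaks are interchangeable.
static int ReadInt32(Stream s)
{
    int b;
    while ((b = s.ReadByte()) >= 0 && (b < '0' || b > '9')) { }
    if (b < 0) throw new EndOfStreamException();
    int result = b - '0';
    while ((b = s.ReadByte()) >= '0' && b <= '9')
        result = result * 10 + (b - '0');
    return result;
}

// Hypothetical helper: the loop from Main above, returning the filled array.
static int[,] ReadMatrix(Stream s)
{
    int rows = ReadInt32(s), cols = ReadInt32(s);
    var arr = new int[rows, cols];
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < cols; x++)
            arr[y, x] = ReadInt32(s);
    return arr;
}

string multiLine = "4 6\n1 2 3 4 5 6\n2 5 4 3 21111 101\n3 5 6234 1 2 3\n4 2 33434 4 5 6";
string oneLine = "4 6\n1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6";
int[,] m1 = ReadMatrix(new MemoryStream(Encoding.ASCII.GetBytes(multiLine)));
int[,] m2 = ReadMatrix(new MemoryStream(Encoding.ASCII.GetBytes(oneLine)));
Console.WriteLine(m1[1, 4] + " " + m2[3, 2]); // 21111 33434
```

Both inputs produce the same 4 x 6 array, and with a BufferedStream over a FileStream (as in Main) only a small buffer is in memory at any time.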

+1


+2


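using System;
using System.Collections.Generic;
using System.IO;
// Character-by-character parsing: accumulate digits, emit a number at whitespace.
var numbers = new List<int>();
using (var reader = new StreamReader("my.data")) // file name assumed, as in the first answer
{
    int counter = 0;
    bool inNumber = false; // true while digits are being accumulated
    int ch;
    while ((ch = reader.Read()) != -1)
    {
        char c = (char)ch;
        if (c >= '0' && c <= '9') {
            counter = counter * 10 + (c - '0');
            inNumber = true;
        } else if (char.IsWhiteSpace(c)) {
            if (inNumber) numbers.Add(counter); // runs of whitespace emit nothing
            counter = 0;
            inNumber = false;
        } else {
            throw new FormatException("Unexpected character: " + c);
        }
    }
    if (inNumber) numbers.Add(counter); // the last number may have no trailing whitespace
}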


+2


0


// Split on any whitespace (spaces and line breaks) and drop empty entries.
var fileData = File.ReadAllText(...).Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
// ToArray() parses once; a lazy Select would re-parse on every enumeration.
int[] convertedToNumbers = fileData.Select(entry => int.Parse(entry)).ToArray();
int rows = convertedToNumbers[0];
int columns = convertedToNumbers[1];
// Now we have the number of rows, number of columns, and the data.
int[,] resultData = new int[rows, columns];
// Data values start at index 2 (after rows and columns), in row-major order.
for (int i = 0; i < rows; i++)
    for (int j = 0; j < columns; j++)
        resultData[i, j] = convertedToNumbers[2 + i * columns + j];


0


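using System.Collections.Generic;
using System.IO;
using System.Text;

// Extension methods must live in a static, non-generic class (name is arbitrary).
public static class StreamReaderExtensions
{
    // Yields each whitespace-delimited token, reading one character at a time.
    public static IEnumerable<string> StreamAsSpaceDelimited(this StreamReader reader)
    {
        StringBuilder builder = new StringBuilder();
        int v;
        while ((v = reader.Read()) != -1)
        {
            char c = (char)v;
            if (char.IsWhiteSpace(c))
            {
                if (builder.Length > 0)
                {
                    yield return builder.ToString();
                    builder.Clear();
                }
            }
            else
            {
                builder.Append(c);
            }
        }
        // The last token may not be followed by whitespace.
        if (builder.Length > 0)
            yield return builder.ToString();
    }
}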


// Requires: using System.IO; using System.Linq;
using (StreamReader sr = new StreamReader("filename"))
using (var enumerator = sr.StreamAsSpaceDelimited().Select(s => int.Parse(s)).GetEnumerator())
{
    enumerator.MoveNext();
    int numRows = enumerator.Current;
    enumerator.MoveNext();
    int numColumns = enumerator.Current;
    int r = 0, c = 0;
    // A jagged array cannot be created as new int[n][m]; allocate each row.
    int[][] destArray = new int[numRows][];
    for (int i = 0; i < numRows; i++)
        destArray[i] = new int[numColumns];
    while (enumerator.MoveNext())
    {
        destArray[r][c] = enumerator.Current;
        c++;
        if (c == numColumns)
        {
            c = 0;
            r++;
            if (r == numRows)
                break; // we are done
        }
    }
}

Because this uses an iterator, only the current token is held in memory while the file is read (StreamReader buffers the underlying reads). This is a common approach for parsing large files; LINQ2CSV, for example, works this way.
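If allocating one string per token is a concern, the same streaming idea can parse digits straight out of a character buffer. This is a minimal sketch, assuming non-negative integers separated by whitespace; the name ReadNumbersInChunks and the 64 KB default buffer size are illustrative choices, not part of the answer above. A number that straddles two chunks is handled by carrying the partial value over to the next chunk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Parses whitespace-separated non-negative integers from a reader, one buffer
// at a time, without building a string per token. The partial value and the
// inNumber flag survive across chunks, so a number split by a chunk boundary
// is still read correctly.
static IEnumerable<int> ReadNumbersInChunks(TextReader reader, int chunkSize = 64 * 1024)
{
    char[] buffer = new char[chunkSize];
    int value = 0;
    bool inNumber = false;
    int read;
    while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
    {
        for (int i = 0; i < read; i++)
        {
            char c = buffer[i];
            if (c >= '0' && c <= '9')
            {
                value = value * 10 + (c - '0');
                inNumber = true;
            }
            else if (char.IsWhiteSpace(c))
            {
                if (inNumber) yield return value;
                value = 0;
                inNumber = false;
            }
            else
            {
                throw new FormatException($"Unexpected character '{c}'.");
            }
        }
    }
    if (inNumber) yield return value; // last number with no trailing whitespace
}

// A 3-character buffer forces multi-digit numbers across chunk boundaries.
var nums = new List<int>(ReadNumbersInChunks(new StringReader("4 6\n1 2 3 21111 101"), chunkSize: 3));
Console.WriteLine(string.Join(",", nums)); // 4,6,1,2,3,21111,101
```

For a real file, pass a StreamReader opened on it; the first two values yielded are the dimensions, and the rest can be copied into the array as in the usage code above.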

0

Here are two methods.

IEnumerable<int[]> GetArrays(string filename, bool skipFirstLine)
{
    using (StreamReader reader = new StreamReader(filename))
    {
        if (skipFirstLine && !reader.EndOfStream)
            reader.ReadLine();

        while (!reader.EndOfStream)
        {
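            string temp = reader.ReadLine();
            // Split on any whitespace; RemoveEmptyEntries skips runs of spaces.
            int[] array = temp.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Select(s => int.Parse(s)).ToArray();
            yield return array;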
        }
    }
}

int[][] GetAllArrays(string filename, bool skipFirstLine)
{
    int skipNumber = skipFirstLine ? 1 : 0;
    int[][] array = File.ReadAllLines(filename)
        .Skip(skipNumber)
        .Select(line => line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Select(s => int.Parse(s)).ToArray())
        .ToArray();
    return array;
}

If you are dealing with large files, the first method is preferable because it reads one line at a time. If the files are small, the second loads everything into a jagged array in a single call.
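Both methods return one array per line, which matches the rows only when the data has one row per line. This minimal sketch (hypothetical ReadMatrixByLines, top-level statements, C# 9 or later) keeps the line-by-line reading of the first method but copies values into the rows x cols array in row-major order, so both layouts from the question work.

```csharp
using System;
using System.IO;

// Reads "rows cols" from the first line, then copies every following value
// into a rows x cols array in row-major order, regardless of line breaks.
static int[,] ReadMatrixByLines(TextReader reader)
{
    char[] separators = { ' ', '\t' };
    string headerLine = reader.ReadLine() ?? throw new InvalidDataException("Missing header line.");
    string[] header = headerLine.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    int rows = int.Parse(header[0]), cols = int.Parse(header[1]);
    var result = new int[rows, cols];
    int total = rows * cols;
    int index = 0; // position of the next value in row-major order
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        foreach (string token in line.Split(separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (index == total)
                throw new InvalidDataException($"More than {total} values.");
            result[index / cols, index % cols] = int.Parse(token);
            index++;
        }
    }
    if (index != total)
        throw new InvalidDataException($"Expected {total} values, found {index}.");
    return result;
}

var m = ReadMatrixByLines(new StringReader("4 6\n1 2 3 4 5 6 2 5 4 3 21111 101 3 5 6234 1 2 3 4 2 33434 4 5 6"));
Console.WriteLine(m[2, 2]); // 6234
```

For the single-line layout, that one line is still read into memory in full, so for very large single-line files a character-level reader like the ones in the other answers is the better fit.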

0

Source: https://habr.com/ru/post/1748035/

