C++: Reading matrices from a file with multiple delimiters

So, I was given a file with ten matrices. I would like to read these matrices from the file and save them so that each matrix is stored either in a vector or in an array. However, the format of these matrices makes reading the data difficult (I'm not very good at reading from input files).

The file has the following format: elements of each matrix are separated by ",", rows are separated by ";", and matrices are separated by "|". For example, three 2 by 2 matrices look like this:

1,2;3,4|0,1;1,0|5,3;3,1|

And I just want to save the matrices into three different vectors, but I'm not sure how to do this.

I tried

    while (getline(inFile, line)) {
        stringstream linestream(line);
        string value;
        while (getline(linestream, value, ',')) {
            // save into vector
        }
    }

But this is obviously very crude, and it only splits the data on commas. Is there a way to split data using multiple delimiters?

Thanks!

4 answers
    string line;
    while (getline(infile, line, '|')) {                  // one matrix per chunk
        stringstream rowstream(line);
        string row;
        while (getline(rowstream, row, ';')) {            // one row per chunk
            stringstream elementstream(row);
            string element;
            while (getline(elementstream, element, ',')) { // one element per chunk
                cout << element << endl;
            }
        }
    }

Starting from the code above, you can add whatever logic you need to store each individual element.
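For example, here is a minimal sketch that stores the values instead of printing them. It assumes each matrix is kept as a `std::vector<std::vector<double>>`; the alias `NestedMatrix` and the functions `readMatrices` and `isBlank` are hypothetical names. It skips blank pieces such as the newline that usually follows the last `|`, and it converts with `std::stod`, which also tolerates the spaces seen in some inputs:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// A matrix stored as a list of rows; each row is a list of values.
using NestedMatrix = std::vector<std::vector<double>>;

// True if `s` holds nothing but whitespace (e.g. the newline after the last '|').
bool isBlank(const std::string& s) {
    return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

// Reads '|'-separated matrices whose rows are separated by ';'
// and whose elements are separated by ','.
std::vector<NestedMatrix> readMatrices(std::istream& in) {
    std::vector<NestedMatrix> matrices;
    std::string matrixText;
    while (std::getline(in, matrixText, '|')) {          // one matrix per chunk
        std::istringstream rowStream(matrixText);
        std::string rowText;
        NestedMatrix matrix;
        while (std::getline(rowStream, rowText, ';')) {  // one row per chunk
            std::istringstream elementStream(rowText);
            std::string element;
            std::vector<double> row;
            while (std::getline(elementStream, element, ',')) {
                if (!isBlank(element)) {
                    // std::stod skips leading spaces; throws std::invalid_argument on junk
                    row.push_back(std::stod(element));
                }
            }
            if (!row.empty()) matrix.push_back(row);
        }
        if (!matrix.empty()) matrices.push_back(matrix);
    }
    return matrices;
}
```

To read from a file, include `<fstream>` and call it as `std::ifstream file("matrices.txt"); std::vector<NestedMatrix> m = readMatrices(file);` (the file name is a placeholder).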


I use my own function to split a string into a vector of strings:

    #include <cassert>
    #include <cstddef>
    #include <string>
    #include <vector>

    /**
     * \brief Split a string into substrings
     * \param sep Character separating the parts
     * \param str String to be split
     * \return Vector containing the split parts
     * \pre The separator cannot be 0
     * \details Example:
     * \code
     * std::string str = "abc.def.ghi..jkl.";
     * std::vector<std::string> split_str = split('.', str);
     * // the vector is ["abc", "def", "ghi", "", "jkl", ""]
     * \endcode
     */
    std::vector<std::string> split(char sep, const std::string& str)
    {
        assert(sep != 0 && "PRE: the separator is null");
        std::vector<std::string> s;
        std::size_t i = 0;                      // start of the current part
        for (std::size_t j = 0; j < str.length(); ++j) {
            if (str[j] == sep) {
                s.push_back(str.substr(i, j - i));
                i = j + 1;
            }
        }
        s.push_back(str.substr(i));             // last part (may be empty)
        return s;
    }

Then, assuming you have a Matrix class, you can do something like this:

    std::string matrices_str;
    std::ifstream matrix_file(matrix_file_name.c_str());
    matrix_file >> matrices_str;                // reads up to the first whitespace
    // with a trailing '|', split() also returns an empty last part
    const std::vector<std::string> matrices = split('|', matrices_str);
    std::vector<Matrix<double> > M(matrices.size());
    for (unsigned long int i = 0; i < matrices.size(); ++i) {
        const std::string& matrix = matrices[i];
        const std::vector<std::string> rows = split(';', matrix);
        for (unsigned long int j = 0; j < rows.size(); ++j) {
            const std::string& row = rows[j];   // j-th row of the i-th matrix
            const std::vector<std::string> elements = split(',', row);
            for (unsigned long int k = 0; k < elements.size(); ++k) {
                const std::string& element = elements[k];
                if (j == 0 && k == 0)
                    M[i].resize(rows.size(), elements.size());
                std::istringstream iss(element);
                iss >> M[i](j, k);
            }
        }
    }

Or, in a more compact form:

    std::string matrices_str;
    std::ifstream matrix_file(matrix_file_name.c_str());
    matrix_file >> matrices_str;
    const std::vector<std::string> matrices = split('|', matrices_str);
    std::vector<Matrix<double> > M(matrices.size());
    for (unsigned long int i = 0; i < matrices.size(); ++i) {
        const std::vector<std::string> rows = split(';', matrices[i]);
        for (unsigned long int j = 0; j < rows.size(); ++j) {
            const std::vector<std::string> elements = split(',', rows[j]);
            for (unsigned long int k = 0; k < elements.size(); ++k) {
                if (j == 0 && k == 0)
                    M[i].resize(rows.size(), elements.size());
                std::istringstream iss(elements[k]);
                iss >> M[i](j, k);
            }
        }
    }
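Both snippets assume a `Matrix<double>` class that offers `resize(rows, cols)` and element access through `operator()(row, col)`. If you don't have one, a minimal hypothetical stand-in with row-major storage could look like this (the `rows()` and `cols()` accessors are extra conveniences that the snippets do not need):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal matrix with the interface used above:
// resize(rows, cols) and element access via operator()(row, col).
template <typename T>
class Matrix {
public:
    // Changes the dimensions and resets every element to T() (zero for double).
    void resize(std::size_t rows, std::size_t cols) {
        rows_ = rows;
        cols_ = cols;
        data_.assign(rows * cols, T());
    }

    T& operator()(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_ && "index out of range");
        return data_[r * cols_ + c];
    }

    const T& operator()(std::size_t r, std::size_t c) const {
        assert(r < rows_ && c < cols_ && "index out of range");
        return data_[r * cols_ + c];
    }

    std::size_t rows() const { return rows_; }
    std::size_t cols() const { return cols_; }

private:
    std::size_t rows_ = 0;
    std::size_t cols_ = 0;
    std::vector<T> data_;  // row-major: element (r, c) lives at r * cols_ + c
};
```

Storing everything in one `std::vector` keeps each matrix contiguous in memory; the `assert` checks only run in builds without `NDEBUG`.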

You can use the concept of a finite state machine. At each step you determine the state: read one character and decide what it is (a digit or a separator).

Here is how you could do it. For more background, look up text parsing, finite state machines, lexical analyzers and formal grammars.

    #include <cctype>

    enum State { DECIMAL_NUMBER, COMMA_D, SEMICOLON_D, PIPE_D, ERROR_STATE };

    char GetChar()
    {
        // implement proper reading from file
        static const char* input = "1,2;3,4|0,1;1,0|5,3;3,1|";
        static int index = 0;
        return input[index++];
    }

    State GetState(char c)
    {
        if (std::isdigit(static_cast<unsigned char>(c))) {
            return DECIMAL_NUMBER;
        } else if (c == ',') {
            return COMMA_D;
        } else if (c == ';') {
            return SEMICOLON_D;
        } else if (c == '|') {
            return PIPE_D;
        }
        return ERROR_STATE;
    }

    int main()
    {
        char c;
        while ((c = GetChar()) != '\0') {
            State s = GetState(c);
            switch (s) {
            case DECIMAL_NUMBER: // read numbers
                break;
            case COMMA_D:        // append into row
                break;
            case SEMICOLON_D:    // next row
                break;
            case PIPE_D:         // finish one matrix
                break;
            case ERROR_STATE:    // syntax error
                break;
            }
        }
        return 0;
    }
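One way to fill in that skeleton, as a sketch: it works on a `std::string` instead of `GetChar()`, collects number characters in a buffer and converts them with `std::stod` when a separator or whitespace arrives. Treating `.` and `-` as part of a number and throwing `std::runtime_error` on any other character are assumptions; `Token`, `classify`, `parseMatrices` and `NestedMatrix` are hypothetical names:

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

using NestedMatrix = std::vector<std::vector<double>>;

// Character classes, playing the role of the State enum in the skeleton.
enum class Token { Number, Comma, Semicolon, Pipe, Space, Error };

Token classify(char c) {
    unsigned char u = static_cast<unsigned char>(c);
    if (std::isdigit(u) || c == '.' || c == '-') return Token::Number;
    if (c == ',') return Token::Comma;
    if (c == ';') return Token::Semicolon;
    if (c == '|') return Token::Pipe;
    if (std::isspace(u)) return Token::Space;
    return Token::Error;
}

std::vector<NestedMatrix> parseMatrices(const std::string& input) {
    std::vector<NestedMatrix> matrices;
    NestedMatrix matrix;      // matrix being built
    std::vector<double> row;  // row being built
    std::string number;       // characters of the number being read

    // Converts the buffered characters (if any) into a value of the current row.
    auto flushNumber = [&]() {
        if (!number.empty()) {
            row.push_back(std::stod(number));
            number.clear();
        }
    };

    for (char c : input) {
        switch (classify(c)) {
        case Token::Number:     // read numbers
            number += c;
            break;
        case Token::Comma:      // append into row
            flushNumber();
            break;
        case Token::Semicolon:  // next row
            flushNumber();
            matrix.push_back(row);
            row.clear();
            break;
        case Token::Pipe:       // finish one matrix
            flushNumber();
            matrix.push_back(row);
            row.clear();
            matrices.push_back(matrix);
            matrix.clear();
            break;
        case Token::Space:      // whitespace ends a number
            flushNumber();
            break;
        case Token::Error:      // syntax error
            throw std::runtime_error(std::string("unexpected character: ") + c);
        }
    }
    return matrices;
}
```

In this sketch, anything after the last `|` is silently dropped, so the input is expected to end with a pipe as in the question's example.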

Your example really maps onto a very simple byte-at-a-time machine.

Start with a zero matrix and something that keeps track of where in the matrix you are writing. Read one character at a time: if the character is a digit, multiply the current number in the matrix by 10 and add the digit to it; if it is a comma, move to the next number in the row; if it is a semicolon, move to the next row; if it is a pipe, start a new matrix.
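A minimal sketch of that integer-only machine, assuming non-negative whole numbers and a `long` element type (`IntMatrix`, `zeroMatrix` and `parseIntMatrices` are hypothetical names). Rows grow as commas and semicolons arrive, so the dimensions need not be known in advance, and every new element starts at zero:

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

using IntMatrix = std::vector<std::vector<long>>;

// A 1x1 matrix holding a single zero: the starting point of every matrix.
IntMatrix zeroMatrix() { return IntMatrix(1, std::vector<long>(1, 0)); }

std::vector<IntMatrix> parseIntMatrices(const std::string& input) {
    std::vector<IntMatrix> matrices(1, zeroMatrix());
    bool written = false;  // has anything been read since the last '|'?
    for (char c : input) {
        IntMatrix& m = matrices.back();     // current matrix
        std::vector<long>& row = m.back();  // current row; row.back() is the current number
        if (std::isdigit(static_cast<unsigned char>(c))) {
            row.back() = row.back() * 10 + (c - '0');  // shift in the new digit
            written = true;
        } else if (c == ',') {
            row.push_back(0);                       // next number in the row
            written = true;
        } else if (c == ';') {
            m.push_back(std::vector<long>(1, 0));   // next row
            written = true;
        } else if (c == '|') {
            matrices.push_back(zeroMatrix());       // start a new matrix
            written = false;
        }
        // Any other character (e.g. whitespace) is ignored in this sketch.
    }
    if (!written) matrices.pop_back();  // drop the untouched matrix opened by a final '|'
    return matrices;
}
```

The only state is the write position (the last matrix, its last row, that row's last element), which is why no explicit state enum is needed.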

You might not want to do it this way if the numbers are floating point. In that case I would collect their characters in a buffer and use a standard floating-point parsing function. Other than that, you don't need to maintain complex state or write a large parser. You might want to add error handling later, but even that is fairly trivial, because it depends only on the character you are currently looking at.
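For floating-point input, here is a sketch in the same spirit. Rather than copying characters into a separate buffer, it lets `std::strtod` parse each number directly from the input and report where it stopped; a single character after the number then decides what happens next, which keeps error handling local to that character. Using `std::strtod` is a swapped-in choice for the "standard method", and it relies on the default "C" locale, where `,` is not a decimal separator. `NestedMatrix`, `skipSpace` and `parseWithStrtod` are hypothetical names:

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

using NestedMatrix = std::vector<std::vector<double>>;

// Skips spaces, tabs and newlines.
const char* skipSpace(const char* p) {
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    return p;
}

// Alternates "parse one number" and "look at one separator".
std::vector<NestedMatrix> parseWithStrtod(const std::string& input) {
    std::vector<NestedMatrix> matrices;
    NestedMatrix matrix;
    std::vector<double> row;
    const char* p = skipSpace(input.c_str());
    while (*p != '\0') {
        char* end = nullptr;
        double value = std::strtod(p, &end);  // sets `end` just past the number
        if (end == p) throw std::runtime_error("expected a number");
        row.push_back(value);

        const char* sep = skipSpace(end);     // allow spaces before the separator
        switch (*sep) {                       // one character decides what happens
        case ',':                             // next element of the row
            break;
        case ';':                             // row finished
            matrix.push_back(row);
            row.clear();
            break;
        case '|':                             // matrix finished
            matrix.push_back(row);
            row.clear();
            matrices.push_back(matrix);
            matrix.clear();
            break;
        default:                              // includes end of input without '|'
            throw std::runtime_error("expected ',', ';' or '|'");
        }
        p = skipSpace(sep + 1);
    }
    return matrices;
}
```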


Source: https://habr.com/ru/post/1264611/

