Reading from a file stream with an arbitrary (multi-character) delimiter

I ran into the problem of reading messages from a file in C++. Usually people create a file stream and then use std::getline() to extract each message. std::getline() takes an optional third parameter, a delimiter, so that it returns each "line" terminated by that delimiter instead of the default '\n'. However, this delimiter must be a single char. In my use case the delimiter between messages may be something else, such as "|-|", so I'm looking for a solution that accepts a string as the delimiter instead of a char.
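For comparison, here is a minimal sketch of the standard single-character form. The helper name `split_on_char` and the sample strings are illustrative only. Because the third argument of `std::getline` is a single `char`, splitting `"a|-|b"` on `'|'` yields `"a"`, `"-"` and `"b"` rather than the two messages `"a"` and `"b"`.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>
#include <vector>

// Illustrative helper (not a standard function): collect every record of a
// stream using the standard std::getline(istream&, string&, char) overload.
std::vector<std::string> split_on_char(std::istream& in, char delim)
{
    std::vector<std::string> parts;
    std::string part;
    while (std::getline(in, part, delim))   // stops at, and consumes, delim
        parts.push_back(part);
    return parts;
}

// Usage:
//   std::istringstream in("first|second|third");
//   split_on_char(in, '|');   // {"first", "second", "third"}
```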

I searched Stack Overflow a bit and found an interesting post, "Parse (split) a string in C++ using string delimiter (standard C++)". It suggests using string::find() and string::substr() to split on an arbitrary delimiter. However, all the solutions there take a string as input, not a stream. In my case the file is too large to load into memory at once, so it has to be read message by message (or a batch of messages at a time).
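A minimal sketch of the in-memory approach from that post, assuming the whole input is already in a `std::string`; the function name `split` is illustrative. It repeatedly calls `std::string::find` for the delimiter and copies each piece out with `substr`:

```cpp
#include <string>
#include <vector>

// In-memory splitting with std::string::find and std::string::substr.
// The whole input must already be in `s`, which is exactly what a large
// file makes impractical.
std::vector<std::string> split(const std::string& s, const std::string& delim)
{
    std::vector<std::string> parts;
    if (delim.empty()) {               // avoid an endless loop on an empty delimiter
        parts.push_back(s);
        return parts;
    }
    std::string::size_type start = 0, pos;
    while ((pos = s.find(delim, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + delim.size();    // skip past the delimiter
    }
    parts.push_back(s.substr(start));  // text after the last delimiter
    return parts;
}
```

For example, `split("a|-|b|-|c", "|-|")` returns `{"a", "b", "c"}`.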

Reading the gcc implementation of std::getline(), it seems much easier to handle the case where the delimiter is a single char: each time a chunk of characters is loaded, you can always search it for the delimiter and split there. When the delimiter is longer than one char, however, it can straddle two different chunks, which leads to many other corner cases.
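One way around the chunk-boundary problem, sketched under the assumption that records are delivered through a callback (the name `for_each_record` and the `chunk_size` parameter are hypothetical): keep the unfinished tail of the previous chunk, append the next chunk to it, and restart the search `delim.size() - 1` characters before the new data so that a delimiter straddling the boundary is still found.

```cpp
#include <cstddef>
#include <functional>
#include <istream>
#include <sstream>    // std::istringstream, handy for trying it out
#include <stdexcept>
#include <string>

// Sketch: read `in` in fixed-size chunks and call `on_record` for each piece
// between occurrences of `delim`. Only the unfinished tail stays in memory.
void for_each_record(std::istream& in, const std::string& delim,
                     const std::function<void(const std::string&)>& on_record,
                     std::size_t chunk_size = 4096)
{
    if (delim.empty() || chunk_size == 0)
        throw std::invalid_argument("delim and chunk_size must be non-empty");

    std::string pending;                  // data not yet emitted as a record
    std::string chunk(chunk_size, '\0');
    std::size_t search_from = 0;          // where the next find() may start
    while (in.read(&chunk[0], chunk.size()) || in.gcount() > 0) {
        pending.append(chunk, 0, static_cast<std::size_t>(in.gcount()));
        std::size_t pos;
        while ((pos = pending.find(delim, search_from)) != std::string::npos) {
            on_record(pending.substr(0, pos));
            pending.erase(0, pos + delim.size());
            search_from = 0;
        }
        // The last delim.size() - 1 characters may begin a delimiter that
        // continues in the next chunk, so they are searched again next time.
        search_from = pending.size() >= delim.size()
                          ? pending.size() - delim.size() + 1 : 0;
    }
    if (!pending.empty())
        on_record(pending);               // trailing record without a delimiter
}
```

Erasing from the front of `pending` is linear in its length; a tuned version might keep an offset instead, but the boundary handling stays the same.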

I'm not sure whether anyone else has run into this requirement before and how you handled it elegantly. It would be nice to have a standard function, for example istream& getNext(istream&& is, string& str, string delim). This seems like common sense to me. Why isn't it in the standard library, so that people no longer have to implement their own versions separately?

Many thanks


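One option is a small state machine (a transition table) that reads the stream one character at a time: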

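#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// Reads from `in` into `out` until `delimiter` has been consumed.
// Returns false if the stream ended first; `out` then holds what was read.
bool read_until(std::istream& in, std::string& out, const std::string& delimiter)
{
    if (delimiter.empty())
        throw std::invalid_argument("delimiter cannot be empty");
    out.clear();
    const int endState = static_cast<int>(delimiter.size());
    // One row per delimiter character (state = characters matched so far),
    // one column per possible byte value, all zeros initially.
    std::vector<std::vector<int>> table(delimiter.size(), std::vector<int>(256, 0));
    table[0][static_cast<unsigned char>(delimiter[0])] = 1;
    // KMP-style construction: on a mismatch, state i behaves like state `restart`
    // (the longest delimiter prefix that is also a suffix of what was matched),
    // so a delimiter starting inside a failed partial match is not missed.
    for (int i = 1, restart = 0; i < endState; ++i) {
        table[i] = table[restart];
        const unsigned char c = static_cast<unsigned char>(delimiter[i]);
        table[i][c] = i + 1;
        restart = table[restart][c];
    }

    int currentState = 0;
    int read;
    while ((read = in.get()) != std::char_traits<char>::eof()) {
        out.push_back(static_cast<char>(read));   // do your streamy stuff here
        currentState = table[currentState][read]; // get() yields 0..255 or eof
        if (currentState == endState) {
            out.resize(out.size() - delimiter.size()); // drop the delimiter itself
            return true;
        }
    }
    return false;
}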

Note that the table has one column per possible byte value, so this assumes ASCII (single-byte) characters.
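A transition table that returns to state 0 on every mismatch can miss a delimiter that starts inside a failed partial match: with delimiter `"aab"` and input `"aaab"`, the third `'a'` resets the match. The usual fix, borrowed here from the Knuth-Morris-Pratt algorithm rather than taken from the answer, is to fall back to the longest prefix of the delimiter that is also a suffix of what was matched. A minimal sketch of that prefix function:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// KMP prefix function: pi[i] is the length of the longest proper prefix of
// delim[0..i] that is also a suffix of it. If a matcher has matched i + 1
// characters and the next one mismatches, it can resume from state pi[i]
// and re-test that character instead of starting over at state 0.
std::vector<std::size_t> prefix_function(const std::string& delim)
{
    std::vector<std::size_t> pi(delim.size(), 0);
    for (std::size_t i = 1; i < delim.size(); ++i) {
        std::size_t k = pi[i - 1];
        while (k > 0 && delim[i] != delim[k])
            k = pi[k - 1];               // fall back to a shorter border
        if (delim[i] == delim[k])
            ++k;
        pi[i] = k;
    }
    return pi;
}
```

`prefix_function("aab")` returns `{0, 1, 0}`, so after `"aa"` followed by another `'a'` the matcher falls back to state 1 and re-tests that `'a'`, which moves it to state 2 again instead of losing the match.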


The STL has no such function built in.

If you want to build on std::getline(), you can combine it with std::istream::get(). For example:

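#include <cstddef>
#include <istream>
#include <stdexcept>
#include <string>

// Reads from `input` into `str` up to (and consuming) the multi-character
// delimiter `delim`. Limitation: a failed partial match is rewound by only one
// character, so delimiters whose prefix repeats inside them (e.g. "||-") can
// still be missed.
std::istream& my_getline(std::istream &input, std::string &str, const std::string &delim)
{
    if (delim.empty())
        throw std::invalid_argument("delim cannot be empty!");

    if (delim.size() == 1)
        return std::getline(input, str, delim[0]);

    str.clear();

    std::string temp;
    char ch;
    bool found = false;

    do
    {
        // Read up to the first delimiter character (which is consumed).
        if (!std::getline(input, temp, delim[0]))
        {
            // Nothing more to read: keep a non-empty last record usable.
            if (!str.empty() && input.eof())
                input.clear(std::ios_base::eofbit);
            break;
        }

        str += temp;

        // The stream ended without delim[0]: temp was the last record.
        if (input.eof())
            break;

        found = true;

        // Try to match the remaining delimiter characters.
        for (std::size_t i = 1; i < delim.size(); ++i)
        {
            if (!input.get(ch))
            {
                // Stream ended mid-delimiter: keep what was consumed as data.
                if (input.eof())
                    input.clear(std::ios_base::eofbit);

                str.append(delim, 0, i);
                return input;
            }

            if (delim[i] != ch)
            {
                // Not the delimiter: keep the matched prefix as data and put
                // the mismatching character back so it is examined again.
                str.append(delim, 0, i);
                input.unget();
                found = false;
                break;
            }
        }
    }
    while (!found);

    return input;
}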

The easiest approach seems to be writing something like getline() yourself: read character by character until you see the last character of the separator. Then check whether the text read so far is long enough to contain the separator and, if so, whether it ends with the separator. If not, keep reading:

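#include <istream>
#include <iterator>
#include <string>

// Reads into `value` until `separator` has been consumed or the stream ends.
// Sets eofbit at end of stream, plus failbit if nothing at all was read.
std::istream& getline(std::istream& in, std::string& value, std::string const& separator) {
    value.clear();
    std::istreambuf_iterator<char> it(in), end;
    if (separator.empty()) { // empty separator -> return the entire stream
        value.assign(it, end);
    } else {
        char last(separator.back());
        for (; it != end; ++it) {
            value.push_back(*it);
            // Compare the tail only when its last character matches.
            if (value.back() == last
                && separator.size() <= value.size()
                && value.compare(value.size() - separator.size(), separator.size(), separator) == 0) {
                value.resize(value.size() - separator.size());
                ++it; // *it only peeks; advancing consumes the separator's last char
                return in;
            }
        }
    }
    // no separator was found
    in.setstate(value.empty() ? std::ios_base::eofbit | std::ios_base::failbit
                              : std::ios_base::eofbit);
    return in;
}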

Source: https://habr.com/ru/post/1682803/

