How to ignore invalid fields when parsing a protobuf message in text format

I am experimenting in C++ with parsing the protobuf text format when the input contains an invalid field.

My simple test .proto file:

    $ cat settings.proto
    package settings;

    message Settings {
        optional int32  param1 = 1;
        optional string param2 = 2;
        optional bytes  param3 = 3;
    }

My text file is:

    $ cat settings.txt
    param1: 123
    param: "some string"
    param3: "another string"

I am parsing the file using google::protobuf::TextFormat::Parser:

    #include <iostream>
    #include <fcntl.h>
    #include <unistd.h>
    #include <fstream>
    #include <google/protobuf/text_format.h>
    #include <google/protobuf/io/zero_copy_stream_impl.h>

    #include "settings.pb.h"

    using namespace std;

    int main( int argc, char* argv[] )
    {
        GOOGLE_PROTOBUF_VERIFY_VERSION;

        if( argc < 2 )
        {
            cerr << "Usage: " << argv[0] << " <settings.txt>" << endl;
            return 1;
        }

        settings::Settings settings;

        int fd = open( argv[1], O_RDONLY );
        if( fd < 0 )
        {
            cerr << "Error opening the file" << endl;
            return 1;  // non-zero exit status signals failure
        }

        // Wrap the descriptor; the stream closes it on destruction.
        google::protobuf::io::FileInputStream finput( fd );
        finput.SetCloseOnDelete( true );

        google::protobuf::TextFormat::Parser parser;
        parser.AllowPartialMessage( true );

        if ( !parser.Parse( &finput, &settings ) )
        {
            cerr << "Failed to parse file!" << endl;
        }

        // Print whatever was parsed before the failure.
        cout << settings.DebugString() << endl;

        google::protobuf::ShutdownProtobufLibrary();

        std::cout << "Exit" << std::endl;
        return 0;
    }

I set AllowPartialMessage to true on the parser, and all fields are optional. Even so, Parse stops at the first invalid field, and afterwards "settings" contains only the first field (param1).

Is there a way to report an error and continue parsing other valid fields?

1 answer

The text format parser does not allow unknown fields. The text format is intended for communicating with humans, and humans make typos. It is important to detect those typos rather than silently ignore them.

Generally, the reason to ignore unknown fields is forward compatibility: your program can then (partially) understand messages written against future versions of the protocol that contain new fields. There are two specific use cases that I see a lot:

  • Systems that exchange data between machines in text format. I recommend against this. Use the binary format instead or, if you really want machine-to-machine communication to be textual, use JSON.

  • Systems in which a human writes a text-format configuration file and then distributes it to possibly old servers in production. In this case, I recommend that you "pre-compile" the text-format protobuf into binary using a tool that runs on the user's desktop, and then ship only the binary message to the production servers. The local tool can easily be kept up to date and can tell the user if they misspelled a field name.
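
The "tell the user about a misspelled field name" step of such a local tool can be sketched roughly as follows. This is only an illustration in plain C++ without the protobuf library: the names `UnknownField` and `FindUnknownFields` are hypothetical, the set of known field names is supplied by the caller, and the scan assumes flat "name: value" lines with no nested messages. A real tool would instead run the text through TextFormat::Parser, treat a parse failure as the typo report, and write the binary form with SerializeToOstream.

```cpp
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One diagnostic per unrecognized field name (hypothetical helper type).
struct UnknownField {
    std::size_t line;  // 1-based line number in the input text
    std::string name;  // field name exactly as the user wrote it
};

// Scans flat "name: value" lines and returns every name not in `known`.
// Assumption: one scalar field per line, '#' starts a comment line,
// and there are no nested messages. This is not the protobuf parser.
std::vector<UnknownField> FindUnknownFields(const std::string& text,
                                            const std::set<std::string>& known) {
    std::vector<UnknownField> unknown;
    std::istringstream in(text);
    std::string line;
    std::size_t lineNo = 0;

    while (std::getline(in, line)) {
        ++lineNo;

        // Skip blank lines and comment lines.
        const std::size_t first = line.find_first_not_of(" \t");
        if (first == std::string::npos || line[first] == '#') continue;

        // The field name is everything before the first colon.
        const std::size_t colon = line.find(':', first);
        if (colon == std::string::npos || colon == first) continue;

        // Trim whitespace between the name and the colon.
        const std::size_t last = line.find_last_not_of(" \t", colon - 1);
        const std::string name = line.substr(first, last - first + 1);

        if (known.count(name) == 0) {
            unknown.push_back(UnknownField{lineNo, name});
        }
    }
    return unknown;
}
```

Called on the settings.txt from the question with the known names param1, param2 and param3, this would flag "param" on line 2, so the user can fix the typo before anything is shipped to production.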


Source: https://habr.com/ru/post/1240973/

