Protocol buffers: where are they used?

I recently read an article on protocol buffers:

Protocol buffers are a method of serializing structured data. They are useful for programs that communicate with each other over a wire, or for storing data. The method involves an interface description language that describes the structure of some data, plus a program that generates source code from that description for producing or parsing a stream of bytes representing the structured data.
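As a rough illustration of what that interface description language looks like (the message and field names below are invented for this example, not taken from the question):

```proto
syntax = "proto3";

// A hypothetical message definition. Each field carries a unique
// tag number (the "= 1", "= 2", ...) that identifies it on the wire,
// so the wire format does not need to carry field names at all.
message Person {
  string name = 1;
  int32 id = 2;
  repeated string emails = 3;  // "repeated" means a list of values
}
```

Running `protoc` on a file like this generates classes in your target language for building, serializing, and parsing `Person` messages.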

What I want to know is where they are actually used. Are there real examples, beyond the simple address-book example? Are they used, say, to pre-cache query results from databases?


Protocol buffers are a format for storing and exchanging data, used especially for RPC (communication between programs or machines).

Alternatives include language-specific serialization (Java serialization, Python pickles, etc.), tabular formats like CSV and TSV, structured text formats like XML and JSON, and other binary formats like Apache Thrift. Conceptually these are all just different ways of representing structured data, but in practice they have different pros and cons.

Protocol Buffers:

  • Are space-efficient, thanks to a custom format designed for compact data representation.
  • Provide type safety across languages (especially valuable in strongly typed languages like Java, but useful even in Python).
  • Are designed for backward and forward compatibility. It is easy to make structural changes (usually adding new fields or deprecating old ones) without requiring every application that uses the protocol to be updated at the same time.
  • Are somewhat tedious to work with manually. There is a text format, but it is mainly useful for manual inspection rather than for storing protos. JSON, for instance, is much easier for a person to write and edit. Protos are therefore usually written and read by programs.
  • Depend on a .proto compiler. Because protocol buffers separate the structure from the data, they can be lean and mean, but it also means that without the matching .proto file and a tool such as protoc to generate parsing code, arbitrary data in proto format is unusable. This makes protos a poor choice for sending data to other parties who might not have the .proto file.
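The backward/forward-compatibility point above comes down to tag numbers. A sketch of two successive revisions of the same (hypothetical) .proto file:

```proto
// Revision 1 of the schema:
message Order {
  int64 id = 1;
  double price = 2;
}

// Revision 2 adds a field. Old readers simply skip the unknown
// tag 3; new readers of old data see the field's default value.
// The one rule: never reuse or renumber an existing tag.
message Order {
  int64 id = 1;
  double price = 2;
  string currency = 3;
}
```

This is why producers and consumers of a proto-based protocol can be upgraded independently.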

To make some broad generalizations about different formats:

  • CSV/TSV/etc. are useful for human-constructed data that never needs to be exchanged between programs. They are easy to construct and easy to parse, but a nightmare to keep in sync, and they cannot easily represent complex structures.
  • Language-specific serialization, such as pickles, can be useful for short-lived serialization, but it quickly runs into backward-compatibility problems and obviously limits you to one language. Except for some very specific cases, protobufs serve all the same goals with greater safety and better future-proofing.
  • JSON is ideal for sending data between different parties (e.g. public APIs). Because structure and content are transmitted together, anyone can understand it, and it is easy to parse in all major languages. There is little reason these days to use other structured text formats such as XML.
  • Binary formats such as protocol buffers are ideal for almost all other data-serialization use cases: long-term and short-term storage, inter-process communication, in-process and cross-application caching, and more.
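To make the compactness point concrete, here is a small sketch comparing a JSON encoding against a fixed binary layout. It uses Python's stdlib `struct` as a stand-in for a schema-based binary format (the record fields are invented; this is not the actual protobuf wire format, which uses tag/varint encoding):

```python
import json
import struct

# A hypothetical record: (id, price, qty).
record = {"id": 12345, "price": 101.25, "qty": 7}

# Text encoding: JSON repeats the field names in every message.
text = json.dumps(record).encode("utf-8")

# Binary encoding: both sides agree on the layout up front
# (little-endian int32, float64, int32), so only values are sent.
binary = struct.pack("<idi", record["id"], record["price"], record["qty"])

print(len(text), len(binary))  # the binary form is much smaller
```

The saving comes from moving the structure out of the payload and into the shared schema, which is exactly the trade-off described in the ".proto compiler" bullet above.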

Google famously uses protocol buffers for almost everything they do. If you can think of a reason to store or transmit data, Google is probably doing it with protocol buffers.


I used them to build a financial trading system. Here is why:

  • There are libraries for many languages. Some components had to be in C++, others in C#, and the door stayed open for later components in Python, Java, etc.
  • Serialization/deserialization had to be fast and the encoding compact, because of the speed requirements of a trading system. Messages were much shorter than comparable text messages, which meant we never had trouble fitting them into a single network packet.
  • The messages did not need to be human-readable on the wire. The system previously used XML, which is nice for debugging, but you can get debug output in other ways and switch it off in production.
  • Protos give your messages a natural structure and an API for extracting the pieces you need. Writing something custom would have meant hand-rolling helper functions to pull numbers out of the binary, with all the corner cases that implies.

Source: https://habr.com/ru/post/1238692/

