Is there a book or some documentation that describes best practices for developing batch (stand-alone) processes for exchanging data between two parties?
I found some useful information on the Spring website, but it's pretty low level: batch processing strategies and batch principles.
There are many considerations for batch processing, for example:
- data transfer method (e.g. files)
- control protocol between two parties
- error processing
- file naming conventions (when using files for transfer)
- cut-off time synchronization between two sides
- etc.
It would be nice if there were some kind of authoritative document or checklist to ensure that projects are in line with best practices in this area.
UPDATE: I will add answers to this section as I come across them.
General information about batch / offline processing
This section is taken from @user1813068's answer.
You can find some architectural design patterns in this link, as well as in this one, which describe approaches to partner integration and data synchronization.
This Wikipedia page also provides an overview of high-level architectural patterns and includes patterns for data integration: architectural patterns.
The book Data Integration Blueprint and Modeling is also very good.
Data files
Most of the content in this section is taken from here: source
Using headers and footers when exchanging flat files is considered best practice. Flat files can also be exchanged without them, in which case the file name may carry part of the information the header would otherwise hold. When using a delimited file, a header listing the fields is always required.
Headers
When exchanging data between systems, it is very important that the receiving party knows exactly what type of data is being sent. One way to ensure this is to provide a header line that contains relevant information about the data content and how it should be processed.
When working with flat files, the file name itself can also be used to inform the receiving party of the contents of the file. However, a header line provides better support for all available options.
When working with an API, these header fields can be provided in a similar way. The implementation will be determined by the developer of the API service.
If a header is included, it consists of a single record and should always be the first line in the file.
Footers
In file-based formats, a footer can be provided to indicate that there is no more data to process.
During processing, data found after the footer line should be ignored. Also, when creating data, keep in mind that any data after the footer line will be ignored.
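The header and footer rules above can be sketched roughly as follows. This is only an illustration, not a standard: the record markers HDR and TRL, the tab-separated layout, and the row count stored in the footer are all assumptions made for the example.

```python
import io

# Hypothetical record-type markers; agree on the real ones with your partner.
HEADER_TAG = "HDR"
FOOTER_TAG = "TRL"


def write_batch(out, file_type, rows):
    """Write a header line, the data rows, and a footer carrying the row count."""
    out.write(f"{HEADER_TAG}\t{file_type}\n")
    for row in rows:
        out.write("\t".join(row) + "\n")
    out.write(f"{FOOTER_TAG}\t{len(rows)}\n")


def read_batch(inp):
    """Return (file_type, rows); anything after the footer line is ignored."""
    lines = inp.read().splitlines()
    if not lines or not lines[0].startswith(HEADER_TAG + "\t"):
        raise ValueError("missing header line")
    file_type = lines[0].split("\t")[1]
    rows = []
    for line in lines[1:]:
        fields = line.split("\t")
        if fields[0] == FOOTER_TAG:
            # The row count in the footer is an assumed extra integrity check.
            if int(fields[1]) != len(rows):
                raise ValueError("row count does not match footer")
            return file_type, rows
        rows.append(fields)
    raise ValueError("missing footer line: file may be truncated")


if __name__ == "__main__":
    buf = io.StringIO()
    write_batch(buf, "ORDERS", [["1", "widget"], ["2", "gadget"]])
    buf.seek(0)
    print(read_batch(buf))
```

A missing footer is treated as an error here, so a file that was cut off during transfer is rejected instead of being partially loaded.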
Data formats
Delimited Files
Delimited files are the de facto industry standard.
Comma-delimited files (CSV, or comma-separated values) usually require the data to be encapsulated, typically in double quotes ("). Double quotes inside the data must then be escaped with either a backslash (\) or a doubled double quote (""). Due to inconsistencies between CSV implementations, it is recommended that you use tabs as the delimiter, without encapsulation. In this case, tabs must be removed from the data. Delimited files are also usually faster to process than XML files.
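To make the two options concrete, here is a small sketch of both. The function names are made up for the example. Replacing tabs with a space (rather than deleting them) and also replacing line breaks are assumptions of mine, not part of the quoted recommendation.

```python
import csv
import io


def to_tsv_line(fields):
    """Join fields with tabs, without encapsulation.

    Tabs and line breaks inside the data are replaced with a space so they
    cannot be mistaken for delimiters (an assumed policy for this sketch).
    """
    cleaned = [
        f.replace("\t", " ").replace("\r", " ").replace("\n", " ")
        for f in fields
    ]
    return "\t".join(cleaned)


def to_csv_line(fields):
    """Quote every field; the csv module doubles embedded double quotes."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(fields)
    return buf.getvalue().rstrip("\n")


if __name__ == "__main__":
    print(to_tsv_line(["a\tb", "c"]))
    print(to_csv_line(['say "hi"', "x,y"]))
```

The tab-delimited variant needs no quoting rules at all, which is the main reason it tends to be less ambiguous between implementations.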
XML files
Some in the industry prefer XML files. XML allows for a clearer presentation of information because it supports nested data. However, many companies have limited or no support for this format, so it is not recommended.
Encoding
UTF-8 Encoding
All data must be encoded in UTF-8 to ensure maximum compatibility between all systems.
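As a minimal sketch of what this means in practice, the sender encodes every record explicitly as UTF-8 and the receiver decodes strictly, so a file in another encoding fails loudly instead of producing garbled text. The function names are hypothetical.

```python
def encode_record(line: str) -> bytes:
    """Encode one record as UTF-8, terminated by a newline."""
    return (line + "\n").encode("utf-8")


def decode_record(raw: bytes) -> str:
    """Decode one record; invalid UTF-8 raises UnicodeDecodeError."""
    return raw.decode("utf-8").rstrip("\n")


if __name__ == "__main__":
    raw = encode_record("Zoë\t€5")
    print(decode_record(raw))
```

When working with files directly, passing encoding="utf-8" to open() on both sides has the same effect and avoids depending on the platform's default encoding.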
Dates & Time
To prevent confusion, it is recommended that you use UTC for all date and time fields.
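For example, timestamps could be converted to UTC before they are written and parsed back as UTC on the receiving side. The ISO 8601 style format with a trailing Z used below is an assumption for this sketch; any format works as long as both parties agree on it.

```python
from datetime import datetime, timedelta, timezone

# Assumed wire format: ISO 8601 style, second precision, "Z" meaning UTC.
UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_utc_string(dt):
    """Convert a time-zone-aware datetime to a UTC string."""
    if dt.tzinfo is None:
        # A naive datetime has no known offset, so it cannot be converted safely.
        raise ValueError("naive datetime: time zone unknown")
    return dt.astimezone(timezone.utc).strftime(UTC_FORMAT)


def parse_utc_string(text):
    """Parse a string produced by to_utc_string into an aware UTC datetime."""
    return datetime.strptime(text, UTC_FORMAT).replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=2)))
    print(to_utc_string(local))  # 2024-03-01T07:30:00Z
```

Rejecting naive datetimes is a deliberate choice here: silently assuming the local time zone is exactly the kind of confusion that using UTC is meant to prevent.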
Some other recommendations: EDI planning and file transfer