How to implement HTTP transfer of huge binary files from a file generator to a server (Java)?

Simply put, our system consists of a server and an agent. The agent generates a huge binary file, which may need to be transferred to the server.

Given:

  • Today the system must cope with files up to 1 GB, and they are likely to grow to 10 GB within 2 years.
  • Transmission must go over HTTP, because other ports may be closed.
  • This is not a file sharing system - the agent just has to push the file to the server.
  • Both Agent and Server are written in Java.
  • The binary may contain sensitive information, so the transfer must be secure.

I am looking for methods and libraries that will help me transfer huge files. Some of the topics that I know of are as follows:

  • Compression. Which one should we choose? We are not limiting ourselves to gzip or deflate just because they are the most popular for HTTP traffic; if some unusual compression scheme gives the best results for our task, so be it.
  • Splitting. Obviously, the file needs to be split and transferred in several parallel sessions.
  • Background transfer. Transferring a huge file takes a lot of time. Does this affect the design, if at all?
  • Security. Is HTTPS the way to go? Or should we take a different approach, given the volume of data?
  • Ready-made solutions. I am perfectly willing to code it myself (it should be fun), but I cannot avoid the question of whether there are ready-made solutions that meet my requirements.

Has anyone encountered this problem in their products and how did they handle it?

Thanks.

EDIT

Some may ask why we chose HTTP as the transfer protocol. The fact is that the Server and the Agent may be far apart, even when they sit on the same corporate network. We have run into numerous problems caused by clients keeping only the HTTP ports open on nodes in their corporate networks. That leaves us little choice but to use HTTP. Using FTP would be fine, but it would have to be tunneled through HTTP. Does that mean we would still get the full benefits of FTP, or does it make other alternatives more viable? I do not know; please advise.

EDIT2

Correction: HTTPS is always open, and sometimes (but not always) HTTP is open as well. But that is all.

1 answer

You can run any protocol over port 80. HTTP is a good choice, but you are not required to use it.

Compression. Which one should we choose? We are not limiting ourselves to gzip or deflate just because they are the most popular for HTTP traffic; if some unusual compression scheme gives the best results for our task, so be it.

The best compression depends on the content. I would use Deflater (java.util.zip) for simplicity; however, BZIP2 may give better results (an extra library is required).

Depending on your file type, you may be able to apply some type-specific compression first to make the data smaller.
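
A minimal sketch of streaming deflate compression on the agent side, assuming a hypothetical /upload endpoint and a server that is prepared to inflate the request body (a request-side Content-Encoding is not decoded automatically by most servers). Chunked streaming keeps memory use constant even for a 10 GB file:

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.DeflaterOutputStream;

public class CompressedUpload {
    public static void upload(File file, URL endpoint) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        // Stream in fixed-size chunks so the whole file is never held in memory.
        conn.setChunkedStreamingMode(64 * 1024);
        conn.setRequestProperty("Content-Encoding", "deflate");

        try (InputStream in = new BufferedInputStream(new FileInputStream(file));
             OutputStream out = new DeflaterOutputStream(conn.getOutputStream())) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } // closing the DeflaterOutputStream flushes the final deflate block

        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("Upload failed: " + conn.getResponseCode());
        }
    }
}
```

Swapping in BZIP2 would only change the wrapping stream (for example, BZip2CompressorOutputStream from Apache Commons Compress) and the Content-Encoding value.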

Splitting. Obviously, the file needs to be split and transferred in several parallel sessions.

This is not obvious to me. Transferring data in parallel improves throughput by grabbing a larger share of the available bandwidth (that is, by crowding out other users of the same link). That may be undesirable, or even pointless (if there are no other users).
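
If you do decide to split the file, here is a sketch of how the agent might upload fixed-size chunks in parallel. The chunk-numbering endpoint is hypothetical: it assumes the server stores each numbered chunk and stitches them together once all have arrived.

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.*;

public class ParallelChunkUpload {
    static final long CHUNK_SIZE = 16L * 1024 * 1024; // 16 MB per chunk

    public static void upload(File file, String baseUrl, int threads) throws Exception {
        long chunks = (file.length() + CHUNK_SIZE - 1) / CHUNK_SIZE;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (long i = 0; i < chunks; i++) {
            final long index = i;
            pool.submit(() -> sendChunk(file, baseUrl, index));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    static Void sendChunk(File file, String baseUrl, long index) throws Exception {
        long offset = index * CHUNK_SIZE;
        long length = Math.min(CHUNK_SIZE, file.length() - offset);
        // Hypothetical endpoint: ?chunk=N identifies this piece to the server.
        URL url = new URL(baseUrl + "?chunk=" + index);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        conn.setFixedLengthStreamingMode(length);
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             OutputStream out = conn.getOutputStream()) {
            raf.seek(offset);
            byte[] buf = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) break;
                out.write(buf, 0, n);
                remaining -= n;
            }
        }
        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("Chunk " + index + " failed: " + conn.getResponseCode());
        }
        return null;
    }
}
```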

Background transfer. Transferring a huge file takes a lot of time. Does this affect the design, if at all?

You will need the ability to resume the transfer at any point.
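
One way to get resumability is to ask the server where the previous attempt stopped, then continue from that offset. A sketch under those assumptions: both the /status and /data endpoints are hypothetical, and the Content-Range request header only works if the server is written to honor it.

```java
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class ResumableUpload {
    public static void upload(File file, String baseUrl) throws IOException {
        // Hypothetical status endpoint: returns the byte count received so far.
        URL statusUrl = new URL(baseUrl + "/status?name=" + file.getName());
        long offset;
        HttpURLConnection status = (HttpURLConnection) statusUrl.openConnection();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(status.getInputStream()))) {
            offset = Long.parseLong(r.readLine().trim());
        }

        long remaining = file.length() - offset;
        if (remaining <= 0) return; // already complete

        URL dataUrl = new URL(baseUrl + "/data?name=" + file.getName());
        HttpURLConnection conn = (HttpURLConnection) dataUrl.openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("PUT");
        // Tell the server where this fragment belongs in the file.
        conn.setRequestProperty("Content-Range",
                "bytes " + offset + "-" + (file.length() - 1) + "/" + file.length());
        conn.setFixedLengthStreamingMode(remaining);

        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             OutputStream out = conn.getOutputStream()) {
            raf.seek(offset);
            byte[] buf = new byte[64 * 1024];
            long left = remaining;
            while (left > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, left));
                if (n == -1) break;
                out.write(buf, 0, n);
                left -= n;
            }
        }
        if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
            throw new IOException("Resume failed: " + conn.getResponseCode());
        }
    }
}
```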

Security. Is HTTPS the way to go? Or should we take a different approach, given the volume of data?

I am sure HTTPS is fine, regardless of the amount of data.
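
Switching the upload to HTTPS is mostly a matter of using an https:// URL. A minimal sketch, assuming the server's certificate is trusted by the default Java trust store (a self-signed certificate would need a custom trust store instead):

```java
import java.io.*;
import java.net.URL;
import javax.net.ssl.HttpsURLConnection;

public class SecureUpload {
    public static void upload(File file, String httpsUrl) throws IOException {
        HttpsURLConnection conn =
                (HttpsURLConnection) new URL(httpsUrl).openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        // Chunked streaming: TLS adds per-record overhead but no size limit,
        // so even a 10 GB upload needs no extra memory.
        conn.setChunkedStreamingMode(64 * 1024);

        try (InputStream in = new BufferedInputStream(new FileInputStream(file));
             OutputStream out = conn.getOutputStream()) {
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        if (conn.getResponseCode() != HttpsURLConnection.HTTP_OK) {
            throw new IOException("Upload failed: " + conn.getResponseCode());
        }
    }
}
```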

Ready-made solutions. I am perfectly willing to code it myself (it should be fun), but I cannot avoid the question of whether there are ready-made solutions that meet my requirements.

I would try existing web servers first to see whether they fit the job. I would be surprised if there were no free web server that does all of the above.

Here is a selection of open-source web servers: http://www.java-sources.net/open-source/web-servers


Source: https://habr.com/ru/post/1387923/

