Why does Java read random amounts from a socket, but not the whole message?

Question

I am working on a project and have a question about Java sockets. The source file can be found here.

After successfully transferring the file size as plain text, I need to transfer binary data (DVD .VOB files).

I have a loop like

    // Read this files size
    long fileSize = Integer.parseInt(in.readLine());
    // Read the block size they are going to use
    int blockSize = Integer.parseInt(in.readLine());
    byte[] buffer = new byte[blockSize];
    // Bytes "red"
    long bytesRead = 0;
    int read = 0;
    while(bytesRead < fileSize){
        System.out.println("received " + bytesRead + " bytes" + " of " + fileSize + " bytes in file " + fileName);
        read = socket.getInputStream().read(buffer);
        if(read < 0){
            // Should never get here since we know how many bytes there are
            System.out.println("DANGER WILL ROBINSON");
            break;
        }
        binWriter.write(buffer,0,read);
        bytesRead += read;
    }

I receive a seemingly random number of bytes, always around 99% of the file. I am using Socket, which is TCP-based, so I should not have to worry about lower-level transmission errors.

The number received changes from run to run, but it always stops very close to the end: received 7258144 bytes of 7266304 bytes in file GLADIATOR/VIDEO_TS/VTS_07_1.VOB

The application then hangs there in a blocking read. I am baffled. The server sends the correct file size, and there is a working client implementation in Ruby, but I cannot get the Java version to work.

Why would I read fewer bytes than were sent over a TCP socket?

The above was caused by a bug that many of you pointed out below.

The BufferedReader was consuming 8 KB of my socket's input. The correct implementation can be found here.

+4

java file-io file-upload sockets tcpclient

Enabrentane Dec 18 '10 at 14:23

4 answers

~~Sergey was perhaps right that the data is lost inside the buffer, but I am not sure about his explanation.~~ ~~(BufferedReaders do not usually hold on to data inside their buffers. Perhaps he is thinking of a problem with BufferedWriters, which can lose data if the main thread shuts down prematurely.)~~ [Never mind that; I was wrong. The rest of this is valid AFAIK.]

I think you have a problem specific to your application. In the client code, you start reading as follows:

    public static void recv(Socket socket){
        try {
            BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
            //...
            int numFiles = Integer.parseInt(in.readLine());

... and you go on to use in for the start of the exchange. But then you switch to using the raw socket stream:

    while(bytesRead < fileSize){
        read = socket.getInputStream().read(buffer);

Since in is a BufferedReader, it has already filled its buffer with up to 8192 characters from the socket input stream. Any bytes that are sitting in this buffer, and that you never read through in, are lost to you. Your application hangs because it believes the server is still holding on to some bytes, but the server has already sent them.

The solution is not to read byte by byte from the socket (ouch! your poor CPU!), but to use the BufferedReader consistently. Or, to get buffering with binary data, replace the BufferedReader with a BufferedInputStream that wraps the socket's InputStream.
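A minimal sketch of the second option, assuming the header lines have already been read from the same BufferedInputStream rather than from a separate BufferedReader. The helper name copyExactly and the in-memory stand-in for the socket are illustrative and not part of the original code. Each read is capped at the number of bytes still owed, so the loop never consumes data that belongs to the next file:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyExactly {
    // Hypothetical helper: copies exactly `count` bytes from in to out.
    static void copyExactly(InputStream in, OutputStream out, long count, int blockSize)
            throws IOException {
        byte[] buffer = new byte[blockSize];
        long remaining = count;
        while (remaining > 0) {
            // Never ask for more than is left of this file.
            int want = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, want);
            if (read < 0) {
                throw new EOFException("Stream ended with " + remaining + " bytes missing");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] wire = {1, 2, 3, 4, 5, 6, 7};   // stand-in for data arriving on the socket
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        copyExactly(in, file, 5, 2);
        System.out.println(file.size() + " bytes copied, next byte is " + in.read());
    }
}
```

With a real connection, in would wrap socket.getInputStream() once, and that single wrapper would be used for everything read from the socket.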

By the way, TCP is not as reliable as many people believe. For example, when a server socket is closed, it can write data to the socket, which is then lost when the socket connection is disconnected. Calling Socket.setSoLinger can help prevent this problem.

EDIT: Also, BTW, you are playing with fire by treating byte and character data as if they were interchangeable, as you do below. If the data really is binary, then converting it to a String can corrupt it. Perhaps you want to write to a BufferedOutputStream instead?

    // Java is retarded and reading and writing operate with
    // fundamentally different types. So we write a String of
    // binary data.
    fileWriter.write(new String(buffer));
    bytesRead += read;
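To see why this is risky, here is a small sketch that round-trips a few arbitrary bytes through a String. It assumes UTF-8 as the charset (new String(buffer) uses the platform default, which varies); the class and method names are made up for the illustration:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryAsString {
    // Decodes bytes to a String and encodes them again, as writing
    // new String(buffer) through a character-based Writer would.
    static byte[] roundTrip(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF and 0xFE are not valid UTF-8, so the decoder replaces them.
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x41};
        byte[] after = roundTrip(binary);
        System.out.println("unchanged: " + Arrays.equals(binary, after));
    }
}
```

Writing the bytes with an OutputStream (for example out.write(buffer, 0, read)) avoids the conversion entirely.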

EDIT 2: Clarified (or tried to clarify :-}) the handling of binary and string data.

+1

Dan Breslau Dec 18 '10 at 15:42

Here is your problem. The first few lines of the program use in.readLine(), and in is probably some kind of BufferedReader. BufferedReaders read data from the socket in 8K chunks. So when you did the first readLine(), it read the first 8K into its buffer. That first 8K contains your two numbers, each followed by a newline, and then the first part of the VOB file (which is the chunk you are missing). When you then switch to using getInputStream() on the socket, you are already 8K into the transmission while assuming you are starting at zero.

 socket.getInputStream().read(buffer); // you can't do this without losing data.

Although BufferedReader is good for reading character data, it does not let you switch between binary and character data in the same stream. You will have to use an InputStream instead of a Reader and convert the first few portions into character data by hand. If you read the stream into a byte array, you can read the first chunk, search for the newlines, and convert everything to the left of them into character data. Then write everything to the right of them to the file, and carry on reading the rest of the file.
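The chunk-splitting step could look roughly like this. It is a sketch under assumptions: the header is two newline-terminated ASCII lines as in the question, the block size 4096 is a made-up sample value, and endOfHeader is a hypothetical helper. If the first chunk does not yet contain both newlines, the helper returns -1 and the caller would need to read more before splitting:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class HeaderSplit {
    // Hypothetical helper: returns the index just past the lineCount-th '\n'
    // within the first len bytes of chunk, or -1 if there are fewer newlines.
    static int endOfHeader(byte[] chunk, int len, int lineCount) {
        int seen = 0;
        for (int i = 0; i < len; i++) {
            if (chunk[i] == '\n' && ++seen == lineCount) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated first chunk: two header lines followed by three binary bytes.
        byte[] header = "7266304\n4096\n".getBytes(StandardCharsets.US_ASCII);
        byte[] chunk = Arrays.copyOf(header, header.length + 3);
        chunk[header.length] = (byte) 0xFF;

        int split = endOfHeader(chunk, chunk.length, 2);
        String[] lines = new String(chunk, 0, split, StandardCharsets.US_ASCII).split("\n");
        long fileSize = Long.parseLong(lines[0]);
        int blockSize = Integer.parseInt(lines[1]);

        // Everything from split onward is file data and goes to the output file.
        int binaryBytes = chunk.length - split;
        System.out.println(fileSize + " " + blockSize + " " + binaryBytes);
    }
}
```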

This used to be simpler with DataInputStream, but it does not handle character conversion properly for you (its readLine is deprecated, and BufferedReader is the only suggested replacement, doh). You should probably write a DataInputStream replacement that uses a Charset under the covers to handle string conversion correctly. Then it would be easier to switch between characters and binary.

+1

chubbsondubs Dec 18 '10 at 16:41

The main problem is that BufferedReader reads as much data as is available and places it in its buffer. It then hands the data to you as you ask for it. That is the whole point of buffering, i.e. reducing the number of calls to the OS. The only safe way to use buffered input is to use the same buffer for the lifetime of the connection.

In your case, you use the buffered reader to read only two lines, but it is very likely that 8192 bytes were read into its buffer (the default buffer size). Suppose the first two lines take 32 bytes: that leaves 8160 bytes still waiting in the buffer. You then bypass the buffer and call read() on the socket directly, so those 8160 bytes stay in the buffer and are eventually discarded (that is the amount you are missing).
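The numbers in the question fit this: 7266304 - 7258144 = 8160 missing bytes, which is 8192 - 32. The effect can be reproduced without a network. The sketch below is an illustration only: it uses an in-memory stream in place of the socket, a made-up header, and a hypothetical method name. How many bytes the reader pulls in is an implementation detail of the JDK, so the exact figure printed may vary:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class SwallowedBytes {
    // Returns how many payload bytes are no longer available on the raw
    // stream after a BufferedReader has been used to read two header lines.
    static int swallowed(int payloadLength) throws IOException {
        byte[] header = "7266304\n4096\n".getBytes(StandardCharsets.US_ASCII);
        byte[] wire = new byte[header.length + payloadLength];
        System.arraycopy(header, 0, wire, 0, header.length);

        ByteArrayInputStream raw = new ByteArrayInputStream(wire);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        in.readLine();   // file size
        in.readLine();   // block size

        // Whatever is missing from raw now sits inside the reader's buffers.
        return payloadLength - raw.available();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("payload bytes swallowed: " + swallowed(100_000));
    }
}
```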

BTW: you should be able to see this in a debugger if you inspect the contents of the buffered reader.

+1

Peter Lawrey Dec 18 '10 at 18:51

Sergey Tachenov · Accepted Answer · 2010-12-18T14:45:36+0000

If your in is a BufferedReader, then you have run into the common problem of buffering more than necessary. The default buffer size of BufferedReader is 8192 characters, which roughly matches the difference between what you expected and what you received. So the data you are missing is inside the BufferedReader's internal buffer, converted to characters (I wonder why it did not fail with a conversion error).

The only workaround is to read the first lines byte by byte without using any buffered ~~classes~~ readers. As far as I know, Java does not provide an unbuffered InputStreamReader with readLine() capability (with the exception of the deprecated DataInputStream.readLine(), as noted in the comments below), so you need to do it yourself. I would do this by reading single bytes and putting them into a ByteArrayOutputStream until I hit EOL, and then converting the resulting byte array to a String using the String constructor with the appropriate encoding.

Please note that although you cannot use BufferedReader, nothing stops you from using BufferedInputStream from the very beginning, which makes reading byte by byte more efficient.
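A rough sketch of that do-it-yourself readLine, assuming '\n' line endings (carriage returns are dropped) and UTF-8 for the header; the class name and sample bytes are invented for the example:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteLineReader {
    // Reads bytes up to (not including) '\n' and decodes them as UTF-8.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) {
                throw new EOFException("EOF before end of line");
            }
            if (b != '\r') {
                line.write(b);   // drop carriage returns so CRLF endings work
            }
        }
        return new String(line.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the socket: two header lines, then binary data.
        byte[] wire = {'1', '2', '\n', '3', '\n', (byte) 0xFF, 0x00};
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(wire));
        long fileSize = Long.parseLong(readLine(in));
        int blockSize = Integer.parseInt(readLine(in));
        int firstBinary = in.read();   // the same stream continues with the payload
        System.out.println(fileSize + " " + blockSize + " " + firstBinary);
    }
}
```

Because the lines and the binary data are read through the same BufferedInputStream, nothing is left stranded in a second buffer.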

Update

In fact, I am doing something similar right now, only slightly more complicated. It is an application protocol that exchanges data structures which are nicely represented in XML, but which sometimes have binary data attached to them. We implemented this with two attributes on the root XML element: fragmentLength and isLastFragment. The first indicates how many bytes of binary data follow the XML part, and isLastFragment is a boolean attribute marking the last fragment, so the reading side knows that no more binary data will follow. The XML is null-terminated, so we do not need to deal with readLine(). The reading code looks like this:

    InputStream ins = new BufferedInputStream(socket.getInputStream());
    while (!finished) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = ins.read()) > 0) {
            buf.write(b);
        }
        if (b == -1)
            throw new EOFException("EOF while reading from socket");
        // b == 0
        Document xml = readXML(new ByteArrayInputStream(buf.toByteArray()));
        processAnswers(xml);
        Element root = xml.getDocumentElement();
        if (root.hasAttribute("fragmentLength")) {
            int length = DatatypeConverter.parseInt(
                    root.getAttribute("fragmentLength"));
            boolean last = DatatypeConverter.parseBoolean(
                    root.getAttribute("isLastFragment"));
            int read = 0;
            while (read < length) {
                // split incoming fragment into 4Kb blocks so we don't run
                // out of memory if the client sent a really large fragment
                int l = Math.min(length - read, 4096);
                byte[] fragment = new byte[l];
                int pos = 0;
                while (pos < l) {
                    int c = ins.read(fragment, pos, l - pos);
                    if (c == -1)
                        throw new EOFException(
                                "Preliminary EOF while reading fragment");
                    pos += c;
                    read += c;
                }
                // process fragment
            }

Using null-terminated XML for this turned out really well, since we can add more attributes and elements without changing the transport protocol. At the transport level we also do not need to worry about handling UTF-8, because the XML parser does that for us. In your case you are probably fine with those two lines, but if you need to add more metadata later, you may want to consider null-terminated XML as well.
