I was playing around with binary serialization and deserialization in Rust and noticed that binary deserialization is dramatically slower than in Java. To rule out overhead from things like allocation, each program simply reads a binary stream from disk. Each program reads from a binary file containing a 4-byte integer giving the number of input values, followed by a contiguous chunk of 8-byte big-endian IEEE 754 floating-point numbers. Here's the Java implementation:
import java.io.*;

public class ReadBinary {
    public static void main(String[] args) throws Exception {
        DataInputStream input = new DataInputStream(
                new BufferedInputStream(new FileInputStream(args[0])));
        int inputLength = input.readInt();
        System.out.println("input length: " + inputLength);
        try {
            for (int i = 0; i < inputLength; i++) {
                double d = input.readDouble();
                // Print the last value to verify that everything was read.
                if (i == inputLength - 1) {
                    System.out.println(d);
                }
            }
        } finally {
            input.close();
        }
    }
}
Here's the Rust implementation:
fn main() {
    use std::os;
    use std::io::File;
    use std::io::BufferedReader;

    let args = os::args();
    let fname = args[1].as_slice();
    let path = Path::new(fname);
    let mut file = BufferedReader::new(File::open(&path));
    let input_length = read_int(&mut file) as uint;
    for i in range(0u, input_length) {
        let d = read_double_slow(&mut file);
        // Print the last value to verify that everything was read.
        if i == input_length - 1 {
            println!("{}", d);
        }
    }
}

fn read_int<R: Reader>(input: &mut R) -> i32 {
    match input.read_be_i32() {
        Ok(x) => x,
        Err(e) => fail!(e)
    }
}

fn read_double_slow<R: Reader>(input: &mut R) -> f64 {
    match input.read_be_f64() {
        Ok(x) => x,
        Err(e) => fail!(e)
    }
}
I output the last value to make sure that all the data was actually read. On my machine, when the file contains the same 30 million randomly generated doubles, the Java version runs in 0.8 seconds, while the Rust version takes 40.8 seconds.
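(For scale: 30 million 8-byte doubles is roughly 240 MB of data, so Java is reading at about 300 MB/s while Rust manages under 6 MB/s.)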
Suspecting an inefficiency in Rust's byte interpretation itself, I retried it with a hand-rolled floating-point deserialization routine. The internals are almost exactly the same as what Rust's Reader does, just without the IoResult wrappers:
fn read_double<R: Reader>(input: &mut R, buffer: &mut [u8]) -> f64 {
    use std::mem::transmute;

    // Fill the first 8 bytes of the caller-supplied buffer.
    match input.read_at_least(8, buffer) {
        Ok(n) => if n > 8 { fail!("n > 8") },
        Err(e) => fail!(e)
    };

    // Assemble the big-endian bytes into a u64, most significant byte first.
    let mut val = 0u64;
    let mut i = 8;
    while i > 0 {
        i -= 1;
        val += (buffer[7 - i] as u64) << (i * 8);
    }

    // Reinterpret the bits as an IEEE 754 double.
    unsafe { transmute::<u64, f64>(val) }
}
The only change I made to the previous Rust code for this to work was to create an 8-byte slice that is passed in and (re)used as a buffer in the read_double function. This gave a significant performance boost, down to about 5.6 seconds on average. Unfortunately, this is still noticeably slower (and more verbose!) than the Java version, which makes it hard to scale up to larger input sets. Is there something that can be done to make this run faster in Rust? More importantly, can these changes be made in such a way that they could be merged into the default Reader implementation itself, to make binary I/O less painful?
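For completeness, the modified main loop looks roughly like this. Note that how the scratch buffer is created is my own sketch (using Vec::from_elem, consistent with the pre-1.0 APIs above); the change described above only says an 8-byte slice is passed in and reused:

fn main() {
    use std::os;
    use std::io::File;
    use std::io::BufferedReader;

    let args = os::args();
    let fname = args[1].as_slice();
    let path = Path::new(fname);
    let mut file = BufferedReader::new(File::open(&path));
    let input_length = read_int(&mut file) as uint;

    // The one change: allocate the 8-byte scratch buffer once, outside
    // the loop, and reuse it for every read_double call. (Creating it
    // with Vec::from_elem is an assumption on my part.)
    let mut buffer = Vec::from_elem(8, 0u8);
    for i in range(0u, input_length) {
        let d = read_double(&mut file, buffer.as_mut_slice());
        if i == input_length - 1 {
            println!("{}", d);
        }
    }
}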
For reference, here is the code I use to create the input file:
import java.io.*;
import java.util.Random;

public class MakeBinary {
    public static void main(String[] args) throws Exception {
        DataOutputStream output = new DataOutputStream(
                new BufferedOutputStream(System.out));
        int outputLength = Integer.parseInt(args[0]);
        output.writeInt(outputLength);
        Random rand = new Random();
        for (int i = 0; i < outputLength; i++) {
            output.writeDouble(rand.nextDouble() * 10 + 1);
        }
        output.flush();
    }
}
(Note that generating random numbers and writing them to disk only takes 3.8 seconds on my test machine.)