If you know the number of lines in your file, and you are selecting whole lines, you can simply pick random line numbers and then read the selected lines. Just generate random numbers with the Random class and store the chosen numbers in a set, so you don't select the same line twice.
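```java
import java.io.*;
import java.util.HashSet;
import java.util.Set;

// Inside a method declared "throws IOException".
// try-with-resources closes (and flushes) all three files when done.
try (BufferedReader reader = new BufferedReader(new FileReader(new File("file.cvs")));
     BufferedWriter chosen = new BufferedWriter(new FileWriter(new File("chosen.cvs")));
     BufferedWriter notChosen = new BufferedWriter(new FileWriter(new File("notChosen.cvs")))) {

    int numChosenRows = 10000;
    long numLines = 1000000000;

    // Pick numChosenRows distinct line numbers in [0, numLines)
    Set<Long> chosenRows = new HashSet<Long>(numChosenRows + 1, 1);
    for (int i = 0; i < numChosenRows; i++) {
        // nextLong(n) comes from the answer linked below
        while (!chosenRows.add(nextLong(numLines))) {
            // add returns false if the value already exists in the Set, so draw again
        }
    }

    String line;
    for (long lineNo = 0; (line = reader.readLine()) != null; lineNo++) {
        if (chosenRows.contains(lineNo)) {
            // Do nothing for the moment
        } else {
            notChosen.write(line);
            notChosen.newLine(); // readLine strips the line terminator, so add it back
        }
    }

    // Randomise the set of chosen rows
    // Use RandomAccessFile to write the rows in that order
}
```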
See this answer for the nextLong method, which produces a uniformly distributed random long in the range [0, n).
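If you can't follow the link, here is a minimal sketch of such a bounded nextLong, using the same rejection-sampling idea that java.util.Random.nextInt(int bound) uses for ints. The class name BoundedRandom and the two-argument signature nextLong(Random, long) are hypothetical, not necessarily what the linked answer uses.

```java
import java.util.Random;

class BoundedRandom {
    // Hypothetical helper: returns a uniformly distributed long in [0, n), n > 0.
    // Plain rng.nextLong() % n can be negative and is biased toward small values,
    // so out-of-range draws are rejected and repeated instead.
    static long nextLong(Random rng, long n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long bits, val;
        do {
            bits = (rng.nextLong() << 1) >>> 1; // non-negative 63-bit value
            val = bits % n;
            // If bits falls in the last, incomplete block of size n,
            // bits - val + (n - 1) overflows to a negative number: draw again.
        } while (bits - val + (n - 1) < 0L);
        return val;
    }
}
```

With Java 7 or later you can also use ThreadLocalRandom.current().nextLong(bound), which returns a value in [0, bound) directly.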
Edit: Like most people, I overlooked the requirement to write the randomly selected lines in random order. I assume RandomAccessFile will help here: put the selected row numbers in a list, shuffle it, and then access the rows in that order. As for the rows that weren't chosen, I edited the code above so it simply skips the selected ones and writes every other line to notChosen.
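A minimal sketch of that step, under some assumptions: RandomAccessFile.seek needs byte offsets, which BufferedReader doesn't expose, so this records the starting byte offset of each chosen line in a separate byte-level pass; it then shuffles those offsets with Collections.shuffle and reads each line with RandomAccessFile.readLine. The class and method names (ShuffleChosen, chosenOffsets, writeShuffled) are hypothetical. Row numbers are 0-based as in the loop above. RandomAccessFile.readLine turns each byte into one char, so this assumes an ASCII (or Latin-1) file; multi-byte UTF-8 characters would be mangled.

```java
import java.io.*;
import java.util.*;

class ShuffleChosen {
    // Returns the byte offset at which each chosen (0-based) line starts, in file order.
    static List<Long> chosenOffsets(File file, Set<Long> chosenRows) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            long pos = 0, lineNo = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart && chosenRows.contains(lineNo)) {
                    offsets.add(pos); // this byte is the first byte of a chosen line
                }
                atLineStart = false;
                pos++;
                if (b == '\n') { // the next byte starts a new line
                    lineNo++;
                    atLineStart = true;
                }
            }
        }
        return offsets;
    }

    // Shuffles the offsets and writes the corresponding lines to out in that random order.
    static void writeShuffled(File file, List<Long> offsets, File out, Random rng) throws IOException {
        List<Long> order = new ArrayList<>(offsets);
        Collections.shuffle(order, rng);
        try (RandomAccessFile raf = new RandomAccessFile(file, "r");
             BufferedWriter w = new BufferedWriter(new FileWriter(out))) {
            for (long off : order) {
                raf.seek(off);
                w.write(raf.readLine()); // stops at "\n", "\r" or "\r\n"; one char per byte
                w.newLine();
            }
        }
    }
}
```

Since only 10,000 rows are chosen, holding their text in memory and shuffling that list would also work; the RandomAccessFile route keeps memory use down to the offsets alone.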