I just don't think you can do this efficiently without using a bunch of disk. Any fancier data structure will carry so much memory and/or time overhead that your algorithm will suffer. So I would expect a sorting-based solution to be the best.
I believe you can sort large chunks of the file at a time and then merge those chunks afterwards (i.e. the merge step of merge sort). After sorting a chunk, it obviously has to go back out to disk. You can simply overwrite the data in the input file (assuming it is binary) or write to a temporary file.
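Here is a rough sketch of that first phase, assuming the input is a flat binary file of unsigned 64-bit values. The file names and the chunk size are placeholders I made up to keep it small and testable, not anything from the question:

```cpp
// Chunk-sort phase: read as many values as fit in the RAM budget, sort
// them in place, and write each sorted "run" to its own temporary file.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

int main() {
    // With ~30 GB of RAM you could hold close to 4 billion 8-byte values;
    // a much smaller chunk is used here so the sketch is easy to try out.
    const std::size_t kChunkValues = 1ull << 20;  // tune toward your RAM size

    std::ifstream in("input.bin", std::ios::binary);   // hypothetical input name
    std::vector<std::uint64_t> buf(kChunkValues);
    int run = 0;

    while (in) {
        in.read(reinterpret_cast<char*>(buf.data()),
                buf.size() * sizeof(std::uint64_t));
        std::size_t got = in.gcount() / sizeof(std::uint64_t);
        if (got == 0) break;

        std::sort(buf.begin(), buf.begin() + got);     // in-place, no extra RAM

        std::string name = "run_" + std::to_string(run++) + ".bin";
        std::ofstream out(name, std::ios::binary);
        out.write(reinterpret_cast<const char*>(buf.data()),
                  got * sizeof(std::uint64_t));
    }
    std::printf("wrote %d sorted runs\n", run);
}
```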
As for the records, you just have a bunch of 64-bit values. With 30 GB of RAM you can hold almost 4 billion of them at a time, which is pretty sweet. You can sort that many in place with quicksort, or half as many with mergesort. You probably won't get a contiguous block of memory that size, so you will have to break it up. That makes quicksort a bit more complicated, so you might as well use mergesort in RAM too.
During the final merge it is trivial to drop the duplicates. The merge can be done entirely file-to-file, but in the worst case you will use disk space equal to twice the number of entries in the input file (one file for input and one for output). If you can use the input file itself as scratch space, then you have not exceeded your RAM limit or your disk limits (if any).
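For the merge itself, something along these lines would do. It is only a sketch: it assumes the runs were written as `run_0.bin`, `run_1.bin`, ... (my naming, as above) and drops duplicates by comparing each popped value to the last one written:

```cpp
// k-way merge of sorted runs with duplicate removal, using a min-heap
// keyed on (value, run index). The smallest value across all runs is
// popped, written only if it differs from the previous output value,
// and the run it came from is refilled with its next value.
#include <cstdint>
#include <fstream>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

int main() {
    const int kRuns = 4;  // assumption: four runs, as in the 60 GB example below

    std::vector<std::ifstream> runs;
    using Item = std::pair<std::uint64_t, int>;            // (value, run index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    auto refill = [&](int i) {
        std::uint64_t v;
        if (runs[i].read(reinterpret_cast<char*>(&v), sizeof v))
            heap.emplace(v, i);
    };

    for (int i = 0; i < kRuns; ++i) {
        runs.emplace_back("run_" + std::to_string(i) + ".bin", std::ios::binary);
        refill(i);
    }

    std::ofstream out("sorted_dedup.bin", std::ios::binary);
    bool have_last = false;
    std::uint64_t last = 0;

    while (!heap.empty()) {
        auto [v, i] = heap.top();
        heap.pop();
        if (!have_last || v != last) {                      // drop duplicates here
            out.write(reinterpret_cast<const char*>(&v), sizeof v);
            last = v;
            have_last = true;
        }
        refill(i);                                          // pull next value from that run
    }
}
```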
I think the key here is the requirement that it be single-threaded, which makes a disk-based approach a good fit. Most of your time will be spent on disk access, so that is what you want to make as efficient as possible. In particular, when you do the merging, you want to minimize the number of seeks. You have a large amount of memory to use as a buffer, so I'm sure you can make that very efficient.
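One way to picture the buffering (again just a sketch, with made-up names and sizes): give each run its own large block buffer and refill it with big sequential reads, so the drive only has to seek when a buffer runs dry rather than on every 8-byte record:

```cpp
// A run reader that serves values out of a large in-memory buffer and
// refills it with one big sequential fread when it is exhausted.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

class BufferedRun {
public:
    BufferedRun(const char* path, std::size_t buf_values)
        : file_(std::fopen(path, "rb")), buf_(buf_values) { refill(); }
    ~BufferedRun() { if (file_) std::fclose(file_); }

    bool done() const { return pos_ == count_ && eof_; }

    // Returns the next value; callers should check done() first.
    std::uint64_t next() {
        std::uint64_t v = buf_[pos_++];
        if (pos_ == count_ && !eof_) refill();   // one big sequential read
        return v;
    }

private:
    void refill() {
        count_ = file_ ? std::fread(buf_.data(), sizeof(std::uint64_t),
                                    buf_.size(), file_)
                       : 0;
        pos_ = 0;
        eof_ = (count_ < buf_.size());
    }

    std::FILE* file_;
    std::vector<std::uint64_t> buf_;
    std::size_t pos_ = 0, count_ = 0;
    bool eof_ = false;
};

int main() {
    // With four runs and ~30 GB of RAM each run could get several gigabytes
    // of buffer; 1M values here just keeps the example small.
    BufferedRun run("run_0.bin", 1u << 20);
    std::uint64_t n = 0;
    while (!run.done()) { run.next(); ++n; }
    std::printf("read %llu values\n", static_cast<unsigned long long>(n));
}
```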
So let's say your file is 60 GB (and I assume it is binary), which is roughly 8 billion records. Sorting chunks in RAM, you can process 15 GB at a time, which means reading and, worse, writing the entire file once. Now you have four sorted pieces. If you then do a straight 2-way merge, you are always dealing with two runs at a time, which means reading and writing the file two more times: once to merge each pair of 15 GB pieces into 30 GB pieces, and one final merge of those two (which is where you drop the duplicates).
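Just to make the bookkeeping concrete, here is that arithmetic spelled out with the same numbers as above (nothing new, only the counts):

```cpp
// Back-of-the-envelope pass count: 60 GB of 8-byte records, 15 GB sorted
// chunks, then repeated 2-way merges until one run remains.
#include <cmath>
#include <cstdio>

int main() {
    const double file_gb  = 60.0;   // input size
    const double chunk_gb = 15.0;   // in-RAM sort size (30 GB RAM, mergesort needs ~2x)
    const double records  = file_gb * (1u << 30) / 8.0;

    const int runs   = static_cast<int>(std::ceil(file_gb / chunk_gb));  // 4
    const int merges = static_cast<int>(std::ceil(std::log2(runs)));     // 2 (4 -> 2 -> 1)
    const int passes = 1 + merges;                                       // sort pass + merge passes

    std::printf("~%.1f billion records\n", records / 1e9);
    std::printf("%d runs, %d merge passes, %d total passes\n", runs, merges, passes);
    std::printf("total I/O: ~%.0f GB read and ~%.0f GB written\n",
                passes * file_gb, passes * file_gb);
}
```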
I don't think that's too bad: three passes in and out. If you figure out a nice way to do the quicksort, you can probably manage it with even fewer passes through the file. A data structure like a deque could work well here, since it can handle non-contiguous chunks of memory... but you would probably want to build your own and fine-tune your sorting algorithm to work with it.
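As a quick illustration of the deque idea: `std::deque` already stores its elements in non-contiguous blocks while still exposing random-access iterators, so `std::sort` runs on it unmodified. For 15 GB chunks you would likely want your own container with much larger blocks (the standard deque's block size is small and fixed), but the shape is the same:

```cpp
// Sorting over non-contiguous storage: std::deque allocates fixed-size
// blocks rather than one big array, yet std::sort works on its iterators.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <random>

int main() {
    std::deque<std::uint64_t> values;   // non-contiguous storage under the hood

    // Fill with random 64-bit values to stand in for the file contents.
    std::mt19937_64 rng(42);
    for (int i = 0; i < 1'000'000; ++i) values.push_back(rng());

    std::sort(values.begin(), values.end());   // deque iterators are random-access

    std::printf("sorted %zu values, min=%llu max=%llu\n", values.size(),
                static_cast<unsigned long long>(values.front()),
                static_cast<unsigned long long>(values.back()));
}
```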