I will explain my problem first, because it is important for understanding what I want :-).
I am working on a pipeline written in Python that uses several external tools to perform various analyses of genomics data. One of these tools works with very large fastq files, which in the end are nothing more than plain text files.
Usually these fastq files are gzipped, and since they are plain text, the compression ratio is very high. Most data analysis tools can work with gzipped files, but we have a few that cannot. So we decompress the files, work with them, and finally compress them again.
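To make the current situation concrete, here is a minimal sketch of what the pipeline does today (the tool name ./tool and the helper function are hypothetical, just for illustration):

```python
import gzip
import os
import shutil
import subprocess

def run_on_fastq_gz(fastq_gz, tool="./tool"):
    """Decompress, run the tool on the plain-text copy, then clean up."""
    plain = fastq_gz[: -len(".gz")]  # e.g. sample.fastq.gz -> sample.fastq
    # The full-size decompressed copy lands on disk (or NFS) here.
    with gzip.open(fastq_gz, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    try:
        subprocess.run([tool, plain], check=True)  # tool only accepts plain text
    finally:
        os.remove(plain)  # the original .gz is kept
```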
As you can imagine, this process is:
- Slower
- Heavy on disk space
- Heavy on bandwidth (when working on an NFS file system)
So, I'm trying to find a way to "trick" these tools into working directly with gzipped files, without having to touch the source code of the tools.
I was thinking about using FIFO files, and I tried this, but it doesn't work if the tool reads the file more than once or if the tool seeks within the file.
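For reference, this is roughly what I tried with a FIFO (again, ./tool and the helper are placeholders): the decompressed stream is written into a named pipe while the tool reads from the other end. It works for tools that read the input exactly once, front to back, but breaks as soon as the tool re-opens or seeks, since a pipe can only be read once and is not seekable.

```python
import os
import subprocess
import tempfile

def run_via_fifo(fastq_gz, tool="./tool"):
    fifo = os.path.join(tempfile.mkdtemp(), "sample.fastq")
    os.mkfifo(fifo)
    # Background writer: decompress the gzip stream into the pipe.
    writer = subprocess.Popen(["sh", "-c", f"gzip -dc '{fastq_gz}' > '{fifo}'"])
    try:
        # The tool is given an ordinary-looking path, but it is a pipe:
        # a single sequential read works, re-reading or seeking does not.
        subprocess.run([tool, fifo], check=True)
    finally:
        writer.wait()
        os.remove(fifo)
```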
So basically I have two questions:
Is there a way to map the file into memory so that I can do something like:
./tool mapped_file
(where mapped_file is not a regular file on disk, but something like a memory-mapped view of the file)?
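To clarify what I mean by mapped_file, something along these lines would be ideal. This is only an illustration under the assumption of a Linux host where /dev/shm is a RAM-backed tmpfs: the decompressed data lives in memory rather than on disk or NFS, yet the tool sees a normal, seekable path.

```python
import gzip
import os
import shutil
import subprocess

def run_via_ram_file(fastq_gz, tool="./tool", ramdir="/dev/shm"):
    # Hypothetical sketch: /dev/shm is RAM-backed tmpfs on most Linux
    # systems, so the "file" never touches the disk or NFS, but the tool
    # can still re-read it and seek in it like any regular file.
    plain = os.path.join(ramdir, os.path.basename(fastq_gz)[: -len(".gz")])
    with gzip.open(fastq_gz, "rb") as src, open(plain, "wb") as dst:
        shutil.copyfileobj(src, dst)
    try:
        subprocess.run([tool, plain], check=True)
    finally:
        os.remove(plain)
```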
Do you have any other suggestions on how I can achieve my goal?
Thanks everyone!