I am collecting full HTML from a service that provides access to a very large collection of blogs and news websites. I test each HTML document as it arrives (in real time) to see whether it contains any of several keywords. If it contains one of the keywords, I write the HTML to a text file to save it.
I want to run this for a week, so I will collect a large amount of data. A 3-minute test run of the program produced a 100 MB text file. I have 4 TB of space and cannot use more than that.
In addition, I do not want the individual text files to grow too large, because I assume they would become difficult to open and work with.
My plan is to open a text file and write the HTML to it, checking the file's size frequently. Once it grows past a limit, say 200 MB, I close the file and open another. I also need to keep a running log of how much space I have used in total, so that I can make sure I am not approaching 4 TB.
My question at this point is how to check the size of the text file before it has been closed (with FileWriter.close()). Is there a function for this, or should I count the number of characters written to the file and use that count to estimate the file size?
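A minimal sketch of the counting approach described above, under stated assumptions: the text is encoded as UTF-8 (so bytes, not characters, are counted), and the class name `RotatingHtmlWriter`, the file naming scheme, and the limits are hypothetical placeholders rather than anything from an existing library. Counting the encoded bytes before writing gives the size without asking the file system, which matters because `File.length()` typically does not include data still sitting in a writer's buffer.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes UTF-8 text to numbered files, rolls over to a
// new file at a per-file limit, and refuses writes beyond a total budget.
class RotatingHtmlWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final long maxBytesPerFile;
    private final long maxTotalBytes;
    private OutputStream out;
    private long bytesInCurrentFile;
    private long totalBytes;
    private int fileIndex;

    RotatingHtmlWriter(Path dir, String prefix, long maxBytesPerFile, long maxTotalBytes)
            throws IOException {
        this.dir = dir;
        this.prefix = prefix;
        this.maxBytesPerFile = maxBytesPerFile;
        this.maxTotalBytes = maxTotalBytes;
        Files.createDirectories(dir);
        openNextFile();
    }

    // Closes the current file (if any) and opens the next numbered one.
    private void openNextFile() throws IOException {
        if (out != null) {
            out.close();
        }
        Path next = dir.resolve(prefix + "-" + fileIndex++ + ".txt");
        out = new BufferedOutputStream(Files.newOutputStream(next));
        bytesInCurrentFile = 0;
    }

    // Returns false and writes nothing if the total budget would be exceeded.
    boolean write(String html) throws IOException {
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        if (totalBytes + bytes.length > maxTotalBytes) {
            return false;
        }
        // Roll over before the write that would push this file past its limit.
        if (bytesInCurrentFile > 0 && bytesInCurrentFile + bytes.length > maxBytesPerFile) {
            openNextFile();
        }
        out.write(bytes);
        bytesInCurrentFile += bytes.length;
        totalBytes += bytes.length;
        return true;
    }

    long getTotalBytes() { return totalBytes; }

    int getFilesOpened() { return fileIndex; }

    @Override
    public void close() throws IOException {
        out.close();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("html-dump");
        long perFile = 200L * 1024 * 1024;           // 200 MB per file
        long budget = 4L * 1024 * 1024 * 1024 * 1024; // 4 TB in total
        try (RotatingHtmlWriter w = new RotatingHtmlWriter(dir, "pages", perFile, budget)) {
            w.write("<html><body>example</body></html>\n");
            System.out.println("Bytes written so far: " + w.getTotalBytes());
        }
    }
}
```

A single document larger than the per-file limit still goes into one file in this sketch; only the total budget is enforced strictly. The running total is kept in memory here, so persisting it to a log would be a separate step.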
A separate question: are there ways to minimize the amount of space that my text files occupy? I am working in Java.
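One option for reducing disk usage is to compress the text as it is written, for example with `java.util.zip.GZIPOutputStream` from the standard library. The sketch below assumes UTF-8 text and one document per line; the class and method names (`GzipHtmlDemo`, `writeCompressed`, `readCompressed`) are hypothetical. HTML is usually repetitive and tends to compress well, but the actual ratio depends on the data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo: stores HTML documents in a gzip-compressed text file.
class GzipHtmlDemo {

    // Writes each page followed by a newline into a gzip-compressed file.
    static void writeCompressed(Path file, Iterable<String> pages) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
            for (String page : pages) {
                w.write(page);
                w.write('\n');
            }
        }
    }

    // Reads the whole compressed file back into a string (fine for small files only).
    static String readCompressed(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("pages", ".txt.gz");
        String page = "<html><body><p>example page</p></body></html>";
        writeCompressed(file, Collections.nCopies(1000, page));
        System.out.println("Compressed size in bytes: " + Files.size(file));
    }
}
```

With compression, the size on disk no longer equals the number of bytes passed to the writer, so a size limit would have to be measured on the compressed output, for instance by counting bytes in a stream wrapped underneath the `GZIPOutputStream`.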