I download data from a site, and the site gives me the data in very large blocks. Within each large block there are "chunks" that I need to parse individually. These chunks begin with "<ClinicalData>" and end with "</ClinicalData>". So an example line would look something like this:
<ClinicalData><ID="1"></ClinicalData><ClinicalData><ID="2"></ClinicalData><ClinicalData><ID="3"></ClinicalData><ClinicalData><ID="4"></ClinicalData><ClinicalData><ID="5"></ClinicalData>
In "ideal" circumstances a block is supposed to be a single line of data, but sometimes erroneous newline characters appear. Since I want to parse the <ClinicalData> chunks within a block one at a time, I want the data to be readable line by line. So I take the text file, read it all into a StringBuilder, remove the newlines (just in case), and then insert my own newline characters, so that I can read it line by line.
StringBuilder dataToWrite = new StringBuilder(File.ReadAllText(filepath), Int32.MaxValue);
// Need to clear newline characters just in case they exist.
dataToWrite.Replace("\n", "");
// set my own newline characters so the data becomes parse-able by line
dataToWrite.Replace("<ClinicalData", "\n<ClinicalData");
// set the data back into a file, which is then used in a StreamReader to parse by lines.
File.WriteAllText(filepath, dataToWrite.ToString());
This worked fine (it may be inefficient, but at least I like it :)) until I came across a block of data that arrives as a 280 MB file.
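With that file I get a System.OutOfMemoryException. Can a StringBuilder really not handle 280 MB? One alternative I have considered is using Regex.Match to find each "<ClinicalData" occurrence, or reading the file in pieces (for example with .ReadBytes).
Any suggestions for handling a 280 MB file would be appreciated!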
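A minimal sketch of a streaming variant of the rewrite step above, assuming the goal stays the same: drop every existing line break and start a new line before each "<ClinicalData". It reads one character at a time through a buffered reader, so the whole file is never held in memory. The names ClinicalDataSplitter, Split and SplitFile are hypothetical, and writing to a separate target file (rather than back to the same path) is an assumption made because a file cannot safely be overwritten while it is still being read.

```csharp
using System;
using System.IO;

static class ClinicalDataSplitter
{
    // Start-of-chunk marker, taken from the Replace call in the code above.
    const string Token = "<ClinicalData";

    // Copies input to output, dropping CR/LF characters and writing '\n'
    // before every occurrence of Token.
    public static void Split(TextReader input, TextWriter output)
    {
        int matched = 0; // how many leading characters of Token are currently matched
        int next;
        while ((next = input.Read()) != -1)
        {
            char c = (char)next;

            // Stray line breaks are removed, even in the middle of a marker,
            // which mirrors removing them before doing the Replace.
            if (c == '\r' || c == '\n')
                continue;

            if (c == Token[matched])
            {
                matched++;
                if (matched == Token.Length)
                {
                    output.Write('\n');
                    output.Write(Token);
                    matched = 0;
                }
                continue;
            }

            // Mismatch: emit the partial match unchanged. Token contains '<'
            // only at index 0, so restarting the match from c is sufficient.
            output.Write(Token.Substring(0, matched));
            matched = 0;
            if (c == Token[0])
                matched = 1;
            else
                output.Write(c);
        }

        // Flush a partial match left over at the end of the input.
        output.Write(Token.Substring(0, matched));
    }

    // Hypothetical convenience wrapper: streams sourcePath into targetPath.
    public static void SplitFile(string sourcePath, string targetPath)
    {
        using (var reader = new StreamReader(sourcePath))
        using (var writer = new StreamWriter(targetPath))
        {
            Split(reader, writer);
        }
    }
}
```

It could be called as ClinicalDataSplitter.SplitFile(filepath, filepath + ".split"), where the ".split" suffix is only an illustration, and the resulting file can then be read line by line with a StreamReader as before.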