I ran my code, which has worked well so far, but hit a strange error. The part of the code that failed takes a "field" from the input string and converts it to a long.
The input line that triggered the exception is copied below from the command line. I can't easily pull the actual line from the file, since the input file is 144 GB and I'm not going to search through it! Everything looks fine, though. In the file, the elements should be separated by tabs.
SBL_XSMJR247_ID:2331788_ 99 mm17 35305666 70 89M =35305769 190 NNGTCTTGGAGATCACGAGGCCCATGACAGCATGGAACAAGTCATGTGAAGCCCAGCCAGACACGATGAAAAATTTATAGACAAAAAGA ~~JGIIJJ@CFGIJJJJIJJIJJJIEIGJJIJJJJJJJHIJIJIIFIIJJJHFHH ?DFFCACEEDDDDDDDDBBDDDEEDDDDDDDD@B xl:i:35305666 xr:i:35305754 xs:i:89 xd:A:f xm:A:u xa:A:"" xL:i:35305666 xR:i:35305855 xS:i:190xW:i:14 xP:i:0 xQ:i:0 xC:A:"" xD:A:"" PG:Z:novoalign AS:i:12 UQ:i:12 NM:i:2 MD:Z:0G0C87 PQ:i:18 SM:i:70 AM:i:70
An error message is displayed:
nfe error: For input string: "35305666" xL
Yes, this is a little ugly.
Finally, here is the code that produced the error.
    private long findValueLong(String line, String field) {
        try {
            String[] lineComps = line.split("\t");
            for (String component : lineComps) {
                if (component.startsWith(field)) {
                    return Long.parseLong(component.split(":")[component.split(":").length - 1]);
                }
            }
        } catch (NumberFormatException nfe) {
            System.out.println("nfe error: " + nfe.getMessage() + " " + field);
            System.out.println(line);
            System.exit(0);
        }
        return 0L;
    }
The printed message implies that the code correctly extracted the numeric part of the xL field ("35305666") but was then unable to parse it. The value is much, much smaller than the maximum value for a long, so overflow cannot be the cause.
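The range argument can be checked in isolation. Below is a minimal sketch, not part of the original program: the class name `LongRangeCheck` and the helper `fitsInLong` are hypothetical names introduced only for illustration. It compares the value from the error message against `Long.MAX_VALUE` and parses the clean string directly.

```java
import java.math.BigInteger;

// Hypothetical helper class, for illustration only.
class LongRangeCheck {

    // Returns true when the decimal string fits in a signed 64-bit long.
    static boolean fitsInLong(String digits) {
        BigInteger value = new BigInteger(digits);
        return value.compareTo(BigInteger.valueOf(Long.MIN_VALUE)) >= 0
                && value.compareTo(BigInteger.valueOf(Long.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        String value = "35305666"; // the value shown in the error message

        System.out.println(Long.MAX_VALUE);        // prints 9223372036854775807
        System.out.println(fitsInLong(value));     // prints true
        System.out.println(Long.parseLong(value)); // prints 35305666
    }
}
```

Since the clean eight-digit string parses without complaint, the failure presumably depends on the exact characters in the token that reached `Long.parseLong`, not on the magnitude of the number.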
Any ideas?
Thanks.
Update 1:
I added a few more println statements and reran the code to see what it was parsing. I added:
    System.out.println(component)
    System.out.println(component.split...)
The values printed by the two statements are:
    xL:i:35305666
    35305666
I cross-referenced the values with the file, and there are no hidden non-printable characters. The value appears as ...\txL:i:35305666\t... I have now added .trim() to the end of the expression that extracts the final value for parsing, and I am rerunning the code. Let's see what happens!
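Because println output cannot show invisible characters, one way to double-check the token at the moment of failure is to dump its code points. This is only a sketch: `TokenInspector` and `describe` are hypothetical names, and the control character U+0001 in the second call is an invented example, not something observed in the real data.

```java
// Hypothetical diagnostic helper, for illustration only.
class TokenInspector {

    // Renders every character as U+XXXX so that invisible characters show up.
    static String describe(String token) {
        StringBuilder sb = new StringBuilder();
        token.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // A clean token: eight digits, eight code points.
        System.out.println(describe("35305666"));

        // Invented example: the same digits followed by a control character.
        // Printed normally, this can look identical to the clean token.
        System.out.println(describe("35305666" + (char) 1));
    }
}
```

Calling something like `describe` on the failing token inside the `catch` block would show whether the string handed to `Long.parseLong` really contains only the eight digits.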
Update 2:
Adding .trim() to the end of the line of code that does the parsing seems to have solved the problem. Is the split function supposed to add whitespace somewhere in the string? This is very strange behavior, though, since I process ~160 million of these lines and the error occurred only once. There is nothing special about this line, and it was generated in the same way as the others. trim() seems to be a workaround rather than a solution...
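For reference, here is a small sketch of the trimmed lookup, together with a check that `String.split` returns exact substrings and does not add whitespace of its own. The class name `FieldLookup` is hypothetical, the sample lines are invented, and the stray control character is an assumption used to show one kind of input where `.trim()` makes a difference. It is not a confirmed explanation for the failure. Unlike the original, this variant lets a `NumberFormatException` propagate instead of calling `System.exit`.

```java
// Hypothetical variant of findValueLong, for illustration only.
class FieldLookup {

    // Finds the tab-separated component starting with `field` and parses
    // the text after its last ':' as a long, trimming it first.
    static long findValueLong(String line, String field) {
        for (String component : line.split("\t")) {
            if (component.startsWith(field)) {
                String[] parts = component.split(":");
                // trim() strips leading and trailing characters <= U+0020,
                // which covers spaces and ASCII control characters.
                return Long.parseLong(parts[parts.length - 1].trim());
            }
        }
        return 0L; // field not present on this line
    }

    public static void main(String[] args) {
        String clean = "xl:i:1\txL:i:35305666\txR:i:2";

        // split() hands back the exact substring between the tabs.
        System.out.println(clean.split("\t")[1].equals("xL:i:35305666")); // prints true
        System.out.println(findValueLong(clean, "xL"));                   // prints 35305666

        // Invented input: a control character directly after the digits.
        String dirty = "xl:i:1\txL:i:35305666" + (char) 1 + "\txR:i:2";
        System.out.println(findValueLong(dirty, "xL"));                   // prints 35305666
    }
}
```

If trimming changes the outcome, the token must have contained a leading or trailing character at or below U+0020, so it is probably worth inspecting the raw bytes of that one line rather than assuming `split` is at fault.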