I have two files in which each record describes an interval (a start and an end).
file1.txt
a 5 10
a 13 19
a 27 39
b 4 9
b 15 19
c 20 33
c 39 45
and
file2.txt
something id1 a 4 9 commentx
something id2 a 14 18 commenty
something id3 a 1 4 commentz
something id5 b 3 9 commentbla
something id6 b 16 18 commentbla
something id7 b 25 29 commentblabla
something id8 c 5 59 hihi
something id9 c 40 45 hoho
something id10 c 32 43 haha
What I would like to do is create a file containing only those entries of file2 whose range (columns 4 and 5) does not overlap any range (columns 2 and 3) of the file1 entries with the same key, that is, the entries where column 3 of file2 is identical to column 1 of file1.
The expected output should be in the file
test.result
something id3 a 1 4 commentz
something id7 b 25 29 commentblabla
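To make the intended rule concrete, here is a minimal sketch of the filter on in-memory copies of the sample data. It assumes that the ranges are closed integer ranges, that columns are separated by whitespace, and that "not the same" means "overlaps no range with the same key", which is the reading consistent with the expected output above. The function names `load_intervals` and `filter_non_overlapping` are illustrative only.

```python
from collections import defaultdict


def load_intervals(lines):
    """Group (start, end) pairs from file1-style lines by their key column."""
    intervals = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        intervals[fields[0]].append((int(fields[1]), int(fields[2])))
    return intervals


def filter_non_overlapping(records, intervals):
    """Keep file2-style lines whose range overlaps no interval with the same key."""
    kept = []
    for line in records:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        key, start, end = fields[2], int(fields[3]), int(fields[4])
        # Two closed ranges overlap unless one ends before the other starts.
        overlaps = any(start <= hi and end >= lo
                       for lo, hi in intervals.get(key, []))
        if not overlaps:
            kept.append(line)
    return kept


file1 = ["a 5 10", "a 13 19", "a 27 39", "b 4 9", "b 15 19",
         "c 20 33", "c 39 45"]
file2 = [
    "something id1 a 4 9 commentx",
    "something id2 a 14 18 commenty",
    "something id3 a 1 4 commentz",
    "something id5 b 3 9 commentbla",
    "something id6 b 16 18 commentbla",
    "something id7 b 25 29 commentblabla",
    "something id8 c 5 59 hihi",
    "something id9 c 40 45 hoho",
    "something id10 c 32 43 haha",
]

result = filter_non_overlapping(file2, load_intervals(file1))
for line in result:
    print(line)
```

On the sample data this prints the id3 and id7 rows. Reading file1 into a dictionary once also avoids re-reading it for every row of file2.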
I tried using the following Python code:
import csv
with open('file2') as protein, open('file1') as position, open('test.result', "r+") as fallout:
    writer = csv.writer(fallout, delimiter=' ')
    for rowinprot in csv.reader(protein, delimiter=' '):
        for rowinpos in csv.reader(position, delimiter=' '):
            if rowinprot[2] == rowinpos[0]:
                if rowinprot[4] < rowinpos[1] or rowinprot[3] > rowinpos[2]:
                    writer.writerow(rowinprot)
This did not work. I got the following result:
something id1 a 4 9 commentx
something id1 a 4 9 commentx
something id1 a 4 9 commentx
which is clearly not what I want.
What did I do wrong? The problem seems to be in the nested loops and conditions, but I cannot work out what it is.
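Two Python behaviours may explain the repeated id1 row. The sketch below reproduces them in isolation, using `io.StringIO` as a stand-in for the real files (an assumption made only so the snippet is self-contained): `csv.reader` yields strings, so `<` and `>` compare text rather than numbers, and a file object that has been read to the end yields nothing on a second pass.

```python
import csv
import io

# csv.reader returns every field as a string, so comparisons are
# lexicographic: '4' sorts after '10' because '4' > '1'.
string_compare = '4' > '10'
numeric_compare = int('4') > int('10')
print(string_compare, numeric_compare)

# A file-like object is consumed by the first reader; a second reader
# created on the same object starts at the end and yields no rows.
position = io.StringIO("a 5 10\na 13 19\na 27 39\n")
first_pass = list(csv.reader(position, delimiter=' '))
second_pass = list(csv.reader(position, delimiter=' '))
print(len(first_pass), len(second_pass))
```

If this is indeed the cause, the inner loop only runs for the first row of file2, and for that row the string comparison succeeds against all three "a" intervals, giving three copies. Separately, the condition writes a row as soon as one interval fails to overlap, whereas the stated goal needs the range to miss every interval with the same key.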