I have data in tab delimited format that looks like this:
0/0:23:-1.03,-7.94,-83.75:69.15 0/1:34:-1.01,-11.24,-127.51:99.00 0/0:74:-1.02,-23.28,-301.81:99.00
I'm only interested in the first 3 characters of each record (i.e. 0/0 and 0/1). I decided that the best way to do this is to use re.match together with genfromtxt from numpy. This is what I have so far:
import re
from numpy import genfromtxt
csvfile = 'home/python/batch1.hg19.table'
data = genfromtxt(csvfile, delimiter="\t", dtype=None)
# Only one row (data[1]) is processed here
for i in data[1]:
    m = re.match('[0-9]/[0-9]', i)
    if m:
        print m.group(0),
    else:
        print "NA",
This works for a single line of data, but I can't work out how to extend it to every line of the input file.
Should I wrap it in a function and apply it to each line separately, or is there a more Pythonic way to do this?
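For reference, here is a minimal sketch of the "function applied to each line" option I have in mind. It skips genfromtxt and splits each line on tabs directly, which is an assumption on my part rather than something I have tried on the real file. The names GENOTYPE, extract_genotypes and sample are made up for illustration, and the second sample line is invented to show the "NA" case.

```python
import re

# Matches a genotype call such as 0/0 or 0/1 at the start of a field.
GENOTYPE = re.compile(r'[0-9]/[0-9]')

def extract_genotypes(lines):
    """Return one list per input line with the genotype of each
    tab-separated field, or "NA" where the field does not start with one."""
    rows = []
    for line in lines:
        fields = line.rstrip('\n').split('\t')
        row = []
        for field in fields:
            m = GENOTYPE.match(field)
            row.append(m.group(0) if m else "NA")
        rows.append(row)
    return rows

# First line is the sample from above; the second line is invented.
sample = [
    "0/0:23:-1.03,-7.94,-83.75:69.15\t0/1:34:-1.01,-11.24,-127.51:99.00\t0/0:74:-1.02,-23.28,-301.81:99.00\n",
    "./.\t0/1:34:-1.01,-11.24,-127.51:99.00\n",
]

for row in extract_genotypes(sample):
    print(" ".join(row))
```

With a real file, the list would presumably be replaced by an open file object, since iterating over a file yields one line at a time.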