Hi all
I am new to Python and to programming. I need to read fragments of a large text file in which each line has the following format:
<word id="8" form="hibernis" lemma="hibernus1" postag="np---nb-" head-"7" relation="ADV"/>
I need the values of form, lemma and postag. For the line above, that means hibernis, hibernus1 and np---nb-.
How do I tell Python to read until it reaches form, continue to the opening quotation mark ", and then return the text between the quotes (here, hibernis)? I am struggling with this.
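A minimal sketch of one way to express "find the attribute name, then take what is between the quotes", using a regular expression from the standard `re` module. The helper name `get_attr` is made up for illustration, and the sketch assumes attribute values never contain a double quote.

```python
import re

# Sample line copied from the question as posted (including head-"7").
line = '<word id="8" form="hibernis" lemma="hibernus1" postag="np---nb-" head-"7" relation="ADV"/>'

def get_attr(text, name):
    """Return the quoted value that follows name= in text, or None if absent.

    get_attr is a hypothetical helper, not part of any library.
    """
    # \b keeps "form" from matching inside a longer name;
    # [^"]* captures everything up to the next double quote.
    match = re.search(r'\b' + re.escape(name) + r'="([^"]*)"', text)
    return match.group(1) if match else None

form = get_attr(line, "form")
lemma = get_attr(line, "lemma")
postag = get_attr(line, "postag")
print(form, lemma, postag)
```

Because the value is captured between the quotes directly, there is no need to strip the quotation marks from the file first.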
My attempts so far have been to remove the quotation marks, split each line into words, and then pick the items I need out of the resulting list. I can get this to work for one line, but I have problems getting Python to iterate over the whole file. My code is below:
    f=open('blank.txt','r')
    quotes=f.read()
    noquotes=quotes.replace('"','')
    f.close()
    rf=open('blank.txt','w')
    rf.write(noquotes)
    rf.close()
    f=open('blank.txt','r')
    finished = False
    postag=[]
    while not finished:
        line=f.readline()
        words=line.split()
        postag.append(words[4])
        postag.append(words[6])
        postag.append(words[8])
        finished=True
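The loop above appears to stop after one line because `finished` is set to `True` on the first pass. A common alternative is to iterate over the file object directly, which yields one line at a time until the end of the file. The sketch below assumes one tag per line; the function name `extract_rows`, the temporary file, and the second sample line are invented for the demonstration and are not from the real data.

```python
import os
import re
import tempfile

WANTED = ("form", "lemma", "postag")

def extract_rows(path):
    """Return a list of (form, lemma, postag) tuples, one per matching line.

    extract_rows is a hypothetical helper written for this sketch.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:  # a file object yields its lines one by one
            # Collect every name="value" pair found on the line.
            attrs = dict(re.findall(r'([\w-]+)="([^"]*)"', line))
            # Skip lines that lack any of the three wanted attributes.
            if all(key in attrs for key in WANTED):
                rows.append(tuple(attrs[key] for key in WANTED))
    return rows

# Demo data: the first word line is from the question, the rest is made up.
sample = (
    '<sentence id="1">\n'
    '<word id="8" form="hibernis" lemma="hibernus1" postag="np---nb-" head-"7" relation="ADV"/>\n'
    '<word id="9" form="castra" lemma="castra1" postag="n-p---na-" head="7" relation="OBJ"/>\n'
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample)
    tmp_path = tmp.name

rows = extract_rows(tmp_path)
os.remove(tmp_path)

for row in rows:
    print(row)
```

This reads the file without modifying it, so the original quotation marks stay in place. Looking attributes up by name rather than by position also means a line with a missing or extra attribute is skipped instead of raising an `IndexError`.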
Thank you for any feedback or criticism.
bob