How to extract 3000 characters after some words that appear several times in a text file?

Question

I have a text file:

"Accounting principles. Negative collateral conditions. Auxiliary distributions. Business lines ...... Accounting Principles: defined in the definition of IFRS. Administrative Agent: SVB ...... In case any Accounting Principles (as defined below), and such change results ... "

In this file, “Accounting Principles” appears three times, and “IFRS” appears once.

I am trying to extract the 3,000 characters (or 300 words) that follow each occurrence of “Accounting Principles” and “IFRS”. At the moment I can only extract the characters after the first occurrence of “Accounting Principles”, and I have to write separate code for “Accounting Principles” and for “IFRS”. So my question is: how can I extract 3,000 characters after every occurrence of “Accounting Principles”, and how can I write a single piece of code that handles “Accounting Principles” and “IFRS” together instead of two separate scripts?

Many thanks!

My code is as follows:

import os
sourcepath=os.listdir('try/')
for filename in sourcepath:
    inputfile='try/'+filename
    with open(inputfile, 'r') as f:
        text=f.read()
        index=text.index('Accounting Principles')
        right=text[index: index+3000]
        print(right)

import os
sourcepath=os.listdir('try/')
for filename in sourcepath:
    inputfile='try/'+filename
    with open(inputfile, 'r') as f:
        text=f.read()
        index=text.index('IFRS')
        right=text[index: index+3000]
        print(right)

python

WUTONG Apr 7 '18 at 13:33

2 answers

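Robᵩ · Answer 1 · 2018-04-07T13:55:49+0000

Use re.finditer with an alternation pattern to find every occurrence of “Accounting Principles” or “IFRS”. This example prints each match followed by the next 30 characters; change 30 to 3000 for the full excerpt.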

import re

with open('x.in') as fp:
    text = fp.read()

for m in re.finditer("Accounting Principles|IFRS", text):
    print(text[m.start():m.end()+30])
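
Applied to the question's 3,000-character requirement, the same idea might look like the sketch below. It is only an illustration: the helper name `extract_after` and its `width` parameter are hypothetical, and it assumes each excerpt should start right after the keyword rather than include it.

```python
import re

def extract_after(text, keywords, width=3000):
    """Return (keyword, excerpt) pairs: up to `width` characters following each match."""
    # re.escape guards against keywords that contain regex metacharacters.
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return [(m.group(), text[m.end():m.end() + width]) for m in pattern.finditer(text)]

# Small demonstration with a short window so the output stays readable.
sample = "Accounting Principles: defined under IFRS. Any Accounting Principles change."
for keyword, excerpt in extract_after(sample, ["Accounting Principles", "IFRS"], width=10):
    print(keyword, "->", repr(excerpt))
```

To process every file in a directory, as the question does, the function could be called on the result of f.read() inside the existing loop over os.listdir('try/').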

Ajax1234 · Answer 2 · 2018-04-07T14:11:54+0000

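Use re.sub to replace every occurrence of “Accounting Principles” or “IFRS” in the full string with a marker character, then take the 3,000 characters that follow each marker. This assumes the marker ('*' here) does not otherwise appear in the text; keywords that fall inside an excerpt will themselves show up as '*'.

import re

marked_data = re.sub(r'Accounting\sPrinciples|IFRS', '*', open('filename.txt').read())
new_data = [marked_data[i+1:i+3001] for i, c in enumerate(marked_data) if c == '*']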
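
The question also mentions 300 words as an alternative to 3,000 characters. A minimal sketch of that variant follows, assuming a "word" is any whitespace-separated token; the name `words_after` and the `count` parameter are hypothetical and not part of either answer.

```python
import re

def words_after(text, keywords, count=300):
    """Return the first `count` whitespace-separated words following each keyword match."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    # str.split() with no arguments splits on any run of whitespace.
    return [" ".join(text[m.end():].split()[:count]) for m in pattern.finditer(text)]

sample = "Accounting Principles: defined in the definition of IFRS. Administrative Agent: SVB"
print(words_after(sample, ["Accounting Principles", "IFRS"], count=3))
```

This re-splits the remainder of the text for every match, which is fine for small files but may be slow on very large ones.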
