I need to parse text files in which the relevant information is often spread across several lines in a non-linear way. For example:
1234
1 IN THE SUPERIOR COURT OF THE STATE OF SOME STATE
2 IN AND FOR THE COUNTY OF SOME COUNTY
3 UNLIMITED JURISDICTION
4 --o0o--
5
6 JOHN SMITH AND JILL SMITH, )
)
7 Plaintiffs, )
)
8 vs. ) No. 12345
)
9 ACME CO, et al., )
)
10 Defendants. )
___________________________________)
I need to pull out the plaintiff and defendant names.
These transcripts come in a very wide variety of layouts, so I cannot always count on those nice parentheses being there, or on the plaintiff and defendant information being neatly grouped together. For example:
1 SUPREME COURT OF THE STATE OF SOME OTHER STATE
COUNTY OF COUNTYVILLE
2 First Judicial District
Important Litigation
3 --------------------------------------------------X
THIS DOCUMENT APPLIES TO:
4
JOHN SMITH,
5 Plaintiff, Index No.
2000-123
6
DEPOSITION
7 - against - UNDER ORAL
EXAMINATION
8 OF
JOHN SMITH,
9 Volume I
10 ACME CO,
et al,
11 Defendants.
12 --------------------------------------------------X
Two constants:
- "Plaintiff" (or "Plaintiffs") will appear after the name of the plaintiff(s), but not necessarily on the same line.
- The plaintiff and defendant names will be in upper case.
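To make those two constants concrete, here is a rough sketch of the kind of heuristic they suggest, in Python. It finds a line that starts with "Plaintiff(s)" or "Defendant(s)" and walks backwards, collecting upper-case lines until something else turns up. All the names here (`clean`, `is_name`, `extract_parties`) are made up for illustration. It also assumes that leading digits are transcript line numbers, that everything from a `)` onward is a right-hand column, and that "et al" can be dropped. It is only a starting point and will certainly miss some layouts.

```python
import re

# Assumption: a leading integer followed by whitespace (or nothing) is a
# transcript line number, not part of a party name.
LINE_NO = re.compile(r"^\s*\d+(?:\s+|$)")
# Assumption: "et al" (any punctuation) is noise, not part of the name.
ET_AL = re.compile(r"\bet\s+al\b\.?,?", re.IGNORECASE)
# A role line starts with Plaintiff(s) or Defendant(s).
ROLE = re.compile(r"^(plaintiff|defendant)s?\b", re.IGNORECASE)


def clean(line):
    """Strip the line number, anything from ')' onward, and 'et al'."""
    line = LINE_NO.sub("", line)
    line = line.split(")", 1)[0]
    line = ET_AL.sub("", line)
    return line.strip(" ,.;")


def is_name(text):
    """A candidate name line has letters and is entirely upper case."""
    return any(c.isalpha() for c in text) and text == text.upper()


def extract_parties(text):
    """Return {'plaintiff': ..., 'defendant': ...} for the roles found."""
    lines = [clean(raw) for raw in text.splitlines()]
    parties = {}
    for i, line in enumerate(lines):
        match = ROLE.match(line)
        if not match:
            continue
        role = match.group(1).lower()
        names = []
        # Walk backwards from the role line.
        for prev in reversed(lines[:i]):
            if is_name(prev):
                names.append(prev)
            elif names or prev:
                # Stop at the first blank line after a name, or at any
                # non-name text (for example "Volume I").
                break
        if names and role not in parties:
            parties[role] = " ".join(reversed(names))
    return parties
```

Calling `extract_parties(text)` on each of the two samples above should give a dictionary with a `plaintiff` and a `defendant` entry. The obvious weak spot is a right-hand column with no `)` separator: if upper-case text such as "DEPOSITION" sits directly above a role line, it would be picked up as a name.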
Any ideas?