I have a list of words that all end in the early modern English 'th'. They include: appointeth, requireth, etc., all conjugated for the third-person singular.
As part of a much larger project (using my computer to convert the Gutenberg etext of Gargantua and Pantagruel into something closer to 20th-century English, so that I can read it more easily), I want to remove the last two or three characters from all these words and replace them with "s", then use a slightly modified function on the words that still have not been modernized. Both functions are included below.
My main problem is that I just can't get the Python syntax right. I find this part of the language really confusing at the moment.
Here is a function that removes th:
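from __future__ import division  # only has an effect on Python 2
import nltk, re, pprint

def ethrema(word):
    # Replace a final 'th' with 's', e.g. 'abateth' -> 'abates'
    if word.endswith('th'):
        return word[:-2] + 's'
    return word  # leave other words unchanged instead of returning None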
Here is a function that removes extraneous e:
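def ethremb(word):
    # Replace a final 'es' with 's', e.g. 'appointes' -> 'appoints'
    if word.endswith('es'):
        return word[:-2] + 's'
    return word  # leave other words unchanged instead of returning None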
So words like "abateth" and "accuseth" should go through ethrema but not through ethremb(ethrema(word)), while a word like "abhorreth" should go through both.
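Passing a word "through both" is ordinary function composition: the result of ethrema becomes the argument of ethremb. A minimal sketch, assuming both functions return the word unchanged when its ending does not match (without that fallback, Python returns None for non-matching words):

```python
def ethrema(word):
    """Replace a final 'th' with 's' (e.g. 'requireth' -> 'requires')."""
    if word.endswith('th'):
        return word[:-2] + 's'
    return word  # not a -th word: leave it alone

def ethremb(word):
    """Replace a final 'es' with 's' (e.g. 'appointes' -> 'appoints')."""
    if word.endswith('es'):
        return word[:-2] + 's'
    return word

# One pass is enough for these words...
print(ethrema('abateth'))               # abates
print(ethrema('accuseth'))              # accuses
# ...while this one needs both, applied inside-out
print(ethremb(ethrema('appointeth')))   # appoints
```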
If anyone can think of a more efficient way to do this, I'm all ears.
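One possible approach, sketched here as an assumption rather than a tested recipe, is to decide per word which ending to strip, so each word is sliced once instead of being run through two functions. It relies on a set (DROP_E, a hypothetical name) listing the -eth words whose "e" should also disappear; set membership tests are fast, so this should stay cheap over a whole book.

```python
def modernize(word, drop_e=frozenset()):
    """Turn an early modern '-th' verb ending into '-s' in one pass.

    drop_e is a hypothetical set of '-eth' words whose 'e' must also go
    (e.g. 'appointeth' -> 'appoints'); every other '-th' word just has
    its 'th' swapped for 's' (e.g. 'abateth' -> 'abates').
    """
    if word in drop_e and word.endswith('eth'):
        return word[:-3] + 's'   # drop 'eth', add 's'
    if word.endswith('th'):
        return word[:-2] + 's'   # drop 'th', add 's'
    return word                  # not a -th word: unchanged

# Hypothetical: fill this from your own list of words
DROP_E = {'appointeth'}

for w in ['abateth', 'accuseth', 'appointeth']:
    print(w, '->', modernize(w, DROP_E))
```

The trade-off is that the knowledge of which words lose their "e" moves out of the code and into data, which is easier to extend as you work through the text.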
Here is the result of my very amateurish attempt to apply these functions to a tokenized list of words that need modernizing:
>>> eth1 = [w.ethrema() for w in text]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'ethrema'
So yes, this is really a syntax issue. These are the first functions I have ever written in Python, and I have no idea how to apply them to actual objects.
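The AttributeError comes from the call syntax: w.ethrema() asks the string w for a method named ethrema, and str has no such method. A plain function you define yourself is called with the word as its argument, ethrema(w). A minimal sketch, assuming text is a list of token strings (the sample tokens below are made up) and that the word list is kept in a set (TH_WORDS, a hypothetical name) so that ordinary words such as "with" are left alone:

```python
def ethrema(word):
    """Replace a final 'th' with 's'; return other words unchanged."""
    if word.endswith('th'):
        return word[:-2] + 's'
    return word

# Hypothetical sample data: a few tokens and the words to modernize
text = ['He', 'abateth', 'it', 'with', 'good', 'cheer']
TH_WORDS = {'abateth', 'accuseth', 'requireth'}

# Call the function on each word: ethrema(w), not w.ethrema()
eth1 = [ethrema(w) if w in TH_WORDS else w for w in text]
print(eth1)  # ['He', 'abates', 'it', 'with', 'good', 'cheer']
```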