How can I parse names and degrees?

I am trying to parse a string containing a name and a degree. I have a long list of them. Some do not contain degrees, some contain one, and some contain several.

Examples of lines:

Sam da Man JD Green Eggs Jr. Ed.M. Argle Bargle Sr. MA Cersei Lannister MA Ph.D. 

As far as I can tell, the degrees follow these patterns:

 xxxxxxxxx. x.xx. xx.xxxxx. two caps (ex: 'MA') 

How can I separate the degrees from the names?

I am new to regex, and this problem has proven very time-consuming. Following another post, I tried split = re.split('\s+|([.])',s) and split = re.split('\s+|\.',s), but both still split the string in the wrong places.
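For reference, this is what those two re.split calls return on one of the example entries (written as raw strings here so the backslashes reach the regex engine untouched). Both split on every run of whitespace and on every period, so the name is broken apart along with the degree:

```python
import re

s = "Green Eggs Jr. Ed.M."

# Splits on every whitespace run and every period; the empty strings come
# from a period immediately followed by a space, or a period at the very end.
plain = re.split(r'\s+|\.', s)
print(plain)

# Same split, but the capturing group makes re.split keep each matched
# period in the output (None appears where the whitespace branch matched).
grouped = re.split(r'\s+|([.])', s)
print(grouped)
```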

Following the first comment, I thought about classifying the degrees. I am trying to write a regular expression that recognizes "xx" followed by a wildcard, because several of the degree patterns have this shape: "xx" plus something else.

I would then add a few more classifications for the remaining patterns.
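A rough sketch of the pattern-based idea in Python. The regex is a guess of mine (hypothetical, not from the thread) covering the shapes in the degree list: two or three capitals (MA, JD, RN, MBA, MAT), two or more capitalised chunks that each end in a period (Ph.D., Ed.M., M.Div.), and two capitals followed by more letters and a period (BSEd.). It assumes each entry sits on its own line with its degrees at the end; the helper name split_name_degrees is made up for illustration.

```python
import re

# Hypothetical pattern for degree-like tokens (an assumption, extend as needed):
#   [A-Z]{2,3}                 -> MA, JD, RN, MBA, MAT
#   (?:[A-Z][a-z]{0,3}\.){2,}  -> Ph.D., Ed.M., Ed.S., M.Ed., M.Div.
#   [A-Z]{2}[A-Za-z]*\.        -> BSEd.
DEGREE_RE = re.compile(r'[A-Z]{2,3}|(?:[A-Z][a-z]{0,3}\.){2,}|[A-Z]{2}[A-Za-z]*\.')


def split_name_degrees(line):
    """Return (name, degrees), peeling degree-like tokens off the end of line."""
    tokens = line.split()
    i = len(tokens)
    while i > 0 and DEGREE_RE.fullmatch(tokens[i - 1]):
        i -= 1
    return " ".join(tokens[:i]), tokens[i:]


for entry in ["Sam da Man JD", "Green Eggs Jr. Ed.M.",
              "Argle Bargle Sr. MA", "Cersei Lannister MA Ph.D."]:
    print(split_name_degrees(entry))
```

Requiring at least two dotted chunks is what keeps Jr. and Sr. out of the degree list. The trade-off is that an all-caps name part of two or three letters would be misread as a degree.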

Alternatively, would it be easier to classify the names instead?

Or even to list the known degrees in a collection and search for them?

 {'MAT', 'Ph.D.', 'MA', 'JD', 'Ed.M.', 'MBA', 'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'BSEd.'} 
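The lookup approach is simpler and never misfires on an unknown token, but it only finds degrees that are already in the set. A minimal sketch, assuming one entry per line with the degrees at the end (split_known_degrees is a hypothetical helper name):

```python
# Known degrees from the question (duplicate 'MA' dropped).
DEGREES = {'MAT', 'Ph.D.', 'MA', 'JD', 'Ed.M.', 'MBA', 'Ed.S.',
           'M.Div.', 'M.Ed.', 'RN', 'BSEd.'}


def split_known_degrees(line):
    """Return (name, degrees), taking trailing tokens found in DEGREES."""
    tokens = line.split()
    i = len(tokens)
    while i > 0 and tokens[i - 1] in DEGREES:
        i -= 1
    return " ".join(tokens[:i]), tokens[i:]


for entry in ["Sam da Man JD", "Green Eggs Jr. Ed.M.",
              "Argle Bargle Sr. MA", "Cersei Lannister MA Ph.D."]:
    print(split_known_degrees(entry))
```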
2 answers

Try replacing your "Jr.", "Sr.", ... with something like "Jr~", "Sr~", ... Here is a regular expression for that:

 s/(Jr|Sr)\./$1~/g 


You get this line:

 Sam da Man JD Green Eggs Jr~ Ed.M. Argle Bargle Sr~ MA Cersei Lannister MA Ph.D. 
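If you are doing this in Python rather than in a regex tester, the same substitution is a call to re.sub; Python writes the backreference as \1 instead of $1. A sketch assuming all four example entries are joined into one string, as above:

```python
import re

s = "Sam da Man JD Green Eggs Jr. Ed.M. Argle Bargle Sr. MA Cersei Lannister MA Ph.D."

# Replace "Jr." / "Sr." with "Jr~" / "Sr~" so their periods no longer look
# like part of a dotted degree. Python uses \1 for the backreference.
marked = re.sub(r'(Jr|Sr)\.', r'\1~', s)
print(marked)
```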

Now you can easily capture degrees with this regular expression:

 /(MA|RN|([A-Z][a-z]?[a-z]?\.)+)/g 

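The same extraction in Python with re.findall. One change from the pattern above: the groups are dropped or made non-capturing, because findall returns the captured groups instead of the whole match whenever the pattern contains groups. Any spaces shown around the pattern are assumed to be display formatting.

```python
import re

marked = "Sam da Man JD Green Eggs Jr~ Ed.M. Argle Bargle Sr~ MA Cersei Lannister MA Ph.D."

# No capturing groups, so findall returns the full matched text.
degrees = re.findall(r'MA|RN|(?:[A-Z][a-z]?[a-z]?\.)+', marked)
print(degrees)  # ['Ed.M.', 'MA', 'MA', 'Ph.D.']
```

Note that JD is not picked up, since it is neither MA nor RN and contains no period.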


You can use this:

 '[ ](MA|RN|([A-Z][a-z]?[a-z]?\.){2,3})' 

It does not accept a single word with only one dot, so "Jr." and "Sr." are not matched.
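In Python, re.finditer lets you keep this pattern exactly as written and read the degree from group 1, which leaves out the leading space. No Jr~/Sr~ preprocessing is needed here, because {2,3} already rejects Jr. and Sr.:

```python
import re

s = "Sam da Man JD Green Eggs Jr. Ed.M. Argle Bargle Sr. MA Cersei Lannister MA Ph.D."

# [ ] requires a space before the degree; {2,3} requires two or three
# dotted chunks, so a lone "Jr." or "Sr." is rejected.
pattern = re.compile(r'[ ](MA|RN|([A-Z][a-z]?[a-z]?\.){2,3})')
degrees = [m.group(1) for m in pattern.finditer(s)]
print(degrees)  # ['Ed.M.', 'MA', 'MA', 'Ph.D.']
```

As with the first answer, JD is not matched.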


Source: https://habr.com/ru/post/1489287/

