Download Stanford CoreNLP 3.5.2 from here: http://nlp.stanford.edu/software/corenlp.shtml
Let's say you put the distribution at /Users/username/stanford-corenlp-full-2015-04-20
This Python code will run the pipeline:
import os

# Directory that holds the unpacked CoreNLP distribution (its jars are found via -cp "*")
stanford_distribution_dir = "/Users/username/stanford-corenlp-full-2015-04-20"
# Text file that lists the input files to annotate, one path per line
list_of_sentences_path = "/Users/username/list_of_sentences.txt"
stanford_command = "cd %s ; java -Xmx2g -cp \"*\" edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ssplit.eolonly -filelist %s -outputFormat json" % (stanford_distribution_dir, list_of_sentences_path)
os.system(stanford_command)
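If you would rather not build a shell string, the same invocation can be sketched with subprocess. This is only an illustration under a few assumptions: the helper names build_corenlp_command and run_corenlp are hypothetical, the flags are copied from the command above, and cwd= stands in for the "cd". Passing an argument list avoids shell quoting problems with paths that contain spaces, and check=True raises an error if java exits with a failure.

```python
import subprocess


def build_corenlp_command(filelist_path, memory="2g"):
    """Return the java invocation as an argument list (no shell involved)."""
    return [
        "java", "-Xmx" + memory,
        "-cp", "*",  # literal "*": java itself expands the classpath wildcard
        "edu.stanford.nlp.pipeline.StanfordCoreNLP",
        "-annotators", "tokenize,ssplit,pos,lemma,ner",
        "-ssplit.eolonly",
        "-filelist", filelist_path,
        "-outputFormat", "json",
    ]


def run_corenlp(distribution_dir, filelist_path):
    """Run the pipeline from inside the distribution directory; raise on failure."""
    subprocess.run(
        build_corenlp_command(filelist_path),
        cwd=distribution_dir,  # plays the role of "cd <distribution dir>"
        check=True,
    )
```

You would then call run_corenlp(stanford_distribution_dir, list_of_sentences_path) with the two paths defined earlier.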
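Here is some sample Python code for loading one of the resulting .json files:
import json
# Parse the CoreNLP output for a single input file
with open("sample_file.txt.json", encoding="utf-8") as f:
    sample_json = json.load(f)
At this point, sample_json is a dictionary that holds all the sentences from the file.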
for sentence in sample_json["sentences"]:
    # Collect the words and their NER tags in parallel lists
    tokens = []
    ner_tags = []
    for token in sentence["tokens"]:
        tokens.append(token["word"])
        ner_tags.append(token["ner"])
    # One (tokens, ner_tags) pair per sentence
    print(tokens, ner_tags)
list_of_sentences.txt should be your list of files with sentences, something like:
input_file_1.txt
input_file_2.txt
...
input_file_100.txt
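A file list like the one above could be generated rather than typed by hand. The sketch below is a hypothetical helper (write_filelist and its pattern default are not part of CoreNLP). It writes absolute paths on the assumption that the java command runs from inside the distribution directory, where relative names would not resolve to your input files.

```python
import glob
import os


def write_filelist(input_dir, filelist_path, pattern="input_file_*.txt"):
    """Write the absolute path of every matching file, one per line."""
    paths = sorted(
        os.path.abspath(p) for p in glob.glob(os.path.join(input_dir, pattern))
    )
    with open(filelist_path, "w", encoding="utf-8") as out:
        for path in paths:
            out.write(path + "\n")
    return paths
```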
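Each input_file.txt (which should have one sentence per line) will generate input_file.txt.json once the Java command is run, and that .json file will have the NER tags. You can load the .json for each input file and easily get (sentence, ner tag sequence) pairs. You can experiment with a different output format if you like that better, but "json" creates a .json file that you can load with json.loads(...), and then you have a dictionary you can use to access the sentences and annotations.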
This way, the pipeline is loaded only once for all the files.
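Collecting the (sentence, ner tag sequence) pairs for every listed file might look like the sketch below. The function names ner_pairs and load_all_ner_pairs are hypothetical, and the output_dir parameter reflects an assumption: that each <input name>.json ends up in a single directory you know (for the command above, most likely the directory java was run from).

```python
import json
import os


def ner_pairs(document):
    """Turn one parsed CoreNLP JSON document into (tokens, ner_tags) pairs."""
    pairs = []
    for sentence in document["sentences"]:
        tokens = [token["word"] for token in sentence["tokens"]]
        ner_tags = [token["ner"] for token in sentence["tokens"]]
        pairs.append((tokens, ner_tags))
    return pairs


def load_all_ner_pairs(filelist_path, output_dir):
    """Map each listed input file to its list of (tokens, ner_tags) pairs."""
    results = {}
    with open(filelist_path, encoding="utf-8") as filelist:
        for line in filelist:
            input_path = line.strip()
            if not input_path:
                continue  # skip blank lines in the file list
            # Assumption: the output is named <input file name>.json in output_dir
            json_path = os.path.join(
                output_dir, os.path.basename(input_path) + ".json"
            )
            with open(json_path, encoding="utf-8") as f:
                results[input_path] = ner_pairs(json.load(f))
    return results
```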