Creating a directed word graph in python

From the list of offers I want to generate a directed graph for generating sub-offers in accordance with the following property:

Suppose there are three sentences:

  • I love candy
  • I love you
  • I really like sweets.

The graph will have edges between each consecutive words with a weight of 1 each time a pair occurs.

Then I find out the path on the chart with the maximum weight. Here he returns "I really like sweets" with a weight of 3 + 2 + 1 + 1

str1 = """Man who run in front of car, get tired. Man who run behind car, get exhausted.""" # create a list of words separated at whitespaces wordList1 = str1.split(None) # strip any punctuation marks and build modified word list # start with an empty list wordList2 = [] for word1 in wordList1: # last character of each word lastchar = word1[-1:] # use a list of punctuation marks if lastchar in [",", ".", "!", "?", ";"]: word2 = word1.rstrip(lastchar) else: word2 = word1 # build a wordList of lower case modified words wordList2.append(word2.lower()) 

Now wordList2 has a list of consecutive words. How to convert it to a schedule. I use the networkx library but am not very familiar with it.

How can I continue to create an edge-weighted graph?

+4
source share
2 answers

Here is a solution to your problem using networkX.

This code will generate a directed graph, dG. dG will have one node per word, with a count attribute indicating how many times this word has been seen. It will have one directed front per one bigram, with the attribute "weight" indicating how many times the bigram has occurred.

 import networkx as nx import string from sys import maxint str1 = """Man who run in front of car, get tired. Man who run behind car, get exhausted.""" wordList1 = str1.split(None) wordList2 = [string.rstrip(x.lower(), ',.!?;') for x in wordList1] dG = nx.DiGraph() for i, word in enumerate(wordList2): try: next_word = wordList2[i + 1] if not dG.has_node(word): dG.add_node(word) dG.node[word]['count'] = 1 else: dG.node[word]['count'] += 1 if not dG.has_node(next_word): dG.add_node(next_word) dG.node[next_word]['count'] = 0 if not dG.has_edge(word, next_word): dG.add_edge(word, next_word, weight=maxint - 1) else: dG.edge[word][next_word]['weight'] -= 1 except IndexError: if not dG.has_node(word): dG.add_node(word) dG.node[word]['count'] = 1 else: dG.node[word]['count'] += 1 except: raise 

To view the contents of a graph, you can print nodes and edges.

 for node in dG.nodes(): print '%s:%d\n' % (node, dG.node[node]['count']) for edge in dG.edges(): print '%s:%d\n' % (edge, maxint - dG.edge[edge[0]][edge[1]]['weight']) 

Note that the weight of the bigram edge, instead of counting from zero, is counted from maxint. This is because the networkX shortest path utilities will use the weight value as the cost for each edge. By making our highest estimates the smallest weights, we can use these utilities to find the path with the highest edge values.

for example, we can get the path with the highest counter between the word "man" and the word "exhausted":

 >>>shortest_path = nx.shortest_path(dG, source='man', target='exhausted', weight='weight') >>>print shortest_path ['man', 'who', 'run', 'behind', 'car', 'get', 'exhausted'] 

or we can get the paths with the highest number of counts between the word "man" and all other words:

 >>>shortest_paths = nx.shortest_path(dG, source='man', weight='weight') >>>print shortest_paths {'run': ['man', 'who', 'run'], 'get': ['man', 'who', 'run', 'behind', 'car', 'get'], 'exhausted': ['man', 'who', 'run', 'behind', 'car', 'get', 'exhausted'], 'car': ['man', 'who', 'run', 'behind', 'car'], 'who': ['man', 'who'], 'behind': ['man', 'who', 'run', 'behind'], 'of': ['man', 'who', 'run', 'in', 'front', 'of'], 'in': ['man', 'who', 'run', 'in'], 'front': ['man', 'who', 'run', 'in', 'front'], 'tired': ['man', 'who', 'run', 'behind', 'car', 'get', 'tired'], 'man': ['man']} 

As noted above, there is the possibility of getting cycles on a graph like this, and I'm not sure how well the network shortest path algorithm will handle this case.

Good luck

+5
source

Use the dictionary to store bigrams, adding 1 to the value each time you encounter bigrams. Get the maximum dictionary values ​​to generate the start point, and then recursively call the function that gets the bigram value with the highest value that starts with the last word in the previous bigram until there are no more bigrams starting with the last word. Beware of the possibility of circular graphs, for example. I love that I love that I love ad infinitum (set the recursion limit).

I have never worked with networkx on purpose, but looking at the homepage still applies.

+1
source

Source: https://habr.com/ru/post/1437704/


All Articles