I wrote a Python program that spends a lot of time searching for object attributes and values from dictionary keys. I would like to know if I can optimize these search times in any way, potentially with the C extension, in order to shorten the execution time, or I just need to re-implement the program in a compiled language.
The program implements some algorithms using graphics. It runs slowly on our datasets, so I profiled the code cProfileusing a smaller dataset that could really end up. The vast majority of time is burned in one function, and in particular, in two expressions of generator expressions inside the function:
The expression of the generator in line 202 is equal to
neighbors_in_selected_nodes = (neighbor for neighbor in
node_neighbors if neighbor in selected_nodes)
and the expression of the generator on line 204 is equal to
neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
neighbor in neighbors_in_selected_nodes)
The source code for this context function is presented below.
selected_nodesis the setnodes in interaction_graph, which is a NetworkXGraph instance. node_neighborsis an iterator of Graph.neighbors_iter().
Graphhe himself uses dictionaries to store nodes and edges. Its attribute Graph.nodeis a dictionary that stores nodes and their attributes (for example 'weight') in dictionaries belonging to each node.
Each of these searches must be amortized by constant time (i.e. O (1)), however I still pay a large fine for the search. Is there a way I can speed up these searches (for example, by writing parts of this as a C extension), or do I need to move the program to a compiled language?
, ; .
def calculate_node_z_prime(
node,
interaction_graph,
selected_nodes
):
"""Calculates a z'-score for a given node.
The z'-score is based on the z-scores (weights) of the neighbors of
the given node, and proportional to the z-score (weight) of the
given node. Specifically, we find the maximum z-score of all
neighbors of the given node that are also members of the given set
of selected nodes, multiply this z-score by the z-score of the given
node, and return this value as the z'-score for the given node.
If the given node has no neighbors in the interaction graph, the
z'-score is defined as zero.
Returns the z'-score as zero or a positive floating point value.
:Parameters:
- `node`: the node for which to compute the z-prime score
- `interaction_graph`: graph containing the gene-gene or gene
product-gene product interactions
- `selected_nodes`: a `set` of nodes fitting some criterion of
interest (e.g., annotated with a term of interest)
"""
node_neighbors = interaction_graph.neighbors_iter(node)
neighbors_in_selected_nodes = (neighbor for neighbor in
node_neighbors if neighbor in selected_nodes)
neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
neighbor in neighbors_in_selected_nodes)
try:
max_z_score = max(neighbor_z_scores)
# max() throws a ValueError if its argument has no elements; in this
# case, we need to set the max_z_score to zero
except ValueError, e:
# Check to make certain max() raised this error
if 'max()' in e.args[0]:
max_z_score = 0
else:
raise e
z_prime = interaction_graph.node[node]['weight'] * max_z_score
return z_prime
cProfiler, .
ncalls tottime percall cumtime percall filename:lineno(function)
156067701 352.313 0.000 642.072 0.000 bpln_contextual.py:204(<genexpr>)
156067701 289.759 0.000 289.759 0.000 bpln_contextual.py:202(<genexpr>)
13963893 174.047 0.000 816.119 0.000 {max}
13963885 69.804 0.000 936.754 0.000 bpln_contextual.py:171(calculate_node_z_prime)
7116883 61.982 0.000 61.982 0.000 {method 'update' of 'set' objects}