Python code optimization with many attributes and dictionary lookups

Question

Python code optimization with many attributes and dictionary lookups

I wrote a Python program that spends a lot of time searching for object attributes and values from dictionary keys. I would like to know if I can optimize these search times in any way, potentially with the C extension, in order to shorten the execution time, or I just need to re-implement the program in a compiled language.

The program implements some algorithms using graphics. It runs slowly on our datasets, so I profiled the code cProfileusing a smaller dataset that could really end up. The vast majority of time is burned in one function, and in particular, in two expressions of generator expressions inside the function:

The expression of the generator in line 202 is equal to

    neighbors_in_selected_nodes = (neighbor for neighbor in
            node_neighbors if neighbor in selected_nodes)

and the expression of the generator on line 204 is equal to

    neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
            neighbor in neighbors_in_selected_nodes)

The source code for this context function is presented below.

selected_nodesis the setnodes in interaction_graph, which is a NetworkXGraph instance. node_neighborsis an iterator of Graph.neighbors_iter().

Graphhe himself uses dictionaries to store nodes and edges. Its attribute Graph.nodeis a dictionary that stores nodes and their attributes (for example 'weight') in dictionaries belonging to each node.

Each of these searches must be amortized by constant time (i.e. O (1)), however I still pay a large fine for the search. Is there a way I can speed up these searches (for example, by writing parts of this as a C extension), or do I need to move the program to a compiled language?

, ; .

def calculate_node_z_prime(
        node,
        interaction_graph,
        selected_nodes
    ):
    """Calculates a z'-score for a given node.

    The z'-score is based on the z-scores (weights) of the neighbors of
    the given node, and proportional to the z-score (weight) of the
    given node. Specifically, we find the maximum z-score of all
    neighbors of the given node that are also members of the given set
    of selected nodes, multiply this z-score by the z-score of the given
    node, and return this value as the z'-score for the given node.

    If the given node has no neighbors in the interaction graph, the
    z'-score is defined as zero.

    Returns the z'-score as zero or a positive floating point value.

    :Parameters:
    - `node`: the node for which to compute the z-prime score
    - `interaction_graph`: graph containing the gene-gene or gene
      product-gene product interactions
    - `selected_nodes`: a `set` of nodes fitting some criterion of
      interest (e.g., annotated with a term of interest)

    """
    node_neighbors = interaction_graph.neighbors_iter(node)
    neighbors_in_selected_nodes = (neighbor for neighbor in
            node_neighbors if neighbor in selected_nodes)
    neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
            neighbor in neighbors_in_selected_nodes)
    try:
        max_z_score = max(neighbor_z_scores)
    # max() throws a ValueError if its argument has no elements; in this
    # case, we need to set the max_z_score to zero
    except ValueError, e:
        # Check to make certain max() raised this error
        if 'max()' in e.args[0]:
            max_z_score = 0
        else:
            raise e

    z_prime = interaction_graph.node[node]['weight'] * max_z_score
    return z_prime

cProfiler, .

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
156067701  352.313    0.000  642.072    0.000 bpln_contextual.py:204(<genexpr>)
156067701  289.759    0.000  289.759    0.000 bpln_contextual.py:202(<genexpr>)
 13963893  174.047    0.000  816.119    0.000 {max}
 13963885   69.804    0.000  936.754    0.000 bpln_contextual.py:171(calculate_node_z_prime)
  7116883   61.982    0.000   61.982    0.000 {method 'update' of 'set' objects}

+3

optimization python dictionary profiling

gotgenes 05 . '10 18:22

4

Lie Ryan · Answer 1 · 2010-04-05T22:52:18+0000

_graph.neighbors_iter (node), ( collection.heapq)? , node_neighbors , node, selected_node, max selected_node.

-, selected_node? , , "interactive_graph.node [] x selected_node" , .

EDIT:

() O (n log n)

, . , , O (n log n), . (, heapq, ), . , , , .

, python list.sort(), timsort, ( O (n) ). O (n log n), , .

, , . , , , , .

Mike DeSimone · Answer 2 · 2010-04-05T23:08:06+0000

, "" ["weight"] ( - ?) .weight ( - ).

, __slots__ :

class Node(object):
    # ... class stuff goes here ...

    __slots__ = ('weight',) # tuple of member names.

EDIT: , NetworkX, , , . -, "FIXME".

, , , , , . , , , , . , , , , , :

,
, , ,
__slots__ , .

__slots__ , , . node .

, :

node
A node Python, None.

, , ,

node
Python, , keyword/value node G.node [n] node n.

, node , ? node? contentString weight ? , , , .

, NetworkX.

, , , , .

, , :

neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
        neighbor in node_neighbors if neighbor in selected_nodes)

Lie Ryan · Answer 3 · 2010-04-05T19:05:20+0000

dict KeyErrors, /:

# cache this object
ignode = interaction_graph.node
neighbor_z_scores = []
for neighbor in node_neighbors:
    try:
        neighbor_z_scores.append(ignode[neighbor]['weight'])
    except KeyError:
        pass

getdefault list:

sentinel = object()
# cache this object 
ignode = interaction_graph.node

neighbor_z_scores = (ignode[neighbor]['weight'] for neighbor in node_neighbors)
# using identity testing, it slightly faster
neighbor_z_scores = (neighbor for neighbor in neighbor_z_scores if neighbor is not sentinel)

tzot · Answer 4 · 2010-04-18T01:18:57+0000

, itertools.

:

import itertools as it, operator as op
GET_WEIGHT= op.attrgetter('weight')

:

neighbors_in_selected_nodes = (neighbor for neighbor in
        node_neighbors if neighbor in selected_nodes)

neighbors_in_selected_nodes = it.ifilter(selected_node.__contains__, node_neighbors)

neighbor_z_scores = (interaction_graph.node[neighbor]['weight'] for
        neighbor in neighbors_in_selected_nodes)

neighbor_z_scores = (
    it.imap(
        GET_WEIGHT,
        it.imap(
            interaction_graph.node.__getitem__,
            neighbors_in_selected_nodes)
    )
)

?

Python code optimization with many attributes and dictionary lookups

More articles: