The first thing to do in a case like this is always to profile:
    ncalls        tottime  percall  cumtime  percall filename:lineno(function)
    15147/297       0.032    0.000    0.041    0.000 tree.py:102(__eq__)
    15400/200       0.031    0.000    0.106    0.001 tree.py:399(convert)
    1               0.023    0.023    0.129    0.129 plcfrs_cython.pyx:52(parse)
    6701/1143       0.022    0.000    0.043    0.000 heapdict.py:45(_min_heapify)
    18212           0.017    0.000    0.023    0.000 plcfrs_cython.pyx:38(__richcmp__)
    10975/10875     0.017    0.000    0.035    0.000 tree.py:75(__init__)
    5772            0.016    0.000    0.050    0.000 tree.py:665(__init__)
    960             0.016    0.000    0.025    0.000 plcfrs_cython.pyx:118(deduced_from)
    46938           0.014    0.000    0.014    0.000 tree.py:708(_get_node)
    25220/2190      0.014    0.000    0.016    0.000 tree.py:231(subtrees)
    10975           0.013    0.000    0.023    0.000 tree.py:60(__new__)
    49441           0.013    0.000    0.013    0.000 {isinstance}
    16748           0.008    0.000    0.015    0.000 {hasattr}
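For reference, a profile like this can be collected for a Cython module by enabling the profiling directive before compiling and then driving the parser under cProfile. Here is a minimal sketch, assuming plcfrs_cython is built with the directive shown; the run() driver is a placeholder, not the actual test harness:

    # At the top of plcfrs_cython.pyx, before compiling:
    #
    #     # cython: profile=True
    #
    # Without this directive, cProfile cannot see functions defined in the .pyx file.

    import cProfile
    import pstats

    import plcfrs_cython  # the extension module discussed in this post

    def run():
        # Placeholder: call whatever entry point exercises the parser here,
        # e.g. plcfrs_cython.parse(...) with a real sentence and grammar.
        pass

    if __name__ == "__main__":
        cProfile.run("run()", "parse.prof")
        pstats.Stats("parse.prof").sort_stats("time").print_stats(15)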
The first thing I notice is that very few of the expensive functions come from the Cython module itself; most of the time is spent in tree.py, and that may well be the real bottleneck.
Focusing on the Cython side, the __richcmp__ function stands out. We can optimize it simply by adding types to the method declaration:
    def __richcmp__(ChartItem self, ChartItem other, int op):
        ...
This brings its time down:
    ncalls   tottime  percall  cumtime  percall filename:lineno(function)
    ...
    18212      0.011    0.000    0.015    0.000 plcfrs_cython.pyx:38(__richcmp__)
Using an if/elif chain instead of separate if statements lets Cython apply its switch optimization: with op typed as a C int, the chain of comparisons against constants can be compiled down to a C switch.
    if op == 0:
        return self.label < other.label or self.vec < other.vec
    elif op == 1:
        return self.label <= other.label or self.vec <= other.vec
    elif op == 2:
        return self.label == other.label and self.vec == other.vec
    elif op == 3:
        return self.label != other.label or self.vec != other.vec
    elif op == 4:
        return self.label > other.label or self.vec > other.vec
    elif op == 5:
        return self.label >= other.label or self.vec >= other.vec
The result:
    17963      0.002    0.000    0.002    0.000 plcfrs_cython.pyx:38(__richcmp__)
Trying to figure out exactly where the time for the tree.py:399 (convert) calls goes, I found that this function inside dopg.py accounts for most of it:
    def removeids(tree):
        """ remove unique IDs introduced by the Goodman reduction """
        result = Tree.convert(tree)
        for a in result.subtrees(lambda t: '@' in t.node):
            a.node = a.node.rsplit('@', 1)[0]
        if isinstance(tree, ImmutableTree):
            return result.freeze()
        return result
Now, I am not sure that every node in the tree is a ChartItem, or whether the value returned by __getitem__ is used elsewhere, but adding these changes:
    cdef class ChartItem:
        cdef public str label
        cdef public str root
        cdef public long vec
        cdef int _hash
        __slots__ = ("label", "vec", "_hash")

        def __init__(ChartItem self, label, int vec):
            self.label = intern(label)      #.rsplit('@', 1)[0])
            self.root = intern(label.rsplit('@', 1)[0])
            self.vec = vec
            self._hash = hash((self.label, self.vec))

        def __hash__(self):
            return self._hash

        def __richcmp__(ChartItem self, ChartItem other, int op):
            if op == 0:
                return self.label < other.label or self.vec < other.vec
            elif op == 1:
                return self.label <= other.label or self.vec <= other.vec
            elif op == 2:
                return self.label == other.label and self.vec == other.vec
            elif op == 3:
                return self.label != other.label or self.vec != other.vec
            elif op == 4:
                return self.label > other.label or self.vec > other.vec
            elif op == 5:
                return self.label >= other.label or self.vec >= other.vec

        def __getitem__(ChartItem self, int n):
            if n == 0:
                return self.root
            elif n == 1:
                return self.vec

        def __repr__(self):
            # would need bitlen for proper padding
            return "%s[%s]" % (self.label, bin(self.vec)[2:][::-1])
and, inside mostprobableparse itself:
    from libc.math cimport pow    # fixed: pow lives in libc.math, not libc

    def mostprobableparse(...):
        ...
        cdef dict parsetrees = <dict>defaultdict(float)
        cdef float prob
        m = 0
        for n, (a, prob) in enumerate(derivations):
            # 'e' is presumably math.e, so pow(e, prob) turns a log probability
            # back into a probability before it is summed per parse tree
            parsetrees[a] += pow(e, prob)
            m += 1
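For context, here is a minimal pure-Python sketch of what that loop computes: each derivation carries a log probability, and the probability mass of all derivations yielding the same tree is accumulated. The helper name sum_derivation_probabilities and the demo data are illustrative, not part of the original code:

    from collections import defaultdict
    from math import exp

    def sum_derivation_probabilities(derivations):
        """Accumulate the probability mass of all derivations of the same tree.

        `derivations` is assumed to be an iterable of (tree, log_probability)
        pairs, mirroring the loop in mostprobableparse above.
        """
        parsetrees = defaultdict(float)
        for tree, logprob in derivations:
            parsetrees[tree] += exp(logprob)   # exp(logprob) == pow(e, logprob)
        return parsetrees

    # Tiny demo: two derivations of one tree, one derivation of another.
    demo = [("(S (NP x) (VP y))", -1.2),
            ("(S (NP x) (VP y))", -2.3),
            ("(S z)", -0.9)]
    print(max(sum_derivation_probabilities(demo).items(), key=lambda kv: kv[1]))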
With these changes I get:
    189345 function calls (173785 primitive calls) in 0.162 seconds

    Ordered by: internal time

    ncalls       tottime  percall  cumtime  percall filename:lineno(function)
    6701/1143      0.025    0.000    0.037    0.000 heapdict.py:45(_min_heapify)
    1              0.023    0.023    0.120    0.120 plcfrs_cython.pyx:54(parse)
    960            0.018    0.000    0.030    0.000 plcfrs_cython.pyx:122(deduced_from)
    5190/198       0.011    0.000    0.015    0.000 tree.py:102(__eq__)
    6619           0.006    0.000    0.006    0.000 heapdict.py:67(_swap)
    9678           0.006    0.000    0.008    0.000 plcfrs_cython.pyx:137(concat)
so the next candidates for optimization are _min_heapify and deduced_from.
deduced_from can be tightened a bit further:
    cdef inline deduced_from(ChartItem Ih, double x, pyCx, pyunary, pylbinary, pyrbinary, int bitlen):
        cdef str I = Ih.label
        cdef int Ir = Ih.vec
        cdef list result = []
        cdef dict Cx = <dict>pyCx
        cdef dict unary = <dict>pyunary
        cdef dict lbinary = <dict>pylbinary
        cdef dict rbinary = <dict>pyrbinary
        cdef ChartItem Ilh
        cdef double z
        cdef double y
        cdef ChartItem I1h
        for rule, z in unary[I]:
            result.append((ChartItem(rule[0][0], Ir), ((x+z, z), (Ih,))))
        for rule, z in lbinary[I]:
            for I1h, y in Cx[rule[0][2]].items():
                if concat(rule[1], Ir, I1h.vec, bitlen):
                    result.append((ChartItem(rule[0][0], Ir ^ I1h.vec), ((x+y+z, z), (Ih, I1h))))
        for rule, z in rbinary[I]:
            for I1h, y in Cx[rule[0][1]].items():
                if concat(rule[1], I1h.vec, Ir, bitlen):
                    result.append((ChartItem(rule[0][0], I1h.vec ^ Ir), ((x+y+z, z), (I1h, Ih))))
        return result
I will stop here, although I am sure there is more that could be optimized; going further would complicate the code considerably.
A suite of unit tests would be useful to show that each optimization does not introduce any subtle error.
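As a sketch of what such a test could look like, here is a minimal unittest module exercising only the ChartItem class shown above; the expected behaviour is read off that listing, and importing ChartItem from plcfrs_cython is an assumption about how the module is laid out:

    import unittest

    from plcfrs_cython import ChartItem   # assumption: ChartItem is importable like this

    class TestChartItem(unittest.TestCase):
        def test_equality_and_hash(self):
            # items built from the same label and bit vector must compare equal
            # and hash equal, otherwise chart lookups silently break
            a = ChartItem("NP@3", 0b0110)
            b = ChartItem("NP@3", 0b0110)
            self.assertEqual(a, b)
            self.assertEqual(hash(a), hash(b))

        def test_ordering(self):
            # with equal bit vectors, the item with the smaller label compares as less
            self.assertTrue(ChartItem("NP", 0b01) < ChartItem("VP", 0b01))

        def test_getitem_strips_goodman_id(self):
            # __getitem__(0) returns the label with the Goodman @id suffix removed,
            # __getitem__(1) returns the bit vector
            item = ChartItem("NP@42", 0b01)
            self.assertEqual(item[0], "NP")
            self.assertEqual(item[1], 0b01)

    if __name__ == "__main__":
        unittest.main()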
One final note: try using spaces instead of tabs.