chrisb's answer gives you everything you need to know, but if you are game for the gory details...
But first, the takeaways from the lengthy analysis below, in a nutshell:
For free functions, there is not much difference in performance between cpdef and rolling it out with cdef + def. The resulting C code is almost identical.
For bound methods, the cpdef approach can be slightly faster in the presence of inheritance hierarchies, but it is nothing to get too excited about.
Using the cpdef syntax has its advantages, as the resulting code is clearer (at least to me) and shorter.
Free Functions:
When we define something stupid:
cpdef do_nothing_cp(): pass
the following happens:
- a fast C function is created (in this case it has the cryptic name __pyx_f_3foo_do_nothing_cp because my extension is called foo, but you really only need to look for the f prefix).
- a Python function is also created (called __pyx_pf_3foo_2do_nothing_cp, with the pf prefix); it does not duplicate the code and calls the fast function somewhere along the way.
- a Python wrapper is created, called __pyx_pw_3foo_3do_nothing_cp (pw prefix).
- a do_nothing_cp method definition is emitted; this is what the Python wrapper is needed for, and it is where the function to be called when foo.do_nothing_cp is invoked is stored.
Here you can see it in the generated c-code:
static PyMethodDef __pyx_methods[] = {
  {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_3do_nothing_cp, METH_NOARGS, 0},
  {0, 0, 0, 0}
};
For the cdef function, only the first step is performed; for the def function, only steps 2-4 are performed.
Now, when we load the foo module and call foo.do_nothing_cp() , the following happens:
- The function pointer bound to the name do_nothing_cp is looked up; in our case it is the Python wrapper, the pw function.
- The pw function is called via the function pointer and calls the pf function (as a plain C call).
- The pf function calls the fast f function.
What happens if we call do_nothing_cp inside a cython module?
def call_do_nothing_cp(): do_nothing_cp()
Obviously, Cython does not need the Python machinery to locate the function in this case: it can use the fast f function directly through a C function call, bypassing the pw and pf functions.
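The two call paths described above can be modelled in plain Python. This is only a sketch: the f_/pf_/pw_ names mimic the generated prefixes, and method_table, call_from_python and call_from_cython are hypothetical stand-ins for the PyMethodDef table and the two kinds of call sites, not anything Cython actually emits.

```python
# Plain-Python model of the three layers Cython emits for a cpdef function.
# All names here are illustrative assumptions, not generated code.
calls = []  # records which layer ran, in order


def f_do_nothing_cp():
    """Model of the fast C function (f prefix)."""
    calls.append("f")
    return None


def pf_do_nothing_cp():
    """Model of the Python function (pf prefix): delegates to the f function."""
    calls.append("pf")
    return f_do_nothing_cp()


def pw_do_nothing_cp():
    """Model of the Python wrapper (pw prefix): wraps the pf function."""
    calls.append("pw")
    return pf_do_nothing_cp()


# Stand-in for the PyMethodDef table: the name maps to the wrapper.
method_table = {"do_nothing_cp": pw_do_nothing_cp}


def call_from_python(name):
    """A call such as foo.do_nothing_cp(): look up the name, call the wrapper."""
    return method_table[name]()


def call_from_cython():
    """A call from inside the module: jump straight to the f function."""
    return f_do_nothing_cp()
```

Calling call_from_python("do_nothing_cp") records pw, pf, f in that order, while call_from_cython() records only f.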
What happens if we wrap the cdef function in a def function?
cdef _do_nothing(): pass
def do_nothing(): _do_nothing()
Cython does the following:
- a fast _do_nothing function is created, corresponding to the f function above.
- a pf function for do_nothing is created, which calls _do_nothing somewhere along the way.
- a Python wrapper is created, i.e. a pw function that wraps the pf function.
- the functionality is bound to foo.do_nothing via a function pointer to the Python wrapper pw.
As you can see, there is not much difference from the cpdef approach.
cdef functions are just plain C functions, but def and cpdef functions are first-class Python functions, so you could do something like this:
foo.do_nothing=foo.do_nothing_cp
In terms of performance, we cannot expect much difference here:
>>> import foo
>>> %timeit foo.do_nothing_cp
51.6 ns ± 0.437 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit foo.do_nothing
51.8 ns ± 0.369 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
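Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper ns_per_call and the pure-Python stand-in functions below are assumptions made for illustration; with the real extension one would pass foo.do_nothing_cp and foo.do_nothing instead. Note that the pure-Python stand-ins pay a genuine Python call for the inner function, so their gap says nothing about the Cython case.

```python
import timeit


def do_nothing_direct():
    """Stand-in for the cpdef variant (hypothetical, pure Python)."""
    pass


def _do_nothing():
    pass


def do_nothing_wrapped():
    """Stand-in for the cdef + def variant (hypothetical, pure Python)."""
    _do_nothing()


def ns_per_call(func, number=100_000, repeat=5):
    """Return the best observed time per call of func, in nanoseconds."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number * 1e9


if __name__ == "__main__":
    for func in (do_nothing_direct, do_nothing_wrapped):
        print(f"{func.__name__}: {ns_per_call(func):.1f} ns per call")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; %timeit reports mean and standard deviation instead, so the numbers are not directly comparable.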
If we look at the resulting machine code (objdump -d foo.so), we can see that the C compiler has inlined all the calls for the cpdef version do_nothing_cp:
0000000000001340 <__pyx_pw_3foo_3do_nothing_cp>:
    1340: 48 8b 05 91 1c 20 00    mov    0x201c91(%rip),%rax
    1347: 48 83 00 01             addq   $0x1,(%rax)
    134b: c3                      retq
    134c: 0f 1f 40 00             nopl   0x0(%rax)
but not for the rolled-out do_nothing (I must confess, I'm a little surprised and do not yet understand the reasons):
0000000000001380 <__pyx_pw_3foo_1do_nothing>:
    1380: 53                      push   %rbx
    1381: 48 8b 1d 50 1c 20 00    mov    0x201c50(%rip),%rbx
This could explain why the cpdef version is slightly faster, but in any case the difference is nothing compared to the overhead of a Python function call.
Class Methods:
The situation is a bit more complicated for class methods due to possible polymorphism. Let's start with:
cdef class A:
    cpdef do_nothing_cp(self):
        pass
At first sight, there is not much difference from the case above:
- A fast, C-only, f-prefixed version of the function is emitted.
- A Python (pf-prefixed) version is emitted, which calls the f function.
- A Python wrapper (pw prefix) wraps the pf version and is used for registration.
- do_nothing_cp is registered as a method of class A via tp_methods of the PyTypeObject.
As can be seen from the resulting c file:
static PyMethodDef __pyx_methods_3foo_A[] = {
  {"do_nothing_cp", (PyCFunction)__pyx_pw_3foo_1A_1do_nothing_cp, METH_NOARGS, 0},
  ...
  {0, 0, 0, 0}
};
....
static PyTypeObject __pyx_type_3foo_A = {
  ...
  __pyx_methods_3foo_A,
  ...
};
Obviously, the bound version must take the implicit parameter self as an additional argument, but there is more to it: the f function performs a dispatch if it is not called from the corresponding pf function. This dispatch looks as follows (I keep only the important parts):
static PyObject *__pyx_f_3foo_1A_do_nothing_cp(CYTHON_UNUSED struct __pyx_obj_3foo_A *__pyx_v_self, int __pyx_skip_dispatch) {
  if (unlikely(__pyx_skip_dispatch)) ; // __pyx_skip_dispatch=1 if called from pf-version
  /* Check if overridden in Python */
  else if (look-up if function is overridden in __dict__ of the object)
    use the overridden function
  }
  do the work.
Why is this needed? Consider the following foo extension:
cdef class A:
    cpdef do_nothing_cp(self):
        pass

cdef class B(A):
    cpdef call_do_nothing(self):
        self.do_nothing_cp()
What happens when we call B().call_do_nothing() ?
- B-pw-call_do_nothing is located and called,
- which calls B-pf-call_do_nothing,
- which calls B-f-call_do_nothing,
- which calls A-f-do_nothing_cp, bypassing the pw and pf versions.
What happens when we add the following class C, which overrides the do_nothing_cp function?
import foo

class C(foo.B):
    def do_nothing_cp(self):
        print("I do something!")
Now calling C().call_do_nothing() results in:
- call_do_nothing of class C is located and called, which means that pw-call_do_nothing of class B is located and called,
- which calls B-pf-call_do_nothing,
- which calls B-f-call_do_nothing,
- which calls A-f-do_nothing_cp (as we already know!), bypassing the pw and pf versions.
And now, in step 4, the call must be dispatched inside A-f-do_nothing_cp() in order to reach the right C.do_nothing_cp()! Luckily, this function contains exactly that dispatch.
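The override check in step 4 can be modelled in plain Python. This is a sketch under assumptions: f_do_nothing_cp and its skip_dispatch parameter merely mimic the generated f function and its __pyx_skip_dispatch flag, and the lookup via type(self) is a simplification of what the generated C code does.

```python
class A:
    def f_do_nothing_cp(self, skip_dispatch):
        """Model of the fast f function, including the override check."""
        if not skip_dispatch:
            # Look the method up on the object's actual class.
            method = type(self).do_nothing_cp
            if method is not A.do_nothing_cp:
                # A Python subclass replaced the method: use the override.
                return method(self)
        return "A.do_nothing_cp did the work"

    def do_nothing_cp(self):
        """Model of the pw/pf pair: the entry point for calls from Python."""
        return self.f_do_nothing_cp(skip_dispatch=True)


class B(A):
    def call_do_nothing(self):
        # Inside the extension the f function is called directly,
        # so the override check must not be skipped.
        return self.f_do_nothing_cp(skip_dispatch=False)


class C(B):
    def do_nothing_cp(self):
        return "I do something!"
```

With these definitions, B().call_do_nothing() ends up in A's implementation, while C().call_do_nothing() reaches the override. Skipping the check when entering through do_nothing_cp also lets an override call super().do_nothing_cp() without being dispatched back to itself.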
To make it more complicated: what if class C were also a cdef class? Dispatching via __dict__ would not work, because cdef classes do not have a __dict__.
For cdef classes, polymorphism is implemented similarly to C++'s "virtual tables", so in B.call_do_nothing() the f-do_nothing_cp function is not called directly but via a pointer that depends on the class of the object (you can see these "virtual tables" being set up in __pyx_pymod_exec_XXX, e.g. __pyx_vtable_3foo_B.__pyx_base). Thus, the __dict__ dispatch in the A-f-do_nothing_cp() function is not needed for a pure cdef hierarchy.
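The "virtual table" idea can be sketched in plain Python as well. The dicts below are hypothetical stand-ins for the C structs of function pointers that Cython generates, and Obj, vtable_A, vtable_B and vtable_C are names made up for this illustration; here C is assumed to be a cdef class overriding do_nothing_cp.

```python
def A_f_do_nothing_cp(self):
    return "A.do_nothing_cp"


def C_f_do_nothing_cp(self):
    return "C.do_nothing_cp"


def B_f_call_do_nothing(self):
    # Not a direct call to A's function: go through the object's table,
    # so whichever class filled the slot decides what runs.
    return self.vtable["do_nothing_cp"](self)


# Each derived table starts as a copy of its base and overrides slots.
vtable_A = {"do_nothing_cp": A_f_do_nothing_cp}
vtable_B = {**vtable_A, "call_do_nothing": B_f_call_do_nothing}
vtable_C = {**vtable_B, "do_nothing_cp": C_f_do_nothing_cp}


class Obj:
    """Minimal object carrying a pointer to its class's table."""

    def __init__(self, vtable):
        self.vtable = vtable


b = Obj(vtable_B)
c = Obj(vtable_C)
result_b = b.vtable["call_do_nothing"](b)
result_c = c.vtable["call_do_nothing"](c)
```

Here result_b comes from A's implementation and result_c from C's, with no __dict__ lookup involved: the choice is made entirely by which table the object points to.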
As for performance, comparing cpdef with cdef + def , I get:
                  cpdef    def+cdef
A.do_nothing      107ns    108ns
B.call_nothing    109ns    116ns
so the difference is not that large; if anything, cpdef is slightly faster.