Wednesday, May 26, 2010

cython timings test

The TASK : To optimize cython functions

In detail: functions whose behaviour depends on an attribute value that is initialized once

This comes in handy in many cases. For example, to write a Laplacian function of a scalar field in a spherical/axisymmetric coordinate system, you would need three separate implementations for 1, 2 and 3 dimensions for performance reasons, if you do not want to write every function as a general 3D function.


The test CODE : test_kernel.pyx


cdef class Kernel:
    cdef int dim
    cdef double (*func)(Kernel,double)
    def __init__(self, dim=1):
        self.dim = dim
        if dim == 1:
            self.func = self.func1
        elif dim == 2:
            self.func = self.func2
  
    cdef double func1(self, double x):
        return 1+x
  
    cdef double func2(self, double x):
        return 2+x
  
    cdef double c_func(self, double x):
        '''this is only to make function signature compatible with func1 and func2'''
        return self.func(self, x)
  
    def p_func(self, double x):
        return self.func(self, x)
  
    cpdef double py_func(self, double x):
        return self.func(self, x)
  
    cpdef double py_c_func(self, double x):
        return self.c_func(x)
  
    def py_func1(self, x):
        return self.func1(x)
  
    def py_func2(self, x):
        return self.func2(x)
  
    cdef double func_common(self, double x):
        cdef int dim = self.dim
        if dim == 1:
            return 10+x
        elif dim == 2:
            return 20+x
  
    def py_func_c_common(self, x):
        return self.func_common(x)
  
    cpdef double py_func_common(self, double x):
        cdef int dim = self.dim
        if dim == 1:
            return 10+x
        elif dim == 2:
            return 20+x
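
For anyone who wants to play with the two strategies without a compiler, here is a minimal pure-Python sketch of the same idea. It is only an analogue, not the Cython code above: the class name PyKernel is made up for illustration, and the ValueError for unsupported dimensions is my addition (the Cython class does not raise anything).

```python
class PyKernel:
    """Pure-Python stand-in for the Cython Kernel class (hypothetical name)."""

    def __init__(self, dim=1):
        self.dim = dim
        # Strategy 1: choose the implementation once, at construction time,
        # and store it as an attribute (the analogue of the function pointer).
        if dim == 1:
            self.func = self.func1
        elif dim == 2:
            self.func = self.func2
        else:
            # Assumption: reject other dimensions instead of leaving func unset.
            raise ValueError("dim must be 1 or 2")

    def func1(self, x):
        return 1 + x

    def func2(self, x):
        return 2 + x

    def func_common(self, x):
        # Strategy 2: one function that compares dim on every call.
        if self.dim == 1:
            return 10 + x
        elif self.dim == 2:
            return 20 + x
        raise ValueError("dim must be 1 or 2")


k1, k2 = PyKernel(1), PyKernel(2)
print(k1.func(0.0), k2.func(0.0))
print(k1.func_common(0.0), k2.func_common(0.0))
```

Strategy 1 pays for the comparison once in __init__, while strategy 2 pays for it on every call, which is exactly the trade-off the timings below try to measure.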

Compilation command:
    cython -a test_kernel.pyx;
    gcc <optimization-flag> -shared -fPIC test_kernel.c -lpython2.6 -I /usr/include/python2.6/ -o test_kernel.so
where optimization flag is either empty or "-O2" or "-O3"

Cython optimization
Tip 1:
Type (cdef) as many variables as you can. You also need to type the locals in each function. Try to use C data types wherever possible.
Tip 2:
Use the command
    cython -a file.pyx
to generate an HTML file that shows which lines cause expensive Python functions to be called. Clicking on a line shows the corresponding generated C code, with expensive calls highlighted in shades of red. Try to eliminate as many such calls as you can.

The TEST :

time_kernel.py

import timeit

def time(s):
    '''returns time in microseconds'''
    t = 1e6*timeit.timeit(s,'import test_kernel;k1=test_kernel.Kernel(1);k2=test_kernel.Kernel(2);',number=1000000)/1000000.
    print s, t
    return t

time('k1.p_func(0)')
time('k1.py_func(0)')
time('k1.py_func1(0)')
time('k1.py_c_func(0)')
time('k1.py_func_c_common(0)')
time('k1.py_func_common(0)')

time('k2.p_func(0)')
time('k2.py_func(0)')
time('k2.py_func2(0)')
time('k2.py_c_func(0)')
time('k2.py_func_c_common(0)')
time('k2.py_func_common(0)')

Timings :



Time per call in μs for each optimization flag; penalty in ns.

 #  function                  None     -O2      -O3      mean     (k1+k2)/2  penalty (ns)
 1  k1.p_func(0)              0.20178  0.18321  0.18035  0.18845  0.19368    0.0000
 2  k1.py_func(0)             0.23224  0.18599  0.18393  0.20072  0.19541    1.7345
 3  k1.py_func1(0)            0.21477  0.18991  0.19252  0.19907  0.19802    4.3456
 4  k1.py_c_func(0)           0.23395  0.19196  0.19243  0.20611  0.19761    3.9311
 5  k1.py_func_c_common(0)    0.19566  0.18458  0.19062  0.19029  0.19767    3.9960
 6  k1.py_func_common(0)      0.21981  0.18707  0.18984  0.19891  0.19510    1.4237
 7  k2.p_func(0)              0.20448  0.18388  0.18194  0.19010
 8  k2.py_func(0)             0.21798  0.18859  0.18437  0.19698
 9  k2.py_func2(0)            0.20413  0.18124  0.18194  0.18910
10  k2.py_c_func(0)           0.23114  0.19166  0.19238  0.20506
11  k2.py_func_c_common(0)    0.19860  0.18783  0.18745  0.19129
12  k2.py_func_common(0)      0.21609  0.18747  0.18640  0.19666
    Average                   0.21560  0.18681  0.18703  0.19648


Result :

The best option is to write a separate C function and a Python accessor function.

task                                                                  function                            penalty cost (ns)
C function + Python accessor (base case)                              p_func
cpdef instead of def                                                  py_func                             1.7345
calling a cdef class method instead of a function pointer attribute   py_func1, py_func2                  4.3456
one extra C function call                                             py_c_func                           3.9311
(def + cdef) instead of (cpdef)                                       py_func_c_common - py_func_common   2.5723
one C comparison vs one C function call                               py_func_common                      1.4237
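
The penalty numbers appear to be the difference between a function's combined (k1+k2)/2 time and that of the base case p_func, converted from μs to ns. A small sketch of that arithmetic, using the rounded values from the timings table (so it only reproduces the penalties to about two decimals):

```python
# Combined (k1+k2)/2 timings in microseconds, copied from the timings table.
combined_us = {
    "p_func": 0.19368,
    "py_func": 0.19541,
    "py_func1/py_func2": 0.19802,
    "py_c_func": 0.19761,
    "py_func_c_common": 0.19767,
    "py_func_common": 0.19510,
}

# Penalty = time above the base case, converted from microseconds to nanoseconds.
base = combined_us["p_func"]
penalty_ns = {name: (t - base) * 1000.0 for name, t in combined_us.items()}

for name, p in penalty_ns.items():
    print("%-20s %6.2f ns" % (name, p))

# The "(def + cdef) instead of (cpdef)" entry is a difference of two penalties.
def_cdef_vs_cpdef = penalty_ns["py_func_c_common"] - penalty_ns["py_func_common"]
print("%-20s %6.2f ns" % ("def+cdef vs cpdef", def_cdef_vs_cpdef))
```

All of these differences are a few nanoseconds on calls that take roughly 190 ns, which is worth keeping in mind when reading the conclusion.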

Conclusion :

As can clearly be seen, the results are inconclusive :)
This was a small test carried out on my laptop, not in a controlled environment. Although the results seemed close to repeatable, many trials should be conducted, and each value should come with a standard deviation to check the repeatability. However, one clear conclusion is: do not forget to add optimization flags. Setuptools already does that for you.
Also, using a function pointer is not so bad after all. It would become more advantageous as the number of comparisons grows.
Cython provides great speedups (who didn't know that :) ). The pure Python version of py_func_common took 0.408μs for dim=1 and 0.518μs for dim=2.
These results are purely from the Python point of view. The effect of cdef/cpdef should also be considered in C/Cython code which calls these functions.
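
The "many trials with a standard deviation" idea could be sketched roughly as below. This is not the script used for the numbers above: it is written for Python 3 (the statistics module is not available in Python 2.6), the helper name time_stats is made up, and a plain lambda stands in for the compiled kernel so that the snippet runs without test_kernel.

```python
import statistics
import timeit


def time_stats(stmt, setup="pass", number=100000, repeat=5):
    """Return (mean, standard deviation) of the per-call time in microseconds."""
    # timeit.repeat returns one total time (in seconds) per trial.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    per_call_us = [1e6 * total / number for total in totals]
    return statistics.mean(per_call_us), statistics.stdev(per_call_us)


# Stand-in statement. With the compiled module available one would instead use
# setup="import test_kernel; k1 = test_kernel.Kernel(1)" and stmt="k1.p_func(0)".
mean_us, std_us = time_stats("f(0)", setup="f = lambda x: 1 + x")
print("f(0): %.5f +/- %.5f us" % (mean_us, std_us))
```

If the standard deviation turns out to be comparable to the few-nanosecond penalties in the table, the differences between the variants cannot really be told apart. The timeit documentation also suggests looking at the minimum over the trials, since higher values are usually caused by other processes interfering.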

CAVEAT:

I am no optimization expert. I have done this out of sheer boredom :)
If anyone wants to verify, you are welcome.
Any information content is purely coincidental.
