Wednesday, May 26, 2010

cython timings test

The TASK : To optimize cython functions

Detailed: functions whose behaviour depends on an attribute value that is set once at initialization

This comes in handy in many cases. For example, to write a Laplacian function of a scalar field in a spherical/axisymmetric coordinate system, you would need three separate cases for 1, 2 and 3 dimensions for performance reasons, unless you write every function as a general 3D function.


The test CODE : test_kernel.pyx


cdef class Kernel:
    cdef int dim
    cdef double (*func)(Kernel,double)
    def __init__(self, dim=1):
        self.dim = dim
        if dim == 1:
            self.func = self.func1
        elif dim == 2:
            self.func = self.func2
  
    cdef double func1(self, double x):
        return 1+x
  
    cdef double func2(self, double x):
        return 2+x
  
    cdef double c_func(self, double x):
        '''this is only to make function signature compatible with func1 and func2'''
        return self.func(self, x)
  
    def p_func(self, double x):
        return self.func(self, x)
  
    cpdef double py_func(self, double x):
        return self.func(self, x)
  
    cpdef double py_c_func(self, double x):
        return self.c_func(x)
  
    def py_func1(self, x):
        return self.func1(x)
  
    def py_func2(self, x):
        return self.func2(x)
  
    cdef double func_common(self, double x):
        cdef int dim = self.dim
        if dim == 1:
            return 10+x
        elif dim == 2:
            return 20+x
  
    def py_func_c_common(self, x):
        return self.func_common(x)
  
    cpdef double py_func_common(self, double x):
        cdef int dim = self.dim
        if dim == 1:
            return 10+x
        elif dim == 2:
            return 20+x

Compilation command:
    cython -a test_kernel.pyx;
    gcc <optimization-flag> -shared -fPIC test_kernel.c -lpython2.6 -I /usr/include/python2.6/ -o test_kernel.so
where <optimization-flag> is either empty, "-O2" or "-O3"

Cython optimization
Tip 1:
Type (cdef) as many variables as you can. You also need to type the locals in each function. Try to use C data types wherever possible.
Tip 2:
Use the command:
    cython -a file.pyx
to generate an HTML file that shows which lines cause expensive Python functions to be called. Clicking on a line shows the corresponding generated C code, with expensive calls highlighted in shades of red. Try to eliminate as many such calls as you can.

The TEST :

time_kernel.py

import timeit

def time(s):
    '''returns time in microseconds'''
    t = 1e6*timeit.timeit(s,'import test_kernel;k1=test_kernel.Kernel(1);k2=test_kernel.Kernel(2);',number=1000000)/1000000.
    print s, t
    return t

time('k1.p_func(0)')
time('k1.py_func(0)')
time('k1.py_func1(0)')
time('k1.py_c_func(0)')
time('k1.py_func_c_common(0)')
time('k1.py_func_common(0)')

time('k2.p_func(0)')
time('k2.py_func(0)')
time('k2.py_func2(0)')
time('k2.py_c_func(0)')
time('k2.py_func_c_common(0)')
time('k2.py_func_common(0)')

Timings :



Time per call in μs for each optimization flag; penalty in ns

 #  function                 None     -O2      -O3      mean     (k1+k2)/2  penalty
 1  k1.p_func(0)             0.20178  0.18321  0.18035  0.18845  0.19368    0.0000
 2  k1.py_func(0)            0.23224  0.18599  0.18393  0.20072  0.19541    1.7345
 3  k1.py_func1(0)           0.21477  0.18991  0.19252  0.19907  0.19802    4.3456
 4  k1.py_c_func(0)          0.23395  0.19196  0.19243  0.20611  0.19761    3.9311
 5  k1.py_func_c_common(0)   0.19566  0.18458  0.19062  0.19029  0.19767    3.9960
 6  k1.py_func_common(0)     0.21981  0.18707  0.18984  0.19891  0.19510    1.4237
 7  k2.p_func(0)             0.20448  0.18388  0.18194  0.19010
 8  k2.py_func(0)            0.21798  0.18859  0.18437  0.19698
 9  k2.py_func2(0)           0.20413  0.18124  0.18194  0.18910
10  k2.py_c_func(0)          0.23114  0.19166  0.19238  0.20506
11  k2.py_func_c_common(0)   0.19860  0.18783  0.18745  0.19129
12  k2.py_func_common(0)     0.21609  0.18747  0.18640  0.19666
    Average                  0.21560  0.18681  0.18703  0.19648
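
The penalty column appears to be the difference between each function's (k1+k2)/2 value and that of the base case p_func, converted from μs to ns. A short check of that arithmetic, using the rounded values from the table (so the last digits differ slightly from the penalties listed, which were presumably computed from unrounded timings):

```python
# (k1+k2)/2 column from the timings table, in microseconds (rounded values).
avg_us = {
    "p_func": 0.19368,
    "py_func": 0.19541,
    "py_func1, py_func2": 0.19802,
    "py_c_func": 0.19761,
    "py_func_c_common": 0.19767,
    "py_func_common": 0.19510,
}

base = avg_us["p_func"]  # base case: def accessor calling the function pointer

# Penalty relative to the base case, converted from microseconds to nanoseconds.
penalty_ns = dict((name, (t - base) * 1000.0) for name, t in avg_us.items())

for name in avg_us:
    print("%-20s %6.2f ns" % (name, penalty_ns[name]))
```

The same subtraction gives the "(def + cdef) instead of (cpdef)" figure: the py_func_c_common penalty minus the py_func_common penalty.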


Result :

The best approach is to write a separate C function and a Python accessor function.

task                                                                  function                             penalty cost (ns)
C function + Python accessor : base case                              p_func
cpdef instead of def                                                  py_func                              1.7345
calling a cdef class method instead of a function pointer attribute   py_func1, py_func2                   4.3456
one extra C function call                                             py_c_func                            3.9311
(def + cdef) instead of (cpdef)                                       py_func_c_common - py_func_common    2.5723
one C comparison vs one C function call                               py_func_common                       1.4237

Conclusion :

As can clearly be seen, the results are clearly inconclusive :)
This was a small test carried out on my laptop, with no controlled environment. Also, though the results seemed close to repeatable, many trials should be conducted, and each value should come with a standard deviation to check the repeatability. However, one clear conclusion is: do not forget to add optimization flags. Setuptools already does that for you.
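
A minimal sketch of how repeated trials with a standard deviation could be collected, using timeit.repeat from the standard library. The helper name time_stats and the trial counts are assumptions made for illustration, not part of the original test script:

```python
import timeit


def time_stats(stmt, setup="pass", number=100000, repeat=5):
    """Return (mean, standard deviation) of the per-call time in microseconds."""
    # timeit.repeat returns one total time (in seconds) per trial.
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    per_call_us = [1e6 * total / number for total in totals]
    mean = sum(per_call_us) / len(per_call_us)
    # Sample variance (n - 1 in the denominator); needs repeat >= 2.
    var = sum((t - mean) ** 2 for t in per_call_us) / (len(per_call_us) - 1)
    return mean, var ** 0.5


# Stand-in statement so the sketch runs without the compiled module; with
# test_kernel built, pass e.g. stmt='k1.p_func(0)' together with the setup
# string used in time_kernel.py instead.
mean_us, std_us = time_stats("abs(-1.0)", number=100000, repeat=5)
print("abs(-1.0): %.5f +/- %.5f us" % (mean_us, std_us))
```

If the standard deviation is comparable to the few-nanosecond penalties measured here, the differences between the variants cannot be distinguished from noise.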
Also, using a function pointer is not so bad after all. It would become more advantageous with a larger number of comparisons.
Cython provides great speedups (who didn't know that :) ). The pure Python version of py_func_common took 0.408 μs for dim=1 and 0.518 μs for dim=2.
These results are purely from the Python point of view. The effect of cdef/cpdef should also be considered in C/Cython code which calls these functions.
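
The pure Python version used for that comparison is not shown. A minimal sketch of what it could look like, assuming it simply mirrors py_func_common as a plain Python class (the class name PyKernel is hypothetical, and the timings it prints will differ from the ones quoted here):

```python
import timeit


class PyKernel(object):
    """Hypothetical pure-Python counterpart of the Cython Kernel class."""

    def __init__(self, dim=1):
        self.dim = dim

    def py_func_common(self, x):
        # Same branching as the cpdef version, but on untyped Python objects.
        dim = self.dim
        if dim == 1:
            return 10 + x
        elif dim == 2:
            return 20 + x


k1 = PyKernel(1)
k2 = PyKernel(2)

number = 100000
for name, k in (("k1", k1), ("k2", k2)):
    # Passing a callable avoids a setup string; the lambda adds a little overhead.
    total = timeit.timeit(lambda: k.py_func_common(0), number=number)
    print("%s.py_func_common(0): %.5f us" % (name, 1e6 * total / number))
```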

CAVEAT:

I am no optimization expert. I have done this out of sheer boredom :)
If anyone wants to verify, you are welcome
Any information content is purely coincidental
