The TASK : To optimize Cython functions
Detailed: functions whose behaviour depends on an attribute value that is set once, at initialization
This comes in handy in many cases. For example, to write a Laplacian of a scalar field in a spherical/axisymmetric coordinate system, you need three separate cases for 1, 2 and 3 dimensions for performance reasons, unless you write every function as a general 3D function.

The test CODE : test_kernel.pyx
    cdef class Kernel:
        cdef int dim
        cdef double (*func)(Kernel, double)

        def __init__(self, dim=1):
            self.dim = dim
            if dim == 1:
                self.func = self.func1
            elif dim == 2:
                self.func = self.func2

        cdef double func1(self, double x):
            return 1+x

        cdef double func2(self, double x):
            return 2+x

        cdef double c_func(self, double x):
            '''this is only to make function signature compatible with func1 and func2'''
            return self.func(self, x)

        def p_func(self, double x):
            return self.func(self, x)

        cpdef double py_func(self, double x):
            return self.func(self, x)

        cpdef double py_c_func(self, double x):
            return self.c_func(x)

        def py_func1(self, x):
            return self.func1(x)

        def py_func2(self, x):
            return self.func2(x)

        cdef double func_common(self, double x):
            cdef int dim = self.dim
            if dim == 1:
                return 10+x
            elif dim == 2:
                return 20+x

        def py_func_c_common(self, x):
            return self.func_common(x)

        cpdef double py_func_common(self, double x):
            cdef int dim = self.dim
            if dim == 1:
                return 10+x
            elif dim == 2:
                return 20+x
Compilation command:
cython -a test_kernel.pyx;
gcc <optimization-flag> -shared -fPIC test_kernel.c -lpython2.6 -I /usr/include/python2.6/ -o test_kernel.so
where <optimization-flag> is either empty, "-O2" or "-O3".
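To run the three builds without retyping the command, the gcc line can be generated from a small helper. This is only a sketch: the helper name `gcc_command` and its parameters are made up for illustration, it assumes a Python new enough to ship the `sysconfig` module (2.7 or 3.2+), and it takes the include directory from the running interpreter instead of hard-coding /usr/include/python2.6/. The `-lpython2.6` from the command above is left out on the assumption that the interpreter already provides those symbols when it loads the module; add it back if your platform needs it.

```python
import sysconfig


def gcc_command(opt_flag="", source="test_kernel.c", output="test_kernel.so"):
    """Return the gcc argument list for one optimization level.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # Header directory of the interpreter running this script (assumption:
    # this is the interpreter the extension is being built for).
    include_dir = sysconfig.get_paths()["include"]
    cmd = ["gcc"]
    if opt_flag:  # an empty string means "no optimization flag"
        cmd.append(opt_flag)
    cmd += ["-shared", "-fPIC", source, "-I", include_dir, "-o", output]
    return cmd


# Print one command per optimization level used in the timings.
for flag in ("", "-O2", "-O3"):
    print(" ".join(gcc_command(flag)))
```

Each printed line can be pasted into a shell, or passed as a list to `subprocess.call` if you prefer to drive the builds from Python.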
Cython optimization
Tip 1:
Type (cdef) as many variables as you can. You also need to type the locals in each function. Try to use C data types wherever possible.
Tip 2:
Use the command
cython -a file.pyx
to generate an HTML file that shows which lines cause expensive Python functions to be called. Clicking on a line shows the corresponding generated C code, with expensive calls highlighted in shades of red. Try to eliminate as many such calls as you can.
The TEST :
time_kernel.py

    import timeit

    def time(s):
        '''returns time in microseconds'''
        t = 1e6*timeit.timeit(s, 'import test_kernel;k1=test_kernel.Kernel(1);k2=test_kernel.Kernel(2);', number=1000000)/1000000.
        # single-argument print works on both Python 2 and Python 3
        print('%s %s' % (s, t))
        return t

    time('k1.p_func(0)')
    time('k1.py_func(0)')
    time('k1.py_func1(0)')
    time('k1.py_c_func(0)')
    time('k1.py_func_c_common(0)')
    time('k1.py_func_common(0)')
    time('k2.p_func(0)')
    time('k2.py_func(0)')
    time('k2.py_func2(0)')
    time('k2.py_c_func(0)')
    time('k2.py_func_c_common(0)')
    time('k2.py_func_common(0)')
Timings :
| # | function | no flag (μs) | -O2 (μs) | -O3 (μs) | mean (μs) | (k1+k2)/2 (μs) | penalty (ns) |
|---|---|---|---|---|---|---|---|
| 1 | k1.p_func(0) | 0.20178 | 0.18321 | 0.18035 | 0.18845 | 0.19368 | 0.0000 |
| 2 | k1.py_func(0) | 0.23224 | 0.18599 | 0.18393 | 0.20072 | 0.19541 | 1.7345 |
| 3 | k1.py_func1(0) | 0.21477 | 0.18991 | 0.19252 | 0.19907 | 0.19802 | 4.3456 |
| 4 | k1.py_c_func(0) | 0.23395 | 0.19196 | 0.19243 | 0.20611 | 0.19761 | 3.9311 |
| 5 | k1.py_func_c_common(0) | 0.19566 | 0.18458 | 0.19062 | 0.19029 | 0.19767 | 3.9960 |
| 6 | k1.py_func_common(0) | 0.21981 | 0.18707 | 0.18984 | 0.19891 | 0.19510 | 1.4237 |
| 7 | k2.p_func(0) | 0.20448 | 0.18388 | 0.18194 | 0.19010 | | |
| 8 | k2.py_func(0) | 0.21798 | 0.18859 | 0.18437 | 0.19698 | | |
| 9 | k2.py_func2(0) | 0.20413 | 0.18124 | 0.18194 | 0.18910 | | |
| 10 | k2.py_c_func(0) | 0.23114 | 0.19166 | 0.19238 | 0.20506 | | |
| 11 | k2.py_func_c_common(0) | 0.19860 | 0.18783 | 0.18745 | 0.19129 | | |
| 12 | k2.py_func_common(0) | 0.21609 | 0.18747 | 0.18640 | 0.19666 | | |
| | Average | 0.21560 | 0.18681 | 0.18703 | 0.19648 | | |
Result :
| task | function | penalty cost (ns) |
|---|---|---|
| C function + Python accessor : base case | p_func | 0.0000 |
| cpdef instead of def | py_func | 1.7345 |
| calling a cdef class method instead of a function pointer attribute | py_func1, py_func2 | 4.3456 |
| one extra C function call | py_c_func | 3.9311 |
| (def + cdef) instead of (cpdef) | py_func_c_common - py_func_common | 2.5723 |
| one C comparison vs one C function call | py_func_common | 1.4237 |
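The penalty figures appear to be each function's (k1+k2)/2 value minus that of the base case p_func, converted from μs to ns. That reading is inferred from the numbers, not stated anywhere, so treat the sketch below as a check of that assumption. It uses the rounded values from the timings table, so the results agree with the listed penalties only to about two decimal places.

```python
# (k1+k2)/2 column of the timings table, in microseconds per call.
avg_us = {
    "p_func": 0.19368,            # base case
    "py_func": 0.19541,
    "py_func1,py_func2": 0.19802,
    "py_c_func": 0.19761,
    "py_func_c_common": 0.19767,
    "py_func_common": 0.19510,
}

base = avg_us["p_func"]

# Penalty relative to the base case, converted from microseconds to ns.
penalty_ns = dict((name, (t - base) * 1000.0) for name, t in avg_us.items())

for name in sorted(penalty_ns, key=penalty_ns.get):
    print("%-20s %7.3f ns" % (name, penalty_ns[name]))

# The "(def + cdef) instead of (cpdef)" row is the difference of two penalties.
def_cdef_vs_cpdef = penalty_ns["py_func_c_common"] - penalty_ns["py_func_common"]
print("%-20s %7.3f ns" % ("def+cdef vs cpdef", def_cdef_vs_cpdef))
```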
Conclusion :
As can clearly be seen, the results are clearly inconclusive :) This was a small test carried out on my laptop, with no controlled environment. Although the results seemed close to repeatable, many more trials should be conducted, and each value should come with a standard deviation to check the repeatability. However, one clear conclusion is: do not forget to add optimization flags. Setuptools already does that for you.
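One way to get the standard deviation mentioned above is `timeit.repeat`, which runs the whole measurement several times. The sketch below is not the script used for the timings: `time_stats` and its parameters are hypothetical, and `abs(-1)` is only a stand-in statement so the example runs without the compiled test_kernel module.

```python
import timeit


def time_stats(stmt, setup="pass", repeat=5, number=100000):
    """Return (mean, standard deviation) of the per-call time in microseconds.

    Hypothetical helper; repeat must be at least 2 for the sample
    standard deviation to be defined.
    """
    # Each entry of runs is the total time in seconds for `number` calls.
    runs = timeit.repeat(stmt, setup, repeat=repeat, number=number)
    per_call = [1e6 * total / number for total in runs]
    mean = sum(per_call) / len(per_call)
    # Sample variance (divides by n - 1).
    var = sum((t - mean) ** 2 for t in per_call) / (len(per_call) - 1)
    return mean, var ** 0.5


mean, std = time_stats("abs(-1)")
print("abs(-1): %.4f +/- %.4f us" % (mean, std))
```

With the extension built, the same call would take the statements and setup string from time_kernel.py, for example `time_stats('k1.p_func(0)', 'import test_kernel;k1=test_kernel.Kernel(1)')`.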
Also, using a function pointer is not so bad after all. It should become more advantageous as the number of comparisons grows.
Cython provides great speedups (who didn't know that? :) ). The pure Python version of py_func_common took 0.408 μs for dim=1 and 0.518 μs for dim=2.
These results are purely from the Python point of view. The effect of cdef/cpdef should also be considered in C/Cython code that calls these functions.
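The pure Python code behind the 0.408 μs and 0.518 μs figures is not shown. A minimal stand-in, assuming it mirrors the Cython class with the type declarations removed, could look like this. The class name `PyKernel` and the `ValueError` for unsupported dimensions are additions for illustration only.

```python
class PyKernel(object):
    """Pure-Python stand-in for the Cython Kernel class (hypothetical)."""

    def __init__(self, dim=1):
        self.dim = dim
        # Bind the dimension-specific method once, at construction time.
        if dim == 1:
            self.func = self.func1
        elif dim == 2:
            self.func = self.func2
        else:
            # Not in the Cython version; added here to fail loudly.
            raise ValueError("dim must be 1 or 2")

    def func1(self, x):
        return 1 + x

    def func2(self, x):
        return 2 + x

    def py_func(self, x):
        # Dispatch through the method reference chosen in __init__.
        return self.func(x)

    def py_func_common(self, x):
        # Dispatch by comparing dim on every call.
        dim = self.dim
        if dim == 1:
            return 10 + x
        elif dim == 2:
            return 20 + x


k1, k2 = PyKernel(1), PyKernel(2)
print("%s %s" % (k1.py_func(0), k2.py_func(0)))
print("%s %s" % (k1.py_func_common(0), k2.py_func_common(0)))
```

Here `py_func` is the analogue of the function pointer attribute and `py_func_common` the analogue of the per-call comparison, so both strategies can be timed with the same timeit setup.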
CAVEAT:
I am no optimization expert. I have done this out of sheer boredom :) If anyone wants to verify, you are welcome.
Any information content is purely coincidental.