Sunday, November 14, 2010

why kernel development is not for the faint-hearted

The Linux kernel is a behemoth. Just look at what it takes to install its debug symbols:


[pankaj@pankajlaptop pysph-perf]$ su -
Password:
[root@pankajlaptop ~]# yum --enablerepo fedora-debuginfo --enablerepo updates-debuginfo install kernel-debuginfo
Loaded plugins: auto-update-debuginfo, langpacks, presto, refresh-packagekit
Adding en_US to language list
Found 1 installed debuginfo package(s)
Enabling rpmfusion-nonfree-debuginfo: RPM Fusion for Fedora 14 - Nonfree - Debug
Enabling rpmfusion-free-updates-debuginfo: RPM Fusion for Fedora 14 - Free - Updates Debug
Enabling rpmfusion-free-debuginfo: RPM Fusion for Fedora 14 - Free - Debug
Enabling rpmfusion-nonfree-updates-debuginfo: RPM Fusion for Fedora 14 - Nonfree - Updates Debug
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package kernel-debuginfo.x86_64 0:2.6.35.6-48.fc14 set to be installed
--> Processing Dependency: kernel-debuginfo-common-x86_64 = 2.6.35.6-48.fc14 for package: kernel-debuginfo-2.6.35.6-48.fc14.x86_64
--> Running transaction check
---> Package kernel-debuginfo-common-x86_64.x86_64 0:2.6.35.6-48.fc14 set to be installed
--> Finished Dependency Resolution

Dependencies Resolved

=============================================================================================================================================================
 Package                                           Arch                      Version                              Repository                            Size
=============================================================================================================================================================
Installing:
 kernel-debuginfo                                  x86_64                    2.6.35.6-48.fc14                     updates-debuginfo                    239 M
Installing for dependencies:
 kernel-debuginfo-common-x86_64                    x86_64                    2.6.35.6-48.fc14                     updates-debuginfo                     37 M

Transaction Summary
=============================================================================================================================================================
Install       2 Package(s)

Total download size: 276 M
Installed size: 1.6 G

Is this ok [y/N]: y
Downloading Packages:
Setting up and reading Presto delta metadata
Processing delta metadata
Package(s) data still to download: 276 M
(1/2): kernel-debuginfo-2.6.35.6-48.fc14.x86_64.rpm                                                                                   | 239 MB     01:11    
(2/2): kernel-debuginfo-common-x86_64-2.6.35.6-48.fc14.x86_64.rpm                                                                     |  37 MB     00:11    
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                        3.3 MB/s | 276 MB     01:23    
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing     : kernel-debuginfo-common-x86_64-2.6.35.6-48.fc14.x86_64                                                                                1/2
  Installing     : kernel-debuginfo-2.6.35.6-48.fc14.x86_64                                                                                              2/2

Installed:
  kernel-debuginfo.x86_64 0:2.6.35.6-48.fc14                                                                                                                

Dependency Installed:
  kernel-debuginfo-common-x86_64.x86_64 0:2.6.35.6-48.fc14                                                                                                  

Complete!
[root@pankajlaptop ~]#

Thursday, November 11, 2010

python editors in python: spyder and iep

This post is gonna be about python editors (IDEs, if you like, though not quite).

If you are looking for full IDEs, check out pydev and SPE; these are some of the best ones out there, with integrated debugging features. There's also the Wing IDE, which many people say is quite good, but I've never used it.

Here I'm gonna list my opinions about IEP and Spyder. I'm really interested in both of them and run the repository version of each.
My main requirement is a good editor for cython, for which I am willing to get my hands dirty (a bit) and add some features myself (see http://powerpan.blogspot.com/2010/10/cython-functions-coverage-revsited.html), so if any of you can help or have an opinion, please do share.

Common features:
Both are python editors written in python (PyQt)
Both provide code completion and an outline view
Both have integrated python shells

Differences:
Pros:

Spyder seems to be the more mature project
Spyder has an IPython shell, which is more useful than a plain python shell
IEP has outline support for cython

Cons:
Spyder seems to waste screen real estate (the editor area ends up quite small) - see this issue
IEP lacks some features such as a tree file browser, a shell variables view, a variable occurrences marker and pylint annotations
IEP is officially Python 3 only, though you can of course edit python 2 files with it (also check http://code.google.com/r/pankaj86-iep-python2/source/browse if you really want to run it on python 2)

Features both lack:
Graphical debugger: get the code from pydev/winpdb to provide a graphical debugger
Cython support: I could certainly use more cython support - outlining (IEP does that), completion in cython files, and goto-definition (including jumping into a cython file from a python file)
Profiling support: simply run the profiler and put the result data into a table (spreadsheet widget); see the sketch after this list
Documentation: more documentation is needed for both projects, especially developer documentation on implementing new plugins such as profiling and debugging
Also it'd be cool if the editor showed tooltips with documentation and other information when hovering over words, as pydev does
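
A minimal sketch of what such a profiling plugin's back-end could look like (the function and module names here are made up for illustration): run the code under cProfile and flatten the stats into rows that an editor could show in a table widget.

# Sketch of a profiling back-end for an editor plugin (names are illustrative).
# Runs a callable under cProfile and flattens the stats into table rows.
import cProfile
import pstats

def profile_to_rows(func, *args, **kwargs):
    prof = cProfile.Profile()
    prof.runcall(func, *args, **kwargs)
    stats = pstats.Stats(prof)
    rows = []
    for (filename, lineno, name), (cc, nc, tt, ct, callers) in stats.stats.items():
        # one row per function: name, call count, total time, cumulative time
        rows.append((name, nc, tt, ct))
    return sorted(rows, key=lambda row: row[3], reverse=True)

if __name__ == '__main__':
    def work():
        return sum(i * i for i in xrange(100000))
    for row in profile_to_rows(work)[:10]:
        print row

The rows could then be handed straight to whatever table/spreadsheet widget the editor provides.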

Few ideas:
Merge features from IEP into Spyder and vice versa, and add new features to both. Having two different projects is good in a way, as it keeps diversity and helps bring in new ideas. However, that shouldn't mean they stay independent without any co-operation; both should surely work together on new features.

NOTE: this content may already be outdated by the time you read it

Sunday, October 31, 2010

linux proxy problem revisited

Some time back I posted how to set up a local forwarding proxy in linux, so that you do not need to enter your proxy password in each and every program that asks, and so that not just anyone can see your password by simply typing "echo $http_proxy".
But it was still not fully automatic: you still needed to point each program at the local proxy "http://127.0.0.1:3128/".

Now this post makes even that redundant. No program connecting over http (port 80; not https) needs to have a proxy set at all. This is done with a transparent (intercepting) proxy, which 3proxy acts as by default without any specific configuration (squid needs an option set in its config file).

First, set up the local forwarding proxy as described in my previous post: http://powerpan.blogspot.com/2010/06/linux-proxy-problem.html
Next you need an iptables (the default linux firewall) rule which redirects all outgoing traffic to port 80 to the local proxy on port 3128.
You also need some way to exclude the proxy's own traffic from being redirected, otherwise it would loop back into itself. So do the following:

Create a specific user for 3proxy:
Create a user on your computer (say 3proxy) with a specific uid (say 480)
# useradd -u 480 3proxy
Now set the 3proxy config so that the daemon changes its user to 3proxy. Just before the "proxy" line in /etc/3proxy.cfg add the following line:
setuid 480
and restart the 3proxy service:
# service 3proxy restart

Redirect outgoing http traffic to local proxy
Here's an iptables rule that redirects outgoing traffic to port 80 (excluding traffic generated by the user 3proxy) to the local proxy:
# iptables -t nat -A OUTPUT -p tcp -m owner ! --uid-owner 3proxy --dport 80 -j REDIRECT --to-port 3128

That's all, you are done. To make the rule persistent across reboots, add it to some startup file (e.g. under /etc/profile.d/).

If you are on Fedora there's a better way:
Create a file (/etc/iptables-transparent-proxy) with the following line:
-A OUTPUT -p tcp -m owner ! --uid-owner 3proxy -m tcp --dport 80 -j REDIRECT --to-ports 3128
Now open system -> config -> firewall, and under Custom Rules (the bottommost item on the left) add a new rule with protocol: ipv4, table: nat and file: /etc/iptables-transparent-proxy, and you are done.

To test it, open a terminal, unset http_proxy and run wget google.com. The index.html file should be downloaded.
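
The same check can be scripted in python if you prefer; a minimal sketch (the URL is just an example) that deliberately configures no proxy, so it only succeeds if the iptables redirect is doing its job:

# Quick check of the transparent redirect: no proxy is configured here, so
# this only works if iptables is silently redirecting port 80 to 3proxy.
import urllib2

opener = urllib2.build_opener(urllib2.ProxyHandler({}))  # ignore $http_proxy
response = opener.open('http://google.com/', timeout=10)
print response.getcode(), 'fetched', len(response.read()), 'bytes'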


NOTE:

  • This automatic redirection does not work for https, which is specifically designed to prevent exactly this kind of interception (man-in-the-middle attacks)
  • Outgoing http traffic to any port other than 80 is not redirected

Saturday, October 30, 2010

cython functions coverage revisited

A few days back I showed how to get coverage of cython functions using Ned Batchelder's coverage.py.
In that post I included a patch to coveragepy to enable cython function coverage.

However, I realize that many of my friends have never applied a patch before, don't have admin rights on some machines, or have other problems which may hinder their using this new feature, so here's some good news for you.

I've rewritten the patch into a single file, pyx_coverage.py. You can use this file directly in place of the 'coverage' command to get cython function coverage; no need to patch anything. You still need Ned's coveragepy installed though.

All commands/options/configuration files for coveragepy are applicable here too.

To find coverage of cython files (pyx extension) you need to do the following:
1. compile the cython code to 'c' with the directive profile=True (a minimal setup.py sketch is given below)
2. keep the source pyx files in the same locations as the compiled .so files
    i.e. use 'python setup.py build_ext --inplace' or 'python setup.py develop'
3. run coverage (this file) with the timid option enabled (it can also be set in .coveragerc)
    i.e. 'python pyx_coverage.py run --timid my_module.py'
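
For steps 1 and 2, here is a minimal setup.py sketch (my_module.pyx is just a placeholder name; cythonize ships with newer Cython releases, and the directive can equally go in the pyx file header):

# Minimal setup.py sketch for steps 1 and 2 (my_module.pyx is a placeholder).
# Build in place so the .pyx stays next to the compiled .so:
#     python setup.py build_ext --inplace
from distutils.core import setup
from Cython.Build import cythonize

setup(
    ext_modules=cythonize(
        'my_module.pyx',
        # the profiling hooks are what let the coverage tracer see pyx functions
        compiler_directives={'profile': True},
    ),
)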

You can use the nose test collector as follows:
$ python pyx_coverage.py run /path/to/nosetests /path/to/source

replacing the /paths as appropriate

Download the file from here: https://sites.google.com/site/pankaj86/files/pyx_coverage.py

Reader's Bonus: if you can help me write a python C extension for this, a treat is assured.
Hint: See Ned's coverage.tracer python c extension.

Monday, October 25, 2010

cython functions coverage using coverage.py

For all the coders out there: if you have not been writing unit tests for your code then god bless you, but if you do write tests, here's another tool you must use: code coverage.
In python, one of the most popular code coverage tools is Ned Batchelder's coverage.py. It reports statement coverage for all your tests and can also present the coverage in beautiful html pages. It's a very nice tool and integrates well with testing frameworks such as nose to automate your testing and coverage reporting tasks.

But for all those who use cython, you must surely be aware of the difficulties it brings to testing: you can never be sure that "all is well", because coverage.py doesn't report coverage of cython modules, as those are compiled into native functions.
To mitigate this problem to some extent, I wrote a simple patch that enables coverage.py to report function coverage (not statement coverage) of cython pyx files too, hurray. So after applying the patch to coverage, all you need to do is:

  1. compile cython code with profiling enabled (cython --directive profile=True)
  2. run your tests under coverage as you normally would, taking care to add the timid option (coverage run --timid test.py)
  3. ???
  4. profit
So now that you can profit, I'd be greatly thankful if someone came along and ported this patch into the tracer.c file in the coverage source (it's very simple; get the source using $ hg clone http://bitbucket.org/ned/coveragepy).
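
For step 1, the profile directive can also be set per file with a comment at the very top of the pyx file; a minimal sketch (the function below is just a placeholder):

# cython: profile=True
# Placeholder pyx module: the directive comment above turns on profiling hooks,
# which is what lets the patched coverage tracer see cython function calls.

cpdef long fib(int n):
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)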

Tuesday, July 13, 2010

Nautilus media tag editor extension by yours truly

Over this weekend, I was searching for something to do (not that I don't have tasks piled up), and I remembered there was no easy way to edit audio file tags from the file manager. I know there are some excellent programs out there such as easytag and exfalso. However, they require you to run a separate program to edit the tags.
There is also a nautilus extension provided by totem that displays the metadata in the file properties dialog, but it cannot edit it. I thought it might be a good idea to make a metadata editor extension for nautilus, so that's what I came up with.


Installation:
The extension requires that exfalso be installed on your computer (since it directly uses the exfalso metadata editor). On fedora, it's a simple command:

$ yum install quodlibet

The extension itself is a single python file which you need to put into a directory ~/.nautilus/python-extensions/ (create the directory if it does not exist). The file is available at: https://sites.google.com/site/pankaj86/files/media_tag_editor.py.

Now the mandatory screenshot:


Monday, July 12, 2010

benchmarking pitfalls

In this post I'm going to list some of the pitfalls you can run into when you are trying to optimize your code and time specific code snippets.

As an example, consider testing the performance of the custom array class implemented in pysph at http://code.google.com/p/pysph/source/browse/source/pysph/base/carray.pyx?repo=kunalp-alternate , which is a simple substitute for 1D numpy arrays.

Now, in order to test the performance, I wrote some simple benchmark code:

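# (excerpt: numpy, time.time, the DoubleArray class and the list of array
#  sizes Ns are assumed to be defined/imported elsewhere in the module)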
cpdef dict loopget(ns=Ns):
    cdef double t, t1, t2, num
    cdef int N, i
    cdef dict ret = {}
    empty = numpy.empty
    zeros = numpy.zeros
   
    cdef DoubleArray carr
    cdef numpy.ndarray[ndim=1, dtype=numpy.float64_t] narr
   
    for N in ns:
        carr = DoubleArray(N)
        t = time()
        for i in range(N):
            num = carr[i]
        t = time()-t
        narr = zeros(N)
        t1 = time()
        for i in range(N):
            num = narr[i]
        t1 = time()-t1
        t2 = time()
        for i in range(N):
            num = carr.data[i]
        t2 = time()-t2
        ret['carr loopget %d'%N] = t/N
        ret['carrd loopget %d'%N] = t2/N
        ret['narr loopget %d'%N] = t1/N
    return ret

This snippet times retrieving a value from the custom DoubleArray class (both through indexing and through its data attribute, which is a plain C array) versus a numpy buffer; both of the latter should ideally run at C speed, but the numpy buffer does not unless you disable cython's array bounds check.
Now if you run it, you will be surprised by the timings:

carr loopget 100                             9.05990600586e-08
carr loopget 1000                            3.38554382324e-08
carr loopget 10000                           3.42130661011e-08
carr loopget 100000                          3.51309776306e-08
carrd loopget 100                            9.53674316406e-09
carrd loopget 1000                           9.53674316406e-10
carrd loopget 10000                          9.53674316406e-11
carrd loopget 100000                         9.53674316406e-12
narr loopget 100                             9.53674316406e-09
narr loopget 1000                            1.90734863281e-09
narr loopget 10000                           1.09672546387e-09
narr loopget 100000                          1.01089477539e-09

Strangely, getting the value from the C data array is extremely fast, and in fact takes the same total time regardless of the size of the array.
My first thought was that the C access time was simply negligible compared to the time taken to call python's time function. But then my reading about gcc and compiler optimizations came to mind.
Trick: note that the assignment being timed does not affect any other part of the code, and the variable num is never read again. Hence the compiler optimizes it away (this is called dead code removal), and in the C array case the assignments never happen at all. This does not happen for the other two loops, because they go through python calls and the compiler can never be sure what those calls do with the variables; it cannot reliably determine whether the assignment has side effects, so the assignment is not removed during compilation.

Keeping this fact in mind, let us try to modify the test code so that this specific optimization does not take place. Consider our new test code:

cpdef dict loopget(ns=Ns):
    cdef double t, t1, t2, num
    cdef int N, i
    cdef dict ret = {}
    empty = numpy.empty
    zeros = numpy.zeros
    cdef dict d = {}
   
    cdef DoubleArray carr
    cdef numpy.ndarray[ndim=1, dtype=numpy.float64_t] narr
   
    for N in ns:
        carr = DoubleArray(N)
        t = time()
        for i in range(N):
            num = carr[i]
        t = time()-t
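        # store num so the C compiler cannot discard the loop above as dead code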
        d[num] = num
        narr = zeros(N)
        t1 = time()
        for i in range(N):
            num = narr[i]
        t1 = time()-t1
        d[num] = num
        t2 = time()
        for i in range(N):
            num = carr.data[i]
        t2 = time()-t2
        d[num] = num
        ret['carr loopget %d'%N] = t/N
        ret['carrd loopget %d'%N] = t2/N
        ret['narr loopget %d'%N] = t1/N
    return ret

The purpose of the added d[num] = num statements is to make sure that the assignment to num is not useless, so the compiler cannot optimize it away. Since the new statements sit outside the time() calls, they shouldn't affect our measurements.
Let us now check the new timings:

carr loopget 100                             1.19209289551e-07
carr loopget 1000                            4.19616699219e-08
carr loopget 10000                           4.52041625977e-08
carr loopget 100000                          4.62603569031e-08
carrd loopget 100                            9.53674316406e-09
carrd loopget 1000                           3.09944152832e-09
carrd loopget 10000                          1.31130218506e-09
carrd loopget 100000                         1.07049942017e-09
narr loopget 100                             2.14576721191e-08
narr loopget 1000                            2.86102294922e-09
narr loopget 10000                           1.69277191162e-09
narr loopget 100000                          2.29835510254e-09

As you can see, the timings are now much more reasonable.
Conclusion: timing code is not a trivial thing to do :)

Friday, July 9, 2010

nearest particle search

Nearest neighbour particle search (NNPS) is a common requirement of (meshfree) particle methods such as SPH. The requirement is to locate all particles within a fixed distance (the kernel support) of a specified particle, and the trick is to avoid a brute-force distance comparison of every particle with every other particle (O(N^2)). There are many techniques available to implement this. One of the simplest, for a fixed kernel support shared by all particles, is to bin the particles and then search for neighbours only in the adjacent bins. Such a technique is implemented here: http://code.google.com/p/pysph/source/browse/source/pysph/base/nnps.pyx?repo=kunalp-alternate.
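
To make the idea concrete, here is a toy pure-python sketch of bin-based search (an illustration of the principle only, not the pysph implementation): particles are hashed into cubic cells with side equal to the search radius, so all neighbours of a particle lie in its own cell or the 26 surrounding ones.

# Toy bin-based nearest neighbour search (an illustration of the idea only,
# not the pysph implementation).  With cell side h, any particle within a
# distance h must lie in the same cell or one of the 26 adjacent cells.
from collections import defaultdict
from itertools import product

def build_bins(points, h):
    bins = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        bins[(int(x // h), int(y // h), int(z // h))].append(idx)
    return bins

def neighbours(points, bins, i, h):
    xi, yi, zi = points[i]
    cell = (int(xi // h), int(yi // h), int(zi // h))
    found = []
    for dx, dy, dz in product((-1, 0, 1), repeat=3):
        key = (cell[0] + dx, cell[1] + dy, cell[2] + dz)
        for j in bins.get(key, []):
            xj, yj, zj = points[j]
            if j != i and (xj - xi) ** 2 + (yj - yi) ** 2 + (zj - zi) ** 2 <= h * h:
                found.append(j)
    return found

In the real code the particles are binned once and the bins are reused for every query, which is where the saving over the O(N^2) brute-force search comes from.
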
Here I'm gonna present some timings for the NNPS. Note that the timings are old and also include some constant extra time for other operations (calls to numpy's rand() function, which I've since converted to the C rand() function).

Here are the timing results (click on the image to view the raw data sheet):



As you can see, the bin size should be at least three times the kernel support to get good performance.


Wednesday, July 7, 2010

aero nebula cluster

For those who do not know, I'm currently in the last year of the DD program in Aerospace engineering, and my project is the implementation of a solid mechanics code using SPH (smoothed particle hydrodynamics), integrated into the pysph project.
pysph is basically an SPH implementation framework written in python/cython (now you know the reason for all those optimization posts :) ).
Most CFD codes need to run in parallel on clusters to reduce the time required, so I just looked at the specs of the nebula cluster in the aero department (on which I have a login). It's really wonderful. The specs are:
20 nodes (15 working), each node with 12 six-core AMD Opteron 2427 processors at 2.2 GHz clock speed and 12 GB of RAM, i.e. 180 six-core processors in all.
This is sure gonna make parallelizing much more fun and interesting.
PS: I just read and watched quite a few videos from google about their patented map-reduce technique. It would be interesting to implement SPH in map-reduce and let it run in the "cloud", the buzzword of today.

Thursday, June 17, 2010

the linux proxy problem

Those of you who use linux for anything more than web browsing (in a university/office) must be aware of the problems a proxy can pose. In many places, as in my institute, you necessarily have to use a specified proxy server to reach the outside world, and it requires authentication with your credentials.
In my college, a common login registered in a central ldap server provides all authentication services (course registration, fee payments, email, proxy, ...). Hence it is very important to protect it. Here I will show one way to prevent anyone from easily getting your password.


Network proxy loophole in GNOME:
If you are using GNOME (the default on Fedora/Ubuntu) and you set your proxy details in "system -> preferences -> network proxy", then you open up a simple loophole in the settings.
After setting your username/password, open a new terminal and type
    echo $http_proxy
Now you can clearly see your password as
        http://<user>:<pass>@proxy.com:3128/
Since many people drop by your room in college, you can see how simple it is for someone to get your credentials.

Is there a way out:
There may be other ways, but here's the one I follow: I create a local forwarding proxy server on my own computer and point all applications at that proxy. The settings for my proxy server are written in a file readable only by root.
What follows is a step-by-step guide to set it up, tested on Fedora.

What do I use:
I use a small proxy server, 3proxy; you could also use any other proxy server such as squid. In fact I used squid before I came to know of 3proxy (when it got packaged in fedora). Squid is a much more feature-rich and heavyweight proxy; when I was using it, it had a bug whereby it would cause at least 100 cpu wakeups per second, eating precious power on my laptop. This may have been fixed by now.

Installation:
 On Fedora systems you can do
    yum install 3proxy
A similar apt-get command may work on Ubuntu (I've never tried).

Configuration:
The configuration you need to do is

  1. Open the file /etc/3proxy.cfg in an editor of your choice, as root
  2. Locate the line containing 'proxy -n'
  3. Above this line, up to the line 'dnspr', comment out all uncommented lines and instead add the following lines:

    auth iponly

    allow * * 127.0.0.0/24,<local_IPs> * * * *
    allow * * * * * * *
    parent 1000 http <proxy.server.com> <port> <proxy_user> <proxy_pass>
    proxy -n

    The values in angle brackets need to be replaced with your configuration. The values for my college are given in square brackets.
    <local_IPs> = IPs not accessed through the proxy [10.0.0.0/8]
    <proxy.server.com> = proxy server [netmon.iitb.ac.in]
    <port> = proxy port [80]
    <proxy_user> = proxy authentication username
    <proxy_pass> = proxy authentication password
  4. Comment out all lines with the content:

    socks
    pop3p
    ftppr
    admin
    dnspr
    tcppm
    udppm
  5. Save the file
  6. as root run (this will make the file only readable by root user)
        chmod o-rwx /etc/3proxy.cfg
        chkconfig 3proxy on
  7. ??
  8. profit
The details of the 3proxy.cfg file are documented at http://www.3proxy.ru/doc/html/man3/3proxy.cfg.3.html

Now in whichever application you need to set the proxy server, set it as

http://127.0.0.1:3128/

without any authentication.
That's it: now only root knows your ldap password, and no one else can snoop.

EDIT:
If you want to automatically set the proxy environment variables for the whole system, you can create a file /etc/profile.d/proxy.sh with the following content:

export http_proxy=http://127.0.0.1:3128/
export https_proxy=$http_proxy
export ftp_proxy=$http_proxy

Many (not all) programs on linux use these environment variables to pick up their proxy settings.
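
python itself is one example; its urllib module builds its proxy list from exactly these variables:

# python's urllib reads the proxy settings from the environment variables
# exported in /etc/profile.d/proxy.sh above
import urllib
print urllib.getproxies()   # e.g. {'http': 'http://127.0.0.1:3128/', ...}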

EDIT2 :
To set multiple proxies (different hosts go through different proxies) you can do something like below (see 3proxy.cfg manual for much more detail and many other options):
# direct connection
allow * 127.0.0.1 127.0.0.0/24,<local_IPs> * *
# through proxy1
allow * * <hosts_thru_proxy1> * *
parent 1000 http <proxy1.server.com> <port> <proxy_user>
# through proxy2
allow * * <hosts_thru_proxy2> * *
parent 1000 http <proxy2.server.com> <port> <proxy_user>
# through proxy3
allow * * <hosts_thru_proxy3> * *
parent 1000 http <proxy3.server.com> <port> <proxy_user>
allow * * * * *
proxy -n

Sunday, May 30, 2010

research made simple with zotero

If you are into study or research of any kind (academic or not) which involves reading things and keeping track of them, then you are in for a great productivity boost. This will help whether you are reading books, news, articles, wikipedia, journals or anything of that sort. The tool I'm talking about is zotero.
With zotero you can save proper bibliographic references for lots of the material you see on the internet and manage/search/cite them in various ways. It's really difficult to describe all the wonderful things zotero can do for your research, so it'll be really good for you to watch the screencast: http://www.zotero.org/support/screencast_tutorials/zotero_tour

Some features you'll find helpful:
Collect:

  • Single-click saving of references. For example, on any sciencedirect article (if you have a subscription, as in my college), a single click will save all the information about the article, including the pdf (with a sensible name instead of fulltext.pdf) if it's available.
  • To enable saving pdfs, select Zotero Preferences -> General tab -> automatically attach associated pdfs and other files when saving.
  • In the Search tab of the preferences, you may also want to enable indexing of pdfs if you need it.
  • On pages that reference lots of articles (wikipedia references, 'cited by' lists in Scopus, etc.) you can easily select all the references you want to save
Manage:
  • You can search all your saved articles, add notes, tags etc
  • You can group all articles in collections based on topic
  • You can create saved searches based on various criteria
Cite:
  • To cite articles, simply select them, right click to 'create bibliography from selected articles', choose a format style from the many available (including all popular journals), and you are done
  • If you use bibtex to manage bibliographies for your articles, then select the items, right click to 'export selected items' and select the bibtex format
  • Zotero plugins are available for OpenOffice and MS Office too, so you can easily insert references into your articles without the pain of collecting and formatting them
Share:
  • If you work in a team then this is a really wonderful feature. Create a simple login on the zotero server (you can also use openid)
  • In zotero preferences->sync tab enter your zotero login details and enable sync my library and group library.
  • All synced items (including attached pdfs) are available on the internet from anywhere, without even installing the zotero addon: you just log in to zotero and see your collection. This is very useful if your college has access to some journals but you are somewhere else, say at a conference, and need to check an article. 100 MB of space is available for free and you can buy more.
  • 'My library' is your personal collection. Group libraries are shared collections, which can be shared with other people you are working with.
So what are you waiting for? Install it now. And if you haven't installed it yet, you really need to watch the screencast http://www.zotero.org/support/screencast_tutorials/zotero_tour now.

    Saturday, May 29, 2010

    numpy array performance / divide and conquer considered harmful

    This is again a post about python code speed; the data and inferences are more than a few months old, but still valid.
    Here's a spreadsheet showing speed of array math operations (+, -, *, /) between numpy arrays and python lists.
    Check this spreadsheet to see the timings of various operations
    https://spreadsheets.google.com/ccc?key=0AomYDYyBBNkkdHAtMkdHMF9TZ29lMmZQV3UwYkxWNFE&hl=en

    The operations I considered for comparison were:

    • x+0.1
    • x-0.1
    • x*0.1
    • x/0.1
    • x*(1/0.1)
    • x+y
    • x-y
    • x*y
    • x/y
    • [p+yp[j] for j,p in enumerate(xp)]
    • [xp[j]+yp[j] for j in xrange(i)]
    where x and y are numpy arrays, xp and yp are python lists, all of size N which is varied for the comparison.
    The raw timings data is available here:
      https://spreadsheets.google.com/pub?key=0AomYDYyBBNkkdHAtMkdHMF9TZ29lMmZQV3UwYkxWNFE&hl=en&output=html

      See the timings plot yourself
      Conclusion:
      • Use numpy arrays for sizes > 10
      • Avoid division as much as you can to improve the speed of your numerical codes
      • Instead of x/0.1 do x*(1/0.1). This alone gives a large speedup as N increases (see the timing sketch below).
      • x/0.1 and x/y take almost the same time
      • +, - and * take almost the same time; / takes much more, and its expense grows as N is increased
      • Once again, do not divide.
      • The same thing holds in cython code: avoid division there too, even when you are using plain doubles instead of numpy arrays (buffers). Rewrite expressions to minimize use of the division operator.
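
      A quick way to reproduce the division result yourself (array size and repeat counts here are arbitrary):

      # Reproduce the division vs multiplication gap (sizes/repeats are arbitrary).
      import timeit

      setup = "import numpy; x = numpy.random.random(1000000)"
      for stmt in ("x / 0.1", "x * (1 / 0.1)", "x * 10.0"):
          best = min(timeit.repeat(stmt, setup=setup, number=100, repeat=3))
          print "%-14s %.3f s per 100 runs" % (stmt, best)

      The exact numbers depend on the machine; the point is only to compare the three statements against each other.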

      Wednesday, May 26, 2010

      cython timings test

      The TASK : To optimize cython functions

      Detailed: optimize functions whose behaviour depends on an attribute value that is initialized once.

      This comes in handy in many cases; for example, to write the Laplacian of a scalar field in a spherical/axisymmetric coordinate system, you would need three independent cases for 1, 2 and 3 dimensions for performance reasons, unless you write every function as a general 3D function.


      The test CODE : test_kernel.pyx


      cdef class Kernel:
          cdef int dim
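          # C function pointer, selected once in __init__ so later calls skip the dim check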
          cdef double (*func)(Kernel,double)
          def __init__(self, dim=1):
              self.dim = dim
              if dim == 1:
                  self.func = self.func1
              elif dim == 2:
                  self.func = self.func2
        
          cdef double func1(self, double x):
              return 1+x
        
          cdef double func2(self, double x):
              return 2+x
        
          cdef double c_func(self, double x):
              '''this is only to make function signature compatible with func1 and func2'''
              return self.func(self, x)
        
          def p_func(self, double x):
              return self.func(self, x)
        
          cpdef double py_func(self, double x):
              return self.func(self, x)
        
          cpdef double py_c_func(self, double x):
              return self.c_func(x)
        
          def py_func1(self, x):
              return self.func1(x)
        
          def py_func2(self, x):
              return self.func2(x)
        
          cdef double func_common(self, double x):
              cdef int dim = self.dim
              if dim == 1:
                  return 10+x
              elif dim == 2:
                  return 20+x
        
          def py_func_c_common(self, x):
              return self.func_common(x)
        
          cpdef double py_func_common(self, double x):
              cdef int dim = self.dim
              if dim == 1:
                  return 10+x
              elif dim == 2:
                  return 20+x

      Compilation command:
          cython -a test_kernel.pyx;
          gcc <optimization-flag> -shared -fPIC test_kernel.c -lpython2.6 -I /usr/include/python2.6/ -o test_kernel.so
      where the optimization flag is either empty, "-O2" or "-O3"

      Cython optimization
      Tip 1:
      Type (cdef) as many variables as you can. You also need to type the locals in each function. Try to use C data types wherever possible.
      Tip 2:
      use:
          cython -a file.pyx
      command to generate an html file which shows the lines that cause expensive python functions to be called. Clicking on a line shows the corresponding generated C code, with expensive calls highlighted in shades of red. Try to eliminate as many such calls as you can.

      The TEST :

      time_kernel.py

      import timeit

      def time(s):
          '''returns time in microseconds'''
          t = 1e6*timeit.timeit(s,'import test_kernel;k1=test_kernel.Kernel(1);k2=test_kernel.Kernel(2);',number=1000000)/1000000.
          print s, t
          return t

      time('k1.p_func(0)')
      time('k1.py_func(0)')
      time('k1.py_func1(0)')
      time('k1.py_c_func(0)')
      time('k1.py_func_c_common(0)')
      time('k1.py_func_common(0)')

      time('k2.p_func(0)')
      time('k2.py_func(0)')
      time('k2.py_func2(0)')
      time('k2.py_c_func(0)')
      time('k2.py_func_c_common(0)')
      time('k2.py_func_common(0)')

      Timings :



      All times are in μs per call; None/-O2/-O3 is the gcc optimization flag used, 'mean' is the average over the three flags, and the penalty (ns) is measured relative to the p_func base case.

      #   function                  None      -O2       -O3       mean      sum(k1+k2)/2   penalty (ns)
      1   k1.p_func(0)              0.20178   0.18321   0.18035   0.18845   0.19368        0.0000
      2   k1.py_func(0)             0.23224   0.18599   0.18393   0.20072   0.19541        1.7345
      3   k1.py_func1(0)            0.21477   0.18991   0.19252   0.19907   0.19802        4.3456
      4   k1.py_c_func(0)           0.23395   0.19196   0.19243   0.20611   0.19761        3.9311
      5   k1.py_func_c_common(0)    0.19566   0.18458   0.19062   0.19029   0.19767        3.9960
      6   k1.py_func_common(0)      0.21981   0.18707   0.18984   0.19891   0.19510        1.4237
      7   k2.p_func(0)              0.20448   0.18388   0.18194   0.19010
      8   k2.py_func(0)             0.21798   0.18859   0.18437   0.19698
      9   k2.py_func2(0)            0.20413   0.18124   0.18194   0.18910
      10  k2.py_c_func(0)           0.23114   0.19166   0.19238   0.20506
      11  k2.py_func_c_common(0)    0.19860   0.18783   0.18745   0.19129
      12  k2.py_func_common(0)      0.21609   0.18747   0.18640   0.19666

          Average                   0.21560   0.18681   0.18703   0.19648


      Result :

      The best is to write a separate C function and a python accessor function.

      task                                                              function                             penalty cost (ns)
      C function + python accessor (base case)                         p_func                               0.0000
      cpdef instead of def                                             py_func                              1.7345
      cdef class method call instead of a function pointer attribute   py_func1, py_func2                   4.3456
      one extra C function call                                        py_c_func                            3.9311
      (def + cdef) instead of (cpdef)                                  py_func_c_common - py_func_common    2.5723
      one C comparison vs one C function call                          py_func_common                       1.4237

      Conclusion :

      As can be clearly seen, the results are clearly inconclusive :)
      This was a small test carried out on my laptop in no controlled environment. Also, though the results seemed close to repeatable, many more trials should be conducted and each value should come with a standard deviation to check repeatability. One clear conclusion, however: do not forget to add optimization flags (setuptools already does that for you).
      Also, using a function pointer is not so bad after all; it would become more advantageous as the number of comparisons grows.
      Cython provides great speedups (who didn't know that :) ). The pure python version of py_func_common took 0.408 μs for dim=1 and 0.518 μs for dim=2.
      These results are purely from the python caller's point of view. The effect of cdef/cpdef should also be considered for c/cython code which calls these functions.

      CAVEAT:

      I am no optimization expert. I have done this out of sheer boredom :)
      If anyone wants to verify the results, you are welcome.
      Any information content is purely coincidental.

      Tuesday, April 27, 2010

      Tracing python programs

      Along with the easier python debugging enabled by the new gdb with python hooks, another awesome python feature is coming in the new Fedora 13 "Goddard" release: tracing of python processes and their function calls. This feature is built on top of systemtap, the linux analogue of Sun's awesome DTrace system tracer.
      So what does it mean? For the uninitiated, it adds hooks (tracepoints) to the main python shared libraries (libpython.so and libpython3.so) so that systemtap can trace whenever a python function is entered or exited, in any python process on the system. This means you can check any python process at any time to see which functions are being called, how many times, and so on. This has really cool uses. More information about this feature is available at https://fedoraproject.org/wiki/Features/SystemtapStaticProbes#Python_2

      Just to illustrate the use, try the following examples (from the link above).
      First install python-debuginfo. Now add yourself to the stapdev and stapusr groups, or run the following command as root:
          $ stap /usr/share/doc/python3-libs-3.1.2/pyfuntop.stp
      This will display a top-like output on the terminal showing the python functions called by all running python processes and the number of times each is being called. It's fun to watch; just run a python program and check all the functions being called :)
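
      For instance, a throwaway script like this (purely illustrative) gives pyfuntop.stp a steady stream of calls to count while it runs:

      # throwaway script to watch under pyfuntop.stp: it just keeps calling a
      # couple of functions so they show up in the per-second call counts
      import time

      def inner(n):
          return n * n

      def outer(n):
          return sum(inner(i) for i in xrange(n))

      while True:          # stop with Ctrl-C
          outer(1000)
          time.sleep(0.01)
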
      Here's a sample output from my laptop

      PID                                                                         FILENAME   LINE                       FUNCTION  CALLS
       15479                                 /usr/lib/python2.6/site-packages/yum/packages.py    261                         verCMP  15768
       15479                                 /usr/lib/python2.6/site-packages/yum/packages.py    270                        __cmp__  15767
       15479                             /usr/lib/python2.6/site-packages/rpmUtils/updates.py    129                   returnNewest   9045
       15479                           /usr/lib/python2.6/site-packages/rpmUtils/miscutils.py     36                     compareEVR   1191
       15479                                 /usr/lib/python2.6/site-packages/yum/packages.py     48                   comparePoEVR    578
       15479                                 /usr/lib/python2.6/site-packages/yum/packages.py    296                          verEQ    556
       15479                                 /usr/lib/python2.6/site-packages/yum/packages.py     55                 comparePoEVREQ    556
       15479                                 /usr/lib/python2.6/site-packages/yum/__init__.py    778                             2
       15479                                 /usr/lib/python2.6/site-packages/yum/__init__.py    206                     _getConfig      2
       15479                                   /usr/lib/python2.6/site-packages/yum/config.py     69                        __get__      2
       15479                                         /usr/lib64/python2.6/logging/__init__.py   1026                          debug      1
       15479                                         /usr/lib64/python2.6/logging/__init__.py   1236                   isEnabledFor      1
       15479                                         /usr/lib64/python2.6/logging/__init__.py   1222              getEffectiveLevel      1
       15479                             /usr/lib/python2.6/site-packages/rpmUtils/updates.py    272                      doUpdates      1
      
      This shows the functions called during a 1-second interval (the script updates the display every second) while packagekit was checking for available updates.
      Another systemtap script displays the python function call hierarchy of any program you run. Try it by running
          $ stap -v /usr/share/doc/python-libs-2.6.4/systemtap-example.stp -c python
      Now you will get a python terminal after a long hierarchy of function calls. Here you can see all the python functions called for each line you enter on the python terminal. It's not as much fun, but useful if you want to check where all those extra, unneeded function calls are being made.
      Read a short writeup from the developer of these features at http://fedoraproject.org/wiki/Python_in_Fedora_13 and also check http://press.redhat.com/2010/04/27/fedora-13-spotlight-feature-exploring-new-frontiers-of-python-development/

      Saturday, April 24, 2010

      hard disk speed and os partition

      I have known this for quite some time, but now I also have experimental evidence that installing your OS and keeping your home partition near the front of the hard disk results in a more responsive computer. This is because conventional hard disks are rotational, which means they read data faster from the outer sectors than from the inner sectors (the outer tracks pass under the head at a higher linear speed). So the next time you install your OS, keep it in the front partitions.
      Here's a screenshot of the palimpsest utility (Applications -> System Tools -> Disk Utility in fedora, package gnome-disk-utility) showing the read-only benchmark speed and access (seek) times. As you can see, the initial part of the disk has nearly double the speed of the last part. Seek times don't show any such obvious relation.

      Friday, April 23, 2010

      Easier cython/python/c debugging with new GDB

      We all know how debugging is a dreaded but integral part of every programmer's work. It can also be fun sometimes, depending on the time to the deadline, the complexity of the bug and the time already spent.
      So if there is anyone left who does assignments or other programs without a debugger (using print statements etc.), please consider learning one. Otherwise you are simply increasing your own work and frustration.
      For debugging in any programming language, my advice would be to use the eclipse debugger gui, which provides all the standard features present in any debugger and integrates with java, c/c++, python and a host of other languages.

      This post was not about plain debugging though. It's about the new features in GDB 7 (the GNU debugger) which enable writing pretty printers in python; more information can be found on the net. For us it means a much easier debugging experience with cython. The good folks at Fedora have written some cool scripts that use the python scripting capability of gdb to enable easier python/cython debugging.
      Check out the awesomeness at https://fedoraproject.org/wiki/Features/EasierPythonDebugging
      In short now you can do the following easily with the new gdb

      • automatically display python frame information in PyEval_EvalFrameEx in gdb backtraces, including in ABRT:
        • python source file, line number, and function names
        • values of locals, if available
      • name of function for wrapped C functions
       This is gonna make my life easier, especially since my DD project is in cython/python.

      Also, not to forget the uber-cool features it could enable, not only for python developers but for everyone. As an example, check out the blog post at http://labs.trolltech.com/blogs/2010/04/22/peek-and-poke-vol-3/

      Thursday, March 4, 2010

      funny slashdot comment about windows

      Are you saying that this linux can run on a computer without windows underneath it, at all ? As in, without a boot disk, without any drivers, and without any services ? That sounds preposterous to me. If it were true (and I doubt it), then companies would be selling computers without a windows. This clearly is not happening, so there must be some error in your calculations. I hope you realise that windows is more than just Office ? Its a whole system that runs the computer from start to finish, and that is a very difficult thing to acheive. A lot of people dont realise this. Microsoft just spent $9 billion and many years to create Vista, so it does not sound reasonable that some new alternative could just snap into existence overnight like that. It would take billions of dollars and a massive effort to achieve. IBM tried, and spent a huge amount of money developing OS/2 but could never keep up with Windows. Apple tried to create their own system for years, but finally gave up recently and moved to Intel and Microsoft. Its just not possible that a freeware like the Linux could be extended to the point where it runs the entire computer fron start to finish, without using some of the more critical parts of windows. Not possible. I think you need to re-examine your assumptions.

      Saturday, February 27, 2010

      Increasing coursework

       I just realized that the coursework in my college increases at a very fast pace as the years go by.
      Here's the proof:
      the directory sizes of the data in each year's courseware. The 4th year's data is as yet incomplete, as mid-terms are just over and half a semester still remains (y* are the yearly coursework directories).

      [pankaj@localhost courseware]$ du -sh y*
      158M    y1
      322M    y2
      1.7G    y3
      2.2G    y4

      UPDATE: the final score for y4 is 6.9 GB at the end of the year; most of it is from a single course, with hundreds of MB of datasets and simulation videos.

      real programmers

      I just found a funny old article about programming. I'm glad I'm not a 'real' programmer (just for your information, I'm a complex programmer).
      check the link http://www.pbm.com/~lindahl/real.programmers.html

      Tuesday, February 23, 2010

      my academic schedule

      Now you know how stupidly my lectures are scheduled.
      Legend:
      Blue : Moodle calendar (academic assignments, submissions etc)
      Green : Indian holidays
      Yellow : My edited calendar for lecture schedules and other academic things :)