Saturday, May 29, 2010

numpy array performance / divide and conquer considered harmful

This is another post about Python code speed. The data and the inferences drawn from it are more than a few months old, but they are still valid.
Here's a spreadsheet showing the speed of array math operations (+, -, *, /) on numpy arrays compared with Python lists:
https://spreadsheets.google.com/ccc?key=0AomYDYyBBNkkdHAtMkdHMF9TZ29lMmZQV3UwYkxWNFE&hl=en

The operations I considered for comparison were:

  • x+0.1
  • x-0.1
  • x*0.1
  • x/0.1
  • x*(1/0.1)
  • x+y
  • x-y
  • x*y
  • x/y
  • [p+yp[j] for j,p in enumerate(xp)]
  • [xp[j]+yp[j] for j in xrange(i)]
where x and y are numpy arrays and xp and yp are Python lists, all of size N, which is varied for the comparison.
The raw timing data is available here:
https://spreadsheets.google.com/pub?key=0AomYDYyBBNkkdHAtMkdHMF9TZ29lMmZQV3UwYkxWNFE&hl=en&output=html
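
A comparison like this can be reproduced roughly as follows. This is a minimal sketch, not the script that produced the spreadsheet: the function name time_operations and its repeats and number parameters are my own, it assumes Python 3 (so range replaces xrange), and the list size is passed as n.

```python
import timeit
import numpy as np

def time_operations(n, repeats=3, number=100):
    """Return the best per-call time (seconds) of each expression for size n."""
    # Values lie in [1, 2) so that x/y never divides by zero.
    x = np.random.rand(n) + 1.0
    y = np.random.rand(n) + 1.0
    xp = x.tolist()  # plain Python lists holding the same data
    yp = y.tolist()
    env = {"x": x, "y": y, "xp": xp, "yp": yp, "n": n}
    expressions = [
        "x+0.1", "x-0.1", "x*0.1", "x/0.1", "x*(1/0.1)",
        "x+y", "x-y", "x*y", "x/y",
        "[p+yp[j] for j,p in enumerate(xp)]",
        "[xp[j]+yp[j] for j in range(n)]",
    ]
    timings = {}
    for expr in expressions:
        # Take the minimum over several runs to reduce noise from other processes.
        best = min(timeit.repeat(expr, globals=env, repeat=repeats, number=number))
        timings[expr] = best / number
    return timings

for n in (10, 1000):
    for expr, seconds in time_operations(n).items():
        print(f"N={n:<6} {expr:<40} {seconds * 1e6:10.3f} us")
```

The absolute numbers depend on the machine and on the numpy version, so the useful thing to look at is how the timings compare with each other as N grows.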

See the timings plot for yourself.
Conclusion:
  • Use numpy arrays when the size is greater than 10.
  • Avoid division as much as you can to speed up your numerical code.
  • Instead of x/0.1, write x*(1/0.1). This alone gives a large speedup as N increases.
  • x/0.1 and x/y take almost the same time.
  • +, - and * take almost the same time; / takes much longer, and its cost grows as N increases.
  • Once again, do not divide.
  • The same holds in cython code. Avoid division there too, even if you are working with plain doubles rather than numpy arrays (buffers). Rewrite expressions to minimize use of the division operator.
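
The "multiply by the reciprocal" rewrite can be sketched like this. The helper name divide_via_reciprocal is hypothetical and only for illustration. Note that the two forms are not guaranteed to agree to the last bit, because the reciprocal is rounded once before the multiplication, so the check uses np.allclose rather than exact equality.

```python
import numpy as np

def divide_via_reciprocal(x, divisor):
    """Compute x / divisor as a multiplication (hypothetical helper)."""
    inv = 1.0 / divisor  # one scalar division, done once
    return x * inv       # element-wise multiplication instead of division

x = np.linspace(0.0, 1.0, 5)
direct = x / 0.1
rewritten = divide_via_reciprocal(x, 0.1)

# The results match to within floating-point rounding.
print(np.allclose(direct, rewritten))
```

The same idea applies when the divisor is reused inside a loop: compute the reciprocal once outside the loop and multiply inside it.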
