Smart Python scientific libraries
I've been spending some time this week learning how Pandas really works. It looks great and seems to be widely used. Luckily, the project I'm currently working on heavily uses numpy, which Pandas is based on.
I decided to try replacing some ugly custom code with Pandas' awesome indexing and merging functionality. In short, it was possible, fast, and all around great.
Now, it's obvious the folks behind Pandas are really smart. However, the purpose of this post is really to point out an awesome little detail that you might not notice at first when mixing Pandas and numpy code.
>>> import pandas
>>> import numpy
>>> x = numpy.arange(0, 5)
>>> s = pandas.Series(x)
>>> x
array([0, 1, 2, 3, 4])
>>> s
0    0
1    1
2    2
3    3
4    4
>>> s[0]
0
>>> s[0] = 10
>>> x
array([10,  1,  2,  3,  4])
See what happened there? Assigning through the Series changed the original array, because Pandas wraps the numpy buffer instead of copying it. As I mentioned before, Pandas is based heavily on numpy, so it can skip a copy that would only be wasteful.
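If you want to check this on your own setup, here is a minimal sketch using numpy.shares_memory. Whether the buffer is actually shared can depend on the pandas version and its copy settings, so the sketch reports the result instead of assuming it. The variable name `shares` is just mine for illustration.

```python
import numpy
import pandas

x = numpy.arange(0, 5)
s = pandas.Series(x)

# True if the Series is backed by x's buffer, False if pandas made a copy.
# (Which one you get may depend on your pandas version and copy settings.)
shares = bool(numpy.shares_memory(x, s.to_numpy()))
print("Series shares memory with x:", shares)

# Write through the Series, then look at the original array.
s.iloc[0] = 10
print("x after writing to s:", x)
```

If the first line prints True, the write should show up in x as well; if it prints False, x should be left untouched.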
This might be obvious to most people, but it's a great little detail. This makes it trivial to take existing numpy code, create a Pandas Series from it and do advanced indexing, group-by, or even plotting. All of this is possible without having to switch your entire code base from ndarrays to Series or pay the penalty of copying potentially massive datasets again.
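As a rough illustration of that workflow, the sketch below wraps an existing array in a Series and then does label-based selection and a group-by. The data and the labels ("a" and "b") are made up for the example and are not from my project.

```python
import numpy
import pandas

# Pretend this array came out of existing numpy code (made-up values).
readings = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
labels = ["a", "b", "a", "b", "a", "b"]  # hypothetical group labels

# Wrap the array in a Series with a labelled index.
s = pandas.Series(readings, index=labels)

# Label-based indexing: every reading labelled "a".
a_values = s.loc["a"]

# Group-by on the index labels, then take the mean of each group.
means = s.groupby(level=0).mean()
print(means)
```

The rest of the numpy code can keep using `readings` as before; the Series is only there for the indexing and grouping.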
Sometimes the details matter.
Also, I wanted to point out that the folks behind Pandas aren't just code smart. They are super nice and helpful too! I ran into some issues with using the reindex_like() method and quickly received a few responses from their developers.
In addition, I've found some really helpful community members over at Stack Overflow who helped me work through an efficiency issue.
This all just goes to show that the scientific Python community is alive and well. Don't be afraid to ask them questions or try out something new!
Published: 11-29-2012 21:10:12