High-level URL manipulation using the native Python API

While developing my new project I faced the need to manipulate a URL in order to change its query-string.
Basically my goal was to provide a parameter with a default value if not already defined, and to add another new one.
The modules I used are urlparse and urllib, and in a few lines of code I achieved my goal in a high-level programming fashion (I mean, without regex or low-level “hacks”).
So let’s start… the first step is to parse the URL string using urlsplit:

from urlparse import urlsplit

url_data = urlsplit(url_string)

Supposing url_string is a string holding a valid URL like “http://www.mysite.com/path/?a=1&b=2”, urlsplit will return a SplitResult object, which is a named tuple.
A named tuple is a subclass of tuple: it behaves like a plain tuple, but it can be initialized with pre-defined keyword arguments and its fields can be referred to by name later. So, for example, it is possible to create a named tuple called “CreditCard” in this way:

from collections import namedtuple

CreditCard = namedtuple('CreditCard', 
'number, secure_code, expire, owner')

and use it in this way:

card = CreditCard(number=1234567890, 
                  secure_code=123, 
                  expire='2018/06/06', 
                  owner='Peter Parker')
print '{}\'s card number is: {} and ' \
      'has this secure code: {}'.format(
          card.owner, card.number, card.secure_code)

One cool feature of named tuples is that you can update one of the fields without having to recreate the object yourself, using the method _replace (it returns a new tuple with the updated value… remember that tuples are IMMUTABLE objects!).
So to change the owner of the previously defined credit card you will do:

card = card._replace(owner='Bruce Wayne')

(to be honest I don’t know why they decided to mark this helpful method as “protected” using the underscore prefix… but this doesn’t really matter)
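As a quick sanity check (written with Python 3 print syntax), _replace really does leave the original tuple untouched:

```python
from collections import namedtuple

CreditCard = namedtuple('CreditCard', 'number, secure_code, expire, owner')
card = CreditCard(number=1234567890, secure_code=123,
                  expire='2018/06/06', owner='Peter Parker')

# _replace returns a NEW tuple; the original object is unchanged
new_card = card._replace(owner='Bruce Wayne')
print(card.owner)      # Peter Parker
print(new_card.owner)  # Bruce Wayne
```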

Ok… you should get it now. Let’s get back to SplitResult… the tuple has the following properties (they are all string objects):

  • scheme (http, https…)
  • netloc (www.mysite.com)
  • path (/path/)
  • query (the query-string)
  • fragment (what comes after the “#” sign)

So, what I needed was to manipulate the query-string, but once parsed out from the original URL it’s just a raw string, and to avoid messing around with string manipulation I used parse_qs, which returns a Python dictionary:

from urlparse import parse_qs 

qs_data = parse_qs(url_data.query)

A dictionary is very handy in order to manipulate query-string parameters, so now all I have to do is something like:

if 'target_parameter' not in qs_data:
    qs_data['target_parameter'] = ['tp1']
qs_data['extra_parameter'] = ['ex1']

You may be wondering why the values are assigned as lists instead of simple strings: this is because parse_qs returns a dictionary whose values are sequences, since a parameter can be supplied with multiple values (i.e. “?cat=ACTION&cat=HORROR&cat=COMEDY”).
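For example (shown with Python 3, where parse_qs lives in urllib.parse):

```python
from urllib.parse import parse_qs

# a repeated parameter is collected into a single list of values
qs = parse_qs('cat=ACTION&cat=HORROR&cat=COMEDY')
print(qs)  # {'cat': ['ACTION', 'HORROR', 'COMEDY']}
```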

Now that the query-string data has been updated, all I have to do is serialize it back to a simple string and update the original SplitResult:

from urllib import urlencode

url_data = url_data._replace(query=urlencode(qs_data, True))

The second argument passed to urlencode (True) tells the function that we are passing sequences as values, so it will handle them accordingly.
The new, modified URL can now be retrieved by calling:

url_data.geturl()

To sum up, this is the full code:

from urllib import urlencode
from urlparse import urlsplit, parse_qs

# parse original string url
url_data = urlsplit(url_string)

# parse original query-string
qs_data = parse_qs(url_data.query)

# manipulate the query-string
if 'target_parameter' not in qs_data:
    qs_data['target_parameter'] = ['tp1']
qs_data['extra_parameter'] = ['ex1']

# get the url with the modified query-string
new_url = url_data._replace(query=urlencode(qs_data, True)).geturl()
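For readers on Python 3: urlsplit, parse_qs and urlencode all moved into urllib.parse, so the same flow becomes (a sketch, using a sample URL):

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url_string = 'http://www.mysite.com/path/?a=1&b=2'

# parse the original URL and its query-string
url_data = urlsplit(url_string)
qs_data = parse_qs(url_data.query)

# manipulate the query-string
if 'target_parameter' not in qs_data:
    qs_data['target_parameter'] = ['tp1']
qs_data['extra_parameter'] = ['ex1']

# serialize it back and rebuild the URL
new_url = url_data._replace(query=urlencode(qs_data, doseq=True)).geturl()
print(new_url)
```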

That’s all folks! I hope you enjoyed this post ;)

Python: reading numbers from JSON without loss of precision using the Decimal class

In the project I’m working on, I’m using an external API which returns a JSON response containing conversion rates for currencies. Since I’m dealing with currencies and prices, the precision of numbers plays an important role in the values calculated by the application. The good thing about JSON, despite its name being an acronym of “JavaScript Object Notation”, is that it’s a cross-language format, so it’s not limited to the capabilities of a specific language like JavaScript: numbers in JSON may have a higher precision than a js float!
This is a quote from wikipedia about JSON numbers (emphasis is mine):

Number — a signed decimal number that may contain a fractional part and may use exponential E notation. JSON does not allow non-numbers like NaN, nor does it make any distinction between integer and floating-point. (Even though JavaScript uses a double-precision floating-point format for all its numeric values, other languages implementing JSON may encode numbers differently)

By default Python’s json module will load decimal numbers as floats, so if we have a JSON like:

{ "number": 1.00000000000000000001 }

the default conversion into python will be {u'number': 1.0} if we just write the following code:

import json

json.loads(json_string)

But fortunately it is dead simple to load numbers in JSON using the decimal module. There is no need to write custom decoders, as I have seen suggested on the web; it’s just a matter of specifying the Decimal class for float parsing in the loads() function, in this way:

import json
from decimal import Decimal

json.loads(json_string, parse_float=Decimal)

In this way the loaded Python object will be:
{u'number': Decimal('1.00000000000000000001')}
and we will be able to perform precise arithmetic computations!
It’s also possible to use Decimal even for integer numbers, by specifying parse_int:

json.loads(json_string, 
           parse_int=Decimal, 
           parse_float=Decimal)
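Here is a small self-contained check of the difference (written with Python 3 print syntax):

```python
import json
from decimal import Decimal

json_string = '{"number": 1.00000000000000000001}'

lossy = json.loads(json_string)                       # floats: precision is lost
exact = json.loads(json_string, parse_float=Decimal)  # Decimals: the literal is preserved

print(lossy['number'])  # 1.0
print(exact['number'])  # 1.00000000000000000001
```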

Additional reading: the official Python docs for the json and decimal modules.

Abstract classes in Python using abc module

Python is a powerful language, but it lacks some OOP features which are the foundation of other programming languages like Java.
One of these is abstract classes (classes you can’t instantiate but only extend, in order to inherit common base methods and to be forced to implement the abstract methods representing a common interface).
A common practice that can be found in Python projects is to “mimic” abstract classes/methods by creating a base class and defining a series of methods that raise a NotImplementedError. The official Python documentation in fact says:

In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method.

But… to my great pleasure I discovered that, starting from Python 2.6, a module called abc allows creating “real” abstract classes, methods and properties!!

So, let’s see how to implement a better and more effective abstract class in Python, forgetting the old NotImplementedError:

from abc import ABCMeta, abstractmethod

class AbstractAnimal(object):
    __metaclass__ = ABCMeta
    
    @abstractmethod
    def run(self):
        pass

Now… if you (stupid idiot) try to instantiate an AbstractAnimal, the Python interpreter will complain, saying:

TypeError: Can’t instantiate abstract class AbstractAnimal with abstract methods run

Now that you get it… let’s extend the abstract class with a concrete one:

class Dog(AbstractAnimal):
    pass

But say you don’t trust me, or simply forget to implement the abstract method (which MUST be implemented, since it’s marked as abstract)… once again the interpreter will complain with:

TypeError: Can’t instantiate abstract class Dog with abstract methods run

(which, to be honest, is a dumb message, since the class Dog is not actually abstract but simply not implementing the required methods… it is however an “understandable” exception)

Uh… if you use a cool IDE like PyCharm, it will mark the class as invalid by showing a message in a tooltip: “Dog must implement all abstract methods”!

Finally, once the method is implemented:

class Dog(AbstractAnimal):
    def run(self):
        print 'running like a dog...'

It’s also possible to define abstract properties using the related decorator @abstractproperty.

So… to recap: say goodbye to NotImplementedError, use the abc module to assign ABCMeta to the __metaclass__ property of the abstract class you want to implement, and use @abstractmethod and @abstractproperty to provide abstract methods and properties.
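As a side note for Python 3 users: the same result is usually obtained by subclassing abc.ABC instead of setting __metaclass__ (a sketch, not from the original post):

```python
from abc import ABC, abstractmethod

class AbstractAnimal(ABC):
    @abstractmethod
    def run(self):
        pass

class Dog(AbstractAnimal):
    def run(self):
        return 'running like a dog...'

# instantiating the abstract class still raises TypeError
try:
    AbstractAnimal()
except TypeError as e:
    print(e)

print(Dog().run())  # running like a dog...
```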

Read more about abc module here.

Creating a custom AMI with Postgis and its dependencies in order to deploy Django + GeoDjango on Amazon Elastic Beanstalk

While the installation of PostgreSQL + Postgis on my development machine (my beloved MacBook Pro) was very easy thanks to MacPorts, installing the necessary software on Amazon Elastic Beanstalk (in order to move my project Cygora.com from local to the cloud) has been a hard challenge!
Theoretically you can customize an environment by using configuration files in which you specify packages and other resources to install. The problem is that the Amazon 64bit Linux distribution for Python (which is an extremely customized version of Red Hat) doesn’t have apt (for which postgis packages are available); instead you have to rely on yum. It is possible to install extra repositories for yum (see here: http://postgis.net/install) in order to easily install postgis… but honestly I have no idea which repository would be the right one for Amazon Linux. So… it’s been painful, but I opted for an “old school” style installation, downloading and compiling the missing packages myself. After launching my EC2 instance I connected to it via SSH and:

1. Switch to root user:

sudo su -

2. Update all the installed packages (which Amazon doesn’t update very often!):

yum update -y

3. Install development tools and the necessary libraries (some of them, like “graphviz”, are not required for GeoDjango, and you can avoid their installation if you want… I’m reporting all my libraries as a future reference for myself :P)

yum install -y python-devel libpcap libpcap-devel libnet libnet-devel pcre pcre-devel gcc gcc-c++ libtool make libyaml libyaml-devel binutils libxml2 libxml2-devel zlib zlib-devel file-devel postgresql postgresql-devel postgresql-contrib geoip geoip-devel graphviz graphviz-devel gettext libtiff-devel libjpeg-devel libzip-devel freetype-devel lcms2-devel libwebp-devel tcl-devel tk-devel

4. Download and compile proj:

wget http://download.osgeo.org/proj/proj-4.8.0.zip
unzip proj-4.8.0.zip && cd proj-4.8.0
./configure && make && sudo make install
cd ..

5. Download and compile geos:

wget http://download.osgeo.org/geos/geos-3.4.2.tar.bz2
tar -xvf geos-3.4.2.tar.bz2 && cd geos-3.4.2
./configure && make && sudo make install
cd ..

6. Download and compile gdal (this library is the SLOWEST to compile, and depending on the type of instance you have launched it may take up to a couple of hours… be patient!):

wget http://download.osgeo.org/gdal/1.10.1/gdal1101.zip
unzip gdal1101.zip && cd gdal-1.10.1
./configure --with-python=yes && make && sudo make install
cd ..

7. Download and install postgis:

wget http://download.osgeo.org/postgis/source/postgis-2.1.1.tar.gz
tar -xvf postgis-2.1.1.tar.gz && cd postgis-2.1.1
./configure && make && sudo make install

8. Update installed libraries (this step is necessary to avoid issues related to invalid library paths):

# we are already root (see step 1), so no sudo is needed; note that
# "sudo echo ... >> file" would NOT elevate the redirection anyway
echo /usr/local/lib >> /etc/ld.so.conf
ldconfig

It’s also a nice idea to export the environment variable LD_LIBRARY_PATH (as /usr/local/lib/:$LD_LIBRARY_PATH).
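Concretely, that export is just one shell line (e.g. in your shell profile):

```shell
# make the freshly compiled libraries in /usr/local/lib resolvable at runtime
export LD_LIBRARY_PATH=/usr/local/lib/:$LD_LIBRARY_PATH
```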
Once you have installed all the necessary software on your machine, you can create a custom AMI by going to: EC2 > instances > select your instance > create AMI. To use that AMI as the default one for your application, you have to specify its id in your Elastic Beanstalk environment configuration.