Regular Expressions in Python: how to match English and non-English letters

OK, this is a quick (and I hope super-helpful) tip on how to match foreign-language letters (ö, è…) in a Python regex.
As everybody knows, matching letters is just a matter of using [a-z] or \w (the latter also matches digits and underscores!), but unfortunately letters with “decorations” are not matched by these selectors. To match them you would have to use Unicode ranges (something like [\u00D8-\u00F6]), but Python can automatically match all the Unicode variants if you simply pass the re.UNICODE flag to compile(). So this:

re.compile(r'[^\W_]', re.IGNORECASE | re.UNICODE)

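will match any English or non-English letter (and, as a side effect, any digit).
But let me explain… \w matches letters, digits and underscores, while \W (note the uppercase) is its opposite and matches everything except letters, digits and underscores. So [^\W_] (thanks to the negation ^) matches letters and digits, but not underscores; if you want letters only, exclude digits too with [^\W\d_].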
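A quick way to see the letters-versus-digits difference is to run both character classes on the same string. This is a minimal sketch written for Python 3 (where str patterns are Unicode-aware by default, so re.UNICODE is implied); the pattern names and the sample string are just illustrative:

```python
import re

# Python 3 str patterns are Unicode-aware, so no re.UNICODE flag is needed here.
LETTERS_AND_DIGITS = re.compile(r'[^\W_]')   # \w minus the underscore
LETTERS_ONLY = re.compile(r'[^\W\d_]')       # \w minus digits and underscore

sample = 'è_ö_42'
print(LETTERS_AND_DIGITS.findall(sample))  # ['è', 'ö', '4', '2']
print(LETTERS_ONLY.findall(sample))        # ['è', 'ö']
```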
Bear in mind what the Python docs report about the re.UNICODE flag:

“Makes several escapes like \w, \b, \s and \d dependent on the Unicode character database”

A stupid demonstration:

# -*- coding: utf-8 -*-
import re

ENGLISH_CHARS = re.compile(r'[^\W_]', re.IGNORECASE)
ALL_CHARS = re.compile(r'[^\W_]', re.IGNORECASE | re.UNICODE)

# use a unicode literal: a plain str would be matched byte by byte
assert len(ENGLISH_CHARS.findall(u'_àÖÎ_')) == 0
assert len(ALL_CHARS.findall(u'_àÖÎ_')) == 3

PS: not all languages implement a Unicode flag; JavaScript, for example, didn’t… I love Python :)
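If you are on Python 3, note that the default is reversed: str patterns use Unicode matching unless told otherwise, so the ENGLISH_CHARS pattern above would match the accented letters too. Here is a sketch of the same demonstration for Python 3, assuming you want the old “English only” behaviour back through the re.ASCII flag (the names are illustrative):

```python
import re

# Python 3: \w is Unicode-aware for str patterns unless re.ASCII is passed.
ENGLISH_LETTERS = re.compile(r'[^\W\d_]', re.ASCII)
ALL_LETTERS = re.compile(r'[^\W\d_]')

sample = '_àÖÎ_'
print(ENGLISH_LETTERS.findall(sample))  # []
print(ALL_LETTERS.findall(sample))      # ['à', 'Ö', 'Î']
```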

Frameworks, libraries, dependencies and philosophies

These days, at the company where I work, we are discussing which front-end framework to adopt for our projects.
The two candidates are currently AngularJS and Backbone, but the discussion got me thinking about libraries and frameworks in general, and I want to talk about this topic and about some opinions, from colleagues and other people working in IT, that I disagree with. (I’m such a badass that I stick to my ideas until someone provides a more valid, more exhaustive and far more convincing list of arguments than mine.)
So, in this post I would like to express my point of view about frameworks, libraries and dependencies in a language-agnostic way.
First of all, I want to define “framework” and “library”, to make sure that you (the reader) understand these terms as I do. I’m taking inspiration from a relevant StackOverflow question, “What is the difference between a framework and a library?”, and arbitrarily rearranging the answers in that discussion to give a more personal and deeper definition.


A library performs specific, well-defined operations like handling HTTP connections, data serialization, cryptography and so on. It’s composed of several classes and/or functions, and you (the developer) have control over it: you decide when and where to call such “utilities” in your code.


A framework, instead, is far more complex than a library and offers a lot of different functionalities. It follows the so-called “Hollywood principle” (“don’t call us, we’ll call you”, if you love buzzwords like many “chatterboxes” do), under which you (the developer) don’t have full control over it; instead, you have to follow and agree to the concepts and procedures defined by the framework. A framework goes beyond the programming language: it has its own architecture and philosophy, and it is composed of different layers of code (often also of different languages and technologies; it could be a mix of Python + JavaScript + HTML + SQL + YAML, for example).

So, by comparing AngularJS with Backbone we are comparing two different things. The first is a complete front-end framework: it offers everything you need to create a JavaScript application (templating, controllers, data binding, localization…), and you don’t have “low-level” control over the code, since you neither manually create instances of your controllers nor render your UI components (the framework does that for you when it’s ready). Backbone, on the other hand, is just a library that provides an abstraction layer over the MVC design pattern, and you have to use several third-party libraries to handle DOM manipulation, templating, cookies and so on.

Now, I want to report a series of statements drawn from ideas, doubts and observations I heard from colleagues or read on the web; below each one I reply with my opinion.

1. Do I need a framework to do X?

If you are a professional developer (or aim to be one) and you are implementing something beyond a “hello world” demo, you should absolutely use some kind of framework to do your job!

2. …but I don’t want to use a framework, I prefer to write my code from scratch

Writing code from scratch is a useless waste of time and effort! Why should you reinvent the wheel each time? Why not rely on a tool that already exists, effectively solves the common needs of application development, and is already developed, used and tested by a lot of people (with strong experience in the language involved and in several aspects of development, like performance, security, scalability, maintainability and so on)?
And if you choose to write stuff from scratch, unless you are writing a pile of shitty spaghetti code, in the end you will implement some kind of abstraction layer in order to reuse functionality across your app… so you will build your own framework. So, once again: why don’t you use something that already exists, is actively developed, tested and patched, and for which people are writing extensions, tutorials and plugins and constantly reporting issues and experiences?
I have to admit I was guilty of this myself in the past: I used to write everything I needed from scratch, only because I was able to, and it made me feel somehow “stronger” and “smarter”… but that’s not the truth. The truth is that using a framework consciously is evidence of being a mature programmer, one who understands that it is better to invest time in application-specific features than in writing low-level common utilities.
If you are an experienced programmer who loves writing low-level APIs, it’s more useful to contribute to these frameworks (they are usually open-source projects available on GitHub) or to write “plugins”/“extensions” for them.
Don’t reinvent the wheel: even if you can do it better, please avoid it. It’s a waste of time if you are alone (development is only part of the work; you also have to maintain it, document it, fix bugs and so on).

3. Frameworks are hard to learn, I prefer to use a bunch of small simple libraries

That’s true: frameworks require a considerable time investment to master (for example, I learned Python in 2 days on my own, but it took some months to get confident with the Django framework), but it’s definitely worth it! You will be far more productive once you get used to it: you will write less code, in less time, and without worrying about how to design your application skeleton, because the framework itself provides the path to follow. On the other hand, learning a library is quite simple, but that’s just because libraries are focused on a single, limited scope.
I don’t like to introduce many dependencies in my projects; I usually rely on third-party code only if it dramatically improves my productivity, and this is where a framework comes into play! It’s just one dependency (yes, a huge one), but I can trust that each component it provides works as expected, since they are tested by the team that developed the framework and by the people who use it. On the other hand, if I arbitrarily pick several libraries to compose my own software stack, I can’t be sure that each component will work smoothly as expected.
Let’s suppose we use libraries A, B, C, D… how can I be sure that C doesn’t have some kind of conflict or subtle incompatibility with A, B or D? In theory, if A, B, C and D are just simple libraries that each handle one specific task, there won’t be any issues, but in my experience theory and practice are two different things… such issues are a concrete problem, especially if we are using a dynamic scripting language like JavaScript (or Python, Ruby, PHP and so on), in which you can write atrocious hacks like redefining classes/functions at runtime, overriding variables and so on. The worst thing is that such issues are often very, very, very HARD to spot!

4. A framework is heavy and I think I will just use a little part of it… it’s overkill!

Some people talk about monolithic frameworks and consider projects like Django (on the server side) or AngularJS (on the client side) overkill, so they prefer to use multiple dedicated libraries. In my opinion, you will never use 100% of a framework’s capabilities, but if you are not taking advantage of at least 50-60% of its features, either your project is so tiny and simple that you really need just a few things or, if that’s not the case, you are writing by yourself functionality that the framework already provides and that you are not aware of (or deliberately decided not to use).
Adopting a framework is an important choice that must not be taken lightly; it must be carefully considered, planned and discussed.
As I have told colleagues on several occasions, adopting a framework is like a marriage: you know it won’t be perfect, just as there is no perfect woman/man in your life, but you found the one you like the most, the one that provides all or many of the features you need, even though of course it also has some flaws or traits you don’t like.
So the point is that you have to evaluate all these aspects, and once you choose a framework you have to “get along with it”: you have to share its philosophy and style and use it the way it expects to be used. Otherwise you’ll find yourself struggling with the framework to make it behave in ways it was never intended to, wasting time and effort you could have spared.

5. Ok, let’s use a framework or a set of libraries… but I want to be able to switch from one to another easily, so I’m gonna write some wrappers/adapters!

Switching from one framework to another in a big project will never be easy (that’s why you have to choose carefully)! Writing an abstraction layer of wrappers around the original API is, in my opinion, a waste of time! Don’t get me wrong, it’s not about laziness: I’m a pragmatic developer who doesn’t live in a magic world where ponies run on rainbows. I took Morpheus’s red pill, the one that brings you to the real world, where I’m aware of how difficult (or impossible) it is to maintain such a code base, which, at the moment you actually have to switch the underlying framework, will probably reveal its fragility and uselessness.
In my opinion this approach is not agile and not feasible at all… it just looks insane to me!

I think I’ve said all I have to say about the topic. I hope you enjoyed my post, and I’d like to read your point of view in the comments.

High level URL manipulation using native Python API

While developing my new project I needed to manipulate a URL in order to change its query string.
Basically, my goal was to give one parameter a default value if it wasn’t already defined, and to add another new one.
The modules I used are urlparse and urllib, and in a few lines of code I achieved my goal in a high-level fashion (I mean, without regexes or low-level “hacks”).
So let’s start… the first step is to parse the URL string using urlsplit:

from urlparse import urlsplit

url_data = urlsplit(url_string)

Supposing url_string is a string holding a valid URL, urlsplit will return a SplitResult object, which is a named tuple.
A named tuple is a subclass of tuple: it behaves like a regular tuple, but it can also be initialized with predefined keyword arguments, and its fields can later be accessed by name. For example, you can create a named tuple called “CreditCard” like this:

from collections import namedtuple

CreditCard = namedtuple('CreditCard',
                        'number, secure_code, expire, owner')

and use it in this way:

card = CreditCard(number=1234567890, secure_code=123,
                  expire='12/15', owner='Peter Parker')
print ("{}'s card number is: {} and "
       "has this secure code: {}".format(
           card.owner, card.number, card.secure_code))

One cool feature of named tuples is that you can update one of their fields without having to rebuild the object yourself, using the _replace method (it returns a new tuple with the updated value… remember that tuples are IMMUTABLE objects!).
So to change the owner of the previously defined credit card you will do:

card = card._replace(owner='Bruce Wayne')

(you might think the underscore prefix marks this helpful method as “protected”, but according to the Python docs it’s only there to avoid conflicts with field names)
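To double-check that _replace really leaves the original tuple untouched, here is a small Python 3 sketch reusing the CreditCard example (the secure code and expiry values are made up):

```python
from collections import namedtuple

CreditCard = namedtuple('CreditCard', 'number, secure_code, expire, owner')

card = CreditCard(number=1234567890, secure_code=123,
                  expire='12/15', owner='Peter Parker')
new_card = card._replace(owner='Bruce Wayne')

print(card.owner)      # Peter Parker  (the original tuple is untouched)
print(new_card.owner)  # Bruce Wayne
print(card._fields)    # ('number', 'secure_code', 'expire', 'owner')
```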

OK… you should get it now. Let’s go back to SplitResult: the tuple has the following properties (all of them strings):

  • scheme (http, https…)
  • netloc (the network location: host and, optionally, port)
  • path (/path/)
  • query (the query-string)
  • fragment (what comes after the “#” sign)

So, what I needed was to manipulate the query string, but once parsed out of the original URL it’s just a raw string; to avoid messing around with string manipulation, I used parse_qs, which returns a Python dictionary:

from urlparse import parse_qs 

qs_data = parse_qs(url_data.query)

A dictionary is very handy for manipulating query-string parameters, so now all I have to do is something like:

if 'target_parameter' not in qs_data:
    qs_data['target_parameter'] = ['tp1']
qs_data['extra_parameter'] = ['ex1']

You may be wondering why the values are assigned as lists instead of simple strings: this is because parse_qs returns a dictionary whose values are lists, since a parameter can be supplied with multiple values (e.g. “?cat=ACTION&cat=HORROR&cat=COMEDY”).
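The list-valued dictionary, and what urlencode does with it, can be checked in isolation. This sketch uses the Python 3 locations of the same functions (urllib.parse.parse_qs and urllib.parse.urlencode; in Python 2 they live in urlparse and urllib). It also shows doseq, which is urlencode’s name for the boolean second argument:

```python
from urllib.parse import parse_qs, urlencode

# Every value comes back as a list, even when a key appears only once.
qs_data = parse_qs('cat=ACTION&cat=HORROR&cat=COMEDY')
print(qs_data)  # {'cat': ['ACTION', 'HORROR', 'COMEDY']}

# doseq=True expands each list into repeated key=value pairs...
print(urlencode(qs_data, doseq=True))  # cat=ACTION&cat=HORROR&cat=COMEDY

# ...while the default (doseq=False) percent-encodes the whole list as one string.
print(urlencode(qs_data))
```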

Now that the query-string data has been updated, all I have to do is serialize it back to a simple string and update the original SplitResult.

from urllib import urlencode

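url_data = url_data._replace(query=urlencode(qs_data, True))

The second argument passed to urlencode (True, the doseq flag) tells the function that we are passing sequences as values, so it will handle them accordingly. Since _replace returns a new tuple, the result is assigned back to url_data.
The new modified URL can now be retrieved by calling:

url_data.geturl()
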
To sum up, this is the full code:

from urllib import urlencode
from urlparse import urlsplit, parse_qs

# example input (any valid URL works)
url_string = 'http://www.example.com/path/?cat=ACTION'

# parse original string url
url_data = urlsplit(url_string)

# parse original query-string
qs_data = parse_qs(url_data.query)

# manipulate the query-string
if 'target_parameter' not in qs_data:
    qs_data['target_parameter'] = ['tp1']
qs_data['extra_parameter'] = ['ex1']

# get the url with modified query-string
new_url = url_data._replace(query=urlencode(qs_data, True)).geturl()
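For Python 3, where urlparse was merged into urllib.parse, the same recipe could be wrapped in a small function. This is a sketch, not the code from my project: the function name set_query_params is hypothetical, the parameter names and values are the ones used above, and dict.setdefault replaces the explicit membership test:

```python
from urllib.parse import urlsplit, parse_qs, urlencode


def set_query_params(url_string):
    """Hypothetical helper: default target_parameter and add extra_parameter."""
    url_data = urlsplit(url_string)
    qs_data = parse_qs(url_data.query)
    # Only sets the default when the parameter is missing.
    qs_data.setdefault('target_parameter', ['tp1'])
    qs_data['extra_parameter'] = ['ex1']
    # _replace returns a new SplitResult; geturl() reassembles the URL string.
    return url_data._replace(query=urlencode(qs_data, doseq=True)).geturl()


print(set_query_params('http://www.example.com/path/?cat=ACTION&cat=HORROR'))
# http://www.example.com/path/?cat=ACTION&cat=HORROR&target_parameter=tp1&extra_parameter=ex1
```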

That’s all folks!

Python: reading numbers from JSON without loss of precision using Decimal class for data processing

In the project I’m working on, I’m using an external API that returns a JSON response containing currency conversion rates. Since I’m dealing with currencies and prices, numeric precision plays an important role in the application’s calculations. The good thing about JSON is that, despite its name being the acronym of “JavaScript Object Notation”, it’s a cross-language format, so it’s not limited to the capabilities of a specific language like JavaScript: numbers in JSON may have a higher precision than a JS float!
This is a quote from Wikipedia about JSON numbers:

Number — a signed decimal number that may contain a fractional part and may use exponential E notation. JSON does not allow non-numbers like NaN, nor does it make any distinction between integer and floating-point. (Even though JavaScript uses a double-precision floating-point format for all its numeric values, other languages implementing JSON may encode numbers differently)

By default, Python’s json module loads decimal numbers as floats, so if we have a JSON document like:

{ "number": 1.00000000000000000001 }

the default conversion into Python will be {u'number': 1.0} if we just write the following code:

import json

json.loads(json_string)

Fortunately, it’s dead simple to load JSON numbers using the decimal module. There is no need to write custom decoders, as I saw suggested on the web; it’s just a matter of passing the Decimal class to loads() as the parser for floats, like this:

import json
from decimal import Decimal

json.loads(json_string, parse_float=Decimal)

This way, the loaded Python object will be:
{u'number': Decimal('1.00000000000000000001')}
And we will be able to perform precise arithmetic computations!
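Here is a small Python 3 sketch that puts the two loading strategies side by side, using the same sample number as above (the json_string variable is just illustrative):

```python
import json
from decimal import Decimal

json_string = '{"number": 1.00000000000000000001}'

as_float = json.loads(json_string)
as_decimal = json.loads(json_string, parse_float=Decimal)

print(as_float['number'])    # 1.0  (the extra digits are lost)
print(as_decimal['number'])  # 1.00000000000000000001

# Decimal arithmetic keeps the digits; the same sum in floats collapses to 1.0.
total = as_decimal['number'] + Decimal('0.00000000000000000001')
print(total)  # 1.00000000000000000002
```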
You can also use Decimal for integer numbers, by specifying parse_int:

json.loads(json_string, parse_float=Decimal, parse_int=Decimal)


Abstract classes in Python using abc module

Python is a powerful language, but it lacks some OOP features that are fundamental in other programming languages like Java.
One of these is abstract classes (classes you can’t instantiate, only extend, in order to inherit common base methods and to be forced to implement abstract methods that represent a common interface).
A common practice in Python projects is to “mimic” abstract classes/methods by creating a base class and defining a series of methods that raise NotImplementedError. In fact, the official Python documentation says:

In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method.

But… to my great pleasure, I discovered that, starting from Python 2.6, a module called abc allows you to create “real” abstract classes, methods and properties!!

So, let’s see how to implement a better, more effective abstract class in Python and forget the old NotImplementedError:

from abc import ABCMeta, abstractmethod

class AbstractAnimal(object):
    __metaclass__ = ABCMeta

    @abstractmethod
    def run(self):
        pass

Now… if you (stupid idiot) try to instantiate an AbstractAnimal, the Python interpreter will complain, saying:

TypeError: Can't instantiate abstract class AbstractAnimal with abstract methods run

Now that you get it… let’s extend the abstract class with a concrete one:

class Dog(AbstractAnimal):
    pass  # run() is not implemented yet

But if you don’t trust me, or simply forget to implement the abstract method (which MUST be implemented, since it’s marked as abstract)… once again the interpreter will complain with:

TypeError: Can't instantiate abstract class Dog with abstract methods run

(which, to be honest, is a dumb message, since Dog isn’t meant to be abstract; it simply doesn’t implement the required methods… but it’s still an “understandable” exception)

Oh… if you use a cool IDE like PyCharm, it will mark the class as invalid by showing a message in the tooltip: “Dog must implement all abstract methods”!

Finally, once the method is implemented:

class Dog(AbstractAnimal):
    def run(self):
        print 'running like a dog...'

It’s also possible to define abstract properties using the related decorator, @abstractproperty.

So… to recap: say goodbye to NotImplementedError, use the abc module to assign ABCMeta to the __metaclass__ attribute of the abstract class you want to implement, and use @abstractmethod and @abstractproperty to declare abstract methods and properties.
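On Python 3 the same idea looks slightly different: the metaclass is passed with class syntax (or inherited from the abc.ABC helper class), and @abstractproperty has been deprecated since Python 3.3 in favour of stacking @property on top of @abstractmethod. A minimal sketch, where the sound property and describe() method are hypothetical additions to the Animal example:

```python
from abc import ABC, abstractmethod


class AbstractAnimal(ABC):
    @property
    @abstractmethod
    def sound(self):
        """Concrete animals must provide a sound."""

    @abstractmethod
    def run(self):
        """Concrete animals must implement run()."""

    def describe(self):
        # Shared, concrete behaviour inherited by every subclass.
        return '{} says {}'.format(type(self).__name__, self.sound)


class Dog(AbstractAnimal):
    @property
    def sound(self):
        return 'woof'

    def run(self):
        return 'running like a dog...'


try:
    AbstractAnimal()
except TypeError as exc:
    print(exc)  # exact wording varies between Python versions

dog = Dog()
print(dog.describe())  # Dog says woof
print(dog.run())       # running like a dog...
```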

Read more about the abc module in the official Python documentation.