Understanding mutability and immutability in Scala

I have been working with Scala for four months now, and although I’m far from being a Scala expert (I find the language considerably harder to master than Java or C#), I have a clear picture of how mutability and immutability work, both in Scala and in general (this is actually a language agnostic concept). Unfortunately it seems to me that several developers are struggling to get it right, so in this post I’ll try my best to make it simple and clear for everyone.
There are two levels of mutability/immutability:

1. Variable level
2. Object level

A variable is mutable if it is defined with the var keyword and immutable if it is defined with the val keyword.
An immutable variable can’t be reassigned to something else once it has been defined, whereas a mutable variable can be assigned another object after its definition.
For example the following code won’t compile:

object Foo extends App {
    val foo: String = "ciao"
    foo = "hello"
}

The compiler will reject it with this error:

error: reassignment to val

This happens because foo is immutable (it’s like a constant, it can’t change).
But if we replace val with var, all will be fine:

object Foo extends App {
    var foo: String = "ciao"
    foo = "hello"
}

This works because foo is now mutable, so we can assign another string to it (one or more times).
BUT as far as object mutability is concerned, we are not mutating the string object: a string is an immutable object whether it is referenced by a val or by a var! Reassigning foo does not change the string object itself (this is not possible). What happens is that foo is made to refer to a different string object (“hello”), and the previous string object “ciao” becomes eligible for garbage collection once nothing else references it.
We can see that foo refers to different strings by printing their hash codes (bear in mind that String.hashCode is computed from the string content, so it tells strings with different content apart rather than proving object identity; reference identity can be compared with eq):

object Foo extends App {
    var foo: String = "ciao"
    println(foo.hashCode)
    println(foo.hashCode)
    
    foo = "hello"
    println(foo.hashCode)
    println(foo.hashCode)
    
    foo = "hi"
    println(foo.hashCode)
    println(foo.hashCode)
}

The code above prints the same value twice for “ciao”, the same value twice for “hello” and the same value twice for “hi”: three string objects are involved, each with its own hash code.
Mutability and immutability are the reason behind the different collection implementations.
One of the most common collections is List, for example, and it’s an immutable one.
It’s possible to concatenate several lists into one, but in my opinion doing so repeatedly is not good practice, because it creates several unnecessary temporary objects in memory.
For example:

object Foo extends App {
    var myList: List[Int] = List(1, 2, 3)
    myList = myList ::: List(4, 5, 6)
    myList = myList ::: List(6, 7, 8)
    myList = myList ::: List(9, 10, 11)
}

This can be improved by using a collection designed for mutability, such as ListBuffer:

import scala.collection.mutable.ListBuffer

object Foo extends App {
    val buffer: ListBuffer[Int] = ListBuffer(1, 2, 3)
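    // ++= appends all the elements in place (the varargs form of append is deprecated since Scala 2.13)
    buffer ++= List(4, 5, 6)
    buffer ++= List(6, 7, 8)
    buffer ++= List(9, 10, 11)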
}

As you can see, I defined buffer with val: I do want the buffer to grow (which it can by design, since it’s a mutable collection), but I don’t want it to be replaced by another buffer through a new assignment. Do you get it?
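
Since the concept is language agnostic, here is a minimal sketch of the same distinction in Python. Python has no val keyword, so only rebinding versus in-place mutation is shown; the function and variable names are made up for the illustration.

```python
def rebinding_vs_mutation():
    # Variable level: rebinding a name does not touch the object it referred to.
    foo = "ciao"
    original = foo            # second reference to the same string object
    foo = "hello"             # foo now refers to a different object
    string_untouched = original == "ciao" and original is not foo

    # Immutable style: concatenation builds a new list and leaves the old one alone.
    numbers = [1, 2, 3]
    alias = numbers
    numbers = numbers + [4, 5, 6]
    new_list_created = numbers is not alias and alias == [1, 2, 3]

    # Mutable style: extend changes the existing list in place.
    buffer = [1, 2, 3]
    same_buffer = buffer
    buffer.extend([4, 5, 6])
    mutated_in_place = buffer is same_buffer and same_buffer == [1, 2, 3, 4, 5, 6]

    return string_untouched, new_list_created, mutated_in_place
```

Here the is operator compares object identity, which is the question the hash code check above was trying to answer.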

Configuring Docker in order to run properly behind a company proxy

Quick post to remember how to set up Docker to run behind a proxy:

1. Create the Docker service configuration file:

sudo mkdir -p /etc/systemd/system/docker.service.d
sudo touch /etc/systemd/system/docker.service.d/http-proxy.conf
sudo nano /etc/systemd/system/docker.service.d/http-proxy.conf

Add the following:

[Service]
Environment="HTTP_PROXY=http://xxx:yyy"
Environment="HTTPS_PROXY=https://xxx:yyy"
Environment="NO_PROXY=localhost,127.0.0.1"

where xxx is the host and yyy is the port, of course!

2. Check and apply configuration:

sudo systemctl daemon-reload

Check proper configuration:

sudo systemctl show --property Environment docker

Restart docker if the command output is correct:

sudo systemctl restart docker
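
If you prefer to check the output of the show command from a script, something like the following might do. It is only a sketch: it assumes the output is a single line of the form Environment=NAME=value NAME=value, and parse_environment_property is a hypothetical helper name, not part of Docker or systemd.

```python
import shlex
from typing import Dict


def parse_environment_property(output: str) -> Dict[str, str]:
    """Turn 'Environment=A=1 B=2' (assumed output format) into a dict."""
    prefix = 'Environment='
    line = output.strip()
    if not line.startswith(prefix):
        raise ValueError('unexpected output: ' + line)
    settings: Dict[str, str] = {}
    # shlex.split handles values that systemd prints inside quotes
    for item in shlex.split(line[len(prefix):]):
        name, _, value = item.partition('=')
        settings[name] = value
    return settings
```

The output string could be captured with subprocess.run(['systemctl', 'show', '--property', 'Environment', 'docker'], stdout=subprocess.PIPE) and decoded before being passed to the helper.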

You might wish to customize your bashrc too (~/.bashrc), by adding:

export http_proxy="http://xxx:yyy"
export https_proxy="https://xxx:yyy"

[Python recipe] Run a Flask web application locally in https

Problem:

You want to run a Flask application on your development machine using https.

Solution:

Provided that you have openssl installed (apt-get install openssl -y on Ubuntu), we can create a self-signed SSL certificate:

cd /your_project_dir

mkdir ssl && cd ssl

openssl genrsa -des3 -passout pass:x -out server.pass.key 2048

openssl rsa -passin pass:x -in server.pass.key -out server.key

rm server.pass.key

openssl req -new -key server.key -out server.csr

openssl x509 -req -sha256 -days 365 -in server.csr -signkey server.key -out server.crt

Once the certificate has been generated, we can configure the Flask app to serve https with it (in the Python code I’m assuming that the application code lives under the “src” folder).
The ssl_context option of app.run accepts a tuple containing the paths of the .crt and .key files.
In the main application module:

import os

# The ssl folder is a sibling of src, so swap the folder name in the module path
ssl_dir: str = os.path.dirname(__file__).replace('src', 'ssl')
key_path: str = os.path.join(ssl_dir, 'server.key')
crt_path: str = os.path.join(ssl_dir, 'server.crt')
ssl_context: tuple = (crt_path, key_path)
app.run('0.0.0.0', 8000, debug=False, ssl_context=ssl_context)

Now you should be able to reach your app from any device in your local network with https://your-machine-ip:8000
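
One caveat about the path handling above: str.replace rewrites every occurrence of “src” in the path, not only the folder name. A possible alternative, sketched here with pathlib, assumes that the main module sits directly inside the “src” folder and that “ssl” is its sibling; build_ssl_context is a hypothetical helper name.

```python
from pathlib import Path
from typing import Tuple


def build_ssl_context(module_file: str) -> Tuple[str, str]:
    """Return (crt_path, key_path) for a module located in <project>/src."""
    # parent is the src folder, parent.parent is the project folder
    project_dir = Path(module_file).resolve().parent.parent
    ssl_dir = project_dir / 'ssl'
    return str(ssl_dir / 'server.crt'), str(ssl_dir / 'server.key')
```

In the main module it would be used as app.run('0.0.0.0', 8000, debug=False, ssl_context=build_ssl_context(__file__)).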

Notes:

This recipe has been tested with Flask 0.12.1 on Ubuntu; older versions may behave differently.

The definitive guide to solve the infamous Python exception “ModuleNotFoundError”

Among common Python exceptions, the most infamous and time-consuming to solve is no doubt “ModuleNotFoundError”, but it is actually pretty simple to fix once you understand a couple of concepts.
Fundamentally it can be raised for three reasons:

1. A typo or a wrong path specified in the import statement

This is the easiest to spot, and if you are using an IDE like PyCharm you will notice it immediately, before running your code.

In order to reproduce the exception, let’s consider a project structure like:

/proj
    /foo
        __init__.py
        bar.py
    main.py

A main.py containing:

from fo.bar import BarClass
c = BarClass()

and bar.py containing:

class BarClass:
    pass

By using /proj as a current working directory and by running:

python main.py

We will obtain the following exception:

Traceback (most recent call last):
  File "/Users/dave/PycharmProjects/proj/main.py", line 1, in <module>
    from fo.bar import BarClass
ModuleNotFoundError: No module named 'fo'

To solve the problem, we simply have to change the import so that it matches the right path (“foo.bar” instead of “fo.bar”):

from foo.bar import BarClass
c = BarClass()
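
If no IDE is at hand, a quick way to check whether a name resolves is importlib.util.find_spec from the standard library. The helper below is only a sketch and is_importable is a name I made up for it.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Report whether the import system can locate module_name."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package cannot be found
        return False
```

Run from /proj, is_importable('foo.bar') should be true while is_importable('fo.bar') should be false. Keep in mind that for dotted names find_spec imports the parent package, so its __init__.py gets executed.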

So far, so easy… but let’s go on with scenario N.2

2. An execution context that requires an additional entry in sys.path which has not been provided

This one occurs when we run a Python script from a directory from which the interpreter cannot resolve the module named in an import statement, because the required entry is missing from sys.path or sys.path is misconfigured.
Here you first have to understand how Python’s module lookup works, so I quote the official documentation:

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:

  1. The directory containing the input script (or the current directory when no file is specified).
  2. PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
  3. The installation-dependent default.

Let’s keep the structure of the scenario N.1, but with main.py containing:

class BaseClass:
    pass

and bar.py containing:

from main import BaseClass
c = BaseClass()

but now let’s change the working directory to “foo”, and launch the command:

python bar.py 

We will obtain the following exception:

Traceback (most recent call last):
  File "bar.py", line 1, in <module>
    from main import BaseClass
ModuleNotFoundError: No module named 'main'

This happens because bar.py lives in the “foo” directory and we didn’t update sys.path, so Python looks for a main.py file in that directory, and obviously there isn’t one!
We can fix this issue in two ways: by using the PYTHONPATH environment variable or by extending the sys.path list.
To set PYTHONPATH for a single run, we can launch the script with the following command:

PYTHONPATH=../ python bar.py

In this way, we are practically saying “hey python, please consider also the parent directory for the module lookup”.
The same can be specified programmatically in this way:

import sys
sys.path.append('../')

Of course the code above must run before the other import statements. Anyway, my advice is to avoid this approach and to rely only on the PYTHONPATH environment variable.
Use sys.path instead to debug your current path resolution, in this way:

import sys

for p in sys.path:
    print(p)
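
If you do end up extending sys.path in code, note that a relative entry such as '../' is resolved against the current working directory, so it only works when the script is launched from “foo”. A sketch that anchors the entry to the script location instead (add_parent_dir_to_sys_path is a hypothetical helper name):

```python
import sys
from pathlib import Path


def add_parent_dir_to_sys_path(script_path: str) -> str:
    """Append the parent of the script's directory to sys.path, once."""
    parent_dir = str(Path(script_path).resolve().parent.parent)
    if parent_dir not in sys.path:
        sys.path.append(parent_dir)
    return parent_dir
```

In bar.py it would be called as add_parent_dir_to_sys_path(__file__) before the line from main import BaseClass.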

3. Circular dependency

This one is the most hateful you can face. It happens when a module A requires something from a module B and, in turn, module B requires something from module A, thus generating a “deadly” circular reference.
In most cases it happens after an automatic refactoring with PyCharm (typically if you use the logging framework in the classical way)*; if it happens for other reasons, it’s a sign that your software design is not sound and that you must review it carefully.

* by classical usage of the logging framework I mean:

import logging

log = logging.getLogger(__name__)

class MyClass:
    def my_method(self):
        log.info('My method invoked')

then after moving MyClass to another module (via automatic refactoring), PyCharm tends to include an import of log (which 1. is not required since each module has its logger, 2. may cause the circular dependency).
To manually reproduce the exception, let’s consider a super simple structure like the following:

    /proj
        a.py
        b.py

With a.py containing:

from b import ClassB

class ClassA:
    def __init__(self):
        self.b = ClassB()

and b.py containing:

from a import ClassA

class ClassB:
    pass

a = ClassA()

By running python a.py in the project root, we will get the following exception:

Traceback (most recent call last):
  File "/Users/dave/PycharmProjects/proj/a.py", line 1, in <module>
    from b import ClassB
  File "/Users/dave/PycharmProjects/proj/b.py", line 1, in <module>
    from a import ClassA
  File "/Users/dave/PycharmProjects/proj/a.py", line 1, in <module>
    from b import ClassB
ImportError: cannot import name 'ClassB'

If we pay attention, we can quite easily spot that this time we are facing a circular reference issue: the stack trace is longer than the previous ones and shows a “ping-pong” between a.py and b.py. Note that in this case Python raises ImportError, the parent class of ModuleNotFoundError.
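
Reviewing the design is the real fix, but when that is not immediately possible a common workaround is to defer one of the imports until it is actually needed. The sketch below writes two small modules to a temporary directory and imports them; demo_a and demo_b are hypothetical names standing in for a.py and b.py, chosen so they don’t clash with real modules.

```python
import importlib
import sys
import tempfile
from pathlib import Path

DEMO_A = '''
class ClassA:
    def __init__(self):
        # Deferred import: runs when ClassA() is called, by which
        # point ClassB has already been defined in demo_b.
        from demo_b import ClassB
        self.b = ClassB()
'''

DEMO_B = '''
from demo_a import ClassA


class ClassB:
    pass


a = ClassA()
'''


def import_without_cycle():
    """Write both modules to a temporary directory and import demo_b."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, 'demo_a.py').write_text(DEMO_A)
    Path(tmp_dir, 'demo_b.py').write_text(DEMO_B)
    sys.path.insert(0, tmp_dir)
    try:
        return importlib.import_module('demo_b')
    finally:
        sys.path.remove(tmp_dir)
```

Because demo_a no longer imports demo_b at module level, loading either module first no longer triggers the ping-pong.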

Writing better software with Python 3.6 type hints

One of the recent features of Python 3 that I like the most is definitely the support for type annotations.
Type annotations are a precious tool (especially in combination with an advanced IDE like PyCharm) that allows us to write clear and implicitly documented code, keeps us from invoking methods with the wrong data types (OK, actually we can do whatever we want at runtime, since Python is a dynamic language and a type hint is, as the name suggests, just that: a hint) and gives us useful code suggestions and autocompletion.
Starting with Python 3.6, it is possible to specify not only argument types in method signatures but also types for variables. Let’s see it in action with some sample code:

from datetime import datetime, timedelta
from enum import Enum
from typing import List


class Sex(Enum):
    M = 'M'
    F = 'F'


class Person:
    def __init__(self, 
                 first_name: str, 
                 last_name: str, 
                 birth_date: datetime, 
                 sex: Sex):
        self._first_name: str = first_name
        self._last_name: str = last_name
        self._birth_date: datetime = birth_date
        self._sex: Sex = sex
        self._hobbies: List[str] = []

    def get_age(self) -> int:
        diff: timedelta = datetime.now() - self._birth_date
        return int(diff.days / 365)

    @property
    def hobbies(self) -> List[str]:
        return self._hobbies

    @hobbies.setter
    def hobbies(self, hobby_list: List[str]):
        self._hobbies = hobby_list

So, basically, we have created a Person class and a Sex enum, and by using type hints we have declared that “first_name” and “last_name” must be of type str, “birth_date” of type datetime and “sex” of the custom enum type Sex.
We have also specified int as the return type of the get_age() method, and inside its implementation we have annotated the date difference as a timedelta object.
Finally, we have imported List from the “typing” module in order to declare “hobbies” as a list of strings (if we don’t care about the list content, we can just use the built-in list type and skip the import).
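
As mentioned, hints are not enforced by the interpreter. A tiny sketch makes it visible (join_hobbies is a made-up function, not part of the Person class):

```python
from typing import List


def join_hobbies(hobbies: List[str]) -> str:
    return ', '.join(hobbies)


# A tuple is not a List[str], yet the call runs without complaint at runtime
result = join_hobbies(('chess', 'running'))

# The hints are simply stored as metadata on the function object
annotations = join_hobbies.__annotations__
```

It is up to tools like the IDE to read those annotations and warn us.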

In PyCharm, if we try to pass an argument of an invalid type, the IDE complains as expected.

Unfortunately, PyCharm does not complain if we set “hobbies” to a value of the wrong type via simple assignment.

But in my opinion, using type hints as shown in the example code has the huge value of keeping the code documented, especially if you work in a team or want to write an open source project.

One limitation of type hints that I found is that you can’t create “circular references”: a method in a class can’t name that same class directly in the type of one of its arguments.

Update:

As suggested in the comments, this can be “bypassed” by using strings in place of types (a so-called forward reference).
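
A minimal sketch of the string-based workaround follows; the friends attribute and the add_friend method are invented for the example and are not part of the Person class shown earlier.

```python
from typing import List


class Person:
    def __init__(self, first_name: str) -> None:
        self._first_name: str = first_name
        self._friends: List['Person'] = []

    # 'Person' is written as a string because the class name is not yet
    # bound while the class body is being executed.
    def add_friend(self, friend: 'Person') -> None:
        self._friends.append(friend)

    @property
    def friends(self) -> List['Person']:
        return self._friends
```

Type checkers and IDEs resolve the string to the class, while on Python versions that evaluate annotations eagerly the interpreter just stores it as text.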