Article Thumbnail

Introduction To Python's Functools Module

Introducing the functools functions using real world examples

Florian Dahlitz
17 min
May 4, 2020

What Is in the Module?

The functools module is part of Python's standard library and was implemented for higher-order functions. A higher-order function is a function that acts on or returns another function or multiple functions. In general, any callable object can be treated as a function for the purposes of this module.

functools provides 11 functions:

  • cached_property()
  • cmp_to_key()
  • lru_cache()
  • partial()
  • partialmethod()
  • reduce()
  • singledispatch()
  • singledispatchmethod()
  • total_ordering()
  • update_wrapper()
  • wraps()

Throughout the article, we will have a closer look at each function and a few examples where they are useful. You can find the code snippets used in the article on GitHub. Enjoy!

Note: The article is based on Python 3.8.2 (CPython). Some functions may not be present in earlier versions of CPython.

Functools' Functions

@Cached_property - Caching Instance Methods

Imagine, you have a large dataset and in order to analyse it, you implement a class holding the whole dataset. Furthermore, you implement functions to calculate information like the standard deviation of the dataset at hand. The problem: Each time you call the method, the standard deviation it calculated again - and this takes time! This is where @cached_property comes into play.

Its purpose is to transform a method of a class into a property whose value is computed once and then cached as a normal attribute for the lifetime of the instance. The behaviour is pretty similar to the built-in @property decorator [2], with the addition of caching. Let's have a look at the example from the Python documentation:

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)

In the scenario at hand, we have a (large) sequence of numbers stored in a DataSet instance. Additionally, two methods are defined, which compute standard deviation and variance, respectively. We apply the @cached_property decorator [3] to both functions to turn them into cached properties. This means that indeed, the value is only calculated once and then cached.

Note: Each instance of DataSet needs to have a __dict__ attribute with an immutable mapping. This is required by the decorator to be able to work correctly.

Cmp_to_key() - A Transition Function

Before we can move on, we first need to understand the difference between a comparison function and a key function.

A comparison function is any callable which takes two arguments, compares them and returns a number based on the order of the supplied arguments. A negative number means that the first argument is less than the second one, zero means they are equal and a positive number indicates that the first one is greater than the second one. A naive implementation in Python might look like this:

def compare(a, b):
    if a < b:
        return -1
    elif a == b:
        return 0
    else:
        return 1

In contrast, a key function is a callable that accepts one argument and returns another value to be used as the sort key. A prominent representative of this group is the operator.itemgetter() key function [4], which you might know from your day to day coding. Key functions are often supplied to built-in functions like sorted(), min(), and max().

In essence, cmp_to_key() turns a comparison function into a key function. The cmp_to_key() function was implemented to support the transition from Python 2 to 3, because in Python 2 there existed a function called cmp() (as well as a dunder method __cmp__()) for comparisons and ordering.

@Lru_cache() - Increasing Code Performance Through Caching

@lru_cache() is a decorator, which wraps a function with a memoizing callable that saves up to the maxsize most recent calls (default: 128).

Note: Simply put, memoization means saving the result of a function call and return it if the function is called with the same arguments again. For more information, checkout Dan Bader's article about memoization in Python [5].

This is especially useful if you have expensive or I/O bound functions that are periodically called with the same arguments. LRU Cache stands for Least-Recently-Used Cache and refers to a cache which drops the least recently used element if the maximum size of entries is reached. The LRU feature is disabled if maxsize is set to None.

Let's have a look at two examples. In the first example, we define a function get_pep(), which accepts a PEP number (Python Enhancement Proposal) and returns the content of the PEP if it exists.

# lru_cache_static_website_content.py

from functools import lru_cache
from urllib import error
from urllib import request


@lru_cache(maxsize=32)
def get_pep(number: int) -> str:
    resource = f"http://www.python.org/dev/peps/pep-{number:04d}/"
    print(resource)
    try:
        with request.urlopen(resource) as s:
            return s.read()
    except error.HTTPError:
        return "Not Found"


list_of_peps = [8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991]

for n in list_of_peps:
    pep = get_pep(n)
    print(n, len(pep))

print(get_pep.cache_info())

As you can see, we add the @lru_cache() decorator to the function and set the maximum size of the cache to 32. We call get_pep() in a for-loop with a number of PEPs. If you have a close look at the list_of_peps, you can see that there are two numbers occurring twice or even three times in the list: 8 and 320.

Once you execute the script, you will recognize that fetching PEPs, which were already requested from python.org, are immediately there without printing the URL. This is due to the fact that we do not call the function and fetch it again from the website but take it from our cache.

$ python lru_cache_static_website_content.py
http://www.python.org/dev/peps/pep-0008/
8 106914
http://www.python.org/dev/peps/pep-0290/
290 59806
http://www.python.org/dev/peps/pep-0308/
308 57012
http://www.python.org/dev/peps/pep-0320/
320 49591
8 106914
http://www.python.org/dev/peps/pep-0218/
218 46835
320 49591
http://www.python.org/dev/peps/pep-0279/
279 48593
http://www.python.org/dev/peps/pep-0289/
289 50922
320 49591
http://www.python.org/dev/peps/pep-9991/
9991 9
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)

At the end of the script, we print the cache info for get_pep(). This reveals that we had three hits meaning three times Python used a cached value instead of calling the function again (one time with the number 8 and two times with 320). The other eight calls were misses, so the function was called and the result added to the cache. Consequently, the final cache consists of eight entries.

In the second example, we have a recursive implementation of the Fibonacci sequence that we want to speed up.

# lru_cache_fibonacci.py

from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


list_of_fib_numbers = [fib(n) for n in range(16)]
print(list_of_fib_numbers)
print(fib.cache_info())

In the example at hand, we compute a Fibonacci sequence of length 16 and print the resulting sequence as well as the cache info for the fib() function.

$ python lru_cache_fibonacci.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)

You may be surprised by the number of hits and misses of the cache. However, consider the following: First, we compute the result for n=0. Since we have no entry in our cache, we need to compute the result, which increases the number of misses by one resulting in hits=0 and misses=1. This scenario happens again when calling fib() with n=1. Next, fib() is called with n=2. The result is recursively computed by calculating the results for n=1 and n=0 and summing them up. We already computed both results, so we can take them from the cache. Consequently, we only have one new miss as we have no entry for n=2 yet. This continues until all 16 n's were passed to fib() resulting in only 16 misses.

Are you curious to know how much time we saved using @lru_cache() in this example? We can test it using Python's timeit.timeit() function, which shows us something incredible:

Without @lru_cache: 2.7453888780000852 seconds
With @lru_cache: 2.127898915205151e-05 seconds

With @lru_cache() the fib() function is around 100.000 times faster - wow! It is definitely a decorator you want to remember.

@Total_ordering - Decreasing Lines of Code by Utilizing a Decorator

Programming in Python often involves writing your own classes. In some cases, you want to be able to compare different instances of that class. Depending on how you want to compare them, you might end up by implementing functions like __lt__(), __le__(), __gt__(), __ge__() or __eq__() to be able to use the corresponding <, <=, >, >=, and == operators. On the other hand, you could utilize the @total_ordering decorator. This way, you only need to implement one or more rich comparison ordering methods and the decorator supplies the rest. Additionally, it is recommended to define the __eq__() method, too.

Let's suppose you have a class Pythonista and you want to be able to order them lexicographically. To be able to do that, you need to implement the rich comparison ordering methods. But instead of implementing all of them, we only implement the __lt__() method as well as the __eq__() method. By using the @total_ordering decorator, the other ones are automatically defined.

# total_ordering.py

from functools import total_ordering


@total_ordering
class Pythonista:
    firstname: str
    lastname: str

    def __init__(self, firstname: str, lastname: str) -> None:
        self.firstname = firstname
        self.lastname = lastname

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Pythonista):
            return NotImplemented
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))

    def __lt__(self, other: object):
        if not isinstance(other, Pythonista):
            return NotImplemented
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))


guido = Pythonista("Guido", "van Rossum")
brett = Pythonista("Brett", "Cannon")
print(guido > brett)

Executing the script will print True, because c comes before v. Notice that we were able to use the > operator even though we did not explicitly implemented __ge__().

The @total_ordering decorator is a nice way to reduce the lines of code and the places where you need to adjust your code if you want the instances to be compared based on different attributes. However, using the @total_ordering decorator adds overhead leading to slower execution. Furthermore, the stack traces for the derived comparison methods are more complex. So if you need very performant code, you should not use the decorator and implement the methods you need on your own instead.

Partial() - Simplifying Signatures

With partial() you can create partial objects. These objects behave like the function passed to partial() called with the (keyword) arguments supplied to partial(). Consequently, the newly created (partial) objects have a simplified signature compared to the original function.

Here is an example:

# partial_basetwo.py

from functools import partial

basetwo = partial(int, base=2)
basetwo.__doc__ = "Convert base 2 string to an int."

print(basetwo("10010"))

We create a partial object based on the built-in int() function. In this case, we supply base=2 as keyword argument. Consequently, the newly created basetwo object behaves as if we call int() with base=2. However, we can still override this behaviour by supplying a base argument to basetwo. So executing basetwo("10010", base=10) computes the same result as int("10010").

Let's take another example.

# partial_euclidian_distance.py

import math

from functools import partial
from typing import Tuple


def euclidean_distance(point1: Tuple[int, int], point2: Tuple[int, int]) -> float:
    x1, y1 = point1
    x2, y2 = point2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


zero_euclid = partial(euclidean_distance, (0, 0))
point = (1, 1)
print(zero_euclid(point))

The function at hand computes the euclidean distance between two points in a two dimensional space. We can create a partial object, which takes exactly one argument (a point) and calculates the euclidean distance between the supplied point and the point (0, 0).

Partialmethod() - The Partial() for Methods

partialmethod() is a function, which returns partialmethod descriptors. You can think of it as the partial() function for methods. This means, it is not intended to be callable, but as a way to define new methods. I really like the example from the Python documentation [6], so let's have a look at it.

# partialmethod.py

from functools import partialmethod


class Cell(object):
    def __init__(self):
        self._alive = False

    @property
    def alive(self):
        return self._alive

    def set_state(self, state):
        self._alive = bool(state)

    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)


c = Cell()
c.set_alive()
print(c.alive)

We define a class Cell representing a single cell. It has a property alive and an instance method set_state() setting alive to either True or False. Furthermore, we create two partialmethod descriptors set_alive() and set_dead(), which call set_state() with True or False, respectively. This allows us to create a new instance of the class Cell, call set_alive() to change the cell's state to True and print the property's value.

Reduce() - Calculating a Single Value Based on Many

Let's suppose you have an iterable of numbers and want to reduce it to a single value. In this case, the resulting value is the sum over all elements of the supplied iterable. One way to achieve this is to utilize reduce().

# reduce.py

from functools import reduce
from operator import add

iterable = [1, 2, 3, 4, 5]
result = reduce(add, iterable)
print(result)

As you can see, we defined a list containing the numbers 1 to 5. We calculate the sum over all elements of the list by calling the reduce() function with the operator.add() function as first argument and the the list as second argument. Of course you could use the built-in sum() function, but what if you want to compute the product over all elements instead? The only thing you need to change is to replace the operator.add() function with operator.mul() - done!

@Singledispatch - Function Overloading

Per definition, the @singledispatch decorator transforms a function into a single-dispatch generic function. In the case of @singledispatch the dispatch happens on the type of the first argument.

Note: A generic function is a function composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm [7].

Note: A single dispatch is a form of generic function dispatch where the implementation is chosen based on the type of a single argument [8].

Simply spoken, @singledispatch allows you to overload functions in Python. Let's take an example to illustrate it.

# singledispatch.py

from functools import singledispatch


@singledispatch
def mul(a, b):
    if type(a) is not type(b):
        return NotImplemented
    return a * b


@mul.register
def _(a: str, b: str):
    return a + b


print(mul(1, 2))
print(mul(1.5, 2.5))
print(mul("1", "2"))
print(mul(1, 1.0))

In the example at hand, we defined a function mul(), which takes two arguments and returns the product of them. However, multiplying two strings raises a TypeError in Python. We can supply a patch for this by registering the _() function. Executing the script results in:

$ python singledispatch.py
2
3.75
12
NotImplemented

@Singledispatchmethod - Method Overloading

The @singledispatchmethod decorator solves the same tasks as the @singledispatch decorator but for methods.

# singledispatchmethod.py

from functools import singledispatchmethod


class Negator:
    @singledispatchmethod
    def neg(self, arg):
        raise NotImplementedError("Cannot negate a")

    @neg.register
    def _(self, arg: int):
        return -arg

    @neg.register
    def _(self, arg: bool):
        return not arg


neg = Negator()
print(neg.neg(5))
print(neg.neg(True))
print(neg.neg("Hello"))

The Negator class has a single instance method called neg(). The neg() function raises a NotImplementedError by default. However, the function is overloaded for integers and boolean types and returns the negation in these cases. Executing the script results in:

$ python singledispatchmethod.py
-5
False
Traceback (most recent call last):
  File singledispatchmethod.py, line 23, in <module>
    print(neg.neg(Hello))
  File /usr/lib/python3.8/functools.py, line 911, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File singledispatchmethod.py, line 9, in neg
    raise NotImplementedError(Cannot negate a)
NotImplementedError: Cannot negate a

Update_wrapper() - Hiding Wrapper Functions

The idea behind the update_wrapper() function is to update a wrapper function (as the name suggests) in a way that it looks like the wrapped function. In order to achieve this, update_wrapper() assigns the wrapped functions __module__, __name__, __qualname__, __annotations__, and __doc__ to the wrapper function ones. Furthermore, it updates the wrapper function's __dict__.

Let's have a look at the @wraps decorator for a practical example.

@Wraps - Convenience Function for Update_wrapper()

@wraps is a decorator acting as a convenience function to invoke update_wrapper(). To be precise, it is the same as calling partial(update_wrapper, wrapped=wrapped, assigned=assigned, updated=updated). After reading the technical details about update_wrapper() and @wraps you might ask yourself why we need to hide our wrapper functions.

The following code snippet defines a decorator @show_args. It prints the arguments and keyword arguments a function is called with before invoking the function itself.

# wraps.py

def show_args(f):
    def wrapper(*args, **kwargs):
        print(f"Calling function {f.__name__} with {args} and {kwargs}")
        return f(*args, **kwargs)
    return wrapper

Now, we can define a function add(), which returns the sum of two passed integers. Furthermore, we apply our newly written decorator to it as we are interested in the function's arguments and keyword arguments. At the end of the script, we print the result of a simple addition as well as the function's documentation string and name.

# previous wraps.py code

@show_args
def add(a: int, b: int) -> int:
    """Add two numbers a and b and return the result"""
    return a + b


print(add(5, 1))
print(add.__doc__)
print(add.__name__)
$ python wraps.py
Calling function add with (5, 1) and {}
6
None
wrapper

Did you expect to see a different documentation string and name than the ones printed? This is because we did not access the wrapped function's documentation string and name but the ones from the wrapper function. This is were @wraps comes into play. The only thing we need to change in our code is to apply the decorator to the wrapper() function.

def show_args(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        print(f"Calling function {f.__name__} with {args} and {kwargs}")
        return f(*args, **kwargs)
    return wrapper

If we now execute the script, we see the desired output:

$ python wraps.py
Calling function add with (5, 1) and {}
6
Add two numbers a and b and return the result
add

Summary

Congratulations, you have made it through the article! Now, you have a decent understanding of the functions the functools module contains. Additionally, you implemented examples where these functions are useful.

I hope you enjoyed reading the article. Make sure to share it with your friends and colleagues. If you have not already, consider following me on Twitter, where I am @DahlitzF or subscribing to my newsletter so you won't miss a future article. Stay curious and keep coding!

References