The functools module is part of Python's standard library and provides tools for working with higher-order functions.
A higher-order function is a function that acts on or returns another function (or multiple functions).
In general, any callable object can be treated as a function for the purposes of this module.
functools provides 11 functions:
cached_property()
cmp_to_key()
lru_cache()
partial()
partialmethod()
reduce()
singledispatch()
singledispatchmethod()
total_ordering()
update_wrapper()
wraps()
Throughout the article, we will take a closer look at each function and at a few examples where it is useful. You can find the code snippets used in the article on GitHub. Enjoy!
Note: The article is based on Python 3.8.2 (CPython). Some functions may not be present in earlier versions of CPython.
@cached_property - Caching Instance Methods
Imagine you have a large dataset and, in order to analyse it, you implement a class holding the whole dataset.
Furthermore, you implement methods to calculate information like the standard deviation of the dataset at hand.
The problem: Each time you call the method, the standard deviation is calculated again - and this takes time!
This is where @cached_property comes into play.
Its purpose is to transform a method of a class into a property whose value is computed once and then cached as a normal attribute for the lifetime of the instance.
The behaviour is pretty similar to the built-in @property decorator [2], with the addition of caching.
Let's have a look at the example from the Python documentation:
import statistics
from functools import cached_property

class DataSet:
    def __init__(self, sequence_of_numbers):
        self._data = sequence_of_numbers

    @cached_property
    def stdev(self):
        return statistics.stdev(self._data)

    @cached_property
    def variance(self):
        return statistics.variance(self._data)
In the scenario at hand, we have a (large) sequence of numbers stored in a DataSet instance.
Additionally, two methods are defined, which compute the standard deviation and the variance, respectively.
We apply the @cached_property decorator [3] to both methods to turn them into cached properties.
This means that each value is calculated only once and then cached.
Note: Each instance of DataSet needs to have a __dict__ attribute that is a mutable mapping. This is required for the decorator to work correctly.
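A quick way to verify the caching is to access the property twice, continuing the DataSet example above (the dataset size is made up for illustration, and the exact timings will vary on your machine):
# assumes the DataSet class from the example above
import time

data = DataSet(list(range(1_000_000)))

start = time.perf_counter()
first = data.stdev   # first access: computes the value and caches it
print(f"first access:  {time.perf_counter() - start:.6f}s")

start = time.perf_counter()
second = data.stdev  # second access: read straight from the instance's __dict__
print(f"second access: {time.perf_counter() - start:.6f}s")

assert first == second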
cmp_to_key() - A Transition Function
Before we can move on, we first need to understand the difference between a comparison function and a key function.
A comparison function is any callable which takes two arguments, compares them, and returns a number based on the order of the supplied arguments. A negative number means that the first argument is less than the second one, zero means they are equal, and a positive number indicates that the first one is greater than the second one. A naive implementation in Python might look like this:
def compare(a, b):
    if a < b:
        return -1
    elif a == b:
        return 0
    else:
        return 1
In contrast, a key function is a callable that accepts one argument and returns another value to be used as the sort key.
A prominent representative of this group is the operator.itemgetter() key function [4], which you might know from your day-to-day coding.
Key functions are often supplied to built-in functions like sorted(), min(), and max().
In essence, cmp_to_key() turns a comparison function into a key function.
The cmp_to_key() function was implemented to support the transition from Python 2 to Python 3, because in Python 2 there was a function called cmp() (as well as a dunder method __cmp__()) for comparisons and ordering.
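Since Python 3 built-ins like sorted() only accept key functions, a comparison function has to be converted first. A minimal sketch using the compare() function from above:
from functools import cmp_to_key

def compare(a, b):  # the comparison function from above
    if a < b:
        return -1
    elif a == b:
        return 0
    else:
        return 1

numbers = [4, 2, 7, 1]
print(sorted(numbers, key=cmp_to_key(compare)))  # [1, 2, 4, 7]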
@lru_cache() - Increasing Code Performance Through Caching
@lru_cache() is a decorator that wraps a function with a memoizing callable saving up to the maxsize most recent calls (default: 128).
Note: Simply put, memoization means saving the result of a function call and returning it if the function is called with the same arguments again. For more information, check out Dan Bader's article about memoization in Python [5].
This is especially useful if you have expensive or I/O-bound functions that are periodically called with the same arguments.
LRU cache stands for least-recently-used cache and refers to a cache that drops the least recently used entry once the maximum number of entries is reached.
The LRU feature is disabled if maxsize is set to None.
Let's have a look at two examples.
In the first example, we define a function get_pep(), which accepts a PEP (Python Enhancement Proposal) number and returns the content of the PEP if it exists.
# lru_cache_static_website_content.py
from functools import lru_cache
from urllib import error
from urllib import request

@lru_cache(maxsize=32)
def get_pep(number: int) -> str:
    resource = f"http://www.python.org/dev/peps/pep-{number:04d}/"
    print(resource)
    try:
        with request.urlopen(resource) as s:
            return s.read()
    except error.HTTPError:
        return "Not Found"

list_of_peps = [8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991]
for n in list_of_peps:
    pep = get_pep(n)
    print(n, len(pep))
print(get_pep.cache_info())
As you can see, we add the @lru_cache() decorator to the function and set the maximum size of the cache to 32.
We call get_pep() in a for-loop with a number of PEPs.
If you have a close look at list_of_peps, you can see that two numbers occur more than once in the list: 8 (twice) and 320 (three times).
Once you execute the script, you will notice that PEPs which were already requested from python.org are returned immediately, without the URL being printed. This is because we do not call the function and fetch the page again, but take the result from our cache.
$ python lru_cache_static_website_content.py
http://www.python.org/dev/peps/pep-0008/
8 106914
http://www.python.org/dev/peps/pep-0290/
290 59806
http://www.python.org/dev/peps/pep-0308/
308 57012
http://www.python.org/dev/peps/pep-0320/
320 49591
8 106914
http://www.python.org/dev/peps/pep-0218/
218 46835
320 49591
http://www.python.org/dev/peps/pep-0279/
279 48593
http://www.python.org/dev/peps/pep-0289/
289 50922
320 49591
http://www.python.org/dev/peps/pep-9991/
9991 9
CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)
At the end of the script, we print the cache info for get_pep().
This reveals that we had three hits, meaning that three times Python used a cached value instead of calling the function again (once with the number 8 and twice with 320).
The other eight calls were misses, so the function was called and the result added to the cache.
Consequently, the final cache consists of eight entries.
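As a side note, functions wrapped with @lru_cache() also expose a cache_clear() method to invalidate the cache, which can be handy in tests or when the fetched content may change. Continuing the example above:
print(get_pep.cache_info())  # CacheInfo(hits=3, misses=8, maxsize=32, currsize=8)
get_pep.cache_clear()        # drop all cached entries and reset the statistics
print(get_pep.cache_info())  # CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)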
In the second example, we have a recursive implementation of the Fibonacci sequence that we want to speed up.
# lru_cache_fibonacci.py
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

list_of_fib_numbers = [fib(n) for n in range(16)]
print(list_of_fib_numbers)
print(fib.cache_info())
In the example at hand, we compute a Fibonacci sequence of length 16 and print the resulting sequence as well as the cache info for the fib() function.
$ python lru_cache_fibonacci.py
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
CacheInfo(hits=28, misses=16, maxsize=None, currsize=16)
You may be surprised by the number of hits and misses of the cache.
However, consider the following: First, we compute the result for n=0.
Since we have no entry in our cache, we need to compute the result, which increases the number of misses by one, resulting in hits=0 and misses=1.
The same happens when calling fib() with n=1.
Next, fib() is called with n=2.
The result is recursively computed by calculating the results for n=1 and n=0 and summing them up.
We already computed both results, so we can take them from the cache, which gives us two hits.
Consequently, we only have one new miss, as there is no entry for n=2 yet.
This continues until all 16 values of n have been passed to fib(), resulting in only 16 misses.
Are you curious to know how much time we saved by using @lru_cache() in this example?
We can test it using Python's timeit.timeit() function, which shows us something incredible:
Without @lru_cache: 2.7453888780000852 seconds
With @lru_cache: 2.127898915205151e-05 seconds
With @lru_cache() the fib() function is around 100,000 times faster - wow!
It is definitely a decorator you want to remember.
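In case you want to reproduce such a measurement, here is a minimal sketch of how it could look (fib_plain and fib_cached are hypothetical names, the input 30 is an arbitrary choice, and the absolute numbers depend on your machine):
import timeit
from functools import lru_cache

def fib_plain(n: int) -> int:
    if n < 2:
        return n
    return fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

# number=1, because a second run of the cached version would be a pure cache hit
print("Without @lru_cache:", timeit.timeit(lambda: fib_plain(30), number=1), "seconds")
print("With @lru_cache:", timeit.timeit(lambda: fib_cached(30), number=1), "seconds")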
@total_ordering - Decreasing Lines of Code by Utilizing a Decorator
Programming in Python often involves writing your own classes.
In some cases, you want to be able to compare different instances of that class.
Depending on how you want to compare them, you might end up implementing methods like __lt__(), __le__(), __gt__(), __ge__(), and __eq__() to be able to use the corresponding <, <=, >, >=, and == operators.
On the other hand, you could utilize the @total_ordering decorator.
This way, you only need to implement one of the rich comparison ordering methods and the decorator supplies the rest.
Additionally, the class should define the __eq__() method, too.
Let's suppose you have a class Pythonista and you want to be able to order instances lexicographically.
To be able to do that, you need to implement the rich comparison ordering methods.
But instead of implementing all of them, we only implement the __lt__() method as well as the __eq__() method.
By using the @total_ordering decorator, the other ones are defined automatically.
# total_ordering.py
from functools import total_ordering

@total_ordering
class Pythonista:
    firstname: str
    lastname: str

    def __init__(self, firstname: str, lastname: str) -> None:
        self.firstname = firstname
        self.lastname = lastname

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Pythonista):
            return NotImplemented
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))

    def __lt__(self, other: object):
        if not isinstance(other, Pythonista):
            return NotImplemented
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))

guido = Pythonista("Guido", "van Rossum")
brett = Pythonista("Brett", "Cannon")
print(guido > brett)
Executing the script will print True, because c comes before v.
Notice that we were able to use the > operator even though we did not explicitly implement __gt__().
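The other derived operators work as well - a quick check, continuing the example above:
print(guido >= brett)  # True, derived by @total_ordering from __lt__() and __eq__()
print(guido <= brett)  # False
print(guido != brett)  # True, Python derives != from __eq__() by default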
The @total_ordering decorator is a nice way to reduce the number of lines of code and the number of places you need to adjust when you want instances to be compared based on different attributes.
However, using the @total_ordering decorator adds overhead, leading to slower execution.
Furthermore, the stack traces for the derived comparison methods are more complex.
So if you need very performant code, you should not use the decorator and should instead implement the methods you need yourself.
partial() - Simplifying Signatures
With partial() you can create partial objects.
These objects behave like the function passed to partial(), called with the positional and keyword arguments supplied to partial().
Consequently, the newly created partial objects have a simplified signature compared to the original function.
Here is an example:
# partial_basetwo.py
from functools import partial
basetwo = partial(int, base=2)
basetwo.__doc__ = "Convert base 2 string to an int."
print(basetwo("10010"))
We create a partial object based on the built-in int() function.
In this case, we supply base=2 as a keyword argument.
Consequently, the newly created basetwo object behaves as if we called int() with base=2.
However, we can still override this behaviour by supplying a base argument to basetwo.
So executing basetwo("10010", base=10) computes the same result as int("10010").
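A quick check, continuing the basetwo example:
print(basetwo("10010"))           # 18, the string is parsed as base 2
print(basetwo("10010", base=10))  # 10010, the preset keyword argument is overridden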
Let's take another example.
# partial_euclidian_distance.py
import math
from functools import partial
from typing import Tuple

def euclidean_distance(point1: Tuple[int, int], point2: Tuple[int, int]) -> float:
    x1, y1 = point1
    x2, y2 = point2
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

zero_euclid = partial(euclidean_distance, (0, 0))
point = (1, 1)
print(zero_euclid(point))
The function at hand computes the Euclidean distance between two points in two-dimensional space.
We can create a partial object which takes exactly one argument (a point) and calculates the Euclidean distance between the supplied point and the point (0, 0).
partialmethod() - The partial() for Methods
partialmethod() is a function that returns partialmethod descriptors.
You can think of it as the partial() function for methods.
This means it is not intended to be called directly, but to be used as a way of defining new methods.
I really like the example from the Python documentation [6], so let's have a look at it.
# partialmethod.py
from functools import partialmethod

class Cell(object):
    def __init__(self):
        self._alive = False

    @property
    def alive(self):
        return self._alive

    def set_state(self, state):
        self._alive = bool(state)

    set_alive = partialmethod(set_state, True)
    set_dead = partialmethod(set_state, False)

c = Cell()
c.set_alive()
print(c.alive)
We define a class Cell representing a single cell.
It has a property alive and an instance method set_state(), which sets alive to either True or False.
Furthermore, we create two partialmethod descriptors, set_alive() and set_dead(), which call set_state() with True or False, respectively.
This allows us to create a new instance of the class Cell, call set_alive() to change the cell's state to True, and print the property's value.
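For completeness, the second descriptor works the same way - continuing the example:
c.set_dead()    # calls set_state(False) under the hood
print(c.alive)  # False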
reduce() - Calculating a Single Value Based on Many
Let's suppose you have an iterable of numbers and want to reduce it to a single value.
In this case, the resulting value is the sum over all elements of the supplied iterable.
One way to achieve this is to utilize reduce().
# reduce.py
from functools import reduce
from operator import add
iterable = [1, 2, 3, 4, 5]
result = reduce(add, iterable)
print(result)
As you can see, we define a list containing the numbers 1 to 5.
We calculate the sum over all elements of the list by calling the reduce() function with the operator.add() function as the first argument and the list as the second argument.
Of course, you could use the built-in sum() function, but what if you want to compute the product of all elements instead?
The only thing you need to change is to replace the operator.add() function with operator.mul() - done!
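Here is a sketch of that variant, which also shows reduce()'s optional third argument - an initializer that is placed before the items and serves as a default when the iterable is empty:
from functools import reduce
from operator import mul

iterable = [1, 2, 3, 4, 5]
print(reduce(mul, iterable))  # 120
print(reduce(mul, [], 1))     # 1, the initializer, because the iterable is empty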
@singledispatch - Function Overloading
By definition, the @singledispatch decorator transforms a function into a single-dispatch generic function.
In the case of @singledispatch, the dispatch happens on the type of the first argument.
Note: A generic function is a function composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm [7].
Note: Single dispatch is a form of generic function dispatch where the implementation is chosen based on the type of a single argument [8].
Simply put, @singledispatch allows you to overload functions in Python.
Let's take an example to illustrate this.
# singledispatch.py
from functools import singledispatch

@singledispatch
def mul(a, b):
    if type(a) is not type(b):
        return NotImplemented
    return a * b

@mul.register
def _(a: str, b: str):
    return a + b

print(mul(1, 2))
print(mul(1.5, 2.5))
print(mul("1", "2"))
print(mul(1, 1.0))
In the example at hand, we define a function mul(), which takes two arguments and returns their product.
However, multiplying two strings raises a TypeError in Python.
We can supply a patch for this by registering the _() function, which concatenates the two strings instead.
Executing the script results in:
$ python singledispatch.py
2
3.75
12
NotImplemented
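If you want to inspect which implementation a generic function will choose for a given type, the function returned by @singledispatch provides a dispatch() attribute as well as a read-only registry mapping. Continuing the example:
print(mul.dispatch(str))    # the str implementation registered above
print(mul.dispatch(float))  # falls back to the base implementation
print(list(mul.registry))   # registered types: [<class 'object'>, <class 'str'>]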
@singledispatchmethod - Method Overloading
The @singledispatchmethod decorator solves the same task as the @singledispatch decorator, but for methods.
# singledispatchmethod.py
from functools import singledispatchmethod

class Negator:
    @singledispatchmethod
    def neg(self, arg):
        raise NotImplementedError("Cannot negate a")

    @neg.register
    def _(self, arg: int):
        return -arg

    @neg.register
    def _(self, arg: bool):
        return not arg

neg = Negator()
print(neg.neg(5))
print(neg.neg(True))
print(neg.neg("Hello"))
The Negator class has a single instance method called neg().
By default, neg() raises a NotImplementedError.
However, the method is overloaded for integer and boolean types and returns the negation in these cases.
Executing the script results in:
$ python singledispatchmethod.py
-5
False
Traceback (most recent call last):
  File "singledispatchmethod.py", line 23, in <module>
    print(neg.neg("Hello"))
  File "/usr/lib/python3.8/functools.py", line 911, in _method
    return method.__get__(obj, cls)(*args, **kwargs)
  File "singledispatchmethod.py", line 9, in neg
    raise NotImplementedError("Cannot negate a")
NotImplementedError: Cannot negate a
update_wrapper() - Hiding Wrapper Functions
The idea behind the update_wrapper() function is to update a wrapper function (as the name suggests) so that it looks like the wrapped function.
In order to achieve this, update_wrapper() copies the wrapped function's __module__, __name__, __qualname__, __annotations__, and __doc__ attributes to the wrapper function.
Furthermore, it updates the wrapper function's __dict__ with entries from the wrapped function's __dict__.
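To make this concrete, here is a minimal sketch of calling update_wrapper() directly, without any decorator syntax (greet() is a hypothetical example function):
from functools import update_wrapper

def greet(name: str) -> str:
    """Return a friendly greeting."""
    return f"Hello, {name}!"

def wrapper(*args, **kwargs):
    return greet(*args, **kwargs)

update_wrapper(wrapper, greet)  # copy greet's metadata onto wrapper

print(wrapper.__name__)  # greet
print(wrapper.__doc__)   # Return a friendly greeting.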
Let's have a look at the @wraps decorator for a practical example.
@wraps - Convenience Function for update_wrapper()
@wraps is a decorator acting as a convenience function for invoking update_wrapper().
To be precise, applying it is the same as calling partial(update_wrapper, wrapped=wrapped, assigned=assigned, updated=updated).
After reading the technical details about update_wrapper() and @wraps, you might ask yourself why we need to hide our wrapper functions at all.
The following code snippet defines a decorator @show_args.
It prints the arguments and keyword arguments a function is called with before invoking the function itself.
# wraps.py
def show_args(f):
    def wrapper(*args, **kwargs):
        print(f"Calling function {f.__name__} with {args} and {kwargs}")
        return f(*args, **kwargs)
    return wrapper
Now, we can define a function add(), which returns the sum of two passed integers.
Furthermore, we apply our newly written decorator to it, as we are interested in the function's arguments and keyword arguments.
At the end of the script, we print the result of a simple addition as well as the function's documentation string and name.
# previous wraps.py code
@show_args
def add(a: int, b: int) -> int:
    """Add two numbers a and b and return the result"""
    return a + b

print(add(5, 1))
print(add.__doc__)
print(add.__name__)
$ python wraps.py
Calling function add with (5, 1) and {}
6
None
wrapper
Did you expect to see a different documentation string and name than the ones printed?
This is because we did not access the wrapped function's documentation string and name, but the ones from the wrapper function.
This is where @wraps comes into play.
The only thing we need to change in our code is to apply the decorator to the wrapper() function (and import wraps from functools).
from functools import wraps

def show_args(f):
    @wraps(f)
    def wrapper(*args, **kwargs):
        print(f"Calling function {f.__name__} with {args} and {kwargs}")
        return f(*args, **kwargs)
    return wrapper
If we now execute the script, we see the desired output:
$ python wraps.py
Calling function add with (5, 1) and {}
6
Add two numbers a and b and return the result
add
Congratulations, you have made it through the article!
Now, you have a decent understanding of the functions the functools module contains.
Additionally, you have worked through examples where these functions are useful.
I hope you enjoyed reading the article. Make sure to share it with your friends and colleagues. If you have not already, consider following me on Twitter, where I am @DahlitzF or subscribing to my newsletter so you won't miss a future article. Stay curious and keep coding!