Retry your Python code until it fails

Use the Tenacity and Mock libraries to find the bugs hiding deep within your code.
3 readers like this.
Real python in the graphic jungle

Photo by Jen Wike Huger, CC BY-SA; Original photo by Torkild Retvedt

Sometimes, a function is called with bad inputs or in a bad program state, so it fails. In languages like Python, this usually results in an exception.

But sometimes exceptions are caused by different issues or are transitory. Imagine code that must keep working in the face of caching data being cleaned up. In theory, the code and the cleaner could carefully agree on the clean-up methodology to prevent the code from trying to access a non-existing file or directory. Unfortunately, that approach is complicated and error-prone. However, most of these problems are transitory, as the cleaner will eventually create the correct structures.

Even more frequently, the uncertain nature of network programming means that some functions that abstract a network call fail because packets were lost or corrupted.

A common solution is to retry the failing code. This practice allows skipping past transitional problems while still (eventually) failing if the issue persists. Python has several libraries to make retrying easier. This is a common "finger exercise."

Tenacity

One library that goes beyond a finger exercise and into useful abstraction is tenacity. Install it with pip install tenacity or depend on it using a dependencies = tenacity line in your pyproject.toml file.

Set up logging

A handy built-in feature of tenacity is support for logging. With error handling, seeing log details about retry attempts is invaluable.

To allow the remaining examples display log messages, set up the logging library. In a real program, the central entry point or a logging configuration plugin does this. Here's a sample:

import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s:%(name)s:%(levelname)s:%(message)s",
)

TENACITY_LOGGER = logging.getLogger("Retrying")

Selective failure

To demonstrate the features of tenacity, it's helpful to have a way to fail a few times before finally succeeding. Using unittest.mock is useful for this scenario.

from unittest import mock

thing = mock.MagicMock(side_effect=[ValueError(), ValueError(), 3])

If you're new to unit testing, read my article on mock.

Before showing the power of tenacity, look at what happens when you implement retrying directly inside a function. Demonstrating this makes it easy to see the manual effort using tenacity saves.

def useit(a_thing):
    for i in range(3):
        try:
            value = a_thing()
        except ValueError:
            TENACITY_LOGGER.info("Recovering")
            continue
        else:
            break
    else:
        raise ValueError()
    print("the value is", value)

The function can be called with something that never fails:

>>> useit(lambda: 5)
the value is 5

With the eventually-successful thing:

>>> useit(thing)

2023-03-29 17:00:42,774:Retrying:INFO:Recovering
2023-03-29 17:00:42,779:Retrying:INFO:Recovering

the value is 3

Calling the function with something that fails too many times ends poorly:

try:
    useit(mock.MagicMock(side_effect=[ValueError()] * 5 + [4]))
except Exception as exc:
    print("could not use it", repr(exc))

The result:


2023-03-29 17:00:46,763:Retrying:INFO:Recovering
2023-03-29 17:00:46,767:Retrying:INFO:Recovering
2023-03-29 17:00:46,770:Retrying:INFO:Recovering

could not use it ValueError()

Simple tenacity usage

For the most part, the function above was retrying code. The next step is to have a decorator handle the retrying logic:

import tenacity

my_retry=tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
)

Tenacity supports a specified number of attempts and logging after getting an exception.

The useit function no longer has to care about retrying. Sometimes it makes sense for the function to still consider retryability. Tenacity allows code to determine retryability by itself by raising the special exception TryAgain:

@my_retry
def useit(a_thing):
    try:
        value = a_thing()
    except ValueError:
        raise tenacity.TryAgain()
    print("the value is", value)

Now when calling useit, it retries ValueError without needing custom retrying code:

useit(mock.MagicMock(side_effect=[ValueError(), ValueError(), 2]))

The output:

2023-03-29 17:12:19,074:Retrying:WARNING:Finished call to '__main__.useit' after 0.000(s), this was the 1st time calling it.
2023-03-29 17:12:19,080:Retrying:WARNING:Finished call to '__main__.useit' after 0.006(s), this was the 2nd time calling it.

the value is 2

Configure the decorator

The decorator above is just a small sample of what tenacity supports. Here's a more complicated decorator:

my_retry = tenacity.retry(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    before=tenacity.before_log(TENACITY_LOGGER, logging.WARNING),
    retry=tenacity.retry_if_exception_type(ValueError),
    wait=tenacity.wait_incrementing(1, 10, 2),
    reraise=True
)

This is a more realistic decorator example with additional parameters:

  • before: Log before calling the function
  • retry: Instead of only retrying TryAgain, retry exceptions with the given criteria
  • wait: Wait between calls (this is especially important if calling out to a service)
  • reraise: If retrying failed, reraise the last attempt's exception

Now that the decorator also specifies retryability, remove the code from useit:

@my_retry
def useit(a_thing):
    value = a_thing()
    print("the value is", value)

Here's how it works:

useit(mock.MagicMock(side_effect=[ValueError(), 5]))

The output:

2023-03-29 17:19:39,820:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:19:39,823:Retrying:WARNING:Finished call to '__main__.useit' after 0.003(s), this was the 1st time calling it.
2023-03-29 17:19:40,829:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.


the value is 5

Notice the time delay between the second and third log lines. It's almost exactly one second:

>>> useit(mock.MagicMock(side_effect=[5]))

2023-03-29 17:20:25,172:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.

the value is 5

With more detail:

try:
    useit(mock.MagicMock(side_effect=[ValueError("detailed reason")]*3))
except Exception as exc:
    print("retrying failed", repr(exc))

The output:

2023-03-29 17:21:22,884:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:21:22,888:Retrying:WARNING:Finished call to '__main__.useit' after 0.004(s), this was the 1st time calling it.
2023-03-29 17:21:23,892:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.
2023-03-29 17:21:23,894:Retrying:WARNING:Finished call to '__main__.useit' after 1.010(s), this was the 2nd time calling it.
2023-03-29 17:21:25,896:Retrying:WARNING:Starting call to '__main__.useit', this is the 3rd time calling it.
2023-03-29 17:21:25,899:Retrying:WARNING:Finished call to '__main__.useit' after 3.015(s), this was the 3rd time calling it.

retrying failed ValueError('detailed reason')

Again, with KeyError instead of ValueError:

try:
    useit(mock.MagicMock(side_effect=[KeyError("detailed reason")]*3))
except Exception as exc:
    print("retrying failed", repr(exc))

The output:

2023-03-29 17:21:37,345:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.

retrying failed KeyError('detailed reason')

Separate the decorator from the controller

Often, similar retrying parameters are needed repeatedly. In these cases, it's best to create a retrying controller with the parameters:

my_retryer = tenacity.Retrying(
    stop=tenacity.stop_after_attempt(3),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    before=tenacity.before_log(TENACITY_LOGGER, logging.WARNING),
    retry=tenacity.retry_if_exception_type(ValueError),
    wait=tenacity.wait_incrementing(1, 10, 2),
    reraise=True
)

Decorate the function with the retrying controller:

@my_retryer.wraps
def useit(a_thing):
    value = a_thing()
    print("the value is", value)

Run it:

>>> useit(mock.MagicMock(side_effect=[ValueError(), 5]))

2023-03-29 17:29:25,656:Retrying:WARNING:Starting call to '__main__.useit', this is the 1st time calling it.
2023-03-29 17:29:25,663:Retrying:WARNING:Finished call to '__main__.useit' after 0.008(s), this was the 1st time calling it.
2023-03-29 17:29:26,667:Retrying:WARNING:Starting call to '__main__.useit', this is the 2nd time calling it.

the value is 5

This lets you gather the statistics of the last call:

>>> my_retryer.statistics

{'start_time': 26782.847558759,
 'attempt_number': 2,
 'idle_for': 1.0,
 'delay_since_first_attempt': 0.0075125470029888675}

Use these statistics to update an internal statistics registry and integrate with your monitoring framework.

Extend tenacity

Many of the arguments to the decorator are objects. These objects can be objects of subclasses, allowing deep extensionability.

For example, suppose the Fibonacci sequence should determine the wait times. The twist is that the API for asking for wait time only gives the attempt number, so the usual iterative way of calculating Fibonacci is not useful.

One way to accomplish the goal is to use the closed formula:

Closed formula for a Fibonacci sequence, written in LaTeX as $(((1+\sqrt{5})/2)^n - ((1-\sqrt{5})/2)^n)/\sqrt{5}$

A little-known trick is skipping the subtraction in favor of rounding to the closest integer:

Variant formula for a Fibonacci sequence, written in LaTeX as $\operatorname{round}((((1+\sqrt{5})/2)^n)/\sqrt{5})$

Which translates to Python as:

int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)

This can be used directly in a Python function:

from math import sqrt

def fib(n):
    return int(((1 + sqrt(5))/2)**n / sqrt(5) + 0.5)

The Fibonacci sequence counts from 0 while the attempt numbers start at 1, so a wait function needs to compensate for that:

def wait_fib(rcs):
    return fib(rcs.attempt_number - 1)

The function can be passed directly as the wait parameter:

@tenacity.retry(
    stop=tenacity.stop_after_attempt(7),
    after=tenacity.after_log(TENACITY_LOGGER, logging.WARNING),
    wait=wait_fib,
)
def useit(thing):
    print("value is", thing())
try:
    useit(mock.MagicMock(side_effect=[tenacity.TryAgain()] * 7))
except Exception as exc:
    pass

Try it out:

2023-03-29 18:03:52,783:Retrying:WARNING:Finished call to '__main__.useit' after 0.000(s), this was the 1st time calling it.
2023-03-29 18:03:52,787:Retrying:WARNING:Finished call to '__main__.useit' after 0.004(s), this was the 2nd time calling it.
2023-03-29 18:03:53,789:Retrying:WARNING:Finished call to '__main__.useit' after 1.006(s), this was the 3rd time calling it.
2023-03-29 18:03:54,793:Retrying:WARNING:Finished call to '__main__.useit' after 2.009(s), this was the 4th time calling it.
2023-03-29 18:03:56,797:Retrying:WARNING:Finished call to '__main__.useit' after 4.014(s), this was the 5th time calling it.
2023-03-29 18:03:59,800:Retrying:WARNING:Finished call to '__main__.useit' after 7.017(s), this was the 6th time calling it.
2023-03-29 18:04:04,806:Retrying:WARNING:Finished call to '__main__.useit' after 12.023(s), this was the 7th time calling it.

Subtract subsequent numbers from the "after" time and round to see the Fibonacci sequence:

intervals = [
    0.000,
    0.004,
    1.006,
    2.009,
    4.014,
    7.017,
    12.023,
]
for x, y in zip(intervals[:-1], intervals[1:]):
    print(int(y-x), end=" ")

Does it work? Yes, exactly as expected:

0 1 1 2 3 5 

Wrap up

Writing ad-hoc retry code can be a fun distraction. For production-grade code, a better choice is a proven library like tenacity. The tenacity library is configurable and extendable, and it will likely meet your needs.

Tags
Moshe sitting down, head slightly to the side. His t-shirt has Guardians of the Galaxy silhoutes against a background of sound visualization bars.
Moshe has been involved in the Linux community since 1998, helping in Linux "installation parties". He has been programming Python since 1999, and has contributed to the core Python interpreter. Moshe has been a DevOps/SRE since before those terms existed, caring deeply about software reliability, build reproducibility and other such things.

Comments are closed.

Creative Commons LicenseThis work is licensed under a Creative Commons Attribution-Share Alike 4.0 International License.