The other day, while playing with a simple program involving randomness, I noticed something strange. Python's `random.randint()` function feels quite slow in comparison to other randomness-generating functions. Since `randint()` is the canonical answer for "give me a random integer" in Python, I decided to dig deeper to understand what's going on.

This is a brief post that dives into the implementation of the `random` module, and discusses some alternative methods for generating pseudo-random integers.

First, a basic benchmark (Python 3.6):

```
$ python3 -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0523 usec per loop
$ python3 -m timeit -s 'import random' 'random.randint(0, 128)'
1000000 loops, best of 3: 1.09 usec per loop
```

Whoa! It's about 20x more expensive to generate a random integer in the range `[0, 128]` than to generate a random float in the range `[0, 1)`. That's pretty steep, indeed.
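For anyone who prefers to rerun this comparison from a script rather than the shell, here is a minimal sketch using the standard `timeit` module. The helper name `usec_per_call` and the loop counts are my own choices for illustration, and the numbers it prints will vary with the machine and Python version.

```python
import timeit


def usec_per_call(stmt, setup="import random", number=200_000, repeat=3):
    """Best-of-`repeat` time for a single execution of `stmt`, in microseconds.

    `usec_per_call` is a hypothetical helper, not part of the standard library.
    """
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat))
    return best / number * 1e6


if __name__ == "__main__":
    for stmt in ("random.random()", "random.randint(0, 128)"):
        print(f"{stmt:<26} {usec_per_call(stmt):.4f} usec per call")
```

Taking the minimum over several repeats mirrors the "best of 3" figure that `python3 -m timeit` reports.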

To understand why `randint()` is so slow, we'll have to dig into the Python source. Let's start with `random()` [1]. In `Lib/random.py`, the exported function `random` is an alias to the `random` method of the class `Random`, which inherits this method directly from `_Random`. This is the C companion defined in `Modules/_randommodule.c`, and it defines its `random` method as follows:

```c
static PyObject *
random_random(RandomObject *self, PyObject *Py_UNUSED(ignored))
{
    uint32_t a=genrand_int32(self)>>5, b=genrand_int32(self)>>6;
    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
}
```

Here `genrand_int32` is defined directly above and implements a step of the Mersenne Twister PRNG. All in all, when we call `random.random()` in Python, the C function is directly invoked and there's not much extra work done beyond converting the result of `genrand_int32` to a floating point number in a line of C.
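To make the arithmetic in that one line of C more concrete, here is a rough Python transcription. It is only a sketch: `float_from_two_words` and `random_sketch` are names I made up, and `random.getrandbits(32)` stands in for `genrand_int32()`, so it will not reproduce the exact stream of `random.random()`.

```python
import random


def float_from_two_words(a32, b32):
    """Mirror the arithmetic of random_random() for two 32-bit PRNG outputs."""
    a = a32 >> 5  # keep the top 27 bits of the first word
    b = b32 >> 6  # keep the top 26 bits of the second word
    # 67108864 == 2**26, so a * 2**26 + b is a 53-bit integer;
    # 9007199254740992 == 2**53, so the result lands in [0, 1).
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)


def random_sketch():
    # Assumption: getrandbits(32) is used here as a stand-in for genrand_int32().
    return float_from_two_words(random.getrandbits(32), random.getrandbits(32))
```

The 27 + 26 = 53 bits match the precision of a double, which is why every value produced this way is an exact multiple of 2^-53.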

Now let's take a look at what `randint()` is up to:

```python
def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """

    return self.randrange(a, b+1)
```

It calls `randrange`, fair enough. Here it is:

```python
def randrange(self, start, stop=None, step=1, _int=int):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.

    """

    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    istart = _int(start)
    if istart != start:
        raise ValueError("non-integer arg 1 for randrange()")
    if stop is None:
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    istop = _int(stop)
    if istop != stop:
        raise ValueError("non-integer stop for randrange()")
    width = istop - istart
    if step == 1 and width > 0:
        return istart + self._randbelow(width)
    if step == 1:
        raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))

    # Non-unit step argument supplied.
    istep = _int(step)
    if istep != step:
        raise ValueError("non-integer step for randrange()")
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")

    if n <= 0:
        raise ValueError("empty range for randrange()")

    return istart + istep*self._randbelow(n)
```

That's quite a bit of case checking and setting up parameters before we get to the next level. There are a couple of fast-path cases (for example, when the `stop` parameter is not supplied, this function will be a bit faster), but overall after a bunch of checking we get to call the `_randbelow()` method.
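The least obvious part of that setup is the non-unit-step branch, which computes how many candidates there are to choose from. As a small sanity check, the sketch below isolates that arithmetic in a hypothetical helper (`randrange_item_count` is my name, not something in `random.py`) so it can be compared against `len(range(...))`.

```python
def randrange_item_count(start, stop, step):
    """Number of candidates in the step != 1 branch of randrange().

    Hypothetical helper that copies only the arithmetic; the real method
    also validates its arguments and raises for empty ranges (n <= 0).
    """
    width = stop - start
    if step > 0:
        n = (width + step - 1) // step  # ceiling division for positive steps
    elif step < 0:
        n = (width + step + 1) // step  # the mirrored form for negative steps
    else:
        raise ValueError("zero step")
    return n


if __name__ == "__main__":
    for args in [(0, 10, 3), (10, 0, -3), (5, 6, 7)]:
        print(args, randrange_item_count(*args), len(range(*args)))
```

For non-empty ranges the two counts agree, so the final `istart + istep*self._randbelow(n)` picks uniformly among the same items that `range(start, stop, step)` would produce.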

By default, `_randbelow()` gets mapped to `_randbelow_with_getrandbits()`:

```python
def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Raises ValueError if n==0."

    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)          # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r
```

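Note that it does a couple more computations and can end up invoking `getrandbits()` multiple times: a draw is rejected whenever it lands in `[n, 2**k)`, so up to half of the draws are thrown away. This worst case is hit when `n` is a power of two or just above one, as with `n = 129` in our benchmark. `getrandbits()` is in C, and while it also ends up invoking the PRNG `genrand_int32()`, it's somewhat heavier than `random()` and runs about twice as slowly.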
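To get a feel for how often the retry loop actually spins, we can compute the acceptance probability directly: a draw of `k = n.bit_length()` bits is kept only if it is below `n`. The helper names below are mine, and the sketch assumes each `getrandbits(k)` call is an independent uniform draw.

```python
from fractions import Fraction


def acceptance_probability(n):
    """Chance that a single getrandbits(k) draw is below n, with k = n.bit_length()."""
    k = n.bit_length()
    return Fraction(n, 2 ** k)


def expected_draws(n):
    """Mean number of getrandbits() calls; the loop is a geometric trial."""
    return 1 / acceptance_probability(n)


if __name__ == "__main__":
    for n in (127, 128, 129, 255):
        print(n, acceptance_probability(n), float(expected_draws(n)))
```

For `randint(0, 128)` the loop runs with `n = 129`, where roughly every other draw is rejected, so on average it costs close to two `getrandbits()` calls per integer.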

In other words, there's a lot of Python and C code in the way to invoke the same underlying C function. Since Python is bytecode-interpreted, all of this ends up being quite a bit slower than simply calling the C function directly. A death by a thousand cuts. To be fair, `randint()` is also more flexible in that it can generate pseudo-random numbers of any size; that said, it's not very common to need huge pseudo-random numbers, and our tests were with small numbers anyway.

Here are a couple of experiments to help us test this hypothesis. First, let's try to hit the fast-path we've seen above in `randrange`, by calling `randrange` without a `stop` parameter:

```
$ python3 -m timeit -s 'import random' 'random.randrange(1)'
1000000 loops, best of 3: 0.784 usec per loop
```

As expected, the run-time is somewhat better than `randint`. Another experiment is to rerun the comparison in PyPy, which is a JIT compiler that should end up tracing through the Python code and generating efficient machine code that strips a lot of abstractions.

```
$ pypy -m timeit -s 'import random' 'random.random()'
100000000 loops, best of 3: 0.0139 usec per loop
$ pypy -m timeit -s 'import random' 'random.randint(0, 128)'
100000000 loops, best of 3: 0.0168 usec per loop
```

As expected, the difference between these calls in PyPy is small.

## Faster methods for generating pseudo-random integers

So `randint()` turns out to be very slow. In most cases, no one cares; but just occasionally, we need many random numbers - so what is there to do?

One tried and true trick is just using `random.random()` instead, multiplying by our integer limit:

```
$ python3 -m timeit -s 'import random' 'int(128 * random.random())'
10000000 loops, best of 3: 0.193 usec per loop
```

This gives us pseudo-random integers in the range `[0, 128)`, much faster. One word of caution: Python represents its floats in double-precision, with 53 bits of precision. When the limit is above 53 bits, the numbers we'll be getting using this method are not quite random - bits will be missing. This is rarely a problem because we don't usually need such huge integers, but definitely something to keep in mind [2].
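The missing bits are easy to observe. The sketch below (with a hypothetical helper name, `scaled_random_int`) scales `random.random()` by 2^54, one bit more than a double can hold, and counts how many odd results come out; a limit of 2^40 is included for contrast.

```python
import random


def scaled_random_int(limit):
    """The int(limit * random.random()) trick from above, wrapped in a function."""
    return int(limit * random.random())


def count_odd(limit, samples=10_000):
    """How many of `samples` scaled draws are odd numbers."""
    return sum(scaled_random_int(limit) % 2 for _ in range(samples))


if __name__ == "__main__":
    print("odd results with limit 2**54:", count_odd(2 ** 54))
    print("odd results with limit 2**40:", count_odd(2 ** 40))
```

Since `random()` only returns multiples of 2^-53, multiplying by 2^54 can only give even integers, so the first count stays at zero, while the second should hover around half of the samples.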

Another quick way to generate pseudo-random integers is to use `getrandbits()` directly:

```
$ python3 -m timeit -s 'import random' 'random.getrandbits(7)'
10000000 loops, best of 3: 0.102 usec per loop
```

This method is fast but limited - it only supports ranges that are powers of two. If we want to limit the range we can't just compute a modulo - this will skew the distribution; rather we'll have to use a loop similarly to what `_randbelow_with_getrandbits()` does in the sample above. This will slow things down, of course.
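Here is a small sketch of both points, using a limit of 100 as an arbitrary example. The first part counts exhaustively how the 128 possible 7-bit values fold under `% 100`; the second is a standalone version of the rejection loop. `rand_below` is my own name for it, modeled on `_randbelow_with_getrandbits()` rather than copied from the library.

```python
import random
from collections import Counter

# Map every possible 7-bit value through `% 100` and count the results.
# Values 0..27 are hit twice (by r and r + 100), values 28..99 only once.
modulo_counts = Counter(r % 100 for r in range(2 ** 7))


def rand_below(n):
    """Uniform integer in [0, n) for n >= 1, by redrawing out-of-range values."""
    k = n.bit_length()
    r = random.getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = random.getrandbits(k)
    return r


if __name__ == "__main__":
    print("modulo: 0 occurs", modulo_counts[0], "times, 99 occurs", modulo_counts[99])
    print("rejection sample:", rand_below(100))
```

With the modulo approach the low values are twice as likely as the high ones; the rejection loop avoids that at the price of an occasional extra `getrandbits()` call.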

Finally, we can turn away from the `random` module altogether, and use Numpy:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128)'
1000000 loops, best of 3: 1.21 usec per loop
```

Surprisingly, this is slow; that's because Numpy isn't great for working with single values - it likes to amortize costs over large arrays created / manipulated in C. To see this in action, let's see how long it takes to generate 100 random integers:

```
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128, size=100)'
1000000 loops, best of 3: 1.91 usec per loop
```

Only 60% slower than generating a single one! With 0.019 usec per integer, this is the fastest method *by far* - about 3 times faster than calling `random.random()`. The reason this method is so fast is that the Python call overheads are amortized over all generated integers, and deep inside Numpy runs an efficient C loop to generate them.

To conclude, use Numpy if you want to generate large numbers of random ints; if you're just generating one-at-a-time, it may not be as useful (but then how much do you care about performance, really?)

## Conclusion: performance vs. abstraction

In programming, performance and abstraction/flexibility are often at odds. By making a certain function more flexible, we inevitably make it slower - and `randint()` is a great practical example of this problem. 9 times out of 10 we don't care about the performance of these functions, but when we do, it's useful to know what's going on and how to improve the situation.

In a way, pure Python code itself is one of the slowest abstractions we encounter, since every line gets translated to a bunch of bytecode that has to be interpreted at run-time.

To mitigate these effects, Python programmers who care about performance have many techniques at their disposal. Libraries like Numpy carefully move as much compute as possible to underlying C code; PyPy is a JIT compiler that can speed up most pure Python code sequences considerably. Numba is somewhere in between, while Cython lets us re-write chosen functions in a restricted subset of Python that can be efficiently compiled to machine code.

[1] From this point on, file names point to source files in the CPython repository. Feel free to follow along on your machine or on GitHub.

[2] As an experiment, try to generate pseudo-random integers up to 2^54 using this technique. You'll notice that only even numbers are generated! More generally, the closer the multiplier is to machine precision, the less random the result becomes. Knuth has an interesting discussion of this in volume 2 of TAOCP - it has to do with unbalanced rounding that has to happen every time a precision-limited float is multiplied by an integer. That said, if the multiplier is much smaller than the precision, we'll be fine; for generating numbers up to 2^40, say, the bad effects on the distribution will be negligible.