The other day, while playing with a simple program involving randomness, I noticed something strange. Python's random.randint() function feels quite slow, in comparison to other randomness-generating functions. Since randint() is the canonical answer for "give me a random integer" in Python, I decided to dig deeper to understand what's going on.
This is a brief post that dives into the implementation of the random module, and discusses some alternative methods for generating pseudo-random integers.
First, a basic benchmark (Python 3.6):
$ python3 -m timeit -s 'import random' 'random.random()'
10000000 loops, best of 3: 0.0523 usec per loop
$ python3 -m timeit -s 'import random' 'random.randint(0, 128)'
1000000 loops, best of 3: 1.09 usec per loop
Whoa! It's about 20x more expensive to generate a random integer in the range [0, 128] than to generate a random float in the range [0, 1). That's pretty steep, indeed.
To understand why randint() is so slow, we'll have to dig into the Python source. Let's start with random() [1]. In Lib/random.py, the exported function random is an alias to the random method of the class Random, which inherits this method directly from _random.Random. This is the C companion defined in Modules/_randommodule.c, and it defines its random method as follows:
static PyObject *
random_random(RandomObject *self, PyObject *Py_UNUSED(ignored))
{
    uint32_t a = genrand_int32(self) >> 5, b = genrand_int32(self) >> 6;
    return PyFloat_FromDouble((a * 67108864.0 + b) * (1.0 / 9007199254740992.0));
}
Where genrand_int32 is defined directly above and implements a step of the Mersenne Twister PRNG. All in all, when we call random.random() in Python, the C function is directly invoked and there's not much extra work done beyond converting the results of genrand_int32 to a floating point number in a line of C.
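To make the arithmetic concrete, here's a rough Python rendition of what that C code computes (just a sketch for illustration, using getrandbits(32) as a stand-in for genrand_int32 - this is not how CPython actually wires it up): the top 27 and 26 bits of two 32-bit draws are combined into a 53-bit numerator over 2**53, yielding a float in [0, 1).

import random

def random_like(rng=random):
    # Sketch of random_random() above: keep the top 27 bits of one
    # 32-bit draw and the top 26 bits of another, then combine them
    # into a 53-bit value and scale into [0, 1).
    a = rng.getrandbits(32) >> 5   # 27 bits
    b = rng.getrandbits(32) >> 6   # 26 bits
    return (a * 67108864.0 + b) * (1.0 / 9007199254740992.0)  # 2**26, 2**53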
Now let's take a look at what randint() is up to:
def randint(self, a, b):
    """Return random integer in range [a, b], including both end points.
    """
    return self.randrange(a, b+1)
It calls randrange, fair enough. Here it is:
def randrange(self, start, stop=None, step=1, _int=int):
    """Choose a random item from range(start, stop[, step]).

    This fixes the problem with randint() which includes the
    endpoint; in Python this is usually not what you want.

    """
    # This code is a bit messy to make it fast for the
    # common case while still doing adequate error checking.
    istart = _int(start)
    if istart != start:
        raise ValueError("non-integer arg 1 for randrange()")
    if stop is None:
        if istart > 0:
            return self._randbelow(istart)
        raise ValueError("empty range for randrange()")

    # stop argument supplied.
    istop = _int(stop)
    if istop != stop:
        raise ValueError("non-integer stop for randrange()")
    width = istop - istart
    if step == 1 and width > 0:
        return istart + self._randbelow(width)
    if step == 1:
        raise ValueError("empty range for randrange() (%d,%d, %d)" %
                         (istart, istop, width))

    # Non-unit step argument supplied.
    istep = _int(step)
    if istep != step:
        raise ValueError("non-integer step for randrange()")
    if istep > 0:
        n = (width + istep - 1) // istep
    elif istep < 0:
        n = (width + istep + 1) // istep
    else:
        raise ValueError("zero step for randrange()")

    if n <= 0:
        raise ValueError("empty range for randrange()")

    return istart + istep*self._randbelow(n)
That's quite a bit of case checking and setting up parameters before we get to the next level. There are a couple of fast-path cases (for example, when the stop parameter is not supplied, this function will be a bit faster), but overall, after a bunch of checking we get to call the _randbelow() method.
By default, _randbelow() gets mapped to _randbelow_with_getrandbits():
def _randbelow_with_getrandbits(self, n):
    "Return a random int in the range [0,n).  Raises ValueError if n==0."

    getrandbits = self.getrandbits
    k = n.bit_length()  # don't use (n-1) here because n can be 1
    r = getrandbits(k)  # 0 <= r < 2**k
    while r >= n:
        r = getrandbits(k)
    return r
Note that it does a couple more computations and can end up invoking getrandbits() multiple times - especially when n is a power of two or just above one, where about half the draws get rejected. getrandbits() is in C, and while it also ends up invoking the PRNG genrand_int32(), it's somewhat heavier than random() and runs about twice as slowly.
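To see the retry loop in action, here's a small experiment (randbelow_counting is a made-up, instrumented copy of the loop above, not a stdlib function) that counts the average number of getrandbits() calls for a few bounds:

import random

def randbelow_counting(n):
    # Same rejection loop as _randbelow_with_getrandbits, but also
    # reports how many getrandbits() calls were needed.
    k = n.bit_length()
    calls = 1
    r = random.getrandbits(k)
    while r >= n:
        calls += 1
        r = random.getrandbits(k)
    return r, calls

for n in (65, 100, 127, 128):
    avg = sum(randbelow_counting(n)[1] for _ in range(100000)) / 100000
    print('n=%3d: %.2f getrandbits() calls on average' % (n, avg))

On average the loop makes 2**k / n calls: close to 1 for n=127, but nearly 2 for n=65 (and exactly 2 for n=128, since bit_length(128) is 8).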
In other words, there's a lot of Python and C code in the way of invoking the same underlying C function. Since Python is bytecode-interpreted, all of this ends up being quite a bit slower than simply calling the C function directly. A death by a thousand cuts. To be fair, randint() is also more flexible in that it can generate pseudo-random numbers of any size; that said, it's not very common to need huge pseudo-random numbers, and our tests were with small numbers anyway.
Here's a couple of experiments to help us test this hypothesis. First, let's try to hit the fast path we've seen above in randrange, by calling randrange without a stop parameter:
$ python3 -m timeit -s 'import random' 'random.randrange(1)'
1000000 loops, best of 3: 0.784 usec per loop
As expected, the run-time is somewhat better than randint's. Another experiment is to rerun the comparison on PyPy, an alternative Python implementation whose JIT compiler should end up tracing through the Python code and generating efficient machine code that strips away a lot of the abstraction.
$ pypy -m timeit -s 'import random' 'random.random()'
100000000 loops, best of 3: 0.0139 usec per loop
$ pypy -m timeit -s 'import random' 'random.randint(0, 128)'
100000000 loops, best of 3: 0.0168 usec per loop
As expected, the difference between these calls in PyPy is small.
Faster methods for generating pseudo-random integers
So randint() turns out to be very slow. In most cases, no one cares; but just occasionally, we need many random numbers - so what is there to do?
One tried and true trick is just using random.random() instead, multiplying by our integer limit:
$ python3 -m timeit -s 'import random' 'int(128 * random.random())'
10000000 loops, best of 3: 0.193 usec per loop
This gives us pseudo-random integers in the range [0, 128), much faster. One word of caution: Python represents its floats in double precision, with 53 bits of accuracy. When the limit is above 53 bits, the numbers we'll be getting using this method are not quite random - bits will be missing. This is rarely a problem because we don't usually need such huge integers, but definitely something to keep in mind [2].
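The trick generalizes to arbitrary ranges, too; something like this hypothetical helper works (fast_randrange is my name, not a stdlib function), with the 53-bit caveat now applying to the width of the range:

import random

def fast_randrange(a, b):
    # Pseudo-random integer in [a, b), from a single float draw.
    # Only trustworthy while (b - a) is well below 2**53.
    return a + int((b - a) * random.random())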
Another quick way to generate pseudo-random integers is to use getrandbits() directly:
$ python3 -m timeit -s 'import random' 'random.getrandbits(7)'
10000000 loops, best of 3: 0.102 usec per loop
This method is fast but limited - it only supports ranges that are powers of two. If we want to limit the range, we can't just compute a modulo - this will skew the distribution; rather, we'd have to use a loop similar to what _randbelow_with_getrandbits() does in the sample above. This will slow things down, of course.
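The skew from a plain modulo is easy to demonstrate with a throwaway experiment: reducing 7 random bits modulo 100 maps two 7-bit values onto each result in [0, 28) but only one onto each result in [28, 100), so the small results come up roughly twice as often.

import random
from collections import Counter

counts = Counter(random.getrandbits(7) % 100 for _ in range(1000000))
# 5 is hit by both 5 and 105; 50 is hit only by 50 (since 150 > 127).
print(counts[5] / counts[50])   # roughly 2.0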
Finally, we can turn away from the random module altogether, and use Numpy:
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128)'
1000000 loops, best of 3: 1.21 usec per loop
Surprisingly, this is slow; that's because Numpy isn't great for working with individual values - it likes to amortize costs over large arrays created and manipulated in C. To see this in action, let's see how long it takes to generate 100 random integers:
$ python3 -m timeit -s 'import numpy.random' 'numpy.random.randint(128, size=100)'
1000000 loops, best of 3: 1.91 usec per loop
Only 60% slower than generating a single one! At 0.019 usec per integer, this is the fastest method by far - about 3 times faster than calling random.random(). The reason this method is so fast is that the Python call overheads are amortized over all generated integers, and deep inside, Numpy runs an efficient C loop to generate them.
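If we do need the integers one at a time, one way to keep the amortization is to draw them from Numpy in batches and hand them out individually. A minimal sketch (buffered_randints and its buffer size are made up for illustration):

import numpy.random

def buffered_randints(limit, bufsize=1024):
    # Generator of pseudo-random ints in [0, limit), drawn from Numpy
    # in batches so the per-call overhead is paid once per batch.
    while True:
        yield from numpy.random.randint(limit, size=bufsize)

gen = buffered_randints(128)
print(next(gen), next(gen))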
To conclude, use Numpy if you want to generate large numbers of random ints; if you're just generating one at a time, it may not be as useful (but then, how much do you care about performance, really?)
Conclusion: performance vs. abstraction
In programming, performance and abstraction/flexibility are often at odds. By making a certain function more flexible, we inevitably make it slower - and randint() is a great practical example of this problem. 9 times out of 10 we don't care about the performance of these functions, but when we do, it's useful to know what's going on and how to improve the situation.
In a way, pure Python code itself is one of the slowest abstractions we encounter, since every line gets translated to a bunch of bytecode that has to be interpreted at run-time.
To mitigate these effects, Python programmers who care about performance have many techniques at their disposal. Libraries like Numpy carefully move as much compute as possible into underlying C code; PyPy's JIT compiler can speed up most pure Python code sequences considerably. Numba is somewhere in between, while Cython lets us rewrite chosen functions in an extended dialect of Python with static type annotations that can be efficiently compiled to machine code.
[1] From this point on, file names point to source files in the CPython repository. Feel free to follow along on your machine or on GitHub.
[2] As an experiment, try to generate pseudo-random integers up to 2^54 using this technique. You'll notice that only even numbers are generated! More generally, the closer the multiplier is to machine precision, the less random the result becomes. Knuth has an interesting discussion of this in volume 2 of TAOCP - it has to do with unbalanced rounding that has to happen every time a precision-limited float is multiplied by an integer. That said, if the multiplier is much smaller than the precision, we'll be fine; for generating numbers up to 2^40, say, the bad effects on the distribution will be negligible.
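Here's a quick way to run that experiment (a throwaway check): since random.random() returns k/2**53 for some integer k, multiplying by 2**54 always yields the even integer 2*k.

import random

# random.random() is k / 2**53 for an integer k, so 2**54 * random.random()
# is exactly 2*k - always even.
print(all(int(2**54 * random.random()) % 2 == 0 for _ in range(100000)))
# -> True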