

How to make a fast command line tool in Python

Andrew Bennetts

Abstract:

An overview of why Python programs tend to be slow to start running, and some techniques Bazaar uses to start quickly, such as lazy imports.

Introduction

People like using software that feels fast. What qualifies as ``fast'' is subjective, and varies by the type of tool and by the user's expectations (reasonable or not).

Roughly speaking, for a command line program, people expect results almost instantaneously. For a tool that appears to be doing a simple task a sub-second result is ok, but under 200ms is even better.

Obviously to achieve this, your program actually has to be fast at doing its work. But what if you've written your code in Python, and it takes 800ms1 just to import your code, let alone start running it?

This paper explains why Python programs tend to be slow to start, and how to make them start faster.

Bazaar

The program I most care about, because I'm a developer of it, is Bazaar2. Bazaar is a distributed version control tool, with a command line program, bzr.

Bazaar is written in Python -- a language with a virtual machine runtime rather than one that compiles to native machine code, and not known to be particularly fast3.

A trivial Python script that imports every module in bzrlib takes over 800ms to run on my laptop.

Thanks to the techniques shown in this paper, Bazaar can run many commands in much less than 800ms. bzr rocks (a ``hello world'' command that just prints a hard-coded message and exits) runs in just 65ms. bzr status on an empty branch runs in 258ms.

How fast can a Python program be?

Is it feasible to make a snappy command-line tool written in Python?

Consider the fastest possible Python program:

$ time python -c ""
real    0m0.011s
user    0m0.008s
sys     0m0.004s

Ok, so it takes 11ms4 to start the Python virtual machine, and immediately stop it. That's good: 11ms still gives us a fair bit of time to load and run our program before the user starts noticing a delay.

11ms is more than zero, though. Where is the time going? Partly it's just starting the virtual machine that executes Python bytecode. But it's largely because Python executes some Python code every time it starts, as revealed by the -v switch:

$ python -v -c ""
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /usr/lib/python2.5/site.pyc matches /usr/lib/python2.5/site.py
import site # precompiled from /usr/lib/python2.5/site.pyc
... continued ...

On my system, Python has already loaded 26 modules before it starts running the real program. You can skip some of this with the -S switch (don't imply import site on initialisation), but real programs probably need the path configuration and other initialisation that it would skip, so using -S isn't an option.

os._exit: a.k.a. exit the interpreter now

Python does a bunch of garbage-collection when the interpreter shuts down. You almost certainly don't care about this (Python doesn't completely reliably call all object destructors before the interpreter exits, so any program relying on the final garbage collection runs is probably buggy). You can skip this by calling os._exit(0), so the fastest Python program is actually:

$ time python -c "from os import _exit; _exit(0)"
real    0m0.010s
user    0m0.004s
sys     0m0.004s

Bazaar saves about 10ms by using this trick, a small but easy improvement. Programs that allocate vast numbers of objects or amounts of memory will benefit even more.

Note that if you use this trick, you probably want to explicitly invoke sys.exitfunc() first to run any atexit functions that may have been registered. You may also want to flush your stdout and stderr buffers.
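As a minimal sketch of how a program might combine these steps (the main function and the hasattr guard here are illustrative, not Bazaar's actual code):

import os
import sys

def main():
    print "doing the real work..."

if __name__ == '__main__':
    main()
    # Run any registered atexit handlers ourselves, since os._exit skips them.
    if hasattr(sys, 'exitfunc'):
        sys.exitfunc()
    # Flush buffered output that would otherwise be lost.
    sys.stdout.flush()
    sys.stderr.flush()
    # Skip the interpreter's normal shutdown (final garbage collection,
    # module teardown) entirely.
    os._exit(0)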

Imports are slow

It takes over 800ms for Python to import all of bzrlib. Even if we exclude test modules, it still takes 450ms. Why is it so slow?

bzrlib minus bzrlib.tests is over 60000 lines of source code in around 245 files occupying 4.5MB of disk space. Importing all of bzrlib's modules causes over 200 others, mostly from the standard library, to be imported.

Every time Python imports a module it first has to locate it on disk. Usually that means trying to load the module from every path on sys.path (16 paths long by default on Ubuntu) until it finds it or runs out of paths. For each path, Python has to try every possible filename for a module. For a module named ``foo'' on Linux that's foo/__init__.py, foo/__init__.pyc, foo.so, foomodule.so, foo.py and foo.pyc. Try running strace python -c "import foo" 2>&1 | grep foo and see for yourself!

If this sounds time-consuming, imagine you just turned on your laptop and so virtually nothing in the filesystem is cached: the disk has to seek all over the place looking for files that mostly don't exist when Python imports a module.

Python does one very good thing to speed up imports: it caches already imported modules in sys.modules so that later imports of the same module don't need to find and reload the same code. It also remembers which modules could not be found.
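You can see the cache at work with a quick experiment (decimal here is just an arbitrary module that hasn't been loaded yet):

import sys
import timeit

# The first import has to search sys.path and execute the module's code.
print timeit.Timer("import decimal").timeit(number=1)

# The module is now cached in sys.modules, so repeating the import is
# little more than a dictionary lookup.
print 'decimal' in sys.modules
print timeit.Timer("import decimal").timeit(number=1)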

Make imports lazy

Bazaar has a module called lazy_import, inspired by Mercurial's demandimport module. It is used like this:

from bzrlib.lazy_import import lazy_import
lazy_import(globals(), """
import cStringIO
from bzrlib import branch
... etc ...
""")

Any import statement inside a lazy_import block is not executed until the name it would import is actually used.

lazy_import works by inserting a placeholder object into the module's namespace for each name that is lazy-imported. As soon as any attribute is accessed on a placeholder object, the real import is done and the placeholder in the module's namespace is replaced with the real object. The placeholder then does its best to act like the real object for anything that still has a reference to it.
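bzrlib.lazy_import handles many more details (packages, from-imports, error reporting), but the core placeholder idea can be sketched roughly like this; the class and the textwrap example are invented for illustration:

class LazyModulePlaceholder(object):
    """Stands in for a module until one of its attributes is touched."""

    def __init__(self, scope, name):
        self._scope = scope   # the namespace (e.g. globals()) we live in
        self._name = name     # the real module's name
        self._real = None

    def __getattr__(self, attr):
        if self._real is None:
            # First attribute access: do the real import now and replace
            # the placeholder in the owning module's namespace.
            self._real = __import__(self._name)
            self._scope[self._name] = self._real
        # Act like the real module for anything still holding a reference
        # to the placeholder.
        return getattr(self._real, attr)

# Usage sketch: textwrap is not actually imported until it is used.
textwrap = LazyModulePlaceholder(globals(), 'textwrap')
# ... much later ...
print textwrap.fill('hello world', width=40)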

This system is an almost transparent replacement for the regular import syntax, but there are a few minor caveats to using it. isinstance on a lazily imported name is not safe because it doesn't trigger the real import, so it will use the placeholder rather than the object you expect. So it's best to do this:

import foo   # inside a lazy_import() call
... etc ...
do_something(foo.bar)

Rather than this:

from foo import bar   # inside a lazy_import() call
... etc ...
do_something(bar)

Lazy imports make a big difference for Bazaar because no bzr invocation needs all of bzrlib. Most invocations need just a small fraction of it. lazy_import is the most important technique bzr uses to reduce startup time.

It is possible to do lazy importing without a special module by putting import statements inside functions, rather than at the top of modules. For example:

def function(string):
    if string == "hello":
        import hello
        hello.run()
    elif string == "goodbye":
        import goodbye
        goodbye.run()
    ... etc ...

The main drawback with this approach is that if many functions need the same module then you need to repeat the same import statement many times. The import statement is also relatively slow (even when importing an already-imported module), so this can swap one performance issue for another!

Avoid doing work at import time

Importing a module doesn't just mean loading its code into memory; it also means executing some of that code. Quite a lot of time can be spent during imports executing top-level code in a module.

A common example of this is compiling regular expressions to make module-global constants, e.g.:

import re

NUMBER = re.compile('[0-9]+')
IDENTIFIER = re.compile('[a-zA-Z_][a-zA-Z_0-9]*')
... etc ...

Each individual re.compile is pretty quick -- short expressions may compile in as little as 0.1ms, 200-byte long ones in 1.6ms -- but it quickly leads to death by a thousand cuts.

Even simple, standard modules that you'd expect to be innocuous do this. The string module takes 5ms to import!
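One simple way to avoid paying for regular expression compilation at import time, without any special machinery, is to compile the pattern the first time it is needed (the names here are illustrative):

import re

_NUMBER = None

def is_number(text):
    global _NUMBER
    if _NUMBER is None:
        # Compiled the first time a caller needs it, not when the module
        # is imported.
        _NUMBER = re.compile('[0-9]+')
    return _NUMBER.match(text) is not None

Bazaar automates this pattern wholesale with lazy_regex, described below.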

Find slow imports

bzr has a --profile-imports option. It produces output like:

$ bzr --profile-imports status
cum  inline name                                               frame
23.1   0.5   [show_tree_status]bzrlib.status           @ bzrlib.builtins:180
22.6   2.8 +  [delta, log, osutils, tree, tsort,       @ bzrlib.status:19
15.8  15.2 ++  [errors, osutils]bzrlib                 @ bzrlib.delta:17
 0.6   0.6 +++  [MalformedHunkHeader, MalformedLin     @ bzrlib.errors:25
 2.0   0.7 ++  [conflicts, delta, osutils, revisi      @ bzrlib.tree:25
... continued ...

It intercepts all imports, including lazy imports, tracks how long they take and generates a report on import times when bzr finishes.

The profile_imports module that implements this feature is GPL (like the rest of Bazaar), and very useful.
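profile_imports does considerably more bookkeeping (separating cumulative from inline time, and covering lazy imports too), but the basic mechanism is wrapping the built-in import hook. A rough sketch, not the real implementation:

import __builtin__
import time

_real_import = __builtin__.__import__
_import_times = {}

def _timed_import(name, *args, **kwargs):
    start = time.time()
    try:
        return _real_import(name, *args, **kwargs)
    finally:
        elapsed = time.time() - start
        _import_times[name] = _import_times.get(name, 0.0) + elapsed

def install():
    __builtin__.__import__ = _timed_import

def report():
    for name, seconds in sorted(_import_times.items(),
                                key=lambda item: -item[1]):
        print "%6.1fms  %s" % (seconds * 1000.0, name)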

Minimise dependencies

Another way to reduce import overhead is simply to depend on fewer modules.

Think especially carefully before relying on a third-party library you don't really need: in addition to being another import that slows your program down, it's another dependency your users have to install.

Also, as your code evolves, obsolete import statements start to accumulate unnoticed. PyFlakes5 and other similar tools can help you find and remove these.

Don't promote all submodules in your __init__.py

Some Python package authors like to make their package look like a single module.

# mypackage/__init__.py
from mypackage.moduleA import Amber, Axolotl
from mypackage.moduleB import Berry, Brute
... etc ...
from mypackage.moduleZ import ZOMG_KitchenSink

This forces all users of your library to always load the entire thing. Even if some code tries to import just one thing from a submodule, e.g. from mypackage.moduleA import Axolotl, all the modules will be loaded. Don't subject users of your library to this.
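The friendlier alternative is to keep __init__.py (nearly) empty and let callers import exactly the submodule they need:

# mypackage/__init__.py
# Deliberately (almost) empty: importing mypackage stays cheap.

# In application code, import only what is actually used:
from mypackage.moduleA import Axolotl   # loads mypackage and moduleA only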

Don't load optional dependencies until you need them

ConfigObj is a nice library for reading and writing configuration files. It has an optional feature to read and write config values in Python syntax, controlled by the ``unrepr'' option. ConfigObj uses the compiler module from the standard library to implement the unrepr option. Bazaar doesn't use the unrepr option, but ConfigObj unconditionally imports the compiler module anyway. Loading compiler and its dependencies takes 10ms.
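The general fix is to move such imports inside the code path for the optional feature, so that only its users pay for them. A hypothetical sketch (not ConfigObj's actual code):

def parse_python_value(text):
    # The compiler module is only imported if this optional feature is
    # actually used; everyone else skips its ~10ms import cost.
    import compiler
    tree = compiler.parse(text, mode='eval')
    # ... walk the parse tree and build the value ...
    return tree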

Problem: the standard library is slow

Unfortunately, the Python standard library has many modules that take a bit of time to import. In many cases this is due to compiling regular expressions.

Many standard library modules cost bzrlib multiple milliseconds each, as revealed by bzr --profile-imports. In many cases Bazaar only needs a small fraction of a module's functionality, but it has to pay the full price to import it.

Make re.compile lazy

Because it's so common for modules to compile regular expressions at import time, Bazaar has bzrlib.lazy_regex. It provides an install_lazy_compile function that monkey-patches re.compile to delay compiling the regex until the first use. This saves time all over the place: a total of 60ms for a bzr status invocation!
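bzrlib.lazy_regex is more thorough, but the essence of the trick looks something like this simplified sketch (not the real module):

import re

_real_compile = re.compile

class LazyRegex(object):
    """Records the pattern now, compiles it on first use."""

    def __init__(self, args, kwargs):
        self._args = args
        self._kwargs = kwargs
        self._real = None

    def __getattr__(self, attr):
        if self._real is None:
            self._real = _real_compile(*self._args, **self._kwargs)
        return getattr(self._real, attr)

def _lazy_compile(*args, **kwargs):
    return LazyRegex(args, kwargs)

def install_lazy_compile():
    # Monkey-patch re.compile so module-level patterns compile lazily.
    re.compile = _lazy_compile

After install_lazy_compile() runs, a module-level constant like NUMBER = re.compile('[0-9]+') costs almost nothing until the first match is attempted.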

Ideas for improving Python itself

Some sort of lazy importing facility really should be built-in. On top of the performance benefit, it can help with cyclic import problems.

Maybe Python needs to maintain a cache of which modules are installed where on your system, like the ld.so cache used for loading ELF libraries. The import hooks are probably already flexible enough to support this.

Change the re module so that expression compilation is lazy by default.

Ruthlessly examine what's run by merely starting Python, especially site.py, and find ways to make that code run faster.

Add bzrlib.profile_imports, or something like it, to the standard library.

Conclusion

Startup time is just one part of performance. If your program takes 20 minutes to run, then shaving hundreds or even thousands of milliseconds off your execution time isn't going to make a significant difference.

But if your program is meant to run frequently and interactively, and you've already made the rest of it fast, then the techniques shown here can help.



Footnotes

... 800ms1
Timings were done with CPython 2.5.2 as distributed with Ubuntu 8.04 on a Dell Inspiron 630m laptop.
... Bazaar2
http://bazaar-vcs.org/
... fast3
Between 1 and 290 times slower than C, if you believe the Computer Language Benchmarks Game at http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=python&lang2=gcc.
... 11ms4
Timing commands were actually run with for i in $(seq 5) ; do command ; done. For brevity only the best times out of the five runs are shown.
... PyFlakes5
http://divmod.org/projects/pyflakes

Andrew Bennetts, Open Source Developers Conference, Sydney, 2008