The Internal Structure of Python Eggs
STOP! This is not the first document you should read!
Eggs and their Formats
A “Python egg” is a logical structure embodying the release of a specific version of a Python project, comprising its code, resources, and metadata. There are multiple formats that can be used to physically encode a Python egg, and others can be developed. However, a key principle of Python eggs is that they should be discoverable and importable. That is, it should be possible for a Python application to easily and efficiently find out what eggs are present on a system, and to ensure that the desired eggs’ contents are importable.
There are two basic formats currently implemented for Python eggs:
.egg format: a directory or zipfile containing the project’s code and resources, along with an EGG-INFO subdirectory that contains the project’s metadata.
.egg-info format: a file or directory placed adjacent to the project’s code and resources, that directly contains the project’s metadata.
Both formats can include arbitrary Python code and resources, including static data files, package and non-package directories, Python modules, C extension modules, and so on. But each format is optimized for different purposes.
The .egg format is well-suited to distribution and the easy uninstallation or upgrades of code, since the project is essentially self-contained within a single directory or file, unmingled with any other projects’ code or resources. It also makes it possible to have multiple versions of a project simultaneously installed, such that individual programs can select the versions they wish to use.
The .egg-info format, on the other hand, was created to support backward-compatibility, performance, and ease of installation for system packaging tools that expect to install all projects’ code and resources to a single directory (e.g. site-packages). Placing the metadata in that same directory simplifies the installation process, since it isn’t necessary to create .pth files or otherwise modify sys.path to include each installed egg.
Its disadvantage, however, is that it provides no support for clean uninstallation or upgrades, and of course only a single version of a project can be installed to a given directory. Thus, support from a package management tool is required. (This is why setuptools’ “install” command refers to this type of egg installation as “single-version, externally managed”.) Also, .egg-info installs lack sufficient data to allow them to be copied from their installation source. easy_install can “ship” an application by copying .egg files or directories to a target location, but it cannot do this for .egg-info installs, because there is no way to tell what code and resources belong to a particular egg: there may be several eggs “scrambled” together in a single installation location, and the .egg-info format does not currently include a way to list the files that were installed. (This may change in a future version.)
Code and Resources
The layout of the code and resources is dictated by Python’s normal import layout, relative to the egg’s “base location”.
For the .egg format, the base location is the .egg itself. That is, adding the .egg filename or directory name to sys.path makes its contents importable.
For the .egg-info format, however, the base location is the directory that contains the .egg-info, and thus it is the directory that must be added to sys.path to make the egg importable. (Note that this means that the “normal” installation of a package to a sys.path directory is sufficient to make it an “egg” if it has an .egg-info file or directory installed alongside it.)
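To make the idea of a base location concrete, here is a minimal sketch that builds a throwaway zipped .egg in a temporary directory and imports from it simply by putting the .egg path on sys.path. The project name demo_project, the module demo_module, and the PKG-INFO field values are hypothetical, and the metadata is a bare minimum rather than what setuptools would generate.

```python
import os
import sys
import tempfile
import zipfile


def make_demo_egg(directory: str) -> str:
    """Write a tiny zipped .egg (hypothetical names) and return its path."""
    egg_path = os.path.join(directory, "demo_project-1.0.egg")
    with zipfile.ZipFile(egg_path, "w") as zf:
        # Code sits at the egg's base location, laid out like any sys.path entry.
        zf.writestr("demo_module.py", "GREETING = 'hello from inside the egg'\n")
        # Metadata sits in the EGG-INFO subdirectory.
        zf.writestr(
            "EGG-INFO/PKG-INFO",
            "Metadata-Version: 1.0\nName: demo-project\nVersion: 1.0\n",
        )
    return egg_path


with tempfile.TemporaryDirectory() as tmp:
    egg = make_demo_egg(tmp)
    sys.path.insert(0, egg)  # the .egg itself is the base location
    try:
        import demo_module  # served by Python's zipimport machinery
        print(demo_module.GREETING)
    finally:
        sys.path.remove(egg)
```

Nothing egg-specific is needed for the import itself; the EGG-INFO directory is ignored by the import system and only matters to tools that look for metadata.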
If eggs contained only code and resources, there would of course be
no difference between them and any other directory or zip file on
sys.path. Thus, metadata must also be included, using a metadata
file or directory.
For the .egg format, the metadata is placed in an EGG-INFO subdirectory, directly within the .egg file or directory. For the .egg-info format, metadata is stored directly within the .egg-info directory itself.
The minimum project metadata that all eggs must have is a standard PKG-INFO file, named PKG-INFO and placed within the metadata directory appropriate to the format. Because it’s possible for this to be the only metadata file included, .egg-info format eggs are not required to be a directory; they can just be a .egg-info file that directly contains the PKG-INFO metadata. This eliminates the need to create a directory just to store one file. This option is not available for .egg formats, since setuptools always includes other metadata. (In fact, setuptools itself never generates .egg-info files, either; the support for using files was added so that the requirement could easily be satisfied by other tools, such as distutils.)
In addition to the
PKG-INFO file, an egg’s metadata directory may
also include files and directories representing various forms of
optional standard metadata (see the section on Standard Metadata,
below) or user-defined metadata required by the project. For example,
some projects may define a metadata format to describe their application
plugins, and metadata in this format would then be included by plugin
creators in their projects’ metadata directories.
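The following sketch shows where the PKG-INFO metadata lives for each of the formats described above. It is an illustration under stated assumptions, not a pkg_resources API: find_pkg_info() is a hypothetical helper, it decides by filename extension only, and it handles unpacked .egg directories only (a zipped .egg would have to be read through zipfile instead).

```python
import os


def find_pkg_info(egg_path: str) -> str:
    """Return the path of the PKG-INFO metadata for an egg (hypothetical helper)."""
    if egg_path.endswith(".egg"):
        # .egg format: metadata sits in an EGG-INFO subdirectory.
        return os.path.join(egg_path, "EGG-INFO", "PKG-INFO")
    if egg_path.endswith(".egg-info"):
        if os.path.isdir(egg_path):
            # .egg-info directory: metadata files live directly inside it.
            return os.path.join(egg_path, "PKG-INFO")
        # .egg-info file: the file itself *is* the PKG-INFO metadata.
        return egg_path
    raise ValueError(f"not an egg: {egg_path!r}")
```

The single-file case is what lets a tool satisfy the minimum requirement without creating a directory for one file.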
To allow introspection of installed projects and runtime resolution of inter-project dependencies, a certain amount of information is embedded in egg filenames. At a minimum, this includes the project name, and ideally will also include the project version number. Optionally, it can also include the target Python version and required runtime platform if platform-specific C code is included. The syntax of an egg filename is as follows:
name ["-" version ["-py" pyver ["-" required_platform]]] "." ext
The “name” and “version” should be escaped using safe_name() and safe_version() respectively, then using to_filename(). Note that the escaping is irreversible and the original name can only be retrieved from the distribution metadata. For a detailed description of these transformations, please see the “Parsing Utilities” section of the pkg_resources manual.
The “pyver” string is the Python major and minor version (e.g. 2.7), as found in the first 3 characters of sys.version. “required_platform” is essentially a distutils get_platform() string, but with enhancements to properly distinguish Mac OS versions. (See the documentation in the “Platform Utilities” section of the pkg_resources manual for more details.)
Finally, the “ext” is either .egg or .egg-info, as appropriate for the egg’s format.
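As an illustration of the filename grammar, the sketch below assembles an egg filename from its parts. The escaping helpers are simplified re-implementations written for this example, modeled on pkg_resources’ safe_name(), safe_version() and to_filename(); in particular, the real safe_version() first tries to normalize the version per PEP 440, which this sketch skips. egg_filename() and PYVER are hypothetical names. The pyver here is taken from sys.version_info rather than by slicing sys.version, because a 3-character slice yields "3.1" on Python 3.10 and later.

```python
import re
import sys


def safe_name(name: str) -> str:
    # Runs of characters other than letters, digits and '.' become one '-'.
    return re.sub(r"[^A-Za-z0-9.]+", "-", name)


def safe_version(version: str) -> str:
    # Simplified fallback only: spaces become '.', other unsafe runs become '-'.
    version = version.replace(" ", ".")
    return re.sub(r"[^A-Za-z0-9.]+", "-", version)


def to_filename(part: str) -> str:
    # '-' separates the filename fields, so it is escaped as '_'.
    return part.replace("-", "_")


def egg_filename(name, version=None, pyver=None, platform=None, ext=".egg"):
    """Build name ["-" version ["-py" pyver ["-" platform]]] ext."""
    parts = [to_filename(safe_name(name))]
    if version is not None:
        parts.append(to_filename(safe_version(version)))
        if pyver is not None:
            parts.append("py" + pyver)
            if platform is not None:
                parts.append(platform)
    return "-".join(parts) + ext


# Major.minor taken from version_info, which stays correct for 3.10+.
PYVER = "{}.{}".format(*sys.version_info[:2])
```

For instance, egg_filename("My Project", "1.0 beta", "3.8") yields My_Project-1.0.beta-py3.8.egg.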
Normally, an egg’s filename should include at least the project name and version, as this allows the runtime system to find desired project versions without having to read the egg’s PKG-INFO to determine its version number.
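Going the other way, a runtime can recover these fields from a filename without opening PKG-INFO. This is a hedged sketch using a hypothetical regular expression, EGG_NAME, that follows the grammar above; because the to_filename() escaping is irreversible, the recovered name is the escaped form (e.g. My_Project), not necessarily the original project name. The platform group is allowed to contain "-" because platform strings such as linux-x86_64 do.

```python
import re

# name ["-" version ["-py" pyver ["-" required_platform]]] "." ext
EGG_NAME = re.compile(
    r"""
    (?P<name>[^-]+)
    (?:
        -(?P<version>[^-]+)
        (?:
            -py(?P<pyver>[^-]+)
            (?:-(?P<platform>.+))?     # platform may itself contain '-'
        )?
    )?
    \.(?P<ext>egg|egg-info)
    """,
    re.VERBOSE,
)


def parse_egg_filename(filename: str) -> dict:
    """Split an egg filename into its fields; missing fields are None."""
    match = EGG_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an egg filename: {filename!r}")
    return match.groupdict()
```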
Setuptools, however, only includes the version number in the filename when an .egg file is built using the bdist_egg command, or when an .egg-info directory is being installed by the install_egg_info command. When generating metadata for use with the original source tree, it only includes the project name, so that the directory will not have to be renamed each time the project’s version changes.
This is especially important when version numbers change frequently, and the source metadata directory is kept under version control with the rest of the project. (As would be the case when the project’s source includes project-defined metadata that is not generated by setuptools from data in the setup script.)
Standard Metadata
In addition to the minimum required PKG-INFO metadata, projects can include a variety of standard metadata files or directories, as described below. Except as otherwise noted, these files and directories are automatically generated by setuptools, based on information supplied in the setup script or through analysis of the project’s code and resources.
Most of these files and directories are generated via “egg-info writers” during execution of the setuptools egg_info command, and are listed in the egg_info.writers entry point group defined by setuptools itself. Project authors can register their own metadata writers as entry points in this group (as described in the setuptools manual under “Adding new EGG-INFO Files”) to cause setuptools to generate project-specific metadata files or directories during execution of the egg_info command. It is up to project authors to document these new metadata formats, if they create any.
.txt File Formats
Files described in this section that have .txt extensions have a simple lexical format consisting of a sequence of text lines, each line terminated by a linefeed character (regardless of platform). Leading and trailing whitespace on each line is ignored, as are blank lines and lines whose first nonblank character is a # (comment symbol). (This is the parsing format defined by the yield_lines() function of the pkg_resources module.)
All .txt files defined by this section follow this format, but some are also “sectioned” files, meaning that their contents are divided into sections, using square-bracketed section headers akin to Windows .ini format. Note that this does not imply that the lines within the sections follow an .ini format, however. Please see an individual metadata file’s documentation for a description of what the lines and section names mean in that particular file.
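Both conventions can be sketched in a few lines of Python. The yield_lines() and split_sections() below are self-contained re-implementations written for this example, modeled on the pkg_resources functions of the same names; the real versions also accept iterables of strings, and here a line that starts with "[" but does not end with "]" is simply kept as content, whereas the real function may reject it.

```python
from typing import Iterator, List, Optional, Tuple


def yield_lines(text: str) -> Iterator[str]:
    """Yield meaningful lines: stripped, skipping blanks and '#' comments."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            yield line


def split_sections(text: str) -> Iterator[Tuple[Optional[str], List[str]]]:
    """Split a sectioned .txt file into (section_name, lines) pairs.

    Lines before the first header belong to an unnamed section (None).
    """
    section: Optional[str] = None
    content: List[str] = []
    for line in yield_lines(text):
        if line.startswith("[") and line.endswith("]"):
            # A new header closes the current section, unless it is an
            # empty unnamed section at the very top of the file.
            if section or content:
                yield section, content
            section = line[1:-1].strip()
            content = []
        else:
            content.append(line)
    # Emit whatever section is still open at end of input.
    yield section, content
```

For example, splitting "foo>=1.0\n[extra1]\nbar\n" produces (None, ["foo>=1.0"]) followed by ("extra1", ["bar"]).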
Sectioned files can be parsed using the split_sections() function; see the “Parsing Utilities” section of the pkg_resources manual for details.
requires.txt
This is a “sectioned” text file. Each section is a sequence of “requirements”, as parsed by the parse_requirements() function; please see the pkg_resources manual for the complete requirement parsing syntax.
The first, unnamed section (i.e., before the first section header) in this file is the project’s core requirements, which must be installed for the project to function. (Specified using the install_requires keyword to setup().)
The remaining (named) sections describe the project’s “extra” requirements, as specified using the extras_require keyword to setup(). The section name is the name of the optional feature, and the section body lists that feature’s dependencies.
Note that it is not normally necessary to inspect this file directly; pkg_resources.Distribution objects have a requires() method that can be used to obtain Requirement objects describing the project’s core and optional dependencies.
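To show how the setup() keywords map onto this layout, the following sketch renders install_requires and extras_require as requires.txt text. render_requires_txt() is a hypothetical helper; it ignores environment markers and any other processing setuptools applies, and sorting the extras is a choice made here for deterministic output, not documented setuptools behavior.

```python
from typing import Dict, List, Optional


def render_requires_txt(install_requires: List[str],
                        extras_require: Optional[Dict[str, List[str]]] = None) -> str:
    """Render dependency lists in the sectioned requires.txt layout (sketch)."""
    lines = list(install_requires)        # unnamed core section comes first
    for extra, reqs in sorted((extras_require or {}).items()):
        lines.append("")                  # blank lines are ignored by readers
        lines.append(f"[{extra}]")        # one named section per extra
        lines.extend(reqs)
    return "\n".join(lines) + "\n"
```

With ["requests>=2.0"] as core requirements and {"socks": ["PySocks"], "dev": ["pytest"]} as extras, the output has the core line first, then a [dev] section and a [socks] section.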
setup_requires.txt
Much like requires.txt, except that it represents the requirements specified by the setup_requires parameter to the Distribution.
depends.txt – Obsolete, do not create!
This file follows an identical format to requires.txt, but is obsolete and should not be used. The earliest versions of setuptools required users to manually create and maintain this file, so the runtime still supports reading it, if it exists. The new filename was created so that it could be automatically generated from setup() information without overwriting an existing hand-created depends.txt, if one was already present in the project’s source directory.
namespace_packages.txt – Namespace Package Metadata
A list of namespace package names, one per line, as supplied to the namespace_packages keyword to setup(). Please see the manuals for setuptools and pkg_resources for more information about namespace packages.
entry_points.txt – “Entry Point”/Plugin Metadata
This is a “sectioned” text file, whose contents encode the entry_points keyword supplied to setup(). All sections are named, as the section names specify the entry point groups in which the corresponding section’s entry points are registered.
Each section is a sequence of “entry point” lines, each parseable using the EntryPoint.parse classmethod; please see the pkg_resources manual for the complete entry point parsing syntax.
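A single entry point line has the shape name = module:attrs [extras], where the :attrs and [extras] parts are optional. The sketch below is a simplified, hand-rolled parser for that shape; parse_entry_point() and EntryPointSpec are hypothetical names, and the accepted character sets are assumptions, so EntryPoint.parse remains the reference for the full syntax.

```python
import re
from typing import NamedTuple, Tuple


class EntryPointSpec(NamedTuple):
    name: str
    module: str
    attrs: Tuple[str, ...]
    extras: Tuple[str, ...]


# "name = some.module:some.attr [extra1, extra2]"
_EP_LINE = re.compile(
    r"""
    (?P<name>.+?)\s*=\s*
    (?P<module>[\w.]+)\s*
    (?::\s*(?P<attrs>[\w.]+))?\s*
    (?:\[(?P<extras>[^\]]*)\])?\s*
    """,
    re.VERBOSE,
)


def parse_entry_point(line: str) -> EntryPointSpec:
    """Parse one entry point line (simplified sketch)."""
    match = _EP_LINE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"invalid entry point line: {line!r}")
    attrs = tuple(match["attrs"].split(".")) if match["attrs"] else ()
    extras = tuple(e.strip() for e in (match["extras"] or "").split(",") if e.strip())
    return EntryPointSpec(match["name"].strip(), match["module"], attrs, extras)
```

For example, the line main = mypkg.cli:run [color] parses to name main, module mypkg.cli, attribute path ("run",), and extras ("color",).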
Note that it is not necessary to parse this file directly; the pkg_resources module provides a variety of APIs to locate and load entry points automatically. Please see the setuptools and pkg_resources manuals for details on the nature and uses of entry points.
The scripts Subdirectory
This directory is currently only created for .egg files built by the bdist_egg command. It will contain copies of all of the project’s “traditional” scripts (i.e., those specified using the scripts keyword to setup()). This is so that they can be reconstituted when an .egg file is installed.
The scripts are placed here using the distutils’ standard
install_scripts command, so any
#! lines reflect the Python
installation where the egg was built. But instead of copying the
scripts to the local script installation directory, EasyInstall writes
short wrapper scripts that invoke the original scripts from inside the
egg, after ensuring that sys.path includes the egg and any eggs it
depends on. For more about script wrappers, see the section below on
Installation and Path Management Issues.
Zip Support Metadata
native_libs.txt
A list of C extensions and other dynamic link libraries contained in the egg, one per line. Paths are /-separated and relative to the egg’s base location.
This file is generated as part of bdist_egg processing, and as such only appears in .egg files (and .egg directories created by unpacking them). It is used to ensure that all libraries are extracted from a zipped egg at the same time, in case there is any direct linkage between them. Please see the Zip File Issues section below for more information on library and resource extraction from .egg files.
eager_resources.txt
A list of resource files and/or directories, one per line, as specified via the eager_resources keyword to setup(). Paths are /-separated and relative to the egg’s base location.
Resource files or directories listed here will be extracted simultaneously, if any of the named resources are extracted, or if any native libraries listed in native_libs.txt are extracted. Please see the setuptools manual for details on what this feature is used for and how it works, as well as the Zip File Issues section below.
zip-safe and not-zip-safe
These are zero-length files, and either one or the other should exist. If zip-safe exists, it means that the project will work properly when installed as an .egg zipfile, and conversely the existence of not-zip-safe means the project should not be installed as an .egg file. The zip_safe option to setuptools’ setup() determines which file will be written. If the option isn’t provided, setuptools attempts to make its own assessment of whether the package can work, based on code and content analysis.
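Reading and writing these flag files is straightforward. The helpers below are a hypothetical sketch operating on an unpacked metadata directory (such as EGG-INFO), not setuptools’ own code; writing one flag removes the other, so that at most one of the two files exists.

```python
import os
from typing import Optional


def write_zip_safe_flag(metadata_dir: str, safe: bool) -> None:
    """Create the appropriate zero-length flag file and remove the other one."""
    wanted = "zip-safe" if safe else "not-zip-safe"
    other = "not-zip-safe" if safe else "zip-safe"
    other_path = os.path.join(metadata_dir, other)
    if os.path.exists(other_path):
        os.remove(other_path)
    # Opening in "w" mode and closing immediately leaves a zero-length file.
    with open(os.path.join(metadata_dir, wanted), "w"):
        pass


def read_zip_safe_flag(metadata_dir: str) -> Optional[bool]:
    """True for zip-safe, False for not-zip-safe, None if neither file exists."""
    if os.path.exists(os.path.join(metadata_dir, "zip-safe")):
        return True
    if os.path.exists(os.path.join(metadata_dir, "not-zip-safe")):
        return False
    return None
```

Returning None for the "neither file" case leaves the choice of a default to the installing tool.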
If neither file is present at installation time, EasyInstall defaults to assuming that the project should be unzipped. (Command-line options to EasyInstall, however, take precedence even over an existing zip-safe or not-zip-safe file.)
Note that these flag files appear only in .egg files generated by bdist_egg, and in .egg directories created by unpacking such an .egg file.
top_level.txt – Conflict Management Metadata
This file is a list of the top-level module or package names provided by the project, one Python identifier per line.
Subpackages are not included; a project containing both a foo.bar and a foo.baz would include only one line, foo, in its top_level.txt.
This data is used by
pkg_resources at runtime to issue a warning if
an egg is added to
sys.path when its contained packages may have
already been imported.
(It was also once used to detect conflicts with non-egg packages at installation time, but in more recent versions, setuptools installs eggs in such a way that they always override non-egg packages, thus preventing a problem from arising.)
SOURCES.txt – Source Files Manifest
This file is roughly equivalent to the distutils’ MANIFEST file. The differences are as follows:
The filenames always use / as a path separator, which must be converted back to a platform-specific path whenever they are read.
The file is automatically generated by setuptools whenever the egg_info or sdist commands are run, and it is not user-editable.
Although this metadata is included with distributed eggs, it is not actually used at runtime for any purpose. Its function is to ensure that setuptools-built source distributions can correctly discover what files are part of the project’s source, even if the list had been generated using revision control metadata on the original author’s system.
In other words, SOURCES.txt has little or no runtime value for being included in distributed eggs, and it is possible that future versions of the bdist_egg and install_egg_info commands will strip it before installation or distribution. Therefore, do not rely on its being available outside of an original source directory or source distribution.
Other Technical Considerations
Zip File Issues
Although zip files resemble directories, they are not fully
substitutable for them. Most platforms do not support loading dynamic
link libraries contained in zipfiles, so it is not possible to directly
import C extensions from
.egg zipfiles. Similarly, there are many
existing libraries – whether in Python or C – that require actual
operating system filenames, and do not work with arbitrary “file-like”
objects or in-memory strings, and thus cannot operate directly on the
contents of zip files.
To address these issues, the
pkg_resources module provides a
“resource API” to support obtaining either the contents of a resource,
or a true operating system filename for the resource. If the egg
containing the resource is a directory, the resource’s real filename
is simply returned. However, if the egg is a zipfile, then the
resource is first extracted to a cache directory, and the filename
within the cache is returned.
The cache directory is determined by the pkg_resources API; please see the pkg_resources documentation for details.
The Extraction Process
Resources are extracted to a cache subdirectory whose name is based
on the enclosing
.egg filename and the path to the resource. If
there is already a file of the correct name, size, and timestamp, its
filename is returned to the requester. Otherwise, the desired file is
extracted first to a temporary name generated using
mkstemp(".$extract", dir=target_dir), and then its timestamp is set to
match the one in the zip file, before renaming it to its final name.
(Some collision detection and resolution code is used to handle the
fact that Windows doesn’t overwrite files when renaming.)
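The extraction steps above can be sketched with the standard library. extract_resource() is a hypothetical helper: the caller supplies target_dir rather than deriving a cache subdirectory from the .egg filename, directory resources are not handled, and os.replace() stands in for the collision-handling code mentioned above (it overwrites an existing destination on Windows as well as POSIX). Comparing timestamps at whole-second granularity is also a simplification.

```python
import os
import tempfile
import time
import zipfile


def extract_resource(zip_path: str, member: str, target_dir: str) -> str:
    """Extract one zip member into target_dir, reusing a matching cached copy."""
    with zipfile.ZipFile(zip_path) as zf:
        info = zf.getinfo(member)
        # Zip timestamps are local-time tuples; convert to seconds since epoch.
        mtime = time.mktime(info.date_time + (0, 0, -1))
        final_path = os.path.join(target_dir, *member.split("/"))
        os.makedirs(os.path.dirname(final_path), exist_ok=True)

        # Reuse an existing file of the right size and timestamp.
        if os.path.isfile(final_path):
            st = os.stat(final_path)
            if st.st_size == info.file_size and int(st.st_mtime) == int(mtime):
                return final_path

        # 1. Write to a temporary name in the destination directory.
        fd, tmp_path = tempfile.mkstemp(".$extract", dir=os.path.dirname(final_path))
        with os.fdopen(fd, "wb") as out:
            out.write(zf.read(member))
        # 2. Copy the zip entry's timestamp onto the temporary file.
        os.utime(tmp_path, (mtime, mtime))
        # 3. Move it into place under its final name.
        os.replace(tmp_path, final_path)
        return final_path
```

Writing under a temporary name first means another process never sees a partially written file under the final name.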
If a resource directory is requested, all of its contents are recursively extracted in this fashion, to ensure that the directory name can be used as if it were valid all along.
If the resource requested for extraction is listed in the native_libs.txt or eager_resources.txt metadata files, then all resources listed in either file will be extracted before the requested resource’s filename is returned, thus ensuring that all C extensions and data used by them will be simultaneously available.
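The eager-extraction rule amounts to a small set computation. resources_to_extract() is a hypothetical helper that only decides what to extract; directory entries in eager_resources.txt, which would need recursive extraction, are treated as opaque names here.

```python
from typing import Iterable, Set


def resources_to_extract(requested: str,
                         native_libs: Iterable[str],
                         eager_resources: Iterable[str]) -> Set[str]:
    """Return the set of resources to extract before `requested` is returned."""
    # Everything named in native_libs.txt or eager_resources.txt forms one group.
    group = set(native_libs) | set(eager_resources)
    if requested in group:
        # Extract the whole group together, so C extensions and the data
        # they use become available at the same time.
        return group
    # Ordinary resources are extracted on their own.
    return {requested}
```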
Extension Import Wrappers
Since Python’s built-in zip import feature does not support loading C extension modules from zipfiles, the setuptools bdist_egg command generates special import wrappers to make it work.
The wrappers are .py files (along with corresponding .pyc and/or .pyo files) that have the same module name as the corresponding C extension. These wrappers are located in the same package directory (or top-level directory) within the zipfile, so that, for example, foomodule.so will get a corresponding foo.py, while bar/baz.pyd will get a corresponding bar/baz.py.
These wrapper files contain a short stanza of Python code that asks
pkg_resources for the filename of the corresponding C extension,
then reloads the module using the obtained filename. This will cause
pkg_resources to first ensure that all of the egg’s C extensions
(and any accompanying “eager resources”) are extracted to the cache
before attempting to link to the C library.
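For illustration, the sketch below generates the source of such a wrapper. make_extension_wrapper() is hypothetical, and the stub text is modeled loosely on the stubs that recent setuptools versions write (older versions used the imp module instead of importlib.util), so the exact text varies by version. The stub relies on pkg_resources.resource_filename(), which triggers the extraction described in the previous section before the extension is loaded.

```python
def make_extension_wrapper(extension_filename: str) -> str:
    """Return source text for a .py stub that loads a C extension (sketch)."""
    return (
        "def __bootstrap__():\n"
        "    global __bootstrap__, __file__\n"
        "    import importlib.util\n"
        "    import pkg_resources\n"
        # Ask pkg_resources for a real filename, extracting if zipped.
        f"    __file__ = pkg_resources.resource_filename(__name__, {extension_filename!r})\n"
        "    del __bootstrap__\n"
        # Load the extension from that file under this module's name.
        "    spec = importlib.util.spec_from_file_location(__name__, __file__)\n"
        "    module = importlib.util.module_from_spec(spec)\n"
        "    spec.loader.exec_module(module)\n"
        "__bootstrap__()\n"
    )
```

For foomodule.so, the generated text would be stored as foo.py beside it in the zipfile.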
Note, by the way, that
.egg directories will also contain these
wrapper files. However, Python’s default import priority is such that
C extensions take precedence over same-named Python modules, so the
import wrappers are ignored unless the egg is a zipfile.
Installation and Path Management Issues
Python’s initial setup of sys.path is very dependent on the Python version and installation platform, as well as how Python was started (i.e., script vs. -m vs. interactive interpreter). In fact, Python also provides only two relatively robust ways to affect sys.path outside of direct manipulation in code: the PYTHONPATH environment variable, and .pth files.
However, with no cross-platform way to safely and persistently change environment variables, this leaves .pth files as EasyInstall’s only real option for persistent configuration of sys.path.
But .pth files are rather strictly limited in what they are allowed to do normally. They add directories only to the end of sys.path, after any locally-installed site-packages directory, and they are only processed in the site-packages directory to start with.
This is a double whammy for users who lack write access to that directory, because they can’t create a .pth file that Python will read, and even if a sympathetic system administrator adds one for them that calls site.addsitedir() to allow some other directory to contain .pth files, they won’t be able to install newer versions of anything that’s installed in the systemwide site-packages, because their paths will still be added after site-packages.
So EasyInstall applies two workarounds to solve these problems.
The first is that EasyInstall leverages .pth files’ “import” feature to manipulate sys.path and ensure that anything EasyInstall adds to a .pth file will always appear before both the standard library and the local site-packages directories. Thus, it is always possible for a user who can write a Python-read .pth file to ensure that their packages come first in their own environment.
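The “import” trick relies on the fact that site.addsitedir() executes .pth lines that begin with import. The following sketch demonstrates the idea in a temporary directory: the first import line records the current length of sys.path, a plain path line appends an egg directory, and the last import line moves everything appended since then to the front. These lines and demo_pth_prepend() are illustrative assumptions; the lines EasyInstall actually writes are more elaborate.

```python
import os
import site
import sys
import tempfile

# Illustrative .pth content: record len(sys.path), add an egg directory,
# then move every entry appended after the first line to the front.
PTH_TEMPLATE = (
    "import sys; sys.__plen = len(sys.path)\n"
    "{egg_dir}\n"
    "import sys; new = sys.path[sys.__plen:]; "
    "del sys.path[sys.__plen:]; sys.path[0:0] = new\n"
)


def demo_pth_prepend() -> list:
    """Process a demo .pth file via site.addsitedir() and return the new sys.path."""
    saved = list(sys.path)
    with tempfile.TemporaryDirectory() as site_dir:
        egg_dir = os.path.join(site_dir, "demo-1.0.egg")
        os.mkdir(egg_dir)  # plain path lines are only added if they exist
        with open(os.path.join(site_dir, "easy-install.pth"), "w") as f:
            f.write(PTH_TEMPLATE.format(egg_dir=egg_dir))
        try:
            # addsitedir() appends site_dir itself, then processes its .pth files.
            site.addsitedir(site_dir)
            return list(sys.path)
        finally:
            sys.path[:] = saved  # undo the demo's changes
```

In the returned list, the egg directory comes first, ahead of the standard library, while the site directory itself ends up at the end as usual.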
Second, when installing to a PYTHONPATH directory (as opposed to a “site” directory like site-packages) EasyInstall will also install a special version of the site module. Because it’s in a PYTHONPATH directory, this module will get control before the standard library version of site does. It will record the state of sys.path before invoking the “real” site module, and then afterwards it processes any .pth files found in PYTHONPATH directories, including all the fixups needed to ensure that eggs always appear before the standard library in sys.path, but are in a relative order to one another that is defined by their PYTHONPATH and .pth-prescribed sequence.
The net result of these changes is that sys.path order will be as follows at runtime:
1. The sys.argv[0] directory, or an empty string if no script is being executed.
2. All eggs installed by EasyInstall in any .pth file in each PYTHONPATH directory, in order first by PYTHONPATH order, then normal .pth processing order (which is to say alphabetical by .pth filename, then by the order of listing within each .pth file).
3. All eggs installed by EasyInstall in any .pth file in each “site” directory (such as site-packages), following the same ordering rules as for the ones on PYTHONPATH.
4. The PYTHONPATH directories themselves, in their original order.
5. Any paths from .pth files found on PYTHONPATH that were not eggs installed by EasyInstall, again following the same relative ordering rules.
6. The standard library and “site” directories, along with the contents of any .pth files found in the “site” directories.
Notice that sections 1, 4, and 6 comprise the “normal” Python setup for
sys.path. Sections 2 and 3 are inserted to support eggs, and
section 5 emulates what the “normal” semantics of
.pth files on
PYTHONPATH would be if Python natively supported them.
For further discussion of the tradeoffs that went into this design, as
well as notes on the actual magic inserted into
.pth files to make
them do these things, please see also the following messages to the
distutils-SIG mailing list:
EasyInstall never directly installs a project’s original scripts to a script installation directory. Instead, it writes short wrapper scripts that first ensure that the project’s dependencies are active on sys.path, before invoking the original script. These wrappers have a #! line that points to the version of Python that was used to install them, and their second line is always a comment that indicates the type of script wrapper, the project version required for the script to run, and information identifying the script to be invoked.
The format of this marker line is:
"# EASY-INSTALL-" script_type ": " tuple_of_strings "\n"
script_type is one of SCRIPT, DEV-SCRIPT, or ENTRY-SCRIPT. The tuple_of_strings is a comma-separated sequence of Python string constants. For SCRIPT and DEV-SCRIPT wrappers, there are two strings: the project version requirement, and the script name (as a filename within the scripts metadata subdirectory). For ENTRY-SCRIPT wrappers, there are three: the project version requirement, the entry point group name, and the entry point name. (See the “Automatic Script Creation” section in the setuptools manual for more information about entry point scripts.)
In each case, the project version requirement string will be a string parseable with the Requirement.parse() classmethod. The only difference between a SCRIPT wrapper and a DEV-SCRIPT is that a DEV-SCRIPT actually executes the original source script in the project’s source tree, and is created when the “setup.py develop” command is run. A SCRIPT wrapper, on the other hand, uses the “installed” script written to the EGG-INFO/scripts/ subdirectory of the corresponding .egg zipfile or directory. (.egg-info eggs do not have script wrappers associated with them, except in the “setup.py develop” case.)
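Because the marker line has a fixed shape, recovering it is simple. parse_wrapper_marker() below is a hypothetical sketch of how an uninstallation tool might read it: it checks the second line of the script and uses ast.literal_eval() to evaluate the tuple of strings without executing arbitrary code.

```python
import ast
import re
from typing import Tuple

# "# EASY-INSTALL-" script_type ": " tuple_of_strings
_MARKER = re.compile(r"# EASY-INSTALL-(?P<type>[A-Z-]+): (?P<args>.*)")


def parse_wrapper_marker(script_text: str) -> Tuple[str, Tuple[str, ...]]:
    """Return (script_type, strings) from a wrapper script's second line."""
    lines = script_text.splitlines()
    if len(lines) < 2:
        raise ValueError("script is too short to contain a marker line")
    match = _MARKER.fullmatch(lines[1].rstrip())
    if match is None:
        raise ValueError("no EASY-INSTALL marker on line 2")
    # literal_eval only accepts literals, so no code in the script is run.
    value = ast.literal_eval(match["args"])
    if isinstance(value, str):  # a lone string constant, not a tuple
        value = (value,)
    return match["type"], tuple(value)
```

For a made-up wrapper whose second line is # EASY-INSTALL-ENTRY-SCRIPT: 'Foo==1.0','console_scripts','foo', it returns ("ENTRY-SCRIPT", ("Foo==1.0", "console_scripts", "foo")).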
The purpose of including the marker line in generated script wrappers is to facilitate introspection of installed scripts, and their relationship to installed eggs. For example, an uninstallation tool could use this data to identify what scripts can safely be removed, and/or identify what scripts would stop working if a particular egg is uninstalled.