=========================================
 pyvenv support in Python 3.4 and beyond
=========================================

In Python 3.3, built-in support for virtual environments (venvs) was added via
the `pyvenv`_ command.  For building venvs using Python 3, this is
functionally equivalent to the standalone `virtualenv`_ tool, except that
before Python 3.4, a pyvenv-created venv didn't include pip and setuptools.

In Python 3.4, this was made even more convenient by the `automatic
inclusion`_ of the `pip`_ command into the venv so that third party libraries
can be easily installed from the Python Package Index (PyPI_).  The stdlib
module `ensurepip`_ runs whenever the ``pyvenv-3.4`` command creates a
venv.

This poses a problem for Debian.  ensurepip comes bundled with two third party
libraries, setuptools and pip itself, as these are requirements for pip to
function properly in the venv.  These are bundled in the ensurepip module of
the upstream Python 3.4 tarball as `universal wheels`_, essentially a zip of
the source code and a new ``dist-info`` metadata directory.  Upstream pip
itself comes bundled with a half dozen or so of *its* dependencies, except
that these are "vendorized", meaning their unpacked source code lives within
the pip module, under a submodule from which pip imports them rather than the
top-level package namespace.

To make matters worse, one of pip's vendorized dependencies, the `requests`_
module, *also* vendorizes a bunch of its own dependencies.  This stack of
vendorized and bundled third party libraries fundamentally violates the DFSG
and Debian policy against including code not built from source available
within Debian, and against including embedded "convenience" copies of code in
other packages.

It's worth noting that the virtualenv package actually suffers from the same
conflict, but its current solution in Debian is `incomplete`_.


Solving the conflict
====================

This conflict between Debian policy and upstream Python convenience must be
resolved, because pyvenv is the recommended way of creating venvs in Python 3,
and because at some point, the standalone virtualenv tool will be rewritten as
a thin layer above pyvenv.  Obviously, we want to provide the best Python
virtual environment experience to our developers while still adhering to
Debian policy.

The approach we've taken is layered and nuanced, so I'll provide a significant
amount of detail to explain both what we do and why.

The first thing to notice is how upstream ensurepip works to have its pip and
setuptools dependencies available, both at venv creation time and when
``<venv>/bin/pip`` is run.  When pyvenv-3.4 runs, it ends up calling the
following Python command::

    <venv>/bin/python -Im ensurepip --upgrade

This runs ensurepip's ``__main__.py`` module using the venv's Python in
isolation mode, with a switch to upgrade the setuptools and pip dependencies
(if, for example, they've been updated in a new micro version of Python).

Internally, ensurepip bootstraps itself by byte-copying its embedded wheels
into a temporary directory, putting those copied wheels on ``sys.path``, and
then calling into pip as a library.  Because wheels are just elaborate zips,
Python can execute (pure-Python) code directly from them, if they are on
``sys.path`` of course.  Once ensurepip has set up its execution environment,
it calls into pip to install both pip and setuptools into the newly created
venv.  If you poke inside the venv after successful creation, you'll see
unpacked pip and setuptools directories in the venv's ``site-packages``
directory.
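
As a minimal sketch of that mechanism (not ensurepip's actual code), the
following builds a tiny wheel-like zip and imports a module straight from it.
The ``demo_pkg`` module, the wheel file name, and both helper functions are
hypothetical names invented for illustration, and the archive omits the
``dist-info`` metadata that a real wheel carries::

    import importlib
    import os
    import sys
    import tempfile
    import zipfile


    def make_demo_wheel(directory):
        """Write a tiny wheel-like zip holding one pure-Python module."""
        wheel_path = os.path.join(
            directory, "demo_pkg-1.0-py2.py3-none-any.whl")
        with zipfile.ZipFile(wheel_path, "w") as archive:
            archive.writestr(
                "demo_pkg.py",
                "def greet():\n    return 'hello from a wheel'\n")
        return wheel_path


    def import_from_wheel(wheel_path, module_name):
        """Import a module from a wheel by placing the wheel on sys.path."""
        sys.path.insert(0, wheel_path)
        try:
            # Make sure the import system notices the newly created file.
            importlib.invalidate_caches()
            return importlib.import_module(module_name)
        finally:
            # Leave sys.path as we found it.
            sys.path.remove(wheel_path)


    with tempfile.TemporaryDirectory() as tmpdir:
        demo = import_from_wheel(make_demo_wheel(tmpdir), "demo_pkg")
        message = demo.greet()

The import succeeds without unpacking anything, which is why a directory of
wheels can serve as pip's execution environment.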

The important thing to note here is that ensurepip is *already* able to import
from and install wheels, and because wheels are self-contained single files
(of zipped content), it makes manipulating them quite easy.  In order to
minimize the delta from upstream (and to eventually work with upstream to
eliminate this delta), it seems optimal that Debian's solution should also be
based on wheels, and re-use as much of the existing machinery as possible.

The difference for Debian though is that we don't want to use the embedded pip
and setuptools wheels from upstream Python's ensurepip; we want to use wheels
created from the pip and setuptools *Debian* packages.  This would solve the
problem of distributing binary packages not built from source in Debian.

Thus, we modify the python-pip and python-setuptools packages to include new
binary packages ``python-pip-whl`` and ``python-setuptools-whl``, which contain
*only* the relevant universal wheels.  Those packages' ``debian/rules`` files
gain an extra command::

    python3 setup.py bdist_wheel --universal -d <path>

The ``bdist_wheel`` command is provided by the `wheel`_ package, which as of
this writing is newly available in Jessie.

Note that the names of the binary packages, and other details of when and how
wheels may be used in Debian, are described in `Debian Python Policy`_ 0.9.6 or
newer.

The universal wheels (i.e. pure-Python code compatible with both Python 2 and
Python 3) are built for pip and setuptools and installed into
``/usr/share/python-wheels`` when the python-{pip,setuptools}-whl packages are
installed.  These are not needed for normal operation of Python, so none of
them are installed by default.

However, this isn't enough.  Because the pip and setuptools wheels are
built from the *patched* and de-vendorized versions of the code in Debian, the
wheels will not contain their own recursive dependencies.  That's a good thing
for Debian policy compliance, but it does add complications to the stack of
hacks.

Using the same approach as for pip and setuptools, we *also* wheelify their
dependencies, recursively.  As of this writing, the packages needing to be
wheelified are (by Debian source package name):

 * chardet
 * distlib
 * html5lib
 * python-colorama
 * python-pip
 * python-setuptools
 * python-urllib3
 * requests
 * six

Most of these are DPMT maintained.  six, distlib, and colorama are not team
maintained, so coordination with those maintainers is required.  Also note
that the ``bdist_wheel`` command is a setuptools extension, so the projects
that use ``distutils.core.setup()`` by default must be patched to use
``setuptools.setup()`` instead.  This isn't a problem because there's no
functional difference relevant to those packages; they likely use
distutils.core to avoid a third party dependency on setuptools.

Each of these Debian source packages grows an additional binary package, just
like pip and setuptools, e.g. python-chardet-whl, which contains the universal
wheel for that package built from patched Debian source.  As above, when
installed, these binary packages drop their .whl files into the
``/usr/share/python-wheels`` directory.

Now comes the fun part.

In the python3.4 source package, we add a new binary package called
python3.4-venv.  This will only contain the ``/usr/bin/pyvenv-3.4``
executable, and its associated manpage.  It also includes all the run-time
dependencies to make pyvenv work *including the wheel packages described
above*.

(Technically speaking, you should substitute "Python 3.4 or later" for all
these discussions, and e.g. pyvenv-3.x for all versions subsequent to 3.4.)

Python's ensurepip module has been modified in the following ways (see
``debian/patches/ensurepip.diff``):

 * When ensurepip is run outside of a venv as root, it raises an exception.
   This use case is only to be supported by the separate python{,3}-pip
   packages.

 * When ensurepip is run inside of a venv, it copies all dependent wheels from
   ``/usr/share/python-wheels``.  This includes the direct dependencies pip
   and setuptools, as well as the recursive dependencies listed above.  The
   rest of the ensurepip machinery is unchanged: the wheels are still copied
   into a temporary directory and placed on ``sys.path``, however only the
   direct dependencies (i.e. pip and setuptools) are *installed* into the
   venv's ``site-packages`` directory.  The indirect dependencies are copied
   to ``<venv>/lib/python-wheels`` since they'll be needed by the venv's pip
   executable.

Why do we copy the indirect dependencies into ``<venv>/lib/python-wheels``
rather than also installing them into the venv's ``site-packages``?  It's
because pip requires a
very specific set of dependencies and we don't want pip to break when the user
upgrades or downgrades one of those packages, which is perfectly valid in a
venv.  It's exactly the same reason why pip vendorizes those libraries in the
first place; it's just that we're doing it in a more principled way (from the
point of view of the Debian distribution).
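
The copying step might look roughly like the following sketch.  It is not the
code in ``debian/patches/ensurepip.diff``; the function names, the fake wheel
file names and versions, and the temporary directories standing in for
``/usr/share/python-wheels`` and ``<venv>/lib/python-wheels`` are all
assumptions made for illustration::

    import glob
    import os
    import shutil
    import sys
    import tempfile


    def running_in_venv():
        """Return True when the interpreter runs inside a PEP 405 venv."""
        return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


    def copy_wheels(names, source_dir, dest_dir):
        """Copy the wheel of each named project from source_dir to dest_dir."""
        os.makedirs(dest_dir, exist_ok=True)
        copied = []
        for name in names:
            # Wheel file names start with the project name and version.
            pattern = os.path.join(source_dir, name + "-*.whl")
            for wheel in sorted(glob.glob(pattern)):
                shutil.copy(wheel, dest_dir)
                copied.append(os.path.basename(wheel))
        return copied


    with tempfile.TemporaryDirectory() as root:
        system_wheels = os.path.join(root, "system-wheels")
        venv_wheels = os.path.join(root, "venv", "lib", "python-wheels")
        os.makedirs(system_wheels)
        # Empty placeholder files stand in for real wheels.
        for fake in ("six-1.0-py2.py3-none-any.whl",
                     "chardet-1.0-py2.py3-none-any.whl"):
            open(os.path.join(system_wheels, fake), "w").close()
        copied = copy_wheels(["chardet", "six"], system_wheels, venv_wheels)
        present = sorted(os.listdir(venv_wheels))

Because the wheels are copied rather than installed, nothing the user later
does in ``site-packages`` can replace them.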

The final piece of the puzzle is that Debian's pip will, when run inside of a
venv, introspect ``<venv>/lib/python-wheels`` and put every .whl file it sees
there *at the front of its sys.path*.  Again, this is so that when pip runs,
it will find the versions of packages known to be good first, rather than any
other versions in the venv's ``site-packages``.

As an example of the bad things that can happen if you don't do this, try
installing nose2_ into the venv, followed by genshi_.  nose2 has a hard
requirement on a version of six that is older than the one used by pip
(indirectly).  This older version of six is compatible with genshi, but *not*
with pip, so once nose2 is installed, if pip didn't load its version of six
from the private wheel, the attempt to install genshi would fail with a
traceback.
As it is, with the wheels early enough on ``sys.path``, pip itself works just
fine so that both nose2 and genshi can live together in the venv.
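
The path manipulation itself is simple.  The sketch below is an illustration
of the idea rather than the code in Debian's pip; the ``wheels_first``
function, the placeholder wheel names, and the example ``site-packages`` path
are hypothetical::

    import glob
    import os
    import tempfile


    def wheels_first(path, wheel_dir):
        """Return a copy of path with every .whl in wheel_dir at the front."""
        wheels = sorted(glob.glob(os.path.join(wheel_dir, "*.whl")))
        # Keep the remaining entries in their original order.
        return wheels + [entry for entry in path if entry not in wheels]


    with tempfile.TemporaryDirectory() as wheel_dir:
        # Empty placeholder files stand in for the private wheels.
        for fake in ("requests-1.0-py2.py3-none-any.whl",
                     "six-1.0-py2.py3-none-any.whl"):
            open(os.path.join(wheel_dir, fake), "w").close()
        site_packages = "/path/to/venv/lib/python3.4/site-packages"
        new_path = wheels_first([site_packages], wheel_dir)
        leading = [os.path.basename(entry) for entry in new_path[:2]]

With the wheels ahead of ``site-packages``, an import of six from within pip
resolves to the private wheel even when a different six is installed in the
venv.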


Updating packages
=================

Inevitably, new versions of Python or the pyvenv dependent packages will
appear.  Unfortunately, as currently implemented (by both upstream ensurepip
and in our ensurepip patch), the versions of both the direct and indirect
dependencies are hardcoded in ``Lib/ensurepip/__init__.py``.  Whenever you
update any of the dependent packages as a Debian developer, you will need to:

 * *Test that the new version is compatible with ensurepip*.

 * Update the version numbers in the ``debian/control`` file, for the
   python3.x-venv binary package.

 * ``quilt push`` to the ensurepip patch, and update the version number in
   ``Lib/ensurepip/__init__.py``.

Then rebuild and upload python3.4.

Yes, this isn't ideal, and I am working with upstream to find a good solution
that we can share.
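
A small consistency check can help catch drift between the hardcoded versions
and the wheels actually available.  The following is only a sketch: the
function names and the version numbers are made up for illustration, and it
assumes wheel file names follow the PEP 427 naming scheme::

    import os
    import tempfile


    def wheel_versions(wheel_dir):
        """Map project name to version for every .whl file in wheel_dir."""
        versions = {}
        for filename in sorted(os.listdir(wheel_dir)):
            if not filename.endswith(".whl"):
                continue
            # Names look like name-version[-build]-python-abi-platform.whl
            fields = filename[:-len(".whl")].split("-")
            if len(fields) >= 5:
                versions[fields[0]] = fields[1]
        return versions


    def find_mismatches(expected, wheel_dir):
        """Return {name: (expected, found)} for versions that disagree."""
        found = wheel_versions(wheel_dir)
        return {
            name: (wanted, found.get(name))
            for name, wanted in expected.items()
            if found.get(name) != wanted
        }


    with tempfile.TemporaryDirectory() as wheel_dir:
        # Empty placeholder files with hypothetical versions.
        for fake in ("pip-1.0-py2.py3-none-any.whl",
                     "six-1.6-py2.py3-none-any.whl"):
            open(os.path.join(wheel_dir, fake), "w").close()
        expected = {"pip": "1.0", "six": "1.5", "setuptools": "2.0"}
        mismatches = find_mismatches(expected, wheel_dir)

A missing wheel is reported with ``None`` as the found version, so an
incomplete set of ``-whl`` packages shows up in the same report as a stale
version number.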


Author
======

Barry A. Warsaw <barry@debian.org>
2014-05-15



.. _pyvenv: http://legacy.python.org/dev/peps/pep-0405/
.. _virtualenv: https://pypi.python.org/pypi/virtualenv
.. _`automatic inclusion`: http://legacy.python.org/dev/peps/pep-0453/
.. _pip: https://pypi.python.org/pypi/pip
.. _PyPI: https://pypi.python.org/pypi
.. _ensurepip: https://docs.python.org/3/library/ensurepip.html
.. _`universal wheels`: http://legacy.python.org/dev/peps/pep-0427/
.. _requests: https://pypi.python.org/pypi/requests
.. _incomplete: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719767
.. _wheel: https://pypi.python.org/pypi/wheel
.. _nose2: https://pypi.python.org/pypi/nose2
.. _genshi: https://pypi.python.org/pypi/Genshi
.. _`Debian Python Policy`: https://www.debian.org/doc/packaging-manuals/python-policy/