author dmcmahill <dmcmahill@pkgsrc.org> 2003-03-16 13:45:12 +0000
committer dmcmahill <dmcmahill@pkgsrc.org> 2003-03-16 13:45:12 +0000
commit 5fb18e18270fb0f0c6692d5e1302fd4c17865b37
tree 24f941a112b56429754cf6d9d63cfa753fe268a3 /mk/bulk/parallel.txt
parent c76dd1e5ba08b2b038ba7968c3a57238e9e50c53
Add some notes with thoughts on what a parallel bulk build system should do.
Diffstat (limited to 'mk/bulk/parallel.txt')
-rw-r--r-- mk/bulk/parallel.txt 210
1 files changed, 210 insertions, 0 deletions
diff --git a/mk/bulk/parallel.txt b/mk/bulk/parallel.txt
new file mode 100644
index 00000000000..9b4bf96f4e2
--- /dev/null
+++ b/mk/bulk/parallel.txt
@@ -0,0 +1,210 @@
+# $Id: parallel.txt,v 1.1 2003/03/16 13:45:12 dmcmahill Exp $
+#
+
+These are my thoughts on how one would want a parallel bulk build to
+work.
+
+
+====================================================================
+Single Machine Build Process
+====================================================================
+
+The current (as of 2003-03-16) bulk build system works in the
+following manner:
+
+1) All installed packages are removed.
+
+2) Packages listed in the BULK_PREREQ variable are installed. This
+ must be done before step 3 as some packages (like xpkgwedge) can
+ affect the dependencies of other packages when installed.
+
+3) Each package directory is visited and its explicitly listed
+ dependencies are extracted and put in a 'dependstree' file. The
+ mk/bulk/tflat script is used to generate flattened dependencies
+ for all packages from this dependstree file in both the up and
+ down directions. The result is a file 'dependsfile' which has one
+ line per package that lists all build dependencies. Additionally,
+ a 'supportsfile' is created which has one line for each package
+ and lists all packages which depend upon the listed package.
+ Finally, tsort(1) is applied to the 'dependstree' file to
+ determine the correct build order for the bulk build. The build
+ order is stored in a 'buildorder' file. This is all achieved via
+ the 'bulk-cache' top level target. By extracting dependencies in
+ this fashion, we avoid highly redundant recursive make calls. For
+ example, we no longer need to use a recursive make to find the
+ dependencies for libtool literally thousands and thousands of
+ times throughout the build.
+
+4) During the build, the 'buildorder' file is consulted to figure out
+ which package should be built next. Then to build the package,
+ the following steps are taken:
+
+ a) Check for the existence of a '.broken' file in the package
+ directory. If this file exists, then the package is already
+ broken for some reason so move on to the next package.
+
+ b) Remove all packages which are not needed to build the current
+ package. This dependency list is obtained from the 'dependsfile'
+ created in step 3 and the BULK_PREREQ variable.
+
+ c) Install via pkg_add all packages which are needed to build the
+ current package. We are able to do this because we have been
+ building our packages in a bottom up order so all dependencies
+ should have been built.
+
+ d) Build and package the package.
+
+ e) If the package build fails, then we copy over the build log to
+ a .broken file and in addition, we consult the 'supportsfile' and
+ mark all packages which depend upon this one as broken by adding a
+ line to their .broken files (creating them if needed). By going
+ ahead and marking these packages as broken, we avoid wasting time
+ on them later.
+
+ f) Append the package directory name to the top level pkgsrc
+ '.make' file to indicate that we have processed this package.
+
+5) Run the mk/bulk/post-build script to collect the summary and
+ generate html pages and the email we've all seen.
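
The ordering part of step 3 can be sketched with tsort(1) alone. This
is only an illustration: the notes do not show the layout of the
'dependstree' file, so the sketch assumes one "dependency dependent"
pair per line. The function name and the package names are made up.

```sh
# Sketch only. Assumed 'dependstree' layout: "dependency dependent" pairs,
# one pair per line, which is the input form tsort(1) accepts.
build_order() {
    # tsort prints every name after all names that must precede it,
    # so dependencies come out ahead of the packages that need them.
    tsort "$1"
}

# Demo with made-up package names.
demo_tree=$(mktemp)
cat > "$demo_tree" <<'EOF'
devel/libtool graphics/png
graphics/png graphics/gimp
EOF
build_order "$demo_tree"
rm -f "$demo_tree"
```

In the demo, devel/libtool is listed before graphics/png, which is
listed before graphics/gimp.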
+
+====================================================================
+Single Machine Build Comments
+====================================================================
+
+There are several features of this approach that are worth mentioning
+explicitly.
+
+1) Packages are built in the correct order. We don't want to rebuild
+ the gnome meta-pkg and then rebuild gnome-libs for example.
+
+2) Restarting the build is a cheap operation. Remember that this
+ build can take weeks or more. In fact the 1.6 build took nearly 6
+ weeks on a sparc 20! If for some reason the build needs to be
+ interrupted, it can be easily restarted because in step 4f we keep
+ track of what has been built in a file. The lines in the build
+ script which control this are:
+
+ for pkgdir in `cat $ORDERFILE` ; do
+ if ! grep -q "^${pkgdir}\$" $BUILDLOG ; then
+ (cd $pkgdir && \
+ nice -n 20 ${BMAKE} USE_BULK_CACHE=yes bulk-package)
+ fi
+ done
+
+ In addition to storing the progress to disk, the bulk cache files
+ (the 'dependstreefile', 'dependsfile', 'supportsfile', and
+ 'orderfile') are stored on disk so they do not need to be
+ recreated if a build is stopped and then restarted.
+
+3) By leaving packages installed and only deleting the ones which are
+ not needed before each build, we reduce the amount of installing
+ and deinstalling needed during the build. For example, it is
+ quite common to build several packages in a row which all need GNU
+ make or perl.
+
+4) Using the 'supportsfile' to mark all packages which depend upon a
+ package which has just failed to build can greatly reduce the time
+ wasted on trying to build packages with known broken dependencies.
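
The marking described in step 4e and in comment 4 might look roughly
like the following. It is a sketch under assumptions, not the code the
bulk build uses: the 'supportsfile' is assumed to hold lines of the
form "pkg dependent1 dependent2 ...", and the function name and the
message text are invented for the example.

```sh
# mark_dependents_broken SUPPORTSFILE PKGSRCDIR FAILEDPKG
# Assumed 'supportsfile' layout: "pkg dependent1 dependent2 ..." per line.
mark_dependents_broken() {
    supports=$1
    root=$2
    failed=$3
    # Fields 2..NF on the failed package's line are its dependents.
    awk -v p="$failed" '$1 == p { for (i = 2; i <= NF; i++) print $i }' "$supports" |
    while read -r dep; do
        # >> creates the .broken file if needed, otherwise appends to it.
        echo "$failed failed to build" >> "$root/$dep/.broken"
    done
}

# Demo in a throwaway tree with made-up package names.
demo=$(mktemp -d)
mkdir -p "$demo/devel/app1" "$demo/devel/app2"
printf 'devel/lib devel/app1 devel/app2\n' > "$demo/supports"
mark_dependents_broken "$demo/supports" "$demo" devel/lib
cat "$demo/devel/app1/.broken"
rm -rf "$demo"
```

Because the dependents are flattened ahead of time, one lookup is
enough and no recursion is needed at failure time.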
+
+====================================================================
+Parallel Build Thoughts
+====================================================================
+
+To exploit multiple machines in an attempt to reduce the build time,
+many of the same ideas used in the single machine build can still be
+used. My view of how a parallel build should work is detailed here.
+
+master == master machine. This machine is in charge of directing
+ the build and may or may not actively participate in it.
+ In addition, this machine might not be of the same
+ architecture or operating system as the slaves (unless it
+ is to be used as a slave as well).
+
+slave#x == slave machine #x. All slave machines are of the same
+ MACHINE_ARCH and have the same operating system and access
+ the same pkgsrc tree via NFS and access the same binary
+ packages directory.
+
+ If the master machine is also to be used as a build
+ machine, then it is also considered a slave.
+
+Prior to starting the build, the master directs one of the slaves to
+extract the dependency information per steps 1-3 in the single machine
+case.
+
+The actual build should progress as follows:
+
+1) For each slave which needs a job, the master assigns a package to
+ build based on the rule that only packages that have had all their
+ dependencies built will be sent to slaves for compilation.
+
+2) When a slave finishes, the master either notes that the binary
+ package is now available for use as a depends _or_ notes failure
+ and marks all packages which depend upon it as broken as in step
+ 4e of the single machine build.
+
+
+Each slave builds a package in the same way as it would in a single
+machine build (steps 4a-d).
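
The assignment rule in step 1 (only hand out packages whose
dependencies have all been built) could be sketched as below. This
assumes the 'dependsfile' has the form "pkg dep1 dep2 ..." and that
the master keeps a hypothetical file of finished packages, one per
line. Skipping broken packages and packages already assigned to a
slave is left out to keep the sketch short.

```sh
# ready_packages DEPENDSFILE BUILTFILE
# Prints packages that are not built yet but whose dependencies all are.
ready_packages() {
    awk '
        # First file: remember every package that is already built.
        FILENAME == ARGV[1] { built[$1] = 1; next }
        # Second file: "pkg dep1 dep2 ..." (assumed layout).
        !($1 in built) {
            ok = 1
            for (i = 2; i <= NF; i++) if (!($i in built)) ok = 0
            if (ok) print $1
        }
    ' "$2" "$1"
}

# Demo with made-up package names.
demo=$(mktemp -d)
printf 'devel/gmake\nlang/perl\nwww/app devel/gmake lang/perl\nwww/site www/app\n' > "$demo/depends"
printf 'devel/gmake\nlang/perl\n' > "$demo/built"
ready_packages "$demo/depends" "$demo/built"   # www/app is the only candidate
rm -rf "$demo"
```

Keeping the list of built packages in a file rather than in memory
also fits the requirement that master state survive a restart.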
+
+====================================================================
+Important Parallel Build Considerations
+====================================================================
+
+
+1) Security. Packages are installed as root prior to packaging.
+
+2) All state kept by the master should be stored to disk to
+ facilitate restarting a build. Remember this could take weeks so
+ we don't want to have to start over.
+
+3) The master needs to be able to monitor all slaves for signs of
+ life. I.e., if a slave machine is simply shut off, the master
+ should detect that it's no longer there and re-assign that slave's
+ current job.
+
+3a) The master must be able to distinguish between a slave failing to
+ compile a package due to the package failing vs a
+ network/power/disk/etc. failure. The former causes the package to
+ be marked as broken, the latter causes the slave to be marked as
+ broken.
+
+4) Security.
+
+5) Ability to add and remove slaves from the cluster during a build.
+ Again, a build may take a long time so we want to add/remove
+ slaves while the build is in progress.
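
One possible way to draw the distinction in item 3a, assuming jobs
are dispatched over ssh (the notes only raise that as a question):
ssh exits with the status of the remote command, or with 255 if ssh
itself hit an error. The function name and the three labels are
invented for this sketch.

```sh
# classify_result EXIT_STATUS
# Maps the exit status of an "ssh slave build-command" run to an outcome.
classify_result() {
    case $1 in
        0)   echo built ;;            # remote build succeeded
        255) echo slave-failed ;;     # ssh itself failed (host down, network)
        *)   echo package-broken ;;   # remote build ran and reported failure
    esac
}

classify_result 0
classify_result 255
classify_result 1
```

A remote command that itself exits with 255 would be misread as a
slave failure, so a real implementation would need the build wrapper
to avoid that status, and would still need a timeout for slaves that
hang without dropping the connection.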
+
+====================================================================
+Additional Thoughts
+====================================================================
+
+This is mostly related to using slaves which are not on a local
+network.
+
+- maybe a hook could be put in place which rsyncs the binary package
+ tree between the binary package repository machine and the slave
+ machine before and after each package is built?
+
+- security
+
+- Support for kerberos?
+
+====================================================================
+Implementation Thoughts
+====================================================================
+
+- Can this all be written around using ssh to send out tasks? How do
+ we monitor slaves for signs of life? How do we indicate 'build
+ failed/build succeeded/slave failed' conditions?
+
+- Maybe we could have a file listing slaves and the master consults
+ this each time it needs a slave. That would make adding/removing
+ slaves easy. There would need to be another file to keep track of
+ which slaves are busy (and with what).
+
+- Do we want to use something like pvm instead? There is a
+ p5-Parallel-Pvm package and perl nicely deals with parsing some of
+ these files and sorting dependencies although I hate to add any
+ extra dependencies to the build system.
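
The two-file idea above (one file listing slaves, one tracking which
are busy) can be sketched as follows. The file layouts are
assumptions: one host name per line in the slaves file, and "host
package" per line in the busy file. The function and host names are
made up.

```sh
# idle_slave SLAVESFILE BUSYFILE
# Prints the first slave listed in SLAVESFILE that has no entry in BUSYFILE.
idle_slave() {
    awk '
        # First file read is the busy list: "host package" per line.
        FILENAME == ARGV[1] { busy[$1] = 1; next }
        # Second file is the slave list: first non-busy host wins.
        NF && !($1 in busy) { print $1; exit }
    ' "$2" "$1"
}

# Demo with made-up host names.
demo=$(mktemp -d)
printf 'slave1\nslave2\nslave3\n' > "$demo/slaves"
printf 'slave1 devel/gmake\n' > "$demo/busy"
idle_slave "$demo/slaves" "$demo/busy"   # slave2 is the first idle host
rm -rf "$demo"
```

Since the slaves file is re-read on every call, adding or removing a
line takes effect the next time the master looks for a free machine.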
+