path: root/parallel/slurm
Age | Commit message | Author | Files | Lines
2013-07-12 | Bump PKGREVISION of all packages which create users, to pick up change of sysutils/user_* packages. | jperkin | 1 | -2/+2
2013-05-31 | Bump all packages for perl-5.18, that a) refer to 'perl' in their Makefile, or b) have a directory name of p5-*, or c) have any dependency on any p5-* package. | wiz | 1 | -2/+2
Like last time, where this caused no complaints.
2013-02-06 | PKGREVISION bumps for the security/openssl 1.0.1d update. | jperkin | 1 | -2/+2
2012-10-03 | Bump all packages that use perl, or depend on a p5-* package, or are called p5-*. I hope that's all of them. | wiz | 1 | -1/+2
2012-09-11 | "user-destdir" is default these days | asau | 1 | -3/+1
2012-07-03 | Update to SLURM 2.4.1 | asau | 3 | -8/+10
* Changes in SLURM 2.4.1
========================
 -- Fix bug for job state change from 2.3 -> 2.4: job state can now be preserved correctly when transitioning. This also applies for 2.4.0 -> 2.4.1; no state will be lost. (Thanks to Carles Fenoy)

* Changes in SLURM 2.4.0
========================
 -- Cray - Improve support for zero compute node resource allocations. Partition used can now be configured with no nodes.
 -- BGQ - make it so srun -i<taskid> works correctly.
 -- Fix parse_uint32/16 to complain if a non-digit is given.
 -- Add SUBMITHOST to job state passed to Moab via sched/wiki2. Patch by Jon Bringhurst (LANL).
 -- BGQ - Fix issue when running with AllowSubBlockAllocations=Yes without compiling with --enable-debug
 -- Modify scontrol to require "-dd" option to report batch job's script. Patch from Don Albert, Bull.
 -- Modify SchedulerParameters option to match documentation: "bf_res=" changed to "bf_resolution=". Patch from Rod Schultz, Bull.
 -- Fix bug that clears job pending reason field. Patch from Don Lipari, LLNL.
 -- In etc/init.d/slurm move check for scontrol after sourcing /etc/sysconfig/slurm. Patch from Andy Wettstein, University of Chicago.
 -- Fix in scheduling logic that can delay jobs with min/max node counts.
 -- BGQ - fix issue where if a step uses the entire allocation and then the next step in the allocation only uses part of the allocation it gets the correct cnodes.
 -- BGQ - Fix checking for IO on a block with new IBM driver V1R1M1; the previous function didn't always work correctly.
 -- BGQ - Fix issue when a nodeboard goes down and you want to combine blocks to make a larger small block and are running with sub-blocks.
 -- BLUEGENE - Better logic for making small blocks around bad nodeboard/card.
 -- BGQ - When using an old IBM driver, cnodes that go into error because of a job kill timeout aren't always reported to the system. This is now handled by the runjob_mux plugin.
 -- BGQ - Added information on how to setup the runjob_mux to run as SlurmUser.
 -- Improve memory consumption on step layouts with high task count.
 -- BGQ - quieter debug when the real time server comes back but there are still messages we find when we poll but haven't given it back to the real time yet.
 -- BGQ - fix for if a request comes in smaller than the smallest block and we must use a small block instead of a shared midplane block.
 -- Fix issues on large jobs (>64k tasks) to have the correct counter type when packing the step layout structure.
 -- BGQ - fix issue where if a user was asking for tasks and ntasks-per-node but not node count the node count is correctly figured out.
 -- Move logic to always use the 1st alphanumeric node as the batch host for batch jobs.
 -- BLUEGENE - fix race condition where if a nodeboard/card goes down at the same time a block is destroyed and that block just happens to be the smallest overlapping block over the bad hardware.
 -- Fix bug when querying accounting looking for a job node size.
 -- BLUEGENE - fix possible race condition if cleaning up a block and the removal of the job on the block failed.
 -- BLUEGENE - fix issue if a cable was in an error state make it so we can check if a block is still makable if the cable wasn't in error.
 -- Put node names in alphabetic order in node table.
 -- If preempted job should have a grace time and preempt mode is not cancel but job is going to be canceled because it is interactive or other reason it now receives the grace time.
 -- BGQ - Modified documents to explain new plugin_flags needed in bg.properties in order for the runjob_mux to run correctly.
 -- BGQ - change linking from libslurm.o to libslurmhelper.la to avoid warning.

* Changes in SLURM 2.4.0.rc1
============================
 -- Improve task binding logic by making fuller use of HWLOC library, especially with respect to Opteron 6000 series processors. Work contributed by Komoto Masahiro.
 -- Add new configuration parameter PriorityFlags, based upon work by Carles Fenoy (Barcelona Supercomputer Center).
 -- Modify the step completion RPC between slurmd and slurmstepd in order to eliminate a possible deadlock. Based on work by Matthieu Hautreux, CEA.
 -- Change the owner of slurmctld and slurmdbd log files to the appropriate user. Without this change the files will be created by and owned by the user starting the daemons (likely user root).
 -- Reorganize the slurmstepd logic in order to better support NFS and Kerberos credentials via the AUKS plugin. Work by Matthieu Hautreux, CEA.
 -- Fix bug in allocating GRES that are associated with specific CPUs. In some cases the code allocated first available GRES to job instead of allocating GRES accessible to the specific CPUs allocated to the job.
 -- spank: Add callbacks in slurmd: slurm_spank_slurmd_{init,exit} and job epilog/prolog: slurm_spank_job_{prolog,epilog}
 -- spank: Add spank_option_getopt() function to api
 -- Change resolution of switch wait time from minutes to seconds.
 -- Added CrpCPUMins to the output of sshare -l for those using hard limit accounting. Work contributed by Mark Nelson.
 -- Added mpi/pmi2 plugin for complete support of pmi2 including acquiring additional resources for newly launched tasks. Contributed by Hongjia Cao, NUDT.
 -- BGQ - fixed issue where if a user asked for a specific node count and more tasks than possible without overcommit the request would be allowed on more nodes than requested.
 -- Add support for new SchedulerParameters of bf_max_job_user, maximum number of jobs to attempt backfilling per user. Work by Bjørn-Helge Mevik, University of Oslo.
 -- BLUEGENE - fixed issue where MaxNodes limit on a partition only limited larger than midplane jobs.
 -- Added cpu_run_min to the output of sshare --long. Work contributed by Mark Nelson.
 -- BGQ - allow regular users to resolve Rack-Midplane to AXYZ coords.
 -- Add sinfo output format option of "%R" for partition name without "*" appended for default partition.
 -- Cray - Add support for zero compute node resource allocation to run batch script on front-end node with no ALPS reservation. Useful for pre- or post-processing.
 -- Support for cyclic distribution of cpus in task/cgroup plugin from Martin Perry, Bull.
 -- GrpMEM limit for QOSes and associations added. Patch from Bjørn-Helge Mevik, University of Oslo.
 -- Various performance improvements for up to 500% higher throughput depending upon configuration. Work supported by the Oak Ridge National Laboratory Extreme Scale Systems Center.
 -- Added jobacct_gather/cgroup plugin. It is not advised to use this in production as it isn't currently complete and doesn't provide an equivalent substitution for jobacct_gather/linux yet. Work by Martin Perry, Bull.
2012-07-03 | Add "latest" subdirectory to look for distfiles. | asau | 1 | -1/+2
2012-06-13 | Distfile has moved to archive subdirectory. | asau | 1 | -2/+3
2012-03-20 | Import SLURM 2.4.0pre4 as parallel/slurm | asau | 12 | -0/+517
SLURM is an open-source resource manager designed for Linux clusters of all sizes. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (computer nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.