The Flux Framework: A User-Space Task Scheduler
Think of the Flux Framework as a "scheduler-in-a-scheduler" that allows any user to schedule many mixed tasks within their own batch jobs across multiple HPC compute nodes. One can pack a variety of small tasks efficiently onto the assigned compute nodes and keep them queued until all are complete. It supports any mix of serial executables/scripts, MPI, OpenMP, and most other executable types typically run in a batch script. It can be installed in user space with the Spack HPC package manager. It also supports advanced scheduler features such as task dependencies, multiple logging options, task completion notifications, and even a Python interface (sketched below).
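As a minimal sketch of that Python interface, the snippet below queues a mix of serial, MPI, and OpenMP tasks under a running Flux instance and waits for them all to finish. It assumes the flux Python bindings (e.g., from a Spack-installed Flux) are on the Python path and that the script runs inside a Flux instance started within the batch allocation; the task commands (./serial_task.sh, ./mpi_app, ./omp_app) and core counts are hypothetical placeholders, not part of the presentation material.

    import os

    import flux
    import flux.job
    from flux.job import JobspecV1

    h = flux.Flux()  # connect to the enclosing Flux instance

    # One single-core serial task.
    serial = JobspecV1.from_command(command=["./serial_task.sh"],
                                    num_tasks=1, cores_per_task=1)
    serial.environment = dict(os.environ)

    # A 16-rank MPI task; Flux packs the ranks onto available cores.
    mpi = JobspecV1.from_command(command=["./mpi_app"],
                                 num_tasks=16, cores_per_task=1)
    mpi.environment = dict(os.environ)

    # A single-rank OpenMP task using 8 cores (thread count set via the environment).
    omp = JobspecV1.from_command(command=["./omp_app"],
                                 num_tasks=1, cores_per_task=8)
    omp.environment = dict(os.environ, OMP_NUM_THREADS="8")

    # Submit everything; Flux queues the tasks and runs them as cores become free.
    jobids = []
    for spec in (serial, mpi, omp):
        spec.cwd = os.getcwd()
        jobids.append(flux.job.submit(h, spec, waitable=True))

    # Block until every submitted task has finished.
    for _ in jobids:
        result = flux.job.wait(h)
        print(result.jobid, "succeeded" if result.success else result.errstr)

The same submissions can also be made from Flux's command-line tools; the Python bindings are shown here only because the abstract calls them out.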
PET tested the Flux Framework on multiple HPC systems and measured the performance of mixed sample tasks (serial, MPI, OpenMP, hybrid MPI/OpenMP) queued and running concurrently on one or more compute nodes. There is no measurable performance penalty for using Flux until the node(s) are full of tasks. Once that happens, new tasks may be suboptimally assigned to remaining cores as they become available, but the performance loss is typically minor, and the approach remains far more efficient overall than under-utilizing the compute nodes.
Some system schedulers provide a similar capability; Slurm, for example, calls it "job steps." A performance comparison between Slurm job steps and Flux will be presented. However, HPCMP systems run different schedulers (PBS, Slurm, LSF), and not all of them provide this capability at this time. Moreover, the Flux Framework allows one to maintain the same set of scripts across all systems, regardless of the system scheduler.
This presentation will explain how to obtain and install the Flux Framework with Spack, how to use it, its basic and intermediate features, the Python interface, a brief comparison to Slurm job steps and other alternatives, and performance results that allow users to estimate the trade-offs for their own mixed-task workloads.
PRESENTER
Ziegeler, Sean
sean.ziegeler@gdit.com
228-363-2799
DoD HPCMP PET
CATEGORY
Other: HPC Software Tools
SYSTEMS USED
Narwhal, Nautilus, Applies to all systems
SECRET
No