Large Scale Hyperparameter Sweeps on Slurm
MMF provides a utility script for running large-scale hyperparameter sweeps on Slurm-based clusters. The script runs a grid search over all combinations of the values provided for each hyperparameter. Because it uses the dotlist overrides from MMF's configuration system, any configuration parameter can be overridden through the script. The script is based on the sweep scripts in FAIRSeq, authored by @myleott.
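To make the grid concrete: the sweep visits every combination of the listed values (their Cartesian product), so two learning rates and three batch sizes give six runs. The sketch below is not MMF's sweep library; it is a minimal standalone illustration using only the Python standard library. The helper name expand_grid and the config keys optimizer.params.lr and training.batch_size are assumptions chosen for illustration.

```python
import itertools


def expand_grid(grid):
    """Return one list of dotlist overrides per hyperparameter combination.

    `grid` maps a config key (hypothetical here) to the list of values to try.
    """
    keys = list(grid)
    runs = []
    # itertools.product yields every combination of one value per key.
    for values in itertools.product(*(grid[k] for k in keys)):
        runs.append([f"{k}={v}" for k, v in zip(keys, values)])
    return runs


# Assumed example keys and values; adjust to the config being swept.
grid = {
    "optimizer.params.lr": [1e-5, 5e-5],
    "training.batch_size": [32, 64, 128],
}

runs = expand_grid(grid)
for overrides in runs:
    print(" ".join(overrides))
```

The number of runs grows multiplicatively with each added hyperparameter, so it is worth checking the total before submitting anything to the cluster.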
An example script that sweeps over learning rate and batch size for MMBT on Hateful Memes would look like the following (assuming it lives at tools/sweeps/sweep_mmbt_hm.py):
An example command to run this sweep on 2 nodes with 8 GPUs each would be:
Tip: Add the --dry_run argument to first print exactly what is going to be run, without actually running it.
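The general dry-run pattern can be sketched as follows. This is an assumption-laden illustration of the idea, not how MMF's sweep script implements --dry_run: the function launch, the script name train.py, and the override shown are all hypothetical placeholders.

```python
import shlex
import subprocess


def launch(cmd, dry_run=False):
    """Print the command when dry_run is set; otherwise execute it."""
    rendered = shlex.join(cmd)  # shell-safe rendering (Python 3.8+)
    if dry_run:
        print(f"[dry run] {rendered}")
        return rendered
    subprocess.run(cmd, check=True)
    return rendered


# Hypothetical command; the script name and override are placeholders.
preview = launch(
    ["python", "train.py", "optimizer.params.lr=1e-05"],
    dry_run=True,
)
```

Previewing the rendered commands this way makes it easy to catch a mistyped override or an unexpectedly large grid before any job is queued.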
A more complex, real-world sweep config for VisualBERT with more options can be found at ./tools/sweeps/sweep_visual_bert.py. Modeled on the command above, the command to run it is: