MMF relies on OmegaConf for its configuration system and adds some sugar on top of it. We have developed MMF as a config-first framework. Most of the parameters/settings in MMF are configurable. MMF defines some default configuration settings for its system including datasets and models. Users can then update these values via their own config or a command line dotlist.
- MMF uses OmegaConf for its configuration system with some sugar on top.
- MMF defines a base defaults config containing all MMF-specific parameters, and each dataset and model then defines its own config (example configs: [model] [dataset]).
- The user can define their own config, specified by config=<x> at the command line, for each unique experiment or training setup. This has higher priority than the base, model and dataset default configs and can override anything in those.
- Finally, the user can override (highest priority) the final config generated by merging all of the above configs by specifying config parameters as a dotlist in their command. This is the recommended way of overriding config parameters in MMF.
- How does MMF know which config to pick for the dataset and model? The user needs to specify those in their command as dataset=<dataset_key> and model=<model_key>.
- Some of the MMF config parameters under the env field can be overridden by environment variables. Have a look at them.
To understand and use the MMF configuration system to its full extent, have a look at the OmegaConf docs, especially the sections on interpolation, access and configuration flags. MMF's config is currently in struct mode by default, and we plan to make it read-only in the future.
MMF follows a set of hierarchy rules to determine the final configuration values. The following list shows the building blocks of MMF's configuration in increasing order of priority (a higher rank overrides a lower rank).
- Base Defaults Config
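- Dataset's Config (defined in the dataset's config_path classmethod)
- Model's Config (defined in the model's config_path classmethod)
- User's Config (passed by the user as config=<x> in the command)
- Command Line DotList (passed by the user as an x.y.z=v dotlist in the command)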
Configs other than the base defaults can still add new nodes that are not in the base defaults config, so users can add their own config parameters when needed without changing the base defaults. If two nodes share the same path, the node in the higher-priority config overrides the one in the lower-priority config.
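The priority rules above can be pictured as a series of deep merges, applied from lowest to highest priority. The following is a minimal sketch in plain Python dictionaries rather than OmegaConf; the helper deep_merge and every key except training.batch_size are hypothetical and only illustrate the ordering, not how MMF implements it.

```python
from copy import deepcopy


def deep_merge(low, high):
    """Return a new dict in which values from `high` override those in `low`."""
    merged = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical configs, listed from lowest to highest priority.
base_defaults = {"training": {"batch_size": 512, "max_updates": 22000}}
dataset_config = {"dataset_config": {"my_dataset": {"use_images": True}}}
model_config = {"model_config": {"my_model": {"hidden_size": 768}}}
user_config = {"training": {"batch_size": 64}, "my_extra": {"note": "new node"}}
dotlist_overrides = {"training": {"batch_size": 32}}

final_config = {}
for layer in (base_defaults, dataset_config, model_config, user_config, dotlist_overrides):
    final_config = deep_merge(final_config, layer)

# The dotlist value wins for batch_size; untouched base values survive,
# and the user's new node is kept even though base defaults lack it.
print(final_config["training"])
print(final_config["my_extra"])
```

Because each layer is merged on top of the previous result, only the paths a layer actually defines are replaced.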
The full base defaults config can be seen below. This config is the base of MMF's configuration system and is included in all experiments. It sets up the nodes for training-related configuration as well as those that need to be filled in by other configs specified by the user. The main configuration parameters that the base defaults define are:
- training parameters
- distributed training parameters
- env parameters
- evaluation parameters
- checkpoint parameters
- run_type parameters
Each dataset registered to MMF can define its defaults config by specifying it in the classmethod config_path (example). If a processors key whose value is a dictionary is specified, the processors will be initialized by the dataset builder. If the dataset builder inherits from MMFDatasetBuilder, it will look for the images field in the configuration as well. A sample config for a builder inheriting MMFDatasetBuilder would look like:
Configs for the datasets packaged with MMF are present at mmf/configs/datasets. Each dataset also provides composable configs for standard variations of the dataset that differ from the default. These can be included directly in the user config with the includes directive.
Users need to specify the dataset they are using by adding the dataset=<dataset_key> option to their command.
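To make the pieces above concrete, here is a small sketch of the pattern: a builder that exposes its defaults through a config_path classmethod, and a check for a dictionary-valued processors key. ExampleDatasetBuilder, the path, the helper processor_names and all config keys other than processors and images are hypothetical stand-ins and are not part of MMF.

```python
class ExampleDatasetBuilder:
    """Hypothetical stand-in for a dataset builder; not an MMF class."""

    @classmethod
    def config_path(cls):
        # Hypothetical location of this dataset's defaults config.
        return "configs/datasets/example/defaults.yaml"


# Hypothetical dataset config, written as a dict instead of YAML.
dataset_config = {
    "images": {"train": ["example/images/train"]},
    "processors": {
        "text_processor": {"type": "example_text", "params": {"max_length": 14}},
    },
}


def processor_names(config):
    """Return the processor keys a builder would initialize, if any."""
    processors = config.get("processors")
    return sorted(processors) if isinstance(processors, dict) else []


print(ExampleDatasetBuilder.config_path())
print(processor_names(dataset_config))
```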
Similar to the dataset config, each model registered to MMF can define its own config. This is defined by the model's config_path classmethod (example). Configs for models live at mmf/configs/models. Like datasets, models also provide some variations, which can be used by including the configs for those variations in the user config.
Users need to specify the model they want to use by adding the model=<model_key> option to their command. A sample model config would look like:
Users can specify the configuration for a particular experiment or training setup by adding the config=<config_path> argument to their command. A user config can set, for example, training parameters specific to the experiment, such as the batch size via training.batch_size. The most common use case for a user config is to specify optimizer, scheduler and training parameters. A user config can also include configs for the model and dataset variations the user wants to test. Have a look at an example user config here.
Updating the configuration through dotlist syntax is very helpful when running multiple versions of an experiment without actually editing a config. For example, to override the batch size from the command line, you can add training.batch_size=x at the end of your command. Similarly, you can override an annotation in the hateful memes dataset by passing the dotlist path of that annotation node.
Command line dotlist overrides are our recommended way of updating config parameters, instead of manually editing the config for every change.
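As a rough picture of what a dotlist override does, the sketch below splits each x.y.z=v entry on its dots and writes the value into a nested dictionary. It is a simplified illustration in plain Python and not MMF's or OmegaConf's parser; the helpers parse_value and apply_dotlist and the optimizer.params.lr key are hypothetical.

```python
def parse_value(text):
    """Interpret a dotlist value as bool, int or float, else keep it as str."""
    lowered = text.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def apply_dotlist(config, dotlist):
    """Apply entries such as 'training.batch_size=32' to a nested dict in place."""
    for entry in dotlist:
        path, _, raw_value = entry.partition("=")
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            # Walk down the path, creating intermediate nodes when missing.
            node = node.setdefault(key, {})
        node[keys[-1]] = parse_value(raw_value)
    return config


config = {
    "training": {"batch_size": 512},
    "optimizer": {"params": {"lr": 0.001}},
}
apply_dotlist(config, ["training.batch_size=32", "optimizer.params.lr=0.01"])
print(config)
```

Only the leaves named in the dotlist change, which is why this style suits quick experiment variations.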
MMF's configuration system, built on top of OmegaConf, allows user configs to be built by including composable configs provided by the datasets and models. You can include them following the syntax
The included configs override one another in the order in which they appear in the directive. Finally, the config parameters defined in the current config override whatever is present in the includes. So, for example,
And final user config,
would result in final config:
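As a rough illustration of the include order described above, the following Python sketch resolves an includes list against in-memory stand-ins for config files. The file names, keys and the helpers deep_merge and load_with_includes are all hypothetical; this mimics the described behavior and is not MMF's loader.

```python
from copy import deepcopy

# In-memory stand-ins for YAML files; every path and key here is hypothetical.
FILES = {
    "a.yaml": {"training": {"batch_size": 16, "max_updates": 1000}},
    "b.yaml": {"training": {"batch_size": 32}},
    "user.yaml": {
        "includes": ["a.yaml", "b.yaml"],
        "training": {"max_updates": 500},
    },
}


def deep_merge(low, high):
    """Return a new dict in which values from `high` override those in `low`."""
    merged = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def load_with_includes(path):
    """Resolve includes in listed order, then apply the including config on top."""
    raw = deepcopy(FILES[path])
    resolved = {}
    for included in raw.pop("includes", []):
        resolved = deep_merge(resolved, load_with_includes(included))
    return deep_merge(resolved, raw)


final = load_with_includes("user.yaml")
print(final)
```

Here b.yaml overrides the batch size from a.yaml because it is listed later, and the including config's own max_updates overrides both.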
We also support some useful override schemes at the same priority level as the command line dotlist override. For example, users can specify their overrides in demjson form as the value of the --config_override argument, which will then override each part of the config accordingly.
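The idea of a JSON-style override can be sketched as parsing the argument value into a nested structure and merging it over the existing config. The sketch below uses the standard library json module as a stand-in for demjson, which accepts a more lenient syntax; the helper deep_merge, the keys other than training.batch_size, and the merge behavior itself are assumptions for illustration.

```python
import json
from copy import deepcopy


def deep_merge(low, high):
    """Return a new dict in which values from `high` override those in `low`."""
    merged = deepcopy(low)
    for key, value in high.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


config = {
    "training": {"batch_size": 512, "max_updates": 22000},
    "optimizer": {"params": {"lr": 0.001}},
}

# Hypothetical value that a user might pass to --config_override.
override_arg = '{"training": {"batch_size": 32}, "optimizer": {"params": {"lr": 0.01}}}'

config = deep_merge(config, json.loads(override_arg))
print(config)
```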
MMF supports overriding some of the config parameters through environment variables. Have a look at them in the base defaults config's env field.
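One way to picture an environment variable override is a lookup that replaces a default only when the matching variable is set. The sketch below is a generic illustration; the variable names EXAMPLE_DATA_DIR and EXAMPLE_SAVE_DIR, the keys, and the helper apply_env_overrides are hypothetical and are not taken from MMF's base defaults.

```python
import os


def apply_env_overrides(env_config, mapping, environ=None):
    """Return a copy of env_config with values replaced by set environment variables."""
    environ = os.environ if environ is None else environ
    updated = dict(env_config)
    for key, variable in mapping.items():
        if variable in environ:
            updated[key] = environ[variable]
    return updated


# Hypothetical defaults and variable names.
env_defaults = {"data_dir": "./data", "save_dir": "./save"}
mapping = {"data_dir": "EXAMPLE_DATA_DIR", "save_dir": "EXAMPLE_SAVE_DIR"}

# A fake environment keeps the example deterministic; pass nothing to use os.environ.
fake_environ = {"EXAMPLE_DATA_DIR": "/mnt/datasets"}
env_config = apply_env_overrides(env_defaults, mapping, environ=fake_environ)
print(env_config)
```

Keys whose variable is not set keep their default value, so the environment only needs to carry the values that differ per machine.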
Have a look at MMF's defaults config, along with the description of each parameter, to find the ones you may need to override for your experiments.