Resolve "Rewrite settings input using YAML"
Closes #10
In this MR please discuss the changes introduced by rewriting the input system using YAML. I list below some snippets for handling settings via command line / yaml run card and from within the code.
I have also put this into the Wiki for later reference.
Input file
Input files are now called Sherpa.yaml. They use standard YAML syntax, see YAML Preview and this primer. E.g.
EVENTS: 1M
ERROR: 0.99
ME_GENERATORS: [Comix, Amegic]
# alternative syntax for vector-like settings:
ME_GENERATORS:
- Comix
- Amegic
See W+jets example for a complete example.
You can look around in the source branch to inspect other examples. Most of the configs in Examples/ are migrated.
Nesting
YAML allows easy nesting of settings, e.g. all hard decay settings are now bundled in one YAML mapping:
HARD_DECAYS:
  Enabled: true
  Apply_Branching_Ratios: false
  Channels:
    "25 -> 5 -5": {Status: 2}
    ...
(As you can see, nested maps can be structured either by indentation or by enclosing a child map with curly brackets.)
Another example would be the proposed new particle data syntax:
PARTICLE_DATA:
  5:
    Massive: true
  15:
    Massive: true
  25:
    Stable: 0
    Width: 0.0
However, I tried not to overdo this and only nested settings where many of them are clearly related. Most settings are still top-level, even some that might be worthwhile to nest, like the family of CSS_ settings.
Process definition
Process definitions are now a sequence of mappings in YAML-speak. Here is a non-trivial example:
PROCESSES:
- Process: "93 93 -> 90 90 93{4}"
  Order: {QCD: Any, EW: 2}
  CKKW: sqr(30/$(E_CMS))
  $(LJET):
    NLO_QCD_Mode: MC@NLO
    ME_Generator: Amegic
    RS_ME_Generator: Comix
    Loop_Generator: $(LOOPGEN)
  5:
    Integration_Error: 0.05
  6:
    Integration_Error: 0.10
- # more process definitions ...
You can see several things here:
- YAML needs quotation marks when specifying values with certain special characters, like curly braces and commas. Unfortunately, this makes the syntax less clean.
- Tags are enclosed in a shell-like syntax: $(E_CMS), $(LOOPGEN), $(LJET). This also obfuscates things a bit, but the advantage is of course that mistakes are prevented.
- I ditched the trailing multiplicity specification. Instead, I group multiplicity-dependent settings under a multiplicity key, in this example $(LJET): { ... }, 5: { ... }, 6: { ... }; everything in the curly brackets is only applied to final-state multiplicities $(LJET), 5 or 6, respectively.
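To make the multiplicity grouping concrete, here is a minimal C++ sketch of how such groups could be resolved. The names ProcessEntry and ResolveForMultiplicity are hypothetical and not taken from the Sherpa code, and the assumption that a multiplicity group is simply overlaid on the common process settings is mine.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified model of one process entry: flat string settings
// plus overrides keyed by final-state multiplicity. Not Sherpa's actual types.
using SettingsMap = std::map<std::string, std::string>;

struct ProcessEntry {
  SettingsMap common;                          // e.g. Order, CKKW
  std::map<int, SettingsMap> by_multiplicity;  // e.g. 5: {...}, 6: {...}
};

// Return the settings that apply to a given final-state multiplicity:
// start from the common settings, then overlay the matching group, if any.
SettingsMap ResolveForMultiplicity(const ProcessEntry& p, int multiplicity) {
  SettingsMap resolved = p.common;
  const auto it = p.by_multiplicity.find(multiplicity);
  if (it != p.by_multiplicity.end()) {
    for (const auto& kv : it->second) resolved[kv.first] = kv.second;
  }
  return resolved;
}
```

With the example above, resolving for multiplicity 5 would pick up Integration_Error: 0.05, while a multiplicity without its own group only sees the common settings.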
Tags
Tags are specified in a nested setting, like this:
TAGS:
  SF: 1.0
  LOOPGEN: BlackHat
  QCUT: 20
They can be used in any setting's value, e.g. as in ANALYSIS_OUTPUT: analysis_scf-$(SF).
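The substitution itself can be pictured with the following C++ sketch. ExpandTags is a hypothetical helper, not the function used in this branch, and the choice to throw on an unknown or unterminated tag is an assumption about how mistakes might be caught.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Replace every $(NAME) in `value` with tags.at(NAME).
// Throws std::runtime_error for an unknown tag or an unterminated "$(".
std::string ExpandTags(const std::string& value,
                       const std::map<std::string, std::string>& tags) {
  std::string out;
  std::size_t pos = 0;
  while (true) {
    const std::size_t open = value.find("$(", pos);
    if (open == std::string::npos) {
      out.append(value, pos, std::string::npos);  // copy the remaining text
      return out;
    }
    const std::size_t close = value.find(')', open + 2);
    if (close == std::string::npos)
      throw std::runtime_error("unterminated tag in: " + value);
    const std::string name = value.substr(open + 2, close - open - 2);
    const auto it = tags.find(name);
    if (it == tags.end())
      throw std::runtime_error("unknown tag: " + name);
    out.append(value, pos, open - pos);  // text before the tag
    out += it->second;                   // the tag's value
    pos = close + 1;
  }
}
```

Given the TAGS block above, expanding analysis_scf-$(SF) with this helper yields analysis_scf-1.0.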
Command line
The command line accepts -x/--x settings as before. In addition, each positional argument is parsed as one YAML line (as usual, they take precedence over anything specified in Sherpa.yaml). Hence, in principle everything can be specified via the command line, e.g.
Sherpa -e1k
Sherpa 'EVENTS: 1k' # the same
or a more complex example:
Sherpa \
  'INIT_ONLY: true' \
  'TAGS: {LGEN: BlackHat}' \
  'ME_GENERATORS: [Comix, $(LGEN)]' \
  'PARTICLE_DATA: { 25: {Mass: 125, Width: 0} }'
All those quotation marks are not overly pretty, but it is arguably nice to have exactly the same capabilities as in the config file, and settings no longer look like environment variables (not sure if that's really an advantage in practice).
However, I'm not sure if consistency and power really trump simplicity here, so I guess there is a case for a syntax that avoids quoting. Therefore, these forms will also work:
Sherpa INIT_ONLY:true # YAML with space after colon omitted (to avoid quotation)
Sherpa INIT_ONLY=true # legacy-syntax setting
Sherpa LGEN:=BlackHat # legacy-syntax tag specification
Note that legacy-syntax tags cannot be mixed with a YAML-style specification like 'TAGS: {QCUT: 40}'.
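The accepted forms can be told apart by the first separator in the argument. The C++ sketch below illustrates that idea only; ArgKind, ParsedArg and ParseArgument are hypothetical names, and the actual parsing in this branch may work differently (for instance, it does not model the restriction on mixing legacy tags with a YAML-style TAGS mapping).

```cpp
#include <string>

// Hypothetical classification of one positional command-line argument.
enum class ArgKind { Yaml, LegacySetting, LegacyTag };

struct ParsedArg {
  ArgKind kind;
  std::string key;
  std::string value;
};

// Split an argument at its first ':' or '=':
//   "KEY: value" / "KEY:value" -> Yaml
//   "KEY=value"                -> LegacySetting
//   "KEY:=value"               -> LegacyTag
ParsedArg ParseArgument(const std::string& arg) {
  const std::size_t sep = arg.find_first_of(":=");
  if (sep == std::string::npos) return {ArgKind::Yaml, arg, ""};
  const std::string key = arg.substr(0, sep);
  if (arg[sep] == '=')
    return {ArgKind::LegacySetting, key, arg.substr(sep + 1)};
  if (sep + 1 < arg.size() && arg[sep + 1] == '=')
    return {ArgKind::LegacyTag, key, arg.substr(sep + 2)};
  // YAML form: skip any spaces after the colon, keep the rest verbatim.
  const std::size_t start = arg.find_first_not_of(' ', sep + 1);
  return {ArgKind::Yaml, key,
          start == std::string::npos ? "" : arg.substr(start)};
}
```

Under this sketch, INIT_ONLY:true and 'INIT_ONLY: true' both come out as a YAML setting with key INIT_ONLY and value true, whereas LGEN:=BlackHat is recognised as a legacy tag.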
How it looks in the code
There is a global Settings instance which can be used to set defaults, retrieve nested settings, and get the resolved value (where the command line takes precedence over the config file, with the default value as fallback). Having a global instance allows a default to be defined once and ensures that several different defaults are not used accidentally. It also allows for some global features like a Settings report (see below). Here is an example:
auto hds = Settings::GetMainSettings()["HARD_DECAYS"];
const auto apply_br
  = hds["Apply_Branching_Ratios"].SetDefault(true).Get<bool>();
You can see that you can traverse the settings with the [] operator. Also, most Settings functions just return a reference to *this, so you can chain methods like SetDefault and Get. Later retrievals of this setting will reuse the default, so you don't have to specify it again (in fact, an exception would be thrown if a different default were set later):
auto& s = Settings::GetMainSettings();
const auto apply_br
  = s["HARD_DECAYS"]["Apply_Branching_Ratios"].Get<bool>();
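The resolution order and the define-a-default-once rule can be illustrated with a small stand-alone C++ sketch. MiniSettings is a hypothetical, flat, string-only stand-in written for this description; it is not the Settings class from the branch, and the exception types are my choice.

```cpp
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical flat stand-in for the global Settings object.
class MiniSettings {
public:
  void SetFromCommandLine(const std::string& key, const std::string& v) {
    m_cli[key] = v;
  }
  void SetFromConfigFile(const std::string& key, const std::string& v) {
    m_file[key] = v;
  }

  // Register a default once. Registering the same value again is fine;
  // a different value is treated as a programming error.
  MiniSettings& SetDefault(const std::string& key, const std::string& v) {
    const auto inserted = m_defaults.emplace(key, v);
    if (!inserted.second && inserted.first->second != v)
      throw std::logic_error("conflicting default for " + key);
    return *this;  // allows chaining, as in SetDefault(...).Get(...)
  }

  // Resolve: command line first, then config file, then default.
  std::string Get(const std::string& key) const {
    for (const auto* layer : {&m_cli, &m_file, &m_defaults}) {
      const auto it = layer->find(key);
      if (it != layer->end()) return it->second;
    }
    throw std::out_of_range("no value or default for " + key);
  }

private:
  std::map<std::string, std::string> m_cli, m_file, m_defaults;
};
```

In this sketch a value from the config file shadows the default, a value from the command line shadows both, and a second SetDefault call with a different value throws.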
Another example for a vector-like setting:
auto& s = Settings::GetMainSettings();
const auto gens = s["ME_GENERATORS"]
  .SetDefault({"Comix", "Amegic"})
  .UseNoneReplacements()
  .GetVector<string>();
Any setting can have a list of replacements (a map<string, string>) that is applied before returning the value. UseNoneReplacements adds a special replacements list that replaces all occurrences of Off, false, 0 and no with None.
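As a rough illustration of the replacement step, consider the following C++ sketch. NoneReplacements and ApplyReplacements are hypothetical free functions; the sketch assumes that a replacement applies only when a whole item matches a key exactly (case-sensitively), which may differ from the actual behaviour.

```cpp
#include <map>
#include <string>
#include <vector>

using Replacements = std::map<std::string, std::string>;

// The special list described above: Off, false, 0 and no all become None.
Replacements NoneReplacements() {
  return {{"Off", "None"}, {"false", "None"}, {"0", "None"}, {"no", "None"}};
}

// Replace each item that matches a key exactly; other items pass through.
std::vector<std::string> ApplyReplacements(std::vector<std::string> values,
                                           const Replacements& replacements) {
  for (auto& v : values) {
    const auto it = replacements.find(v);
    if (it != replacements.end()) v = it->second;
  }
  return values;
}
```

The point of such a list is that downstream code only has to compare against a single spelling, None, regardless of how the user switched the setting off.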
There is also Override..., variants for matrix-like values, GetWithOtherDefault, AddDefaultTag, SetInterpreterEnabled, GetKeys (for mapping-like settings), GetItems (for sequence-like settings) ...
Settings report
Because all interactions regarding settings go through one instance, we can print a report of all settings used, together with their resolved values. It can be viewed in a web browser after a run: example report.
Note that we could also detect if customised settings are never read (which is probably something the user does not expect, maybe due to a typo) and then print out a warning to the terminal, or mark those in the report. This is not yet implemented.
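One way the not-yet-implemented check could work is to record which keys were customised and which were read, and then take the difference. The C++ sketch below is purely illustrative: UsageTracker and its methods are hypothetical names, and nothing like it exists in this branch.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical bookkeeping for spotting customised settings that no code reads.
class UsageTracker {
public:
  // Call for every key the user set (config file or command line).
  void MarkCustomised(const std::string& key) { m_customised.insert(key); }
  // Call whenever the code retrieves a setting.
  void MarkRead(const std::string& key) { m_read.insert(key); }

  // Keys that were customised but never read, in sorted order.
  std::vector<std::string> UnusedCustomised() const {
    std::vector<std::string> unused;
    std::set_difference(m_customised.begin(), m_customised.end(),
                        m_read.begin(), m_read.end(),
                        std::back_inserter(unused));
    return unused;
  }

private:
  std::set<std::string> m_customised, m_read;
};
```

A misspelt key such as EVNTS would then show up in UnusedCustomised() at the end of the run, which is the case a warning would be meant to catch.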
Multiple config files
There is no longer support for splitting settings into different files according to their function (like Run.dat/Analysis.dat/...). However, you can now use an arbitrary number of config files on the command line (e.g. Sherpa "RUNDATA: [1.yaml, 2.yaml, ...]" ...) if needed. Files further to the right take precedence, i.e. we could have a base config file and then specializations: Sherpa "RUNDATA: [v+jets-base.yaml, z+jets.yaml]". This could be quite useful for the BENCHMARKS and similar applications.
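The precedence rule for several config files amounts to a left-to-right merge. The C++ sketch below shows this for flat key/value maps only; FlatConfig and MergeConfigs are hypothetical names, and how nested mappings are combined in the actual implementation is not covered by this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

using FlatConfig = std::map<std::string, std::string>;

// Merge already-parsed config files in command-line order;
// entries from files further to the right overwrite earlier ones.
FlatConfig MergeConfigs(const std::vector<FlatConfig>& files) {
  FlatConfig merged;
  for (const auto& file : files)
    for (const auto& kv : file) merged[kv.first] = kv.second;
  return merged;
}
```

A base file can thus carry the shared settings, and a specialization only needs to list the keys it changes.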
Validation
The statistics are unchanged, so I can directly diff outputs from the source and the target branch. They are identical for all migrated examples. However, because some examples do not work right now, coverage is not complete. I can redo this when master stabilises a bit and also migrate/validate with the BENCHMARKS configurations. Note also that the EXTAMP example configurations have not been migrated yet.
I checked that the number of calls to Settings is constant with respect to the number of events generated. In particular this means that settings are not re-read for each event. Hence, event generation performance should be identical.