Enrico Bothmann requested to merge ebothmann/sherpa:10-rewrite-settings-input into master Apr 23, 2018

Closes #10

Please use this MR to discuss the changes introduced by rewriting the input system using YAML. Below I list some snippets that show how settings are handled via the command line, via the YAML run card, and from within the code.

I have also put this into the Wiki for later reference.

Input file

Input files are now called Sherpa.yaml. They use standard YAML syntax (see YAML Preview and this primer), for example:

EVENTS: 1M
ERROR: 0.99
ME_GENERATORS: [Comix, Amegic]

# alternative syntax for vector-like settings:
ME_GENERATORS:
- Comix
- Amegic

See the W+jets example for a complete configuration. You can look around in the source branch to inspect other examples. Most of the configs in Examples/ have been migrated.

Nesting

YAML allows easy nesting of settings, e.g. all hard decay settings are now bundled in one YAML mapping:

HARD_DECAYS:
  Enabled: true
  Apply_Branching_Ratios: false
  Channels:
    "25 -> 5 -5": {Status: 2}
    ...

(As you can see, nested maps can be structured either by indentation or by enclosing a child map with curly brackets.)

Another example would be the proposed new particle data syntax:

PARTICLE_DATA:
  5:
    Massive: true
  15:
    Massive: true
  25:
    Stable: 0
    Width: 0.0

However, I tried not to overdo this and nested settings only where many of them are clearly related. Most settings are still top-level, even some that might be worth nesting, like the family of CSS_ settings.

Process definition

Process definitions are now a sequence of mappings in YAML-speak. Here is a non-trivial example:

PROCESSES:
- Process: "93 93 -> 90 90 93{4}"
  Order: {QCD: Any, EW: 2}
  CKKW: sqr(30/$(E_CMS))
  $(LJET):
    NLO_QCD_Mode: MC@NLO
    ME_Generator: Amegic
    RS_ME_Generator: Comix
    Loop_Generator: $(LOOPGEN)
  5:
    Integration_Error: 0.05
  6:
    Integration_Error: 0.10
- # more process definitions ...

You can see several things here:

YAML needs quotation marks around values that contain certain special characters, like curly braces and commas. Unfortunately, this makes the syntax less clean.
Tags are enclosed in a shell-like syntax: $(E_CMS), $(LOOPGEN), $(LJET). This also obfuscates things a bit, but it has the advantage of preventing mistakes.
I ditched the trailing multiplicity specification. Instead, multiplicity-dependent settings are grouped under a multiplicity key, in this example $(LJET): { ... }, 5: { ... } and 6: { ... }; everything inside the curly brackets is applied only to final-state multiplicities $(LJET), 5 or 6, respectively.
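
To illustrate the $(NAME) tag syntax used in the process definition above, here is a minimal C++ sketch of how such tags could be expanded inside a string. It is not taken from the Sherpa sources: the function name ExpandTags and the decision to throw on an undefined or unterminated tag are assumptions made purely for illustration.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of Sherpa): replaces every $(NAME) in `text`
// with the value registered for NAME in `tags`.
std::string ExpandTags(const std::string& text,
                       const std::map<std::string, std::string>& tags)
{
  std::string out;
  std::size_t pos = 0;
  while (pos < text.size()) {
    const std::size_t open = text.find("$(", pos);
    if (open == std::string::npos) {
      out.append(text, pos, std::string::npos);  // no more tags: copy the rest
      break;
    }
    const std::size_t close = text.find(')', open + 2);
    if (close == std::string::npos)
      throw std::invalid_argument("unterminated tag in: " + text);
    out.append(text, pos, open - pos);  // copy the text in front of the tag
    const std::string name = text.substr(open + 2, close - open - 2);
    const auto it = tags.find(name);
    // Assumption: an undefined tag is reported as an error, not left as-is.
    if (it == tags.end())
      throw std::out_of_range("undefined tag: " + name);
    out += it->second;
    pos = close + 1;  // continue after the closing parenthesis
  }
  return out;
}
```

With a tag map that assigns 13000 to E_CMS (an arbitrary illustrative value), ExpandTags("sqr(30/$(E_CMS))", tags) would return sqr(30/13000). Because the tag delimiters are explicit, a misspelled tag name is caught here instead of being silently treated as a literal value.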

Command line

The command line accepts -x/--x settings as before. In addition, each positional argument is parsed as one YAML line (as usual, they take precedence over anything specified in Sherpa.yaml). Hence, in principle everything can be specified via the command line, e.g.

Sherpa -e1k
Sherpa 'EVENTS: 1k'  # the same

or a more complex example:

Sherpa \
    'INIT_ONLY: true' \
    'TAGS: {LGEN: BlackHat}' \
    'ME_GENERATORS: [Comix, $(LGEN)]' \
    'PARTICLE_DATA: { 25: {Mass: 125, Width: 0} }'

All those quotation marks are not overly pretty, but it may be nice to have exactly the same capabilities as in the config file, and settings no longer look like environment variables (I am not sure whether that is really an advantage in practice).

However, I'm not sure if consistency and power really trump simplicity here, so I guess there is a case for a quotation-avoiding syntax. Therefore, these forms will also work:

Sherpa INIT_ONLY:true  # YAML with space after colon omitted (to avoid quotation)
Sherpa INIT_ONLY=true  # legacy-syntax setting
Sherpa LGEN:=BlackHat  # legacy-syntax tag specification

Note that legacy-syntax tags cannot be mixed with a YAML-style specification like 'TAGS: {QCUT: 40}'.
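
The three quotation-free forms above differ only in the separator that follows the key. The following C++ sketch shows one way a positional argument could be classified accordingly. It is a hypothetical illustration, not Sherpa's actual parser: ArgKind, ParsedArg and ClassifyArg are invented names, and the value is kept as a raw string rather than being parsed as YAML.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical classification of a positional command-line argument.
enum class ArgKind { YamlLine, LegacySetting, LegacyTag };

struct ParsedArg {
  ArgKind kind;
  std::string key;
  std::string value;  // raw text after the separator, leading blanks removed
};

ParsedArg ClassifyArg(const std::string& arg)
{
  const std::size_t pos = arg.find_first_of(":=");
  if (pos == std::string::npos || pos == 0)
    throw std::invalid_argument("expected KEY<separator>value, got: " + arg);

  ParsedArg out;
  out.key = arg.substr(0, pos);
  std::size_t value_start = pos + 1;
  if (arg[pos] == '=') {
    out.kind = ArgKind::LegacySetting;          // KEY=value
  } else if (value_start < arg.size() && arg[value_start] == '=') {
    out.kind = ArgKind::LegacyTag;              // TAG:=value
    ++value_start;
  } else {
    out.kind = ArgKind::YamlLine;               // KEY:value or 'KEY: value'
  }
  assert(value_start <= arg.size());

  // Skip the optional blanks between separator and value.
  value_start = arg.find_first_not_of(' ', value_start);
  out.value = (value_start == std::string::npos) ? std::string()
                                                 : arg.substr(value_start);
  return out;
}
```

Under these assumptions, INIT_ONLY:true and 'INIT_ONLY: true' are both classified as YAML lines with key INIT_ONLY and value true, INIT_ONLY=true as a legacy setting, and LGEN:=BlackHat as a legacy tag.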

What it looks like in the code

There is a global Settings instance which can be used to set defaults, retrieve nested settings, and get the resolved value (where the command line takes precedence over the config file, with the default value as fallback). Having a global instance allows a default to be defined once and ensures that different defaults are not accidentally used in different places. It also allows for some global features like a Settings report (see below). Here is an example:

auto hds = Settings::GetMainSettings()["HARD_DECAYS"];
const auto apply_br
  = hds["Apply_Branching_Ratios"].SetDefault(true).Get<bool>();

You can see that the settings can be traversed with the [] operator. Also, most Settings functions return a reference to *this, so you can chain methods like SetDefault and Get. Later retrievals of this setting reuse the default, so you don't have to specify it again (in fact, an exception is thrown if a different default is set later):

auto& s = Settings::GetMainSettings();
const auto apply_br
  = s["HARD_DECAYS"]["Apply_Branching_Ratios"].Get<bool>();
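
The resolution order and the "one default per setting" rule described above can be sketched with a small toy class. This is a simplified model and not the actual Settings implementation: ToySettings and its method names are hypothetical, keys are flat, all values are plain strings, and I assume that repeating an identical default is allowed.

```cpp
#include <cassert>
#include <initializer_list>
#include <map>
#include <stdexcept>
#include <string>

// Toy model (hypothetical, not Sherpa's Settings class): flat keys and
// string values are enough to show the lookup order.
class ToySettings {
public:
  void SetFromCommandLine(const std::string& key, const std::string& value)
  { m_cmdline[key] = value; }

  void SetFromConfigFile(const std::string& key, const std::string& value)
  { m_config[key] = value; }

  // Registers the default for `key`. Repeating the same default is accepted
  // (an assumption); registering a different one later throws.
  ToySettings& SetDefault(const std::string& key, const std::string& value)
  {
    assert(!key.empty());
    const auto result = m_defaults.emplace(key, value);
    if (!result.second && result.first->second != value)
      throw std::logic_error("conflicting default for setting " + key);
    return *this;  // allows chaining, e.g. SetDefault(...).Get(...)
  }

  // Resolves `key`: command line first, then config file, then default.
  std::string Get(const std::string& key) const
  {
    for (const auto* layer : {&m_cmdline, &m_config, &m_defaults}) {
      const auto it = layer->find(key);
      if (it != layer->end()) return it->second;
    }
    throw std::out_of_range("no value or default for setting " + key);
  }

private:
  std::map<std::string, std::string> m_cmdline, m_config, m_defaults;
};
```

In this model, a value given on the command line always wins over one from the config file, and the default is only returned if neither source provides the setting.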

Another example for a vector-like setting:

auto& s = Settings::GetMainSettings();
const auto gens = s["ME_GENERATORS"]
  .SetDefault({"Comix", "Amegic"})
  .UseNoneReplacements()
  .GetVector<string>();

Any setting can have a list of replacements (a map<string, string>) that is applied before the value is returned. UseNoneReplacements adds a special replacements list that replaces all occurrences of Off, false, 0 and no with None.
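
As a rough illustration of the replacement mechanism, the sketch below applies a map<string, string> to a vector of values. The function names are hypothetical, and it assumes that only whole values are replaced and that matching is case-sensitive; the real behaviour may differ in both respects.

```cpp
#include <map>
#include <string>
#include <vector>

using Replacements = std::map<std::string, std::string>;

// Hypothetical stand-in for the list that UseNoneReplacements installs.
Replacements MakeNoneReplacements()
{
  return {{"Off", "None"}, {"false", "None"}, {"0", "None"}, {"no", "None"}};
}

// Returns a copy of `values` with every value that has an entry in
// `replacements` swapped for its replacement.
// Assumption: whole-value, case-sensitive matching.
std::vector<std::string> ApplyReplacements(std::vector<std::string> values,
                                           const Replacements& replacements)
{
  for (auto& value : values) {
    const auto it = replacements.find(value);
    if (it != replacements.end()) value = it->second;
  }
  return values;
}
```

With this sketch, the values Comix and Off would come back as Comix and None, while a value such as 10 is left untouched because only whole values are compared.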

There is also Override..., variants for matrix-like values, GetWithOtherDefault, AddDefaultTag, SetInterpreterEnabled, GetKeys (for mapping-like settings), GetItems (for sequence-like settings) ...

Settings report

Because all interactions with settings go through one instance, we can print a report of all used settings with their resolved values. The report can be viewed in a web browser after a run: example report.

Note that we could also detect if customised settings are never read (which is probably something the user does not expect, maybe due to a typo) and then print out a warning to the terminal, or mark those in the report. This is not yet implemented.
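
Since this check is not implemented, the following is only a sketch of one possible approach: record which keys the user customised and which keys were actually read, then report the difference. UsageTracker and its methods are hypothetical names and do not exist in the code base.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical bookkeeping for detecting customised settings that are
// never read during a run.
class UsageTracker {
public:
  // Call for every key found on the command line or in a config file.
  void AddUserSetting(const std::string& key) { m_user.insert(key); }

  // Call whenever the code retrieves the value of a setting.
  void MarkRead(const std::string& key) { m_read.insert(key); }

  // Keys the user set but the code never asked for, in sorted order.
  std::vector<std::string> UnusedUserSettings() const
  {
    std::vector<std::string> unused;
    std::set_difference(m_user.begin(), m_user.end(),
                        m_read.begin(), m_read.end(),
                        std::back_inserter(unused));
    return unused;
  }

private:
  std::set<std::string> m_user, m_read;
};
```

A misspelled key such as ME_GENERATOR (instead of ME_GENERATORS) would then show up in UnusedUserSettings() at the end of the run and could be printed as a warning or marked in the report.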

Multiple config files

There is no longer any support for splitting settings into different files according to their function (like Run.dat/Analysis.dat/...). However, you can now pass an arbitrary number of config files on the command line (e.g. Sherpa "RUNDATA: [1.yaml, 2.yaml, ...]" ...) if needed. Files further to the right take precedence, i.e. we could have a base config file and then specializations: Sherpa "RUNDATA: [v+jets-base.yaml, z+jets.yaml]". This could be quite useful for the BENCHMARKS and similar applications.
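
The "files further to the right take precedence" rule can be sketched as a simple left-to-right merge. This is a minimal illustration with flat key/value maps. Config and MergeConfigs are hypothetical names, and how nested mappings are combined (merged key by key or replaced as a whole) is not covered here.

```cpp
#include <map>
#include <string>
#include <vector>

// One already-parsed config file, reduced to flat key/value pairs
// (an assumption made to keep the sketch short).
using Config = std::map<std::string, std::string>;

// Merges `files` in the order given; later files overwrite earlier ones.
Config MergeConfigs(const std::vector<Config>& files)
{
  Config merged;
  for (const auto& file : files)
    for (const auto& entry : file)
      merged[entry.first] = entry.second;
  return merged;
}
```

If a base file sets EVENTS and ME_GENERATORS and a specialization only sets EVENTS, the merged result takes EVENTS from the specialization and keeps ME_GENERATORS from the base file.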

Validation

The statistics are unchanged, so I can directly diff outputs from the source and the target branch. They are identical for all migrated examples. However, because some examples do not work right now, coverage is not complete. I can redo this when master stabilises a bit, and also migrate and validate the BENCHMARKS configurations. Note also that the EXTAMP example configurations have not been migrated yet.

I checked that the number of calls to Settings is constant with respect to the number of events generated. In particular this means that settings are not re-read for each event. Hence, event generation performance should be identical.

Edited Jan 24, 2019 by Enrico Bothmann

Resolve "Rewrite settings input using YAML"