Kat script grammar in Finesse 3

I think there's room for some small improvements in the new kat script grammar. There are a few things that I find look a bit bodged, which might just be down to personal preference or might stem from us initially trying to keep the new script resembling kat2 but gradually adding backwards-incompatible features. Now that kat2 script is officially incompatible with kat3 (mainly but not just because of the removal of explicit node names) and we have a working kat2 parser that users can always fall back on, I think we could consider some improvements to the kat3 language design.

Below are descriptions of things I find peculiar and some suggestions to make the language a little nicer. I also added an example at the bottom of how the suggestions might look if implemented. This is just my opinion, and perhaps my interpretation of how the language should "feel" is different from others, but it's worth discussing changes soon before we write too many examples and tests. I think with the changes suggested here we could make kat script even more powerful while at the same time simplifying many aspects of the parser/generator code: win-win.

Spaces are currently never allowed in strings

Because whitespace is significant in kat script, a string like "my label" would be parsed into two separate tokens, "my" and "label". This is probably what we want for component names since they get added as attributes to Model, but later we might want to support strings with spaces for e.g. labels in plot actions:

series (
    xaxis ( l1.P lin 1 10 100 on_complete=plot (label=my plot))
                                                        ^ unexpected space
)

Maybe enforcing the use of single or double quotes for strings would be nice here. Then we could even allow use of model parameters in strings like this:

series (
    xaxis ( l1.P lin 1 10 100 on_complete=plot (label="plot of {l1.name}"))
)

The component names would still remain unquoted because they are not actually strings; they have to be stored as attributes in the model, so they're more akin to the name in a function definition in e.g. Python (the "myfunc" part of def myfunc(...)).

Then for things like sweep types we might want to make lin and log into actual keywords instead of strings, so e.g.

xaxis ( l1.P lin 1 10 100 )

and not

xaxis ( l1.P "lin" 1 10 100 )
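
As a rough illustration of the quoting rules discussed above, here is a minimal Python sketch (not the actual Finesse parser) of a tokenizer that keeps quoted strings whole, leaves bare words such as component names unquoted, and treats lin and log as reserved keywords. The names TOKEN_RE, KEYWORDS, tokenize and interpolate are made up for this example, and using str.format for the {l1.name} substitution is just one possible choice.

```python
import re
from types import SimpleNamespace

# Hypothetical token pattern: quoted strings are tried first so that any
# spaces inside them never split the token.
TOKEN_RE = re.compile(
    r"""
    (?P<string>"[^"]*"|'[^']*')   # quoted string, may contain spaces
  | (?P<punct>[(),=])             # structural punctuation
  | (?P<word>[^\s(),=]+)          # bare word: name, number or keyword
    """,
    re.VERBOSE,
)

# Assumed set of reserved words; the real list would be up for discussion.
KEYWORDS = {"lin", "log"}


def tokenize(text):
    """Split a line of script into (kind, value) pairs."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "string":
            value = value[1:-1]  # drop the surrounding quotes
        elif kind == "word" and value in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, value))
    return tokens


def interpolate(template, namespace):
    """Fill {name.attr} placeholders from a dict of model elements."""
    return template.format(**namespace)


# Stand-in for a model element; a real model object would be used instead.
l1 = SimpleNamespace(name="l1")
label = interpolate("plot of {l1.name}", {"l1": l1})
```

With this sketch, label="my plot" produces a single string token, whereas label=my plot produces two separate word tokens that a parser could then reject with a clear error.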

Component parameter lists don't use commas, but value lists do

We use whitespace (well, technically not - #104 (closed)) to separate parameters...

laser l1(P=1 f=2)

...whereas lists require commas to separate values (to allow unbracketed negative numbers - see 218d0258):

pd mypd(f=[10e6, 20e6] phase=[90, 180])

The way I see the kat script with brackets is kind of like "create an object with ID l1 of type laser and give it parameters P=1 and f=2". To me this very much resembles a function definition in Python and other languages, where the parameters in brackets are almost always comma separated. I think we should enforce the use of commas to separate parameters inside brackets.

If we were to use commas for parameter lists (parenthesis-delimited) as well as value lists (bracket-delimited), then it would look weird alongside the undelimited definition:

laser l1 P=1 f=2
laser l1 (P=1, f=2)

We could instead allow single-valued parameters to be left unbracketed, like laser l1 10 or laser l1 P=10 but insist on parentheses when there are multiple parameters: laser l1 (10, phase=180).

We might also want to get rid of square brackets altogether this way, by making it the rule that parentheses simply delimit anything - parameter lists or value lists or anything else. That would look like this:

pd mypd (f=(10M, 20M), phase=(90, 180))

That would have the added benefit of freeing up the square brackets for use in documentation and syntax suggestions for optional parameters a la kat2 documentation:

modulator name ( f, midx, mod_type[, P, phase, f] )

Currently, because square brackets are actual kat3 grammar constructs, it might confuse users if we use them to denote optional parameters as is done in the kat2 syntax reference.
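
To get a feel for how simple the "parentheses delimit everything, commas separate everything" rule could be to parse, here is a small recursive sketch in Python. It is not the existing parser: parse_component, parse_group, parse_value and the SI_SUFFIXES table are all hypothetical, and the suffix handling is only an assumption about how values like 10M would be read.

```python
import re

# Assumed SI suffixes for this sketch only.
SI_SUFFIXES = {"p": 1e-12, "n": 1e-9, "u": 1e-6, "m": 1e-3,
               "k": 1e3, "M": 1e6, "G": 1e9}
TOKEN_RE = re.compile(r"[(),=]|[^\s(),=]+")


def parse_value(word):
    """Return a float where possible (honouring SI suffixes), else the word."""
    if word[-1] in SI_SUFFIXES:
        try:
            return float(word[:-1]) * SI_SUFFIXES[word[-1]]
        except ValueError:
            pass  # e.g. "am" or "lin": not a number with a suffix
    try:
        return float(word)
    except ValueError:
        return word  # names and keywords stay as strings


def parse_group(tokens, pos):
    """Parse "(" item ("," item)* ")" and return (args, kwargs, next_pos)."""
    if tokens[pos] != "(":
        raise SyntaxError("expected '('")
    pos += 1
    args, kwargs = [], {}
    while tokens[pos] != ")":
        key = None
        if pos + 1 < len(tokens) and tokens[pos + 1] == "=":
            key = tokens[pos]
            pos += 2
        if tokens[pos] == "(":
            # Nested parentheses are treated as a plain value list.
            inner_args, inner_kwargs, pos = parse_group(tokens, pos)
            if inner_kwargs:
                raise SyntaxError("keywords are not allowed in value lists")
            value = tuple(inner_args)
        else:
            value = parse_value(tokens[pos])
            pos += 1
        if key is None:
            args.append(value)
        else:
            kwargs[key] = value
        if tokens[pos] == ",":
            pos += 1  # a trailing comma before ")" is tolerated
        elif tokens[pos] != ")":
            raise SyntaxError(f"expected ',' before {tokens[pos]!r}")
    return args, kwargs, pos + 1


def parse_component(line):
    """Parse "type name (params)" into (type, name, args, kwargs)."""
    tokens = TOKEN_RE.findall(line)
    args, kwargs, _ = parse_group(tokens, 2)
    return tokens[0], tokens[1], args, kwargs
```

Calling parse_component on pd mypd (f=(10M, 20M), phase=(90, 180)) gives the type, the name, an empty positional list and a dict of two tuples, while laser l1 (P=1 f=2) raises a SyntaxError because the comma is missing. The sketch always expects parentheses; the unbracketed single-value form would need one extra branch.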

Command instructions have no implied "type" like components

As I said above, my understanding of component definitions like laser l1 is that the instruction "type" is "laser", which appears first. Using the same logic with command instructions like lambda 1064e-9 though, that would read "create a type lambda" when in actual fact the type is implied to be model. Perhaps it would read better if we were explicit about model being the type:

model lambda 1064e-9

We could then extend this to allow arbitrary public properties from some whitelist of types to be set, e.g.

model lambda 1064e-9
model modes ((1, 0), (0, 1))
model label "my model"  # sets model.label, if that were to exist - could get dumped into the comments when generating?
model analysis label "my action"  # sets model.analysis.label. Maybe we don't want to allow setting beyond one level of model though.

plotting line_width 5  # assumes we set up some object called "plotting" which handles setting matplotlib config
logging level debug    # set the logging level for this simulation

We could configure model to pass through to the constructed model, plotting to set some (whitelisted) values of Matplotlib's config, and logging to configure the log verbosity and formatting etc.
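
A whitelist-driven dispatch like this could be quite small. The following Python sketch shows one way it might look; WHITELIST and apply_setting are invented for illustration, the mapping of the script field "lambda" to an attribute called lambda0 is an assumption made here because lambda is a reserved word in Python, and the plain namespaces stand in for the real model and logging objects.

```python
from types import SimpleNamespace

# Hypothetical whitelist: script type -> {script field: attribute name}.
WHITELIST = {
    "model": {"lambda": "lambda0", "maxtem": "maxtem", "label": "label"},
    "logging": {"level": "level"},
}


def apply_setting(targets, line):
    """Apply one "type field value" instruction to the matching target."""
    # Drop a trailing comment. This naive split would also cut a "#" that
    # appears inside a quoted string; a real tokenizer would handle that.
    line = line.split("#", 1)[0].strip()
    target_name, field, raw = line.split(maxsplit=2)

    allowed = WHITELIST.get(target_name)
    if allowed is None or field not in allowed:
        raise ValueError(f"'{target_name} {field}' is not a settable property")

    try:
        value = float(raw)
    except ValueError:
        value = raw.strip("\"'")  # quoted string or bare keyword

    setattr(targets[target_name], allowed[field], value)


# Stand-ins for the objects the script would configure.
targets = {"model": SimpleNamespace(), "logging": SimpleNamespace()}
apply_setting(targets, "model lambda 1064e-9")
apply_setting(targets, 'model label "my model"  # sets model.label')
apply_setting(targets, "logging level debug")
```

Anything not in the whitelist, for example model homs 1 in this sketch, is rejected with a ValueError rather than silently setting an arbitrary attribute.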

Hah, technically because the root action in a script is set to the Model.analysis property, we could also insist that analysis trees are set this way:

model analysis noxaxis  # same as not specifying anything
model analysis serial(...)

Example

# all definitions follow the form "type name/field value" or "type name/field (key_value_list)"

# turn on debugging
debug enabled 1  # we could make the default 1 by having this call a function set_enabled(enabled=True), so that just writing "debug enabled" would work too
logging (level="debug", format="{date}: {message}")

# "model" always does something to the model
model lambda 1550n  # single-valued commands could be set without parentheses
model maxtem 5
model label "Fabry Perot"
model select_modes "odd"  # string in quotes, unless we made "odd" a reserved keyword like "lin" or "log"
model select_modes ("odd", maxtem=5)  # setting more than one value requires parentheses

laser l1 10  # set power to 10 W
modulator mod1 (10M, 0.1, lin, mod_type=am)  # am/pm also reserved keywords?
mirror itm (R=0.99, T=0.01)
mirror etm (T=100u, L=0)
pd refl (itm.p1.o, f=(10M, 20M), phase=(0, 180))

model analysis serial (
    xaxis ( itm.R, lin, 0.9, 0.999, 100, name="sweep 1", on_complete=plot ),
    print_model,  # no parameters
)

That looks quite intuitive to me, and makes kat script almost as powerful as the Python API. It departs from kat2 syntax in many ways but more closely corresponds to the equivalent Python code without being too generic (e.g. needing variable assignments and calls to model.add).

This would mean a large overhaul for the parser and generator but it's something I'm happy to work on.

Thoughts?