  • Author Owner

    Design goals:

    • Transforming plaintext lyrics into markup shouldn't require much typing.
    • "Light" markup (header, deduping refs, and cues) should be about as human-readable as the original.
    • The most common elements should also be the shortest.
    • Lyrics shouldn't need to be romanised in order to be parsed. However...
    • Markup elements (everything besides string literals) shouldn't use punctuation from outside ASCII (ensuring it can be typed with most keyboards).
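
    The last two goals interact: lyrics can be in any script, but the markup around them has to be typable in ASCII. A minimal Python sketch of that check follows. The helper name `markup_is_typable` is hypothetical, and it assumes the parser has already separated markup elements from string literals.

```python
def markup_is_typable(markup: str) -> bool:
    """Return True if a markup element uses only printable ASCII characters.

    Only markup is checked; string literals (the lyrics themselves) may use
    any script and are never passed to this function.
    """
    return all(ch.isascii() and ch.isprintable() for ch in markup)
```

    Under this check `||` and `[ts:]` pass, while an em dash used as markup would be rejected.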

    Design non-goals:

    • Anything involving nested blocks/spans. It's too unwieldy. (For an example, take something in Genius' internal JSON format and pretty-print it.) The language must also be at least mostly usable on Android, where keyboards don't have a TAB key, and ideally lines shouldn't need leading spaces, since those can be lost in copy-pasting. So normal indentation was already precluded on purely technical grounds.
    • Localised "keywords"/markup. Added complexity with no payoff.
    • Transcribing every objective part of a performance that's tangentially related to vocals. The use-case for this is a vocalist wishing to perform the song themself, and part of that is the assumption that they've memorised both the melody and the intonation/ornamentation. (The sort of things that would be relevant include cues and rhyme markers.)
    • Including objective metadata that's not relevant. Genius has performers' names in the lyrics, and the release date and producer(s) name(s) outside lyrics, and those don't serve a purpose. Circulyr needs the act's name, track name, and maybe tracklist name in metadata, and so will also dictate the format of those, but anything else can be cross-referenced from MusicBrainz/GMDB. There's also no need to include the definitions of uncommon words and phrases—maybe if they're being repurposed as part of the act's or track's mythos.
    • Transcribing subjective things i.e. "meaning". That's Genius' raison d'être, and even then the feature is only used properly by a tiny proportion of artists. Crowdsourced interpretations tend to fall between useless and mediocre, as you'd expect. (Of course, users could always abuse freeform comments to include this, but the language design shouldn't encourage it.)
    • Imposing a manual of style upon all users of the language. One should definitely exist, but it should be optional/replaceable.

    A nugget of truth from my experience of transcribing much of my own music library and migrating the transcriptions through a few prototypes of this language: You can't cover everything. By that I mean that you can feature-creep for years, and some experimental band will still invent something you can't transcribe. (I intend to draw the line at "3+ distinct contemporaneous vocal lines".) But it's also true in the sense that the language needs a way to express the limits of the transcriber.

    Edited by YoshiRulz
  • Author Owner

    These are the features I've either used or had sufficient need for in my relatively diverse library:

    • {&} indistinct/unknown, unknown syllable count
    • {string} unsure of transcription
    • &&& n indistinct syllables
    • || caesura (pause/breathe marker)
    • / enjambment marker
    • ... continuation marker (sentence continues in next section)
    • "" in-line quotations (begrudgingly)
    • [x#] repeat
    • [r:]/[s:]/[g:] section reference+goto (for dedup)
    • [p:]/[s:]/[q:] line(s) reference+goto (for dedup)
    • [m:]/[mv:] movement marker
    • [b:]/[br:] break marker/comment
    • [f:]/[fx:] sound effect comment
    • [ts:] timestamp cue
    • [v:]/[vs:] lang version header
    • [t:]/[tl:] translation or transliteration of line or part of it
    • [c:] freeform comment
    • rhyme markers
      • still haven't had the heart to put this in 0.5, but I was thinking something like I stand= / You can't=, and then digits following the '=' if there are multiple in a paragraph (TODO find a better example than an Eminem-style forced rhyme lmao)
    • syllable count markers
      • just used this twice on one day after not running into it for ages; anyway my idea is So lately, been 3'wondering
      • but maybe I want to reserve apostrophes for content, so it should be 3&wondering
      • but maybe I want to reserve ampersands for TODOs, so it should be... 3%wondering? not many special chars left and I want to reserve '#' for the spec itself
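
    To make a few of the elements above concrete, here is a rough Python sketch of a line tokenizer. It covers only `{&}`, `{string}`, `||`, `/`, `[x#]`, and the `[name:]` tags. The concrete shapes it accepts are assumptions for illustration (`[x3]` for a repeat count, `[c: some text]` for a tag with content), as are the token names; none of this is the settled syntax.

```python
import re

# One alternative per element; order matters because [x3] must win over [name: ...].
TOKEN_RE = re.compile(
    r"""
      \[x(?P<repeat>\d+)\]                      # repeat, assumed to look like [x3]
    | \[(?P<tag>[a-z]+):\s*(?P<body>[^\]]*)\]   # tagged element, assumed [c: text]
    | \{(?P<braced>[^}]*)\}                     # {&} or {string}
    | (?P<caesura>\|\|)                         # caesura
    | (?P<enjamb>/)                             # enjambment
    | (?P<text>[^\[\]{}|/]+)                    # plain lyric text, any script
    | (?P<stray>.)                              # lone bracket/brace/pipe, kept as text
    """,
    re.VERBOSE,
)


def tokenize(line: str) -> list[tuple]:
    """Split one line of lyrics into (kind, ...) tuples."""
    tokens = []
    for m in TOKEN_RE.finditer(line):
        if m.group("repeat") is not None:
            tokens.append(("repeat", int(m.group("repeat"))))
        elif m.group("tag") is not None:
            tokens.append(("tag", m.group("tag"), m.group("body").strip()))
        elif m.group("braced") is not None:
            inner = m.group("braced")
            # {&} means indistinct/unknown; anything else is an unsure transcription.
            tokens.append(("indistinct",) if inner == "&" else ("unsure", inner))
        elif m.group("caesura"):
            tokens.append(("caesura",))
        elif m.group("enjamb"):
            tokens.append(("enjambment",))
        else:
            text = (m.group("text") or m.group("stray")).strip()
            if text:  # drop whitespace-only runs between markup elements
                tokens.append(("text", text))
    return tokens
```

    Since plain text is matched by exclusion rather than by an alphabet, un-romanised lyrics go through untouched: `tokenize("愛してる [t: I love you]")` gives a text token followed by a `t` tag.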

    These are ideas I've considered at some point but currently plan to not do, plus a few things I've seen on Genius and elsewhere:

    • Allowing '!', or even (line-end) ',' or '.', in addition to '?'.
      • At some point I retcon'd out '!', since I couldn't come up with objective criteria for when to use it over the null line-end (implied '.')—it took me far too long to realise this considering some of my earliest transcriptions were Ironhand (industrial metal, a good mix of "obviously screaming", "obviously not screaming", and "arguably screaming"). Also the question of scream-whispering remains. I'll be leaving that one to the philosophers.
      • '.' and line-end ',' were out from the start because I thought they looked ugly, in addition to the objectivity problem. (',' has always been allowed in the middle of a line and is the most common parsed-but-not-semantic punctuation in my dataset.)
      • '?' remains for now, since it would be weird to allow it in the middle of a line and not at the end. (Like ',', it's always been allowed in the middle of a line.)
    • Allowing ';' or ':'. I figured that in the context of lyrics they and '—' (em dash) are interchangeable, so I went with the em dash alone. It's the only non-ASCII punctuation, though, so I might swap it for --.
    • Indicating a cut-off phrase with a - (hyphen) suffix on a word, no separating space. (Genius has this, but with an em dash instead of a hyphen. Weirdly, they have an auto-replacement syntax for both em and en dashes.) Dropped this because it was too common, but I'm not married to that position. It shouldn't be a problem to have it in the syntax alongside non-semantic hyphens in the middle of words. But I need to be careful if I'm to bring it back, since there's a slippery slope to st-st-stuttering (something profound to the support on the line), which Genius also has.
    • Numerals for numbers. With even a smidge of thought, these are out. (Genius allows numerals in things like measures and proper nouns, which is dumb.)
    • sdrow desreveR. Since you can't sing it (ehh true enough) without a DVR, this would fall into the category of "sound effect", and should be transcribed as such.
    • Quickfire on other Genius conventions I disagree with (some of these will be delegated to style guides):
      • Unnecessarily accurate spellings e.g. feelin' or wanna. (I prefer e.g. feeling and want to, with explicit carve-outs to resolve ambiguities, like gonna being a syllable shorter than going to. Also AAVE words don't have standardised spellings, so picking one is definitely a concern for the style guides.)
      • **** for any censored word, regardless of length or whether you can infer what word it is. I think I'll leave it to style guides to dictate whether the word should be inferred or omitted, but Genius' option shouldn't be considered. (Bars.)
      • Their capitalisation and proper noun conventions are inconsistent and I also don't like them. This is a language concern insofar as punctuation besides apostrophes and hyphens is sometimes (inconsistently) fair game on Genius.
      • Including every "ad-lib" (which is what they call the usually-meaningless vocalisations from the 2nd voice between 1st voice lines). Also transcribing echoes as 2nd voice lines. Though again I think this sort of thing is a style guide concern.
      • Pre-Chorus. (In lyrics, this would be a style guide concern, but I'll have well-known section names in the language, and I prefer Pre-chorus.)
      • Labelling parts with the vocalist's name. This is useless for people familiar with the band, and useless for people unfamiliar with the band. I can see it being helpful for something like Ayreon, maybe, in which case you can use comments.
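
    The hyphen question above (cut-off suffix vs. ordinary mid-word hyphens vs. stuttering vs. a `--` dash) can be sketched as a small classifier. This is a Python illustration only: the function name, the category labels, and the prefix heuristic for stutters are all assumptions, not part of the language.

```python
def classify_hyphens(word: str) -> str:
    """Guess what the hyphens in a single whitespace-delimited word mean."""
    if "-" not in word:
        return "plain"
    if word.strip("-") == "":
        return "dash"      # e.g. "--" as an ASCII stand-in for the em dash
    if word.endswith("-"):
        return "cut-off"   # phrase cut off mid-word, e.g. "wonder-"
    parts = word.lower().split("-")
    # Heuristic: in a stutter, every earlier part is a prefix of the last part.
    if all(part and parts[-1].startswith(part) for part in parts[:-1]):
        return "stutter"
    return "compound"      # ordinary non-semantic hyphen, e.g. "well-known"
```

    The heuristic shows the slippery slope: `st-st-stuttering` is classified as a stutter, but so is `b-boy`, which is really a compound.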
    Edited by YoshiRulz
  • Author Owner

    attempt in Kotlin https://gitlab.com/YoshiRulz/Circulyr-ANTLR-attempt-2021

    • ANTLR plugin was updated, to 1.0 in fact, so maybe it finally works properly?

    new direction: start over with yacc + Rust https://softdevteam.github.io/grmtools/master/book/quickstart.html

    Edited by YoshiRulz