Commit f5b5dbfc authored by Taeluf

revert because last commit was on the wrong branch

parent e5652eda
# Development Status of Lexer
## Rambling
I need to think of a more efficient way to handle multiple paths and complex parsing. Heck, maybe I could just do commands that are separated by `;` so it's easier to string a bunch of them together. But also threading. I think I REALLY need threading. This would allow me to start listening for two different things at the same time & follow each of them to their full conclusion.
Example:
I have a comment which contains an `@` sign that is NOT followed by a name. When I reach the `@` sign, I want to keep listening for the stop of the comment, but also START listening for the name.
Example: I want to parse `# comment @name description` and `# comment @name(arg) description`
When I encounter `@`, I start listening for a `(` and an alphanumeric.
I DON'T KNOW & I have to come back to this.
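For reference, here is a rough sketch of the target output for the two example comments. It does NOT use the lexer's directive/threading approach at all; it is a single regex over one comment line, just to pin down what should be captured. The function name `parseCommentAttribute()` and the returned keys are made up for illustration.

```php
<?php
/**
 * Hypothetical helper (not part of the lexer): pull an `@name` or
 * `@name(args)` attribute out of a single comment line.
 *
 * @return array|null null when `@` is not directly followed by a name
 */
function parseCommentAttribute(string $line): ?array {
    // @name, then an optional (arg list), then the rest of the line
    $pattern = '/@([a-zA-Z0-9]+)(?:\(([^)]*)\))?\s*(.*)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'name' => $m[1],
        // group 2 is an empty string when there are no parentheses
        'args' => $m[2] === '' ? [] : array_map('trim', explode(',', $m[2])),
        'description' => trim($m[3]),
    ];
}

$plain    = parseCommentAttribute('# comment @name description');
$withArgs = parseCommentAttribute('# comment @name(arg) description');
$noName   = parseCommentAttribute('# comment with a bare @ sign'); // null
```

A lone `@` falls through to `null`, which is the case where the lexer should just keep listening for the stop of the comment.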
## Next
- Add feature to reference directive from another grammar (if it doesn't already exist)
## High Priority
- Write DocBlock + Comments Grammar
- `src`, `description`, and `attributes`
- Can come from multiple different styles (`##\n#\n#\n#` or `/**\n*\n*` or `// comment` or `! comment` or whatever)
- Maybe there is a docblock cleanup step that is non-lexer-y?? Or I have to add something to make it flexible ...
- I suppose I could have a method that modifies one of my directives, so basically you just set the docblock type to `/*`, then the relevant directives are modified accordingly. This is a fairly convoluted approach, but it would also work. It could even be "aware" of what other language grammar is present, maybe. (maybe not though).
- catches `@name and the description about it`, `@name(arg1,arg2) description about it`, ... what about `@arg prop_name description about it` & catch `prop_name`??? I think this is a case-by-case kind of thing. Maybe depending upon the particular attribute? maybe the language grammar can define it
- Re-design how tests are run
- The way `ast.new` runs things ... Export that into a function, then make it possible to just call functions that way as normal instructions
- A couple more PhpGrammar tests & new features
- BashGrammar + simple tests
- Write a good, short readme.
## Low Priority
- Performance: Use arrays, & do object conversions on directives at the last possible moment. I only want them as objects, so they're easier to work with by reference.
- add command expansion features (see below). I think they're all implemented except for `command []`.
- move command parsing into its own function like `$bits = $this->parse($cliCommandString, $arguments)`... or something where `$bits` contains an object, a method name, and an args array. There is also a TODO for this in the instructions file
- review how `match` instruction set works. Currently, I'm only using `start` and `stop` ...
## Latest
- Refactor instruction execution
- Move the switch into its entire own function & provide more clarity about passed in args
- Improve naming scheme by namespacing all commands. There are also namespace free defaults
- Create separate methods to replace complex `case`s (for readability & maintainability)
- Create command parsing function (`executeMethodString()`) that handles things like `_token:buffer` or `_lexer:unsetPrevious docblock`
- Disable bash grammar tests
- Reorganize lexer's internals into traits & remove old functions
- cleaned up PhpGrammar test.
- Refactored names & namespaces of php grammar & directives
- Deleted old, useless php grammars
- Sorted PhpGrammar directives into traits
- removed unneeded functions for phpgrammarnew & separated directives into a trait
- Got PhpGrammarNew working for SampleClass.php
- Got v0.6 working
## Command Expansion
Add `command ...`, `command []`, and `command // abc dogs and cats` features
- default: `command abc=>[1,2,3]` is same as `command abc [1,2,3]`. (though arrays aren't allowed in the command shorthands. The array is a single argument)
- `...`: `command abc ...=>['d','e','f']` is same as `command abc d e f`
- `[]`: `command abc []=>['a'=>true, 'b'=>'dog']` is same as `command abc a true` and `command abc b dog`
- `command abc []=>['a','b']` is same as `command abc a` and `command abc b`
- `//`: `command abc // cats` is same as `command abc`. The `//` lets you write multiple matches, for example.
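A minimal sketch of the expansion rules above, assuming an instruction arrives as a key string plus a value, and that a bare instruction has been normalized to a value of `true`. `expandCommand()` and the `[command, args]` pair it returns are hypothetical, not the lexer's actual internals.

```php
<?php
/**
 * Hypothetical sketch: expand one instruction into a list of
 * [command, args] calls, following the rules in this section.
 */
function expandCommand(string $key, $value = true): array {
    // `//`: everything from the comment marker on is ignored
    $commentPos = strpos($key, '//');
    if ($commentPos !== false) {
        $key = substr($key, 0, $commentPos);
    }
    $words = preg_split('/\s+/', trim($key), -1, PREG_SPLIT_NO_EMPTY);
    $command = array_shift($words);
    $last = end($words);

    if ($last === '...') {
        // `...`: spread the array into separate arguments of ONE call
        array_pop($words);
        return [[$command, array_merge($words, array_values((array)$value))]];
    }
    if ($last === '[]') {
        // `[]`: one call per entry; string keys become an extra argument
        array_pop($words);
        $calls = [];
        foreach ((array)$value as $k => $v) {
            $extra = is_int($k) ? [$v] : [$k, $v];
            $calls[] = [$command, array_merge($words, $extra)];
        }
        return $calls;
    }
    // default: the value (even an array) is passed as a single argument
    if ($value !== true) {
        $words[] = $value;
    }
    return [[$command, $words]];
}

$spread  = expandCommand('command abc ...', ['d', 'e', 'f']);
$each    = expandCommand('command abc []', ['a' => true, 'b' => 'dog']);
$single  = expandCommand('command abc', [1, 2, 3]);
$comment = expandCommand('command abc // cats');
```

Here `$spread` holds one call with args `abc d e f`, `$each` holds two calls, `$single` passes the array as one argument, and `$comment` is just `command abc`.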
## Directive Overrides
I don't know if this is accurate, but I think it is & once verified, these notes can be used for documentation.
- Inheritance rules:
- if raw `match` directive is first in src directive, then add it first
- then add instructions from the child directive, in declared order
- then add all other instructions from the source directive, in their declared order.
- If child directive contains any keys found in source directive, then do not copy the value from the source directive. The child simply overwrites (but in the child's declared order)
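The rules above can be sketched with plain arrays and PHP's array union operator (`+`), which keeps the left-hand keys and appends only the missing ones in their declared order. This is an unverified illustration, same as the notes: `mergeDirective()` is a hypothetical name, and it assumes both directives are already normalized to `instructionSet => [instruction => value]` arrays (needs PHP 7.3+ for `array_key_first()`).

```php
<?php
/**
 * Hypothetical sketch of the inheritance rules, on normalized arrays.
 */
function mergeDirective(array $child, array $source): array {
    $merged = [];
    // 1. a `match` that comes first in the source stays first,
    //    unless the child declares its own `match`
    foreach ($source as $set => $instructions) {
        if (array_key_first($instructions) === 'match'
            && !isset($child[$set]['match'])
        ) {
            $merged[$set]['match'] = $instructions['match'];
        }
    }
    // 2. child instructions, in the child's declared order
    foreach ($child as $set => $instructions) {
        $merged[$set] = ($merged[$set] ?? []) + $instructions;
    }
    // 3. remaining source instructions; keys the child set are NOT copied
    foreach ($source as $set => $instructions) {
        $merged[$set] = ($merged[$set] ?? []) + $instructions;
    }
    return $merged;
}

$source = [
    'start' => ['match' => '#', 'rewind 1' => true, 'buffer.clear' => true],
    'stop'  => ['match' => '/(\\r|\\n)/'],
];
$child = [
    'start' => ['rewind 1' => 2, 'ast.new' => ['_type' => 'comment']],
];
$merged = mergeDirective($child, $source);
```

In `$merged['start']` the order is `match`, `rewind 1` (with the child's value), `ast.new`, then `buffer.clear` from the source.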
# Info I wanna keep around just because
Some of this is actual like ... historical information for the project. A lot of it is just ... stuff I wrote down & might want to use later.
## v0.6 changes
A complete redesign to how directives are declared. The codebase is significantly cleaned up, and the project should be far more maintainable going forward, as well as much more useful as a lexer. I think `v0.5` was never fully functional. I believe I abandoned that in favor of a new design for v0.6
## Some questions
These are probably not accurate. But I wanted to keep them around & maybe answer them again.
......@@ -106,7 +102,7 @@ This is out of date. We no longer use a `state` approach. Instead, each directiv
- `state_not=>[state3, state_4]`. If `$lexer->getState()` is one of these, do not check this regex. For every other state, check this regex (unless `state=>` is also given.)
- `onMatch =>` the function to call when this regex is matched
- `set_state` => `new_state`. to call `$lexer->setState('new_state')` when this regex is matched
- `buffer.clear => true` to call `$token->clearBuffer()` when regex is matched
- `clear_buffer => true` to call `$token->clearBuffer()` when regex is matched
- `pop_state => true` to call `$lexer->popState()` and `$token->clearBuffer()`
4. Every `success_regex` now begins processing
5. if `$debug` is on (hardcoded rn), then print state information.
......
......@@ -12,11 +12,7 @@ Uses Grammars to parse complex strings/files
## Install
For development, it depends upon [taeluf/php/php-tests](https://gitlab.com/taeluf/php/php-tests) and [taeluf/php/CodeScrawl](https://gitlab.com/taeluf/php/CodeScrawl), but they're not set up on Packagist yet.
```bash
composer require taeluf/lexer v0.6.x-dev
```
or in your `composer.json`
```json
{"require":{ "taeluf/lexer": "v0.6.x-dev"}}
composer require taeluf/lexer v0.5.x-dev
```
## Use it
......
# Development Status of Lexer
## Current
- PhpGrammar. See Php Grammar status file
## Next (High Priority)
- Separate grammar into multiple files. For example, the PHP Grammar is getting HUGE. I would love for its directives to be separated between multiple files, but for them to still be all part of the PhpGrammar. I could do this via traits? Probably. Then the root class could define methods & directives that are used globally, then each of the traits could define its own subset. Trait-ifying is annoying. It might be nicer to just have them as a php array in a simple file like that, but the trait... might not work either because conflicting property names.... UGH. I just know the file is getting way too big & I would like it to settle DOWN
- I need some other kind of match than just 'start', 'stop', and 'match'. Sometimes I just have simpler needs? Like... Maybe my "start" is actually a sequence of two starts? Like `class` must be followed by a `\`, `/**`, `//`, `#`, or a whitespace char. Else it is not actually `class`.
- New declaration style?
```php
start=>[
'match'=>'class', // str.match is str.end, str.contains, str.start, reg.match
//or just 'next'=>[...]
'next.check'=>[
'chars'=>1,
'match'=>'/[^a-zA-Z_0-9]/',
],
'then :namespace',
'then :cats'=>[
],
'then :dogs'=>[
],
]
```
- Mayyybe make `match` only be checked when `stop` is checked
- Write some light overview documentation
- Write v0.5 PhpGrammar
- Write v0.5 compatible DocBlockGrammar
- have php grammar reference docblock grammar to run in the same lexer instance
- Write a v0.5 compatible BashGrammar
## Latest (newest to oldest)
- Refactor code base
- wrote docblocks
- Add `_fillReg` flag for regexes like `/happy$1cat` to have `$1` replaced with `$matches[1]`
- Add option to fill in previous match via `['string', 1, 'string']` where `1` will be replaced by `$matches[1]`
- Added string-based checking (no noticeable performance improvement)
- `$matches[0]` & `$matches[1]` are the exact string being checked for
- clean up Status Doc
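A small sketch of the two "fill in the previous match" ideas from the list above (the `_fillReg` flag and the `['string', 1, 'string']` form). The function names are hypothetical, and running the captured text through `preg_quote()` is an assumption of this sketch, not something the notes specify.

```php
<?php
/**
 * Hypothetical sketch of `_fillReg`: replace `$1`, `$2`, ... in a regex
 * with the matching entry of $matches.
 */
function fillRegex(string $regex, array $matches): string {
    return preg_replace_callback(
        '/\$(\d+)/',
        function (array $m) use ($matches): string {
            $index = (int)$m[1];
            // Assumption: quote the captured text so it matches literally.
            // An unknown index is left untouched.
            return isset($matches[$index])
                ? preg_quote($matches[$index], '/')
                : $m[0];
        },
        $regex
    );
}

/**
 * Hypothetical sketch of the array form: integer entries are replaced
 * by $matches[n], string entries are used as-is.
 */
function fillParts(array $parts, array $matches): string {
    $out = '';
    foreach ($parts as $part) {
        $out .= is_int($part) ? ($matches[$part] ?? '') : $part;
    }
    return $out;
}

$matches = ['"dog"', 'dog'];
$regex   = fillRegex('/happy$1cat/', $matches);
$string  = fillParts(['happy', 1, 'cat'], $matches);
```

With `$matches[1]` set to `dog`, `$regex` becomes `/happydogcat/` and `$string` becomes `happydogcat`.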
## Next (Low Priority)
- refactor tests
- Consider re-adding `setMatch()` to the token... The method is there, but I'm not using it.
- Write tests for caching
- Convenience methods to run grammar from a particular starting directive.
- Docblock grammar could accept `/** .... */` by default, but could have convenience methods to parse from `/** */` or from a comment that has the `/**` removed.
# Php Grammar
This is to keep track of development of the PhpGrammar & to log any information that may be crucial to future development.
# Php Grammar Status
## Current
- `PhpGrammar` is functional! But it has a very small set of features.
- Working on `PhpGrammarNew`, on the `'namespace'` directive. Check the message echoed from the PhpGrammarNew test
- Major progress on `PhpGrammarNew` & made lots of long-winded notes regarding plans.
### What to Catch
This list has not been updated in a while. Some of this stuff has been handled in the PhpGrammar. Much of it has not. These notes were updated June 26, 2021.
I may have to catch additional things (like arrays) in order to make default value capturing work
- `use NS\ClassName as ShortName` statements (maybe)
- `use TraitName` statements
- done
- `use` statements (maybe) (I care more inside classes than for aliasing)
- interfaces
- traits
- docblocks
......
<?php
namespace Tlf\Lexer\Bash;
trait OtherDirectives {
protected $_other_directives = [
'bash'=>[
'is'=>[
':comment',
],
],
'comment'=>[
'start'=>[
'match'=>'#',
'rewind 1',
'buffer.clear',
'forward 1',
'ast.new'=>[
'_addto'=>'comments',
'_type'=>'comment',
'src'=>'_token:buffer',
],
'buffer.clear //again',
],
// I definitely need a comment parsing directive
'match'=>[
//then I have to decide whether I want to allow them with no parentheses.
//And I do. It should work with or without.
//So, heck. That complicates things a lot
'match'=>'/@[a-zA-Z0-9]/',
'rewind 1',
'ast.append src',
'rewind 1 // again',
'ast.append description',
'forward 2',
'buffer.clear',
'then :_blank-'=>[
'start'=>[
//just immediately start
'match'=>'',
'rewind 1',
],
'stop'=>[
'match'=>'/(\\r|\\n)/',
'rewind 1',
'ast.append src',
'buffer.clear',
]
],
],
'stop'=>[
'match'=>'/(\\r|\\n)/',
'rewind'=>1,
'ast.append src',
'ast.append description',
'forward'=>1,
'buffer.clear',
],
],
];
}
......@@ -34,187 +34,88 @@ class Grammar {
*
* @return a key=>value array of directives.
*/
public function getDirectives($directiveName, array $overrides=[]){
public function getDirectives(...$directiveNames){
$regularDirectives = [];
$isDirectives = [];
// echo "\n\n---\n";
// var_dump($directiveName);
// var_dump($overrides);
// echo "\n\n";
// exit;
if (substr($directiveName,0,1)!==':'){
echo "\n\n'$directiveName' needs to start with a colon...\n\n";
throw new \Exception("So fix it...");
}
$overrides = $this->normalizeDirective($overrides);
$directiveName = substr($directiveName, 1);
$sourceDirective = $this->directives[$directiveName] ?? $this->getDotDirective($directiveName) ?? null;
if ($sourceDirective===null){
if (substr($directiveName,0,6)=='_blank'){
$sourceDirective = [];
if (strlen($directiveName)==6){
$directiveName = $directiveName .'-'.$this->blankCount++;
}
}
}
if ($sourceDirective===null){
throw new \Exception("Directive '$directiveName' not available on '".get_class($this)."'");
}
$sourceDirective = $this->normalizeDirective($sourceDirective);
$overriddenDirective = $this->getOverriddenDirective($overrides, $sourceDirective);
$overriddenDirective->_name = $directiveName;
$overriddenDirective->_grammar = $this;
if (!isset($overriddenDirective->is))return [$directiveName=>$overriddenDirective];
return $this->expandDirectiveWithIs($overriddenDirective, (array)$overrides);
}
/**
*
* Get multiple directives this one is pointing to.
*
* @roadmap(current) Version 1: Load all the targets as source directives, then process overrides with $directive as the overrides
* @roadmap(next) Version 2: Load all the targets as source directives, using their own array values as overrides over the source, then process that as the source and $directive as the override
* @roadmap(future, maybe) Version 3: The parent directive (who defines the 'is')... Its key/values are treated as overrides. The 'is' targets are loaded as source directives, and the parent key=>values (except 'is') are applied as the overrides to make a new source directive. Then the 'is' targets own array values are applied as the overrides to make another new source directive. Then $directive is applied on each to make ANOTHER new source directive. These ones are returned.
*
*
*/
public function expandDirectiveWithIs(object $directive, $overridesDirective = []){
$isDirectiveName = $directive->_name;
$d = $directive;
$maxCount = 1;
if (isset($directive->_name))$maxCount++;
if (isset($directive->_grammar))$maxCount++;
$out = [];
// if (count((array)$d)>$maxCount){
// // print_r($d->stop);
// // print_r($overridesDirective);
// throw new \Exception("'$isDirectiveName' cannot be processed because it has instructions other than 'is'");
// }
foreach ($d->is as $key=>$override){
if (count($override)>0){
throw new \Exception("We don't yet process overrides on 'is' entries.");
}
$subDirectives = $this->getDirectives($key, (array)$overridesDirective);
foreach ($subDirectives as $directiveToAdd){
$out[$directiveToAdd->_name] = $directiveToAdd;
}
}
return $out;
}
/**
*
* - Inheritance rules:
* - if raw `match` directive is first in src directive, then add it first
* - then add instructions from the child directive, in declared order
* - then add all other instructions from the source directive, in their declared order.
* - If child directive contains any keys found in source directive, then do not copy the value from the source directive. The child simply overwrites (but in the child's declared order)
*
* @param $newDirective The new directive / overrides, but not yet filled by the source
* @param $sourceDirective the original directive
*/
public function getOverriddenDirective(object $overridesDirective, object $sourceDirective){
// $source = (array)$sourceDirective;
$newDirective = [];
foreach ($sourceDirective as $isn=>$instructionList){
$firstInstruction = array_slice($sourceDirective->$isn,0,1);
if (isset($firstInstruction['match'])
&&!isset($overridesDirective->$isn['match'])
){
$newDirective[$isn]['match'] = $firstInstruction['match'];
}
}
foreach ($overridesDirective as $isn=>$instructionList){
if ($isn[0]=='_')continue;
if (!isset($newDirective[$isn]))$newDirective[$isn] = [];
$newDirective[$isn] = $newDirective[$isn] + $instructionList;
}
foreach ($sourceDirective as $isn=>$instructionList){
foreach ($instructionList as $instruction=>$value){
if (!isset($newDirective[$isn][$instruction]))$newDirective[$isn][$instruction] = $value;
foreach ($directiveNames as $directiveName){
if (substr($directiveName,0,1)!==':'){
echo "\n\n'$directiveName' needs to start with a colon...\n\n";
throw new \Exception("So fix it...");
}
}
return (object)$newDirective;
}
/**
* - Arrayify instructions that need to be arrayified.
* - Convert bool value-keys like 'buffer.clear' to 'buffer.clear'=>true
*/
public function normalizeDirective($d){
if (!is_object($d))$d = (object)$d;
$instructionSetNames = ['start', 'match', 'stop'];
$arrayify = [
'match',
];
foreach ($instructionSetNames as $isn){
if (!isset($d->$isn))continue;
$in = $d->$isn;
$out = [];
if (!is_array($in))$in = ['match'=>$in];
foreach ($in as $key=>$value){
if (is_int($key)){
if (substr($value,0,5)=='then '){
$out[$value] = [];
} else {
$out[$value] = true;
$key = substr($directiveName,1);
$directive = $this->directives[$key] ?? $this->getDirectiveFancy($key) ?? null;
// $directive = $this->directives[$key] ?? null;
if ($directive==null){
if (substr($key,0,6)=='_blank'){
if (strlen($key)==6){
$directiveName = $directiveName .'-'.$this->blankCount++;
}
$regularDirectives[$directiveName] = [];
continue;
}
// convert values to array
if (in_array($key, $arrayify)&&!is_array($value)){
$out[$key] = [$value];
continue;
}
$out[$key] = $value;
throw new \Exception("Directive '$key' is null on ".get_class($this));
}
$d->$isn = $out;
if (isset($directive['is']))$isDirectives[$directiveName] = $directive;
else $regularDirectives[$directiveName] = $directive;
}
$alternateSetAutoValues = [
'is'=>[],
];
foreach ($alternateSetAutoValues as $setName=>$autoValue){
if (!isset($d->$setName))continue;
foreach ($d->$setName as $key=>$value){
if (is_int($key)){
$newKey = $value;
$newValue = $autoValue;
foreach ($isDirectives as $isDirectiveName=>$d){
if (count($d)>1){
throw new \Exception("'$isDirectiveName' cannot be processed because it has instructions other than 'is'");
}
foreach ($d['is'] as $k1=>$k2){
if (is_string($k1)){
$key = $k1;
$override = $k2;
} else {
$newKey = $key;
$newValue = $value;
$key = $k2;
$override = [];
}
if (count($override)>0){
throw new \Exception("We don't yet process overrides on 'is' entries.");
}
$subDirectives = $this->getDirectives($key);
foreach ($subDirectives as $directiveToAdd){
// foreach ($override as $okey=>$value){
// $directiveToAdd->$okey = $value;
// }
$regularDirectives[$directiveToAdd->_name] = $directiveToAdd;
}
unset($d->$setName[$key]);
$d->$setName[$newKey] = $newValue;
}
}
foreach ($regularDirectives as $directiveName=>$d){
$d = (object)$d;
$regularDirectives[$directiveName] = $d;
$d->_name = $directiveName;
$d->_grammar = $this->getNamespace();
// convert stop.then to 'then'=>'onStop'=>[]
// foreach ($regularDirectives as $directiveName => $directive){
// foreach ($directive as $key=>$value){
// if ($key=='stop.then'){
// unset($regularDirectives[$directiveName][$key]);
// $regularDirectives[$directiveName]['onStop']['then'][] = $value;
// }
// }
// }
}
return $d;
return $regularDirectives;
}
/**
* Get a directive that has a dot-form like `:string.stop` to get a new directive whose `start` is `:string`'s `stop`
* @return a single directive
*/
protected function getDotDirective($key){
protected function getDirectiveFancy($key){
$parts = explode('.',$key);
if (count($parts)<=1)return null;
if (count($parts)>2){
......
<?php
namespace Tlf\Lexer;
/**
*
* This is not for actual parsing yet. This is for design work. The $directives array & 'php_open' and 'namespace' are design aspects I'm interested in implementing at some point... maybe
*/
class BashGrammar extends Grammar {
// use Bash\LanguageDirectives;
use Bash\OtherDirectives;
protected $expect = ['html', 'php_open'];
/**
* Filled by traits
*/
protected $directives;
public $notin = [
'asfdasdfkeyword'=>[
// 'match'=>'/this-regex-available on php.net keywords page/',
'__halt_compiler', 'abstract', 'and', 'array', 'as', 'break', 'callable', 'case', 'catch', 'class', 'clone', 'const', 'continue', 'declare', 'default', 'die', 'do', 'echo', 'else', 'elseif', 'empty', 'enddeclare', 'endfor', 'endforeach', 'endif', 'endswitch', 'endwhile', 'eval', 'exit', 'extends', 'final', 'for', 'foreach', 'function', 'global', 'goto', 'if', 'implements', 'include', 'include_once', 'instanceof', 'insteadof', 'interface', 'isset', 'list', 'namespace', 'new', 'or', 'print', 'private', 'protected', 'public', 'require', 'require_once', 'return', 'static', 'switch', 'throw', 'trait', 'try', 'unset', 'use', 'var', 'while', 'xor'
],
];
public function getNamespace(){return 'bashgrammar';}
public function __construct(){
$this->directives = array_merge(
// $this->_language_directives,
$this->_other_directives,
);
}
public function onLexerStart($lexer,$file,$token){
// $lexer->addDirective($this->getDirectives(':bash')['php_open']);
}
}
<?php
namespace Tlf\Lexer;
/**
*
* This is not for actual parsing yet. This is for design work. The $directives array & 'php_open' and 'namespace' are design aspects I'm interested in implementing at some point... maybe
*/
class PhpGrammar extends Grammar {
use Php\LanguageDirectives;
use Php\ClassDirectives;
use Php\ClassMemberDirectives;
use Php\BodyDirectives;
use Php\OtherDirectives;
protected $expect = ['html', 'php_open'];
public $notin = [
'keyword'=>[
// 'match'=>'/this-regex-available on php.net keywords page/',
'__halt_compiler', 'abstract', 'and', 'array', 'as', 'break', 'callable', 'case', 'catch', 'class', 'clone', 'const', 'continue', 'declare', 'default', 'die', 'do', 'echo', 'else', 'elseif', 'empty', 'enddeclare', 'endfor', 'endforeach', 'endif', 'endswitch', 'endwhile', 'eval', 'exit', 'extends', 'final', 'for', 'foreach', 'function', 'global', 'goto', 'if', 'implements', 'include', 'include_once', 'instanceof', 'insteadof', 'interface', 'isset', 'list', 'namespace', 'new', 'or', 'print', 'private', 'protected', 'public', 'require', 'require_once', 'return', 'static', 'switch', 'throw', 'trait', 'try', 'unset', 'use', 'var', 'while', 'xor'
],
];
public function getNamespace(){return 'phpgrammar';}
public function onLexerStart($lexer,$file,$token){
$this->directives = array_merge(
$this->_language_directives,
$this->_body_directives,
$this->_class_directives,
$this->_class_member_directives,
$this->_other_directives,
);
// $lexer->addDirective($this->getDirectives(':html')[':html']);
$lexer->addDirective($this->getDirectives(':php_open')['php_open']);
// $lexer->stackDirectiveList('phpgrammar:html', 'phpgrammar:php_open');
// $lexer->setDirective([$this->getDirective('html')]);
$file->set('namespace', '');
}
public function holdNamespaceName($lexer, $file, $token){
$prev = $lexer->previous('namespace.name');
if ($prev == null)$prev = [];
$prev[] = $token->buffer();
$lexer->setPrevious('namespace.name', $prev);
}
public function saveNamespace($lexer, $file, $token){
$namespace = $lexer->previous('namespace.name');
$namespace = implode('\\',$namespace);
$file->set('namespace', $namespace);
$lexer->setPrevious('namespace.name', $namespace);
}
public function handleClassDeclaration($lexer, $class, $token){
$class->set('declaration', $lexer->unsetPrevious('class.declaration'));
}
public function processDocBlock($lexer, $ast, $token){
$lexer->setPrevious('docblock', $token->buffer());
}
public function captureUseTrait($lexer, $ast, $token){
$ast->add('traits',$token->buffer());
}
public function processComment($lexer, $ast, $token){
$comment = trim($token->buffer());
$ast->add('comments', $comment);
$lexer->setPrevious('comment', $comment);
}
// public function end_docblock($lexer, $unknownAst, $token){
// $block = $token->buffer();
// $block = trim($block);
// $block = trim(substr($block,strlen('/**'),-1));