Commit 60f163cb authored by Maximilian Berghoff

fix entity extraction, add blog post

parent 990afe10
# Natural Language Understanding done with PHP
Looking into AI topics, NLU (Natural Language Understanding) is a really interesting one. When thinking about parsing and structuring text, tools from Amazon and Google mostly come to mind, but I would not recommend them for private use. You can ask why, but you have to admit that nobody really knows how long data lives in the networks of those global companies.
For the same reason, I would not recommend them to my customers. And again, my recommendation is based on the data. So what would that mean for our project from a "Data Policy" perspective? Have you ever seen a working "Data Policy" page with all the information a customer would need? I don't know if it is even possible without breaking a law or starting to lie.
So from my point of view there can be only one solution: set it up on your own, using an existing open source project. Unless your training data is too complex, it won't really hurt you from a hardware perspective.
## Back to basics
Before having a look at a solution for PHP developers, I would like to give a short introduction to the topic.
The first thing to know is that NLU is not NLP: Natural Language Understanding is not Natural Language Processing. Both techniques have something in common, which is mostly the NLU part, but NLP is more. This [blog post](https://medium.com/@lola.com/nlp-vs-nlu-whats-the-difference-d91c06780992) describes it very well. While NLP is the complete ecosystem for human-machine interaction, NLU is "only" the AI part, covering:
* sorting unstructured data (text) and bringing it into a form machines can understand
* evaluating an *intent*
* extracting meaningful *entities*
* the possibility to train *models*
With that list in mind, three words should be described in a little more depth.
*Intent*
The intent is a property that even some participants of human-to-human conversations do not really get. What does the other person mean? What is the message my friend would like to get across? Instead of the common "yes, but ..." we should ask questions to understand the intent, but yeah, that is another topic for our society. At the moment we try to train machines to understand the intent of a sentence. Let us have a look at the following example:
> I have to stay home until friday.
That sentence can have multiple intents, which makes it quite hard to understand:
* The narrator has kids and no nanny for the upcoming week.
* The narrator is ill and the doctor said so.
* The narrator's company is closed for some reason, so remote work is needed.
Those are intents which have almost nothing in common except the fact that the narrator is back on Friday.
*Entity*
For those intents it is important to grab some useful values out of the text. Given the sentence I mentioned above, which value should we try to get? It is "friday", I would say, because depending on the intent this information is really useful. So let's persist it in a kind of variable; let's call it the `last_day` of something. That way it is available to work with. With an intent like a "report for illness", this value can be used for the employee's documentation. In our case a Jira ticket will be created with a subject like "Illness: Maximilian Berghoff [Date of today] - [Date of last day]" or something similar. This is nothing complicated.
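To make the ticket subject a bit more tangible, here is a minimal sketch in plain PHP. The function name `buildIllnessSubject` and the date format are assumptions made for illustration; they are not part of the repository. It relies on PHP's relative date formats, where a bare weekday name resolves to the next day with that name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: resolves a weekday entity such as "friday" to a date
 * relative to $today and builds a ticket subject like the one described above.
 */
function buildIllnessSubject(string $employee, string $lastDay, DateTimeImmutable $today): string
{
    // A bare weekday name moves to the next day with that name,
    // or stays on $today if it already is that weekday.
    $last = $today->modify($lastDay);
    if ($last === false) {
        // Older PHP versions return false for values they cannot parse.
        throw new InvalidArgumentException('Cannot resolve "' . $lastDay . '" to a date');
    }

    return sprintf('Illness: %s %s - %s', $employee, $today->format('Y-m-d'), $last->format('Y-m-d'));
}

// 2018-10-29 was a Monday, so "friday" resolves to 2018-11-02.
$subject = buildIllnessSubject('Maximilian Berghoff', 'friday', new DateTimeImmutable('2018-10-29'));
echo $subject, PHP_EOL;
```

In a real application the entity value comes from user text, so it should be checked against a list of expected values before it is handed to a date parser.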
*Model*
To make the machine understand our intent and grab the entities, we have to train it. For the tool I would like to propose in this blog post, RASA NLU, the training data looks like this:
```yaml
language: "en"
pipeline:
- name: "nlp_spacy"
  model: "de"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"
data: |
  ## intent:report_illness_duration
  - I will stay home for [3 days](duration)
  - I can not come the next [4 days](duration)
  - I will stay in bed the upcoming [5 days](duration)
  ## intent:report_illness_from_to
  - I will stay home until [friday](last)
  - My doctor suggests me to stay home until [friday](last)
  - I am ill from [monday](first) to [friday](last)
  - I will be back on [friday](last)
  ## lookup:duration
  - 1 days
  - 2 days
  - a week
  ## lookup:first
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday
  ## lookup:last
  - monday
  - tuesday
  - wednesday
  - thursday
  - friday
```
The first two keys are not of interest at the moment, as they are simple configuration. I would like to point you to the `data` key. Here you can find the intents. I separated the illness report into two different ones: one to give a range for your illness and one to give a duration. Having two different intents is comfortable here, as I can define two different sets of entities. For the range intent (`report_illness_from_to`) I need a date for the first and the last day; at least the last day should exist (assuming, for example, today as the first day). For the duration intent (`report_illness_duration`) I only need the duration (again assuming today as the first day). In that example I also described, as a kind of setting, which values are possible. This will be enriched in the training process.
This really looks like a kind of pattern or algorithm for looking up intents and entities, but it isn't. In the end the machine should recognize sentences which are not in this list. Until it does so with an acceptable confidence, we have to give more examples like these.
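To read the annotation syntax of the training examples, `[value](entity)`, the following sketch splits an example into its plain text and the marked entities. The function `parseAnnotatedExample` is made up for this post and only illustrates the notation; it says nothing about how RASA processes the data internally.

```php
<?php
declare(strict_types=1);

/**
 * Splits an annotated example like "I will stay home until [friday](last)"
 * into the plain text and the entities marked in it.
 */
function parseAnnotatedExample(string $example): array
{
    $entities = [];
    $text = preg_replace_callback(
        '/\[([^\]]+)\]\(([^)]+)\)/',
        function (array $match) use (&$entities): string {
            // $match[1] is the value in brackets, $match[2] the entity name in parentheses.
            $entities[] = ['entity' => $match[2], 'value' => $match[1]];

            return $match[1];
        },
        $example
    );

    return ['text' => $text, 'entities' => $entities];
}

$parsed = parseAnnotatedExample('I am ill from [monday](first) to [friday](last)');
print_r($parsed);
```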
## Tool to use - RASA NLU
There is other software out there for NLU, but I decided to give some insight into how to do it with [RASA NLU](https://rasa.com/docs/getting-started/overview/). It has all the features we need for our use cases and is quite easy to install:
```bash
pip install rasa_nlu
# OR (to get the bleeding edge), install from source instead:
# git clone https://github.com/RasaHQ/rasa_nlu.git
# cd rasa_nlu
# pip install -r requirements.txt
# pip install -e .
```
You can pass or create some configuration now and start training. But this tool has a little downside: it is written in Python. To be honest, switching to another language shouldn't be that much of an issue. But what if you already have a running PHP application? I would propose to use the HTTP API of RASA and integrate it into your application with simple curl requests.
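A rough sketch of such a curl request, written with PHP's curl extension, could look like the following. The `/parse` path, the JSON fields `q`, `project` and `model`, and the base URL `http://localhost:5000` are assumptions based on the RASA NLU version used at the time of writing; check them against the server you actually run. The helper names are made up for this example.

```php
<?php
declare(strict_types=1);

/** Builds the JSON body for a parse request (field names are assumptions). */
function buildParsePayload(string $text, string $project, ?string $model = null): string
{
    $payload = ['q' => $text, 'project' => $project];
    if ($model !== null) {
        $payload['model'] = $model;
    }

    return json_encode($payload, JSON_THROW_ON_ERROR);
}

/** Sends the payload to an NLU server and returns the decoded response. */
function parseText(string $baseUrl, string $payload): array
{
    $handle = curl_init(rtrim($baseUrl, '/') . '/parse');
    curl_setopt_array($handle, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $payload,
        CURLOPT_HTTPHEADER => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
    ]);

    $body = curl_exec($handle);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($handle));
    }

    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}

$payload = buildParsePayload('I will be ill until friday', 'illness_report');
echo $payload, PHP_EOL;
// $result = parseText('http://localhost:5000', $payload); // needs a running server
```

Keeping the payload builder separate from the transport makes it possible to test the request body without a running server.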
## PHP Integration
For a [talk at PHP Central Europe](https://2018.phpce.eu/de/#agenda) I prepared some code to show that it is possible to integrate NLU into PHP without implementing NLU in PHP. You can check out the [repository](https://gitlab.com/ElectricMaxxx/php-meets-nlu) and should have a look into `rasa_client/lib`. This code should be enough to do some basic requests and get meaningful models back. For that use case I also introduced a command line application written with Symfony. This is not mandatory at all; it is just the fastest way for me to call the given code and output something readable.
To start working with that example, you should run both Docker containers and enter the one for the app code:
```bash
$ cd examples/
$ docker-compose up -d
$ docker exec -it rasa-nlu-client sh
$ cd /app/src/
$ bin/console
```
The command `bin/console` should give you a list looking like:
```bash
rasa
rasa:nlu:parse Parse a given text for its intents.
rasa:nlu:remove-model Remove a training model.
rasa:nlu:status return the available projects and models
rasa:nlu:train Train a project by a well defined training data. For the training data you should have a look into: https://rasa.com/docs/nlu/dataformat/
```
which is an overview of the available commands. So let's run them.
*status*
```bash
$ bin/console rasa:nlu:status
Got following projects\
Project: illness_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
```
This gives an overview of models and currently running trainings. Models listed under "Loaded Model" are the ones living in memory, meaning those will give you the fastest answers.
*train*
```bash
$ bin/console rasa:nlu:train --project=illness_report data/config_train_illness_report.yml
new model trained
=================
Created Model: model_20181029-062412
```
This posts a valid training data file to a project (I used the one I mentioned above) to train a new model. You can also name a model with `--model` to train an existing model.
*Status (new model there)*
```bash
# bin/console rasa:nlu:status
Got following projects\
Project: illness_report
=======================
currently training:0
-----------------------
Available Model
-----------------------
model_20181027-164038
model_20181027-173358
model_20181029-062412
-----------------------
-----------------------
Loaded Model
-----------------------
model_20181027-173358
-----------------------
```
After adding new training data without naming a model, the newly created model will be visible in the status list.
*Parse*
```bash
$ bin/console rasa:nlu:parse --project=illness_report "I will be ill until friday"
Intent: report_illness_from_to - Confidence: 0.8078944273721
============================================================
Entities found:
 ------ -------- ------- ----- ----------- ------------------
  Name   Value    start   end   extractor   confidence
 ------ -------- ------- ----- ----------- ------------------
  last   friday   20      26    ner_crf     0.93667437133644
 ------ -------- ------- ----- ----------- ------------------
Ranking:
 ------------------------- ----------------- ------------
  Pos.                      Name              Confidence
 ------------------------- ----------------- ------------
  report_illness_from_to    0.8078944273721
  report_illness_duration   0.1921055726279
 ------------------------- ----------------- ------------
```
Now you can ask for a string to be parsed. You will get a statistical answer back. That means each intent you get comes with a calculated confidence only. "0.8" is quite OK, but a little more training will increase your confidence level. You also get entities back if they were defined in the training data.
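How an application might act on such an answer can be sketched as follows. The JSON body is an illustrative, shortened sample shaped after the console output above, and both the `decide` function and the threshold of `0.7` are assumptions for this example, not part of the client library.

```php
<?php
declare(strict_types=1);

// Illustrative response body, shaped after the console output above.
$json = <<<'JSON'
{
  "intent": {"name": "report_illness_from_to", "confidence": 0.81},
  "entities": [
    {"entity": "last", "value": "friday", "start": 20, "end": 26,
     "extractor": "ner_crf", "confidence": 0.94}
  ],
  "text": "I will be ill until friday"
}
JSON;

/**
 * Returns the intent name and the entity values if the intent confidence
 * reaches the threshold, or null when the application should ask again.
 */
function decide(string $json, float $minConfidence): ?array
{
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    if (($data['intent']['confidence'] ?? 0.0) < $minConfidence) {
        return null;
    }

    $values = [];
    foreach ($data['entities'] ?? [] as $entity) {
        // Key the values by entity name, e.g. "last" => "friday".
        $values[$entity['entity']] = $entity['value'];
    }

    return ['intent' => $data['intent']['name'], 'entities' => $values];
}

$decision = decide($json, 0.7);
var_dump($decision);
```

Returning `null` below the threshold leaves it to the caller to decide whether to ask the user again or to fall back to a default.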
## Conclusion
In the spirit of the current [PHP-Is-Dead-Blog-Post](https://hackernoon.com/php-is-dead-viva-le-php-f5dc5eb5c9c4): PHP can also be used to interact with NLU. Sure, I would not start training on complex data within a user request, but parsing text with a loaded model will be fast enough to make valid decisions. There are many use cases for NLU besides creating a chatbot, and I think you are able to use them from a PHP application now.
\ No newline at end of file
@@ -17,22 +17,39 @@ class Entity
      * @var string
      */
     private $value;
+    private $start;
+    private $end;
+    private $extractor;
+    private $confidence;
-    private function __construct(string $name, string $value)
+    private function __construct(string $name, string $value, $start, $end, $extractor, $confidence)
     {
         $this->name = $name;
         $this->value = $value;
+        $this->start = $start;
+        $this->end = $end;
+        $this->extractor = $extractor;
+        $this->confidence = $confidence;
     }
     public static function fromValues($values): Entity
     {
-        Assert::isArray($values);
-        Assert::keyExists($values, 'name', 'An entity should have a "name"');
-        $name = $values['name'];
-        Assert::keyExists($values, 'value', 'An entity should have a "name"');
-        $value = $values['value'];
+        Assert::isArray($values, 'Values to create an entity must be an array');
+        Assert::keyExists($values, 'entity', 'An entity should have a "entity"');
+        Assert::keyExists($values, 'value', 'An entity should have a "value"');
+        Assert::keyExists($values, 'start', 'An entity should have a "start"');
+        Assert::keyExists($values, 'end', 'An entity should have a "end"'); $end = $values['end'];
+        Assert::keyExists($values, 'extractor', 'An entity should have a "extractor"');
+        Assert::keyExists($values, 'confidence', 'An entity should have a "confidence"');
-        return new self($name, $value);
+        return new self(
+            $values['entity'],
+            $values['value'],
+            $values['start'],
+            $values['end'],
+            $values['extractor'],
+            $values['confidence']
+        );
     }
     /**
@@ -50,4 +67,36 @@ class Entity
     {
         return $this->value;
     }
+    /**
+     * @return mixed
+     */
+    public function getStart()
+    {
+        return $this->start;
+    }
+    /**
+     * @return mixed
+     */
+    public function getEnd()
+    {
+        return $this->end;
+    }
+    /**
+     * @return mixed
+     */
+    public function getExtractor()
+    {
+        return $this->extractor;
+    }
+    /**
+     * @return mixed
+     */
+    public function getConfidence()
+    {
+        return $this->confidence;
+    }
 }
@@ -21,6 +21,18 @@ class ParseResponse
      * @var array|Intent[]
      */
     private $intentRanking;
+    /**
+     * @var string
+     */
+    private $text;
+    /**
+     * @var string
+     */
+    private $projectKey;
+    /**
+     * @var Model
+     */
+    private $model;
     /**
      * ParseResponse constructor.
@@ -29,11 +41,14 @@ class ParseResponse
      * @param Entity[] $entities
      * @param Intent[] $intentRanking
      */
-    private function __construct($intent = null, array $entities = [], array $intentRanking = [])
+    private function __construct($intent = null, array $entities, array $intentRanking, string $text, string $projectKey, Model $model)
     {
         $this->intent = $intent;
         $this->entities = $entities;
         $this->intentRanking = $intentRanking;
+        $this->text = $text;
+        $this->projectKey = $projectKey;
+        $this->model = $model;
     }
     public static function fromJsonString($value): ParseResponse
@@ -44,9 +59,10 @@ class ParseResponse
         $entities = [];
         if (array_key_exists('entities', $jsonArray)) {
-            Assert::isArray($jsonArray['entities'], '"entities" on parse response must be an array');
-            foreach ($jsonArray['entities'] as $name => $value) {
-                $entities[] = Entity::fromValues($name, $value);
+            Assert::isArray($jsonArray['entities'], '"entities" on parse response must be an array of arrays');
+            foreach ($jsonArray['entities'] as $position => $jsonEntities) {
+                Assert::isArray($jsonEntities, '"entities" include an array again');
+                $entities[] = Entity::fromValues($jsonEntities);
             }
         }
@@ -58,7 +74,18 @@ class ParseResponse
             }
         }
-        return new self($intent, $entities, $intentRanking);
+        Assert::string($jsonArray['text']);
+        Assert::string($jsonArray['project']);
+        Assert::string($jsonArray['model']);
+        return new self(
+            $intent,
+            $entities,
+            $intentRanking,
+            $jsonArray['text'],
+            $jsonArray['project'],
+            Model::fromString($jsonArray['model'])
+        );
     }
     /**
@@ -84,4 +111,28 @@ class ParseResponse
     {
         return $this->intentRanking;
     }
+    /**
+     * @return string
+     */
+    public function getText(): string
+    {
+        return $this->text;
+    }
+    /**
+     * @return string
+     */
+    public function getProjectKey(): string
+    {
+        return $this->projectKey;
+    }
+    /**
+     * @return Model
+     */
+    public function getModel(): Model
+    {
+        return $this->model;
+    }
 }
@@ -52,7 +52,12 @@ class ProjectEndpoint extends RasaClient implements TrainModelsInProjectEndpoint
                 return $response->getBody()->getContents();
             }
-            return TrainDataResponse::fromJsonString($response->getBody()->getContents());
+            try {
+                return TrainDataResponse::fromJsonString($response->getBody()->getContents());
+            } catch (\InvalidArgumentException $exception) {
+                print("\n ERROR: ".$exception->getMessage());
+                return null;
+            }
         })->otherwise(function ($response) {
             if ($response instanceof \Error) {
                 return $response->getMessage();
@@ -81,7 +86,12 @@ class ProjectEndpoint extends RasaClient implements TrainModelsInProjectEndpoint
                 return $response->getBody()->getContents();
             }
-            return ParseResponse::fromJsonString($response->getBody()->getContents());
+            try {
+                return ParseResponse::fromJsonString($response->getBody()->getContents());
+            } catch (\InvalidArgumentException $exception) {
+                print("\n ERROR: ".$exception->getMessage());
+                return null;
+            }
         })->otherwise(function (\GuzzleHttp\Psr7\Response $response) {
             if ($response instanceof \Error) {
                 return $response->getMessage();
@@ -2,6 +2,7 @@
 namespace App\Command;
+use RASA\NLU\Model\Entity;
 use RASA\NLU\Model\Intent;
 use RASA\NLU\Model\ParseResponse;
 use Symfony\Component\Console\Input\InputArgument;
@@ -46,18 +47,20 @@ class ParseTextCommand extends CommonCommand
             return;
         }
         $this->io->title('Intent: '.$response->getIntent()->getName().' - Confidence: '.$response->getIntent()->getConfidence());
+        $this->io->writeln('Entities found:');
+        if (is_array($response->getEntities())) {
+            $this->io->table(['Name', 'Value', 'start', 'end', 'extractor', 'confidence'], array_map(function (Entity $entity) {
+                return [$entity->getName(), $entity->getValue(), $entity->getStart(), $entity->getEnd(), $entity->getExtractor(), $entity->getConfidence()];
+            }, $response->getEntities()));
+        }
+        $this->io->writeln('');
+        $this->io->writeln('Ranking:');
         if (is_array($response->getIntentRanking())) {
-            $this->io->table(['Name', 'Confidence'], array_map(function (Intent $intent) {
+            $this->io->table(['Pos.', 'Name', 'Confidence'], array_map(function (Intent $intent) {
                 return [$intent->getName(), $intent->getConfidence()];
             }, $response->getIntentRanking()));
         }
-        if (is_array($response->getEntities())) {
-            foreach ($response->getEntities() as $entity) {
-                $this->io->writeln('KEY: '.$entity->getName().' VALUE: '.$entity->getValue());
-            }
-        }
+        $this->io->writeln('');
     }
 }