Commit ff7ac1f1 authored by Davide Liga

doc update

parent 9932ce22
......@@ -670,7 +670,7 @@ The Marker-Converter convert this information in \<term\> with the related \<TLC
<a name="five"></a>
-##5. Akoma Ntoso Marker and Conversion
+## 5. Akoma Ntoso Marker and Conversion
This module produces the Akoma Ntoso XML, using regular expressions and heuristics to detect the coverPage, preface, preamble, body, conclusions, annexes and tables.
......@@ -679,7 +679,7 @@ This module reuse all the previous step knowledge for marking correctly the sema
**Although AKN4UN defines resolutions as a \<documentCollection\> composed of different parts, for this challenge we preferred to simplify the structure by using \<statement\> only. Wrapping the result in a \<documentCollection\> later on will not be a big deal.**
<a name="five-1"></a>
-###5.1 Process of Conversion
+### 5.1 Process of Conversion
The first step of the conversion consists of loading the provided Word document and converting it (or rather, its parts) into plain text.
The second step is parsing the text top to bottom and using pattern matching to identify structural elements such as the document title, number, paragraphs, sections, annexes and so on.
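The top-to-bottom pattern matching described above can be sketched as follows. This is only an illustration: the patterns, the labels and the `classify_lines` helper are assumptions made for the example, not the rules the marker-converter actually uses.

```python
import re

# Hypothetical patterns, for illustration only; the real marker-converter
# rules are not shown in this document.
PATTERNS = [
    ("docNumber", re.compile(r"^[A-Z]/RES/\d+/\d+$")),
    ("annex", re.compile(r"^Annex(\s+[IVXLC]+)?$", re.IGNORECASE)),
    ("paragraph", re.compile(r"^(\d+)\.\s+\S")),
]


def classify_lines(text):
    """Walk the text top to bottom and label each non-empty line."""
    labelled = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        label = "text"  # fallback when no pattern matches
        for name, pattern in PATTERNS:
            if pattern.match(stripped):
                label = name
                break  # first matching pattern wins
        labelled.append((label, stripped))
    return labelled


sample = """A/RES/72/1
1. Decides to adopt the declaration;
Annex I
Some free text"""

for label, line in classify_lines(sample):
    print(label, "|", line)
```

The order of the pattern list matters, because the first match wins. A real converter would also need state, for example to know whether a numbered paragraph belongs to the body or to an annex.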
......@@ -974,7 +974,7 @@ expeditiously;</p>
~~~~
<a name="five-2"></a>
-###5.2 Results
+### 5.2 Results
We have processed the UN documents with the following results:
......@@ -991,7 +991,7 @@ The invalidity is mostly due to:
We can improve the marker-converter with more precise reference recognition, semantic annotation and presentation markup. We need two weeks for this task.
<a name="six"></a>
-##6. RDF Generation
+## 6. RDF Generation
The information used inside the Akoma Ntoso document could be serialized in RDF using the UNDO ontology. The idea is to connect it in an RDF assertion linking the agent with the role, the action and the duration of the event. An example is shown below:
......@@ -1047,7 +1047,7 @@ We will do this task in two months.
<a name="seven"></a>
-##7. Milestones
+## 7. Milestones
We have completed 60% of the expected tasks. We need more time to implement the following tasks (6 months in terms of schedule, 12 man-months in terms of effort):
......@@ -1064,7 +1064,7 @@ We have completed the 60% of the work respect the tasks expected. We need more t
<a name="eight"></a>
-##8. Installation
+## 8. Installation
- requires Python 3.7+
- clone this repo
......@@ -1073,7 +1073,7 @@ We have completed the 60% of the work respect the tasks expected. We need more t
- install the dependencies: pip install -r requirements.txt
<a name="eight-1"></a>
-###8.1 Usage
+### 8.1 Usage
- download all the documents: python run.py --download
- to parse one document: python run.py --parse \<filepath\>
- to parse all the documents: python run.py --parseall
......@@ -1082,7 +1082,7 @@ We have completed the 60% of the work respect the tasks expected. We need more t
**All the converted files will be written to the directory _out_**
<a name="eight-2"></a>
-###8.2 Troubleshooting
+### 8.2 Troubleshooting
If you are experiencing problems with import errors, export the PYTHONPATH as follows:
~~~~
......@@ -1090,7 +1090,7 @@ export PYTHONPATH=${PYTHONPATH}:<full/path/to/the/repo/>
~~~~
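When changing the shell environment is inconvenient, a similar effect can be obtained from inside Python by extending `sys.path` before importing the project modules. A minimal sketch, assuming you know the repo location; `add_repo_to_path` is a hypothetical helper and not part of this repo:

```python
import sys
from pathlib import Path


def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path once and return the resolved path."""
    resolved = str(Path(repo_root).resolve())
    if resolved not in sys.path:
        # Put the repo first so its modules are found before others.
        sys.path.insert(0, resolved)
    return resolved
```

Call it with the full path to the repo before the first project import. Unlike the exported variable, this only affects the current Python process.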
<a name="nine"></a>
-##9. References
+## 9. References
1. [Arithmetic properties of word embeddings](https://medium.com/data-from-the-trenches/arithmetic-properties-of-word-embeddings-e918e3fda2ac)
2. [Bag-of-Words TFIDF explained](http://datameetsmedia.com/bag-of-words-tf-idf-explained/)
......@@ -1111,7 +1111,7 @@ export PYTHONPATH=${PYTHONPATH}:<full/path/to/the/repo/>
17. https://iswc2017.semanticweb.org/wp-content/uploads/papers/MainProceedings/179.pdf
<a name="ten"></a>
-##10. Resources
+## 10. Resources
- Documents: http://undocs.org/; https://digitallibrary.un.org/
- UNDO ontology: https://unsceb-hlcm.github.io/onto-undo/index.html
......