wordcut.jl

wordcut.jl is a word tokenizer for ASEAN languages written in Julia

By Vee Satayamas

Install

wordcut.jl has not been registered yet, so it has to be installed from source.
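
One way to use an unregistered package without installing it is to put a local clone of the source on Julia's `LOAD_PATH`. The sketch below assumes you have already cloned the repository to a local directory; `clone_dir` is a hypothetical placeholder path, and the assumption that the module lives in `src/wordcut.jl` follows the usual Julia package layout rather than anything stated here.

```julia
# Hypothetical path: wherever you cloned the wordcut.jl repository.
clone_dir = "/your-path/wordcut.jl"

# Assumption: the module is defined in src/wordcut.jl (the usual Julia layout).
# Julia searches each directory on LOAD_PATH for a file named wordcut.jl,
# so adding the clone's src directory lets `import wordcut` find it.
push!(LOAD_PATH, joinpath(clone_dir, "src"))
```

With the path in place, `import wordcut` should load the module from the clone for the rest of the session; putting the `push!` line in `~/.julia/config/startup.jl` (Julia 0.7 and later) applies it to every session. Any packages wordcut.jl depends on still have to be installed separately.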

Example

import wordcut

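# Load a plain-text dictionary (replace the path with your own dictionary file)
# and build a tokenizer from it.
tree = wordcut.read_plain_dict("/your-path/your-dict.txt")
tokenizer = wordcut.create_tokenizer(tree)

# Tokenize standard input line by line and print the tokens separated by "|".
# eachline strips the trailing newline from each line.
# Julia 0.7 and later name the standard input stream `stdin`;
# Julia 0.6 and earlier call it `STDIN`.
for line in eachline(stdin)
    println(join(wordcut.tokenize(tokenizer, line), "|"))
end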
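
To give a rough idea of what a dictionary-driven tokenizer does with the dictionary, here is a self-contained toy in plain Julia that segments text by greedy longest matching against a word set. It is an illustration only: `read_word_set` and `longest_match_tokenize` are hypothetical names that are not part of wordcut.jl, the one-word-per-line dictionary format is an assumption, and wordcut.jl's own algorithm may well differ. Greedy longest matching is the simplest dictionary-based technique and can choose a wrong split where a smarter search would not.

```julia
# Hypothetical helper, NOT wordcut's API.
# Assumption: a plain dictionary file lists one word per line.
function read_word_set(path::AbstractString)
    words = Set{String}()
    for line in eachline(path)
        w = strip(line)
        isempty(w) || push!(words, String(w))   # skip blank lines
    end
    return words
end

# Toy greedy longest-match segmentation (hypothetical, NOT wordcut's algorithm).
function longest_match_tokenize(dict::Set{String}, text::AbstractString)
    chars = collect(text)        # work on characters, not UTF-8 bytes
    maxlen = isempty(dict) ? 0 : maximum(length, dict)  # longest entry, in characters
    tokens = String[]
    unknown = Char[]             # run of characters that matched no entry
    i = 1
    while i <= length(chars)
        matched = 0
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in min(i + maxlen - 1, length(chars)):-1:i
            if String(chars[i:j]) in dict
                matched = j - i + 1
                break
            end
        end
        if matched == 0
            push!(unknown, chars[i])             # no word starts here
            i += 1
        else
            if !isempty(unknown)                 # emit pending unknown run first
                push!(tokens, String(unknown))
                empty!(unknown)
            end
            push!(tokens, String(chars[i:i+matched-1]))
            i += matched
        end
    end
    isempty(unknown) || push!(tokens, String(unknown))
    return tokens
end

dict = Set(["ฉัน", "กิน", "ข้าว"])   # tiny Thai word list: "I", "eat", "rice"
println(join(longest_match_tokenize(dict, "ฉันกินข้าว"), "|"))  # prints ฉัน|กิน|ข้าว
```

Characters that match no dictionary entry are grouped into a single token, so unknown words stay in one piece instead of being split into individual characters.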