First attempt at pre-tokenizing Chinese (cn) texts with jieba
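
A minimal sketch of what pre-tokenizing with jieba could look like, assuming the goal is to insert spaces at word boundaries before the text reaches a downstream tokenizer. The helper name `pretokenize` and the character-level fallback are illustrative assumptions, not taken from this change. `jieba.lcut` is the library's list-returning segmentation call.

```python
try:
    import jieba  # third-party segmenter; assumed installed via `pip install jieba`
except ImportError:
    jieba = None  # assumption: fall back to character-level splitting


def pretokenize(text: str) -> str:
    """Return `text` with word boundaries marked by single spaces.

    Hypothetical helper for illustration only.
    """
    if jieba is not None:
        # lcut segments the sentence and returns a list of word strings.
        pieces = jieba.lcut(text)
    else:
        # Fallback so the sketch still runs without jieba: one token per character.
        pieces = list(text)
    # Drop whitespace-only pieces so the output never contains doubled spaces.
    return " ".join(p for p in pieces if p.strip())


if __name__ == "__main__":
    print(pretokenize("我爱北京天安门"))
```

The exact segmentation depends on jieba's dictionary and version, so the printed split is not shown here. Removing the spaces from the output always recovers the non-whitespace characters of the input.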

1 job for !37 with jieba in 1 minute and 2 seconds (latest merge request)