I am trying to create a CoNLL-U format dataset from multiple tagged datasets, each annotated for a different task (POS, NER, dep, etc.).
My first attempt is to tag the data separately for each task and then combine the results into a single CoNLL-U dataset. However, results from the "dep.correct" recipe are always missing the full stop at the end of the sentence, as shown below.
- The data was retrieved using "db-out", and the "text" field was printed.
- The comparison shows the result for "pos.correct" (top) and "dep.correct" (bottom).
- The command I used for both tasks is: "dep.correct (or pos.correct) <database_name> en_core_web_sm <dataset.txt> --unsegmented"
- The missing full stop can be fixed by dropping the "--unsegmented" argument, but I need this argument for my use case.
- Since I am combining the datasets into a single dataset, the "text" and "token" sections of the tagged datasets need to be identical.
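To make the requirement in the last point concrete, here is a minimal sketch of how the two exports could be compared. It assumes each "db-out" record is a JSON object with a "text" key and a "tokens" list whose items each have a "text" key, and that both exports list the examples in the same order. The function names and file names are hypothetical.

```python
import json
from typing import List, Tuple


def load_jsonl(path: str) -> List[dict]:
    """Read one JSON object per non-empty line of a JSONL export."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def token_texts(example: dict) -> List[str]:
    """Return the surface form of every token (assumes a "tokens" list)."""
    return [tok["text"] for tok in example.get("tokens", [])]


def find_mismatches(pos_examples: List[dict],
                    dep_examples: List[dict]) -> List[Tuple[int, str]]:
    """Compare examples pairwise by position (an assumption) and report
    the index and the first field ("text" or "tokens") that differs."""
    mismatches = []
    for i, (pos_eg, dep_eg) in enumerate(zip(pos_examples, dep_examples)):
        if pos_eg["text"] != dep_eg["text"]:
            mismatches.append((i, "text"))
        elif token_texts(pos_eg) != token_texts(dep_eg):
            mismatches.append((i, "tokens"))
    return mismatches


# Hypothetical usage with two exported files:
# pos = load_jsonl("pos_export.jsonl")
# dep = load_jsonl("dep_export.jsonl")
# print(find_mismatches(pos, dep))
```

Any index reported here marks an example that cannot be merged as-is, for instance one where the "dep.correct" text lacks the final full stop. Because `zip` stops at the shorter list, exports of different lengths would need a separate length check.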
How can I solve this issue? Is there a way to perform multiple kinds of tagging (POS, NER, dep, relation) in the same session on the same dataset?
Thanks