Missing full stop when using "dep.correct"

I am trying to create a CoNLL-U format dataset from multiple datasets, each tagged for a different task (pos, ner, dep, etc.).
My first attempt is to tag the data separately for each task and then combine the results into a single CoNLL-U dataset. However, the results from the "dep.correct" recipe are always missing the full stop at the end of the sentence, as shown below.

  • The data was retrieved using "db-out", printing the "text" field.
  • The comparison shows the results from "pos.correct" (top) and "dep.correct" (bottom).
  • The command I used for both tasks, with the recipe name being either "dep.correct" or "pos.correct", is: "dep.correct(pos.correct) <database_name> en_core_web_sm <dataset.txt> --unsegmented"
  • The missing full stop can be avoided by dropping the "--unsegmented" argument, but I need this argument in my case.
  • Since I am combining the tagged datasets into a single dataset, their "text" and "token" sections need to be identical.
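
The consistency requirement in the last bullet can be sketched roughly as follows. This is only an illustration: it assumes each exported record has a "text" string and a "tokens" list of objects with a "text" key, and that both exports are in the same order. The function name and the sample records are made up.

```python
def find_mismatches(records_a, records_b):
    """Return indices where two exports disagree on text or token texts.

    Assumption: records are dicts with "text" and "tokens" keys and
    both lists are in the same order.
    """
    mismatches = []
    for i, (a, b) in enumerate(zip(records_a, records_b)):
        a_tokens = [t["text"] for t in a.get("tokens", [])]
        b_tokens = [t["text"] for t in b.get("tokens", [])]
        if a["text"] != b["text"] or a_tokens != b_tokens:
            mismatches.append(i)
    return mismatches


# Invented sample records: the second export lacks the final full stop.
pos_records = [{
    "text": "I like cats.",
    "tokens": [{"text": "I"}, {"text": "like"}, {"text": "cats"}, {"text": "."}],
}]
dep_records = [{
    "text": "I like cats",
    "tokens": [{"text": "I"}, {"text": "like"}, {"text": "cats"}],
}]

print(find_mismatches(pos_records, dep_records))
```

Any index it reports marks a record that cannot be merged token by token without first fixing the difference.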

How do I solve this issue? Is there any way to perform multiple kinds of tagging (pos, ner, dep, relation) in the same session on the same dataset?


I more or less solved the problem by changing line 113 of dep.py from "sents = [doc[: len(doc) - 1]] if unsegmented else doc.sents" to "sents = [doc[: len(doc)]] if unsegmented else doc.sents".
Is the original line important for some reason? If not, I think I will keep the change.
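
The difference between the two slices can be illustrated with a plain Python list standing in for the spaCy Doc. This is an assumption made for illustration only: the token list is invented and this is not the actual dep.py code.

```python
# A plain list stands in for a tokenized document (hypothetical example).
tokens = ["This", "is", "a", "sentence", "."]

# Original slice: stops one element early, so the last token is dropped.
original = tokens[: len(tokens) - 1]

# Changed slice: covers every token, including the final full stop.
patched = tokens[: len(tokens)]

print(original)
print(patched)
```

With the original slice the final "." is left out of the single span, which matches the missing full stop seen with "--unsegmented".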

So, my remaining question is: can I do multiple kinds of tagging (POS, DEP, Relation, etc.) on the same dataset in the same session?