I'm trying to annotate a dataset with the following string:
"The partnership comes after Baidu received approval earlier this week from regulators to test its self-driving cars in California, where Tesla Motors Inc.,Ford Motor Co. and Google parent Alphabet Inc., among others, are testing their autonomous-driving cars on the road."
Notice the period and comma between "Inc" and "Ford", with no space after them. Trying to load this dataset triggers a UI error:
I cannot change or remove the blue box next to the token Ford (correctly identified as ORG, but Prodigy is somehow tripped up by the leading comma). Trying to remove the entity label leads to a JS error...
I've seen this happen before with similarly malformed data, where it causes all identified entity labels to shift by one token.
I've manually changed the data for now. Ideally, Prodigy would not split longer strings in cases where this bug might occur.
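In case it helps anyone else, here is a minimal sketch of how that manual fix could be automated before loading the data. It assumes the only problem is a comma stuck directly between a period and the next word, as in "Inc.,Ford". The helper name `fix_missing_space` is made up for illustration and is not part of Prodigy or spaCy.

```python
import re

# Hypothetical helper (not a Prodigy API): matches a comma that directly
# follows a period and is directly followed by a letter, e.g. "Inc.,Ford".
_MISSING_SPACE = re.compile(r"(?<=\.),(?=[A-Za-z])")


def fix_missing_space(text: str) -> str:
    """Insert the missing space after the comma so tokens separate cleanly."""
    return _MISSING_SPACE.sub(", ", text)


print(fix_missing_space("Tesla Motors Inc.,Ford Motor Co. and Google"))
# prints: Tesla Motors Inc., Ford Motor Co. and Google
```

Commas that are already followed by a space (such as "Alphabet Inc., among others") are left untouched, so running this over the whole dataset should only change the malformed spots.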
Thanks for looking into this!