Hi, I did about 300 annotations with ner.manual on my data. I then stopped to run
prodigy train ner with the dataset and got this error:
ValueError: Unmatched ''"' when decoding 'string'
Any help? Some of my documents are a bit long (about 10000 characters).
I did some research in the forum and found some similar issues, but most of them are related to image datasets and don't apply to my case.
If the problem is caused by bad entries in the annotation database, how could I tell
prodigy train ner to ignore those documents, or how could I programmatically delete those documents from the annotation database?
This is a generic Python error and typically means that there's a closing quotation mark missing. So it's possible that you have one example in there that's so long that the SQLite DB can't handle it, and the task JSON gets cut off?
If that's the case, you'll probably see the same error when running
db-out. In that case, you could use something like the SQLite Browser and investigate your database to see if there's something in it that looks suspicious – for example, something to look for could be JSON data that doesn't end with
} and is likely truncated.
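As a rough sketch of that check done programmatically: the snippet below tries to parse each stored value as JSON and collects the ids of rows that fail. The table name `example` and the columns `id` and `content` are assumptions based on what is mentioned in this thread, and the in-memory SQLite database only stands in for a real one. `find_malformed` is a hypothetical helper, not part of Prodigy.

```python
import json
import sqlite3


def find_malformed(rows):
    """Return the ids of rows whose content does not parse as JSON.

    `rows` is any iterable of (id, content) pairs, e.g. a database cursor.
    """
    bad_ids = []
    for row_id, content in rows:
        if isinstance(content, bytes):
            content = content.decode("utf8", errors="replace")
        try:
            json.loads(content)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            bad_ids.append(row_id)
    return bad_ids


# Demo data. Table and column names are assumptions, not a confirmed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, content BLOB)")
conn.executemany(
    "INSERT INTO example (id, content) VALUES (?, ?)",
    [
        (1, '{"text": "a complete task"}'),
        (2, '{"text": "a task that was cut o'),  # truncated: no closing quote or }
    ],
)

bad_ids = find_malformed(conn.execute("SELECT id, content FROM example"))
print(bad_ids)  # prints [2]
```

The same function works with any DB-API cursor that yields (id, content) pairs, so it could be pointed at a MySQL connection instead. It is worth backing up the database before deleting the rows it reports.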
Thanks for the response. Actually, I am using a MySQL server.
So if there is such a case, the only solution is to manually delete that annotation from the database.
It would be good if Prodigy could warn about this before labeling and saving to the database, since the annotation can't be used anyway.
I checked the database and deleted the malformed annotation. The error was because the text was too long: it seems that the content column in the example table is limited to something like 65535 characters, since that's the size of the content value for those examples. All the bad examples have that size in the cont
Ah okay, that makes sense. I'm surprised this didn't cause a MySQL error when the examples got saved in the first place.
For now, I guess you can just remove those rows, and maybe there's a way to increase the character limit? You could also add a check in a custom recipe that shows a warning if the examples are too long. There's typically not a good reason why you'd want to work with individual examples that are this long.
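A minimal sketch of such a length check, written as a plain generator that could wrap the stream of examples in a custom recipe. `filter_long_examples` is a hypothetical helper, not a Prodigy function, and the 65535 limit is just the figure reported in this thread. The size is estimated with the standard `json` module, which may differ slightly from how the examples are actually serialized.

```python
import json
import warnings

# Limit reported in this thread. This is an assumption: adjust it for your database.
MAX_CONTENT_BYTES = 65535


def filter_long_examples(stream, max_bytes=MAX_CONTENT_BYTES):
    """Yield examples whose JSON serialization fits within max_bytes.

    Examples that are too large are skipped, and a warning is shown for each.
    """
    for eg in stream:
        size = len(json.dumps(eg).encode("utf8"))
        if size > max_bytes:
            warnings.warn(f"Skipping example of {size} bytes (limit: {max_bytes})")
            continue
        yield eg


examples = [{"text": "short"}, {"text": "x" * 100}]
kept = list(filter_long_examples(examples, max_bytes=50))
print(kept)  # prints [{'text': 'short'}]
```

Saved annotations are larger than the incoming tasks because they also carry the added spans and other metadata, so in practice it probably makes sense to set the threshold well below the actual column limit.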
Yes, you're right.
It's indeed weird that MySQL didn't say anything, but that's because there was no quoting problem, just a JSON one.
I just didn't know that long documents could cause problems when annotating. I split all the long documents with the spaCy sentence segmenter.
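For illustration, the general shape of that preprocessing step is sketched below: each long document becomes one example per sentence. To keep the snippet dependency-free, a naive regular expression stands in for the spaCy sentence segmenter, so this is not the code that was actually used. Both function names are hypothetical.

```python
import re


def split_into_sentences(text):
    """Naive stand-in for a real sentence segmenter such as spaCy's.

    Splits on whitespace that follows '.', '!' or '?'.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def split_examples(examples):
    """Turn each example into one new example per sentence."""
    for eg in examples:
        for sentence in split_into_sentences(eg["text"]):
            yield {"text": sentence}


docs = [{"text": "First sentence. Second sentence! Third one?"}]
print(list(split_examples(docs)))
```

A real segmenter handles abbreviations and other edge cases much better than this regular expression. The point is only that every resulting example stays well below the size limit.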