Testing add_tokens

pvcastro · June 19, 2018, 7:55pm

Hi there,

I was running some tests to fix a "Mismatched tokenization" error I’m getting, and I tried running the following code, which was recommended in another thread:

from prodigy.components.preprocess import add_tokens
import en_core_web_sm

nlp = en_core_web_sm.load()
text = " The upstart streaming service, which is primarily geared for sports fans, has an uphill climb against deep-pocketed competitors marketing cable alternatives to cord-cutters: YouTube TV, Hulu Live and Sony's PlayStation Vue."
stream = [{'text': text, 'spans': {'start': 175, 'end': 185}}]
new_stream = add_tokens(nlp, stream)
print(list(new_stream))

I’m getting the following exception when running this code:

TypeError                                 Traceback (most recent call last)
<ipython-input-115-958a6dcd96e1> in <module>()
  6 stream = [{'text': text, 'spans': {'start': 175, 'end': 185}}]
  7 new_stream = add_tokens(nlp, stream)
----> 8 print(list(new_stream))

cython_src/prodigy/components/preprocess.pyx in add_tokens()

TypeError: string indices must be integers

Thanks!

ines · June 20, 2018, 7:46am

I think you might actually have a small typo in your stream: "spans" here is a dictionary, when it should be a list of dictionaries.
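To illustrate why the dictionary form fails with that exact message, here is a minimal sketch in plain Python. The `span_offsets` helper is hypothetical and only stands in for any loop that reads `start` and `end` from each span; it is not Prodigy's actual implementation, and the short text and offsets are made up for the example. Iterating over a dict yields its string keys, so indexing a key with `"start"` raises the `TypeError`.

```python
# Hypothetical stand-in for code that loops over an example's spans.
# This is NOT Prodigy's add_tokens, just an illustration of the failure mode.
def span_offsets(example):
    return [(span["start"], span["end"]) for span in example.get("spans", [])]

# "spans" as a single dict: iterating yields the keys "start" and "end".
bad = {"text": "YouTube TV", "spans": {"start": 0, "end": 10}}
# "spans" as a list of dicts: iterating yields each span dict.
good = {"text": "YouTube TV", "spans": [{"start": 0, "end": 10}]}

try:
    span_offsets(bad)
    bad_error = None
except TypeError as err:
    # "start"["start"] is a string indexed by a string.
    bad_error = str(err)

good_offsets = span_offsets(good)
print(bad_error)
print(good_offsets)
```

In the original snippet, the equivalent fix is to wrap the span in brackets: 'spans': [{'start': 175, 'end': 185}].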

The new validation mechanism should catch errors like that within the recipes. If you want to implement something like this yourself (e.g. to make sure that your stream is formatted correctly for the annotation task), you can also call into the validator directly.

from prodigy.components.validate import Validator

validator = Validator('ner_manual')  # the view_id you want to use the stream with
for eg in stream:
    validator.check(eg)

Disclaimer: This is currently internals only, so the API may change in the future.
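If you'd rather not depend on the internal validator, a small hand-written check can catch the same mistake before the stream reaches a recipe. This is only a sketch under my own assumptions: `check_spans` is a hypothetical helper, not part of Prodigy, and it only verifies the shape of "spans" and that the offsets fall inside the text.

```python
def check_spans(example):
    """Raise ValueError with a readable message if 'spans' is malformed.

    Hypothetical helper for illustration; not part of Prodigy.
    """
    spans = example.get("spans", [])
    if not isinstance(spans, list):
        raise ValueError(
            f"'spans' must be a list of dicts, got {type(spans).__name__}"
        )
    text_length = len(example["text"])
    for i, span in enumerate(spans):
        if not isinstance(span, dict):
            raise ValueError(f"span {i} must be a dict, got {type(span).__name__}")
        start, end = span.get("start"), span.get("end")
        if not (isinstance(start, int) and isinstance(end, int)):
            raise ValueError(f"span {i} needs integer 'start' and 'end'")
        if not 0 <= start < end <= text_length:
            raise ValueError(f"span {i} offsets {start}-{end} are outside the text")
    return example


# Usage: wrap the stream so every example is checked lazily.
stream = [{"text": "YouTube TV", "spans": [{"start": 0, "end": 10}]}]
checked = [check_spans(eg) for eg in stream]
```

Raising early with a descriptive message makes the problem easier to spot than a `TypeError` from deep inside compiled code.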

pvcastro · June 21, 2018, 2:11pm

Thanks!

I copied the code exactly from here, but adding the brackets solved it!

ines · June 21, 2018, 2:31pm

Ah, sorry, that was also a typo in my semi-pseudocode then! Thanks, fixed.
