Hi there,
I was running some tests to fix a "Mismatched tokenization" error I’m getting, and I tried the following code, which was recommended in another thread:
from prodigy.components.preprocess import add_tokens
import en_core_web_sm
nlp = en_core_web_sm.load()
text = " The upstart streaming service, which is primarily geared for sports fans, has an uphill climb against deep-pocketed competitors marketing cable alternatives to cord-cutters: YouTube TV, Hulu Live and Sony's PlayStation Vue."
stream = [{'text': text, 'spans': {'start': 175, 'end': 185}}]
new_stream = add_tokens(nlp, stream)
print(list(new_stream))
I’m getting the following exception when running this code:
TypeError Traceback (most recent call last)
<ipython-input-115-958a6dcd96e1> in <module>()
6 stream = [{'text': text, 'spans': {'start': 175, 'end': 185}}]
7 new_stream = add_tokens(nlp, stream)
----> 8 print(list(new_stream))
cython_src/prodigy/components/preprocess.pyx in add_tokens()
TypeError: string indices must be integers
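For what it's worth, I can reproduce the same TypeError in plain Python without Prodigy. This is only a sketch under an assumption: span_starts below is a hypothetical stand-in I wrote, not Prodigy's actual code (I can't see the Cython source), and it assumes the consumer loops over 'spans' and indexes each item by key. Looping over a dict yields its string keys, so a dict-valued 'spans' fails where a list of dicts works:

```python
def span_starts(example):
    # Hypothetical stand-in for code that consumes 'spans'.
    # It assumes 'spans' is an iterable of dicts with a 'start' key.
    return [span['start'] for span in example.get('spans', [])]


# 'spans' as a single dict, as in my snippet above
as_dict = {'text': 'abc', 'spans': {'start': 175, 'end': 185}}
# 'spans' as a list containing one dict
as_list = {'text': 'abc', 'spans': [{'start': 175, 'end': 185}]}

try:
    span_starts(as_dict)
    dict_error = None
except TypeError as err:
    # Iterating a dict gives the keys 'start' and 'end';
    # indexing a string with 'start' raises TypeError.
    dict_error = err

list_result = span_starts(as_list)

print(repr(dict_error))
print(list_result)
```

So my guess is that 'spans' may need to be a list of dicts rather than a single dict, but I'm not sure that is what add_tokens expects.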
Thanks!