What's wrong with this training data?

I'm trying to train a model, but when I run "project run all" the script stops and throws this error:

Traceback (most recent call last):
  File "/Users/tomtom/fun/projects/tutorials/nel_emerson/./scripts/create_corpus.py", line 57, in <module>
    typer.run(main)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/typer/main.py", line 859, in run
    app()
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/tomtom/Library/Python/3.9/lib/python/site-packages/typer/main.py", line 497, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/tomtom/fun/projects/tutorials/nel_emerson/./scripts/create_corpus.py", line 32, in main
    doc.ents = [entity]
  File "spacy/tokens/doc.pyx", line 728, in spacy.tokens.doc.Doc.ents.__set__
  File "spacy/tokens/doc.pyx", line 1737, in spacy.tokens.doc.get_entity_info
TypeError: object of type 'NoneType' has no len()

The offending record is below. It comes from Wikipedia; it's not the best piece of text, but it just happens to be where the script throws the error.

{'text': 'Following the release on appeal of the defendants in the Oz trial, "an unmitigated disaster for the children of our country",Evening Standard, 6 November 1971, quote as reproduced in Tracey and Morrison, p.135, 207 n.6:14 Whitehouse launched the Nationwide Petition for Public Decency in January 1972, which gained 1.35 million signatures by the time it was presented to Edward Heath in April 1973.Dominic Sandbrook State of Emergency, The Way We Were: Britain 1970–74, London: Allen Lane, 2010, p.462 She had around 300 speaking engagements during the period of her highest profile. A pornographic magazine Whitehouse was launched in 1975 by publisher David Sullivan, who deliberately named it after herself.Roy Greenslade Press Gang: How Newspapers Make Profits From Propaganda, London: Macmillan, 2004 [2003], p.490Jamie Doward "Top shelf gathers dust" , The Observer, 13 May 2001', '_input_hash': '7077609340', '_task_hash': '1942734476', 'spans': [{'start': 655, 'end': 669, 'text': 'David Sullivan', 'rank': 0, 'label': 'PER', 'score': 1, 'source': 'en_core_web_lg', 'input_hash': '0383666125'}], 'meta': {'score': 1}, 'options': [], '_session_id': 'null', '_view_id': 'choice', 'accept': ['Q1046824'], 'answer': 'accept'}

I don't quite understand what the error is referring to.

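It turns out something got corrupted and the string positions were incorrect. The start and end offsets in the span no longer lined up with the text, so the entity assigned to doc.ents was presumably None, which is what the TypeError is complaining about.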
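
For anyone hitting the same TypeError, one quick sanity check is to confirm that each span's start and end offsets actually slice out the text recorded for that span. This is a rough sketch, not code from the tutorial: the function name find_misaligned_spans and the sample record are made up for illustration, and only the 'text', 'spans', 'start' and 'end' keys are taken from the record above.

```python
def find_misaligned_spans(example):
    """Return (span, actual_text) pairs where the offsets do not match the span text."""
    text = example["text"]
    bad = []
    for span in example.get("spans", []):
        # Slice the text with the recorded character offsets.
        actual = text[span["start"]:span["end"]]
        if actual != span["text"]:
            bad.append((span, actual))
    return bad


# Hypothetical record in the same shape as the annotation above.
example = {
    "text": "Launched in 1975 by David Sullivan.",
    "spans": [
        {"start": 20, "end": 34, "text": "David Sullivan"},  # offsets match
        {"start": 22, "end": 36, "text": "David Sullivan"},  # offsets shifted by two
    ],
}

for span, actual in find_misaligned_spans(example):
    print(f"{span['start']}-{span['end']}: expected {span['text']!r}, got {actual!r}")
```

Running something like this over the whole annotation file before building the corpus should point at the corrupted records directly, rather than waiting for the script to fail on the first one.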
