Entity Linking demo error

I am having a hard time getting the entity linking demo here to work.

From what I can see, I am guessing the demo was written for older versions of spaCy and/or Prodigy.

I managed to fix one problem fairly easily: the scripts call "KnowledgeBase", and spaCy complained that it is an abstract class. I replaced instances of "KnowledgeBase" with "InMemoryLookupKB" and that seemed to work fine.
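For reference, a minimal sketch of that replacement, assuming spaCy 3.5 or later (where "InMemoryLookupKB" is available). The entity ID, alias, frequency, and vector length are made up for illustration and are not taken from the demo's entities.csv:

```python
import spacy
from spacy.kb import InMemoryLookupKB

# A blank pipeline is enough here; the KB only needs a shared vocab.
nlp = spacy.blank("en")

# Same constructor arguments the demo passes to KnowledgeBase.
kb = InMemoryLookupKB(vocab=nlp.vocab, entity_vector_length=1)

# Hypothetical entry: "Q_EXAMPLE" is a placeholder ID, not a real one.
kb.add_entity(entity="Q_EXAMPLE", freq=1, entity_vector=[0.0])
kb.add_alias(alias="Emerson", entities=["Q_EXAMPLE"], probabilities=[1.0])

# Look up which entity IDs the alias can resolve to.
candidate_ids = [c.entity_ for c in kb.get_alias_candidates("Emerson")]
print(candidate_ids)
```

The rest of the KB calls (add_entity, add_alias, get_alias_candidates) are used the same way as before, so only the class name has to change.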

More worryingly, when I try to run the recipe, I get:

prodigy entity_linker.manual emersons_annotated assets/emerson_input_text.txt my_output/my_nlp/ my_output/my_kb assets/entities.csv -F scripts/el_recipe.py

/Users/alan/.pyenv/versions/3.8.18/lib/python3.8/site-packages/spacy/util.py:910: UserWarning: [W095] Model 'en_core_web_lg' (3.5.0) was trained with spaCy v3.5.0 and may not be 100% compatible with the current version (3.7.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
Traceback (most recent call last):
  File "/Users/alan/.pyenv/versions/3.8.18/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/alan/.pyenv/versions/3.8.18/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/alan/.pyenv/versions/3.8.18/lib/python3.8/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/Users/alan/.pyenv/versions/3.8.18/lib/python3.8/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "scripts/el_recipe.py", line 44, in entity_linker_manual
    model = EntityRecognizer(nlp)
  File "spacy/pipeline/ner.pyx", line 198, in spacy.pipeline.ner.EntityRecognizer.__init__
TypeError: __init__() takes at least 2 positional arguments (1 given)

Any idea what the problem is here? I looked around the docs and in support tickets, and I saw this line being used without comment, so I am not sure what might have changed recently?

I have:

spacy==3.7.4
prodigy==1.15.0

I have tried using an older version of spacy but I had trouble installing it.

It would be so nice to update this demo since it is really useful!

After looking at the API a bit, I tried changing this line:

model = EntityRecognizer(nlp.vocab, nlp)

to have the additional 'nlp.vocab' arg. That seems to work. But now I am getting this error:

prodigy entity_linker.manual emersons_annotated assets/emerson_input_text.txt my_output/my_nlp my_output/my_kb assets/entities.csv -F scripts/el_recipe.py
/Users/alan/repos/agolo/projects/prodigy-clean/.venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/alan/repos/agolo/projects/prodigy-clean/.venv/lib/python3.9/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/Users/alan/repos/agolo/projects/prodigy-clean/.venv/lib/python3.9/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "scripts/el_recipe.py", line 56, in entity_linker_manual
    stream = (eg for score, eg in model(stream))
TypeError: Argument 'doc' has incorrect type (expected spacy.tokens.doc.Doc, got list)
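The TypeError above is raised because a spaCy pipeline component is called on one Doc at a time, whereas the recipe hands it the whole list of tasks. One possible way around it, sketched here under assumptions, is to run the loaded pipeline over the stream with nlp.pipe and copy doc.ents into each task's "spans". The helper name add_entity_spans, the blank pipeline, and the entity_ruler pattern are stand-ins invented for illustration; they are not code from the demo recipe.

```python
import spacy


def add_entity_spans(nlp, stream):
    """Yield a copy of each task with "spans" built from the Doc's entities."""
    # as_tuples=True keeps every task dict paired with the Doc made from its text.
    texts_and_tasks = ((eg["text"], eg) for eg in stream)
    for doc, eg in nlp.pipe(texts_and_tasks, as_tuples=True):
        task = dict(eg)
        task["spans"] = [
            {
                "start": ent.start_char,
                "end": ent.end_char,
                "label": ent.label_,
                "text": ent.text,
            }
            for ent in doc.ents
        ]
        yield task


# Stand-in pipeline so the sketch runs without downloading a model.
# In the recipe this would be the pipeline loaded from the nlp directory.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "PERSON", "pattern": "Emerson"}])

stream = [{"text": "Emerson wrote essays."}]
tasks = list(add_entity_spans(nlp, stream))
print(tasks[0]["spans"])
```

This uses whatever NER component is already in the pipeline, so no separate EntityRecognizer instance has to be constructed.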

I hope I'm not doing something obviously wrong here, but I just reproduced all of this on a completely clean env, this time with python 3.9. Everything happened the same way.

It seems like maybe this demo code hasn't been run by anyone in a while. Would it be possible for someone who is an expert to test it, perhaps? It's crucial for my use case. Thank you!

Hi @alan-hogue,

You are trying to run an outdated version of the tutorial, which is incompatible with spaCy v3.
The updated version, reimplemented as a Weasel project, is here: projects/tutorials/nel_emerson at v3 · explosion/projects · GitHub

I cloned that repo and followed all the instructions (downloading the model, creating the KB), and I am getting the same result that I previously reported in the other thread when trying to run the modified version you posted:

prodigy entity_linker.manual emersons_annotated assets/emerson_input_text.txt temp/my_nlp/ temp/my_kb assets/entities.csv -F scripts/el_recipe.py
/Users/alan/repos/explosion/projects/.venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
Traceback (most recent call last):
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/alan/repos/explosion/projects/.venv/lib/python3.9/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/Users/alan/repos/explosion/projects/.venv/lib/python3.9/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "scripts/el_recipe.py", line 42, in entity_linker_manual
    kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=1)
  File "spacy/kb/kb.pyx", line 25, in spacy.kb.kb.KnowledgeBase.__init__
TypeError: [E1046] KnowledgeBase is an abstract class and cannot be instantiated. If you are looking for spaCy's default knowledge base, use `InMemoryLookupKB`.

So the problem doesn't appear to be that I am running the wrong version.

Are you able to run the same thing and get it to work?

Since this thread is essentially dealing with the same issue as "Annotation pipeline - chaining multiple annotation task types - #11 by magdaaniol", I'll close this one in favor of the other one.