We have a model which works reasonably well in a Jupyter notebook (a reasonable starting point for improvement).
But the same model, with the same input, performs really badly when we use it outside of Jupyter.
import spacy

# result contains the text; model is the model name or path
def makedoc(result, model):
    nlp = spacy.load(model)
    doc = nlp(result)
    return doc

doc = makedoc(result, model)
print("entities for ner")
for ent in doc.ents:
    print(ent.label_, ent.text)
Different people on different machines (Windows and Mac) have successfully used the model in Jupyter (Jupyter Notebook, in VS Code, in PyCharm). Jupyter chose the spaCy version, and it has worked with spaCy 3.6.1 and 3.7.2.
When we set up a Poetry environment for easier experimentation, everything runs (annotating, training, extracting text from PDFs, using the model), but the returned entities are much, much worse (not a good starting point for model development).
I have tried quite a few different spaCy and Python versions in case there was some incompatibility. In general the spaCy version is a later one, e.g. 3.7.4, but we even rolled the spaCy version back to 3.6.1 and got the same poor results.
In case it was a Poetry problem, I just ran the code in a plain .py file
(interpreter was ~/miniconda3/lib/python3.9, spaCy was 3.7.2).
We got the same poor results as when using Poetry.
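To rule out a difference in what is actually installed, something like the following could be run both in the notebook and in the .py file, and the two outputs compared. This is only a minimal sketch: the `describe_environment` helper and the package list are my own assumptions, not part of our code, and it uses only the standard library (`importlib.metadata`).

```python
import platform
import sys
from importlib import metadata

# Packages whose versions seem most likely to matter here.
# This list is an assumption; adjust it to whatever the model depends on.
PACKAGES = ["spacy", "thinc", "numpy", "spacy-transformers", "torch"]


def describe_environment(packages=PACKAGES):
    """Return a dict with the interpreter details and installed package versions."""
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence instead of failing, so both runs stay comparable.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the two printouts differ in anything other than the executable path, that difference would be the first thing to look at.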
Question 1 – are there any known incompatibilities between spaCy versions and Python versions? (I understand the numpy 2.0.0 problem.)
Question 2 – are there issues with a miniconda environment?
Question 3 – are there any known incompatibilities with Poetry?
Question 4 – is there something else missing that Jupyter would provide, e.g. any spaCy dependencies I should have included?
Any other thoughts on why I get better results in a Jupyter notebook, or anything else I might have forgotten to control for?
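One thing I have not yet verified is that the text in `result` is character-for-character identical in both settings (for example, different whitespace or encoding coming out of the PDF extraction). A minimal sketch of how that could be checked is below; the `text_fingerprint` helper is hypothetical, not something we currently use.

```python
import hashlib


def text_fingerprint(text):
    """Return the length and SHA-256 digest of a string for cross-environment comparison."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return {"length": len(text), "sha256": digest}


if __name__ == "__main__":
    # Stand-in sample; in practice this would be the `result` string passed to makedoc.
    sample = "Example input text"
    print(text_fingerprint(sample))
```

Printing the fingerprint of `result` in the notebook and in the script would show whether the model really sees the same input in both places.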