Update to Prodigy 1.8 and spaCy 2.1

I've been using version 1.7.1 with spaCy 2.0.18 to build a custom NER model, and it works almost well enough. As I've been asked to, I now want to remove one label, add that label back via regex, and update the model. I really like your new interface. Do you think I should update both spaCy and Prodigy? In other words, is the data I've provided compatible with these updates?
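A minimal sketch of the relabeling step described above, assuming the annotations are in Prodigy's JSONL task format (a "text" plus "spans" with character "start"/"end" offsets and a "label"). The helper name relabel_with_regex, the DATE label and the date pattern are hypothetical examples, not Prodigy built-ins. The sketch does not compute token offsets ("token_start"/"token_end"), so check whether your training recipe needs them.

```python
import re

def relabel_with_regex(task, label, pattern):
    """Replace all `label` spans in a Prodigy-style task with regex matches.

    Hypothetical helper: assumes `task` is a dict with "text" and an optional
    "spans" list of {"start", "end", "label"} character-offset dicts.
    """
    text = task["text"]
    # Keep every span except the ones carrying the label we are replacing
    spans = [span for span in task.get("spans", []) if span["label"] != label]
    for match in re.finditer(pattern, text):
        start, end = match.span()
        if start == end:
            continue  # ignore zero-length matches
        # NER spans can't overlap, so skip matches that clash with a span already in the list
        if any(start < s["end"] and s["start"] < end for s in spans):
            continue
        spans.append({"start": start, "end": end, "label": label})
    spans.sort(key=lambda s: s["start"])
    # Return a new dict so the original task is left untouched
    return {**task, "spans": spans}


example = {
    "text": "Signed on 12/03/2019 by Anna",
    "spans": [
        {"start": 7, "end": 20, "label": "DATE"},  # old span also covers "on "
        {"start": 24, "end": 28, "label": "PERSON"},
    ],
}
print(relabel_with_regex(example, "DATE", r"\d{2}/\d{2}/\d{4}")["spans"])
```

Skipping overlapping matches (rather than overwriting the existing span) keeps manual annotations for other labels intact; whether that is the right priority depends on how reliable the regex is.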

How can I update Prodigy? I don't have a new download link. Can I use the old link to get the latest version?

Many, many thanks
Best

Hi! The download link never changes, so if you're still within your update period, you can just use the existing link and download the latest version from there :slightly_smiling_face:

I'd suggest just trying it out in a separate virtual environment. Both environments will read from the same prodigy.json and database, so you'll be able to run the same experiments. And if you need to test something with the old version, it's all still there. You can even back up your config and database if you want to be 100% safe.

You should still be able to use the same data in the latest Prodigy and spaCy, and everything should be compatible. The only known problem that can happen in some specific cases is this one, which should be easy to fix – see this thread for details: ner.batch-train after ner.manual results error (Value error : [E024]). Basically, in spaCy v2.1, it's now "illegal" for the named entity recognizer to predict whitespace-only/newline-only spans as entities. This is nice, because it should give you better accuracy overall. But it also means that you need to remove those spans from your training data if they're in there.
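A minimal sketch of removing those spans, assuming the annotations have been exported to JSONL in Prodigy's task format (for example with `prodigy db-out your_dataset > data.jsonl`) and that the cleaned file is re-imported into a new dataset with `prodigy db-in`. The dataset and file names, and the helper functions below, are hypothetical.

```python
import json

def drop_whitespace_spans(task):
    """Return a copy of a Prodigy-style task without whitespace-only spans."""
    text = task["text"]
    spans = [
        span
        for span in task.get("spans", [])
        # str.strip() leaves nothing behind if the span is only spaces/newlines
        if text[span["start"]:span["end"]].strip()
    ]
    return {**task, "spans": spans}


def clean_jsonl(in_path, out_path):
    """Copy tasks from one JSONL file to another, dropping whitespace-only spans.

    Returns the number of spans that were removed.
    """
    removed = 0
    with open(in_path, encoding="utf8") as f_in, open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            if not line.strip():
                continue  # skip blank lines
            task = json.loads(line)
            cleaned = drop_whitespace_spans(task)
            removed += len(task.get("spans", [])) - len(cleaned["spans"])
            f_out.write(json.dumps(cleaned) + "\n")
    return removed
```

For instance, `clean_jsonl("data.jsonl", "data_clean.jsonl")` would write the cleaned copy and report how many spans were dropped, which is a quick way to see whether the E024 problem applies to your data at all.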


Hello, I've been working with Prodigy 1.7.1 and spaCy 2.0.18, and I'd like to migrate to spaCy 2.1.x and Prodigy 1.8.3. But for my use case, the models give better results with 2.0.18. Is there a way to emulate the 2.0.18 architecture's hyperparameters in 2.1?
Best regards

That's great! But I still can't use it, because when I update Prodigy and spaCy, I get:

prodigy 1.8.3 has requirement jsonschema<3.0.0,>=2.6.0, but you'll have jsonschema 3.0.2 which is incompatible.

When I downgrade jsonschema to 2.6.0, it says:

jupyterlab-server 1.0.0 has requirement jsonschema>=3.0.1, but you'll have jsonschema 2.6.0 which is incompatible.
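The two messages describe requirement ranges that no single jsonschema version can satisfy. A minimal sketch that checks candidate versions against both ranges copied from the messages; it only understands plain numeric versions and the >=, <=, ==, >, < operators, whereas pip itself uses the full PEP 440 rules.

```python
def parse_version(version):
    """Turn '2.6.0' into a tuple of ints, padded so '3.0' == '3.0.0'."""
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (4 - len(parts)))

# Two-character operators come first so '>=' isn't mistaken for '>'
OPERATORS = [
    (">=", lambda a, b: a >= b),
    ("<=", lambda a, b: a <= b),
    ("==", lambda a, b: a == b),
    (">", lambda a, b: a > b),
    ("<", lambda a, b: a < b),
]

def satisfies(version, spec):
    """Check a plain numeric version against a spec like '<3.0.0,>=2.6.0'."""
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for op, compare in OPERATORS:
            if clause.startswith(op):
                if not compare(v, parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True

PRODIGY_NEEDS = "<3.0.0,>=2.6.0"     # from the Prodigy 1.8.3 message
JUPYTERLAB_SERVER_NEEDS = ">=3.0.1"  # from the jupyterlab-server 1.0.0 message

for candidate in ["2.6.0", "3.0.1", "3.0.2"]:
    print(candidate, satisfies(candidate, PRODIGY_NEEDS), satisfies(candidate, JUPYTERLAB_SERVER_NEEDS))
```

Every candidate passes at most one of the two checks, so with these package versions Prodigy and JupyterLab can't both be satisfied in the same environment, which is one more reason to keep them in separate environments.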

I ran:

!python -m prodigy ner.manual my_data en_core_web_sm my_data.jsonl --my labels

and the asterisk doesn't disappear.

I've updated spaCy to 2.1.8 and Prodigy to 1.8.3 in a new environment, as you recommended, and retrained my model with the new versions. But when I run:

import spacy
path = '../data/model_date_u01'
nlp = spacy.load(path)

it says:

could not broadcast input array from shape (96) into shape (128)

I searched your website and found this thread.

It seems to be some kind of dependency problem. However, I'm using the latest versions of Prodigy and spaCy. Could you please let me know how to solve this error? Many thanks

How was the model created? It sounds like what you're loading in as model_date_u01 is incompatible with that version? You can always check the meta.json in the model directory for the spaCy version it was created with. If that's an old version, you'll need to retrain.
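A minimal sketch of that check, assuming the model directory contains a meta.json with the usual "name", "version", "spacy_version" and "pipeline" keys (as in the meta.json posted further down). The helper read_model_meta is hypothetical.

```python
import json
from pathlib import Path

def read_model_meta(model_dir):
    """Return the name, version, spaCy requirement and pipeline from meta.json."""
    meta_path = Path(model_dir) / "meta.json"
    with meta_path.open(encoding="utf8") as f:
        meta = json.load(f)
    return {
        "name": meta.get("name"),
        "version": meta.get("version"),
        "spacy_version": meta.get("spacy_version"),
        "pipeline": meta.get("pipeline"),
    }
```

Calling `read_model_meta("../data/model_date_u01")` and comparing its "spacy_version" with `spacy.__version__` in the same interpreter shows whether the model and the installed spaCy are meant to go together.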


The model was created with the new version of spaCy, 2.1.8:

python -m prodigy ner.batch-train data_merged_u01 en_core_web_sm  --output Model_U01 --n-iter 10 --eval-split 0.2 --dropout 0.2 --no-missing

That's based on pip list in the new environment. When I look at the model's metadata, it says "spacy_version": ">=2.1.0", which seems correct:

{
  "accuracy": {
    "ents_f": 85.8587845242,
    "ents_p": 86.3317889027,
    "ents_r": 85.3909350025,
    "las": 89.6616629074,
    "tags_acc": 96.7783856079,
    "token_acc": 99.0697323163,
    "uas": 91.5287392082
  },
  "author": "Explosion AI",
  "description": "English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.",
  "email": "contact@explosion.ai",
  "lang": "en",
  "license": "MIT",
  "name": "core_web_sm",
  "parent_package": "spacy",
  "pipeline": ["sentencizer", "tagger", "parser", "ner"],
  "sources": ["OntoNotes 5"],
  "spacy_version": ">=2.1.0",
  "speed": {"cpu": 6684.8046553827, "gpu": null, "nwords": 291314},
  "url": "https://explosion.ai",
  "version": "2.1.0",
  "vectors": {"width": 0, "vectors": 0, "keys": 0, "name": null}
}

I don't know what the reason is.

I'd really appreciate your thoughts. I'm still getting this:

could not broadcast input array from shape (96) into shape (128)

I'm sure I used spaCy 2.1 and Prodigy 1.8.3 to build this model!

I also found a comment of yours in another thread that seems related:

"
Ah yes, each Prodigy wheel will define the spaCy version it’s compatible with in its dependencies, and it’ll be installed when you install the wheel. So you usually want to let Prodigy handle its dependencies – if you install other version on top of it, you might end up with incompatibilities (and pip won’t tell you, because it doesn’t resolve the dependencies recursively, like conda etc.).
"
I also used pip to install the new Prodigy, but I don't understand what you mean by "recursive" or what the solution is.
Maybe I need to download the model again?

python -m spacy download en_core_web_sm
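On the question of what "recursive" means in the quoted comment: it refers to following dependencies of dependencies (Prodigy needs spaCy, spaCy needs thinc, and so on), not just the packages listed directly. A minimal sketch that walks that chain for an installed package, assuming Python 3.8+ for importlib.metadata (on 3.7 the importlib_metadata backport offers the same API). The helper names are hypothetical, and optional extras are skipped by a simple string check that assumes the usual "extra ==" marker spelling.

```python
import re
from importlib import metadata  # Python 3.8+

def requirement_name(requirement):
    """Pull the bare project name out of e.g. 'thinc (<7.1.0,>=7.0.8)'."""
    return re.match(r"[A-Za-z0-9._-]+", requirement.strip()).group(0)

def installed_requires(name):
    """Requirements declared by an installed package, ignoring optional extras."""
    try:
        requirements = metadata.requires(name) or []
    except metadata.PackageNotFoundError:
        return []  # not installed here, so there is nothing to follow
    return [r for r in requirements if "extra ==" not in r]

def all_dependencies(name, get_requires=installed_requires):
    """Follow requirements of requirements until nothing new turns up."""
    seen, stack = set(), [name]
    while stack:
        current = stack.pop()
        key = current.lower()
        if key in seen:
            continue  # already visited, avoids loops and duplicates
        seen.add(key)
        stack.extend(requirement_name(r) for r in get_requires(current))
    return seen
```

In an environment where Prodigy is installed, `sorted(all_dependencies("prodigy"))` lists every package it transitively relies on, which are exactly the packages that installing something else on top can silently change.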

I also tested with the latest versions of spaCy and Prodigy and a new model trained with them (since I set up a separate environment, I could do that), and it gives exactly the same error!

I'm out of ideas. Any comments would be appreciated.

It's definitely surprising that you're getting that error, if your versions are all correct. Could you give me the output of the following commands:

python -m spacy info
python -m spacy validate
python -m pip list
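Each of those commands starts whatever `python` is first on the PATH, which may not be the same interpreter as the Jupyter kernel that runs `spacy.load` (for example when a venv is activated on top of a conda base environment). A minimal sketch, assuming Python 3.8+ for importlib.metadata, that reports from inside the running process which interpreter it is and which package versions it actually sees; the package list is only an example.

```python
import sys
from importlib import metadata  # Python 3.8+; use the importlib_metadata backport on 3.7

def environment_report(packages=("spacy", "thinc", "prodigy", "jsonschema")):
    """Report the running interpreter and the package versions it can see."""
    report = {"executable": sys.executable, "prefix": sys.prefix}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None  # not installed in this interpreter's environment
    return report

for key, value in environment_report().items():
    print(f"{key:<12} {value}")
```

Running this in the notebook cell right before `spacy.load` makes it easy to spot a mismatch such as a different spaCy version than the one you expected to have installed.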

Thank you for the response.

Here is the output of the first two:

============================== Info about spaCy ==============================

spaCy version    2.1.4
Location         C:\Users\moha\Updated_Project\venv\lib\site-packages\spacy
Platform         Windows-10-10.0.17763-SP0
Python version   3.7.1
Models


(venv) (base) C:\Users\moha>python -m spacy validate
✔ Loaded compatibility table

====================== Installed models (spaCy v2.1.4) ======================
ℹ spaCy installation: C:\Users\moha\Updated_Project\venv\lib\site-packages\spacy

TYPE      NAME             MODEL            VERSION
package   en-core-web-sm   en_core_web_sm   2.1.0     ✔
package   en-core-web-md   en_core_web_md   2.1.0     ✔


Here is the output of the third:

(venv) (base) C:\Users\moha>python -m pip list
Package                            Version     Location
---------------------------------- ----------- ----------------------
-umpy                              1.15.4
absl-py                            0.7.1
alabaster                          0.7.12
anaconda-client                    1.7.2
anaconda-navigator                 1.9.6
anaconda-project                   0.8.2
asn1crypto                         0.24.0
astor                              0.7.1
astroid                            2.1.0
astropy                            3.1
atomicwrites                       1.2.1
attrs                              18.2.0
Babel                              2.6.0
backcall                           0.1.0
backports.functools-lru-cache      1.5
backports.os                       0.1.1
backports.shutil-get-terminal-size 1.0.0
beautifulsoup4                     4.6.3
bitarray                           0.8.3
bkcharts                           0.2
blaze                              0.11.3
bleach                             3.0.2
blis                               0.2.4
bokeh                              1.3.4
boto                               2.49.0
boto3                              1.9.143
botocore                           1.12.143
Bottleneck                         1.2.1
bz2file                            0.98
cachetools                         2.1.0
certifi                            2018.11.29
cffi                               1.11.5
chardet                            3.0.4
cheroot                            6.5.5
CherryPy                           18.1.2
citableclass                       0.0.21      c:\citableclass-master
Click                              7.0
cloudpickle                        0.6.1
clyent                             1.2.2
colorama                           0.4.1
comtypes                           1.1.7
conda                              4.7.11
conda-build                        3.17.6
conda-package-handling             1.3.10
conda-verify                       3.1.1
contextlib2                        0.5.5
cryptography                       2.4.2
cycler                             0.10.0
cymem                              2.0.2
Cython                             0.29.2
cytoolz                            0.9.0.1
dask                               1.0.0
datashape                          0.5.4
decorator                          4.3.0
defusedxml                         0.5.0
dill                               0.2.9
distlib                            0.2.9.post0
distributed                        1.25.1
docutils                           0.14
eli5                               0.8.1
en-core-web-md                     2.1.0
en-core-web-sm                     2.1.0
entrypoints                        0.2.3
et-xmlfile                         1.0.1
falcon                             1.4.1
fastcache                          1.0.2
feather-format                     0.4.0
filelock                           3.0.10
Flask                              1.0.2
Flask-Cors                         3.0.7
funcy                              1.12
future                             0.17.1
future-fstrings                    1.2.0
gast                               0.2.2
gensim                             3.7.1
gevent                             1.3.7
glob2                              0.6
google-api-python-client           1.7.8
google-auth                        1.6.3
google-auth-httplib2               0.0.3
graphviz                           0.10.1
greenlet                           0.4.15
grpcio                             1.16.1
h5py                               2.8.0
heapdict                           1.0.0
html5lib                           1.0.1
httplib2                           0.12.3
hug                                2.4.8
hyperopt                           0.1.2
idna                               2.8
imageio                            2.4.1
imagesize                          1.1.0
importlib-metadata                 0.6
ipykernel                          5.1.0
ipython                            7.2.0
ipython-genutils                   0.2.0
ipywidgets                         7.4.2
isort                              4.3.4
itsdangerous                       1.1.0
jaraco.functools                   2.0
jdcal                              1.4
jedi                               0.13.2
Jinja2                             2.10
jmespath                           0.9.4
joblib                             0.13.2
json-lines                         0.5.0
json5                              0.8.5
jsonlines                          1.2.0
jsonschema                         2.6.0
jupyter                            1.0.0
jupyter-client                     5.2.4
jupyter-console                    6.0.0
jupyter-contrib-core               0.3.3
jupyter-contrib-nbextensions       0.5.1
jupyter-core                       4.4.0
jupyter-highlight-selected-word    0.2.0
jupyter-latex-envs                 1.4.4
jupyter-nbextensions-configurator  0.4.1
jupyterlab                         1.0.0rc0
jupyterlab-server                  1.0.0
Keras                              2.2.4
Keras-Applications                 1.0.7
Keras-Preprocessing                1.0.9
keyring                            17.0.0
kiwisolver                         1.0.1
lazy-object-proxy                  1.3.1
libarchive-c                       2.8
lime                               0.1.1.34
llvmlite                           0.26.0
locket                             0.2.0
lxml                               4.2.5
Markdown                           2.6.11
MarkupSafe                         1.1.0
matplotlib                         3.1.1
mccabe                             0.6.1
menuinst                           1.4.14
mistune                            0.8.4
mkl-fft                            1.0.6
mkl-random                         1.0.2
mock                               2.0.0
more-itertools                     4.3.0
mpmath                             1.1.0
mpu                                0.19.1
msgpack                            0.5.6
msgpack-numpy                      0.4.3.2
multipledispatch                   0.6.0
murmurhash                         1.0.2
navigator-updater                  0.2.1
nbconvert                          5.4.0
nbformat                           4.4.0
networkx                           2.2
nltk                               3.4
nose                               1.3.7
notebook                           5.7.4
numba                              0.41.0
numexpr                            2.6.8
numpy                              1.16.4
numpydoc                           0.9.1
oauth2client                       4.1.3
odo                                0.5.1
of                                 1.0.1
olefile                            0.46
openpyxl                           2.5.12
packaging                          18.0
pandas                             0.23.4
pandocfilters                      1.4.2
parso                              0.3.1
partd                              0.3.9
path.py                            11.5.0
pathlib                            1.0.1
pathlib2                           2.3.3
patsy                              0.5.1
pbr                                5.1.3
peewee                             2.10.2
pep8                               1.7.1
pickleshare                        0.7.5
Pillow                             5.3.0
pip                                10.0.1
pkginfo                            1.4.2
plac                               0.9.6
plotly                             3.8.1
pluggy                             0.8.0
ply                                3.11
portend                            2.5
preshed                            2.0.1
prodigy                            1.8.3
progressbar2                       3.42.0
prometheus-client                  0.5.0
prompt-toolkit                     2.0.7
protobuf                           3.7.1
psutil                             5.4.8
py                                 1.7.0
pyarrow                            0.14.0
pyasn1                             0.4.4
pyasn1-modules                     0.2.4
pycodestyle                        2.4.0
pycosat                            0.6.3
pycparser                          2.19
pycrypto                           2.6.1
pycurl                             7.43.0.2
PyDrive                            1.3.1
pyemd                              0.5.1
pyflakes                           2.0.0
Pygments                           2.3.1
PyJWT                              1.7.1
pykg2vec                           0.0.48
pyLDAvis                           2.1.2
pylint                             2.2.2
pymongo                            3.8.0
pyodbc                             4.0.25
pyOpenSSL                          18.0.0
pyparsing                          2.3.0
Pyphen                             0.9.5
pyrsistent                         0.15.4
PySocks                            1.6.8
pyspellchecker                     0.4.0
pytest                             4.0.2
pytest-arraydiff                   0.3
pytest-astropy                     0.5.0
pytest-doctestplus                 0.2.0
pytest-openfiles                   0.3.1
pytest-remotedata                  0.3.1
python-crfsuite                    0.9.6
python-dateutil                    2.7.5
python-docx                        0.8.10
python-doi                         0.1.1
python-Levenshtein                 0.12.0
python-mimeparse                   1.6.0
python-utils                       2.3.0
pytz                               2018.7
PyWavelets                         1.0.1
pywin32                            224
pywinpty                           0.5.5
PyYAML                             3.13
pyzmq                              17.1.2
QtAwesome                          0.5.3
qtconsole                          4.4.3
QtPy                               1.5.2
regex                              2018.1.10
requests                           2.21.0
retrying                           1.3.3
rope                               0.11.0
rsa                                3.4.2
ruamel-yaml                        0.15.46
s3transfer                         0.2.0
scikit-image                       0.14.1
scikit-learn                       0.21.2
scipy                              1.1.0
seaborn                            0.9.0
Send2Trash                         1.5.0
setuptools                         40.8.0
simplegeneric                      0.8.1
simplejson                         3.16.0
singledispatch                     3.4.0.3
six                                1.12.0
sklearn-crfsuite                   0.3.6
smart-open                         1.8.3
snowballstemmer                    1.2.1
sortedcollections                  1.0.1
sortedcontainers                   2.1.0
spacy                              2.1.4
Sphinx                             2.1.2
sphinx-gallery                     0.4.0
sphinx-rtd-theme                   0.4.3
sphinxcontrib-applehelp            1.0.1
sphinxcontrib-devhelp              1.0.1
sphinxcontrib-htmlhelp             1.0.2
sphinxcontrib-jsmath               1.0.1
sphinxcontrib-qthelp               1.0.2
sphinxcontrib-serializinghtml      1.1.3
sphinxcontrib-websupport           1.1.0
spyder                             3.3.2
spyder-kernels                     0.3.0
SQLAlchemy                         1.2.15
srsly                              0.1.0
statsmodels                        0.9.0
style                              1.1.0
sympy                              1.3
tables                             3.4.4
tabulate                           0.8.3
tblib                              1.3.2
tempora                            1.14.1
tensorboard                        1.13.1
tensorflow                         1.13.1
tensorflow-estimator               1.13.0
tensorflow-hub                     0.4.0
termcolor                          1.1.0
terminado                          0.8.1
testpath                           0.4.2
textacy                            0.7.1
thinc                              7.0.8
toolz                              0.9.0
tornado                            5.1.1
tqdm                               4.28.1
traitlets                          4.3.2
typing                             3.6.4
ujson                              1.35
unicodecsv                         0.14.1
Unidecode                          1.1.0
update                             0.0.1
uritemplate                        3.0.0
urllib3                            1.24.1
waitress                           1.2.1
wasabi                             0.2.1
wcwidth                            0.1.7
webencodings                       0.5.1
Werkzeug                           0.14.1
wget                               3.2
wheel                              0.32.3
widgetsnbextension                 3.4.2
win-inet-pton                      1.0.1
win-unicode-console                0.5
wincertstore                       0.2
wordcloud                          1.5.0
wrapt                              1.10.11
ws4py                              0.5.1
xgboost                            0.90
xlrd                               1.2.0
XlsxWriter                         1.1.2
xlwings                            0.15.1
xlwt                               1.3.0
yellowbrick                        0.9.1
zc.lockfile                        1.4
zenodo-get                         1.1.1
zict                               0.1.3
You are using pip version 10.0.1, however version 19.2.3 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

I know there's a newer version of spaCy, 2.1.8.

I tried that too, but I got the same error.

Did I do anything wrong that caused a dependency problem?

Many thanks

It would be great if you could answer this. I want to move on to the next steps of my project (improving some entities and classification), and I'd rather do that with the new Prodigy, since you've apparently used BERT, which is very interesting. I reread the relevant posts but still haven't found a solution... Best

Now I'm very excited: the problem is solved! :slight_smile:
I simply created a new, clean environment using conda (instead of pip) as follows:

conda create -n envc python=3.7 anaconda

Then I activated it as usual:

conda activate envc

Then I installed Prodigy using pip! After fixing some intermediate errors*, it runs like a Benz :) I'm very excited, since the results look much better! I'll probably have some questions about that sooner or later :slight_smile:

Thank you both for your support! However, I'd still be interested to know what caused the error when using a venv with pip.

P.S.
* There was an error related to a DLL, which I solved like this:

pip uninstall pyzmq
pip install pyzmq