Can't use upper-case label in patterns for ner.teach

I am trying to add a new entity type – a label EY for “ethnicity” – to an existing NER model using the ner.teach + patterns.jsonl approach. Given that our domain is the life sciences, the purpose is to distinguish NORP from true ethnicity mentions that might be biologically or culturally relevant (e.g., “The French [NORP] Academy of Blah, Blah, Blah” vs. “French [EY] patients with Disease X”, respectively). So I created a seed file of patterns of the following form:

{"label":"EY","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"americans"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"americans"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"british"}]}
{"label":"EY","pattern":[{"lower":"african"}]}
{"label":"EY","pattern":[{"lower":"akan"}]}
{"label":"EY","pattern":[{"lower":"alangan"}]}
{"label":"EY","pattern":[{"lower":"alaskan"}]}
{"label":"EY","pattern":[{"lower":"albanian"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"albanian"},{"lower":"british"}]}
...[and many, many more]
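(Just to rule out malformed token patterns: a standalone check against spaCy's Matcher along the lines below – a rough sketch on a made-up sentence, assuming the file is called patterns.jsonl – should turn up EY matches.)

import json
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab)

# Register every pattern from the JSONL file under its label
with open("patterns.jsonl", encoding="utf8") as f:
    for line in f:
        entry = json.loads(line)
        # spaCy 2.x signature: matcher.add(key, on_match_callback, *patterns)
        matcher.add(entry["label"], None, entry["pattern"])

doc = nlp("A study of African-American patients with Disease X.")
for match_id, start, end in matcher(doc):
    print(nlp.vocab.strings[match_id], "->", doc[start:end].text)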

When I run ner.teach with this pattern file and the flag --label EY, I get essentially random examples – meaning that it isn’t using the pattern matches. The initial model doesn’t have this label (nor even NORP, for that matter), so random model suggestions make some sense, except that I thought this recipe was supposed to surface pattern matches – exactly for this case of adding a new entity type. Anyhow, on a hunch I changed the label to lower-case, like so:

{"label":"ey","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"american"}]}
{"label":"ey","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"americans"}]}
{"label":"ey","pattern":[{"lower":"african"},{"lower":"american"}]}
{"label":"ey","pattern":[{"lower":"african"},{"lower":"americans"}]}
{"label":"ey","pattern":[{"lower":"african"},{"lower":"british"}]}
{"label":"ey","pattern":[{"lower":"african"}]}
{"label":"ey","pattern":[{"lower":"akan"}]}
{"label":"ey","pattern":[{"lower":"alabamian"}]}
{"label":"ey","pattern":[{"lower":"alangan"}]}
{"label":"ey","pattern":[{"lower":"alaskan"}]}
{"label":"ey","pattern":[{"lower":"albanian"},{"lower":"american"}]}
{"label":"ey","pattern":[{"lower":"albanian"},{"lower":"british"}]}
...[and many, many more]

and suddenly the examples I was presented with for accept/reject/skip decisions all seemed reasonable. The problem is that I don’t want the label to be lower-case, and also the accepted examples are all labelled “ey” while the rejected ones are labelled “EY” (I forgot to change the flag to --label ey and left it as --label EY). Or maybe it was the other way around; I can’t remember.

So my questions are: (1) Why is this happening? (2) Should this be happening? (3) How do I make it work with upper-case labels? (It isn’t the number of characters in the label, either, b/c I tried “ETHNICITY” and it also presented random predictions in the ner.teach session.)

I should probably mention that I purchased Prodigy just a few weeks ago, so it’s unlikely to be out of date wrt the latest release. I am, however, getting some numpy warnings about binary incompatibility, or something similar:

 prodigy ner.teach ethnicity_ner <init_model> <raw_text>.jsonl --patterns patterns.jsonl --label EY
/Users/dennismehay/.virtualenvs/prodigy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/Users/dennismehay/.virtualenvs/prodigy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)
/Users/dennismehay/.virtualenvs/prodigy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  return f(*args, **kwds)
/Users/dennismehay/.virtualenvs/prodigy/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192, got 176
  return f(*args, **kwds)
Using 1 labels: EY

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

I mention this just in case those warnings might be responsible for the inability to use upper-case labels (unlikely, but remotely possible). The Stack Overflow threads are split on whether this is something to worry about.

Hi! The numpy warning came up the other day on the spaCy tracker and if I remember correctly, it happens when the current numpy version is newer than the one that a package was compiled with. So downgrading numpy to 1.14.5 should solve that.

The problem you describe is pretty strange indeed... labels are case-sensitive, so as far as the entity recognizer and Prodigy are concerned, "ey" and "EY" are two different labels. They're also passed through exactly as given, so I don't see how this could be happening :thinking:

How frequent are the ethnicity mentions in your data? (Just roughly estimated – do you expect there to be one in every other example, or are they rarer?) And as a quick sanity check, what happens when you run ner.match (a recipe for only annotating pattern matches) instead of ner.teach?

And finally, to help debug this, are you able to reproduce the behaviour in a new dataset? Just to make sure it wasn't caused by an accidental label mismatch, or something completely different. (For example, the issue you describe would be consistent with a typo in the label – maybe even an invisible character that was copied over by accident? So Prodigy is looking to train label A, while the patterns only ever describe label B, resulting in only the model's predictions being displayed.)
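A quick way to rule out an invisible character is to print the distinct labels from the patterns file together with their repr and code points – a minimal sketch, assuming the file is called patterns.jsonl:

import json

labels = set()
with open("patterns.jsonl", encoding="utf8") as f:
    for line in f:
        labels.add(json.loads(line)["label"])

# repr() and the code points make stray whitespace or invisible characters visible
for label in sorted(labels):
    print(repr(label), [hex(ord(c)) for c in label])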

To verify that the patterns were used, you can also check the meta information in the bottom right corner of the annotation card. It will say "pattern" and a numeric ID, which is the line number of the pattern that produced the match.

As a quick fix, you can always export your dataset using db-out, run a quick search and replace and then re-add it again under a new name via db-in. This will give you consistent data to train from.
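For example, roughly like this – just a sketch, with placeholder file and dataset names, and assuming the only thing that needs fixing is the "label" value on each span:

import json

# Read the exported annotations, upper-case the span labels, write a new file
with open("new_ds.jsonl", encoding="utf8") as f_in, \
     open("new_ds_fixed.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        task = json.loads(line)
        for span in task.get("spans", []):
            span["label"] = span["label"].upper()   # e.g. "ey" -> "EY"
        f_out.write(json.dumps(task) + "\n")

# then re-import under a new name:
# prodigy db-in fixed_dataset new_ds_fixed.jsonl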

That was fast!

I’ve done a few of the things you suggested, just to rule out the easy things:

I uninstalled numpy 1.15.0 and installed numpy-1.14.5. The warnings went away, but the issues persist. I still get presented with random, non-pattern-based choices. E.g. here’s the first one I see:

This also confirms that the label EY is being used. Clicking reject on a whole bunch of examples doesn’t make it any better, either. Also, after several rejects, I noticed that the score shown is very low, yet it still presented me with those examples. (I would post another image, but the forum is limiting me to one image.)

However, running ner.match produces sensible things, even with the same pattern file. To confirm that it wasn’t the data set, I created yet another dummy data set and reran ner.teach…it still doesn’t work.

So, to recap: it is not a problem with the patterns.jsonl file, b/c it works with ner.match. It is not a numpy issue. It is some strange interaction between the ner.teach recipe and upper-case labels in the JSONL pattern file.

As for the number of possible ethnicity mentions in my corpus, is there some way to dump the output of ner.match? Otherwise, I have a quarter million examples and it would be a bit difficult to estimate. They are more spread out than other NER concepts like DISEASE, GENE, etc. that I am training on – i.e., not one per example; more like one per every 20 examples, or thereabouts. But in 250K examples, you do see quite a few. I can verify that they are there, b/c I ran the lower-case “ey” version of ner.teach and annotated almost 2,000 of the low-confidence cases.
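(If a more precise number would help, I could estimate it offline with something like this – a rough sketch over my local patterns.jsonl and corpus_250k.jsonl that just counts texts containing at least one pattern match:)

import json
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab)
with open("patterns.jsonl", encoding="utf8") as f:
    for line in f:
        entry = json.loads(line)
        matcher.add(entry["label"], None, entry["pattern"])

total = with_match = 0
with open("corpus_250k.jsonl", encoding="utf8") as f:
    for line in f:
        text = json.loads(line)["text"]
        doc = nlp.make_doc(text)   # tokenization alone is enough for these token patterns
        total += 1
        if matcher(doc):
            with_match += 1

print("%d of %d texts contain at least one pattern match" % (with_match, total))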

To confirm that it wasn’t anything mistyped with the labels, I did the following:

$ cat /tmp/patterns.json 
{"label":"A","pattern":[{"lower":"cancer"}]}
{"label":"A","pattern":[{"lower":"tumor"}]}
{"label":"A","pattern":[{"lower":"tumour"}]}
{"label":"A","pattern":[{"lower":"carcinoma"}]}
{"label":"A","pattern":[{"lower":"adenocarcinoma"}]}
(prodigy) Catalytic-DM:prodigy_installation dennismehay$ prodigy ner.teach new_ds  en_core_web_md  corpus_250k.jsonl  --patterns /tmp/patterns.json  --label A
Using 1 labels: A

  ✨  Starting the web server at http://localhost:8080 ...
  Open the app in your browser and start annotating!

That is literally copied from the command-line. Also, this time I used a standard spaCy model as a base model, just to rule out any funny stuff happening b/c of any peculiarity of my pre-trained model. Still I get presented with random, non-matching suggestions:

But…(see next post where I paste in another image)

…now it doesn’t work, even if I use lower-cased labels:

$ cat /tmp/patterns.json
{"label":"a","pattern":[{"lower":"cancer"}]}
{"label":"a","pattern":[{"lower":"tumor"}]}
{"label":"a","pattern":[{"lower":"tumour"}]}
{"label":"a","pattern":[{"lower":"carcinoma"}]}
{"label":"a","pattern":[{"lower":"adenocarcinoma"}]}
(prodigy) Catalytic-DM:prodigy_installation dennismehay$ prodigy ner.teach new_ds  en_core_web_md  corpus_250k.jsonl  --patterns /tmp/patterns.json  --label a
[DOES NOT WORK]

Unless I use --label A, even though the labels are in lower-case in the patterns.jsonl file. This is what I see as the first example, if I do that:

But…when I accept the labels they are in lower-case in the data dump – despite being presented in upper-case in the web app!

(prodigy) Catalytic-DM:prodigy_installation dennismehay$ prodigy db-out new_ds /tmp/

  ✨  Exported 21 annotations for 'new_ds' from database SQLite
  /private/tmp/new_ds.jsonl

(prodigy) Catalytic-DM:prodigy_installation dennismehay$ head -3 /tmp/new_ds.jsonl 
{"text":"Further research is needed to empirically test consultation models in routine clinical practice, specifically for advanced cancer specialist nurses.","_input_hash":-1824203478,"_task_hash":-1848506420,"spans":[{"text":"cancer","start":123,"end":129,"priority":0,"score":0,"pattern":-655452327,"label":"a"}],"meta":{"score":0,"pattern":0},"answer":"accept"}
{"text":"The Prostate Cancer Model of Consultation can be used to structure clinical consultations to target self-management care plans at the individual level of need over the cancer care continuum.","_input_hash":1209784133,"_task_hash":-277571715,"spans":[{"text":"Cancer","start":13,"end":19,"priority":0,"score":0,"pattern":-655452327,"label":"a"}],"meta":{"score":0,"pattern":0},"answer":"accept"}
{"text":"The Prostate Cancer Model of Consultation can be used to structure clinical consultations to target self-management care plans at the individual level of need over the cancer care continuum.","_input_hash":1209784133,"_task_hash":263703371,"spans":[{"text":"cancer","start":168,"end":174,"priority":0,"score":0,"pattern":-655452327,"label":"a"}],"meta":{"score":0,"pattern":0},"answer":"accept"}

Thanks for the super detailed analysis – this is really helpful! I’ll see if I can reproduce the behaviour using your examples. In general, the expectation is that --label A should obviously not yield any matches for "label": "a", and that --label a should. The label seems to be correctly passed to the model, so it must be something related to the matching…

Btw, one more question: Could you check which version of spaCy you’re running?

Sure thing:

>>> spacy.__version__
'2.0.12'

Maybe too old? [edit: I had assumed that prodigy would install the required spaCy version, but maybe it didn’t.]

Thanks! And no, this is the latest version, released just the other week. Prodigy is version-pinned to 2.0.x, to make sure you always get the latest compatible version. And we usually pay close attention to backwards compatibility.

Even though we didn’t change anything related to the matcher, I wonder if there’s a connection here… if it’s not too much of a hassle, could you try downgrading to v2.0.11 and re-run your experiment?

Will do. No trouble at all. Thanks for all the support. I’ll report back in a few…


I downgraded to 2.0.11 and got the same results. To recap:

(1) Does not work with upper-cased labels in the pattern file.
(2) Does not work with lower-cased labels in the pattern file with --label <lower_case_label_variant>...
(3) Only works with lower-cased labels in the pattern file with --label <upper_case_label_variant>...

This is super strange! Trying to reproduce. Here’s what I have:

inputs.jsonl:

{"text": "An African-American man"}
{"text": "The African embassy"}

mehay.jsonl:

{"label":"EY","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"-"},{"lower":"americans"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"americans"}]}
{"label":"EY","pattern":[{"lower":"african"},{"lower":"british"}]}
{"label":"EY","pattern":[{"lower":"african"}]}
{"label":"EY","pattern":[{"lower":"akan"}]}
{"label":"EY","pattern":[{"lower":"alangan"}]}
{"label":"EY","pattern":[{"lower":"alaskan"}]}
{"label":"EY","pattern":[{"lower":"albanian"},{"lower":"american"}]}
{"label":"EY","pattern":[{"lower":"albanian"},{"lower":"british"}]}

Command:

cat inputs.jsonl | prodigy ner.teach -l EY -pt mehay.jsonl debug-matcher en_core_web_sm

This seems to work. Could you check whether it works for you, and maybe modify it until you can reproduce the behaviour?

Yep. That worked:

cat /tmp/inputs.jsonl | prodigy ner.teach -l EY -pt /tmp/mehay.jsonl debug-matcher en_core_web_sm

I see “An African-American man” as the first match.

However, if I use the same pattern file and my data, it doesn’t seem to work (my proverbial “inputs.json” file, that is). Very strange, indeed.

The really strange thing is that ner.match works on my large data file, using my large file of patterns. It just doesn’t work with ner.teach.

Quick thought: I still don't see how this would cause the uppercase/lowercase thing, but it's possible that there's a connection to how the scoring is handled in the teach recipes. Essentially, the matcher and model are combined, and the prefer_uncertain sorter then decides whether to show an example or not. My comment here discusses this in more detail:

Thanks, Ines, that looks tantalizingly close to what’s happening to me. I didn’t get the problem to go away using, e.g., the matcher = PatternMatcher(model.nlp, label_span=False, label_task=True).from_disk(patterns) fix that you suggested. (I realize that this was for textcat.teach, not ner.teach, but it was still worth a shot.)

So I have roughly 700 patterns in my file. Could this be causing the matching to be so extensive that there is no room left for the model to explore? I’m referring to the interaction of the combined model-cum-matcher (combine_models(model, matcher)) and the prefer_uncertain(...) behaviour. [edit: But that still wouldn’t explain why a tiny pattern file – like the one Matt suggested – didn’t work on my larger input file either.]
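For reference, here’s my mental model of the relevant wiring – a sketch pieced together from the custom-recipe examples, so the import paths and signatures are my assumptions rather than the actual recipe source:

import spacy
from prodigy.models.ner import EntityRecognizer      # assumed module paths, per the
from prodigy.models.matcher import PatternMatcher    # custom-recipe examples
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_uncertain
from prodigy.util import combine_models

nlp = spacy.load("en_core_web_md")
model = EntityRecognizer(nlp, label=["EY"])                # scores the model's own suggestions
matcher = PatternMatcher(nlp).from_disk("patterns.jsonl")  # scores pattern matches

stream = JSONL("corpus_250k.jsonl")
predict, update = combine_models(model, matcher)   # merges the two (score, example) streams
stream = prefer_uncertain(predict(stream))         # keeps examples with scores near 0.5

(Whether this is exactly what the built-in recipe does, I can’t say – it’s just to make the interaction I’m asking about concrete.)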

Sorry for the radio silence. The forum limited me to 18 posts (“new user” limit). I’m going to use the first suggested work-around for now (export improperly cased labelled data, then re-case and re-import it to a new dataset, then train). It’s still way better than the non-prodigy alternative. Thanks so much for the support.

Ah, damn, sorry about that – I’ll see if I can disable that feature! (We do need some basic spam bot protection, but this one seems pretty counterproductive.)

Thanks again for sharing your analysis! We’ll definitely look into this and possible solutions to improve the matcher integration into the active learning workflow. I think I might have already mentioned this in my other post, but we probably want to exclude the matches from being filtered by the sorter – for example by updating the current logic used to combine streams produced by two models.
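For example, something along these lines – very much a sketch of the idea rather than what the recipes currently do. It assumes the combined model/matcher stream of (score, example) tuples and the "pattern" key that pattern matches carry in their meta:

from itertools import tee
from prodigy.components.sorters import prefer_uncertain

def is_pattern_match(eg):
    # Tasks produced by the pattern matcher carry a "pattern" id in their meta
    # (as in the db-out dump above: "meta": {"score": 0, "pattern": 0})
    return "pattern" in eg.get("meta", {})

def interleave(*streams):
    # Yield alternately from each stream, dropping streams as they run out
    streams = [iter(s) for s in streams]
    while streams:
        for s in list(streams):
            try:
                yield next(s)
            except StopIteration:
                streams.remove(s)

def matches_plus_uncertain(scored_stream):
    # Pattern matches bypass the sorter entirely; only the model's own
    # suggestions are filtered by prefer_uncertain. (tee() may buffer if
    # one branch runs far ahead of the other.)
    scored_a, scored_b = tee(scored_stream)
    matches = (eg for score, eg in scored_a if is_pattern_match(eg))
    model_only = prefer_uncertain(
        (score, eg) for score, eg in scored_b if not is_pattern_match(eg)
    )
    yield from interleave(matches, model_only)

That way the sorter can still skim off the model’s uncertain predictions without ever suppressing a pattern match.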