Duplicates in ner.correct in 1.10.2

  1. Annotated some examples in ner.correct in 1.9.9 (or a similar version, it's difficult to keep track)
  2. Upgraded to 1.10.2
  3. Continued annotating the same dataset from the same input file, using the same command except for switching to the 2.3 version of the spaCy model, since the spaCy version changed as well
  4. A familiar example shows up and I "accept" it
  5. db-out sure enough shows the same example twice, with identical input hashes
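For anyone wanting to check their own data for this, here's a rough sketch that counts repeated input hashes in a db-out JSONL export (it assumes Prodigy's standard _input_hash field; the function name is just for illustration):

```python
import json
from collections import Counter

def duplicate_input_hashes(jsonl_path):
    """Count how often each _input_hash appears in a db-out export
    and return only the hashes that occur more than once."""
    counts = Counter()
    with open(jsonl_path, encoding="utf8") as f:
        for line in f:
            record = json.loads(line)
            counts[record["_input_hash"]] += 1
    return {h: n for h, n in counts.items() if n > 1}
```

If this returns a non-empty dict, the dataset contains examples that were annotated more than once.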

This is not what I want and not what the documentation leads me to believe should happen. Am I missing anything?

prodigy.json is empty

There was an issue in v1.10.2 with exclude_by=input and feed_overlap=True where the filtering didn't work. The v1.10.3 release fixes that issue.

If you're not intending to have multiple sessions annotate the same examples, you could try setting feed_overlap=False in your prodigy.json file instead of upgrading.
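For reference, that override would look something like this in an otherwise empty prodigy.json:

```json
{
  "feed_overlap": false
}
```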

Thanks for releasing this. On its own, upgrading to 1.10.3 didn't seem to resolve the issue, so I also had to set feed_overlap=False (which is not ideal).

@geniki thanks for the follow up. It seems like you were able to get a workable solution, so that's great.

You mentioned that upgrading didn't seem to resolve the issue, so you set feed_overlap=False. Can you confirm that this is the case? Duplicate items should be filtered out entirely for matching sessions when you use feed_overlap=True.

When you set feed_overlap=True, Prodigy shows all the examples to each distinct "session" it encounters. If you don't use a named session (by adding ?session=name to the URL), you'll get a new default "session" each time you restart Prodigy, because the default session name is generated from the timestamp when Prodigy starts. Generally speaking, it's best not to rely on the default session when you intend sessions to overlap.

To sum it up, if you use feed_overlap=True you should always append a named session to the Prodigy URL when annotating, e.g. ?session=yourname. This ensures that Prodigy treats your annotations as coming from the same session and filters out duplicates.
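If duplicates have already made it into a dataset, one workaround is to deduplicate the db-out export and then load the cleaned file into a fresh dataset with db-in. A minimal sketch, again assuming Prodigy's standard _input_hash field and keeping the first annotation per hash (function name is hypothetical):

```python
import json

def dedupe_by_input_hash(in_path, out_path):
    """Keep only the first annotation per _input_hash from a db-out
    JSONL export, writing the deduplicated records to out_path.
    Returns the number of records kept."""
    seen = set()
    kept = 0
    with open(in_path, encoding="utf8") as src, \
         open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            record = json.loads(line)
            if record["_input_hash"] in seen:
                continue
            seen.add(record["_input_hash"])
            dst.write(json.dumps(record) + "\n")
            kept += 1
    return kept
```

Note this keeps whichever annotation came first in the export, so double-check that's the one you want before re-importing.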