Examples from stream are shown twice

Hi!

We are having trouble running an annotation task with a custom recipe. Some examples from the stream are shown to the annotator multiple times, so we end up with many duplicates in the data.

To demonstrate the problem, we created a minimal example.

We are using the current prodigy version 1.11.4 and the following recipe:

import prodigy
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens


@prodigy.recipe('ner_ate',
                dataset=("The dataset to save results", "positional", None, str),
                file_path=("The input data to use", "positional", None, str))
def ner_ate(dataset, file_path):
    def get_stream(fp):
        records = JSONL(fp)
        # use dummy image
        src = 'https://prodi.gy/static/social_dark-73aae237522610d930c61b32422092ef.jpg'
        for record_num, record in enumerate(records):
            # examples with text for ner.manual and html code with image for html block
            # record number in meta data to keep track of order
            yield {"text": record["text"],  # text for ner.manual block
                   "html": f'<img src="{src}" alt="" height="200" />',
                   "meta": {"record_num": record_num}}

    nlp = spacy.blank("en")  # blank spaCy pipeline for tokenization
    stream = get_stream(fp=file_path)  # set up the stream
    stream = add_tokens(nlp, stream)  # tokenize the stream for ner.manual

    return {
        "dataset": dataset,  # the dataset to save annotations to
        "view_id": "blocks",  # set the view_id to "blocks"
        "stream": stream,  # the stream of incoming examples
        "config": {
            "host": "0.0.0.0",
            "port": 8182,
            "labels": ["Aspect"],
            "blocks": [{"view_id": "ner_manual"},
                       {"view_id": "html"}]
        }
    }

For testing purposes, we used the news_headlines dataset and ran prodigy in a docker container with the following simple dockerfile:

FROM python:3.7

ENV PRODIGY_HOME ./app
ENV PRODIGY_LOGGING basic

RUN pip3 install --upgrade pip && \
    pip3 install spacy==3.1.2

COPY wheel/prodigy-1.11.4-cp37-cp37m-linux_x86_64.whl ./wheel/
RUN pip3 install wheel/prodigy-1.11.4-cp37-cp37m-linux_x86_64.whl \
    && rm -rf wheel/prodigy-1.11.4-cp37-cp37m-linux_x86_64.whl
RUN python3 -m spacy download en_core_web_sm

COPY data ./data/
COPY recipes ./recipes/

CMD python3 -m prodigy ner_ate ate_highinv ./data/news_headlines.jsonl.txt -F ./recipes/ner_ate.py

EXPOSE 8182

There is no extra config.json.

Using the record_num in the meta data, we were able to keep track of the order in which the examples were shown. There were multiple points in time where prodigy jumped back, e.g., from record_num 35 to record_num 11, and repeated some examples before continuing at record_num 36. The result was a dataset of 240 annotations (see screenshot below), even though there are only 200 examples in the dataset. This means that 40 examples (20% of the data) were repeated and shown twice to the user.
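
One way to locate such jumps automatically is to scan the record_num values in the order the answers were saved and report every step backwards. The following is only a minimal sketch and not part of the recipe above: find_backward_jumps is a hypothetical helper, and the sample sequence is made up for illustration, not taken from the real annotation run.

```python
def find_backward_jumps(record_nums):
    """Return (position, previous, current) wherever the sequence steps back."""
    jumps = []
    for pos in range(1, len(record_nums)):
        prev, cur = record_nums[pos - 1], record_nums[pos]
        if cur < prev:
            jumps.append((pos, prev, cur))
    return jumps


# Hypothetical order: 0..35, then a repeat of 11..13, then 36..39
order = list(range(36)) + [11, 12, 13] + list(range(36, 40))
print(find_backward_jumps(order))  # prints [(36, 35, 11)]
```

With real data, record_nums would be the meta["record_num"] values of the saved annotations, read in the order they were stored.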

Do you have any idea what could be going wrong here?

Thanks a lot in advance!
Niclas

Thanks for the detailed report and the example!

One quick question about your annotation process: Did this all happen within the same annotation session or did you ever restart the server? And if you look at the duplicate examples in the data, are the _input_hash and/or _task_hash values identical?

Hi @ines, thanks for the fast reply.

Yes, the annotations were all done in one session. I clicked through the 200 examples in one go without interruption. But I also didn't hold down the "accept" key or anything like that.

Here is an example of a duplicated record from the dataset. They are identical except for the timestamp. record_num 127 was correctly displayed at position 127 and then repeated at position 151.

                           text                 meta  _input_hash  _task_hash  answer  _timestamp
127  Digital Muse for Beat Poet  {'record_num': 127}   -805557754   893792064  accept  1633723793
151  Digital Muse for Beat Poet  {'record_num': 127}   -805557754   893792064  accept  1633723803
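
For anyone who wants to run the same check on their own data, here is a rough sketch that lists every position at which an identical (_input_hash, _task_hash) pair reappears. It assumes the annotations have been exported to a JSONL file; repeated_tasks and load_jsonl are hypothetical helper names, and the middle row of the sample is invented. Only the two "Digital Muse for Beat Poet" rows come from the table above.

```python
import json
from collections import defaultdict


def load_jsonl(path):
    """Read one JSON object per non-empty line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def repeated_tasks(examples):
    """Map each (_input_hash, _task_hash) seen more than once to its positions."""
    positions = defaultdict(list)
    for pos, eg in enumerate(examples):
        positions[(eg["_input_hash"], eg["_task_hash"])].append(pos)
    return {key: where for key, where in positions.items() if len(where) > 1}


examples = [
    {"text": "Digital Muse for Beat Poet", "_input_hash": -805557754, "_task_hash": 893792064},
    {"text": "hypothetical other headline", "_input_hash": 1, "_task_hash": 2},
    {"text": "Digital Muse for Beat Poet", "_input_hash": -805557754, "_task_hash": 893792064},
]
print(repeated_tasks(examples))  # prints {(-805557754, 893792064): [0, 2]}
```

To check a real export, the sample list can be replaced with load_jsonl("annotations.jsonl"), where the file name is a placeholder.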

Hi,
Thank you for the amazing tool :pray: Unfortunately, I have the same problem with duplicates in NER annotation and would appreciate any advice on how to solve it.

Hi! Are you using the same version and observing the same problem, i.e. examples with identical task hashes and input hashes being repeated in the same session?

We have a new version coming up and I'll update this thread once it's out. It might have an impact so definitely try it and see if you can still encounter duplication.

Hi! I have the same issue: examples are shown repeatedly, but in our case we use multi-session mode. Our team noticed it happens when more than one user is annotating at the same time. When just one person is annotating, it stops repeating after a few examples. We are running version 1.11.4 with the ner.manual recipe. Here is the config:

{
    "theme": "basic",
    "buttons": ["accept", "reject", "ignore", "undo"],
    "batch_size": 20,
    "history_size": 20,
    "port": 8000,
    "host": "0.0.0.0",
    "cors": true,
    "db": "sqlite",
    "db_settings": {
        "sqlite": {
            "name": "prodigy.db",
            "path": "/app/database"
        }
    },
    "api_keys": {},
    "validate": true,
    "auto_exclude_current": true,
    "instant_submit": false,
    "feed_overlap": false,
    "ui_lang": "pt",
    "project_info": ["dataset", "session", "lang", "recipe_name", "view_id", "label"],
    "show_stats": false,
    "hide_meta": false,
    "show_flag": false,
    "instructions": false,
    "swipe": false,
    "swipe_gestures": { "left": "accept", "right": "reject" },
    "split_sents_threshold": false,
    "html_template": false,
    "global_css": null,
    "javascript": null,
    "writing_dir": "ltr",
    "show_whitespace": false
}

Some examples from db-out:

{'text': 'As pessoas reclamam,...', '_input_hash': 1683003174, '_task_hash': -82705531, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'As pessoas reclamam,...', '_input_hash': 1683003174, '_task_hash': -82705531, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Destarte, é inegável...', '_input_hash': 916516496, '_task_hash': -354755047, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Destarte, é inegável...', '_input_hash': 916516496, '_task_hash': -354755047, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Esse cenário antagôn...', '_input_hash': 1442234240, '_task_hash': -1947921422, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Esse cenário antagôn...', '_input_hash': 1442234240, '_task_hash': -1947921422, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Logo, para o sociólo...', '_input_hash': 989722103, '_task_hash': 1152576164, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Logo, para o sociólo...', '_input_hash': 989722103, '_task_hash': 1152576164, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'No início de 2021 oc...', '_input_hash': -555826576, '_task_hash': 1621368279, '_annotator_id': 'tagger_v2-marinastri'}
{'text': 'No início de 2021 oc...', '_input_hash': -555826576, '_task_hash': 1621368279, '_annotator_id': 'tagger_v2-marinastri'}
{'text': 'O acesso ao conteúdo...', '_input_hash': -718039749, '_task_hash': -672062954, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'O acesso ao conteúdo...', '_input_hash': -718039749, '_task_hash': -672062954, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Primeiramente, como ...', '_input_hash': 1883889543, '_task_hash': 27718084, '_annotator_id': 'tagger_v2-guiij'}
{'text': 'Primeiramente, como ...', '_input_hash': 1883889543, '_task_hash': 27718084, '_annotator_id': 'tagger_v2-guiij'}
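
Since this is a multi-session setup, it may help to count the repeats separately per annotator, to see whether the same person is shown a task twice or whether it only looks duplicated across sessions. This is a small sketch under that assumption: duplicates_per_annotator is a hypothetical helper, the first four sample rows reuse hashes from the output above, and the last row is invented.

```python
from collections import Counter


def duplicates_per_annotator(examples):
    """Count, per _annotator_id, how many answers repeat an already seen _task_hash."""
    seen = Counter((eg.get("_annotator_id"), eg["_task_hash"]) for eg in examples)
    extra = Counter()
    for (annotator, _task_hash), n in seen.items():
        if n > 1:
            extra[annotator] += n - 1
    return dict(extra)


sample = [
    {"_task_hash": -82705531, "_annotator_id": "tagger_v2-guiij"},
    {"_task_hash": -82705531, "_annotator_id": "tagger_v2-guiij"},
    {"_task_hash": 1621368279, "_annotator_id": "tagger_v2-marinastri"},
    {"_task_hash": 1621368279, "_annotator_id": "tagger_v2-marinastri"},
    {"_task_hash": 12345, "_annotator_id": "tagger_v2-guiij"},  # invented row
]
print(duplicates_per_annotator(sample))
```

An annotator who never saw a task twice does not appear in the result at all.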

Just released v1.11.5 that includes a fix that's likely relevant. Could you re-run your process with the new version and see if it resolves the problem?

Hi Ines, unfortunately my teammate is still finding repeated examples with identical hashes, just like the ones shown above. They were annotating alone, so it's probably not a multi-user session issue.

Hi, roughly where in your stream does the duplication happen? If it's towards the end, that's actually expected behavior: we don't want one user to queue up a bunch of tasks without annotating them, leaving those examples unavailable to other team members who are ready to annotate. If you're seeing duplicates right away or somewhere in the middle of your data, then that's something I'll look into for sure.
Thanks!

Greetings, Kabir
Our stream contains 4000 examples, we started noticing the repetition around 30-40 examples in.
Thank you so much!

edit: I'm leaving some examples from the stream

$ head -n 10 4000_sents_postag.jsonl 
{"text": "No entanto, o filósofo Immanuel Kant afirma: ´´O ser humano é aquilo que a educação faz dele.", "spans": [{"start": 0, "end": 2, "label": "KC"}, {"start": 3, "end": 10, "label": "KC"}, {"start": 10, "end": 11, "label": "PU"}, {"start": 12, "end": 13, "label": "ART"}, {"start": 14, "end": 22, "label": "N"}, {"start": 23, "end": 31, "label": "NPROP"}, {"start": 32, "end": 36, "label": "NPROP"}, {"start": 37, "end": 43, "label": "V"}, {"start": 43, "end": 44, "label": "PU"}, {"start": 45, "end": 46, "label": "N"}, {"start": 46, "end": 47, "label": "N"}, {"start": 47, "end": 48, "label": "ART"}, {"start": 49, "end": 52, "label": "N"}, {"start": 53, "end": 59, "label": "N"}, {"start": 60, "end": 61, "label": "V"}, {"start": 62, "end": 68, "label": "PROSUB"}, {"start": 69, "end": 72, "label": "PRO-KS"}, {"start": 73, "end": 74, "label": "ART"}, {"start": 75, "end": 83, "label": "N"}, {"start": 84, "end": 87, "label": "V"}, {"start": 88, "end": 92, "label": "PREP+PROPESS"}, {"start": 92, "end": 93, "label": "PU"}]}
{"text": "Por isso é preciso que o Estado promova longos debates com as escolas proporcionando aos professores uma maneira agradável de instruir os alunos desde o ensino fundamental que o cancelamento não é a forma correta de punir alguém pelo erro.", "spans": [{"start": 0, "end": 3, "label": "PREP"}, {"start": 4, "end": 8, "label": "PROSUB"}, {"start": 9, "end": 10, "label": "V"}, {"start": 11, "end": 18, "label": "ADJ"}, {"start": 19, "end": 22, "label": "KS"}, {"start": 23, "end": 24, "label": "ART"}, {"start": 25, "end": 31, "label": "N"}, {"start": 32, "end": 39, "label": "V"}, {"start": 40, "end": 46, "label": "ADJ"}, {"start": 47, "end": 54, "label": "ADJ"}, {"start": 55, "end": 58, "label": "PREP"}, {"start": 59, "end": 61, "label": "ART"}, {"start": 62, "end": 69, "label": "N"}, {"start": 70, "end": 84, "label": "V"}, {"start": 85, "end": 88, "label": "PREP+ART"}, {"start": 89, "end": 100, "label": "N"}, {"start": 101, "end": 104, "label": "ART"}, {"start": 105, "end": 112, "label": "N"}, {"start": 113, "end": 122, "label": "ADJ"}, {"start": 123, "end": 125, "label": "PREP"}, {"start": 126, "end": 134, "label": "V"}, {"start": 135, "end": 137, "label": "ART"}, {"start": 138, "end": 144, "label": "N"}, {"start": 145, "end": 150, "label": "PREP"}, {"start": 151, "end": 152, "label": "ART"}, {"start": 153, "end": 159, "label": "N"}, {"start": 160, "end": 171, "label": "ADJ"}, {"start": 172, "end": 175, "label": "PRO-KS"}, {"start": 176, "end": 177, "label": "ART"}, {"start": 178, "end": 190, "label": "N"}, {"start": 191, "end": 194, "label": "ADV"}, {"start": 195, "end": 196, "label": "V"}, {"start": 197, "end": 198, "label": "ART"}, {"start": 199, "end": 204, "label": "N"}, {"start": 205, "end": 212, "label": "ADJ"}, {"start": 213, "end": 215, "label": "PREP"}, {"start": 216, "end": 221, "label": "V"}, {"start": 222, "end": 228, "label": "PROSUB"}, {"start": 229, "end": 233, "label": "PREP+ART"}, {"start": 234, "end": 238, "label": "N"}, {"start": 238, "end": 239, "label": "PU"}]}
{"text": "Nesse sentido, observa-se como o consumo exagerado favorece a degradação do meio ambiente, além de prejudicar a qualidade de vida dos cidadãos.", "spans": [{"start": 0, "end": 5, "label": "PREP+PROADJ"}, {"start": 6, "end": 13, "label": "N"}, {"start": 13, "end": 14, "label": "PU"}, {"start": 15, "end": 25, "label": "V+PROPESS"}, {"start": 26, "end": 30, "label": "PREP"}, {"start": 31, "end": 32, "label": "ART"}, {"start": 33, "end": 40, "label": "N"}, {"start": 41, "end": 50, "label": "PCP"}, {"start": 51, "end": 59, "label": "V"}, {"start": 60, "end": 61, "label": "ART"}, {"start": 62, "end": 72, "label": "N"}, {"start": 73, "end": 75, "label": "PREP+ART"}, {"start": 76, "end": 80, "label": "N"}, {"start": 81, "end": 89, "label": "N"}, {"start": 89, "end": 90, "label": "PU"}, {"start": 91, "end": 95, "label": "PREP"}, {"start": 96, "end": 98, "label": "PREP"}, {"start": 99, "end": 109, "label": "V"}, {"start": 110, "end": 111, "label": "ART"}, {"start": 112, "end": 121, "label": "N"}, {"start": 122, "end": 124, "label": "PREP"}, {"start": 125, "end": 129, "label": "N"}, {"start": 130, "end": 133, "label": "PREP+ART"}, {"start": 134, "end": 142, "label": "N"}, {"start": 142, "end": 143, "label": "PU"}]}
{"text": "Nesse prisma, destacam-se dois aspectos importantes: o papel da indústria agropecuária no desflorestamento, e quais os impactos de tais atividades.", "spans": [{"start": 0, "end": 5, "label": "PREP+PROADJ"}, {"start": 6, "end": 12, "label": "N"}, {"start": 12, "end": 13, "label": "PU"}, {"start": 14, "end": 25, "label": "V+PROPESS"}, {"start": 26, "end": 30, "label": "NUM"}, {"start": 31, "end": 39, "label": "N"}, {"start": 40, "end": 51, "label": "ADJ"}, {"start": 51, "end": 52, "label": "PU"}, {"start": 53, "end": 54, "label": "ART"}, {"start": 55, "end": 60, "label": "N"}, {"start": 61, "end": 63, "label": "PREP+ART"}, {"start": 64, "end": 73, "label": "N"}, {"start": 74, "end": 86, "label": "ADJ"}, {"start": 87, "end": 89, "label": "PREP+ART"}, {"start": 90, "end": 106, "label": "N"}, {"start": 106, "end": 107, "label": "PU"}, {"start": 108, "end": 109, "label": "KC"}, {"start": 110, "end": 115, "label": "PRO-KS"}, {"start": 116, "end": 118, "label": "ART"}, {"start": 119, "end": 127, "label": "N"}, {"start": 128, "end": 130, "label": "PREP"}, {"start": 131, "end": 135, "label": "PROADJ"}, {"start": 136, "end": 146, "label": "N"}, {"start": 146, "end": 147, "label": "PU"}]}
{"text": "Esta taxa alarmante já se via por relatos mais antigos, especificamente dos anos 90.", "spans": [{"start": 0, "end": 4, "label": "PROADJ"}, {"start": 5, "end": 9, "label": "N"}, {"start": 10, "end": 19, "label": "ADJ"}, {"start": 20, "end": 22, "label": "ADV"}, {"start": 23, "end": 25, "label": "PROPESS"}, {"start": 26, "end": 29, "label": "V"}, {"start": 30, "end": 33, "label": "PREP"}, {"start": 34, "end": 41, "label": "N"}, {"start": 42, "end": 46, "label": "ADV"}, {"start": 47, "end": 54, "label": "ADJ"}, {"start": 54, "end": 55, "label": "PU"}, {"start": 56, "end": 71, "label": "ADJ"}, {"start": 72, "end": 75, "label": "PREP+ART"}, {"start": 76, "end": 80, "label": "N"}, {"start": 81, "end": 83, "label": "N"}, {"start": 83, "end": 84, "label": "PU"}]}
{"text": "Em dado momento, Martin Luther King, um escritor ativista, diz que a injustiça em um lugar qualquer é uma ameaça à justiça em todo lugar.", "spans": [{"start": 0, "end": 2, "label": "PREP"}, {"start": 3, "end": 7, "label": "PCP"}, {"start": 8, "end": 15, "label": "N"}, {"start": 15, "end": 16, "label": "PU"}, {"start": 17, "end": 23, "label": "NPROP"}, {"start": 24, "end": 30, "label": "NPROP"}, {"start": 31, "end": 35, "label": "NPROP"}, {"start": 35, "end": 36, "label": "PU"}, {"start": 37, "end": 39, "label": "ART"}, {"start": 40, "end": 48, "label": "N"}, {"start": 49, "end": 57, "label": "ADJ"}, {"start": 57, "end": 58, "label": "PU"}, {"start": 59, "end": 62, "label": "V"}, {"start": 63, "end": 66, "label": "KS"}, {"start": 66, "end": 67, "label": "V+PROPESS"}, {"start": 67, "end": 68, "label": "PREP"}, {"start": 69, "end": 78, "label": "N"}, {"start": 79, "end": 81, "label": "PREP"}, {"start": 82, "end": 84, "label": "ART"}, {"start": 85, "end": 90, "label": "N"}, {"start": 91, "end": 99, "label": "PROADJ"}, {"start": 100, "end": 101, "label": "V"}, {"start": 102, "end": 105, "label": "ART"}, {"start": 106, "end": 112, "label": "N"}, {"start": 113, "end": 114, "label": "PREP+ART"}, {"start": 115, "end": 122, "label": "N"}, {"start": 123, "end": 125, "label": "PREP"}, {"start": 126, "end": 130, "label": "PROADJ"}, {"start": 131, "end": 136, "label": "N"}, {"start": 136, "end": 137, "label": "PU"}]}
{"text": "A cada dia que passa a exploração exacerbada destroem matas, florestas e poluem mais o ambiente, essa devastação em nome do desenvolvimento econômico extrapola os limites do consumo consciente prejudicando a todos os cidadões.", "spans": [{"start": 0, "end": 1, "label": "PREP"}, {"start": 2, "end": 6, "label": "PROADJ"}, {"start": 7, "end": 10, "label": "N"}, {"start": 11, "end": 14, "label": "PRO-KS"}, {"start": 15, "end": 20, "label": "V"}, {"start": 21, "end": 22, "label": "ART"}, {"start": 23, "end": 33, "label": "N"}, {"start": 34, "end": 44, "label": "PCP"}, {"start": 45, "end": 53, "label": "V"}, {"start": 54, "end": 59, "label": "N"}, {"start": 59, "end": 60, "label": "PU"}, {"start": 61, "end": 70, "label": "N"}, {"start": 71, "end": 72, "label": "KC"}, {"start": 73, "end": 79, "label": "V"}, {"start": 80, "end": 84, "label": "ADV"}, {"start": 85, "end": 86, "label": "ART"}, {"start": 87, "end": 95, "label": "N"}, {"start": 95, "end": 96, "label": "PU"}, {"start": 97, "end": 101, "label": "PROADJ"}, {"start": 102, "end": 112, "label": "N"}, {"start": 113, "end": 115, "label": "PREP"}, {"start": 116, "end": 120, "label": "N"}, {"start": 121, "end": 123, "label": "PREP+ART"}, {"start": 124, "end": 139, "label": "N"}, {"start": 140, "end": 149, "label": "ADJ"}, {"start": 150, "end": 159, "label": "V"}, {"start": 160, "end": 162, "label": "ART"}, {"start": 163, "end": 170, "label": "N"}, {"start": 171, "end": 173, "label": "PREP+ART"}, {"start": 174, "end": 181, "label": "N"}, {"start": 182, "end": 192, "label": "ADJ"}, {"start": 193, "end": 205, "label": "V"}, {"start": 206, "end": 207, "label": "PREP"}, {"start": 208, "end": 213, "label": "PROADJ"}, {"start": 214, "end": 216, "label": "ART"}, {"start": 217, "end": 225, "label": "N"}, {"start": 225, "end": 226, "label": "PU"}]}
{"text": "Mas também, ações racistas e de injurias raciais ainda acometem a nossa sociedade, com o intuito de ofender a vítima com elementos referentes à raça, religião e etnia.", "spans": [{"start": 0, "end": 3, "label": "KC"}, {"start": 4, "end": 10, "label": "PDEN"}, {"start": 10, "end": 11, "label": "PU"}, {"start": 12, "end": 17, "label": "N"}, {"start": 18, "end": 26, "label": "ADJ"}, {"start": 27, "end": 28, "label": "KC"}, {"start": 29, "end": 31, "label": "PREP"}, {"start": 32, "end": 40, "label": "N"}, {"start": 41, "end": 48, "label": "ADJ"}, {"start": 49, "end": 54, "label": "ADV"}, {"start": 55, "end": 63, "label": "V"}, {"start": 64, "end": 65, "label": "ART"}, {"start": 66, "end": 71, "label": "PROADJ"}, {"start": 72, "end": 81, "label": "N"}, {"start": 81, "end": 82, "label": "PU"}, {"start": 83, "end": 86, "label": "PREP"}, {"start": 87, "end": 88, "label": "ART"}, {"start": 89, "end": 96, "label": "N"}, {"start": 97, "end": 99, "label": "PREP"}, {"start": 100, "end": 107, "label": "V"}, {"start": 108, "end": 109, "label": "ART"}, {"start": 110, "end": 116, "label": "N"}, {"start": 117, "end": 120, "label": "PREP"}, {"start": 121, "end": 130, "label": "N"}, {"start": 131, "end": 141, "label": "PREP"}, {"start": 142, "end": 143, "label": "PREP+ART"}, {"start": 144, "end": 148, "label": "N"}, {"start": 148, "end": 149, "label": "PU"}, {"start": 150, "end": 158, "label": "N"}, {"start": 159, "end": 160, "label": "KC"}, {"start": 161, "end": 166, "label": "N"}, {"start": 166, "end": 167, "label": "PU"}]}
{"text": "Muitos são adeptos da filosofia na Gilles Lipovestsk, onde o mesmo acredita que o consumismo é uma forma terapêutica de aliviar a tenção e a ansiedade.", "spans": [{"start": 0, "end": 6, "label": "PROSUB"}, {"start": 7, "end": 10, "label": "V"}, {"start": 11, "end": 18, "label": "N"}, {"start": 19, "end": 21, "label": "PREP+ART"}, {"start": 22, "end": 31, "label": "N"}, {"start": 32, "end": 34, "label": "PREP+ART"}, {"start": 35, "end": 41, "label": "NPROP"}, {"start": 42, "end": 52, "label": "NPROP"}, {"start": 52, "end": 53, "label": "PU"}, {"start": 54, "end": 58, "label": "ADV-KS"}, {"start": 59, "end": 60, "label": "ART"}, {"start": 61, "end": 66, "label": "PROSUB"}, {"start": 67, "end": 75, "label": "V"}, {"start": 76, "end": 79, "label": "KS"}, {"start": 80, "end": 81, "label": "ART"}, {"start": 82, "end": 92, "label": "N"}, {"start": 93, "end": 94, "label": "V"}, {"start": 95, "end": 98, "label": "ART"}, {"start": 99, "end": 104, "label": "N"}, {"start": 105, "end": 116, "label": "ADJ"}, {"start": 117, "end": 119, "label": "PREP"}, {"start": 120, "end": 127, "label": "V"}, {"start": 128, "end": 129, "label": "ART"}, {"start": 130, "end": 136, "label": "N"}, {"start": 137, "end": 138, "label": "KC"}, {"start": 139, "end": 140, "label": "ART"}, {"start": 141, "end": 150, "label": "N"}, {"start": 150, "end": 151, "label": "PU"}]}
{"text": "Durante esse período diversas pessoas sentiram-se sozinhas por conta de não poder estar encontrando familiares ou alguém próximo e consequentemente causando problemas para ela mesmo, como algumas doenças.", "spans": [{"start": 0, "end": 7, "label": "PREP"}, {"start": 8, "end": 12, "label": "PROADJ"}, {"start": 13, "end": 20, "label": "N"}, {"start": 21, "end": 29, "label": "PROADJ"}, {"start": 30, "end": 37, "label": "N"}, {"start": 38, "end": 49, "label": "V+PROPESS"}, {"start": 50, "end": 58, "label": "N"}, {"start": 59, "end": 62, "label": "PREP"}, {"start": 63, "end": 68, "label": "N"}, {"start": 69, "end": 71, "label": "PREP"}, {"start": 72, "end": 75, "label": "ADV"}, {"start": 76, "end": 81, "label": "V"}, {"start": 82, "end": 87, "label": "V"}, {"start": 88, "end": 99, "label": "V"}, {"start": 100, "end": 110, "label": "N"}, {"start": 111, "end": 113, "label": "KC"}, {"start": 114, "end": 120, "label": "PROSUB"}, {"start": 121, "end": 128, "label": "ADJ"}, {"start": 129, "end": 130, "label": "KC"}, {"start": 131, "end": 147, "label": "ADV"}, {"start": 148, "end": 156, "label": "V"}, {"start": 157, "end": 166, "label": "N"}, {"start": 167, "end": 171, "label": "PREP"}, {"start": 172, "end": 175, "label": "PROPESS"}, {"start": 176, "end": 181, "label": "PROADJ"}, {"start": 181, "end": 182, "label": "PU"}, {"start": 183, "end": 187, "label": "PREP"}, {"start": 188, "end": 195, "label": "PROADJ"}, {"start": 196, "end": 203, "label": "N"}, {"start": 203, "end": 204, "label": "PU"}]}

Alright I'll take a look today, thanks for sharing those examples.

Hi! After a ton of attempts with multiple datasets, I'm still not able to reproduce this in 1.11.5. If you're comfortable sharing more of your data, especially up to the point where you start noticing duplicates, that could help a lot with debugging your issue. (I can have you email it to me personally so it's not public on the support forum.) I totally understand if you're not comfortable with that, and I'll continue looking into it independently. However, I'm wondering if there's something specific in your dataset that I'm missing in my tests. Thanks!

Hi! We are totally stuck on this, so that helps a lot. I am comfortable emailing you more info. What's the address?

You can email Kabir at kabir@explosion.ai :slight_smile:
