ERROR: Can't fetch tasks. Make sure the server is running correctly when running ner.correct

Dear prodigy team:

I created an annotated dataset for NER using the ner.manual command with the SciBERT model. When I tried to use ner.correct to review all my previous annotations, it started smoothly without any errors. However, after going through a small subset of the annotations, an error message suddenly popped up in the Prodigy interface. It immediately showed "No tasks available," even though I still had a large portion left to correct. At the same time, the following error appeared in the terminal: AttributeError: 'NoneType' object has no attribute 'end'.

The ner.manual command I use:
prodigy ner.manual thedataset en_core_sci_scibert ./paper1.jsonl --label Software --highlight-chars
The ner.correct command I use:
prodigy ner.correct CorrectData1 en_core_sci_scibert dataset:thedataset --label Software

Could you tell me how to solve this issue? I used print-dataset to check my dataset, and it looks good. I also checked my Prodigy version and the murmurhash version as suggested in a previous post, but the issue still persists. Thank you so much for any help!

Hi @Fangjian,

I can't see the full traceback, but I suspect the error originates in this line of the ner.correct recipe, where it tries to process the existing (manually annotated) spans with spaCy:

for span in eg.get("spans", []):
    spans.append(doc.char_span(span["start"], span["end"], span["label"]))

The issue here is that doc.char_span() can return None when it fails to create a valid span (due to tokenization mismatches or invalid character positions), but the code later tries to access .end on these None objects.
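You can reproduce this behavior in isolation. A minimal sketch using a blank English pipeline (just as a stand-in for en_core_sci_scibert, which tokenizes differently but fails the same way) with a made-up hyphenated example:

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("SciBERT-based tagger")
# The default English rules split the hyphen: ["SciBERT", "-", "based", "tagger"]

# Offsets that line up with token boundaries produce a valid Span
print(doc.char_span(0, 7, "Software"))   # SciBERT

# Offsets ending mid-token produce None, which later crashes on `.end`
print(doc.char_span(0, 9, "Software"))   # None

# char_span also accepts alignment_mode="expand", which snaps the
# offsets outward to token boundaries instead of returning None
print(doc.char_span(0, 9, "Software", alignment_mode="expand"))  # SciBERT-based
```

Note that `alignment_mode="expand"` changes the span boundaries, so it's a way to inspect misalignments rather than a silent fix for your annotations.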

This means that your thedataset contains spans that are not aligned with the tokenization. I can see you used --highlight-chars in your ner.manual command, and that's the likely cause of the misalignment: --highlight-chars only changes the spans you can select, it doesn't change the underlying tokenization. Please see the warning box in the docs here. The main use case for --highlight-chars is to systematically collect examples for modifying the tokenizer's rules or to inform data preprocessing.

If you are not planning to modify the tokenizer (e.g. by adding custom rules for the cases that required character-level selection), there's not much point in annotating with --highlight-chars. Your span annotations must be aligned with token boundaries, otherwise the model will never be able to predict these spans.
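For completeness, if you do want to go the tokenizer-modification route, spaCy lets you add special-case rules so that a problematic string stays a single token and a span over it becomes alignable. A hedged sketch (scikit-learn is just an invented stand-in, not from your data):

```python
import spacy
from spacy.attrs import ORTH

nlp = spacy.blank("en")

# Default English rules split the hyphenated name into three tokens
print([t.text for t in nlp("scikit-learn is software")])
# ['scikit', '-', 'learn', 'is', 'software']

# A special-case rule keeps the name as one token, so a span covering
# the whole name can now be aligned to token boundaries
nlp.tokenizer.add_special_case("scikit-learn", [{ORTH: "scikit-learn"}])
print([t.text for t in nlp("scikit-learn is software")])
# ['scikit-learn', 'is', 'software']
```

You'd collect such cases from your character-level annotations and then apply the same rules to the pipeline you train with.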

I recommend you filter out the misaligned examples and reannotate them with en_core_sci_scibert as before, but without the --highlight-chars option.
You can use this simple script, which tests whether a valid spaCy span can be formed from the tokenization and the annotated span offsets:

import spacy
import srsly

def clean_annotations(input_file, output_file_valid, output_file_reannot, model_name):
    nlp = spacy.load(model_name)
    to_reannotate = []
    valid_examples = []
    input_data = srsly.read_jsonl(input_file)
    
    for example in input_data:
        if 'spans' not in example:
            valid_examples.append(example)
            continue
                
        # Process the text to check span validity
        doc = nlp(example['text'])
        has_invalid_span = False  # Flag to track if any span is invalid
            
        for span in example['spans']:
            # Check if char_span would return None
            char_span = doc.char_span(span['start'], span['end'], span['label'])
            if char_span is None:
                print(f"Detected invalid span: {span} in text: '{example['text'][:100]}...'")
                to_reannotate.append(example)
                has_invalid_span = True
                break
        
        # Only add to valid if no invalid spans were found
        if not has_invalid_span:
            valid_examples.append(example)
    
    srsly.write_jsonl(output_file_valid, valid_examples)
    srsly.write_jsonl(output_file_reannot, to_reannotate)

# Usage
clean_annotations('thedataset.jsonl', 'valid_annotations.jsonl', 'examples_to_reannotate.jsonl', 'en_core_sci_scibert')

You can export your thedataset with the db-out command, which will save it as a JSONL file on disk. The script will produce two files, valid_annotations.jsonl and examples_to_reannotate.jsonl. Once you have reannotated examples_to_reannotate.jsonl with ner.manual, you can merge the reannotated dataset with valid_annotations.jsonl and use the result for the next stage.
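The end-to-end workflow might look something like this (dataset and file names are just placeholders based on your commands; adjust them to your setup):

```
# Export the existing annotations to JSONL
prodigy db-out thedataset > thedataset.jsonl

# ...run the cleaning script above, then reannotate the flagged examples
prodigy ner.manual reannotated en_core_sci_scibert ./examples_to_reannotate.jsonl --label Software

# Import the already-valid examples and merge both sets for the next stage
prodigy db-in valid_annotations valid_annotations.jsonl
prodigy db-merge valid_annotations,reannotated merged_dataset
```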