Questionable results from NER - we must be doing something wrong

I tried using a blank model, but for some reason it gave me an accuracy of 0.00 when running ner.batch-train after doing a little more than 1,000 annotations.

When creating the blank model, I got the error described here. I checked my spaCy version, which is 2.0.12, so I worked around that and ended up with this:

from __future__ import unicode_literals, print_function
from pathlib import Path 

import shutil
import spacy

def main(output_dir=None):
    nlp = spacy.blank('en')  # create blank Language class
    print("Created blank 'en' model")

    if 'ner' not in nlp.pipe_names:
        print("Adding ner pipe")
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)

    ner = nlp.get_pipe('ner')
    nlp.vocab.vectors.name = 'en_core_web_lg.vectors'  # work around the unnamed-vectors error
    optimizer = nlp.begin_training()
    losses = {}

    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        for i in range(1):
            nlp.update(
                [],  # empty batch of texts - no training data yet
                [],  # empty batch of annotations; this only initialises the weights
                drop=0.5,  # dropout - make it harder to memorise data
                sgd=optimizer,  # callable to update weights
                losses=losses)

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)

        if not output_dir.exists():
            output_dir.mkdir()
        else:
            shutil.rmtree(output_dir)  # to_disk() below recreates the directory

        nlp.meta['name'] = 'blank_ner_model'  # rename model
        nlp.to_disk(output_dir)

        print("Saved model to", output_dir)


if __name__ == '__main__':
    main('./blankv1')

I tried en_core_web_sm, and while it is faster to work with, accuracy and annotation quality suffer a little (as far as I can tell).

I generated match patterns for Manufacturer Serial Number (MSN) to feed into Prodigy, and after doing ~600 annotations and training using en_core_web_lg, I am up to 92.6% accuracy for the MSN model.
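For reference, the patterns file is plain JSONL in spaCy's token-pattern format. A minimal sketch of how such a file can be generated; the specific token shapes shown here (e.g. "msn" followed by a 4- or 5-digit token) are illustrative placeholders, not the actual patterns:

```python
import json

# Hypothetical MSN token patterns: the literal token "msn" followed by a
# digit-only token of a given shape. Adjust the shapes to your data.
patterns = [
    {"label": "MSN", "pattern": [{"LOWER": "msn"}, {"SHAPE": "dddd"}]},
    {"label": "MSN", "pattern": [{"LOWER": "msn"}, {"SHAPE": "ddddd"}]},
]

# Write one JSON object per line, as Prodigy's --patterns option expects.
with open("msn_patterns.jsonl", "w") as f:
    for p in patterns:
        f.write(json.dumps(p) + "\n")
```

Each line carries a label and a list of token attributes for spaCy's Matcher, so more shapes can be appended without touching the annotation workflow.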

This will probably be my next attempt at getting it to >95% accuracy.