E866 - Expected a string or 'Doc' as input, but got: <class 'NoneType'>.

I am running a Prodigy session with my own custom recipes, and the server keeps throwing an error. My recipe's preprocessing looks like this:

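    import spacy
    from prodigy import set_hashes
    from prodigy.components.filters import filter_duplicates
    from prodigy.components.loaders import JSONL
    from prodigy.components.preprocess import add_tokens

    stream = JSONL(source)  # load the data
    # hash each example on its "title" and "review" fields
    stream = [set_hashes(eg, ["title", "review"]) for eg in stream]
    stream = filter_duplicates(stream, by_input=True, by_task=True)
    # tokenize each example's text so that ner_manual can render it
    stream = add_tokens(spacy.load("en_core_web_sm"), stream, skip=True)
    html_template = """<div style="text-align:left;font-size: 14px;"><u><strong>Category</strong></u>:<br>{{category_name}}<br><u><strong>Product</strong></u>:<br>{{product_name}}</div>"""
    blocks = [
        {"view_id": "html", "html_template": html_template},
        {"view_id": "ner_manual"},
    ]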

It worked well for the most part, but then I noticed that my annotators don't get the same examples, and I'm also getting the following error:

future: <Task finished name='Task-42' coro=<RequestResponseCycle.run_asgi() done, defined at /home/azureuser/.local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py:394> exception=ValueError("[E866] Expected a string or 'Doc' as input, but got: <class 'NoneType'>.")>
Traceback (most recent call last):
  File "/home/azureuser/.local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 399, in run_asgi
    self.logger.error(msg, exc_info=exc)
  File "/usr/lib/python3.8/logging/__init__.py", line 1475, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/usr/lib/python3.8/logging/__init__.py", line 1589, in _log
    self.handle(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 1598, in handle
    if (not self.disabled) and self.filter(record):
  File "/usr/lib/python3.8/logging/__init__.py", line 811, in filter
    result = f.filter(record)
  File "cython_src/prodigy/util.pyx", line 145, in prodigy.util.ServerErrorFilter.filter
  File "/home/azureuser/.local/lib/python3.8/site-packages/uvicorn/protocols/http/h11_impl.py", line 396, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/fastapi/applications.py", line 208, in __call__
    await super().__call__(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 86, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/cors.py", line 142, in simple_response
    await self.app(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/base.py", line 25, in __call__
    response = await self.dispatch_func(request, self.call_next)
  File "/home/azureuser/.local/lib/python3.8/site-packages/prodigy/app.py", line 205, in reset_db_middleware
    response = await call_next(request)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/base.py", line 45, in call_next
    task.result()
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/middleware/base.py", line 38, in coro
    await self.app(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/home/azureuser/.local/lib/python3.8/site-packages/fastapi/routing.py", line 226, in app
    raw_response = await run_endpoint_function(
  File "/home/azureuser/.local/lib/python3.8/site-packages/fastapi/routing.py", line 161, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/home/azureuser/.local/lib/python3.8/site-packages/starlette/concurrency.py", line 40, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/azureuser/.local/lib/python3.8/site-packages/prodigy/app.py", line 434, in get_session_questions
    return _shared_get_questions(req.session_id, excludes=req.excludes)
  File "/home/azureuser/.local/lib/python3.8/site-packages/prodigy/app.py", line 399, in _shared_get_questions
    tasks = controller.get_questions(session_id=session_id, excludes=excludes)
  File "cython_src/prodigy/core.pyx", line 221, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/core.pyx", line 222, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 379, in prodigy.components.feeds.Feed.get_batch
  File "cython_src/prodigy/components/feeds.pyx", line 323, in prodigy.components.feeds.Feed._enqueue_tasks
  File "cython_src/prodigy/components/stream.pyx", line 170, in prodigy.components.stream.Stream.__next__
  File "cython_src/prodigy/components/stream.pyx", line 174, in prodigy.components.stream.Stream.__next__
  File "cython_src/prodigy/components/preprocess.pyx", line 164, in add_tokens
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1528, in pipe
    for doc in docs:
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1572, in pipe
    for doc in docs:
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1569, in <genexpr>
    docs = (self._ensure_doc(text) for text in texts)
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1519, in <genexpr>
    self._ensure_doc_with_context(text, context) for text, context in texts
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1097, in _ensure_doc_with_context
    doc = self._ensure_doc(doc_like)
  File "/home/azureuser/.local/lib/python3.8/site-packages/spacy/language.py", line 1093, in _ensure_doc
    raise ValueError(Errors.E866.format(type=type(doc_like)))
ValueError: [E866] Expected a string or 'Doc' as input, but got: <class 'NoneType'>.

If I remove the add_tokens call, the server runs without an exception, but then I get the following error in the web app:

TypeError: (intermediate value)(intermediate value)(intermediate value).sort is not a function

    in t
    in Jss(t)
    in t
    in Jss(t)
    in div
    in t
    in Jss(t)
    in Unknown
    in t
    in Jss(t)
    in div
    in div
    in t
    in Jss(t)
    in Connect(Jss(t))
    in main
    in div
    in Shortcuts
    in t
    in n
    in Jss(n)
    in Connect(Jss(n))
    in t
    in t
    in Connect(t)
    in t
    in t

I suspect it has to do with one of my samples, but I'm not sure how to fix it while keeping track of the annotators' past work.

In the end, it turned out that one of my samples somehow had a null value under the text key.
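
For anyone hitting the same error, here is a minimal sketch of how such examples could be filtered out before they reach add_tokens. It assumes the text lives under the "text" key and that each example is a plain dict. The helper name drop_invalid_text is hypothetical and not part of Prodigy.

```python
from typing import Any, Dict, Iterable, Iterator


def drop_invalid_text(
    stream: Iterable[Dict[str, Any]], key: str = "text"
) -> Iterator[Dict[str, Any]]:
    """Yield only the examples whose `key` value is a non-empty string."""
    for eg in stream:
        value = eg.get(key)
        if isinstance(value, str) and value.strip():
            yield eg
        else:
            # None, missing keys, non-string values and blank strings end up here.
            print(f"Skipping example with invalid {key!r}: {value!r}")


# Small in-memory demonstration with made-up examples.
examples = [{"text": "Great product"}, {"text": None}, {"title": "no text"}]
valid = list(drop_invalid_text(examples))
```

In a recipe like the one above, this would probably sit just before the add_tokens call, e.g. `stream = drop_invalid_text(stream)`.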

I would consider improving the logs for that case, or adding a way to check that the JSONL file (or whatever input format is used) is valid.
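
Until something like that exists, a standalone check along these lines could be run on the input file before starting the server. This is only a sketch using the standard library. The function name find_invalid_lines and the default "text" key are my own assumptions, not a Prodigy feature.

```python
import json
from typing import List, Tuple


def find_invalid_lines(path: str, key: str = "text") -> List[Tuple[int, str]]:
    """Return (line_number, reason) for every line that cannot be used as a task."""
    problems: List[Tuple[int, str]] = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                problems.append((line_no, f"invalid JSON: {err.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_no, "not a JSON object"))
            elif not isinstance(record.get(key), str):
                # Covers a JSON null as well as a missing key.
                problems.append((line_no, f"{key!r} is missing or not a string"))
    return problems
```

Calling `find_invalid_lines("data.jsonl")` (with a placeholder file name) returns the offending line numbers, which makes a null text value much easier to track down than the traceback does.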

Hi @shaked571!

Thanks for the update! I'm glad you were able to get it worked out.

Thanks for the feedback! I've written an internal note and we'll prioritize this with the other enhancements / fixes.