textcat.teach error with ja language model

I am trying to experiment with textcat.teach using the ja_core_news_lg model.

When running the below command:

prodigy textcat.teach my_test ja_core_news_lg $F_PATH/9697.title.jsonl --label ProductSold

The following error is returned. I believe I have installed all dependencies, but what is the issue here please? Thank you.

Task exception was never retrieved
future: <Task finished coro=<RequestResponseCycle.run_asgi() done, defined at /Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py:383> exception=AttributeError("'sudachipy.latticenode.LatticeNode' object has no attribute 'is_defined'",)>
Traceback (most recent call last):
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 388, in run_asgi
    self.logger.error(msg, exc_info=exc)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/logging/__init__.py", line 1335, in error
    self._log(ERROR, msg, args, **kwargs)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/logging/__init__.py", line 1442, in _log
    self.handle(record)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/logging/__init__.py", line 1451, in handle
    if (not self.disabled) and self.filter(record):
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/logging/__init__.py", line 718, in filter
    result = f.filter(record)
  File "cython_src/prodigy/util.pyx", line 120, in prodigy.util.ServerErrorFilter.filter
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/fastapi/applications.py", line 142, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/applications.py", line 134, in __call__
    await self.error_middleware(scope, receive, send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/middleware/errors.py", line 178, in __call__
    raise exc from None
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/middleware/errors.py", line 156, in __call__
    await self.app(scope, receive, _send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/middleware/cors.py", line 84, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/middleware/cors.py", line 140, in simple_response
    await self.app(scope, receive, send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/exceptions.py", line 73, in __call__
    raise exc from None
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/exceptions.py", line 62, in __call__
    await self.app(scope, receive, sender)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/routing.py", line 590, in __call__
    await route(scope, receive, send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/routing.py", line 208, in __call__
    await self.app(scope, receive, send)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/fastapi/routing.py", line 129, in app
    raw_response = await run_in_threadpool(dependant.call, **values)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/starlette/concurrency.py", line 25, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/prodigy/app.py", line 416, in get_session_questions
    return _shared_get_questions(req.session_id, excludes=req.excludes)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/prodigy/app.py", line 387, in _shared_get_questions
    tasks = controller.get_questions(session_id=session_id, excludes=excludes)
  File "cython_src/prodigy/core.pyx", line 143, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/core.pyx", line 147, in prodigy.core.Controller.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 67, in prodigy.components.feeds.SharedFeed.get_questions
  File "cython_src/prodigy/components/feeds.pyx", line 72, in prodigy.components.feeds.SharedFeed.get_next_batch
  File "cython_src/prodigy/components/feeds.pyx", line 153, in prodigy.components.feeds.SessionFeed.get_session_stream
  File "cython_src/prodigy/components/feeds.pyx", line 135, in prodigy.components.feeds.SessionFeed.validate_stream
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/toolz/itertoolz.py", line 376, in first
    return next(iter(seq))
  File "cython_src/prodigy/components/sorters.pyx", line 98, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 10, in genexpr
  File "cython_src/prodigy/models/textcat.pyx", line 159, in __call__
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/language.py", line 785, in pipe
    for doc, context in izip(docs, contexts):
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/language.py", line 826, in pipe
    for doc in docs:
  File "pipes.pyx", line 959, in pipe
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/util.py", line 513, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "nn_parser.pyx", line 249, in pipe
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/util.py", line 513, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "nn_parser.pyx", line 249, in pipe
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/util.py", line 513, in minibatch
    batch = list(itertools.islice(items, int(batch_size)))
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/language.py", line 811, in <genexpr>
    docs = (self.make_doc(text) for text in texts)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 284, in make_doc
    return self.tokenizer(text)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 194, in __call__
    dtokens = get_dtokens(self.tokenizer, text)
  File "/Users/jb/.pyenv/versions/3.6.1/lib/python3.6/site-packages/spacy/lang/ja/__init__.py", line 116, in get_dtokens
    tokens = tokenizer.tokenize(text)
  File "sudachipy/tokenizer.pyx", line 149, in sudachipy.tokenizer.Tokenizer.tokenize
  File "sudachipy/lattice.pyx", line 134, in sudachipy.lattice.Lattice.dump
AttributeError: 'sudachipy.latticenode.LatticeNode' object has no attribute 'is_defined'

Can you check the version of sudachipy? We tested with >=0.4.5. (Just for spaCy, I haven't tested prodigy yet for the new languages but I'm glad to see that people are trying out the new models!)

Sure - here's the version number I am using... SudachiPy==0.4.7
Thank you

It didn't fail immediately but I was able to see the same error with sudachipy 0.4.7. I'm pretty sure this code was introduced in 0.4.6, so I would suggest using sudachipy==0.4.5 for now, which is what we primarily tested with.

(Dealing with versioning around 3rd-party dependencies is tricky. I didn't want to package spacy or the models with sudachipy==0.4.5 because it would mean we'd have to release new versions for users not to see dependency conflicts if newer versions are released with bugfixes, but it also means that updates can break things, too. I think we should add some notes to the docs to prefer 0.4.5 for now.)

Wait... it turns out that I'm not quite sure about this. I haven't been able to reproduce the error with 0.4.5, but it looks like it's actually a bug in debugging/logging code within sudachipy that's only run here because of prodigy's configuration that also uses logging. With just plain spacy (no use of logging) I haven't been able to reproduce it. In any case, we'll get in touch with the sudachipy developers about it. Let me know if you still see the problem in 0.4.5, since that would be useful to know?

This error should be fixed in sudachipy 0.4.8 now.

What still isn't great is that somehow prodigy is modifying sudachipy's logging configuration (we don't need or want this extra logging), which is going to make it a lot slower overall. We will look into it.

1 Like

Thank you Adriane - I was able to get teach running with 0.4.5 as well.
Much appreciated.

Pretty sure this is happening in the server, in uvicorn, which by default, seems to include logs from third-party loggers? I remember we had to implement some of our own workarounds for this so we're not showing the output from third-party loggers by default in Prodigy. If sudachipy always use a named logger (or at least, use a named logger in that part of the code, which I think they do?), we could at least add our workaround for this in the logging handling in prodigy.util.

I think it should include third-party loggers if they're enabled, but whatever is doing this really shouldn't change the state of other loggers. That seems to be the main problem. It should be a named logger called sudachipy.tokenizer, I think.

Yeah, I have no idea why it wouldn't respect logger.disabled (assuming that code is also in the release version). Sorry, just re-read your post above – I was using v0.4.8, which I assume added that fix.

But here's a very minimal example without any Prodigy abstractions that produces the INFO logging statements from SudachiPy, since uvicorn will surface all logging by default (which we try to work around in Prodigy, because we only want to show the user logging if they actually enable it).

from fastapi import FastAPI
import spacy
import uvicorn

nlp = spacy.load("ja_core_news_sm")
app = FastAPI()


@app.get("/")
def test():
    doc = nlp("hello world, this is a test")
    return {"text": doc.text}


uvicorn.run(app)

Here is what seems incorrect:

from fastapi import FastAPI
import spacy
import uvicorn

nlp = spacy.load("ja_core_news_sm")
assert nlp.tokenizer.tokenizer._logger.disabled == True
app = FastAPI()


@app.get("/")
def test():
    assert nlp.tokenizer.tokenizer._logger.disabled == False
    doc = nlp("hello world, this is a test")
    return {"text": doc.text}

uvicorn.run(app)

uvicorn seems to be modifying the disabled state (and also level, but you can't see that here) of sudachipy's logger? This seems like something it shouldn't be doing?

Sorry for the trouble with this - a new version of SudachiPy that fixes the issue with the logging code, 0.4.9, has just been released. The code that triggered the error wasn't called in standard usage or in tests so the bug slipped through release checks.

No idea why uvicorn is calling the dump code though...

@polm: sudachipy's logging configuration seems fine on its own, though, so this seems more like a prodigy or uvicorn problem that we need to figure out.

1 Like

The logging problem in Uvicorn was actually solved in a recent PR, but it hasn't been released yet (I wish I had noticed that before debugging the whole thing :joy: ).

Once they merge my PR with the new release (https://github.com/encode/uvicorn/pull/705), and actually release a new version, it should work properly.

A funny detail I noticed, the problem only shows up calling Uvicorn from Python as with uvicorn.run(), it doesn't show up calling Uvicorn from the command line. That detail doesn't help here, but now I see why I wasn't seeing the problem before.