✨🔗 Beta testers wanted: new manual dependencies & relations UI (v1.10)

Thanks, that's great to hear! :smiley:

There's a little info box on this in the relations UI docs but we should probably make this more prominent! If you don't want to wrap/unwrap, you can also hold down shift and click the start and end token of the span you want to assign. You can also customise that "special" key in the custom shortcuts.

Couldn't you express this by making "special police force" and "rapid action force" merged spans and then creating two relations starting from "setting"? Or maybe I'm misunderstanding the question.

That's an interesting idea, I'll play with this! I'm not sure if it actually works in practice or if it ends up being too confusing. There's currently a very clear distinction between span labels and relation labels and I don't want to break that, because it is pretty important.

For a use case like yours, it could be worth experimenting with more automation. For instance, "special police force" is a noun phrase you can very easily describe with a pattern. So if you find yourself having to do a lot of merging, maybe it's possible to automate some of the span creation.
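
As a rough sketch of what such a pattern could look like: the snippet below builds a token-based match pattern (zero or more adjectives followed by one or more nouns) and serializes it as JSONL, one pattern per line. The label `NP` and the helper names are made up for illustration, and whether the pattern actually matches depends on the part-of-speech tags the model predicts for your text.

```python
import json

# Hypothetical label, chosen for illustration only.
NOUN_PHRASE_PATTERNS = [
    # Zero or more adjectives followed by one or more nouns,
    # e.g. "special police force" or "rapid action force"
    # (assuming the tagger labels the tokens as ADJ / NOUN).
    {"label": "NP", "pattern": [{"POS": "ADJ", "OP": "*"}, {"POS": "NOUN", "OP": "+"}]},
]


def patterns_to_jsonl(patterns):
    """Serialize patterns as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


def write_patterns(path, patterns):
    """Write the patterns to a file that can be passed to a recipe."""
    with open(path, "w", encoding="utf8") as f:
        f.write(patterns_to_jsonl(patterns))


print(patterns_to_jsonl(NOUN_PHRASE_PATTERNS), end="")
```

If the recipe in your version accepts a patterns file (check `prodigy rel.manual --help`), you could write the file with `write_patterns` and pass it in, so that matching spans are pre-created and only need correcting.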

I'm also getting some weird formatting (see below) when I view the manual relations interface running on Linux using Brave. I don't get the problem with Linux+Chrome and I don't get it with Mac+Brave. Let me know if I can help get more info.

@jcbmyrstn Sorry, just realised I missed your post:

The intent parser approach would work if you're annotating relationships between single tokens – in this case, you'd be training a custom dependency parser from scratch but instead of syntactic dependencies, you're asking it to predict your own custom labels. I'm not sure how many examples you need for this to actually make it work reliably – that's something you'd have to test.
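
To make the "single tokens" constraint a bit more concrete, here's a minimal sketch that turns token-level relations into the per-token head and label lists a dependency parser is trained on. It assumes each relation is a dict with integer token indices under `"head"` and `"child"` and a string `"label"`; the function name, the example sentence and the `"-"` placeholder label are made up for illustration. How unattached tokens should be represented depends on your training setup.

```python
def relations_to_heads_deps(n_tokens, relations, default_dep="-"):
    """Convert token-level relations to per-token head indices and labels.

    A dependency parse allows exactly one head per token, so a token with
    two incoming relations cannot be expressed and raises an error.
    Tokens without an incoming relation point at themselves and get a
    placeholder label (an assumption of this sketch).
    """
    heads = list(range(n_tokens))
    deps = [default_dep] * n_tokens
    for rel in relations:
        child = rel["child"]
        if deps[child] != default_dep:
            raise ValueError(f"token {child} has more than one head")
        heads[child] = rel["head"]
        deps[child] = rel["label"]
    return heads, deps


# Hypothetical example sentence and labels.
tokens = ["find", "a", "cafe", "with", "wifi"]
relations = [
    {"head": 0, "child": 2, "label": "PLACE"},      # find -> cafe
    {"head": 2, "child": 4, "label": "ATTRIBUTE"},  # cafe -> wifi
]
heads, deps = relations_to_heads_deps(len(tokens), relations)
print(heads)
print(deps)
```

If your annotations regularly trigger the "more than one head" error, that's a sign the data doesn't fit a dependency parser and needs a different model.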

Aside from that, we currently don't have a built-in component in spaCy that lets you predict arbitrary labels over tokens and spans. This will be much easier to implement in spaCy v3, but for now, you'd have to bring your own implementation for what you want to predict (semantic role labelling / slot filling, coreference, etc.). Annotating relations is probably one of the most "ambiguous" tasks that we provide annotation workflows for, because your annotations can mean a lot of different things, and depending on that, you'd want to choose very different approaches for training and evaluation. That also makes it difficult to recommend next steps.

@adamkgoldfarb Update on that Firefox issue, the plot thickens! (It does mean I'm kinda reluctant to go ahead with v1.10 until there's a solution or clarity on what the problem is. Maybe it's a Firefox bug that's going to be resolved in the stable release, maybe it's related to a bug/issue that only Firefox has fixed...)

Thanks, that's interesting! I'll try it out. Looks like for some reason, the browser fails to measure the individual text nodes here, but I'm not sure why (I've only ever heard about this as an old browser bug). Should hopefully be easy to fix by just using a less exact method to estimate the width if the "proper way" fails.

@SofieVL we have a biomedical use case, happy to talk about it. thanks!

Hi Tom, sure! If you can share the details of the use case publicly, you could open a new forum post and we can discuss in more detail there. The added advantage is that others may be inspired by your approach / use case. However, if you can't share important details, feel free to email me privately at sofie@explosion.ai

Hi,

I've run into the following issue: I annotated a text with some relations and exported the annotations to a JSON file, but now I cannot load that JSON file back into Prodigy using the rel.manual recipe. I used rel.manual to produce the annotations.

J.

What exactly do you mean by you can't load it back? Is there an error? Or are the existing annotations not shown when you load the annotated data back in?

It gives me an error in the console when prodigy fails to load the annotated text. I also got an error importing a raw text file for annotation in the dep.correct recipe.

Do you want me to copy the log here?

J.

Yes, the error messages would be helpful – otherwise, there's not really much we can do or help with.

This is my log in macos:

(.venv-beta) jacobo@lola prodigy rel.manual test en_core_web_sm ./prodigy_relations_annot_output.json --label SUBJECT,OBJECT
Using 2 label(s): SUBJECT, OBJECT

:sparkles: Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Task exception was never retrieved
future: <Task finished coro=<RequestResponseCycle.run_asgi() done, defined at /Users/jacobo/.venv-beta/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py:383> exception=ValueError('Trailing data')>
Traceback (most recent call last):
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 388, in run_asgi
self.logger.error(msg, exc_info=exc)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 1407, in error
self._log(ERROR, msg, args, **kwargs)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 1514, in _log
self.handle(record)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 1523, in handle
if (not self.disabled) and self.filter(record):
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/logging/__init__.py", line 751, in filter
result = f.filter(record)
File "cython_src/prodigy/util.pyx", line 120, in prodigy.util.ServerErrorFilter.filter
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
result = await app(self.scope, self.receive, self.send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
return await self.app(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
await super().__call__(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
await self.error_middleware(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
raise exc from None
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
await self.app(scope, receive, _send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/cors.py", line 84, in __call__
await self.simple_response(scope, receive, send, request_headers=headers)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/cors.py", line 140, in simple_response
await self.app(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/base.py", line 25, in __call__
response = await self.dispatch_func(request, self.call_next)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/prodigy/app.py", line 198, in reset_db_middleware
response = await call_next(request)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/base.py", line 45, in call_next
task.result()
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/middleware/base.py", line 38, in coro
await self.app(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/exceptions.py", line 73, in __call__
raise exc from None
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/exceptions.py", line 62, in __call__
await self.app(scope, receive, sender)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/routing.py", line 590, in __call__
await route(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/routing.py", line 208, in __call__
await self.app(scope, receive, send)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
response = await func(request)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/fastapi/routing.py", line 129, in app
raw_response = await run_in_threadpool(dependant.call, **values)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/starlette/concurrency.py", line 25, in run_in_threadpool
return await loop.run_in_executor(None, func, *args)
File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/prodigy/app.py", line 418, in get_session_questions
return _shared_get_questions(req.session_id, excludes=req.excludes)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/prodigy/app.py", line 389, in _shared_get_questions
tasks = controller.get_questions(session_id=session_id, excludes=excludes)
File "cython_src/prodigy/core.pyx", line 160, in prodigy.core.Controller.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 68, in prodigy.components.feeds.SharedFeed.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 73, in prodigy.components.feeds.SharedFeed.get_next_batch
File "cython_src/prodigy/components/feeds.pyx", line 169, in prodigy.components.feeds.RepeatingFeed.get_session_stream
File "cython_src/prodigy/components/feeds.pyx", line 135, in prodigy.components.feeds.SessionFeed.validate_stream
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/toolz/itertoolz.py", line 376, in first
return next(iter(seq))
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/prodigy/recipes/rel.py", line 135, in preprocess_stream
for doc, eg in nlp.pipe(data_tuples, as_tuples=True):
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/language.py", line 778, in pipe
for doc, context in izip(docs, contexts):
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/language.py", line 819, in pipe
for doc in docs:
File "nn_parser.pyx", line 248, in pipe
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/util.py", line 481, in minibatch
batch = list(itertools.islice(items, int(batch_size)))
File "nn_parser.pyx", line 248, in pipe
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/util.py", line 481, in minibatch
batch = list(itertools.islice(items, int(batch_size)))
File "pipes.pyx", line 401, in pipe
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/util.py", line 481, in minibatch
batch = list(itertools.islice(items, int(batch_size)))
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/language.py", line 804, in <genexpr>
docs = (self.make_doc(text) for text in texts)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/spacy/language.py", line 769, in <genexpr>
texts = (tc[0] for tc in text_context1)
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/prodigy/recipes/rel.py", line 133, in <genexpr>
data_tuples = ((eg["text"], copy.deepcopy(eg)) for eg in stream)
File "cython_src/prodigy/components/filters.pyx", line 37, in filter_duplicates
File "cython_src/prodigy/components/filters.pyx", line 13, in filter_empty
File "cython_src/prodigy/components/loaders.pyx", line 24, in _rehash_stream
File "cython_src/prodigy/components/loaders.pyx", line 150, in JSON
File "/Users/jacobo/.venv-beta/lib/python3.7/site-packages/srsly/_json_api.py", line 38, in json_loads
return ujson.loads(data)
ValueError: Trailing data

This means that the JSONL is broken and there's likely a missing newline somewhere in it. Did you edit the data at all outside of Prodigy?

If you didn't touch the file at all, did you export to the same file more than once? Check what version of srsly you have installed (should be v0.2.0 or up – if you installed Prodigy in a fresh virtual environment, this shouldn't be a problem, though).
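
If it helps with narrowing this down, here's a small sketch (plain Python, nothing Prodigy-specific) that reports which lines of a newline-delimited JSON file don't parse on their own. The function name and the sample data are made up for illustration. Note that Python's built-in `json` module words the same problem as "Extra data" rather than "Trailing data".

```python
import json


def find_bad_jsonl_lines(lines):
    """Return (line_number, message) for each non-empty line that is not
    exactly one valid JSON value."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((line_number, err.msg))
    return problems


# Illustrative data: the second line holds two records with no newline
# between them, which is the kind of damage described above.
sample = ['{"text": "first"}\n', '{"text": "second"}{"text": "third"}\n']
print(find_bad_jsonl_lines(sample))
```

To check a real file, pass an open file handle, e.g. `with open(path, encoding="utf8") as f: print(find_bad_jsonl_lines(f))`. If every line parses on its own, the file is fine as newline-delimited JSON. It would still fail if it's read as one single JSON document, because anything after the first record counts as trailing data.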

Hello Ines

I'm getting a similar ValueError: Trailing data error message as well, but this is before I even get as far as trying to load a data file. For me it comes up if I try to check the installation with
!python -m prodigy stats

Traceback (most recent call last):
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\prodigy\__main__.py", line 60, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src\prodigy\core.pyx", line 251, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\plac_core.py", line 367, in call
cmd, result = parser.consume(arglist)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\plac_core.py", line 232, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\prodigy\recipes\commands.py", line 35, in stats
DB = connect()
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\prodigy\components\db.py", line 64, in connect
config = get_config()
File "cython_src\prodigy\util.pyx", line 160, in prodigy.util.get_config
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\srsly\_json_api.py", line 52, in read_json
return ujson.load(f)
ValueError: Trailing data

It also gives me a very similar message if I try to load the news_headlines.jsonl as per the instructions on the jupyterlab extension page. I haven't opened or edited that file so it should be ok.

!python -m prodigy ner.teach my_set en_core_web_sm news_headlines.jsonl

Traceback (most recent call last):
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\prodigy\__main__.py", line 60, in <module>
controller = recipe(*args, use_plac=True)
File "cython_src\prodigy\core.pyx", line 265, in prodigy.core.recipe.recipe_decorator.recipe_proxy
File "cython_src\prodigy\util.pyx", line 182, in prodigy.util.add_global_config
File "cython_src\prodigy\util.pyx", line 160, in prodigy.util.get_config
File "C:\Users\Julia\anaconda3\envs\prodigy10\lib\site-packages\srsly\_json_api.py", line 52, in read_json
return ujson.load(f)
ValueError: Trailing data

I'm still very much finding my way around all this so may well be doing something stupid, but as it is falling over at the prodigy stats line, I'm presuming that it is nothing to do with the data file.
I'm using the dev3.whl on Windows 10 via Anaconda. It's a fresh install of Anaconda and a new environment for the beta as suggested. Any ideas?

Many thanks - I'm really looking forward to getting going with this!

Julia

In your case, it looks like the problem occurs when loading the config, i.e. the prodigy.json file in your ~/.prodigy directory. Did you edit that by any chance? And can you double-check that it looks correct and that there are no JSON issues here?
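
One quick way to double-check is to parse the file with Python's built-in `json` module, which reports the line and column where parsing stops. This is just a minimal sketch: the helper name is made up, and the standard library describes this kind of problem as "Extra data" where ujson says "Trailing data".

```python
import json
from pathlib import Path


def check_json_text(text):
    """Return None if text is one valid JSON document, else a short
    description of where parsing failed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"{err.msg} at line {err.lineno}, column {err.colno}"
    return None


# Check the config in the default location mentioned above, if it exists.
config_path = Path.home() / ".prodigy" / "prodigy.json"
if config_path.exists():
    problem = check_json_text(config_path.read_text(encoding="utf8"))
    print(problem or "prodigy.json parses as valid JSON")
```

A typical cause is a second `{...}` block pasted below the existing settings instead of being merged into the one top-level object.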

Brilliant - that was it! Thank you. I'd been trying to work out why the other browser wasn't opening within JupyterLab and dropped some stuff in there I'd seen from another answer. Won't make that mistake again.. Now to start playing :slight_smile:

Thank you for an amazingly quick response

Hi,

I'm working with a fresh env. I did not edit the file (I think). It seems to me that I get this error with large files only. I will run some more experiments later this week to be 100% sure that the file has not been modified.

J.

@julia No worries :slightly_smiling_face: For v1.10, I've added a friendlier error message if loading the config fails - there's no reason to let it default to the cryptic and confusing JSON default error.

How large are your files and how many characters do the longest examples have? And are you using the default SQLite database, or MySQL/Postgres? I wonder if you're hitting something similar to this...
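
In case it's useful for answering this, here's a rough sketch for getting those numbers from a newline-delimited JSON file. It assumes each record stores its text under a `"text"` key (as in the examples in this thread). The function name is made up for illustration.

```python
import json


def text_length_stats(lines, key="text"):
    """Count the examples and find the longest text, in characters."""
    lengths = [
        len(json.loads(line).get(key, ""))
        for line in lines
        if line.strip()  # ignore blank lines
    ]
    return {"examples": len(lengths), "longest": max(lengths, default=0)}


# Illustrative data only.
sample_lines = ['{"text": "abc"}\n', '{"text": "abcdef"}\n', "\n"]
print(text_length_stats(sample_lines))
```

For a real file, pass an open file handle instead of the sample list, e.g. `with open(path, encoding="utf8") as f: print(text_length_stats(f))`.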

I wasn't able to reproduce the issue on Linux + Brave, but I've added a fallback in case the text width calculation fails (for whatever reason).

Update: fixed and resolved in Firefox 77 :tada: 1639574 - getImageData returns incorrect pixel information

@Ines Happy to help shape a solution in this area. I see a tremendous opportunity for partnership between prod.igy and a significant party I work with around this new feature. Please can you give me a call to discuss further. Regards Barrie +44 7977 522924.