Test whether a token is a conjunction head

Fourthought · June 29, 2020, 2:36pm

Hopefully this is a quick query to resolve: is there a way to identify whether a token is the head of a conjunction?

I'm seeking to refactor the noun_chunks iterator for customisation. Where the existing noun_chunks iterator uses token.left_edge.i, I'm looking to expand the span under certain conditions using token.subtree attributes.
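spaCy documents `Token.left_edge` and `Token.right_edge` as the leftmost and rightmost tokens of a token's syntactic descendants, so the difference between the two span shapes can be shown without loading a model. The sketch below uses a toy head array for "Muslims in nations"; the `subtree` and `edges` helpers are hypothetical stand-ins, not spaCy API. Cutting the span at the token itself keeps only "Muslims", while cutting at the right edge keeps the attached prepositional phrase.

```python
def subtree(heads, i):
    """Return the sorted indices of token i and all of its descendants.

    heads[j] is the index of token j's head; the root points to itself.
    """
    indices = [i]
    for j, head in enumerate(heads):
        if head == i and j != i:
            indices.extend(subtree(heads, j))
    return sorted(indices)


def edges(heads, i):
    """Mimic Token.left_edge.i / Token.right_edge.i: the subtree's extremes."""
    span = subtree(heads, i)
    return span[0], span[-1]


# Toy parse of "Muslims in nations": "in" -> "Muslims", "nations" -> "in".
words = ["Muslims", "in", "nations"]
heads = [0, 0, 1]

i = 0  # the token "Muslims"
left, right = edges(heads, i)
print(" ".join(words[left:i + 1]))      # cut at the token itself: Muslims
print(" ".join(words[left:right + 1]))  # cut at right_edge: Muslims in nations
```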

I'd like the following sentence:

"Both Americans and Muslim friends and citizens, tax-paying citizens, and Muslims in nations were just appalled and could not believe what -- what we saw on our TV screens."

...to return the following custom_chunks:

Americans,
Muslim friends,
citizens, # poss modifier Muslim to be added as a hidden element
tax-paying citizens,
Muslims in nations,
what,
what,
we,
TV screens

This sentence contains a sub-conjunction within a main conjunction, which makes the task more complicated:

Main conjunction:
"Americans" : Conjuncts(friends, citizens, Muslims, citizens)
"Americans" : Children(Both, and, friends, ,, citizens, ,, and, Muslims)
Sub-conjunction:
"friends" : Conjuncts(Americans, citizens, Muslims, citizens)
"friends" : Children(Muslim, and, citizens) # the children attribute correctly identifies the sub-conjunction
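Comparing the two listings: conjuncts flattens the nested coordination into one list, while children preserves the nesting. The sketch below reproduces both listings in plain Python (no spaCy) on a hand-built toy parse. The `Tok` class, `ROWS`, and the helpers are hypothetical, and `conjuncts()` only approximates how `Token.conjuncts` behaves on this example (climb to the first conjunct through `conj` arcs, then collect `conj` descendants breadth-first); it is not spaCy's own implementation.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    i: int
    text: str
    dep: str
    head: int  # index of the head token; the phrase root points to itself


# Hypothetical hand-built parse of the subject phrase, arranged so the children
# lists match the ones quoted above. "tax-paying" is one token for brevity, and
# "Americans" is the phrase root (in the full sentence it attaches to the verb).
ROWS = [
    ("Both", "preconj", 1), ("Americans", "nsubj", 1), ("and", "cc", 1),
    ("Muslim", "amod", 4), ("friends", "conj", 1), ("and", "cc", 4),
    ("citizens", "conj", 4), (",", "punct", 1), ("tax-paying", "amod", 9),
    ("citizens", "conj", 1), (",", "punct", 1), ("and", "cc", 1),
    ("Muslims", "conj", 1), ("in", "prep", 12), ("nations", "pobj", 13),
]
TOKENS = [Tok(i, text, dep, head) for i, (text, dep, head) in enumerate(ROWS)]


def children(tok):
    """Direct dependents in sentence order (like Token.children)."""
    return [t for t in TOKENS if t.head == tok.i and t.i != tok.i]


def conjuncts(tok):
    """Climb to the first conjunct, then gather every token reachable from it
    through rightward `conj` arcs, excluding `tok` itself."""
    start = tok
    while start.dep == "conj" and start.head != start.i:
        start = TOKENS[start.head]
    queue = [start]
    found = [start]
    for word in queue:  # the queue grows while we iterate: breadth-first
        for child in children(word):
            if child.i > word.i and child.dep == "conj":
                found.append(child)
                queue.append(child)
    return [t for t in found if t.i != tok.i]


americans, friends = TOKENS[1], TOKENS[4]
print([t.text for t in conjuncts(americans)])  # ['friends', 'citizens', 'Muslims', 'citizens']
print([t.text for t in conjuncts(friends)])    # ['Americans', 'citizens', 'Muslims', 'citizens']
print([t.text for t in children(friends)])     # ['Muslim', 'and', 'citizens']
```

The nested "friends and citizens" pair is only visible through `children(friends)`; `conjuncts` returns the same flat set of tokens for every member of the coordination.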

Presently, I'm using the following code that produces the desired answer:

```python
# Fragment from inside the `for word in doclike:` loop of the custom iterator.
# `conj` is the integer ID of the "conj" label (e.g. doc.vocab.strings.add("conj")),
# `np_deps` holds the IDs of the noun-phrase labels (with "conj" added), and
# `cc_label` is the label ID attached to each yielded span.

# If the word has conjuncts but is not itself a `conj` dependent,
# it is the head of the main conjunction.
if word.conjuncts and word.dep != conj:
    # prev_end is the current word index
    prev_end = word.i
    yield word.left_edge.i, word.i + 1, cc_label

# If the word is a `conj` dependent and one of its right-hand children is also
# a `conj` dependent, it is the head of a sub-conjunction to a main conjunction.
elif word.dep == conj and any(t.dep == conj for t in word.rights):
    # prev_end is the current word index
    prev_end = word.i
    yield word.left_edge.i, word.i + 1, cc_label

# For all other noun-phrase words, including conjuncts that head nothing
elif word.dep in np_deps:  # `conj` added to np_deps for other tokens of a conjunction
    # prev_end marks the right edge of the token subtree
    prev_end = word.right_edge.i
    yield word.left_edge.i, word.right_edge.i + 1, cc_label
```

The first elif statement, which identifies the sub-conjunction head, feels somewhat hacky. Is there a more definitive way to identify whether a token is a conjunction head, or could such an attribute be requested?
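One more explicit formulation, offered as a sketch rather than an existing spaCy attribute: a token heads a coordination exactly when one of its direct children carries the `conj` label, and whether its own label is `conj` then separates a main head from a sub-conjunction head. The helper names below are hypothetical; they read only `token.dep_` and `token.children`, which spaCy's `Token` provides, and the `SimpleNamespace` stubs exist only so the example runs without a trained pipeline.

```python
from types import SimpleNamespace


def has_conj_child(token):
    """True if any direct dependent attaches to `token` with the `conj` label."""
    return any(child.dep_ == "conj" for child in token.children)


def is_main_conj_head(token):
    """Heads the outermost coordination: has conjuncts, is not itself a conjunct."""
    return token.dep_ != "conj" and has_conj_child(token)


def is_sub_conj_head(token):
    """Heads a coordination nested inside another one."""
    return token.dep_ == "conj" and has_conj_child(token)


# Stand-ins for spaCy tokens (only dep_ and children) so the sketch runs offline.
def stub(dep, *children):
    return SimpleNamespace(dep_=dep, children=list(children))


citizens = stub("conj")
friends = stub("conj", stub("amod"), stub("cc"), citizens)
muslims = stub("conj", stub("prep", stub("pobj")))
americans = stub("nsubj", stub("preconj"), stub("cc"), friends, stub("punct"),
                 stub("conj", stub("amod")), stub("punct"), stub("cc"), muslims)

for name, tok in [("Americans", americans), ("friends", friends),
                  ("citizens", citizens), ("Muslims", muslims)]:
    print(name, is_main_conj_head(tok), is_sub_conj_head(tok))
```

Since the first two branches of the iterator do the same thing (set `prev_end = word.i` and yield `word.left_edge.i` to `word.i + 1`), a single `has_conj_child(word)` check could plausibly stand in for both, assuming conjuncts attach to the right of their head as they do in this parse.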

honnibal · June 30, 2020, 11:36am

We do have a token.conjuncts attribute, although the logic in it isn't quite perfect.

Maybe you could try Stack Overflow for this type of spaCy question? There's a larger community there, and this forum is really focused on Prodigy.

Fourthought · June 30, 2020, 11:46am

Thanks Matt, I've posted it on both the GitHub page and Stack Overflow.
