Extracting Noun Phrases from Parsed Trees

Following on from my previous post about NLTK Trees, here is a short Python function to extract phrases from an NLTK Tree structure.

Recently I needed to extract noun phrases from a section of text, in an attempt to pick out “interesting concept phrases”. N-gram collocations are a common way of doing this, but they also produced partial phrases that defined a concept poorly. Most of the phrases I was interested in were noun phrases, so I chose to tag and chunk the text instead. The noun phrases (tagged ‘NP’) were then extracted from the chunked tree structures.
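As a rough illustration of the chunking step, here is a minimal sketch using NLTK's RegexpParser, written against the NLTK 3 API. The hand-tagged sentence and the NP pattern are invented for this example and are not the ones from my application; in practice the tags would come from a part-of-speech tagger.

```python
import nltk

# Hand-tagged tokens (Penn Treebank tags), so the sketch needs no tagger model.
# These are illustrative; real input would come from a part-of-speech tagger.
tagged = [('The', 'DT'), ('chef', 'NN'), ('enjoyed', 'VBD'),
          ('my', 'PRP$'), ('chocolate', 'NN'), ('cookies', 'NNS')]

# Illustrative NP pattern: optional determiner or possessive pronoun,
# any number of adjectives, then one or more nouns.
grammar = r"NP: {<DT|PRP\$>?<JJ>*<NN.*>+}"
chunker = nltk.RegexpParser(grammar)

# The result is an nltk Tree whose NP sub-trees hold (word, tag) pairs
chunked = chunker.parse(tagged)

noun_phrases = [' '.join(word for word, tag in subtree.leaves())
                for subtree in chunked.subtrees(filter=lambda t: t.label() == 'NP')]
print(noun_phrases)
```

The chunker returns a shallow tree: each NP chunk is a sub-tree directly under the root, and every other token stays a leaf.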

Here is the code:

from nltk.tree import Tree

# Tree manipulation

# Extract phrases from a parsed (chunked) tree
# phrase = tag of the phrases (sub-trees) to extract, e.g. 'NP'
# Returns: list of deep copies of the matching sub-trees; recursive
def ExtractPhrases( myTree, phrase):
    myPhrases = []
    if (myTree.node == phrase):
        myPhrases.append( myTree.copy(True) )
    for child in myTree:
        if (type(child) is Tree):
            list_of_phrases = ExtractPhrases(child, phrase)
            if (len(list_of_phrases) > 0):
                myPhrases.extend(list_of_phrases)
    return myPhrases

This function walks the tree recursively, finds all sub-trees whose tag matches (‘NP’ in my application), and returns a list of deep copies of these sub-trees.

Here is an example of the function’s usage:

test = Tree.parse('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
print "Input tree: ", test

print "\nNoun phrases:"
list_of_noun_phrases = ExtractPhrases(test, 'NP')
for phrase in list_of_noun_phrases:
    print " ", phrase
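For readers on Python 3 and NLTK 3, where Tree.label() replaces the node attribute and Tree.fromstring() replaces Tree.parse(), a port of the function and the example might look like the sketch below. The snake_case names (extract_phrases, sentence_tree) are my own choices for this version; the logic is unchanged.

```python
from nltk.tree import Tree


def extract_phrases(my_tree, phrase):
    """Recursively collect deep copies of all sub-trees labelled `phrase`."""
    my_phrases = []
    if my_tree.label() == phrase:        # NLTK 3: label() replaces .node
        my_phrases.append(my_tree.copy(deep=True))
    for child in my_tree:
        if isinstance(child, Tree):      # leaves are plain strings, not Trees
            my_phrases.extend(extract_phrases(child, phrase))
    return my_phrases


# NLTK 3: Tree.fromstring() replaces Tree.parse()
sentence_tree = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
print("Input tree:", sentence_tree)

print("\nNoun phrases:")
list_of_noun_phrases = extract_phrases(sentence_tree, 'NP')
for noun_phrase in list_of_noun_phrases:
    print(" ", noun_phrase)
```

Because the results are deep copies, they can be modified without touching the original tree.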

This is a simple demonstration of how easily the Tree structure can be processed with short functions.

It is written in a very procedural manner and is neither very functional nor Pythonic. Perhaps you could write a more elegant and Pythonic version?
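One possible answer, as a sketch rather than a definitive version: NLTK 3's Tree.subtrees() already performs the recursive walk and accepts a filter function, so the extraction can be reduced to a small generator. The name iter_phrases is hypothetical, and this assumes the NLTK 3 API (label() and fromstring()).

```python
from nltk.tree import Tree


def iter_phrases(tree, label):
    """Yield a deep copy of each sub-tree of `tree` whose label is `label`."""
    # subtrees() visits the tree itself and every nested sub-tree,
    # keeping only those for which the filter returns True
    for subtree in tree.subtrees(filter=lambda t: t.label() == label):
        yield subtree.copy(deep=True)


tree = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
phrases = [' '.join(np.leaves()) for np in iter_phrases(tree, 'NP')]
print(phrases)
```

A generator avoids building the intermediate lists of the recursive version; wrap the call in list() when a list is needed.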

4 thoughts on “Extracting Noun Phrases from Parsed Trees”

Jateen Mittal Oct 4,2015 11:55 am

I tried implementing the above code, but I get errors telling me to change the method used to access the node value and also to set the label.
- Richard Marsden Oct 5,2015 7:43 am
  
  The above code is Python 2, of course. It was also written nearly four years ago, so it is possible that NLTK has changed slightly.

Pradeep Raje Oct 13,2015 3:52 am

@Jateen:
In line 3,

if (myTree.node == phrase)

replace node with label(). That is, myTree.label() == phrase.
- Richard Marsden Oct 13,2015 6:04 am
  
  Thanks – I think there must have been a change to the tree definition in NLTK.

Winwaed Blog
