Extracting Noun Phrases from Parsed Trees

Following on from my previous post about NLTK Trees, here is a short Python function to extract phrases from an NLTK Tree structure.

Recently I needed to extract noun phrases from a section of text, in an attempt to choose “interesting concept phrases”. N-gram collocations are a common way of doing this, but they also produced partial phrases that defined a concept poorly. Most of the phrases I was interested in were noun phrases, so I chose to tag and chunk the text. The noun phrases (tagged ‘NP’) were then extracted from the chunked tree structures.

Here is the code:

from nltk.tree import Tree

# Tree manipulation

# Extract phrases from a parsed (chunked) tree
# phrase = tag of the phrases (sub-trees) to extract, e.g. 'NP'
# Returns: list of deep copies of the matching sub-trees; recursive
def ExtractPhrases(myTree, phrase):
    myPhrases = []
    # NLTK 2 API; NLTK 3 and later use myTree.label() instead of myTree.node
    if myTree.node == phrase:
        myPhrases.append(myTree.copy(True))
    for child in myTree:
        # Leaves are plain strings; only sub-trees are searched
        if isinstance(child, Tree):
            # Collect any matches found further down the tree
            myPhrases.extend(ExtractPhrases(child, phrase))
    return myPhrases

This function walks the tree recursively, finds all sub-trees with a matching tag (‘NP’ in my application), including any nested inside another match, and returns a list of deep copies of these sub-trees.

Here is an example of the function’s usage:

test = Tree.parse('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
print "Input tree: ", test

print "\nNoun phrases:"
list_of_noun_phrases = ExtractPhrases(test, 'NP')
for phrase in list_of_noun_phrases:
    print " ", phrase

This function is a simple demonstration of how the Tree structure can be easily processed using short functions.

It is written in a very procedural manner and is neither very functional nor Pythonic. Perhaps you could write a more elegant and Pythonic version?
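One possible answer to that question is a generator. The sketch below is only an illustration, not a definitive implementation: MiniTree, iter_phrases and words are hypothetical names, and MiniTree is a small stand-in for the NLTK Tree (a list of children with a label() method, as in NLTK 3) so that the example runs without NLTK installed. The sentence is a made-up extension of the example above that adds a nested noun phrase. Unlike ExtractPhrases, the generator yields the sub-trees themselves rather than deep copies.

```python
# Hypothetical stand-in for nltk.tree.Tree so the sketch runs without NLTK.
# Like the real class, it is a list of children that also carries a label.
class MiniTree(list):
    def __init__(self, label, children=()):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


def iter_phrases(tree, phrase):
    """Yield every sub-tree labelled `phrase`, outermost first (pre-order)."""
    if tree.label() == phrase:
        yield tree
    for child in tree:
        # Leaves are plain strings and have no label() method, so skip them.
        if hasattr(child, "label"):
            yield from iter_phrases(child, phrase)


def words(tree):
    """Flatten a tree into its leaf strings, left to right."""
    out = []
    for child in tree:
        out.extend(words(child) if hasattr(child, "label") else [child])
    return out


# (S (NP I) (VP (V enjoyed) (NP (NP my cookies) (PP with (NP milk)))))
sentence = MiniTree("S", [
    MiniTree("NP", ["I"]),
    MiniTree("VP", [
        MiniTree("V", ["enjoyed"]),
        MiniTree("NP", [
            MiniTree("NP", ["my", "cookies"]),
            MiniTree("PP", ["with", MiniTree("NP", ["milk"])]),
        ]),
    ]),
])

noun_phrases = [" ".join(words(np)) for np in iter_phrases(sentence, "NP")]
print(noun_phrases)
# ['I', 'my cookies with milk', 'my cookies', 'milk']
```

Both the outer noun phrase and the ones nested inside it are yielded, because the traversal keeps descending after a match. With a real NLTK 3 tree, the same generator should work unchanged, and the Tree class also offers a subtrees() method that accepts a filter function, which may make a hand-written traversal unnecessary.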



6 thoughts on “Extracting Noun Phrases from Parsed Trees”

  1. Jateen Mittal, Oct 4, 2015 11:55 am

    I tried implementing the above code, but I get errors telling me to modify the method used to access the node value and also to set the label.

  2. Pradeep Raje, Oct 13, 2015 3:52 am

    In line 3, if (myTree.node == phrase), replace node with label(). That is, myTree.label() == phrase.

  3. Pedram Hosseini, Apr 2, 2019 2:37 pm

    This is a handy function. I have also written a similar function. I’m wondering if you thought about NPs that overlap? For example, what if we have nested NPs? Does this function return all of them, or does it simply choose between the parent and the child NPs?

    • Richard Marsden, Apr 2, 2019 7:24 pm

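      This works on text which has already been parsed into a tree. A single tree cannot represent “overlapping” noun phrases, although it could represent nested ones. In such cases this function returns the outer noun phrase and also any noun phrases nested inside it, because it continues to recurse into the children of a matching sub-tree.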
