A frequently asked question is “What do the part-of-speech tags (VB, JJ, etc.) mean?” The bottom line is that these tags mean whatever they meant in your original training data. You are free to invent your own tags for your training data, as long as you use them consistently.
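To make the point concrete, here is a minimal sketch of a most-frequent-tag (unigram) tagger in plain Python rather than NLTK. It is trained on two hand-made sentences that use invented tags such as ANIMAL and ACTION. The names `train_unigram_tagger` and `tag` are hypothetical. The tagger attaches no meaning to the labels and simply reproduces whatever appears in its training data.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_sentences):
    """Learn the most frequent tag for each word from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag_label in sentence:
            counts[word.lower()][tag_label] += 1
    # Keep only the most common tag seen for each word.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(model, words, default="UNK"):
    """Tag each word with its learned tag, or `default` if it was never seen."""
    return [(w, model.get(w.lower(), default)) for w in words]


# Invented tags (DET, ANIMAL, ACTION, THING): the tagger does not know what
# they mean; it only echoes the labels present in the training data.
training = [
    [("the", "DET"), ("dog", "ANIMAL"), ("chased", "ACTION"), ("a", "DET"), ("ball", "THING")],
    [("a", "DET"), ("cat", "ANIMAL"), ("ate", "ACTION"), ("the", "DET"), ("fish", "ANIMAL")],
]

model = train_unigram_tagger(training)
print(tag(model, ["The", "cat", "chased", "a", "kite"]))
```

Here “kite” never appears in training, so it receives the fallback tag UNK. A tagger trained on Penn Treebank data behaves the same way, except that its labels happen to be VB, JJ, NN and so on.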
Training data takes a lot of work to create, so a pre-existing corpus is typically used instead. The most common part-of-speech (POS) tag schemes are those developed for the Penn Treebank and the Brown Corpus. The Penn Treebank scheme is probably the more widely used of the two, but both corpora are available with NLTK.
Penn Treebank POS Tags
Here are some of the POS tags used in the Penn Treebank:

| Tag | Description | Example |
|---|---|---|
| CD | cardinal number | 1, three |
| EX | existential there | there is |
| IN | preposition/subordinating conjunction | in, of, like |
| NN | noun, singular or mass | door |
| NNP | proper noun, singular | John |
| NNPS | proper noun, plural | Vikings |
| PDT | predeterminer | both the boys |
| PRP | personal pronoun | I, he, it |
| PRP$ | possessive pronoun | my, his |
| RB | adverb | however, usually, naturally, here, good |
| TO | to | to go, to him |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, non-3rd person singular present | take |
| VBZ | verb, 3rd person singular present | takes |
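The following sketch can help when reading tagger output. It stores the descriptions from the table above in a dictionary and attaches them to (word, tag) pairs. The dictionary covers only the tags listed in this section, so any other tag (DT, for example) falls back to a placeholder. `PENN_TAGS` and `explain` are hypothetical names, not part of NLTK.

```python
# Descriptions follow the Penn Treebank table above (only the tags listed there).
PENN_TAGS = {
    "CD": "cardinal number",
    "EX": "existential there",
    "IN": "preposition/subordinating conjunction",
    "NN": "noun, singular or mass",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "PDT": "predeterminer",
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "RB": "adverb",
    "TO": "to",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund/present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person singular present",
}


def explain(tagged_tokens):
    """Attach a human-readable description to each (word, tag) pair."""
    return [(word, tag, PENN_TAGS.get(tag, "not in this table"))
            for word, tag in tagged_tokens]


# A hand-tagged example; "the"/DT is deliberately outside the partial table.
sentence = [("There", "EX"), ("is", "VBZ"), ("the", "DT"), ("door", "NN")]
for word, tag, description in explain(sentence):
    print(f"{word}/{tag}: {description}")
```

For the complete tagset, NLTK provides `nltk.help.upenn_tagset()`, which prints a description and examples for each tag once NLTK's tagset data has been downloaded.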
The official annotation guidelines, including full descriptions of each tag, are distributed as a GZip-compressed PostScript file. They also cover easily confused parts of speech, capitalization, and other conventions.
Brown Corpus POS Tags
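The Brown Corpus POS tags are very similar to the Penn Treebank tags, which can cause confusion, but there are differences. For example, the Penn Treebank has three adjective tags (JJ, JJR, JJS). The Brown Corpus splits superlatives into two tags: JJS for semantically superlative adjectives (such as chief or top) and JJT for morphologically superlative ones (such as biggest).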
The Brown Corpus also has rules for combining tags. For example, the colloquial “wanna” means “want to”, so it is tagged “VB+TO” (“want/VB to/TO”). Similarly, an asterisk suffix marks negation: “are” is tagged BER, so “aren’t” is tagged “BER*”.
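When processing Brown-tagged text, it can help to decompose these combined tags. The sketch below handles only the two conventions just described: “+” joins the tags of a contracted form, and a trailing “*” marks negation. Any other tagging conventions the Brown Corpus uses are ignored. `split_brown_tag` is a hypothetical helper, not an NLTK function.

```python
def split_brown_tag(tag):
    """Split a Brown Corpus tag into (base_tag, negated) pairs.

    '+' separates the tags of a combined word form (e.g. VB+TO for "wanna"),
    and a trailing '*' marks negation (e.g. BER* for "aren't").
    """
    parts = []
    for piece in tag.split("+"):
        # The length check keeps a bare "*" from being reduced to an empty tag.
        negated = len(piece) > 1 and piece.endswith("*")
        base = piece[:-1] if negated else piece
        parts.append((base, negated))
    return parts


print(split_brown_tag("VB+TO"))  # "wanna"
print(split_brown_tag("BER*"))   # "aren't"
```

Returning (tag, negated) pairs rather than plain strings keeps the negation marker available, so it can be mapped to a separate token later instead of being silently dropped.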