Chunking
Chunking is a shallow form of parsing that identifies continuous spans of tokens that form syntactic units such as noun phrases or verb phrases.
Example:
Vinken | , | 61 | years | old |
---|---|---|---|---|
B-NLP | I-NP | I-NP | I-NP | I-NP |
Penn Treebank
The Penn Treebank is typically used for evaluating chunking. Sections 15-18 are used for training, section 19 for development, and and section 20 for testing. Models are evaluated based on F1.
Model | F1 score | Paper / Source | Code |
---|---|---|---|
Low supervision by Søgaard and Goldberg (2016) | 95.57 | Deep multi-task learning with low level tasks supervised at lower layers | |
Suzuki and Isozaki (2008) | 95.15 | Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data |
Søgaard and Goldberg (2016)
95.57
Suzuki and Isozaki (2008)
95.15