Enter a sentence in Dutch, English, or German (auto-detected).
The sentence will be parsed and the most probable parse tree will be shown
(show technical details).
Linear Context-Free Rewriting Systems (LCFRS) allow for parsing with
discontinuous constituents. The Data-Oriented Parsing (DOP) framework
entails constructing analyses from fragments of past experience.
Double-DOP operationalizes this as the set of fragments that occur at least twice
in the training data.
For efficiency, sentences are parsed with the following coarse-to-fine
pipeline:
- Split-PCFG (prune items with posterior probability < 1e-5)
- PLCFRS (prune items not in 50-best derivations)
- Discontinuous Double-DOP (use 1000-best derivations to approximate most probable parse)
Training data:
- English: WSJ section of Penn treebank
- German: Negra treebank
- Dutch: Alpino treebank