Enter a sentence in Dutch, English, or German (auto-detected). The sentence will be parsed and the most probable parse tree will be shown.
Linear Context-Free Rewriting Systems (LCFRS) allow for parsing with discontinuous constituents. The Data-Oriented Parsing (DOP) framework constructs analyses from fragments of past experience. Double-DOP operationalizes this by using the set of fragments that occur at least twice in the training data. For efficiency, sentences are parsed with the following coarse-to-fine pipeline:
- Split-PCFG (prune items with posterior probability < 1e-5)
- PLCFRS (prune items not in 50-best derivations)
- Discontinuous Double-DOP (use 1000-best derivations to approximate most probable parse)
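The two pruning criteria used between the stages above can be sketched as follows. This is a minimal illustration, not the parser's actual code: the function names, the `(label, start, end)` item representation, and the toy probabilities are all assumptions made for the example (real LCFRS items cover discontinuous spans).

```python
def prune_by_posterior(posteriors, threshold=1e-5):
    """Keep chart items whose posterior probability is at least `threshold`."""
    return {item for item, p in posteriors.items() if p >= threshold}


def prune_by_kbest(derivations, k=50):
    """Keep chart items that occur in one of the k most probable derivations.

    `derivations` is a list of (probability, items) pairs.
    """
    best = sorted(derivations, key=lambda d: d[0], reverse=True)[:k]
    allowed = set()
    for _prob, items in best:
        allowed.update(items)
    return allowed


# Hypothetical chart items with made-up posterior probabilities.
posteriors = {("NP", 0, 2): 0.9, ("VP", 2, 5): 0.8, ("PP", 1, 3): 1e-7}
whitelist = prune_by_posterior(posteriors)
```

The next, finer stage would then only consider items in `whitelist`, which is what makes the more expensive grammars tractable.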
Training data:
- English: WSJ section of Penn treebank
- German: Negra treebank
- Dutch: Alpino treebank
Objective functions:
- MPP: most probable parse
- MPD: most probable derivation
- MPSD: most probable shortest derivation
- SL-DOP: shortest derivation among n most probable parse trees (n=7)
- SL-DOP (approximate): shortest derivation among derivations of n most probable parse trees (n=7)
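To make the difference between these criteria concrete, here is a small sketch that applies them to a toy list of derivations. The data and function names are hypothetical, and tie-breaking by probability is an assumption; each derivation is a `(tree, probability, number_of_fragments)` triple, and several derivations may yield the same tree.

```python
from collections import defaultdict

# Toy derivations: (tree, probability, number of fragments used).
derivations = [("A", 0.30, 3), ("B", 0.20, 2), ("B", 0.15, 4), ("C", 0.05, 1)]


def tree_probs(derivs):
    """Sum derivation probabilities per tree."""
    probs = defaultdict(float)
    for tree, p, _length in derivs:
        probs[tree] += p
    return dict(probs)


def mpd(derivs):
    """Tree of the single most probable derivation."""
    return max(derivs, key=lambda d: d[1])[0]


def mpp(derivs):
    """Tree with the highest summed probability."""
    probs = tree_probs(derivs)
    return max(probs, key=probs.get)


def mpsd(derivs):
    """Tree of the derivation with the fewest fragments (ties: most probable)."""
    return min(derivs, key=lambda d: (d[2], -d[1]))[0]


def sl_dop(derivs, n=7):
    """Among the n most probable trees, pick the one with the shortest derivation."""
    probs = tree_probs(derivs)
    top = sorted(probs, key=probs.get, reverse=True)[:n]
    shortest = {}
    for tree, _p, length in derivs:
        if tree in top:
            shortest[tree] = min(length, shortest.get(tree, length))
    return min(top, key=lambda t: (shortest[t], -probs[t]))
```

On this toy data the criteria disagree: `mpd` returns `"A"`, `mpp` returns `"B"` (0.20 + 0.15 outweighs 0.30), `mpsd` returns `"C"`, and `sl_dop(derivations, n=2)` returns `"B"`.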
Estimators:
- RFE: Relative Frequency Estimate
- EWE: Equal Weights Estimate
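As an illustration of the first option, the relative frequency estimate can be sketched as a fragment's count divided by the total count of all fragments with the same root label. The function name and the toy fragment counts below are hypothetical. EWE is not shown, since its weighting depends on per-tree fragment counts that are not spelled out here.

```python
from collections import defaultdict


def relative_frequency_estimate(fragment_counts):
    """Map {(root_label, fragment): count} to {(root_label, fragment): probability}.

    Each count is normalized by the total count of fragments sharing its root.
    """
    root_totals = defaultdict(int)
    for (root, _fragment), count in fragment_counts.items():
        root_totals[root] += count
    return {
        key: count / root_totals[key[0]]
        for key, count in fragment_counts.items()
    }


# Made-up fragment counts for illustration.
counts = {
    ("NP", "(NP (DT the) (NN cat))"): 3,
    ("NP", "(NP DT NN)"): 1,
    ("S", "(S NP VP)"): 2,
}
rfe = relative_frequency_estimate(counts)
```

Here the two NP fragments receive 0.75 and 0.25, and the only S fragment receives 1.0, so the probabilities of fragments with the same root sum to one.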
Marginalization:
- n-best: find the n most probable derivations
- sample: sample derivations according to their probability distribution
Coarse stage parser:
- CKY: standard CKY parser
- posterior: Prune with posterior probabilities
- bitpar: Use the bitpar parser (max 1000 derivations)
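The n-best and sample options listed above are two ways of approximating the probability of a parse tree, which is the sum over all of its derivations. The sketch below contrasts them on toy data; the function names and probabilities are assumptions for illustration only.

```python
import random
from collections import Counter

# Toy derivations: (tree, probability). Tree "B" has two derivations.
derivations = [("A", 0.30), ("B", 0.20), ("B", 0.15), ("C", 0.05)]


def nbest_tree_probs(derivs, n):
    """Sum probabilities per tree over the n most probable derivations only."""
    best = sorted(derivs, key=lambda d: d[1], reverse=True)[:n]
    totals = Counter()
    for tree, p in best:
        totals[tree] += p
    return dict(totals)


def sampled_tree_freqs(derivs, num_samples, rng):
    """Estimate tree probabilities as relative frequencies of sampled derivations."""
    trees = [tree for tree, _p in derivs]
    weights = [p for _tree, p in derivs]
    draws = rng.choices(trees, weights=weights, k=num_samples)
    counts = Counter(draws)
    return {tree: c / num_samples for tree, c in counts.items()}


rng = random.Random(0)  # fixed seed so the run is repeatable
freqs = sampled_tree_freqs(derivations, 1000, rng)
```

With `n=2` the n-best approximation sees only one derivation of `"B"` and ranks `"A"` highest, while with `n=3` the second derivation is included and `"B"` wins. This is why a large n (such as the 1000-best derivations in the pipeline) matters for approximating the most probable parse.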