Metadata-Version: 2.1
Name: syn-nli
Version: 0.0.1
Summary: Package for the paper "Syntax-Aware Natural Language Inference" @<link>
Home-page: https://github.com/EazyReal/2020-IIS-internship
Author: ytlin
Author-email: 0312fs3@gmail.com
License: Apache
Keywords: allennlp NLP deep learning machine learning
Platform: UNKNOWN
Classifier: Intended Audience :: Science/Research
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.6.1
Description-Content-Type: text/markdown
Requires-Dist: torch (<1.7.0,>=1.6.0)
Requires-Dist: allennlp (==1.1.0rc3)
Requires-Dist: pytorch-geometric

# SynNLI 

## Description
- this repo is built on top of `allennlp`

## AllenNlp
- a quick guide of mine can be found in the same folder
- for deeper insight, see the AllenNLP documentation and GitHub repo

## Custom Classes and Operations
- `GraphPair2VecEncoder`
    - registered implementations: `'gen'`, `'gmn'`
- `Graph2GraphEncoder`
    - known as a `graph convolution layer` in `pytorch_geometric`
- `GraphPair2GraphPairEncoder`
    - for graph matching in a sparse batch
    - a `tf.dynamic_partition`-style split plus standard attention
- `NodeUpdater`
    - a wrapper over `RNN`s
- `Graph2VecEncoder`
    - known as a `global pooling layer` in `pytorch_geometric`
    - registered implementation: `'global_attention'`
- `SynNLIModel(base=Model)`
    - uses an `Embedder` to embed the input
    - uses a `GraphPair2VecEncoder` to produce the comparison vector from which the classifier makes the final decision
- `tensor_op.py`
    - batch conversion between the standard (dense) model format and the graph (sparse) format
        - sparse2dense
        - dense2sparse
- `SparseAdjacencyField`
    - cooperates with `pytorch_geometric` to build sparse graph batches
    - see `batch_tensors()` and `as_tensor()` for the core of the implementation
- `NLIGraphReader`
    - reads graph input (parsed by `Stanza`)
- `preprocess.py`
    - see the `Parse Data with Stanza` section for details
- `configs`
    - can be found in `src/training`
    - used by `allennlp train`
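The sparse batching behind `tensor_op.py` and `SparseAdjacencyField` follows the `pytorch_geometric` convention: the graphs in a batch are concatenated into one large disjoint graph, with node indices shifted by a running offset and a batch vector recording which graph each node came from. A minimal pure-Python sketch of the idea (the helper name `batch_graphs` is hypothetical, not the repo's API):

```python
# Sketch of pytorch_geometric-style sparse batching: concatenate graphs into
# one disjoint graph, offsetting node indices and tracking graph membership.

def batch_graphs(graphs):
    """graphs: list of (num_nodes, edge_list) pairs; edge_list is [(src, dst), ...]."""
    edge_index = []  # edges of the batched graph, with offset node indices
    batch_vec = []   # batch_vec[i] = index of the graph node i belongs to
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        edge_index.extend((src + offset, dst + offset) for src, dst in edges)
        batch_vec.extend([graph_id] * num_nodes)
        offset += num_nodes
    return edge_index, batch_vec

g0 = (3, [(0, 1), (1, 2)])  # a 3-node chain
g1 = (2, [(0, 1)])          # a 2-node edge
edge_index, batch_vec = batch_graphs([g0, g1])
# edge_index == [(0, 1), (1, 2), (3, 4)]; batch_vec == [0, 0, 0, 1, 1]
```

The batch vector is what global pooling layers (e.g. `'global_attention'`) use to aggregate node features back into one vector per graph.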

## Usage (Current)
- `./install_dependencies.sh`
- download an NLI-style dataset to `data`
    - and specify its path in the jsonnet config
- parse the data (see the Parse Data with Stanza section)
    - and specify the output path in the jsonnet config
- train the model (see the Training section)
    - with the jsonnet config

## Parse Data with Stanza
- Stanza is loaded in `preprocess.py`
    - the parser version is the one as of 2020/8/22
- use `preprocess.py`:
```
# --force : if set, run even when <target_path> already exists
# -m 10   : if provided, caps the number of instances to process (mainly for testing)
python preprocess.py -i <raw_data_path> \
 -o <target_path> \
 --files <file_names> \
 --force \
 -m 10
```
```
# example
python preprocess.py -i ../data/anli_v1.0/R2/ \
 -o ../data/anli_v1.0_preprocessed/R2/ \
 --files dev.jsonl test.jsonl train.jsonl \
 --force \
 -m 10
```
- if you want to use allennlp parsers instead (less recommended)
    - download the AllenNLP dependency parser and SRL labeler from path
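Stanza's dependency parser assigns each word a `head` index (1-based, with 0 marking the root) and a `deprel` label; `preprocess.py` turns these into graph edges. A rough sketch of that conversion (hypothetical helper, not the repo's exact code):

```python
def deps_to_edges(words):
    """words: list of (head, deprel) per token, Stanza-style 1-based heads (0 = root).

    Returns (governor, dependent, relation) triples with 0-based indices.
    """
    edges = []
    for i, (head, deprel) in enumerate(words):
        if head == 0:
            continue  # the root token has no incoming dependency edge
        edges.append((head - 1, i, deprel))
    return edges

# "She reads books": "reads" (index 1) is the root; "She" and "books" depend on it.
words = [(2, "nsubj"), (0, "root"), (2, "obj")]
# deps_to_edges(words) == [(1, 0, "nsubj"), (1, 2, "obj")]
```

These edge triples are what `NLIGraphReader` consumes when building the sparse graph batch.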

## Training
- refer to the training config, e.g. `training_config.jsonnet`:
```
allennlp train "./src_gmn/training_config.jsonnet" -s "./param/testv1"   --include-package "package_v1" --force
```
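For orientation, an allennlp training config generally has the shape below. The top-level keys are standard allennlp; the registered names (`nli-graph-reader`, `syn-nli`) and the paths are assumptions for illustration, not the repo's exact config:

```jsonnet
// Sketch of an allennlp training config; see src/training for the real ones.
{
  "dataset_reader": { "type": "nli-graph-reader" },  // assumed registered name
  "train_data_path": "../data/anli_v1.0_preprocessed/R2/train.jsonl",
  "validation_data_path": "../data/anli_v1.0_preprocessed/R2/dev.jsonl",
  "model": { "type": "syn-nli" },  // assumed registered name
  "data_loader": { "batch_size": 32, "shuffle": true },
  "trainer": {
    "num_epochs": 10,
    "optimizer": { "type": "adam", "lr": 1e-3 },
  },
}
```

`allennlp train` resolves the `type` fields against classes registered by the package passed via `--include-package`.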

## Future Supported Usage
- `pip install -r requirements`
- add a `configs` folder for various configurations
- note: the lemmatized form should be used as the node attribute when using word-level embeddings (or add character embeddings to ease this)
- map the root to a special token
- use an MLP projection



