Metadata-Version: 2.4
Name: asc-analyzer
Version: 0.0.5
Summary: asc-analyzer
Home-page: https://github.com/hksung/ASC-analyzer
Author: Hakyung Sung
Author-email: hksung001@gmail.com
License: CC-BY-NC-SA-4.0
Classifier: Programming Language :: Python :: 3
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Requires-Dist: spacy
Requires-Dist: spacy-transformers
Requires-Dist: en_core_web_trf
Dynamic: author
Dynamic: author-email
Dynamic: requires-dist
Dynamic: summary

# ASC-analyzer

The ASC Analyzer extracts Argument Structure Constructions (ASCs) from raw English texts and computes indices related to ASC usage.


## Installation

```bash
pip install asc-analyzer
```

## Quickstart
Prepare a directory with `.txt` files (e.g., `data/text/`). Each file should contain plain English text.

Then run:

```bash
asc-analyzer \
  --input-dir data/text \
  --source cow \
  --print-asc \
  --save-asc-output
````

This command will:

* Assign ASC tags to each sentence
* Print the ASC-tagged results directly to the terminal (`--print-asc`)
* Save token-level ASC tagging results as `*_ASCinfo.txt` files (`--save-asc-output`)
* Compute ASC usage statistics (e.g., diversity, proportion, frequency, and verb–ASC association strength) and save them in a CSV summary file
* The `--source` option determines which reference corpus is used for computing frequency and association measures:
    * `cow`: uses the *COW* corpus (web-based, written English)
    * `subt`: uses the *SUBTLEX* corpus (subtitle-based, spoken English)
    * Choose the source based on the register that best matches your input data.
    
## Options

| Option               | Description                                                    |
| -------------------- | -------------------------------------------------------------- |
| `--input-dir`, `-i`  | Input folder with `.txt` files (default: `data/test`)          |
| `--output-csv`, `-o` | Path to save output CSV (default: `data/Written_<source>.csv`) |
| `--source`, `-s`     | Reference dataset: `cow` (written, default) or `subt` (spoken) |
| `--indices`, `-x`    | Comma-separated list of indices to include in CSV              |
| `--save-asc-output`  | Save ASC-tagged outputs for each file (`*_ASCinfo.txt`)        |
| `--print-asc`        | Print ASC-tagged output to terminal                            |

## Output for `--print-asc`

When using the `--print-asc` option, the output for each sentence shows aligned token information and its ASC label (`None` if no ASC applies):

```
# sent_id = 1
1	The	the	
2	idea	idea	
3	is	be	ATTR
4	trust	trust	
```
You can save this output to txt files by including `--save-asc-output`.

## Citation

- If you use the ASC tagger (`--print-asc`, `--save-asc-output`) in your research, please cite:
    - Sung, H., & Kyle, K. (2024). [Leveraging Pre-trained Language Models for Linguistic Analysis: A Case of Argument Structure Constructions](https://aclanthology.org/2024.emnlp-main.415/). *Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP)*.

- The ASC Analyzer is currently in beta testing and will be updated.

# License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

See the full license [here](https://creativecommons.org/licenses/by-nc-sa/4.0/).
