Metadata-Version: 2.4
Name: llama-index-readers-cvfile
Version: 0.1.0
Summary: LlamaIndex reader for the .cv open file format.
Project-URL: Homepage, https://cvfile.org
Project-URL: Repository, https://github.com/cvfile/cv
Project-URL: Issues, https://github.com/cvfile/cv/issues
Author: cvfile.org
License: Apache-2.0
Keywords: ats,cv,llama-index,pdf,pdfa,rag,reader,resume
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Requires-Dist: cvfile<1,>=0.1.0
Requires-Dist: llama-index-core<0.15,>=0.11
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.7; extra == 'dev'
Description-Content-Type: text/markdown

# llama-index-readers-cvfile

LlamaIndex reader for the [`.cv`](https://cvfile.org) open file format.

A `.cv` file is a PDF/A-3u file that carries a Markdown copy of the same content
(plus an optional HTML copy and a JSON Resume) as PDF Associated Files. Instead of
running OCR on the PDF, this reader extracts the embedded text payloads directly.

## Install

```bash
pip install llama-index-readers-cvfile
```

## Use

```python
from pathlib import Path
from llama_index.readers.cvfile import CVFileReader

reader = CVFileReader()
docs = reader.load_data(file=Path("resume.cv"))

for doc in docs:
    print(doc.metadata["payload"], doc.metadata["mime_type"], len(doc.text))
```

You get one `Document` per textual payload found in the file. The Markdown
copy (typically `resume.md`) is the one flagged with `metadata["primary"] = True`.
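If you only want the Markdown copy, you can filter on the `primary` flag. The following is a minimal sketch. `pick_primary` is a hypothetical helper and is not part of this package. It relies only on each document exposing a `metadata` dict, which is true of LlamaIndex `Document` objects. `FakeDoc` is a stand-in so the snippet runs without a `.cv` file.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class FakeDoc:
    """Stand-in for a LlamaIndex Document: only .text and .metadata are used."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)


def pick_primary(docs: Iterable[Any]) -> Optional[Any]:
    """Hypothetical helper: return the first document flagged primary, else None."""
    for doc in docs:
        if doc.metadata.get("primary"):
            return doc
    return None


docs = [
    FakeDoc("<h1>Jane Doe</h1>", {"payload": "resume.html", "mime_type": "text/html", "primary": False}),
    FakeDoc("# Jane Doe", {"payload": "resume.md", "mime_type": "text/markdown", "primary": True}),
]
primary = pick_primary(docs)
print(primary.metadata["payload"])  # resume.md
```

With the real reader, pass the list returned by `reader.load_data(...)` instead of the stand-in list. Returning `None` rather than raising leaves the fallback choice to the caller.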

## Metadata fields

| Key | Description |
|---|---|
| `source` | Absolute path to the loaded file |
| `file_name` | Basename of the source file |
| `payload` | Name of the embedded file (e.g. `resume.md`) |
| `mime_type` | MIME type of the payload (`text/markdown`, `text/html`, `application/json`) |
| `relationship` | PDF Associated Files relationship (e.g. `Alternative` for alternate representations of the document content) |
| `language` | BCP 47 language tag for this payload |
| `primary` | `True` for the payload declared as primary in the file's XMP metadata |
| `cv_version` | Version of the `.cv` spec the file conforms to |
| `cv_generator` | Tool that produced the file, if recorded |

## License

Apache-2.0.
