Metadata-Version: 2.4
Name: langchain-cvfile
Version: 0.1.0
Summary: LangChain document loader for the .cv open file format.
Project-URL: Homepage, https://cvfile.org
Project-URL: Repository, https://github.com/cvfile/cv
Project-URL: Issues, https://github.com/cvfile/cv/issues
Author: cvfile.org
License: Apache-2.0
Keywords: ats,cv,document-loader,langchain,pdf,pdfa,rag,resume
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Requires-Dist: cvfile<1,>=0.1.0
Requires-Dist: langchain-core<1,>=0.3
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.7; extra == 'dev'
Description-Content-Type: text/markdown

# langchain-cvfile

LangChain document loader for the [`.cv`](https://cvfile.org) open file format.

A `.cv` file is a PDF/A-3u file carrying a Markdown copy of the same content
(plus optional HTML and JSON Resume) as PDF Associated Files. Instead of OCR
ing the PDF, this loader pulls the embedded text payloads directly.

## Install

```bash
pip install langchain-cvfile
```

## Use

```python
from langchain_cvfile import CVFileLoader

loader = CVFileLoader("resume.cv")
docs = loader.load()

for doc in docs:
    print(doc.metadata["payload"], doc.metadata["mime_type"], len(doc.page_content))
```

You get one `Document` per textual payload found in the file. The Markdown
copy (typically `resume.md`) is the one flagged with `metadata["primary"] = True`.

## Metadata fields

| Key | Description |
|---|---|
| `source` | The file path you passed to the loader |
| `payload` | Name of the embedded file (e.g. `resume.md`) |
| `mime_type` | MIME of the payload (`text/markdown`, `text/html`, `application/json`) |
| `relationship` | PDF Associated Files relationship (`Alternative` for primary alternates) |
| `language` | BCP 47 language tag for this payload |
| `primary` | `True` for the payload declared as primary in the file's XMP metadata |
| `cv_version` | Version of the `.cv` spec the file conforms to |
| `cv_generator` | Tool that produced the file, if recorded |

## License

Apache-2.0.
