Metadata-Version: 2.4
Name: msdocx
Version: 0.2.1
Summary: Python tools for Word OOXML documents, including MS-DOCX extension namespaces
Author: Anthony Shaw
License-Expression: MIT
Keywords: docx,word,ooxml,ms-docx,office
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Office/Business
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: lxml>=5.0
Requires-Dist: pydantic>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-cov>=5.0; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: hypothesis>=6.0; extra == "dev"
Dynamic: license-file

# msdocx

Python tools for Word OOXML documents, including MS-DOCX extension namespaces (w14/w15/w16).

## Overview

`msdocx` is a Python package for reading and writing Word `.docx` files. It follows the published [MS-DOCX specification](https://learn.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd) and supports the extension namespaces defined there (w14, w15, w16), which are not part of the base ISO/IEC 29500 standard.
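
The base standard and the extensions live in separate XML namespaces, which is what the `w`, `w14`, and `w15` prefixes stand for. The sketch below shows how a prefixed name expands to the `{uri}local` form that ElementTree and lxml use. `NSMAP` and `expand` are hypothetical names for illustration, not the package's own implementation, and only three of the namespaces are listed.

```python
# Illustrative only: NSMAP and expand() are hypothetical, not msdocx's code.
NSMAP = {
    # Base WordprocessingML (ISO/IEC 29500)
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    # Word 2010 extensions
    "w14": "http://schemas.microsoft.com/office/word/2010/wordml",
    # Word 2013 extensions
    "w15": "http://schemas.microsoft.com/office/word/2012/wordml",
}


def expand(tag: str) -> str:
    """Turn 'prefix:local' into Clark notation, '{namespace-uri}local'."""
    prefix, local = tag.split(":", 1)
    return f"{{{NSMAP[prefix]}}}{local}"


print(expand("w14:paraId"))
```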

## Installation

```bash
pip install msdocx
```

## Quick Start

```python
from msdocx import Document

# Create a new document
doc = Document.new()

# Add content
doc.add_heading("My Document", level=1)
doc.add_paragraph("Hello, World!", bold=True, font="Calibri", size=28)

# Add a table
doc.add_table(rows=3, cols=2, width=9360)

# Add lists
doc.add_bullet_list(["Item 1", "Item 2", "Item 3"])
doc.add_numbered_list(["Step 1", "Step 2", "Step 3"])

# Save
doc.save("output.docx")
```

## MS-DOCX Extension Features

`msdocx` covers the following areas, most of which rely on MS-DOCX extensions beyond the base OOXML standard:

### Text Effects (w14 namespace)

```python
from msdocx.text_effects import TextEffect, ShadowEffect, GlowEffect, ReflectionEffect, TextOutline
from msdocx.oxml.text import make_run
from msdocx.oxml.ns import qn

# Create a run with text effects
run = make_run(text="Styled Text", bold=True, size=48)
rpr = run.find(qn("w:rPr"))

effect = TextEffect(
    shadow=ShadowEffect(color="4472C4", alpha=60000),
    glow=GlowEffect(color="00FF00", radius=63500),
    reflection=ReflectionEffect(),
    ligatures="standardContextual",
    numeral_form="oldStyle",
)
effect.apply(rpr)
```
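
The `radius=63500` value above looks like English Metric Units (EMUs), the unit the `w14:rad` attribute of a glow uses. That the package passes the number through unchanged is an assumption. A minimal sketch of the conversion, with hypothetical helper names:

```python
# Hypothetical helpers, not part of msdocx.
EMU_PER_INCH = 914400
POINTS_PER_INCH = 72
EMU_PER_POINT = EMU_PER_INCH // POINTS_PER_INCH  # 12700


def points_to_emu(points: float) -> int:
    """Convert typographic points to EMUs, rounded to a whole unit."""
    return round(points * EMU_PER_POINT)


def emu_to_points(emu: int) -> float:
    """Convert EMUs back to typographic points."""
    return emu / EMU_PER_POINT


print(points_to_emu(5))      # 63500
print(emu_to_points(63500))  # 5.0
```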

### Content Controls with MS-DOCX Extensions

```python
from msdocx.content_controls import ContentControl, ContentControlType

# Checkbox (w14 extension)
checkbox = ContentControl(ContentControlType.CHECKBOX)
checkbox.set_checked(True)

# Repeating section (w15 extension)
repeating = ContentControl(ContentControlType.REPEATING_SECTION)

# Entity picker (w14 extension)
picker = ContentControl(ContentControlType.ENTITY_PICKER)

# Color picker (w15 extension)
color = ContentControl(ContentControlType.COLOR)
```
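
For orientation, this is roughly the `w14:checkbox` markup that a checkbox content control carries inside its `w:sdtPr`. The sketch uses only the standard library. `build_checkbox` is a hypothetical helper, and the glyph codes and font are the defaults Word typically writes, not necessarily what `msdocx` emits.

```python
import xml.etree.ElementTree as ET

W14 = "http://schemas.microsoft.com/office/word/2010/wordml"
ET.register_namespace("w14", W14)


def build_checkbox(checked: bool) -> ET.Element:
    """Build a w14:checkbox element (hypothetical helper, not msdocx's API)."""
    box = ET.Element(f"{{{W14}}}checkbox")
    # Current state: "1" for checked, "0" for unchecked
    ET.SubElement(box, f"{{{W14}}}checked", {f"{{{W14}}}val": "1" if checked else "0"})
    # Glyphs shown for each state (U+2612 and U+2610 in MS Gothic)
    ET.SubElement(
        box,
        f"{{{W14}}}checkedState",
        {f"{{{W14}}}val": "2612", f"{{{W14}}}font": "MS Gothic"},
    )
    ET.SubElement(
        box,
        f"{{{W14}}}uncheckedState",
        {f"{{{W14}}}val": "2610", f"{{{W14}}}font": "MS Gothic"},
    )
    return box


print(ET.tostring(build_checkbox(True), encoding="unicode"))
```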

### Tracked Changes with Conflict Resolution

```python
from msdocx.tracked_changes import TrackedChange

# Standard tracked changes
ins = TrackedChange.insert("new text", author="Alice")
del_el = TrackedChange.delete("old text", author="Bob")
del_el, ins_el = TrackedChange.replace("old", "new", author="Editor")

# MS-DOCX conflict resolution (w14 extension)
conflict_ins = TrackedChange.conflict_insert("conflict text", author="User1")
conflict_del = TrackedChange.conflict_delete("deleted in conflict", author="User2")
```

### Collaboration Features

```python
from msdocx.collaboration import CollaborationInfo, mark_spelling_error

# Paragraph unique IDs (w14:paraId, w14:textId) — auto-generated
info = CollaborationInfo.generate()
info.apply_to_paragraph(paragraph_element)

# Inline spelling markup
mark_spelling_error(paragraph, start_run_index=1, end_run_index=1)
```
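
`w14:paraId` and `w14:textId` are 8-digit hexadecimal values. As far as I read MS-DOCX, they must be greater than 0 and less than 0x80000000, which is worth checking against the specification before relying on it. A sketch of one way to generate such an ID follows. `new_long_hex_id` is a hypothetical name, and `CollaborationInfo.generate()` may work differently.

```python
import secrets


def new_long_hex_id() -> str:
    """Return a random 8-digit uppercase hex ID in the range 1..0x7FFFFFFF."""
    # randbelow(n) yields 0..n-1, so adding 1 excludes zero
    value = secrets.randbelow(0x7FFFFFFF) + 1
    return f"{value:08X}"


print(new_long_hex_id())
```

Random IDs can collide, so a real generator would also check the new value against the IDs already present in the document.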

### Extended Compatibility Settings

```python
from msdocx.compatibility import set_compat_mode, enable_opentype_features

# Set Word 2013+ compatibility mode
set_compat_mode(doc.settings, mode=15)

# Enable OpenType font features (MS-DOCX extension)
enable_opentype_features(doc.settings)
```
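
In `settings.xml`, the compatibility mode is stored as a `w:compatSetting` named `compatibilityMode` inside `w:compat`. The sketch below builds that fragment with the standard library. `build_compat` is a hypothetical helper, shown only to illustrate the markup that a call like `set_compat_mode(..., mode=15)` presumably produces.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def build_compat(mode: int) -> ET.Element:
    """Build <w:compat> with a compatibilityMode setting (hypothetical helper)."""
    compat = ET.Element(f"{{{W}}}compat")
    ET.SubElement(
        compat,
        f"{{{W}}}compatSetting",
        {
            f"{{{W}}}name": "compatibilityMode",
            f"{{{W}}}uri": "http://schemas.microsoft.com/office/word",
            f"{{{W}}}val": str(mode),
        },
    )
    return compat


print(ET.tostring(build_compat(15), encoding="unicode"))
```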

### Accept / Reject Tracked Changes

```python
from msdocx.tracked_changes import TrackedChange
from msdocx.oxml.ns import qn

# Locate the first tracked insertion and deletion in the document body
ins_element = doc.body.find(f".//{qn('w:ins')}")
del_element = doc.body.find(f".//{qn('w:del')}")

# Each change is resolved once: use the accept_* call or the reject_* call, not both.

# Accept an insertion (keep the inserted text, remove the tracking wrapper)
TrackedChange.accept_insertion(ins_element)
# Or reject it (remove the inserted text entirely)
# TrackedChange.reject_insertion(ins_element)

# Accept a deletion (remove the deleted text permanently)
TrackedChange.accept_deletion(del_element)
# Or reject it (restore the deleted text)
# TrackedChange.reject_deletion(del_element)

# Accept or reject ALL changes in the document body at once (choose one)
TrackedChange.accept_all(doc.body)
# TrackedChange.reject_all(doc.body)
```
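
Conceptually, accepting changes means unwrapping each `w:ins` (its runs stay) and dropping each `w:del` (its runs go). The standard-library sketch below illustrates that idea. `accept_all_sketch` is a hypothetical function, not the package's implementation, and it ignores paragraph-mark deletions, moves, and formatting changes.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
INS, DEL = f"{{{W}}}ins", f"{{{W}}}del"


def _resolve(children: list[ET.Element]) -> list[ET.Element]:
    """Return children with w:ins unwrapped (recursively) and w:del removed."""
    kept = []
    for child in children:
        if child.tag == INS:
            kept.extend(_resolve(list(child)))  # keep content, drop the wrapper
        elif child.tag != DEL:
            kept.append(child)
    return kept


def accept_all_sketch(root: ET.Element) -> None:
    """Accept every tracked insertion and deletion under root, in place."""
    for parent in list(root.iter()):
        parent[:] = _resolve(list(parent))


xml = (
    f'<w:p xmlns:w="{W}"><w:r><w:t>a</w:t></w:r>'
    "<w:ins><w:r><w:t>b</w:t></w:r></w:ins>"
    "<w:del><w:r><w:delText>c</w:delText></w:r></w:del></w:p>"
)
para = ET.fromstring(xml)
accept_all_sketch(para)
print("".join(t.text for t in para.iter(f"{{{W}}}t")))
```

Rejecting is the mirror image: drop each `w:ins`, unwrap each `w:del`, and turn its `w:delText` elements back into `w:t`.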

### Reading Document Content

```python
from msdocx import Document

doc = Document.open("existing.docx")

# Get all text as a single string
text = doc.get_text()

# Get structured paragraph data
paragraphs = doc.get_paragraphs()
for p in paragraphs:
    print(p["text"], p["style"], p["bold"], p["italic"])

# Get tables as nested lists
tables = doc.get_tables()
for table in tables:
    for row in table:
        print(row)  # ["cell1", "cell2", ...]

# Get all comments with threading info
comments = doc.get_comments()
for c in comments:
    print(f'{c["author"]}: {c["text"]}')
    if c["parent_id"] is not None:
        print(f'  (reply to comment {c["parent_id"]})')
```
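
Because a `.docx` file is a ZIP package, paragraph text can also be pulled out with the standard library alone, which is handy for checking results independently. The sketch below assumes the main part is at the conventional path `word/document.xml` (strictly, the package relationships decide this). `paragraph_texts` is a hypothetical helper, not how `doc.get_text()` is implemented.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the text of every w:p in the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Join all w:t runs per paragraph; paragraphs inside tables are included
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        for p in root.iter(f"{{{W}}}p")
    ]


# Build a tiny in-memory package so the example runs without a real file
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W}"><w:body><w:p><w:r><w:t>Hello</w:t></w:r>'
        "<w:r><w:t>, World!</w:t></w:r></w:p></w:body></w:document>",
    )
print(paragraph_texts(buf.getvalue()))
```

Text boxes nest paragraphs inside paragraphs, so this simple traversal would report their text twice.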

### Comment Threading

```python
# Create a parent comment
parent_id = doc.add_comment("Please review this section", author="Reviewer")

# Reply to the parent comment — automatically creates commentsExtended.xml
reply_id = doc.add_comment("Fixed in v2", author="Author", parent_id=parent_id)

# Read back threaded comments
comments = doc.get_comments()
# comments[1]["parent_id"] == parent_id
```

## Specification Reference

This package implements the [MS-DOCX specification](https://learn.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd), revision 22.1 (November 2025).
