Metadata-Version: 2.4
Name: deepwiki-to-md
Version: 2.0.3
Summary: Extensible Next.js/DeepWiki content extractor with zero external dependencies
Author-email: DeepWiki Extractor Team <contact@example.com>
Maintainer-email: DeepWiki Extractor Team <contact@example.com>
License: MIT
Project-URL: Homepage, https://github.com/yuyu1815/deepwiki_to_md
Project-URL: Repository, https://github.com/yuyu1815/deepwiki_to_md
Project-URL: Bug Tracker, https://github.com/yuyu1815/deepwiki_to_md/issues
Project-URL: Documentation, https://github.com/yuyu1815/deepwiki_to_md/blob/main/docs/
Project-URL: Changelog, https://github.com/yuyu1815/deepwiki_to_md/blob/main/docs/CHANGELOG.md
Keywords: scraping,nextjs,deepwiki,markdown,extraction,content,zero-dependency
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP :: Indexing/Search
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.4.0; extra == "docs"
Dynamic: license-file

# deepwiki-to-md


This is the English README. For Japanese, see [README_JP.md](README_JP.md).



Zero-dependency CLI and Python library to extract Markdown from Next.js/DeepWiki HTML. Includes a small search helper for public repository indexes and an optional chat helper.

- CLI: `deepwiki-to-md`
- Requirements: Python 3.8+
- Dependencies: Standard library only (optional extras for dev/docs)

## Install

```bash
pip install deepwiki-to-md
```

## Usage

- From local HTML/string (CLI and Python):
```bash
# CLI
echo "<html>...</html>" | deepwiki-to-md
```
```python
# Python API
from deepwiki_to_md import ContentExtractor

html = """
<!doctype html>
<html>...</html>
"""

extractor = ContentExtractor()
md = extractor.extract_from_html(html)
print(md)
```

- From URL (files are saved only when the input is a URL):
```bash
# CLI
# Files under .deepwiki are created only for URL input
deepwiki-to-md https://deepwiki.com/microsoft/vscode/some-page --path ./.deepwiki
```
```python
# Python API (same behavior as the CLI)
from deepwiki_to_md import ContentExtractor, save_markdown_to_library

url = "https://deepwiki.com/microsoft/vscode/some-page"
base_dir = "./.deepwiki"  # equivalent to --path (optional)

extractor = ContentExtractor()
md = extractor.extract_from_url(url)

result = save_markdown_to_library(md, url, base_dir)
print("saved files:")
for p in result["saved_files"]:
    print(" -", p)
print("library index:", result["library_file"])  # .deepwiki/<username>/<library>.md
```
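
The comment above hints that the library index lands at `.deepwiki/<username>/<library>.md`. As a rough illustration of that layout, the sketch below derives such a path from a DeepWiki URL. The helper name `guess_library_index` is hypothetical and not part of the package, and the mapping is an assumption based only on the path pattern shown above; the real `save_markdown_to_library` may behave differently.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def guess_library_index(url: str, base_dir: str = "./.deepwiki") -> str:
    """Hypothetical helper: guess the index path for a DeepWiki URL.

    Assumes the URL path looks like /<username>/<library>/<page...>.
    """
    # Keep only the non-empty path segments of the URL.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"expected /<username>/<library> in URL path: {url!r}")
    username, library = parts[0], parts[1]
    # PurePosixPath normalizes a leading "./" away.
    return str(PurePosixPath(base_dir) / username / f"{library}.md")


print(guess_library_index("https://deepwiki.com/microsoft/vscode/some-page"))
# prints: .deepwiki/microsoft/vscode.md
```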

- Search public repository indexes:
```bash
# CLI (JSON by default)
deepwiki-to-md --search "Gemini"

# Human-readable development-log style
deepwiki-to-md --search "Gemini" --devlog
```
```python
# Python API (same search capability)
from search_repository import search_repositories, API_URL

print(API_URL)  # => https://api.devin.ai/ada/list_public_indexes
result = search_repositories("Gemini")
indices = result.get("indices", [])
print("indices:", len(indices))
```
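
Since the CLI prints JSON by default, its output can be post-processed with the standard library. The sketch below pulls the `indices` list out of such a JSON string. Only the `indices` key comes from the example above; the helper name `indices_from_json` and the sample entries are placeholders invented for illustration, and the real entries may carry different fields.

```python
import json
from typing import Any, List


def indices_from_json(text: str) -> List[Any]:
    """Hypothetical helper: return the "indices" list from search JSON."""
    payload = json.loads(text)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object at the top level")
    # Fall back to an empty list when the key is missing.
    return payload.get("indices", [])


# Placeholder data, not real search output.
sample = '{"indices": [{"placeholder": "entry-1"}, {"placeholder": "entry-2"}]}'
print("indices:", len(indices_from_json(sample)))
```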

- Chat with the Devin API (via CLI):
```bash
# Positional argument must be a DeepWiki URL
# JSON output by default
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "What is the purpose of this repository?"

# Human-readable output for development logs
deepwiki-to-md https://deepwiki.com/microsoft/vscode --chat "Summarize top features" --devlog
```

Options for chat via `deepwiki-to-md`:

- `--chat MESSAGE`: Message to send. Requires a DeepWiki URL as the positional input.
- `--deep-research`: Enable deep research mode for chat.
- `--config-file PATH`: Path to the chat config JSON (default: `./config.json`). The file must exist and contain complete settings.
- `--devlog`: When used with `--chat`, prints a human-readable response body and reference files.
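
When driving the chat CLI from a script, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the flags listed above; the function name `build_chat_command` is hypothetical and not part of the package.

```python
from typing import List, Optional


def build_chat_command(
    url: str,
    message: str,
    deep_research: bool = False,
    devlog: bool = False,
    config_file: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for the chat CLI."""
    # The DeepWiki URL is the positional input; --chat carries the message.
    cmd = ["deepwiki-to-md", url, "--chat", message]
    if deep_research:
        cmd.append("--deep-research")
    if config_file is not None:
        cmd += ["--config-file", config_file]
    if devlog:
        cmd.append("--devlog")
    return cmd


print(build_chat_command("https://deepwiki.com/microsoft/vscode", "Summarize top features", devlog=True))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues in the message text.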

## License

This project is licensed under the MIT License; see the [LICENSE](./LICENSE) file for details.

## More documentation

- Library reference (includes both Python API and CLI examples): [deepwiki_to_md.md](deepwiki_to_md.md)

### Chat (Devin API) result object: ChatResult

The chat helper (`src/chat.py`) returns a `ChatResult` object instead of a plain dict.

- Highlights
  - Inherits from `dict`, so it works with `json.dumps(result)` directly.
  - Convenient attribute access (e.g., `result.response_message`) and `to_dict()`.
  - `print(result)` shows a human-readable summary.

- Main properties
  - `sent_message`: `str`
  - `response_message`: `Optional[str]`
  - `status_code`: `Any`
  - `reference_files`: `List[str]`
  - `reference_file_contents`: `Dict[str, str]`

- Example (excerpt)
```python
import asyncio
import json
from chat import send_chat_message, ChatResult

async def main() -> None:
    # Send one question about the repository and wait for the reply
    result: ChatResult = await send_chat_message(
        wiki_url='https://deepwiki.com/microsoft/vscode',
        message='What is the purpose of this repository?',
        use_deep_research=False,
    )

    print(result)  # human-readable summary via __str__
    print(result.response_message)  # attribute access
    print(json.dumps(result, indent=2, ensure_ascii=False))  # still a dict

if __name__ == '__main__':
    asyncio.run(main())
```
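
To make the "inherits from dict" point concrete, here is a minimal sketch of how a dict subclass can offer attribute access, `to_dict()`, and a readable `__str__` while staying JSON-serializable. The class `ChatResultSketch` is illustrative only and is not the library's `ChatResult`; its summary format and defaults are assumptions.

```python
import json
from typing import Any, Dict, List, Optional


class ChatResultSketch(dict):
    """Illustrative dict subclass; not the library's ChatResult."""

    @property
    def sent_message(self) -> str:
        return self.get("sent_message", "")

    @property
    def response_message(self) -> Optional[str]:
        return self.get("response_message")

    @property
    def reference_files(self) -> List[str]:
        return self.get("reference_files", [])

    def to_dict(self) -> Dict[str, Any]:
        # Return a plain dict copy of the stored data.
        return dict(self)

    def __str__(self) -> str:
        # Human-readable summary used by print().
        return (
            f"sent: {self.sent_message}\n"
            f"response: {self.response_message}\n"
            f"references: {len(self.reference_files)}"
        )


result = ChatResultSketch(sent_message="hi", response_message="hello", reference_files=[])
print(result)              # summary via __str__
print(json.dumps(result))  # serializes like any dict
```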


Arguments for chat.py:
- `--url`: URL of the chat interface.
- `--message`: Message to send.
- `--selector`: CSS selector for the chat input (default: textarea).
- `--button`: CSS selector for the submit button (default: button).
- `--wait`: Time to wait for response in seconds (default: 30).
- `--debug`: Enable debug mode.
- `--output`: Output directory (default: ChatResponses).
- `--deep`: Enable "Deep Research" mode (specific to some interfaces).
- `--headless`: Run browser in headless mode.
- `--format`: Output format(s): html, md, yaml, or comma-separated list (default: html).

Note: The chat scraper uses Selenium, which requires a compatible browser installed.


