Metadata-Version: 2.4
Name: pvw-cli
Version: 1.11.10
Summary: Microsoft Purview CLI with comprehensive automation capabilities
Author-email: AYOUB KEBAILI <keayoub@msn.com>
Maintainer-email: AYOUB KEBAILI <keayoub@msn.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/Keayoub/pvw-cli
Project-URL: Documentation, https://github.com/Keayoub/pvw-cli/wiki
Project-URL: Repository, https://github.com/Keayoub/pvw-cli.git
Project-URL: Bug Tracker, https://github.com/Keayoub/pvw-cli/issues
Project-URL: Source, https://github.com/Keayoub/pvw-cli
Keywords: azure,purview,cli,data,catalog,governance,automation,pvw
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: System Administrators
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: System :: Systems Administration
Classifier: Topic :: Database
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Utilities
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: azure-identity>=1.23.0
Requires-Dist: azure-core>=1.34.0
Requires-Dist: click>=8.1.7
Requires-Dist: rich>=13.7.0
Requires-Dist: requests>=2.32.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: aiohttp>=3.13.0
Requires-Dist: PyYAML<7.0,>=6.0
Requires-Dist: cryptography<50.0.0,>=42.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.20.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=0.991; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Provides-Extra: test
Requires-Dist: requests-mock>=1.9.0; extra == "test"

# pvw-cli — Microsoft Purview Command-Line Interface

[![Version](https://img.shields.io/badge/version-1.11.10-blue.svg)](https://github.com/Keayoub/pvw-cli/releases/tag/v1.11.10)
[![Status](https://img.shields.io/badge/status-stable-success.svg)](https://github.com/Keayoub/pvw-cli)
[![Docs](https://img.shields.io/badge/docs-github%20pages-0A8F84?logo=githubpages&logoColor=white)](https://keayoub.github.io/pvw-cli/)

A Python CLI and library for automating Microsoft Purview. Covers the Data Map, Unified Catalog, Collections, Search, Lineage, Scan, and Management APIs.

---

## Install

```bash
pip install pvw-cli
```

For the latest development version:

```bash
git clone https://github.com/Keayoub/pvw-cli.git
cd pvw-cli
pip install -r requirements.txt
pip install -e .
```

---

## Configuration

Set these three environment variables before running any command:

| Variable | Description |
|---|---|
| `PURVIEW_ACCOUNT_NAME` | Your Purview account name (e.g. `mycompany-purview`) |
| `PURVIEW_ACCOUNT_ID` | Your Azure Tenant ID (used as the Purview account ID for UC APIs) |
| `PURVIEW_RESOURCE_GROUP` | The resource group containing your Purview account |

**PowerShell:**

```powershell
$env:PURVIEW_ACCOUNT_NAME = "your-purview-account"
$env:PURVIEW_ACCOUNT_ID   = "your-tenant-id-guid"
$env:PURVIEW_RESOURCE_GROUP = "your-resource-group"
```

**Bash / Linux / macOS:**

```bash
export PURVIEW_ACCOUNT_NAME=your-purview-account
export PURVIEW_ACCOUNT_ID=your-tenant-id-guid
export PURVIEW_RESOURCE_GROUP=your-resource-group
```

To find your Tenant ID:

```bash
az account show --query tenantId -o tsv
```
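
A quick preflight check can catch a missing variable before the first `pvw` call. The sketch below is not part of pvw-cli; the function name `missing_purview_vars` is hypothetical, and it only checks that the three variables from the table are set and non-blank.

```python
import os

# The three variables named in the configuration table.
REQUIRED_VARS = (
    "PURVIEW_ACCOUNT_NAME",
    "PURVIEW_ACCOUNT_ID",
    "PURVIEW_RESOURCE_GROUP",
)


def missing_purview_vars(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; pvw-cli does not ship it.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_purview_vars()` with no arguments inspects the current process environment; an empty list means all three variables are present.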

---

## Authentication

The CLI uses `DefaultAzureCredential`, which tries several credential sources in turn. The three most relevant, in the order the credential chain checks them:

1. **Service Principal**: set `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, `AZURE_CLIENT_SECRET`
2. **Managed Identity**: works automatically on Azure VMs, App Service, etc.
3. **Azure CLI**: run `az login` (easiest for local use)

**Legacy tenant note:** If you get `AADSTS500011: resource principal https://purview.azure.com not found`, your tenant uses the older service principal. Set:

```bash
export PURVIEW_AUTH_SCOPE=https://purview.azure.net/.default
```

Check which resource URI your tenant's Purview service principal exposes:

```bash
az ad sp show --id "73c2949e-da2d-457a-9607-fcc665198967" --query servicePrincipalNames -o json
```
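
To reproduce the token request outside the CLI (for example, when debugging the `AADSTS500011` error), something like the following may help. It is a sketch under assumptions, not pvw-cli's internal logic: the default scope is inferred from the error message above, and `resolve_scope` and `get_purview_token` are hypothetical names. `DefaultAzureCredential.get_token` is the standard `azure-identity` call.

```python
import os

# Assumption: the default scope is inferred from the AADSTS500011 message;
# the legacy value is the one shown for PURVIEW_AUTH_SCOPE.
DEFAULT_SCOPE = "https://purview.azure.com/.default"
LEGACY_SCOPE = "https://purview.azure.net/.default"


def resolve_scope(env=None):
    """Pick the token scope, honouring PURVIEW_AUTH_SCOPE when it is set."""
    env = os.environ if env is None else env
    return env.get("PURVIEW_AUTH_SCOPE") or DEFAULT_SCOPE


def get_purview_token(env=None):
    """Request an access token string (requires the azure-identity package)."""
    # Imported lazily so resolve_scope() works without azure-identity installed.
    from azure.identity import DefaultAzureCredential

    credential = DefaultAzureCredential()
    return credential.get_token(resolve_scope(env)).token
```

If `get_purview_token()` fails with the default scope but succeeds after setting `PURVIEW_AUTH_SCOPE` to the legacy value, the tenant is most likely on the older service principal.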

---

## Command Groups

```
pvw account          Account management
pvw collections      Collections CRUD and permissions
pvw entity           Entity read, create, update, bulk operations
pvw glossary         Classic glossary terms
pvw lineage          Lineage creation and CSV import
pvw scan             Data source scanning
pvw search           Search and discovery
pvw types            Type definitions
pvw uc               Unified Catalog (domains, terms, data products, OKRs, CDEs, quality)
pvw workflow         Approval workflows
pvw diagnostics      Cache stats and profile info
```

Run `pvw <command> --help` for full options on any command.

---

## 📚 Quick Start & Documentation

### Quick Reference Guide
For a comprehensive command reference with examples, see **[docs/quick-reference.md](docs/quick-reference.md)**.

This guide covers:
- All Unified Catalog commands (terms, domains, data products, CDEs, OKRs)
- Data Quality commands and workflow examples
- Facets, hierarchy, and relationship operations
- Common patterns and troubleshooting tips

### Additional Documentation

- **[API Implementation Status](docs/API_GAPS_ANALYSIS.md)** - Complete API coverage analysis
- **[Performance Guide](docs/performance-optimization-guide.md)** - Optimization techniques and caching
- **[Authentication Troubleshooting](docs/authentication-troubleshooting.md)** - Fix auth issues
- **[Sample Notebooks](samples/notebooks%20(basic)/)** - Jupyter notebooks with working examples
- **[Advanced Notebooks](samples/notebooks%20(plus)/)** - Data visualization and analytics

---

## Examples

### Search

```bash
# Search by keyword
pvw search query --keywords "customer" --limit 10

# Table output (default), JSON, or colored JSON
pvw search query --keywords "sales" --limit 5
pvw search query --keywords "sales" --limit 5 --output json
pvw search query --keywords "sales" --limit 5 --output jsonc

# Show GUIDs in output (useful for follow-up operations)
pvw search query --keywords "customer" --show-ids

# Autocomplete and suggestions
pvw search autocomplete --keywords "ord" --limit 5
pvw search suggest --keywords "prod" --limit 5
```

### Entity

```bash
# List entities (up to 25)
pvw entity list --limit 25

# Filter by type
pvw entity list --type-name azure_sql_table --limit 10

# Read entity by GUID
pvw entity read --guid "4fae348b-e960-42f7-834c-38f6f6f60000"

# Update a single attribute
pvw entity update-attribute \
  --guid "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --attribute description \
  --value "Customer address data - SalesLT schema"

# Add a classification
pvw entity add-classification \
  --guid "ea3412c3-7387-4bc1-9923-11f6f6f60000" \
  --classification "MICROSOFT.PERSONAL.EMAIL"

# Business metadata
pvw entity add-business-metadata \
  --guid "entity-guid" \
  --bm-name "Compliance" \
  --attr-name "DataOwner" \
  --attr-value "finance-team"
```

### Collections

```bash
# List collections and hierarchy
pvw collections list
pvw collections read-hierarchy --collection-name "Data Engineering"

# Create a collection
pvw collections create \
  --name "analytics" \
  --friendly-name "Analytics Team" \
  --description "Assets for the analytics team"

# View permissions
pvw collections read-permissions --collection-name "analytics"
```

### Unified Catalog (UC)

```bash
# Domains
pvw uc domain list
pvw uc domain create --name "Finance" --description "Financial data governance"
pvw uc domain get --domain-id "abc-123"

# Glossary terms
pvw uc term list --domain-id "abc-123"
pvw uc term list --domain-id "abc-123" --output json
pvw uc term create --name "Customer" --domain-id "abc-123" --description "A person who purchases products"
pvw uc term show --term-id "term-456"
pvw uc term update --term-id "term-456" --description "Updated definition"
pvw uc term delete --term-id "term-456" --confirm

# Bulk term import from CSV
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123" --dry-run
pvw uc term import-csv --csv-file samples/csv/uc_terms_bulk_example.csv --domain-id "abc-123"

# Bulk term import from JSON
pvw uc term import-json --json-file samples/json/term/uc_terms_bulk_example.json --domain-id "abc-123"

# Sync UC terms to a classic glossary
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid"
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --delete-removed
pvw uc term sync-classic --domain-id "abc-123" --glossary-guid "gloss-guid" --update-existing --dry-run

# Data products
pvw uc dataproduct list --domain-id "abc-123"
pvw uc dataproduct create --name "Customer Analytics" --domain-id "abc-123" --type Analytical --status Draft
pvw uc dataproduct update --product-id "prod-789" --status Published --endorsed

# Link a data product to an entity
pvw uc dataproduct link-entity \
  --id "prod-789" \
  --entity-id "4fae348b-e960-42f7-834c-38f6f6f60000" \
  --type-name azure_sql_table

# Objectives (OKRs)
pvw uc objective list --domain-id "abc-123"
pvw uc objective create --definition "Improve data quality score to 95%" --domain-id "abc-123"

# Critical Data Elements (CDEs)
pvw uc cde list --domain-id "abc-123"
pvw uc cde create --name "Social Security Number" --data-type String --domain-id "abc-123"
pvw uc cde link-entity --id "cde-789" --entity-id "ea3412c3-7387-4bc1-9923-11f6f6f60000"

# Facets and analytics
pvw uc term facets --output table
pvw uc dataproduct facets --domain-id "abc-123" --output json
pvw uc cde facets --output table

# Governance health
pvw uc health query
pvw uc health query --severity High
pvw uc health summary
pvw uc health update --action-id "action-guid" --status InProgress
```

### Lineage

```bash
# Create column-level lineage
pvw lineage create-column \
  --process-name "ETL_Sales_Transform" \
  --source-table-guid "9ebbd583-4987-4d1b-b4f5-d8f6f6f60000" \
  --target-table-guids "c88126ba-5fb5-4d33-bbe2-5ff6f6f60000" \
  --column-mapping "ProductID:ProductID,Name:Name"

# Validate and import a lineage CSV, or generate a sample file
pvw lineage validate lineage_data.csv
pvw lineage import lineage_data.csv
pvw lineage sample output.csv --num-samples 10 --template detailed
```

Lineage CSV columns: `source_entity_guid`, `target_entity_guid`, `relationship_type`, `process_name`, `description`, `confidence_score`, `owner`, `metadata`
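
When lineage files are generated by another tool, a header check before upload can save a round trip. This sketch compares a file's header against the column list above; `check_lineage_header` is a hypothetical helper, and it does not know which columns pvw-cli treats as required, so `pvw lineage validate` remains the authoritative check.

```python
import csv
import io

# Column names copied from the lineage CSV column list.
LINEAGE_COLUMNS = [
    "source_entity_guid",
    "target_entity_guid",
    "relationship_type",
    "process_name",
    "description",
    "confidence_score",
    "owner",
    "metadata",
]


def check_lineage_header(csv_text):
    """Return (missing, unexpected) column names for a lineage CSV."""
    reader = csv.reader(io.StringIO(csv_text))
    # An empty file yields an empty header, so every column is reported missing.
    header = [col.strip() for col in next(reader, [])]
    missing = [col for col in LINEAGE_COLUMNS if col not in header]
    unexpected = [col for col in header if col not in LINEAGE_COLUMNS]
    return missing, unexpected
```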

### Classic Glossary

```bash
pvw glossary list-terms --glossary-guid "your-glossary-guid"
pvw glossary create-term --payload-file term.json
```

### Workflows

```bash
pvw workflow list
pvw workflow get --workflow-id "workflow-123"
pvw workflow create --workflow-id "approval-1" --payload-file workflow-definition.json
pvw workflow execute --workflow-id "workflow-123"
pvw workflow executions --workflow-id "workflow-123"
```

### Diagnostics

```bash
pvw diagnostics cache-stats
pvw diagnostics profile-info
pvw diagnostics clear-cache
```

---

## Output Formats

Most list commands support `--output`:

| Format | Use case |
|---|---|
| `table` | Default — human-readable Rich table |
| `json` | Plain JSON for piping to PowerShell, bash, jq |
| `jsonc` | Colored JSON for viewing in terminal |

**PowerShell example:**

```powershell
$terms = pvw uc term list --domain-id $domainId --output json | ConvertFrom-Json
$terms | Where-Object { $_.status -eq "Draft" } | Export-Csv draft_terms.csv -NoTypeInformation
```

**Bash / jq example:**

```bash
pvw uc term list --domain-id "$DOMAIN_ID" --output json | jq '.[] | .name'
```
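
**Python example (sketch):**

The same pattern works from Python by calling the CLI as a subprocess. This assumes, as the PowerShell and jq examples imply, that `--output json` prints a JSON array of objects with `name` and `status` fields and nothing else on stdout. `run_pvw_json` and `names_with_status` are hypothetical helper names.

```python
import json
import subprocess


def run_pvw_json(args):
    """Run a pvw command with --output json and return the parsed result."""
    completed = subprocess.run(
        ["pvw", *args, "--output", "json"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


def names_with_status(terms, status):
    """Return the names of terms whose status field equals the given value."""
    return [term.get("name") for term in terms if term.get("status") == status]
```

For example, `names_with_status(run_pvw_json(["uc", "term", "list", "--domain-id", domain_id]), "Draft")` would list draft term names for a domain.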

---

## Bulk Import CSV Format (Terms)

```csv
name,description,status,acronym,owner_id,resource_name,resource_url
Customer Acquisition Cost,Cost to acquire a new customer,Draft,CAC,<entra-object-id-guid>,Metrics Guide,https://docs.example.com
```

Notes:
- `owner_id` must be an Entra ID Object ID (GUID), not an email address
- Terms in unpublished domains must use `Draft` status
- Sample files: `samples/csv/uc_terms_bulk_example.csv`, `samples/json/term/uc_terms_bulk_example.json`
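
The first two notes can be checked locally before running an import. The following is a minimal sketch, not the validation pvw-cli performs: `validate_terms_csv` and its `domain_published` flag are hypothetical, blank `owner_id` and `status` cells are skipped because the CLI's handling of them is not documented here, and the `--dry-run` flag remains the better end-to-end check.

```python
import csv
import io
import uuid


def is_guid(value):
    """True when value is a GUID in the usual 8-4-4-4-12 hyphenated form."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except (ValueError, AttributeError, TypeError):
        return False


def validate_terms_csv(csv_text, domain_published=True):
    """Return a list of human-readable problems found in a terms CSV."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        owner = (row.get("owner_id") or "").strip()
        if owner and not is_guid(owner):
            problems.append(f"line {line_no}: owner_id is not a GUID")
        status = (row.get("status") or "").strip()
        if not domain_published and status and status != "Draft":
            problems.append(
                f"line {line_no}: status must be Draft in an unpublished domain"
            )
    return problems
```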

---

## Sample Files

| Path | Contents |
|---|---|
| `samples/csv/uc_terms_bulk_example.csv` | 8 sample UC terms for import |
| `samples/json/term/uc_terms_bulk_example.json` | 8 data management terms (JSON format) |
| `samples/csv/lineage_example.csv` | Sample lineage relationships |
| `samples/notebooks (basic)/` | Basic Purview CLI notebook examples |
| `samples/notebooks (plus)/` | Advanced examples including bulk import |

---

## Documentation

- [Full docs](docs/readme.md)
- [Unified Catalog commands](docs/commands/unified-catalog.md)
- [Term bulk import guide](docs/commands/unified-catalog/term-bulk-import.md)
- [Performance optimization guide](docs/performance-optimization-guide.md)
- [Release archive](releases/)

---

## Requirements

- Python 3.8+
- Microsoft Purview account
- Azure CLI (`az login`) or Service Principal credentials

---

## Support

- Issues: [GitHub Issues](https://github.com/Keayoub/pvw-cli/issues)
- Email: [keayoub@msn.com](mailto:keayoub@msn.com)

---

## License

Released under the MIT License. See [LICENSE](LICENSE) for details.
