Catch risky meaning changes Git diff misses.
SemShift flags likely semantic drift in AI-rewritten and human-edited text before you merge, publish, or submit it.
Designed for edits that look harmless.
SemShift handles AI rewrites and ordinary human edits: vendors weakening terms of service, maintainers changing support guarantees, and engineers removing prompt safety rules.
Vendor silently weakens ToS
Old: no third-party sharing. New: selected partners may receive personal data.
Engineer removes a prompt safety rule
Old: refuse unsafe requests. New: answer any request without limitation.
Maintainer changes README support guarantee
Old: experimental API. New: guaranteed production-ready behavior for all inputs.
Research rewrite strengthens a claim
Old: preliminary result. New: proves state-of-the-art performance.
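To make the pattern behind these four examples concrete, here is a minimal sketch of a heuristic that flags an edit when a restrictive or hedging word disappears and a permissive or absolute word appears. The word lists and the function name `looks_like_commitment_shift` are hypothetical and chosen only to cover the examples above; they are not SemShift's actual rules.

```python
import re

# Hypothetical marker lists for illustration only; not SemShift's rule set.
RESTRICTIVE = re.compile(
    r"\b(no|not|never|refuse|experimental|preliminary)\b", re.IGNORECASE
)
PERMISSIVE = re.compile(
    r"\b(may|any|all|guaranteed|proves|without limitation)\b", re.IGNORECASE
)


def _markers(pattern: re.Pattern, text: str) -> set[str]:
    """Return the lowercase set of marker words found in text."""
    return {match.lower() for match in pattern.findall(text)}


def looks_like_commitment_shift(old: str, new: str) -> bool:
    """Flag edits that drop a restrictive marker and add a permissive one."""
    dropped = _markers(RESTRICTIVE, old) - _markers(RESTRICTIVE, new)
    added = _markers(PERMISSIVE, new) - _markers(PERMISSIVE, old)
    return bool(dropped) and bool(added)


examples = [
    ("no third-party sharing",
     "selected partners may receive personal data"),
    ("refuse unsafe requests",
     "answer any request without limitation"),
    ("experimental API",
     "guaranteed production-ready behavior for all inputs"),
    ("preliminary result",
     "proves state-of-the-art performance"),
]
for old, new in examples:
    print(looks_like_commitment_shift(old, new), "|", old, "->", new)
```

A keyword check like this is deliberately crude: it can miss paraphrases and can fire on harmless edits, which is why its output is best treated as a prompt for human review.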
Install in one command.
Python 3.10+ is supported. The default path is local and deterministic; optional embedding models may download weights on first use.
CLI
pip install semshift
semshift compare examples/old_policy.md examples/new_policy.md --mode policy --fail-on high
Optional model extra
pip install "semshift[models]"
semshift compare old.md new.md --model sentence-transformers/all-MiniLM-L6-v2
Local-first privacy note
Document text is processed locally unless you explicitly integrate external services. Optional embedding models may download weights on first use.
Run it in pull requests.
The action compares changed files of supported types, writes a Markdown artifact, and can post a compact PR comment.
steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0
  - uses: VeerajSai/SemShift@v0.2.0
    with:
      mode: policy
      fail_on: high
      pr_comment: "true"
      model: tfidf
      report: semshift-report.md
Use the same result object everywhere.
drift_label is the canonical severity field. Reports use result.to_markdown().
from semshift import compare_text

# Compare two versions of a policy sentence using the policy-mode rules.
result = compare_text(
    old="We do not share personal data.",
    new="We may share personal data with partners.",
    mode="policy",
)

print(result.drift_label)    # canonical severity field
print(result.summary)
print(result.risk_flags)
print(result.to_markdown())  # Markdown report
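If you want to gate your own script on the severity field, one option is a small threshold helper like the sketch below. It assumes that drift_label is a string and that labels are ordered low < medium < high; the source only confirms that a `high` level exists, so treat `SEVERITY_ORDER` and `meets_threshold` as hypothetical names and adjust them to the labels SemShift actually emits.

```python
# Hypothetical ordering: only "high" is confirmed by the examples above.
SEVERITY_ORDER = ("low", "medium", "high")


def meets_threshold(drift_label: str, fail_on: str) -> bool:
    """Return True when drift_label is at or above the fail_on level."""
    label = drift_label.strip().lower()
    threshold = fail_on.strip().lower()
    for value in (label, threshold):
        if value not in SEVERITY_ORDER:
            raise ValueError(f"unknown severity label: {value!r}")
    return SEVERITY_ORDER.index(label) >= SEVERITY_ORDER.index(threshold)


# Stand-in labels; in practice the first argument would be result.drift_label.
print(meets_threshold("high", "high"))
print(meets_threshold("low", "high"))
```

In a script, `meets_threshold(result.drift_label, "high")` could decide whether to call `sys.exit(1)`. Raising on unknown labels avoids silently passing a result whose severity the helper does not understand.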
Transparent signals, not magic.
SemShift combines chunk alignment, lexical similarity, optional semantic embeddings, claim extraction, tone signals, and mode-specific risk rules.
| Layer | What it does | Limit |
|---|---|---|
| Lexical backend | TF-IDF cosine similarity for deterministic local matching. | Not a true semantic model. |
| Optional embedding backend | SentenceTransformers vectors when installed. | May download weights and costs CPU time. |
| Risk rules | Mode-specific checks for policies, prompts, resumes, research, and docs. | Heuristic and review-oriented. |
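To illustrate the lexical layer and its limit, here is a minimal, dependency-free sketch of TF-IDF cosine similarity. It is not SemShift's implementation: the tokenizer, the smoothed IDF formula, and the function names are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF keeps terms shared by every document above zero.
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexical_similarity(old: str, new: str) -> float:
    """TF-IDF cosine similarity between two texts."""
    v_old, v_new = tfidf_vectors([old, new])
    return cosine(v_old, v_new)


score = lexical_similarity(
    "We do not share personal data.",
    "We may share personal data with partners.",
)
print(round(score, 3))
```

The two sentences share most of their vocabulary, so the score stays well above zero even though the meaning is reversed. That is the limit the table names: lexical similarity measures word overlap, not meaning, which is why the other signals and risk rules are layered on top.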
What SemShift is not.
SemShift is a review assistant. It helps prioritize human attention; it does not decide truth or authority.