local-first review assistant

Catch risky meaning changes Git diff misses.

SemShift flags likely semantic drift in AI-rewritten and human-edited text before you merge, publish, or submit it.

Default backend: Lexical TF-IDF plus transparent heuristics.
Optional backend: Local SentenceTransformers embeddings when installed.
Review stance: Not a legal opinion, fact-checker, or replacement for humans.
semshift-report.md, mode: policy

12  - We do not share personal data with third parties.
13  + We may share personal data with trusted partners.
14    Users can review account settings at any time.

31  - The assistant must refuse unsafe requests.
32  + The assistant should answer any request.

Designed for edits that look harmless.

SemShift handles both AI rewrites and ordinary human edits: vendors changing terms of service, maintainers changing support guarantees, and engineers removing prompt safety rules.

Vendor silently weakens ToS

Old: no third-party sharing. New: selected partners may receive personal data.

Engineer removes a prompt safety rule

Old: refuse unsafe requests. New: answer any request without limitation.

Maintainer changes README support guarantee

Old: experimental API. New: guaranteed production-ready behavior for all inputs.

Research rewrite strengthens a claim

Old: preliminary result. New: proves state-of-the-art performance.

Install in one command.

Python 3.10+ is supported. The default path is local and deterministic; optional embedding models may download weights on first use.

CLI

pip install semshift
semshift compare examples/old_policy.md examples/new_policy.md --mode policy --fail-on high

Optional model extra

pip install "semshift[models]"
semshift compare old.md new.md --model sentence-transformers/all-MiniLM-L6-v2

Local-first privacy note

Document text is processed locally unless you explicitly integrate external services. Optional embedding models may download weights on first use.

Run it in pull requests.

The action compares changed supported files, writes a Markdown artifact, and can post a compact PR comment.

steps:
  - uses: actions/checkout@v4
    with:
      fetch-depth: 0

  - uses: VeerajSai/SemShift@v0.2.0
    with:
      mode: policy
      fail_on: high
      pr_comment: "true"
      model: tfidf
      report: semshift-report.md

Use the same result object everywhere.

drift_label is the canonical severity field. Reports use result.to_markdown().

from semshift import compare_text

result = compare_text(
    old="We do not share personal data.",
    new="We may share personal data with partners.",
    mode="policy",
)

print(result.drift_label)
print(result.summary)
print(result.risk_flags)
print(result.to_markdown())
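
If you want to gate your own script on drift_label, in the spirit of the --fail-on flag, a minimal sketch could look like the following. The severity ordering and the should_fail helper are assumptions made for illustration; they are not part of SemShift's documented API, and the real label set may differ.

```python
# Hypothetical severity ordering; SemShift's actual labels may differ.
SEVERITY_ORDER = ["none", "low", "medium", "high"]


def should_fail(drift_label: str, fail_on: str = "high") -> bool:
    """Return True when drift_label is at or above the fail_on threshold."""
    label = drift_label.strip().lower()
    threshold = fail_on.strip().lower()
    if label not in SEVERITY_ORDER or threshold not in SEVERITY_ORDER:
        # Unknown labels fail closed so nothing slips through silently.
        return True
    return SEVERITY_ORDER.index(label) >= SEVERITY_ORDER.index(threshold)


if __name__ == "__main__":
    for label in ["low", "medium", "high"]:
        print(label, should_fail(label, fail_on="high"))
```

In a script you would pass result.drift_label as the first argument and exit non-zero when the helper returns True. Failing closed on unknown labels is a design choice for review tooling, not a requirement.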

Transparent signals, not magic.

SemShift combines chunk alignment, lexical similarity, optional semantic embeddings, claim extraction, tone signals, and mode-specific risk rules.
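
To make "transparent heuristics" concrete, here is a rough sketch of what one risk rule could look like: it flags a sentence pair when strong obligation or prohibition wording disappears and weaker wording takes its place. The patterns and the obligation_weakened function are hypothetical and are not SemShift's actual rules.

```python
import re

# Illustrative patterns only; these are not SemShift's real rule set.
STRONG_OBLIGATION = re.compile(
    r"\b(must|shall|will not|do not|does not|never)\b", re.IGNORECASE
)
WEAK_OBLIGATION = re.compile(r"\b(may|should|might|can|could)\b", re.IGNORECASE)


def obligation_weakened(old: str, new: str) -> bool:
    """Flag a pair when strong wording is dropped and weaker wording appears."""
    old_strong = bool(STRONG_OBLIGATION.search(old))
    new_strong = bool(STRONG_OBLIGATION.search(new))
    new_weak = bool(WEAK_OBLIGATION.search(new))
    return old_strong and not new_strong and new_weak


if __name__ == "__main__":
    pairs = [
        ("We do not share personal data with third parties.",
         "We may share personal data with trusted partners."),
        ("The assistant must refuse unsafe requests.",
         "The assistant should answer any request."),
    ]
    for old, new in pairs:
        print(obligation_weakened(old, new), "|", old, "->", new)
```

A rule like this is easy to read and audit, but it will miss paraphrases and can misfire on unrelated uses of the same words, so its output is a prompt for review, not a verdict.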

Layer | What it does | Limit
Lexical backend | TF-IDF cosine similarity for deterministic local matching. | Not a true semantic model.
Optional embedding backend | SentenceTransformers vectors when installed. | May download weights and costs CPU time.
Risk rules | Mode-specific checks for policies, prompts, resumes, research, and docs. | Heuristic and review-oriented.
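
For readers unfamiliar with the lexical layer, the following is a minimal standard-library sketch of TF-IDF cosine similarity between two sentences. It illustrates the general technique only; the tokenizer, the smoothed IDF formula, and the function names are assumptions and do not reflect SemShift's internal implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build TF-IDF vectors (raw term count times smoothed IDF) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


old = "We do not share personal data with third parties."
new = "We may share personal data with trusted partners."
old_vec, new_vec = tfidf_vectors([old, new])
similarity = cosine(old_vec, new_vec)
print(f"lexical similarity: {similarity:.2f}")
```

The two sentences share much of their vocabulary even though the commitment is reversed, so a lexical score alone cannot tell that the meaning flipped. That is the gap the "Not a true semantic model" limit points at, and one reason to pair lexical matching with risk rules.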

What SemShift is not.

SemShift is a review assistant. It helps prioritize human attention; it does not decide truth or authority.

Not legal advice.
Not a fact-checker.
Not scientific authority.
Not a replacement for human review.
Starter benchmark results are for regression tracking only, not external validation or scientific claims.