# adversarial_inputs.txt -- SMS sanitizer edge-case corpus
# Format: each case is three lines:
#   # CASE  : <human description of the offender(s)>
#   # EXPECT: <exact sanitized output, default drop_unknown=True>
#   <raw input line>
# Blank lines separate cases. EXPECT/raw are stored on ONE physical line each;
# real newlines inside an input are shown escaped as \n in BOTH input and EXPECT.
# Generated against smssafe.charsets (single source of truth).

# CASE  : smart single quote (apostrophe) U+2019
# EXPECT: Don't miss your payment
Don’t miss your payment

# CASE  : smart double quotes U+201C/201D
# EXPECT: "Karibu" kwenye Briq
“Karibu” kwenye Briq

# CASE  : em dash U+2014
# EXPECT: Pay now-thank you
Pay now—thank you

# CASE  : en dash range U+2013
# EXPECT: Open 9-5pm today
Open 9–5pm today

# CASE  : minus/figure dashes U+2212/U+2012
# EXPECT: Bal -5,000 -2,000
Bal −5,000 ‒2,000

# CASE  : bullet U+2022 list
# EXPECT: - Loan 50000\n- Due Friday
• Loan 50000\n• Due Friday

# CASE  : triangular + white bullet U+2023/U+25E6
# EXPECT: - one - two
‣ one ◦ two

# CASE  : horizontal ellipsis U+2026
# EXPECT: Tafadhali subiri...
Tafadhali subiri…

# CASE  : midline ellipsis U+22EF
# EXPECT: 1, 2, 3... done
1, 2, 3⋯ done

# CASE  : no-break space U+00A0
# EXPECT: Lipa TZS 5,000 leo
Lipa TZS 5,000 leo

# CASE  : narrow + thin space U+202F/U+2009
# EXPECT: 5 000 TZS
5 000 TZS

# CASE  : ideographic space U+3000
# EXPECT: Briq SMS
Briq　SMS

# CASE  : zero-width space inside word U+200B
# EXPECT: Hello world
He​llo wor​ld

# CASE  : zero-width joiner/non-joiner U+200D/200C
# EXPECT: testing
test‍‌ing

# CASE  : BOM / zero-width no-break space at start U+FEFF
# EXPECT: Welcome to Briq
﻿Welcome to Briq

# CASE  : word joiner U+2060
# EXPECT: BriqSMS
Briq⁠SMS

# CASE  : soft hyphen U+00AD (invisible)
# EXPECT: registration
regis­tration

# CASE  : tilde U+007E is GSM-7 extended -> passes through (preserves "approx")
# EXPECT: Approx ~5000 TZS
Approx ~5000 TZS

# CASE  : tilde operator U+223C -> ASCII tilde
# EXPECT: ~1hr remaining
∼1hr remaining

# CASE  : fullwidth tilde U+FF5E -> ASCII tilde
# EXPECT: 9~5pm
9～5pm

# CASE  : logical not U+00AC -> dash
# EXPECT: Status: -active
Status: ¬active

# CASE  : backtick U+0060 -> apostrophe
# EXPECT: Use 'OTP' now
Use `OTP` now

# CASE  : acute accent U+00B4 -> apostrophe
# EXPECT: ng'ombe
ng´ombe

# CASE  : Swahili modifier apostrophe U+02BC
# EXPECT: ng'ombe wengi
ngʼombe wengi

# CASE  : turned comma okina U+02BB
# EXPECT: m'mojawapo
mʻmojawapo

# CASE  : trademark U+2122
# EXPECT: BriqTM Platform
Briq™ Platform

# CASE  : copyright U+00A9
# EXPECT: (c) 2026 Briq
© 2026 Briq

# CASE  : registered U+00AE
# EXPECT: Visa(R) card
Visa® card

# CASE  : numero sign U+2116
# EXPECT: Acct No. 4471
Acct № 4471

# CASE  : care-of U+2105
# EXPECT: Juma c/o Asha
Juma ℅ Asha

# CASE  : rupee U+20B9
# EXPECT: INR500 fee
₹500 fee

# CASE  : bitcoin U+20BF
# EXPECT: Send BTC0.5
Send ₿0.5

# CASE  : naira U+20A6
# EXPECT: NGN10,000 balance
₦10,000 balance

# CASE  : baht U+0E3F lookalike
# EXPECT: THB300 charge
฿300 charge

# CASE  : cent U+00A2
# EXPECT: 50c fee
50¢ fee

# CASE  : right arrow U+2192
# EXPECT: Login -> Dashboard
Login → Dashboard

# CASE  : double right arrow U+21D2
# EXPECT: OTP => verify
OTP ⇒ verify

# CASE  : left-right arrow U+2194
# EXPECT: Sync A<->B
Sync A↔B

# CASE  : black/marketing arrow U+27A1
# EXPECT: Tap here -> pay
Tap here ➡ pay

# CASE  : almost-equal U+2248 -> '='
# EXPECT: Total = 100000
Total ≈ 100000

# CASE  : not-equal U+2260 -> '!='
# EXPECT: PIN != 0000
PIN ≠ 0000

# CASE  : >= and <= U+2265/U+2264
# EXPECT: Age >= 18 and bal <= 50
Age ≥ 18 and bal ≤ 50

# CASE  : plus-minus U+00B1
# EXPECT: 5000 +/- 100
5000 ± 100

# CASE  : multiplication sign U+00D7
# EXPECT: 2 x 3 = 6
2 × 3 = 6

# CASE  : division sign U+00F7
# EXPECT: 10 / 2
10 ÷ 2

# CASE  : degree sign U+00B0
# EXPECT: Joto 25 degC leo
Joto 25°C leo

# CASE  : superscript two U+00B2
# EXPECT: Area 5m2
Area 5m²

# CASE  : vulgar fraction half U+00BD
# EXPECT: Pay 1/2 sasa
Pay ½ sasa

# CASE  : infinity U+221E
# EXPECT: Data inf bure
Data ∞ bure

# CASE  : micro sign U+00B5
# EXPECT: 10us delay
10µs delay

# CASE  : star U+2605 + check U+2713
# EXPECT: * VIP OK verified
★ VIP ✓ verified

# CASE  : ballot x U+2717
# EXPECT: Status x failed
Status ✗ failed

# CASE  : legacy smiley U+263A
# EXPECT: Karibu :)
Karibu ☺

# CASE  : heavy heart U+2764 (BMP)
# EXPECT: We <3 Briq
We ❤ Briq

# CASE  : OE ligature U+0152/0153
# EXPECT: Coeur OEuvre
Cœur Œuvre

# CASE  : fi ligature U+FB01
# EXPECT: Confirm now
Conﬁrm now

# CASE  : d-stroke / l-stroke U+0111/U+0142
# EXPECT: dido lukasz
đido łukasz

# CASE  : thorn / eth U+00FE/U+00F0
# EXPECT: thord name
þorð name

# CASE  : Turkish dotless i / dotted I U+0131/U+0130
# EXPECT: Istanbul kiz
İstanbul kız

# CASE  : Cyrillic capital homoglyphs (whole-word spoof)
# EXPECT: COCA
СОСА

# CASE  : Cyrillic 'Apple' spoof U+0410
# EXPECT: Apple Pay
Аpple Pay

# CASE  : Cyrillic lowercase spoof in 'scope'
# EXPECT: scope
sсope

# CASE  : Greek capital homoglyphs 'BETA'
# EXPECT: BETA
ΒΕΤΑ

# CASE  : Greek small omicron/nu U+03BF/03BD
# EXPECT: logv
lοgν

# CASE  : fullwidth latin uppercase U+FF21+
# EXPECT: HELLO
ＨＥＬＬＯ

# CASE  : fullwidth digits U+FF10+
# EXPECT: OTP 123456
OTP １２３４５６

# CASE  : fullwidth punctuation
# EXPECT: Halo! Balance: 5000?
Halo！ Balance： 5000？

# CASE  : uppercase accent traps U+00C0..U+00D9 -> AEIOU
# EXPECT: AEIOU
ÀÈÌÒÙ

# CASE  : macron a (non-GSM accent) U+0101 -> a
# EXPECT: Mwanza
Mwānzā

# CASE  : c/s/z with caron U+010D/U+0161/U+017E -> base
# EXPECT: csz test
čšž test

# CASE  : literal triple hyphen collapse
# EXPECT: Section A -- Section B
Section A --- Section B

# CASE  : many spaces collapse
# EXPECT: Lipa leo sasa
Lipa     leo     sasa

# CASE  : nbsp + bullet + em dash + ellipsis combo
# EXPECT: - Lipa TZS 5 000-leo...
• Lipa TZS 5 000—leo…

# CASE  : guillemets U+00AB/00BB as quotes
# EXPECT: "Karibu" Briq
«Karibu» Briq

# CASE  : single angle quotes U+2039/203A
# EXPECT: <OTP> 4471
‹OTP› 4471

# CASE  : CJK corner bracket quotes U+300C/300D
# EXPECT: "OTP" ni 4471
「OTP」 ni 4471

# CASE  : lenticular brackets U+3010/3011
# EXPECT: [Briq] Alert
【Briq】 Alert

# CASE  : broken bar U+00A6 -> pipe
# EXPECT: A|B
A¦B

# CASE  : reference mark U+203B -> asterisk
# EXPECT: * Masharti yanatumika
※ Masharti yanatumika

# CASE  : double exclamation U+203C
# EXPECT: HARAKA!!
HARAKA‼

# CASE  : emoji (non-BMP) stripped U+1F600
# EXPECT: Karibu Briq
Karibu 😀 Briq

# CASE  : math bold alnum (non-BMP) U+1D400..U+1D402 stripped; leading space kept
# EXPECT:  code
𝐀𝐁𝐂 code

# CASE  : rocket emoji marketing stripped U+1F680
# EXPECT: Launch today
Launch 🚀 today

# CASE  : flag emoji (non-BMP pair U+1F1F9 U+1F1FF) stripped; trailing space kept
# EXPECT: Tanzania 
Tanzania 🇹🇿

# CASE  : mixed real-world loan reminder
# EXPECT: Mteja's loan-TZS 50 000-due 5-9 Jun... Lipa - leo
Mteja’s loan—TZS 50 000—due 5–9 Jun… Lipa • leo
