% Preprocessor for pbu-Arab
% Normalisation of common orthographic variants and simple orthographic repairs.
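% Rules use Epitran rewrite rule syntax: a -> b / X _ Y (0 stands for the empty string).
% Comments begin with % and blank lines are allowed. Keep each comment on its own line;
% a trailing comment after a rule may be read as part of the rule's right-hand context.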

% ------------------------
% Normalise Arabic/Persian variants to Pashto canonical letters
% ------------------------
ك -> ک / _

ى -> ی / _
گ -> ګ / _
ہ -> ه / _
ة -> ه / _
% ۀ (heh with yeh above) is deliberately left unchanged; the map handles it.
ۀ -> ۀ / _

% ------------------------
% Remove typographic noise
% ------------------------
% Tatweel (kashida, U+0640)
ـ -> 0 / _
% If your files include ZERO WIDTH NON-JOINER (U+200C) add a rule to remove it.
% (You may need to insert the character directly into this file.)

% ------------------------
% Gemination (shadda) expansion
% For orthographic sequences C + U+0651, duplicate the consonant before mapping.
% Each consonant is expanded by an explicit rule instead of a backreference in the
% replacement, so the rules do not depend on how Epitran handles regex groups.
% ------------------------
پّ -> پپ / _
بّ -> بب / _
تّ -> تت / _
ټّ -> ټټ / _
ثّ -> ثث / _
جّ -> جج / _
چّ -> چچ / _
څّ -> څڅ / _
ځّ -> ځځ / _
دّ -> دد / _
ډّ -> ډډ / _
رّ -> رر / _
ړّ -> ړړ / _
زّ -> زز / _
ژّ -> ژژ / _
سّ -> سس / _
شّ -> شش / _
صّ -> صص / _
ضّ -> ضض / _
طّ -> طط / _
ظّ -> ظظ / _
فّ -> فف / _
قّ -> قق / _
کّ -> کک / _
ګّ -> ګګ / _
لّ -> لل / _
مّ -> مم / _
نّ -> نن / _
غّ -> غغ / _
خّ -> خخ / _
حّ -> حح / _
هّ -> هه / _
عّ -> عع / _
ءّ -> ءء / _
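% Pashto-specific consonants not covered by the list above (ښ, ږ, ڼ).
% Assumed extension following the same pattern; remove these lines if the
% letters were left out deliberately.
ښّ -> ښښ / _
ږّ -> ږږ / _
ڼّ -> ڼڼ / _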

% ------------------------
% Simple orthographic repairs (hamza-carrying letters -> plain carrier letter)
% ------------------------
ؤ -> و / _
ئ -> ی / _

% ------------------------
% Handle و as consonant in specific contexts
% ------------------------
% Rewrite و as w between ژ and ن (as in ژوند), where it is a consonant rather than a vowel.
و -> w / ژ _ ن

% End of preprocessor