API Reference

This section provides detailed information about the classes and functions provided by Zink.

Main Module (zink.zink)

zink.zink.prep(text, words)[source]

Prepares text for redaction by wrapping specified words in asterisks. These words will be excluded from redaction.
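As a rough illustration of the wrapping described above, the sketch below marks each listed word with asterisks. prep_sketch is a hypothetical stand-in written for this example, not Zink's implementation; whole-word, case-sensitive matching is an assumption.

```python
import re

def prep_sketch(text, words):
    """Wrap each whole-word occurrence of the given words in asterisks."""
    for word in words:
        # \b anchors keep "Ann" from matching inside "Anna" (assumed behaviour).
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, lambda m: "*" + m.group(0) + "*", text)
    return text

marked = prep_sketch("Anna met Ann in Paris.", ["Ann", "Paris"])
print(marked)
```

In this sketch, only the exact words passed in are marked; everything else is left for the redaction step to handle.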

zink.zink.redact(text, categories=None, placeholder=None, use_cache=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4, numbered_entities=False)[source]

Module-level convenience function that uses a global instance for caching. If 'auto_parallel' is True and len(text) > chunk_size, a concurrency-based pipeline is used; otherwise, single-pass logic is used.
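The dispatch rule above can be sketched as follows. This is a minimal illustration, not Zink's code: process_chunk and run_pipeline are hypothetical names, the fixed-width split is an assumption, and a thread pool is only one possible way to provide the concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    # Hypothetical stand-in for one redaction pass over a piece of text.
    return chunk.replace("secret", "[REDACTED]")

def run_pipeline(text, auto_parallel=False, chunk_size=1000, max_workers=4):
    # Single pass unless parallelism is requested and the text is long enough.
    if not (auto_parallel and len(text) > chunk_size):
        return process_chunk(text)
    # Naive fixed-width split (assumption); a real splitter would avoid
    # cutting through the middle of a word or entity.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so joining restores the original layout.
        return "".join(pool.map(process_chunk, chunks))

short_result = run_pipeline("a secret", auto_parallel=True)
long_result = run_pipeline("secret " * 10, auto_parallel=True, chunk_size=7)
```

The short input stays on the single-pass path because its length does not exceed chunk_size, even though auto_parallel is True.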

zink.zink.refresh_mapping_file()[source]

Deletes the persistent mapping file if it exists.

zink.zink.replace(text, categories=None, user_replacements=None, ensure_consistency=True, use_cache=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4)[source]

Module-level convenience function that uses a global instance for caching.

zink.zink.replace_with_my_data(text, categories=None, user_replacements=None, ensure_consistency=True, use_json_mapping=True, extractor=None, merger=None, replacer=None, auto_parallel=False, chunk_size=1000, max_workers=4)[source]

Module-level convenience function. 'replace_with_my_data' typically does not rely on caching, but concurrency can still be used for large texts when 'auto_parallel' is True.

zink.zink.shield(target_arg, labels=None, **zink_kwargs)[source]

A decorator that provides a full anonymization/re-identification "shield" for a function call.

It anonymizes a specific input argument, calls the decorated function, and then automatically re-identifies the function's string output.

Parameters:
  • target_arg (str or int) -- The name (str) or position (int) of the input argument to anonymize.

  • labels (tuple or list) -- The entity labels to anonymize. Required.

  • **zink_kwargs -- Additional keyword arguments for the underlying zn.redact function.
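The anonymize, call, re-identify flow described above can be sketched with a toy decorator. Everything here is hypothetical: shield_sketch is not the Zink decorator, and the fixed mapping dictionary stands in for the label-driven entity detection that the real shield performs.

```python
import functools

def shield_sketch(target_arg, mapping):
    """Toy decorator: swap real values for placeholders on the way in,
    then swap them back in the string output."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            args = list(args)
            # target_arg is a positional index (int) or a keyword name (str).
            # Simplification: a str target must be passed as a keyword.
            store = args if isinstance(target_arg, int) else kwargs
            value = store[target_arg]
            for real, placeholder in mapping.items():
                value = value.replace(real, placeholder)
            store[target_arg] = value
            output = func(*args, **kwargs)
            # Re-identify: restore the real values in the output.
            for real, placeholder in mapping.items():
                output = output.replace(placeholder, real)
            return output
        return wrapper
    return decorator

seen = []  # records what the wrapped function actually received

@shield_sketch(target_arg=0, mapping={"Alice": "person_1"})
def echo(prompt):
    seen.append(prompt)
    return "Echo: " + prompt

result = echo("Hello Alice")
```

The wrapped function only ever sees the placeholder, while the caller gets the original name back in the result.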

zink.zink.where_mapping_file()[source]

Returns the path to the persistent mapping file.

Extractor Module (zink.extractor)

class zink.extractor.EntityExtractor(model_name='deepanwa/NuNerZero_onnx')[source]

Bases: object

predict(text, labels=None, max_passes=2)[source]

Iteratively finds entities by masking found entities and re-running the model.

Parameters:
  • text (str) -- The input text.

  • labels (list of str, optional) -- Entity labels to predict. Defaults to None.

  • max_passes (int) -- The maximum number of masking passes to run; a safeguard against potential infinite loops. Defaults to 2.

Returns:

A list of all unique entities found across all passes.

Return type:

list of dict
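The mask-and-rerun loop can be illustrated with a toy model. This is only a sketch of the general idea: toy_model is a hypothetical stand-in that reports a single capitalised word per call, and the same-length '#' mask is an assumption, not a description of how EntityExtractor masks text.

```python
import re

def toy_model(text):
    # Hypothetical stand-in for the NER model: reports only the first
    # capitalised word it sees.
    match = re.search(r"\b[A-Z][a-z]+\b", text)
    if match is None:
        return []
    return [{"start": match.start(), "end": match.end(),
             "text": match.group(0), "label": "name"}]

def iterative_predict(text, max_passes=2):
    found = []
    masked = text
    for _ in range(max_passes):
        new_entities = toy_model(masked)
        if not new_entities:
            break  # nothing left to find
        for ent in new_entities:
            found.append(ent)
            # A same-length mask keeps later offsets aligned with the original.
            width = ent["end"] - ent["start"]
            masked = masked[:ent["start"]] + "#" * width + masked[ent["end"]:]
    return found

two_passes = iterative_predict("Alice emailed Bob and Carol", max_passes=2)
five_passes = iterative_predict("Alice emailed Bob and Carol", max_passes=5)
```

With max_passes=2 the loop stops before reaching the third name, which shows how the cap trades completeness for a bounded run time.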

predict2(text, labels=None)[source]

Performs a highly detailed entity extraction using a memory-efficient, thread-based parallel approach.

predict_thorough(text, labels=None)[source]

Performs a highly detailed entity extraction using a hybrid approach. For each chunk of labels, it runs a two-pass process with internal, temporary masking to find a comprehensive set of entities. It then resolves overlaps between all found entities across all chunks by keeping the label with the highest confidence score for each unique text span.

Parameters:
  • text (str) -- The input text.

  • labels (list of str, optional) -- Entity labels to predict.

Returns:

A list of the highest-confidence entities for each unique span.

Return type:

list of dict
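The final overlap-resolution step, keeping the highest-confidence label for each unique span, might look like the sketch below. keep_best_per_span is a hypothetical helper, and the 'score' key on each entity dictionary is an assumption made for this example.

```python
def keep_best_per_span(entities):
    """Keep one entity per (start, end) span: the one with the highest score."""
    best = {}
    for ent in entities:
        span = (ent["start"], ent["end"])
        if span not in best or ent["score"] > best[span]["score"]:
            best[span] = ent
    # Return entities in reading order.
    return sorted(best.values(), key=lambda e: e["start"])

candidates = [
    {"start": 0, "end": 5, "text": "Paris", "label": "city", "score": 0.62},
    {"start": 0, "end": 5, "text": "Paris", "label": "person", "score": 0.91},
    {"start": 9, "end": 15, "text": "France", "label": "country", "score": 0.88},
]
resolved = keep_best_per_span(candidates)
```

Two label chunks proposed different labels for the same span here, and only the higher-scoring one survives.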

Merger Module (zink.merger)

class zink.merger.EntityMerger[source]

Bases: object

Merges entities based on their labels and positions in the text. This class is designed to handle entities that are close together or have the same label, merging them into a single entity when appropriate.

merge(entities, text)[source]
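One plausible merging rule, shown purely as an illustration, joins neighbouring entities that share a label and are separated by nothing but a small whitespace gap. merge_adjacent and its max_gap parameter are hypothetical; the actual criteria used by EntityMerger.merge are not documented here.

```python
def merge_adjacent(entities, text, max_gap=1):
    """Merge same-label entities separated by at most max_gap whitespace chars."""
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if prev is not None and prev["label"] == ent["label"]:
            gap = text[prev["end"]:ent["start"]]
            if len(gap) <= max_gap and gap.strip() == "":
                # Extend the previous entity to cover this one.
                prev["end"] = max(prev["end"], ent["end"])
                prev["text"] = text[prev["start"]:prev["end"]]
                continue
        merged.append(dict(ent))  # copy so the input list is not mutated
    return merged

text = "Dr Jane Doe lives in New York"

def make_entity(word, label):
    # Helper for the example: locate the word so offsets are always correct.
    start = text.index(word)
    return {"start": start, "end": start + len(word), "text": word, "label": label}

pieces = [make_entity("Jane", "person"), make_entity("Doe", "person"),
          make_entity("New", "location"), make_entity("York", "location")]
merged = merge_adjacent(pieces, text)
```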

Result Module (zink.result)

class zink.result.PseudonymizationResult(original_text: str, anonymized_text: str, replacements: ~typing.List[~zink.result.ReplacementDetail] = <factory>, features: ~typing.Dict = <factory>)[source]

Bases: object

Result of the pseudonymization process.

anonymized_text: str
features: Dict
original_text: str
replacements: List[ReplacementDetail]

class zink.result.ReplacementDetail(label: str, original: str, pseudonym: str, start: int, end: int, score: float)[source]

Bases: object

Details about the replacement of a sensitive entity.

end: int
label: str
original: str
pseudonym: str
score: float
start: int
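To show how the two result types fit together, the sketch below defines local mirrors of them and builds one result by hand. The mirrors assume both classes are dataclasses (the <factory> defaults in the signature suggest this), and treating start and end as offsets into the original text is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ReplacementDetail:
    # Local mirror of zink.result.ReplacementDetail, for illustration only.
    label: str
    original: str
    pseudonym: str
    start: int
    end: int
    score: float

@dataclass
class PseudonymizationResult:
    # Local mirror of zink.result.PseudonymizationResult, for illustration only.
    original_text: str
    anonymized_text: str
    replacements: List[ReplacementDetail] = field(default_factory=list)
    features: Dict = field(default_factory=dict)

detail = ReplacementDetail(label="person", original="Alice", pseudonym="Dana",
                           start=5, end=10, score=0.97)
result = PseudonymizationResult(original_text="Call Alice",
                                anonymized_text="Call Dana",
                                replacements=[detail])

# Build a pseudonym -> original lookup from the recorded replacements.
reverse_lookup = {d.pseudonym: d.original for d in result.replacements}
```

A lookup like reverse_lookup is the kind of information needed to map pseudonyms back to their originals later.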

Replacer Subpackage (zink.replacer)

class zink.replacer.EntityReplacer(use_json_mapping=False)[source]

replace_entities(entities, text, user_replacements=None)[source]

Replace entities in the text with pseudonyms, with randomized replacements.

Parameters:
  • entities (list of dict) -- A list of dictionaries, each containing 'start', 'end', 'label', and 'text'.

  • text (str) -- The original text.

  • user_replacements (dict, optional) -- A dictionary of user-defined replacements for specific entity labels. If provided, these will override the JSON-based mappings.

Returns:

The text with entities replaced by pseudonyms.

Return type:

str

replace_entities_ensure_consistency(entities, text, user_replacements=None)[source]

Replace entities in the text with pseudonyms, ensuring consistent replacements.

Parameters:
  • entities (list of dict) -- A list of dictionaries, each containing 'start', 'end', 'label', and 'text'.

  • text (str) -- The original text.

  • user_replacements (dict, optional) -- A dictionary of user-defined replacements for specific entity labels. If provided, these will override the JSON-based mappings.

Returns:

The text with entities replaced by pseudonyms.

Return type:

str
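Consistency here means that the same original entity always receives the same pseudonym within a text. The sketch below shows one way to get that behaviour; replace_consistently and the shape of the pseudonyms argument (a label mapped to a list of candidates) are hypothetical and do not describe EntityReplacer's internals.

```python
import re

def replace_consistently(entities, text, pseudonyms):
    """Replace entities so that repeated originals share one pseudonym."""
    mapping = {}
    used = {}  # how many pseudonyms have been handed out per label
    # Assign pseudonyms in reading order so the result is deterministic.
    for ent in sorted(entities, key=lambda e: e["start"]):
        key = (ent["label"], ent["text"])
        if key not in mapping:
            pool = pseudonyms[ent["label"]]
            index = used.get(ent["label"], 0)
            mapping[key] = pool[index % len(pool)]
            used[ent["label"]] = index + 1
    # Apply right to left so earlier offsets stay valid as lengths change.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        pseudonym = mapping[(ent["label"], ent["text"])]
        text = text[:ent["start"]] + pseudonym + text[ent["end"]:]
    return text

source = "Alice called Bob. Alice left."
# Build entity dicts with the documented keys: start, end, label, text.
found = [{"start": m.start(), "end": m.end(), "label": "person", "text": m.group(0)}
         for m in re.finditer(r"Alice|Bob", source)]
replaced = replace_consistently(found, source, {"person": ["Dana", "Eli"]})
```

Both occurrences of the first name map to the same pseudonym, which is the property that distinguishes this from purely randomized replacement.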

This subpackage provides various replacement strategies. It is used internally by the main zink.replace function.