anonymity.tools package

Subpackages

Module contents

anonymity.tools.data_fly(table: DataFrame, ident: List | ndarray, qi: List | ndarray, k: int, supp_threshold: int, hierarchies: dict = {}) → DataFrame

Data-fly generalization algorithm for k-anonymity.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • ident (list of strings) – names of the dataframe columns that are identifiers.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • k (int) – desired level of k-anonymity.

  • supp_threshold (int) – level of suppression allowed.

  • hierarchies (dictionary) – hierarchies for generalization of columns.

Returns:

anonymized table.

Return type:

pandas dataframe
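
The Datafly idea behind this function can be illustrated with a minimal, standard-library sketch. It is not the package's implementation: it works on a list of dict rows rather than a DataFrame, the hierarchy format (a list of value-to-generalization dicts per column) is a hypothetical one chosen for the example, and supp_threshold is assumed to mean the maximum number of rows that may be suppressed.

```python
from collections import Counter


def datafly_sketch(rows, qi, k, supp_threshold, hierarchies):
    """Simplified Datafly-style generalization over a list of dict rows.

    hierarchies[col] is a list of dicts (hypothetical format): entry i maps
    an original value to its generalization at level i + 1.
    """
    current = [dict(r) for r in rows]  # working copy; input rows stay untouched
    level = {col: 0 for col in qi}

    def small_class_rows():
        # Indices of rows whose equivalence class has fewer than k members.
        sizes = Counter(tuple(r[c] for c in qi) for r in current)
        return [i for i, r in enumerate(current)
                if sizes[tuple(r[c] for c in qi)] < k]

    while len(small_class_rows()) > supp_threshold:
        candidates = [c for c in qi if level[c] < len(hierarchies.get(c, []))]
        if not candidates:
            break  # nothing left to generalize
        # Datafly heuristic: generalize the column with the most distinct values.
        col = max(candidates, key=lambda c: len({r[c] for r in current}))
        mapping = hierarchies[col][level[col]]
        for cur, orig in zip(current, rows):
            cur[col] = mapping[orig[col]]
        level[col] += 1

    # Suppress whatever rows still sit in classes smaller than k.
    drop = set(small_class_rows())
    return [r for i, r in enumerate(current) if i not in drop]


rows = [
    {"age": 23, "zip": "39001"},
    {"age": 27, "zip": "39002"},
    {"age": 34, "zip": "39001"},
    {"age": 38, "zip": "39002"},
]
hierarchies = {
    "age": [{23: "20-29", 27: "20-29", 34: "30-39", 38: "30-39"}],
    "zip": [{"39001": "3900*", "39002": "3900*"}],
}
anonymized = datafly_sketch(rows, ["age", "zip"], k=2, supp_threshold=0,
                            hierarchies=hierarchies)
```

In this toy input both columns end up generalized one level, after which every combination of quasi-identifier values appears at least twice and no row has to be suppressed.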

anonymity.tools.incognito(table: DataFrame, ident: List | ndarray, qi: List | ndarray, k: int, supp_threshold: int, hierarchies: dict) → DataFrame

Incognito generalization algorithm for k-anonymity.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • ident (list of strings) – names of the dataframe columns that are identifiers.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • k (int) – desired level of k-anonymity.

  • supp_threshold (int) – level of suppression allowed.

  • hierarchies (dictionary) – hierarchies for generalization of columns.

Returns:

anonymized table.

Return type:

pandas dataframe
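
Incognito searches the lattice of generalization levels for the least generalized combination that satisfies k-anonymity. The sketch below is a simplification written for illustration only: it walks the whole lattice in order of total generalization height and omits the subset-based pruning that makes the real algorithm efficient. The row format, the hierarchy format, and the helper names are all assumptions, not part of the package.

```python
from collections import Counter
from itertools import product


def generalize(rows, qi, hierarchies, levels):
    """Return a copy of rows with each QI column raised to the given level."""
    out = []
    for r in rows:
        g = dict(r)
        for col, lvl in zip(qi, levels):
            if lvl > 0:
                # hierarchies[col][lvl - 1] maps original value -> level-lvl value
                g[col] = hierarchies[col][lvl - 1][r[col]]
        out.append(g)
    return out


def rows_below_k(rows, qi, k):
    """Count rows that belong to equivalence classes smaller than k."""
    sizes = Counter(tuple(r[c] for c in qi) for r in rows)
    return sum(n for n in sizes.values() if n < k)


def lattice_search(rows, qi, k, supp_threshold, hierarchies):
    """Try level vectors from least to most generalized; return the first hit."""
    heights = [range(len(hierarchies[c]) + 1) for c in qi]
    for levels in sorted(product(*heights), key=sum):
        candidate = generalize(rows, qi, hierarchies, levels)
        # Rows still in small classes would have to be suppressed afterwards.
        if rows_below_k(candidate, qi, k) <= supp_threshold:
            return levels, candidate
    return None, None


rows = [
    {"age": 23, "zip": "39001"},
    {"age": 27, "zip": "39002"},
    {"age": 34, "zip": "39001"},
    {"age": 38, "zip": "39002"},
]
hierarchies = {
    "age": [{23: "20-29", 27: "20-29", 34: "30-39", 38: "30-39"}],
    "zip": [{"39001": "3900*", "39002": "3900*"}],
}
levels, table = lattice_search(rows, ["age", "zip"], k=2, supp_threshold=0,
                               hierarchies=hierarchies)
```

Here `levels` holds one generalization level per quasi-identifier, in the same order as the `qi` list.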

anonymity.tools.k_anonymity(table: DataFrame, hierarchies: dict, k: int, qi: List | ndarray, supp_threshold: int, ident: List | ndarray, method: str) → DataFrame

Generalization algorithm for k-anonymity. Falls back to data-fly if the method is not specified correctly.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • ident (list of strings) – names of the dataframe columns that are identifiers.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • k (int) – desired level of k-anonymity.

  • supp_threshold (int) – level of suppression allowed.

  • hierarchies (dictionary) – hierarchies for generalization of columns.

  • method (string) – name of the anonymization method that we want to use.

Returns:

anonymized table.

Return type:

pandas dataframe
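
To check what level of k-anonymity a table actually reaches, one can compute the size of its smallest equivalence class over the quasi-identifiers. The helper below is a hypothetical, standard-library sketch operating on a list of dict rows; it is not a function of this package.

```python
from collections import Counter


def achieved_k(rows, qi):
    """Size of the smallest group of rows sharing the same QI values."""
    sizes = Counter(tuple(r[c] for c in qi) for r in rows)
    return min(sizes.values()) if sizes else 0


anonymized = [
    {"age": "20-29", "zip": "3900*"},
    {"age": "20-29", "zip": "3900*"},
    {"age": "30-39", "zip": "3900*"},
    {"age": "30-39", "zip": "3900*"},
    {"age": "30-39", "zip": "3900*"},
]
k_value = achieved_k(anonymized, ["age", "zip"])
```

A table satisfies k-anonymity for a requested k exactly when this value is at least k.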

anonymity.tools.l_diversity(table: DataFrame, sa: List | ndarray, qi: List | ndarray, k_method: str, l: int, ident: List | ndarray, supp_threshold: int, hierarchies: dict, k: int) → DataFrame

Apply l-diversity to an anonymized dataset.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • sa (list of strings) – names of the dataframe columns that are sensitive attributes.

  • ident (list of strings) – names of the dataframe columns that are identifiers.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • k (int) – desired level of k-anonymity.

  • k_method (string) – desired algorithm for anonymization.

  • l (int) – desired level of l-diversity.

  • supp_threshold (int) – level of suppression allowed.

  • hierarchies (dictionary) – hierarchies for generalization of columns.

Returns:

a list containing the value of l-diversity of the new table and the anonymized table that satisfies l-diversity.

Return type:

list
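
The l-diversity value mentioned above can be understood as the smallest number of distinct sensitive values found in any equivalence class. The following sketch computes it for a list of dict rows; the helper name and the row format are assumptions made for illustration, and the package may compute the value differently.

```python
from collections import defaultdict


def achieved_l(rows, qi, sa):
    """Minimum count of distinct sensitive values over all classes and SA columns."""
    worst = None
    for col in sa:
        values_per_class = defaultdict(set)
        for r in rows:
            values_per_class[tuple(r[c] for c in qi)].add(r[col])
        smallest = min(len(v) for v in values_per_class.values())
        worst = smallest if worst is None else min(worst, smallest)
    return worst


table = [
    {"age": "20-29", "zip": "3900*", "disease": "flu"},
    {"age": "20-29", "zip": "3900*", "disease": "asthma"},
    {"age": "30-39", "zip": "3900*", "disease": "flu"},
    {"age": "30-39", "zip": "3900*", "disease": "flu"},
]
l_value = achieved_l(table, ["age", "zip"], ["disease"])
```

In this example the second class contains a single disease value, so the table is 2-anonymous but only 1-diverse.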

anonymity.tools.t_closeness(table: DataFrame, sa: List | ndarray, qi: List | ndarray, t: float, k_method: str, ident: List | ndarray, supp_threshold: int, hierarchies: dict) → DataFrame

Apply t-closeness to an anonymized dataset.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • sa (list of strings) – names of the dataframe columns that are sensitive attributes.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • t (float) – threshold for t-closeness.

  • k_method (string) – name of the k-anonymization algorithm to use.

  • ident (list of strings) – names of the dataframe columns that are identifiers.

  • supp_threshold (int) – level of suppression allowed.

  • hierarchies (dictionary) – hierarchies for generalization of columns.

Returns:

list containing the value of t for the anonymized table, the table obtained after applying t-closeness, and a boolean indicating whether t-closeness is actually satisfied.

Return type:

list
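
The value of t for a table is the largest distance between the sensitive-attribute distribution of any equivalence class and the distribution over the whole table. The sketch below assumes a categorical sensitive attribute with equal ground distance between values, in which case the Earth Mover's Distance reduces to half the sum of absolute differences. The distance used by the package may differ, and the helper name and row format are hypothetical.

```python
from collections import Counter, defaultdict


def achieved_t(rows, qi, sa_col):
    """Largest class-vs-table distance for one categorical sensitive column."""
    n = len(rows)
    overall = Counter(r[sa_col] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in qi)].append(r[sa_col])

    worst = 0.0
    for values in groups.values():
        local = Counter(values)
        # Equal-distance EMD: half the L1 distance between the two distributions.
        dist = 0.5 * sum(abs(local[v] / len(values) - overall[v] / n)
                         for v in overall)
        worst = max(worst, dist)
    return worst


table = [
    {"age": "20-29", "zip": "3900*", "disease": "flu"},
    {"age": "20-29", "zip": "3900*", "disease": "asthma"},
    {"age": "30-39", "zip": "3900*", "disease": "flu"},
    {"age": "30-39", "zip": "3900*", "disease": "flu"},
]
t_value = achieved_t(table, ["age", "zip"], "disease")
```

A table satisfies t-closeness for a threshold t when this value does not exceed t.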

anonymity.tools.t_closeness_supp(table: DataFrame, sa: List | ndarray, qi: List | ndarray, t: float, supp_lim: float = 1) → DataFrame

Apply t-closeness to an anonymized dataset by suppressing rows, up to the percentage given as input.

Parameters:
  • table (pandas dataframe) – dataframe with the data under study.

  • sa (list of strings) – names of the dataframe columns that are sensitive attributes.

  • qi (list of strings) – names of the dataframe columns that are quasi-identifiers.

  • t (float) – threshold for t-closeness.

  • supp_lim (float) – percentage of suppressed rows allowed.

Returns:

table that satisfies t-closeness.

Return type:

pandas dataframe
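
One way to reach t-closeness by suppression alone is to repeatedly drop the equivalence class whose sensitive-attribute distribution is farthest from the table-wide one, stopping when the threshold is met or the suppression budget is used up. The sketch below follows that strategy under several assumptions that the documentation above does not confirm: a single categorical sensitive column, the equal-distance metric, whole classes removed at once, and a hypothetical `max_fraction` parameter expressed as a fraction of the original rows.

```python
from collections import Counter, defaultdict


def class_distances(rows, qi, sa_col):
    """Distance of each equivalence class from the overall SA distribution."""
    n = len(rows)
    overall = Counter(r[sa_col] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[c] for c in qi)].append(r[sa_col])
    result = {}
    for key, values in groups.items():
        local = Counter(values)
        result[key] = 0.5 * sum(abs(local[v] / len(values) - overall[v] / n)
                                for v in overall)
    return result


def suppress_until_t(rows, qi, sa_col, t, max_fraction):
    """Drop the worst class repeatedly while t is exceeded and budget allows."""
    kept = list(rows)
    budget = int(max_fraction * len(rows))  # max number of rows to suppress
    while kept:
        # Distances are recomputed because suppression changes the overall distribution.
        key, worst = max(class_distances(kept, qi, sa_col).items(),
                         key=lambda kv: kv[1])
        if worst <= t:
            break
        size = sum(1 for r in kept if tuple(r[c] for c in qi) == key)
        if len(rows) - len(kept) + size > budget:
            break  # removing this class would exceed the budget
        kept = [r for r in kept if tuple(r[c] for c in qi) != key]
    return kept


table = [
    {"age": "20-29", "disease": "flu"},
    {"age": "20-29", "disease": "asthma"},
    {"age": "30-39", "disease": "flu"},
    {"age": "30-39", "disease": "asthma"},
    {"age": "40-49", "disease": "cancer"},
    {"age": "40-49", "disease": "cancer"},
]
result = suppress_until_t(table, ["age"], "disease", t=0.4, max_fraction=0.5)
```

If the budget runs out before the threshold is met, the returned rows may still violate t-closeness, so the result should be checked afterwards.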