synthetic_aia_mia.fetch_data package

Submodules

synthetic_aia_mia.fetch_data.adult module

Load Adult dataset and manage cross validation.

synthetic_aia_mia.fetch_data.adult.load(sensitive=[], k=0)

Download the folktables Adult dataset if necessary, then split it and return the train and test sets.

Parameters:
  • sensitive (list of str) – (Optional, default=[]) List of sensitive attributes to include in the features. The sensitive attributes are “sex” and “race”.

  • k (int) – (Optional, default=0) Cross-validation fold index in {0,1,2,3,4}.

Returns:

Train and test split dataframes in a dictionary.

Return type:

Dictionary
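The sensitive parameter decides which of the sensitive columns stay in the feature set. The sketch below illustrates that selection on a pandas dataframe, assuming the attributes are stored as columns named "sex" and "race". select_features is a hypothetical helper, not part of this package; it uses None as its default to avoid the shared mutable default that sensitive=[] carries.

```python
import pandas as pd

# Sensitive attributes documented for the Adult loader.
SENSITIVE_ATTRIBUTES = ["sex", "race"]


def select_features(df: pd.DataFrame, sensitive=None) -> pd.DataFrame:
    """Hypothetical helper: drop every sensitive column not listed in `sensitive`."""
    sensitive = [] if sensitive is None else list(sensitive)
    unknown = set(sensitive) - set(SENSITIVE_ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown sensitive attributes: {sorted(unknown)}")
    # Keep requested sensitive columns, drop the other sensitive ones.
    to_drop = [c for c in SENSITIVE_ATTRIBUTES if c not in sensitive and c in df.columns]
    return df.drop(columns=to_drop)


df = pd.DataFrame({"age": [25, 40], "sex": [1, 2], "race": [1, 1], "income": [0, 1]})
print(list(select_features(df).columns))           # ['age', 'income']
print(list(select_features(df, ["sex"]).columns))  # ['age', 'sex', 'income']
```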

synthetic_aia_mia.fetch_data.split module

Split data into train/test sets using 5-fold cross-validation.

synthetic_aia_mia.fetch_data.split.split_numpy(data, k=0)

5-fold split of a dataset stored as a dictionary of numpy arrays.

Parameters:
  • data (Dictionary) – Dataset where each key maps to a numpy array.

  • k (int) – (Optional, default=0) Index of the fold: 0, 1, 2, 3 or 4.

Returns:

Dataset split into train and test sets.

Return type:

Dictionary
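A minimal sketch of a 5-fold split over a dictionary of numpy arrays follows. It assumes contiguous, unshuffled folds and "train"/"test" keys in the returned dictionary; neither detail is documented here, and split_numpy_sketch is a hypothetical name, not the package's implementation.

```python
import numpy as np


def split_numpy_sketch(data, k=0, n_folds=5):
    """Hypothetical 5-fold split of a dict of equal-length numpy arrays.

    Fold `k` becomes the test set; the remaining folds form the training set.
    Folds are contiguous blocks of rows (no shuffling).
    """
    if not 0 <= k < n_folds:
        raise ValueError(f"k must be in range({n_folds}), got {k}")
    n = len(next(iter(data.values())))
    if any(len(arr) != n for arr in data.values()):
        raise ValueError("all arrays must have the same number of rows")
    # np.array_split tolerates n not divisible by n_folds.
    folds = np.array_split(np.arange(n), n_folds)
    test_idx = folds[k]
    train_idx = np.concatenate([f for i, f in enumerate(folds) if i != k])
    return {
        "train": {key: arr[train_idx] for key, arr in data.items()},
        "test": {key: arr[test_idx] for key, arr in data.items()},
    }


data = {"x": np.arange(10).reshape(10, 1), "y": np.arange(10) % 2}
split = split_numpy_sketch(data, k=1)
print(split["test"]["x"].ravel())  # [2 3]
print(split["train"]["x"].shape)   # (8, 1)
```

Running k = 0, ..., 4 in turn puts every row in exactly one test fold.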

synthetic_aia_mia.fetch_data.split.split_pandas(data, k=0)

5-fold split of a dataset stored as a pandas dataframe.

Parameters:
  • data (pandas.DataFrame) – Dataset in the form of a dataframe.

  • k (int) – (Optional, default=0) Index of the fold: 0, 1, 2, 3 or 4.

Returns:

Dataset split into train and test sets.

Return type:

Dictionary
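The dataframe variant can be sketched the same way. This assumes contiguous, unshuffled folds and "train"/"test" keys in the result; split_pandas_sketch is a hypothetical name and the package's actual fold assignment may differ.

```python
import numpy as np
import pandas as pd


def split_pandas_sketch(df, k=0, n_folds=5):
    """Hypothetical 5-fold split of a dataframe into contiguous, unshuffled folds."""
    if not 0 <= k < n_folds:
        raise ValueError(f"k must be in range({n_folds}), got {k}")
    folds = np.array_split(np.arange(len(df)), n_folds)
    test_pos = folds[k]
    train_pos = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # iloc selects by position, so this works whatever the index labels are.
    return {"train": df.iloc[train_pos], "test": df.iloc[test_pos]}


df = pd.DataFrame({"age": range(10, 20), "income": [0, 1] * 5})
split = split_pandas_sketch(df, k=4)
print(split["test"]["age"].tolist())  # [18, 19]
print(len(split["train"]))            # 8
```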

synthetic_aia_mia.fetch_data.utk module

Download the UTKFaces dataset and manage its train/test split.

synthetic_aia_mia.fetch_data.utk.load(sensitive=[], k=0)

Load the UTKFaces dataset, downloading it if the data are not available locally.

Parameters:
  • k (int) – (Optional, default=0) Cross-validation fold index in {0,1,2,3,4}.

  • sensitive (list of str) – (Optional, default=[]) List of sensitive attributes to include in the features. The sensitive attributes are “sex” and “race”.

Returns:

Train and test splits as numpy.ndarray in a dictionary.

Return type:

Dictionary
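For numpy-based data such as UTKFaces, including a sensitive attribute in the features could amount to appending it as an extra column. The sketch below assumes a 2-D feature matrix and a dictionary of per-sample attribute arrays; add_sensitive_features is hypothetical, and the actual loader may store or encode the attributes differently.

```python
import numpy as np


def add_sensitive_features(x, attributes, sensitive=None):
    """Hypothetical helper: append the requested sensitive attributes to a feature matrix.

    x          -- array of shape (n_samples, n_features)
    attributes -- dict mapping an attribute name ("sex", "race") to an array of shape (n_samples,)
    sensitive  -- names of the attributes to append, in order
    """
    sensitive = [] if sensitive is None else list(sensitive)
    # Each requested attribute becomes one extra column.
    columns = [np.asarray(attributes[name]).reshape(-1, 1) for name in sensitive]
    return np.hstack([x, *columns]) if columns else x


x = np.zeros((3, 2))
attributes = {"sex": np.array([0, 1, 0]), "race": np.array([2, 0, 1])}
features = add_sensitive_features(x, attributes, ["race"])
print(features.shape)  # (3, 3)
```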

Module contents

Download datasets and split them into train/test sets.

class synthetic_aia_mia.fetch_data.Dataset

Bases: object

Manages a dataset in the high-level interface.

load()

Return the dataset loaded into memory.

Returns:

The dataset most recently stored with update().

Return type:

pandas.DataFrame for Adult, or dictionary of numpy.ndarray for UTKFaces

update(data)

Update the content of the dataset.
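The load()/update() pair describes a simple in-memory holder. A minimal sketch of that contract follows; DatasetSketch is a hypothetical stand-in, not the package's class, and raising an error when load() is called before any update() is an assumption, since that behaviour is not documented here.

```python
class DatasetSketch:
    """Hypothetical stand-in for the documented load()/update() contract."""

    def __init__(self):
        self._data = None

    def update(self, data):
        # Replace the in-memory content (a pandas.DataFrame or a dict of numpy arrays).
        self._data = data

    def load(self):
        # Assumption: loading before any update() is treated as an error.
        if self._data is None:
            raise RuntimeError("no data yet: call update() first")
        return self._data


ds = DatasetSketch()
ds.update({"train": [1, 2], "test": [3]})
print(ds.load()["test"])  # [3]
```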