fedsim.data_manager package#

Submodules#

fedsim.data_manager.basic_data_manager module#

class fedsim.data_manager.basic_data_manager.BasicDataManager(root, dataset='mnist', num_partitions=500, rule='iid', sample_balance=0.0, label_balance=1.0, local_test_portion=0.0, seed=10, save_path=None, *args, **kwargs)[source]#

Bases: fedsim.data_manager.data_manager.DataManager

A basic data manager for partitioning the data. Currently three partitioning rules are supported:

  • iid:

    Same label distribution among clients. sample_balance determines each client's sample quota, which is drawn from a log-normal distribution.

  • dir:

    A Dirichlet distribution with concentration parameter given by label_balance determines the label balance of each client. sample_balance determines each client's sample quota, which is drawn from a log-normal distribution.

  • exclusive:

    Samples corresponding to each label are randomly split among k clients, where k = total_sample_size * label_balance. sample_balance determines how this split happens (quota). This rule is also known as “shards splitting”.
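The two random ingredients named in the rules above, log-normal sample quotas and Dirichlet label proportions, can be sketched with the standard library alone. This is an illustration under stated assumptions, not fedsim's implementation: the helper names `lognormal_quotas` and `dirichlet_proportions` are hypothetical, and treating sample_balance as the log-normal sigma (with 0 meaning equal quotas) is a guess about how the parameter is used.

```python
import random


def lognormal_quotas(num_samples, num_partitions, sigma, rng):
    """Split num_samples into integer quotas with log-normal weights.

    Assumption: sigma plays the role of sample_balance, and sigma == 0
    is treated as perfectly equal weights.
    """
    if sigma == 0:
        weights = [1.0] * num_partitions
    else:
        weights = [rng.lognormvariate(0.0, sigma) for _ in range(num_partitions)]
    total = sum(weights)
    quotas = [int(num_samples * w / total) for w in weights]
    # Samples lost to rounding down are handed out one by one.
    for i in range(num_samples - sum(quotas)):
        quotas[i % num_partitions] += 1
    return quotas


def dirichlet_proportions(num_labels, concentration, rng):
    """Draw label proportions from a symmetric Dirichlet distribution.

    A Dirichlet sample is obtained by normalizing independent Gamma draws.
    """
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(num_labels)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(10)
equal_quotas = lognormal_quotas(1000, 10, 0.0, rng)    # balanced clients
skewed_quotas = lognormal_quotas(1000, 10, 1.0, rng)   # unbalanced clients
label_props = dirichlet_proportions(10, 0.5, rng)      # one client's label mix
```

A smaller concentration value gives each client a label mix dominated by a few classes, while a large value approaches the uniform mix of the iid rule.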

Parameters
  • root (str) – root dir of the dataset to partition

  • dataset (str) – name of the dataset

  • num_partitions (int) – number of partitions or clients

  • rule (str) – rule of partitioning

  • sample_balance (float) – balance of number of samples among clients

  • label_balance (float) – balance of the labels on each client

  • local_test_portion (float) – portion of the train set held out as the local test set

  • seed (int) – random seed of partitioning

  • save_path (str, optional) – path to save partitioned indices.

get_identifiers()[source]#

Returns identifiers to be used for saving the partition info.

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Sequence[str] – a sequence of str identifying the class instance

make_datasets(root, global_transforms=None)[source]#
makes and returns local and global dataset objects. The local datasets do not need a transform, because datasets recompiled from indices already apply transforms as they are requested.

Parameters
  • dataset_name (str) – name of the dataset.

  • root (str) – directory to download and manipulate data.

  • global_transforms (Dict[str, object]) – transforms for global dset

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Tuple[object, object] – local and global dataset

make_transforms()[source]#

makes and returns train and inference transforms.

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Tuple[object, object] – train and inference transforms

partition_local_data(dataset)[source]#

partitions local data indices into client index Iterable.

Parameters

dataset (object) – local dataset

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Dict[str, Iterable[Iterable[int]]] – {‘train’: tr_indices, ‘test’: ts_indices}
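To make the return shape concrete, here is a minimal sketch of an iid-style partition that produces a dictionary of per-client train and test index lists. The function name `iid_partition` and the round-robin dealing of shuffled indices are assumptions made for illustration; the library's own partitioning logic may differ.

```python
import random


def iid_partition(num_samples, num_partitions, local_test_portion, seed):
    """Return {'train': [...], 'test': [...]} with one index list per client."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)

    tr_indices, ts_indices = [], []
    for p in range(num_partitions):
        # Deal the shuffled indices round-robin so every client gets a share.
        part = indices[p::num_partitions]
        # Hold out the leading fraction of each share as the local test set.
        num_test = int(len(part) * local_test_portion)
        ts_indices.append(part[:num_test])
        tr_indices.append(part[num_test:])
    return {"train": tr_indices, "test": ts_indices}


partition = iid_partition(num_samples=100, num_partitions=5,
                          local_test_portion=0.2, seed=10)
```

Each entry of `partition["train"]` and `partition["test"]` is the index list of one client, so the outer list length equals the number of partitions.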

fedsim.data_manager.data_manager module#

class fedsim.data_manager.data_manager.DataManager(root, seed, save_dir=None, *args, **kwargs)[source]#

Bases: object

DataManager base class. All other data managers inherit from this class. There are four abstract methods that child classes should implement: get_identifiers, make_datasets, make_transforms, partition_local_data.

Parameters
  • root (str) – root dir of the dataset to partition

  • seed (int) – random seed of partitioning

  • save_dir (str, optional) – directory in which to save partitioned indices.
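The subclassing contract described above can be sketched without installing the library. Both classes below are hypothetical stand-ins: `SketchDataManager` only mirrors the four method names and the constructor arguments from the signature, and `EvenOddManager` is a toy child class that works on plain lists instead of real datasets.

```python
class SketchDataManager:
    """Hypothetical stand-in for the base class; not fedsim's code."""

    def __init__(self, root, seed, save_dir=None):
        self.root = root
        self.seed = seed
        self.save_dir = save_dir

    def get_identifiers(self):
        raise NotImplementedError

    def make_datasets(self, root, global_transforms):
        raise NotImplementedError

    def make_transforms(self):
        raise NotImplementedError

    def partition_local_data(self, dataset):
        raise NotImplementedError


class EvenOddManager(SketchDataManager):
    """Toy child class: two clients, one with even and one with odd indices."""

    def get_identifiers(self):
        return ["even_odd", str(self.seed)]

    def make_transforms(self):
        def identity(x):
            return x
        return identity, identity  # (train transform, inference transform)

    def make_datasets(self, root, global_transforms):
        local_dset = list(range(10))        # placeholder for a local dataset
        global_dset = list(range(10, 14))   # placeholder for a global dataset
        return local_dset, global_dset

    def partition_local_data(self, dataset):
        evens = [i for i in range(len(dataset)) if i % 2 == 0]
        odds = [i for i in range(len(dataset)) if i % 2 == 1]
        return {"train": [evens, odds], "test": [[], []]}


manager = EvenOddManager(root="data", seed=10)
local_dset, global_dset = manager.make_datasets(manager.root, None)
parts = manager.partition_local_data(local_dset)
```

Calling any of the four methods on the base class directly raises NotImplementedError, which matches the "Raises" entries listed for each method.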

get_global_dataset() → Dict[str, torch.utils.data.dataset.Dataset][source]#

get_group_dataset(ids: Iterable[int]) → Dict[str, torch.utils.data.dataset.Dataset][source]#

get_identifiers() → Sequence[str][source]#

Returns identifiers to be used for saving the partition info.

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Sequence[str] – a sequence of str identifying the class instance

get_local_dataset(id: int) → Dict[str, torch.utils.data.dataset.Dataset][source]#

get_oracle_dataset() → Dict[str, torch.utils.data.dataset.Dataset][source]#

get_partitioning_name() → str[source]#

make_datasets(root: str, global_transforms: Dict[str, object]) → Tuple[object, object][source]#
makes and returns local and global dataset objects. The local datasets do not need a transform, because datasets recompiled from indices already apply transforms as they are requested.

Parameters
  • dataset_name (str) – name of the dataset.

  • root (str) – directory to download and manipulate data.

  • global_transforms (Dict[str, object]) – transforms for global dset

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Tuple[object, object] – local and global dataset

make_transforms() → Tuple[object, object][source]#

makes and returns train and inference transforms.

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Tuple[object, object] – train and inference transforms

partition_local_data(dataset: object) → Dict[str, Iterable[Iterable[int]]][source]#

partitions local data indices into client index Iterable.

Parameters

dataset (object) – local dataset

Raises

NotImplementedError – this abstract method should be implemented by child classes

Returns

Dict[str, Iterable[Iterable[int]]] – {‘train’: tr_indices, ‘test’: ts_indices}

fedsim.data_manager.utils module#

class fedsim.data_manager.utils.Subset(dataset, indices, transform=None)[source]#

Bases: torch.utils.data.dataset.Dataset

Subset of a dataset at specified indices.

Parameters
  • dataset (Dataset) – The whole Dataset

  • indices (sequence) – Indices in the whole set selected for subset.
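The idea of an index-based subset with a transform applied on access can be sketched in plain Python. `SubsetSketch` is a hypothetical class written for illustration; unlike the documented class it does not inherit from torch.utils.data.dataset.Dataset, and it assumes that every item is an (input, target) pair and that the transform applies to the input only.

```python
class SubsetSketch:
    """Index-based view of a dataset; a sketch, not fedsim's Subset."""

    def __init__(self, dataset, indices, transform=None):
        self.dataset = dataset
        self.indices = indices
        self.transform = transform

    def __getitem__(self, idx):
        # Map the subset position to a position in the whole dataset.
        x, y = self.dataset[self.indices[idx]]
        # Assumption: the transform is applied lazily, to the input only.
        if self.transform is not None:
            x = self.transform(x)
        return x, y

    def __len__(self):
        return len(self.indices)


whole = [(0, "a"), (10, "b"), (20, "c"), (30, "d")]
subset = SubsetSketch(whole, indices=[3, 1], transform=lambda x: x + 1)
first_item = subset[0]
```

Because the transform runs inside `__getitem__`, the same underlying data can back several subsets with different transforms, for example a train view and an inference view of one client's indices.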

Module contents#