fedsim.data_manager package#
Submodules#
fedsim.data_manager.basic_data_manager module#
- class fedsim.data_manager.basic_data_manager.BasicDataManager(root, dataset='mnist', num_partitions=500, rule='iid', sample_balance=0.0, label_balance=1.0, local_test_portion=0.0, seed=10, save_path=None, *args, **kwargs)[source]#
Bases:
fedsim.data_manager.data_manager.DataManager
A basic data manager for partitioning the data. Currently three partitioning rules are supported:
- iid:
Same label distribution among clients. sample_balance determines each client's sample quota, drawn from a log-normal distribution.
- dir:
A Dirichlet distribution with concentration parameter label_balance determines the label balance of each client. sample_balance determines each client's sample quota, drawn from a log-normal distribution.
- exclusive:
Samples corresponding to each label are randomly split among k clients, where k = total_sample_size * label_balance. sample_balance determines how this split happens (quota). This rule is also known as “shards splitting”.
- Parameters
root (str) – root dir of the dataset to partition
dataset (str) – name of the dataset
num_partitions (int) – number of partitions or clients
rule (str) – rule of partitioning
sample_balance (float) – balance of number of samples among clients
label_balance (float) – balance of the labels on each client
local_test_portion (float) – portion of the train set held out as a local test set
seed (int) – random seed of partitioning
save_path (str, optional) – path to save partitioned indices.
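The quota and label-balance ideas behind the iid and dir rules can be sketched with the standard library alone. This is an illustrative sketch, not the fedsim implementation: the helper names lognormal_quotas and dirichlet are hypothetical, and it assumes that sample_balance acts as the sigma of the log-normal distribution and that label_balance acts as a symmetric Dirichlet concentration.

```python
import random


def lognormal_quotas(num_samples, num_partitions, sample_balance, rng):
    """Split num_samples into per-client quotas weighted by log-normal draws.

    Assumption: sample_balance is used as the sigma of the log-normal.
    """
    # sigma = 0 makes every weight equal, i.e. perfectly balanced clients
    weights = [rng.lognormvariate(0.0, sample_balance) for _ in range(num_partitions)]
    total = sum(weights)
    quotas = [int(num_samples * w / total) for w in weights]
    # hand out the rounding remainder one sample at a time
    for i in range(num_samples - sum(quotas)):
        quotas[i % num_partitions] += 1
    return quotas


def dirichlet(alpha, size, rng):
    """Draw one sample from a symmetric Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(10)
quotas = lognormal_quotas(60000, 500, 0.3, rng)  # samples per client
label_mix = dirichlet(0.5, 10, rng)  # label proportions for one client
```

Under these assumptions, a larger sample_balance spreads the quotas further apart, and a smaller concentration value pushes each client toward a few dominant labels.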
- get_identifiers()[source]#
Returns identifiers to be used for saving the partition info.
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Sequence[str] – a sequence of str identifying the class instance
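One way such identifiers could be turned into a location for the saved partition info is sketched below. The helper partition_file_path, the underscore separator and the .pkl extension are all hypothetical; the source does not describe how fedsim composes the file name.

```python
import os


def partition_file_path(save_path, identifiers):
    """Join identifiers into one file name under save_path (hypothetical helper)."""
    # assumed naming scheme: identifiers joined by underscores
    return os.path.join(save_path, "_".join(identifiers) + ".pkl")


path = partition_file_path("partitions", ["mnist", "500", "iid"])
```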
- make_datasets(root, global_transforms=None)[source]#
Makes and returns local and global dataset objects. The local datasets do not need a transform, because datasets recompiled from indices apply transforms when they are requested.
- Parameters
dataset_name (str) – name of the dataset.
root (str) – directory to download and manipulate data.
global_transforms (Dict[str, object]) – transforms for the global dataset
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Tuple[object, object] – local and global dataset
- make_transforms()[source]#
Makes and returns train and inference transforms.
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Tuple[object, object] – train and inference transforms
- partition_local_data(dataset)[source]#
Partitions local data indices into an iterable of per-client index iterables.
- Parameters
dataset (object) – local dataset
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Dict[str, Iterable[Iterable[int]]] – {'train': tr_indices, 'test': ts_indices}
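The shape of the returned dictionary can be illustrated with a small sketch that holds out a fraction of each client's indices as a local test set. The function split_local_indices is hypothetical and only shows the {'train': ..., 'test': ...} layout; it assumes local_test_portion is applied per client after shuffling, which the source does not confirm.

```python
import random


def split_local_indices(client_indices, local_test_portion, seed=10):
    """Split each client's indices into train and test parts (illustrative only)."""
    rng = random.Random(seed)
    tr_indices, ts_indices = [], []
    for indices in client_indices:
        shuffled = list(indices)
        rng.shuffle(shuffled)
        # the first num_test shuffled indices become the client's local test set
        num_test = int(len(shuffled) * local_test_portion)
        ts_indices.append(shuffled[:num_test])
        tr_indices.append(shuffled[num_test:])
    return {"train": tr_indices, "test": ts_indices}


parts = split_local_indices([range(10), range(10, 30)], local_test_portion=0.2)
```

With local_test_portion set to 0.0, every entry under 'test' is an empty list and all indices stay in 'train'.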
fedsim.data_manager.data_manager module#
- class fedsim.data_manager.data_manager.DataManager(root, seed, save_dir=None, *args, **kwargs)[source]#
Bases:
object
DataManager base class. All other data managers inherit from this class. Child classes should implement four abstract methods: get_identifiers, make_datasets, make_transforms, partition_local_data.
- Parameters
root (str) – root dir of the dataset to partition
seed (int) – random seed of partitioning
save_dir (str, optional) – directory to save partitioned indices.
- get_identifiers() → Sequence[str][source]#
Returns identifiers to be used for saving the partition info.
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Sequence[str] – a sequence of str identifying the class instance
- make_datasets(root: str, global_transforms: Dict[str, object]) → Tuple[object, object][source]#
Makes and returns local and global dataset objects. The local datasets do not need a transform, because datasets recompiled from indices apply transforms when they are requested.
- Parameters
dataset_name (str) – name of the dataset.
root (str) – directory to download and manipulate data.
global_transforms (Dict[str, object]) – transforms for the global dataset
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Tuple[object, object] – local and global dataset
- make_transforms() → Tuple[object, object][source]#
Makes and returns train and inference transforms.
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Tuple[object, object] – train and inference transforms
- partition_local_data(dataset: object) → Dict[str, Iterable[Iterable[int]]][source]#
Partitions local data indices into an iterable of per-client index iterables.
- Parameters
dataset (object) – local dataset
- Raises
NotImplementedError – this abstract method should be implemented by child classes
- Returns
Dict[str, Iterable[Iterable[int]]] – {'train': tr_indices, 'test': ts_indices}
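A child class that fills in the four abstract methods might look like the sketch below. To keep the example runnable without fedsim installed, it defines a stand-in DataManager that only mirrors the documented signatures; it is not the fedsim class. RoundRobinDataManager, its toy datasets and its partitioning scheme are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple


class DataManager:
    """Stand-in mirroring the documented interface (not the fedsim class)."""

    def __init__(self, root, seed, save_dir=None, *args, **kwargs):
        self.root = root
        self.seed = seed
        self.save_dir = save_dir

    def get_identifiers(self) -> Sequence[str]:
        raise NotImplementedError

    def make_datasets(self, root, global_transforms) -> Tuple[object, object]:
        raise NotImplementedError

    def make_transforms(self) -> Tuple[object, object]:
        raise NotImplementedError

    def partition_local_data(self, dataset) -> Dict[str, Iterable[Iterable[int]]]:
        raise NotImplementedError


class RoundRobinDataManager(DataManager):
    """Hypothetical child class that deals samples out to clients in turn."""

    def __init__(self, root, seed, num_partitions=4, save_dir=None):
        super().__init__(root, seed, save_dir=save_dir)
        self.num_partitions = num_partitions

    def get_identifiers(self):
        return ["round_robin", str(self.num_partitions), str(self.seed)]

    def make_transforms(self):
        # identity functions stand in for real train and inference pipelines
        def identity(x):
            return x

        return identity, identity

    def make_datasets(self, root, global_transforms=None):
        # toy data: lists of (feature, label) pairs instead of real datasets
        local_dset = [(i, i % 10) for i in range(20)]
        global_dset = [(i, i % 10) for i in range(20, 30)]
        return local_dset, global_dset

    def partition_local_data(self, dataset):
        n = self.num_partitions
        # client k receives indices k, k + n, k + 2n, ...
        tr_indices = [list(range(k, len(dataset), n)) for k in range(n)]
        return {"train": tr_indices, "test": [[] for _ in range(n)]}


manager = RoundRobinDataManager("./data", seed=10)
local_dset, global_dset = manager.make_datasets(manager.root)
partition = manager.partition_local_data(local_dset)
```

Calling any of the four methods on the stand-in base class raises NotImplementedError, matching the behaviour documented above.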