MEDfl.NetManager package

Submodules

MEDfl.NetManager.dataset module

class MEDfl.NetManager.dataset.DataSet(name: str, path: str, engine=None)[source]

Bases: object

__init__(name: str, path: str, engine=None)[source]

Initialize a DataSet object.

Parameters:
  • name (str) – The name of the dataset.

  • path (str) – The file path of the dataset CSV file.

delete_dataset()[source]

Delete the dataset from the database.

Notes:
  • Assumes the dataset name is unique in the ‘DataSets’ table.

static list_alldatasets(engine)[source]

List all dataset names from the ‘DataSets’ table.

Returns:

A DataFrame containing the names of all datasets in the ‘DataSets’ table.

Return type:

pd.DataFrame

update_data()[source]

Update the data in the dataset.

Not implemented yet.

upload_dataset(NodeId=-1)[source]

Upload the dataset to the database.

Parameters:

NodeId (int) – The NodeId associated with the dataset.

Notes:
  • Assumes the file at self.path is a valid CSV file.

  • The dataset is uploaded to the ‘DataSets’ table in the database.

validate()[source]

Validate name and path attributes.

Raises:

TypeError – If name or path is not a string.
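The validate() contract above can be illustrated with a minimal sketch. DataSetSketch is a hypothetical stand-in, not the library's class; it only mirrors the documented behavior of raising TypeError when name or path is not a string, and the error messages are assumptions.

```python
class DataSetSketch:
    """Hypothetical stand-in that mirrors only the documented validate() contract."""

    def __init__(self, name, path):
        self.name = name
        self.path = path

    def validate(self):
        # Raise TypeError when either attribute is not a string, as documented.
        if not isinstance(self.name, str):
            raise TypeError("name argument must be a string")
        if not isinstance(self.path, str):
            raise TypeError("path argument must be a string")


# A well-formed instance validates silently.
DataSetSketch("eicu", "data/eicu.csv").validate()

# A non-string path triggers the documented TypeError.
try:
    DataSetSketch("eicu", 42).validate()
except TypeError as exc:
    error_message = str(exc)
```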

MEDfl.NetManager.flsetup module

class MEDfl.NetManager.flsetup.FLsetup(name: str, description: str, network: Network)[source]

Bases: object

__init__(name: str, description: str, network: Network)[source]

Initialize a Federated Learning (FL) setup.

Parameters:
  • name (str) – The name of the FL setup.

  • description (str) – A description of the FL setup.

  • network (Network) – An instance of the Network class representing the network architecture.

create()[source]

Create an FL setup.

create_dataloader_from_node(node: Node, output, fill_strategy='mean', fit_encode=[], to_drop=[], train_batch_size: int = 32, test_batch_size: int = 1, split_frac: float = 0.2, dataset: Optional[Dataset] = None)[source]

Create DataLoader from a Node.

Parameters:
  • node (Node) – The node from which to create DataLoader.

  • train_batch_size (int) – The batch size for training data.

  • test_batch_size (int) – The batch size for test data.

  • split_frac (float) – The fraction of data to be used for training.

  • dataset (Dataset) – The dataset to use. If None, the method will read the dataset from the node.

Returns:

The DataLoader instances for training and testing.

Return type:

DataLoader

create_federated_dataset(output, fill_strategy='mean', fit_encode=[], to_drop=[], val_frac=0.1, test_frac=0.2) FederatedDataset[source]

Create a federated dataset.

Parameters:
  • output (str) – The name of the output (target) feature of the dataset.

  • val_frac (float) – The fraction of data to be used for validation.

  • test_frac (float) – The fraction of data to be used for testing.

Returns:

The FederatedDataset instance containing train, validation, and test data.

Return type:

FederatedDataset
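As a rough worked example of how val_frac and test_frac partition a node's rows, the sketch below computes the three subset sizes. It assumes that both fractions are taken of the full row count; the source does not state how the library applies them, so treat split_sizes as a hypothetical helper rather than the actual implementation.

```python
def split_sizes(n_rows, val_frac=0.1, test_frac=0.2):
    """Return (train, validation, test) sizes for a dataset of n_rows rows."""
    # Assumption: each fraction is applied to the full row count.
    n_test = round(n_rows * test_frac)
    n_val = round(n_rows * val_frac)
    # Whatever is left after validation and test goes to training.
    n_train = n_rows - n_val - n_test
    return n_train, n_val, n_test


sizes = split_sizes(1000)  # uses the documented defaults 0.1 and 0.2
```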

create_nodes_from_master_dataset(params_dict: dict)[source]

Create nodes from the master dataset.

Parameters:

params_dict (dict) – A dictionary containing parameters for node creation:
  • column_name (str) – The name of the column in the MasterDataset used to create nodes.

  • train_nodes (list) – A list of node names that will be used for training.

  • test_nodes (list) – A list of node names that will be used for testing.

Returns:

A list of Node instances created from the master dataset.

Return type:

list
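A params_dict with the three documented keys might look like the following. The column name and node names are hypothetical, and pairing each name with a train flag is only an illustration based on the Node.train convention (1 for training, 0 for testing), not the library's code.

```python
params_dict = {
    "column_name": "site_region",          # hypothetical MasterDataset column
    "train_nodes": ["Midwest", "South"],   # hypothetical node names
    "test_nodes": ["West"],
}

# Pair each node name with the train flag documented for Node
# (1 = used for training, 0 = used for testing).
node_flags = [(name, 1) for name in params_dict["train_nodes"]]
node_flags += [(name, 0) for name in params_dict["test_nodes"]]
```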

delete()[source]

Delete the FL setup.

get_flDataSet()[source]

Retrieve the federated dataset associated with the FL setup using the FL setup’s name.

Returns:

DataFrame containing the federated dataset information.

Return type:

pandas.DataFrame

static list_allsetups()[source]

List all the FL setups.

Returns:

A DataFrame containing information about all the FL setups.

Return type:

DataFrame

classmethod read_setup(FLsetupId: int)[source]

Read the FL setup by FLsetupId.

Parameters:

FLsetupId (int) – The id of the FL setup to read.

Returns:

An instance of the FLsetup class with the specified FLsetupId.

Return type:

FLsetup

validate()[source]

Validate name, description, and network.

MEDfl.NetManager.net_helper module

MEDfl.NetManager.net_helper.get_feddataset_id_from_name(name)[source]

Get the Federated dataset Id from the FedDatasets table based on the federated dataset name.

Parameters:

name (str) – Federated dataset name.

Returns:

FedId or None if not found.

Return type:

int or None

MEDfl.NetManager.net_helper.get_flpipeline_from_name(name)[source]

Get the FLpipeline Id from the FLpipeline table based on the FL pipeline name.

Parameters:

name (str) – FL pipeline name.

Returns:

FLpipelineId or None if not found.

Return type:

int or None

MEDfl.NetManager.net_helper.get_flsetupid_from_name(name)[source]

Get the FLsetupId from the FLsetup table based on the FL setup name.

Parameters:

name (str) – FL setup name.

Returns:

FLsetupId or None if not found.

Return type:

int or None

MEDfl.NetManager.net_helper.get_netid_from_name(name)[source]

Get the Network Id from the Networks table based on the NetName.

Parameters:

name (str) – Network name.

Returns:

NetId or None if not found.

Return type:

int or None
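The name-to-id helpers in this module all follow the same pattern: look up a row by name and return its id, or None when no row matches. The sketch below shows that pattern for the Networks table using an in-memory SQLite database from the standard library. The SQLite backend, the table schema, and the function name get_netid_from_name_sketch are assumptions for illustration; only the table name and the NetId and NetName columns come from the description above.

```python
import sqlite3


def get_netid_from_name_sketch(conn, name):
    """Return the NetId for a NetName, or None when no row matches."""
    row = conn.execute(
        "SELECT NetId FROM Networks WHERE NetName = ?", (name,)
    ).fetchone()
    return row[0] if row else None


# Build a throwaway database with one network (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Networks (NetId INTEGER PRIMARY KEY, NetName TEXT)")
conn.execute("INSERT INTO Networks (NetName) VALUES ('hospital_net')")

found_id = get_netid_from_name_sketch(conn, "hospital_net")
missing_id = get_netid_from_name_sketch(conn, "no_such_net")
```

Callers should check for None before using the result, since an unknown name is reported that way rather than raising.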

MEDfl.NetManager.net_helper.get_nodeid_from_name(name)[source]

Get the NodeId from the Nodes table based on the NodeName.

Parameters:

name (str) – Node name.

Returns:

NodeId or None if not found.

Return type:

int or None

MEDfl.NetManager.net_helper.is_str(data_df, row, x)[source]

Check if a column in a DataFrame is of type ‘object’ and convert the value accordingly.

Parameters:
  • data_df (pandas.DataFrame) – DataFrame containing the data.

  • row (pandas.Series) – Data row.

  • x (str) – Column name.

Returns:

Processed value based on the column type.

Return type:

str or float

MEDfl.NetManager.net_helper.master_table_exists()[source]

Check if the MasterDataset table exists in the database.

Returns:

True if the table exists, False otherwise.

Return type:

bool

MEDfl.NetManager.net_helper.process_data_after_reading(data, output, fill_strategy='mean', fit_encode=[], to_drop=[])[source]

Process data after reading from the database, including encoding, dropping columns, and creating a PyTorch TensorDataset.

Parameters:
  • data (pandas.DataFrame) – Input data.

  • output (str) – Output column name.

  • fill_strategy (str, optional) – Imputation strategy for missing values. Default is “mean”.

  • fit_encode (list, optional) – List of columns to be label-encoded. Default is an empty list.

  • to_drop (list, optional) – List of columns to be dropped from the DataFrame. Default is an empty list.

Returns:

Processed data as a PyTorch TensorDataset.

Return type:

torch.utils.data.TensorDataset
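The preprocessing steps named above (dropping columns, label encoding, mean imputation) can be sketched in plain Python. This is not the library's implementation: the order of the steps is an assumption, preprocess_sketch is a hypothetical name, and it returns lists of floats where the real function returns a torch.utils.data.TensorDataset.

```python
from statistics import mean


def preprocess_sketch(rows, output, fit_encode=(), to_drop=()):
    """Drop columns, label-encode, mean-impute, then split features from the target."""
    # 1. Drop unwanted columns.
    rows = [{k: v for k, v in r.items() if k not in to_drop} for r in rows]

    # 2. Label-encode: map each distinct value to its index in sorted order.
    for col in fit_encode:
        codes = {v: i for i, v in enumerate(sorted({r[col] for r in rows}))}
        for r in rows:
            r[col] = codes[r[col]]

    # 3. Mean imputation: replace None with the mean of the observed values.
    for col in rows[0]:
        observed = [r[col] for r in rows if r[col] is not None]
        col_mean = mean(observed)
        for r in rows:
            if r[col] is None:
                r[col] = col_mean

    # 4. Separate the feature vectors from the output column.
    feature_cols = [c for c in rows[0] if c != output]
    X = [[float(r[c]) for c in feature_cols] for r in rows]
    y = [float(r[output]) for r in rows]
    return X, y


# Made-up rows purely for illustration.
rows = [
    {"id": 1, "age": 60, "sex": "F", "event": 1},
    {"id": 2, "age": None, "sex": "M", "event": 0},
    {"id": 3, "age": 40, "sex": "F", "event": 0},
]
X, y = preprocess_sketch(rows, output="event", fit_encode=["sex"], to_drop=["id"])
```

Here the missing age is replaced by the mean of the two observed ages, and the sex column is encoded as 0 for "F" and 1 for "M".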

MEDfl.NetManager.net_helper.process_eicu(data_df)[source]

Process eICU data by filling missing values with mean and replacing NaNs with ‘Unknown’.

Parameters:

data_df (pandas.DataFrame) – Input data.

Returns:

Processed data.

Return type:

pandas.DataFrame

MEDfl.NetManager.net_manager_queries module

MEDfl.NetManager.network module

class MEDfl.NetManager.network.Network(name: str = '')[source]

Bases: object

A class representing a network.

name

The name of the network.

Type:

str

mtable_exists

An integer flag indicating whether the MasterDataset table exists (1) or not (0).

Type:

int

__init__(name: str = '')[source]

Initialize a Network instance.

Parameters:

name (str) – The name of the network.

add_node(node: Node)[source]

Add a node to the network.

Parameters:

node (Node) – The node to add.

create_master_dataset(path_to_csv: str = '/home/local/USHERBROOKE/saho6810/MEDfl/code/MEDfl/notebooks/eicu_test.csv')[source]

Create the MasterDataset table and insert dataset values.

Parameters:

path_to_csv – Path to the CSV file containing the dataset.

create_network()[source]

Create a new network in the database.

delete_network()[source]

Delete the network from the database.

static list_allnetworks()[source]

List all networks in the database.

Returns:

A DataFrame containing information about all networks in the database.

Return type:

DataFrame

list_allnodes()[source]

List all nodes in the network.

Parameters:

None

Returns:

A DataFrame containing information about all nodes in the network.

Return type:

DataFrame

update_network(FLsetupId: int)[source]

Update the network’s FLsetupId in the database.

Parameters:

FLsetupId (int) – The FLsetupId to update.

use_network(network_name: str)[source]

Use a network in the database.

Parameters:

network_name (str) – The name of the network to use.

Returns:

An instance of the Network class if the network exists, else None.

Return type:

Network or None

validate()[source]

Validate the name attribute.

MEDfl.NetManager.node module

class MEDfl.NetManager.node.Node(name: str, train: int, test_fraction: float = 0.2, engine=<sqlalchemy.engine.base.Connection object>)[source]

Bases: object

A class representing a node in the network.

name

The name of the node.

Type:

str

train

An integer flag representing whether the node is used for training (1) or testing (0).

Type:

int

test_fraction

The fraction of data used for testing when train=1. Default is 0.2.

Type:

float, optional

__init__(name: str, train: int, test_fraction: float = 0.2, engine=<sqlalchemy.engine.base.Connection object>)[source]

Initialize a Node instance.

Parameters:
  • name (str) – The name of the node.

  • train (int) – An integer flag representing whether the node is used for training (1) or testing (0).

  • test_fraction (float, optional) – The fraction of data used for testing when train=1. Default is 0.2.

assign_dataset(dataset_name: str)[source]

Assign an existing dataset to the node.

Parameters:

dataset_name (str) – The name of the dataset to assign.

Returns:

None

check_dataset_compatibility(data_df)[source]

Check if the dataset is compatible with the master dataset.

Parameters:

data_df (DataFrame) – The dataset to check.

Returns:

None

create_node(NetId: int)[source]

Create a node in the database.

Parameters:

NetId (int) – The ID of the network to which the node belongs.

Returns:

None

delete_node()[source]

Delete the node from the database.

get_dataset(column_name: Optional[str] = None)[source]

Get the dataset for the node based on the given column name.

Parameters:

column_name (str, optional) – The column name to filter the dataset. Default is None.

Returns:

The dataset associated with the node.

Return type:

DataFrame

list_alldatasets()[source]

List all datasets associated with the node.

Returns:

A DataFrame containing information about all datasets associated with the node.

Return type:

DataFrame

static list_allnodes()[source]

List all nodes in the database.

Returns:

A DataFrame containing information about all nodes in the database.

Return type:

DataFrame

unassign_dataset(dataset_name: str)[source]

Unassign an existing dataset from the node.

Parameters:

dataset_name (str) – The name of the dataset to unassign.

Returns:

None

update_node()[source]

Update the node information (not implemented).

upload_dataset(dataset_name: str, path_to_csv: str = '/home/local/USHERBROOKE/saho6810/MEDfl/code/MEDfl/notebooks/eicu_test.csv')[source]

Upload the dataset to the database for the node.

Parameters:
  • dataset_name (str) – The name of the dataset.

  • path_to_csv (str, optional) – Path to the CSV file containing the dataset. Default is the path in params.

Returns:

None

validate()[source]

Validate name, train, and test_fraction.

Module contents