Learnwares Reuse#

Learnware Reuser is a Python API that offers convenient tools for learnware reuse. With these tools, users can efficiently reuse a single learnware, a combination of multiple learnwares, or heterogeneous learnwares, saving the time and effort of building models from scratch. The reuse tools fall into two types, depending on whether the user has gathered a small amount of labeled data beforehand: (1) direct reuse and (2) customized reuse based on labeled data.

Note

For detailed explanations of the learnware reusers mentioned below, please refer to COMPONENTS: All Reuse Methods.

Homo Reuse#

This part introduces baseline methods for reusing homogeneous learnwares to make predictions on unlabeled data.

Direct reuse of Learnware#

  • JobSelectorReuser selects different learnwares for different data by training a job selector classifier. The following code shows how to use it:

from learnware.reuse import JobSelectorReuser

# learnware_list is the list of learnwares returned by the search
reuse_job_selector = JobSelectorReuser(learnware_list=learnware_list)

# test_x is the user's data for prediction
# predict_y is the prediction result of the reused learnwares
predict_y = reuse_job_selector.predict(user_data=test_x)
  • AveragingReuser uses an ensemble method to make predictions. The mode parameter selects the ensemble method:

from learnware.reuse import AveragingReuser

# Regression tasks:
#   - mode="mean": average the learnware outputs.
# Classification tasks:
#   - mode="vote_by_label": majority vote for learnware output labels.
#   - mode="vote_by_prob": majority vote for learnware output label probabilities.

reuse_ensemble = AveragingReuser(
    learnware_list=learnware_list, mode="vote_by_label"
)
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)
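
To make the three mode values more concrete, the following sketch re-implements the aggregation rules in plain Python. It is an illustration only and not the library's code: the helper names aggregate_mean, aggregate_vote_by_label, and aggregate_vote_by_prob are hypothetical, and "vote_by_prob" is assumed here to average the per-class probabilities before picking the most probable class.

```python
from collections import Counter
from statistics import fmean


def aggregate_mean(outputs):
    """Average the numeric outputs of several models, sample by sample."""
    return [fmean(sample_outputs) for sample_outputs in zip(*outputs)]


def aggregate_vote_by_label(outputs):
    """Return the most frequent predicted label for each sample."""
    return [
        Counter(sample_labels).most_common(1)[0][0]
        for sample_labels in zip(*outputs)
    ]


def aggregate_vote_by_prob(outputs):
    """Average class probabilities across models, then pick the top class.

    This averaging rule is an assumption made for illustration.
    """
    predictions = []
    for sample_probs in zip(*outputs):
        # sample_probs holds one probability vector per model for this sample.
        averaged = [fmean(class_probs) for class_probs in zip(*sample_probs)]
        predictions.append(averaged.index(max(averaged)))
    return predictions


# Each inner list is one stand-in model's output over the same samples.
regression_outputs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
label_outputs = [[0, 1, 1], [0, 1, 0], [1, 1, 0]]
prob_outputs = [
    [[0.9, 0.1], [0.4, 0.6]],
    [[0.6, 0.4], [0.1, 0.9]],
]

mean_result = aggregate_mean(regression_outputs)
label_result = aggregate_vote_by_label(label_outputs)
prob_result = aggregate_vote_by_prob(prob_outputs)
```

In practice AveragingReuser performs the aggregation internally; the sketch only shows what kind of output each mode expects from the learnwares (numeric values, labels, or class probabilities).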

Reusing Learnware with Labeled Data#

When users have a small amount of labeled data, they can adapt and polish the received learnware(s) with that data to gain even better performance.

  • EnsemblePruningReuser selects the subset of learnwares most suitable for the user’s task and ensembles them:

from learnware.reuse import EnsemblePruningReuser

# mode="regression": Suitable for regression tasks
# mode="classification": Suitable for classification tasks
reuse_ensemble_pruning = EnsemblePruningReuser(
    learnware_list=learnware_list, mode="regression"
)

# (val_X, val_y) is the small amount of labeled data
reuse_ensemble_pruning.fit(val_X, val_y)
predict_y = reuse_ensemble_pruning.predict(user_data=test_x)
  • FeatureAugmentReuser helps users reuse learnwares by augmenting features. This reuser treats each received learnware as a feature augmentor, takes its output as a new feature, and then builds a simple model on the augmented feature set (logistic regression for classification tasks and ridge regression for regression tasks):

from learnware.reuse import FeatureAugmentReuser

# mode="regression": Suitable for regression tasks
# mode="classification": Suitable for classification tasks
reuse_feature_augment = FeatureAugmentReuser(
    learnware_list=learnware_list, mode="regression"
)

# (val_X, val_y) is the small amount of labeled data
reuse_feature_augment.fit(val_X, val_y)
predict_y = reuse_feature_augment.predict(user_data=test_x)
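
The feature augmentation idea described above can be sketched without the library. In this minimal illustration, the helper augment_features and the stand-in models model_sum and model_first are hypothetical; real learnwares would be queried through their predict method, and the library's internal implementation may differ.

```python
def augment_features(features, models):
    """Append each model's prediction for a row as an extra column of that row."""
    augmented = []
    for row in features:
        extra = [model(row) for model in models]
        augmented.append(list(row) + extra)
    return augmented


# Two stand-in "learnwares": plain functions mapping a feature row to a scalar.
def model_sum(row):
    return sum(row)


def model_first(row):
    return row[0] * 10


val_X = [[1, 2], [3, 4]]

# Each row gains one new column per model.
augmented_X = augment_features(val_X, [model_sum, model_first])
```

A simple model (logistic regression or ridge regression, as noted above) would then be fit on augmented_X together with the labels val_y.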

Hetero Reuse#

When heterogeneous learnware search is activated (see WORKFLOWS: Hetero Search), users receive heterogeneous learnwares identified from the whole “specification world”. Although these recommended learnwares were trained on tasks whose feature/label spaces differ from the user’s task, they can still be helpful and perform well beyond their original purpose. Normally, such learnwares are hard to use, let alone polish, because of the feature/label space heterogeneity. However, the HeteroMapAlignLearnware class aligns a heterogeneous learnware with the user’s task, so users can easily reuse it with the same set of reuse methods mentioned above.

During the alignment of a heterogeneous learnware, the statistical specifications of the learnware and of the user’s task (user_spec) are used for input space alignment, and a small amount of labeled data (val_x, val_y) is required for output space alignment. This can be done with the following code:

from learnware.reuse import HeteroMapAlignLearnware

# mode="regression": For user tasks of regression
# mode="classification": For user tasks of classification
hetero_learnware = HeteroMapAlignLearnware(learnware=learnware, mode="regression")
hetero_learnware.align(user_spec, val_x, val_y)

# Make predictions using the aligned heterogeneous learnware
predict_y = hetero_learnware.predict(user_data=test_x)
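
To illustrate why output space alignment needs labeled data, here is a minimal sketch that fits a linear map from a model's raw outputs to the user's labels by ordinary least squares. This is a deliberately simplified stand-in, not the procedure HeteroMapAlignLearnware uses, which may differ; the helper fit_output_map and the sample values are hypothetical.

```python
from statistics import fmean


def fit_output_map(model_outputs, labels):
    """Fit labels ~ slope * output + intercept by ordinary least squares."""
    mean_out = fmean(model_outputs)
    mean_lab = fmean(labels)
    covariance = sum(
        (out - mean_out) * (lab - mean_lab)
        for out, lab in zip(model_outputs, labels)
    )
    variance = sum((out - mean_out) ** 2 for out in model_outputs)
    slope = covariance / variance
    intercept = mean_lab - slope * mean_out
    return slope, intercept


# Raw outputs of a stand-in model, on a different scale from the user's labels.
raw_outputs = [0.0, 1.0, 2.0, 3.0]
val_y = [10.0, 12.0, 14.0, 16.0]

slope, intercept = fit_output_map(raw_outputs, val_y)

# Map the raw outputs into the user's label space.
aligned_outputs = [slope * out + intercept for out in raw_outputs]
```

Without (val_x, val_y) there would be nothing to fit the map against, which is why the labeled data is mandatory for this step.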

To reuse multiple heterogeneous learnwares, combine HeteroMapAlignLearnware with the homogeneous reuse methods AveragingReuser and EnsemblePruningReuser mentioned above:

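from learnware.reuse import (
    AveragingReuser,
    EnsemblePruningReuser,
    HeteroMapAlignLearnware,
)

# Align every searched learnware with the user's task
hetero_learnware_list = []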
for learnware in learnware_list:
    hetero_learnware = HeteroMapAlignLearnware(learnware, mode="regression")
    hetero_learnware.align(user_spec, val_x, val_y)
    hetero_learnware_list.append(hetero_learnware)

# Reuse multiple heterogeneous learnwares using AveragingReuser
reuse_ensemble = AveragingReuser(learnware_list=hetero_learnware_list, mode="mean")
ensemble_predict_y = reuse_ensemble.predict(user_data=test_x)

# Reuse multiple heterogeneous learnwares using EnsemblePruningReuser
reuse_ensemble = EnsemblePruningReuser(
    learnware_list=hetero_learnware_list, mode="regression"
)
reuse_ensemble.fit(val_x, val_y)
ensemble_pruning_predict_y = reuse_ensemble.predict(user_data=test_x)

Reuse with Model Container#

The learnware package provides Model Container to build execution environments for learnwares according to their runtime dependency files. Each learnware’s model is executed inside its container, and its environment is installed and uninstalled automatically.

Run the following code to try running a learnware with Model Container:

import numpy as np

# Assumed import path for LearnwaresContainer; check your installed version
from learnware.client import LearnwaresContainer
from learnware.learnware import Learnware

# `learnware` is an instance of the Learnware class whose input shape is (20, 204)
with LearnwaresContainer(learnware, mode="conda") as env_container:
    learnware = env_container.get_learnwares_with_container()[0]
    input_array = np.random.random(size=(20, 204))
    print(learnware.predict(input_array))

The mode parameter has two options, each for a specific learnware environment loading method:

  • 'conda': Install a separate conda virtual environment for each learnware (automatically deleted after execution); run each learnware independently within its virtual environment.

  • 'docker': Install a conda virtual environment inside a Docker container (automatically destroyed after execution); run each learnware independently within the container (requires Docker privileges).

Note

Note that the “conda” mode is not secure against malicious learnwares. If users cannot guarantee the security of a learnware they want to load, the “docker” mode is recommended.
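
The choice between the two modes can be captured in a small helper. The sketch below is illustrative only: choose_container_mode and docker_cli_available are hypothetical names, not part of the learnware package, and finding a docker executable on PATH does not by itself confirm that the user has Docker privileges.

```python
import shutil


def choose_container_mode(trusted: bool) -> str:
    """Return "conda" for trusted learnwares and "docker" otherwise."""
    return "conda" if trusted else "docker"


def docker_cli_available() -> bool:
    """Report whether a `docker` executable is on PATH.

    This is a necessary check, not a sufficient one: the daemon may be
    stopped, or the current user may lack permission to use it.
    """
    return shutil.which("docker") is not None


mode_for_untrusted = choose_container_mode(trusted=False)
mode_for_trusted = choose_container_mode(trusted=True)
```

The returned string can be passed as the mode argument when constructing the container, after checking docker_cli_available() if "docker" was selected.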