hifis_surveyval.core package

Submodules

hifis_surveyval.core.dispatch module

This module allows discovery and dispatch of analysis functions.

class hifis_surveyval.core.dispatch.Dispatcher(surveyval: hifis_surveyval.hifis_surveyval.HIFISSurveyval, data: hifis_surveyval.data_container.DataContainer)[source]

Bases: object

Provides facilities to discover analysis function modules and to execute them.

The operations are based on a module folder and optionally a list of module names to be given at initialization.

__init__(surveyval: hifis_surveyval.hifis_surveyval.HIFISSurveyval, data: hifis_surveyval.data_container.DataContainer) None[source]

Initialize the Dispatcher.

Args:
surveyval (HIFISSurveyval): The HIFISSurveyval object, passed in so that it can be handed through to the individual analysis scripts.

discover() None[source]

Discover all potential or selected modules in the module folder.

Iterate over all modules in the module folder (non-recursive) or selected modules only and cache the names of those python (.py) files. Exception: __init__.py is excluded.
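The discovery rule described above can be sketched with pathlib. This is only an illustration under stated assumptions, not the package's implementation: the function name discover_module_names and its parameters are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def discover_module_names(
    module_folder: Path, selected: Optional[List[str]] = None
) -> List[str]:
    """Return the names (without .py) of Python files directly in module_folder.

    Hypothetical helper for illustration; not part of hifis_surveyval.
    """
    names = []
    # glob("*.py") does not descend into subfolders (non-recursive).
    for path in sorted(module_folder.glob("*.py")):
        if path.name == "__init__.py":
            continue  # the package marker is never an analysis module
        if selected and path.stem not in selected:
            continue  # a selection was given and this module is not in it
        names.append(path.stem)
    return names
```

Sorting the paths keeps the result deterministic; whether the real Dispatcher orders its cache is not stated in this documentation.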

load_all_modules() None[source]

Try to load and run all discovered modules.

Make sure to run discover() beforehand. If no modules have been discovered, a warning will be logged.

See Also: load_module()

load_module(module_name: str) None[source]

Attempt to load a module given by name.

Exceptions raised from import will be caught and logged as errors on the console.

Args:

module_name (str): The name of the module, without the .py ending

Raises:

ImportError: Exception thrown if script could not be loaded.

AttributeError: Exception thrown if run method could not be executed.
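A minimal sketch of loading a module by name and calling its run function, using only the standard library. The helper load_and_run is hypothetical, and the no-argument run() is an assumption made for brevity; the arguments the package actually passes to analysis scripts are not shown in this documentation.

```python
import importlib.util
import logging
from pathlib import Path


def load_and_run(module_folder: Path, module_name: str) -> bool:
    """Import module_name from module_folder and call its run(); report success.

    Hypothetical helper for illustration; not part of hifis_surveyval.
    """
    module_path = module_folder / f"{module_name}.py"
    try:
        spec = importlib.util.spec_from_file_location(module_name, module_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot create an import spec for {module_path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the script's top level
    except (ImportError, FileNotFoundError) as error:
        logging.error("Could not load module %s: %s", module_name, error)
        return False
    try:
        module.run()  # AttributeError if the module defines no run()
    except AttributeError as error:
        logging.error("Could not run module %s: %s", module_name, error)
        return False
    return True
```

Both failure modes are logged rather than propagated here, which mirrors the statement that import errors are caught and logged.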

hifis_surveyval.core.preprocess module

This module starts a preprocessing script, if it exists.

class hifis_surveyval.core.preprocess.Preprocessor[source]

Bases: object

Provides the means to run a preprocessing script.

classmethod preprocess(settings: hifis_surveyval.core.settings.Settings, data: hifis_surveyval.data_container.DataContainer) hifis_surveyval.data_container.DataContainer[source]

Run preprocessing script.

Exceptions raised from import will be caught and logged as errors on the console.

Args:

settings (Settings): The settings of the run.

data (DataContainer): The data to preprocess.

Raises:

ImportError: Exception thrown if script could not be loaded.

AttributeError: Exception thrown if run method could not be executed.

hifis_surveyval.core.settings module

This module handles settings.

It provides:

* settings classes
* a getter for settings
* an export function to create a file

class hifis_surveyval.core.settings.FileSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, PREPROCESSING_FILENAME: pathlib.Path = PosixPath('preprocess.py'), METADATA: pathlib.Path = PosixPath('metadata/meta.yml'), SCRIPT_FOLDER: pathlib.Path = PosixPath('scripts'), SCRIPT_NAMES: List[str] = [], OUTPUT_FORMAT: hifis_surveyval.plotting.supported_output_format.SupportedOutputFormat = SupportedOutputFormat.SCREEN, OUTPUT_FOLDER: pathlib.Path = PosixPath('output'), ID_COLUMN_NAME: str = 'id')[source]

Bases: pydantic.env_settings.BaseSettings

Settings that the user can change.

class Config[source]

Bases: object

Subclass for specification.

See https://pydantic-docs.helpmanual.io/usage/model_config/ for details.

case_sensitive = True
ID_COLUMN_NAME: str
METADATA: pathlib.Path
OUTPUT_FOLDER: pathlib.Path
OUTPUT_FORMAT: hifis_surveyval.plotting.supported_output_format.SupportedOutputFormat
PREPROCESSING_FILENAME: pathlib.Path
SCRIPT_FOLDER: pathlib.Path
SCRIPT_NAMES: List[str]
classmethod validate_preprocessing_script(to_validate: str) pathlib.Path[source]

Ensure that the preprocessing script is a Python file.

Args:
to_validate (str): Preprocessing script path as string to be validated.

Returns:

Path: Path to the preprocessing script.
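The check described above could look roughly like the following. This is a sketch only: the name ensure_python_file and the choice of ValueError are assumptions, and the real validator is a pydantic classmethod whose details are not shown here.

```python
from pathlib import Path


def ensure_python_file(to_validate: str) -> Path:
    """Return the path if it names a .py file, otherwise raise ValueError.

    Hypothetical helper for illustration; not part of hifis_surveyval.
    """
    path = Path(to_validate)
    if path.suffix != ".py":
        raise ValueError(f"Preprocessing script must be a .py file: {to_validate}")
    return path
```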

class hifis_surveyval.core.settings.Settings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, PREPROCESSING_FILENAME: pathlib.Path = PosixPath('preprocess.py'), METADATA: pathlib.Path = PosixPath('metadata/meta.yml'), SCRIPT_FOLDER: pathlib.Path = PosixPath('scripts'), SCRIPT_NAMES: List[str] = [], OUTPUT_FORMAT: hifis_surveyval.plotting.supported_output_format.SupportedOutputFormat = SupportedOutputFormat.SCREEN, OUTPUT_FOLDER: pathlib.Path = PosixPath('output'), ID_COLUMN_NAME: str = 'id', CONFIG_FILENAME: pathlib.Path = PosixPath('hifis-surveyval.yml'), VERBOSITY: int = 0, RUN_TIMESTAMP: str = None, ANALYSIS_OUTPUT_PATH: pathlib.Path = None, TRUE_VALUES: Set[str] = {'1', 'On', 'True', 'Y', 'Yes'}, FALSE_VALUES: Set[str] = {'0', 'False', 'N', 'No', 'Off'})[source]

Bases: hifis_surveyval.core.settings.SystemSettings, hifis_surveyval.core.settings.FileSettings

Merge the two settings types, SystemSettings and FileSettings.

ANALYSIS_OUTPUT_PATH: pathlib.Path
CONFIG_FILENAME: pathlib.Path
FALSE_VALUES: Set[str]

A set of strings to be interpreted as boolean ‘False’ when parsing the input data.

RUN_TIMESTAMP: str
TRUE_VALUES: Set[str]

A set of strings to be interpreted as boolean ‘True’ when parsing the input data.

VERBOSITY: int
create_default_config_file() None[source]

Create a file to store the config.

load_config_file() None[source]

Load the settings from the config file.

set_verbosity(verbose_count: int) None[source]

Interpret the verbosity option count.

Set the log levels accordingly. The used log level is also stored in the settings.

Args:

verbose_count (int): The amount of verbose option triggers.
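As an illustration of turning a verbosity count into a log level, consider the sketch below. The mapping (0 to WARNING, 1 to INFO, 2 or more to DEBUG) is an assumption chosen for the example; this documentation does not state which levels the package uses. The function name verbosity_to_log_level is hypothetical.

```python
import logging


def verbosity_to_log_level(verbose_count: int) -> int:
    """Map a count of verbose flags to a logging level (assumed mapping)."""
    levels = [logging.WARNING, logging.INFO, logging.DEBUG]
    # Clamp the count so negative or very large values stay within the list.
    index = min(max(verbose_count, 0), len(levels) - 1)
    return levels[index]
```

The returned level could then be applied with logging.basicConfig(level=...) and stored alongside the other settings.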

class hifis_surveyval.core.settings.SystemSettings(_env_file: Optional[Union[pathlib.Path, str]] = '<object object>', _env_file_encoding: Optional[str] = None, _secrets_dir: Optional[Union[pathlib.Path, str]] = None, *, CONFIG_FILENAME: pathlib.Path = PosixPath('hifis-surveyval.yml'), VERBOSITY: int = 0, RUN_TIMESTAMP: str = None, ANALYSIS_OUTPUT_PATH: pathlib.Path = None, TRUE_VALUES: Set[str] = {'1', 'On', 'True', 'Y', 'Yes'}, FALSE_VALUES: Set[str] = {'0', 'False', 'N', 'No', 'Off'})[source]

Bases: pydantic.env_settings.BaseSettings

Settings that are not loaded from a file.

ANALYSIS_OUTPUT_PATH: pathlib.Path
CONFIG_FILENAME: pathlib.Path
FALSE_VALUES: Set[str]

A set of strings to be interpreted as boolean ‘False’ when parsing the input data.

RUN_TIMESTAMP: str
TRUE_VALUES: Set[str]

A set of strings to be interpreted as boolean ‘True’ when parsing the input data.

VERBOSITY: int
classmethod assemble_output_path(to_validate: str, values: Dict[str, Any]) pathlib.Path[source]

Assemble path from user settings and datetime.

Args:
to_validate (str): Analysis output path as string to be validated.

values (Dict[str, Any]): Parts of the analysis output path to be concatenated as an absolute path.

Returns:

Path: Path to the output folder of an analysis run.

classmethod case_insensitive_values(to_validate: Set[str]) Set[source]

Extend the set of values to match all cases.

Args:

to_validate (Set[str]): Set of values to be extended to match all cases.

Returns:
Set: Set of false and true values accepted as boolean values in the data.
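One way to extend such a set is sketched below. It adds the lower-case, upper-case and capitalized form of every value, which is an assumption about what "all cases" means; the package may cover more or fewer variants. The name extend_to_all_cases is hypothetical.

```python
from typing import Set


def extend_to_all_cases(values: Set[str]) -> Set[str]:
    """Add common capitalisations of each value (assumed interpretation)."""
    extended = set()
    for value in values:
        extended.update(
            {value, value.lower(), value.upper(), value.capitalize()}
        )
    return extended
```

With this sketch, a set containing only 'Yes' grows to 'Yes', 'yes' and 'YES', so input data matches regardless of how respondents or export tools capitalised the answer.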

classmethod set_timestamp(to_validate: str) str[source]

Get the current datetime.

Args:

to_validate (str): Date-time string to be validated.

Returns:

str: Date-time string in a specific format.
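The timestamp and output path validators can be pictured together as follows. This is a hedged sketch: the timestamp format and the layout "output folder / timestamp" are assumptions, since neither is specified in this documentation, and both function names are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def make_timestamp(now: datetime) -> str:
    """Format a datetime as a file-system friendly string (assumed format)."""
    return now.strftime("%Y-%m-%d_%H-%M-%S")


def build_output_path(output_folder: Path, run_timestamp: str) -> Path:
    """Join the output folder and the timestamp into an absolute path."""
    return (output_folder / run_timestamp).resolve()
```

A caller would typically pass datetime.now() to make_timestamp once per run, so that all output of one analysis run lands in the same folder.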

hifis_surveyval.core.util module

This module provides helper functions.

hifis_surveyval.core.util.create_example_script(settings: hifis_surveyval.core.settings.Settings) None[source]

Create an example script from data payload at the default script location.

Args:
settings (Settings): Settings of the analysis run.

hifis_surveyval.core.util.create_preprocessing_script(settings: hifis_surveyval.core.settings.Settings) None[source]

Create an empty preprocessing script at the default location.

Args:
settings (Settings): Settings of the analysis run.

hifis_surveyval.core.util.cross_reference_sum(data: pandas.DataFrame, grouping: pandas.Series) pandas.DataFrame[source]

Cross-reference a data frame with a series and count correlations.

The data frame is processed column-wise. For each column, indices are grouped up by their respective value in the grouping series and each group is summed up.

Columns with incomplete data or rows that can not be cross-referenced may be dropped.

In the context of the survey analysis, data usually is a multiple choice question, while the grouping series is a single choice question. They get matched by the participant IDs and the correlations get summed up.

Args:
data (DataFrame): A data frame of which the columns are to be grouped and summed up.

grouping (Series): A series with indices (mostly) matching that of “data”, associating each index with a group towards which the values of “data” are to be counted.

Returns:
DataFrame: A data frame containing the columns from data (minus dropped columns) and the unique values of the grouping series as indices. Each cell at [column, index] holds the sum of the values in the respective column of the data which corresponded to the index in the grouping series.
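The grouping and summing described above can be approximated with plain pandas. The sketch below is not the package's implementation: the name cross_reference_sum_sketch is hypothetical, and it only drops rows that cannot be cross-referenced; it does not drop columns with incomplete data.

```python
import pandas as pd


def cross_reference_sum_sketch(
    data: pd.DataFrame, grouping: pd.Series
) -> pd.DataFrame:
    """Sum each column of data per group assigned by the grouping series."""
    # Ignore participants without a group and rows missing from either input.
    grouping = grouping.dropna()
    common = data.index.intersection(grouping.index)
    # Rows are grouped by the value the grouping series holds for their index.
    return data.loc[common].groupby(grouping.loc[common]).sum()


# Multiple choice answers (1 = option ticked), indexed by participant ID.
answers = pd.DataFrame(
    {"python": [1, 0, 1], "r": [0, 1, 1]}, index=["p1", "p2", "p3"]
)
# Single choice answer per participant.
centers = pd.Series({"p1": "A", "p2": "B", "p3": "A"})
summary = cross_reference_sum_sketch(answers, centers)
```

In this example, summary has the groups "A" and "B" as its index and one column per answer option, with each cell counting how many participants of that group ticked the option.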

hifis_surveyval.core.util.dataframe_value_counts(dataframe: pandas.DataFrame, relative_values: bool = False, drop_nans: bool = True) pandas.DataFrame[source]

Count how often a unique value appears in each column of a data frame.

Args:
dataframe (DataFrame): The data frame of which the values shall be counted.

relative_values (bool): Instead of absolute counts, fill the cells with their relative contribution to the column total. Defaults to False.

drop_nans (bool): Whether to remove the NaN value count. Defaults to True.

Returns:
DataFrame: A new data frame with the same columns as the input. The index is changed to represent the unique values and the cells contain the count of the unique values in the given column.
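A comparable result can be obtained by applying Series.value_counts to every column. This is a sketch under assumptions, not the package's code: the name value_counts_sketch is hypothetical, and filling values that never occur in a column with 0 is a choice made for this example.

```python
import pandas as pd


def value_counts_sketch(
    dataframe: pd.DataFrame,
    relative_values: bool = False,
    drop_nans: bool = True,
) -> pd.DataFrame:
    """Count the unique values of each column; rows are the unique values."""
    counts = dataframe.apply(
        lambda column: column.value_counts(
            normalize=relative_values, dropna=drop_nans
        )
    )
    # A value that is absent from a column yields NaN; report it as 0 instead.
    return counts.fillna(0)
```

With relative_values=True each column of the result sums to 1, which makes questions with different numbers of respondents comparable.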

hifis_surveyval.core.util.filter_and_group_series(base_data: pandas.Series, group_by: pandas.Series, min_value: Optional[float] = None, max_value: Optional[float] = None) pandas.DataFrame[source]

Filter a series and group its values according to another series.

Generate a sparse DataFrame in which all values of base_data are assigned to a column according to the corresponding value for the same index in group_by.

Indexes not present in group_by will result in an empty row. Indexes not present in base_data will result in an empty column.

Args:
base_data (Series): The series of which the data is to be sorted and filtered.

group_by (Series): A series assigning each index to a group.

min_value (Optional[float]): An optional minimum value. All values of base_data below this value will be excluded from the result. Not set by default.

max_value (Optional[float]): An optional maximum value. All values of base_data above this value will be excluded from the result. Not set by default.

Returns:
DataFrame: A new DataFrame where each row represents an index of base_data and each column is one of the unique values of the group_by series. The values of base_data are put into the column where the base_data index matches the group_by index.
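The sparse layout described above can be illustrated as follows. The sketch assumes numeric data and unique index labels, and it removes filtered-out rows entirely, which may differ from the package's behaviour. The name filter_and_group_sketch is hypothetical.

```python
from typing import Optional

import pandas as pd


def filter_and_group_sketch(
    base_data: pd.Series,
    group_by: pd.Series,
    min_value: Optional[float] = None,
    max_value: Optional[float] = None,
) -> pd.DataFrame:
    """Spread base_data over one column per group, leaving other cells NaN."""
    filtered = base_data
    if min_value is not None:
        filtered = filtered[filtered >= min_value]
    if max_value is not None:
        filtered = filtered[filtered <= max_value]

    # One row per remaining index, one column per group; all cells start NaN.
    result = pd.DataFrame(
        index=filtered.index, columns=group_by.dropna().unique(), dtype=float
    )
    groups = group_by.reindex(filtered.index)
    for index, value in filtered.items():
        group = groups.loc[index]
        if pd.notna(group):  # indices without a group keep an empty row
            result.loc[index, group] = value
    return result
```

A layout like this is convenient for box plots, where each column is drawn as one box and missing cells are simply ignored.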

Module contents

This package provides core functionalities.