Converter
Multiple Aspect Trajectory Tools Framework
MAT-data: Data Preprocessing for Multiple Aspect Trajectory Data Mining
This application supports the classification of multiple aspect trajectories, specifically by extracting and visualizing movelets, the parts of a trajectory that best discriminate a class. It brings the fragmented approaches available for multiple aspect trajectories, and for multidimensional sequence classification in general, into a single web-based and Python library system that offers both movelet visualization and classification methods.
Created on Dec, 2023. Copyright (C) 2023. License: GPL Version 3 or later (see LICENSE file).
@author: Tarlis Portela
- matdata.converter.any2ts(data_path, folder, file, cols=None, tid_col='tid', class_col='label', opLabel='Converting TS')
Converts data from various formats (CSV, Parquet, etc.) to a time series format.
Parameters:
- data_path : str
The directory path where the data files are located.
- folder : str
The folder containing the data file to be converted.
- file : str
The name of the data file to be converted.
- cols : list of str, optional
A list of column names to be included in the time series data.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- opLabel : str, optional (default='Converting TS')
A label describing the operation, useful for logging or display purposes.
Returns:
- pandas.DataFrame
A DataFrame containing the time series data, with trajectory identifier, class label, and specified columns.
- matdata.converter.csv2df(url, class_col='label', tid_col='tid', missing='?')
Converts a CSV file from a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the CSV file to be read.
- class_col : str, optional (default='label')
Unused; kept so that all converters share the same signature.
- tid_col : str, optional (default='tid')
Unused; kept so that all converters share the same signature.
- missing : str, optional (default='?')
The placeholder for missing values in the CSV file.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the CSV file, with missing values handled as specified and columns renamed if necessary.
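The handling of the `missing` placeholder can be illustrated with plain pandas. This is only a sketch of the idea, not the library's implementation: the in-memory CSV and its column names are made up, and `pandas.read_csv` with `na_values` is used as a stand-in for whatever `csv2df` does internally.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for the file behind `url`.
raw = io.StringIO(
    "tid,label,lat,lon,poi\n"
    "1,A,10.0,20.0,Home\n"
    "1,A,10.5,?,Work\n"
    "2,B,?,21.0,?\n"
)

# Treat the '?' placeholder as a missing value while parsing.
df = pd.read_csv(raw, na_values="?")

# Count the missing cells per column to confirm the placeholder was recognized.
missing_per_column = df.isna().sum().to_dict()
print(missing_per_column)
```

Parsing the placeholder at read time keeps numeric columns such as `lat` and `lon` numeric, instead of turning them into strings because of the `?` cells.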
- matdata.converter.df2mat(df, folder, file, cols=None, mat_cols=None, desc_cols=None, label_columns=None, other_dsattrs=None, tid_col='tid', class_col='label', opLabel='Converting MAT')
Converts a pandas DataFrame to a Multiple Aspect Trajectory .mat file and saves it to the specified folder.
Parameters:
- df : pandas.DataFrame
The DataFrame to be converted to a .mat file.
- folder : str
The directory where the .mat file will be saved.
- file : str
The base name of the .mat file (without extension).
- cols : list of str, optional
A list of column names from the DataFrame to include in the .mat file. If None, all columns are included.
- mat_cols : list of str, optional
A list of column names representing the trajectory attributes. If None, no columns are used.
- desc_cols : list of str, optional
A dict of column descriptors to be included as descriptive metadata.
- label_columns : list of str, optional
A list of column names that can be treated as labels in the .mat file.
- other_dsattrs : dict, optional
A dictionary of additional dataset attributes to be included in the .mat file.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- opLabel : str, optional (default='Converting MAT')
A label describing the operation, useful for logging or display purposes.
Returns:
None
- matdata.converter.df2parquet(df, data_path, file='train', tid_col='tid', class_col='label', select_cols=None, opLabel='Writing MAT')
Writes a pandas DataFrame to a Parquet file.
Parameters:
- df : pandas.DataFrame
The DataFrame to be written to the Parquet file.
- data_path : str
The directory path where the Parquet file will be saved.
- file : str, optional (default='train')
The base name of the Parquet file (without extension).
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- select_cols : list of str, optional
A list of column names to be included in the Parquet file. If None, all columns are included.
- opLabel : str, optional (default='Writing MAT')
A label describing the operation, useful for logging or display purposes.
Returns:
- pandas.DataFrame
The input DataFrame
- matdata.converter.df2zip(df, data_path, file, tid_col='tid', class_col='label', select_cols=None, opLabel='Writing MAT')
Writes a pandas DataFrame to a CSV file and compresses it into a ZIP archive.
Parameters:
- df : pandas.DataFrame
The DataFrame to be written to the CSV file and then compressed into a ZIP archive.
- data_path : str
The directory path where the ZIP archive will be saved.
- file : str
The base name of the CSV file (without extension) to be compressed into the ZIP archive.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- select_cols : list of str, optional
A list of column names to be included in the CSV file. If None, all columns are included.
- opLabel : str, optional (default='Writing MAT')
A label describing the operation, useful for logging or display purposes.
Returns:
- pandas.DataFrame
The input DataFrame
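The write-then-compress step can be sketched with pandas and the standard `zipfile` module. This is an illustration under assumptions, not the library's code: the helper `write_csv_zip`, the sample columns, and the archive layout (`<file>.csv` inside `<file>.zip`) are all hypothetical.

```python
import io
import os
import tempfile
import zipfile

import pandas as pd

# Made-up trajectory data for the illustration.
df = pd.DataFrame(
    {
        "tid": [1, 1, 2],
        "label": ["A", "A", "B"],
        "poi": ["Home", "Work", "Park"],
        "rating": [4.5, 3.0, 5.0],
    }
)


def write_csv_zip(df, data_path, file, select_cols=None):
    """Hypothetical helper: store df as <file>.csv inside <data_path>/<file>.zip."""
    if select_cols is not None:
        df = df[select_cols]
    zip_path = os.path.join(data_path, file + ".zip")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(file + ".csv", df.to_csv(index=False))
    return zip_path


with tempfile.TemporaryDirectory() as tmp:
    zip_path = write_csv_zip(df, tmp, "train", select_cols=["tid", "label", "poi"])
    # Read the archive back to check what was stored.
    with zipfile.ZipFile(zip_path) as archive:
        member_names = archive.namelist()
        restored = pd.read_csv(io.BytesIO(archive.read("train.csv")))

print(member_names)
print(restored.columns.tolist())
```

Reading the archive back right after writing it is a cheap way to confirm that only the selected columns were stored.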
- matdata.converter.mat2df(url, class_col='label', tid_col='tid', missing='?')
Converts a MATLAB .mat file from a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the .mat file to be read.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- tid_col : str, optional (default='tid')
The name of the column to be used as the unique trajectory identifier.
- missing : str, optional (default='?')
The placeholder for missing values in the dataset.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the .mat file, with missing values handled as specified and columns renamed if necessary.
Raises:
- Exception
Not Implemented.
- matdata.converter.parquet2df(url, class_col='label', tid_col='tid', missing='?')
Converts a Parquet file from a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the Parquet file to be read.
- class_col : str, optional (default='label')
Unused; kept so that all converters share the same signature.
- tid_col : str, optional (default='tid')
Unused; kept so that all converters share the same signature.
- missing : str, optional (default='?')
The placeholder for missing values in the dataset.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the Parquet file, with missing values handled as specified and columns renamed if necessary.
- matdata.converter.ts2df(url, class_col='label', tid_col='tid', missing='?')
Converts a time series file from a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the time series file to be read.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- tid_col : str, optional (default='tid')
The name of the column to be used as the unique trajectory identifier.
- missing : str, optional (default='?')
The placeholder for missing values in the dataset.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the time series file, with missing values handled as specified and columns renamed if necessary.
- matdata.converter.xes2df(url, class_col='label', tid_col='tid', opLabel='Converting XES', save=False, start_tid=1)
Converts an XES (eXtensible Event Stream) file from a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the XES file to be read.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- opLabel : str, optional (default='Converting XES')
A label describing the operation, useful for logging or display purposes.
- save : bool, optional (default=False)
A flag indicating whether to save the DataFrame to a file after conversion.
- start_tid : int, optional (default=1)
The first value to assign when generating trajectory identifiers, since the tid_col values have to be generated during conversion.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the XES file, with columns renamed if necessary.
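The role of `start_tid` can be shown with a standard-library sketch that flattens a tiny XES-like log into one row per event. It is an assumption-laden illustration, not the library's parser: the sample log omits XML namespaces, and the helper `xes_to_rows` is hypothetical.

```python
import xml.etree.ElementTree as ET

# A tiny XES-like log without XML namespaces (an assumption for brevity).
XES_SAMPLE = """<log>
  <trace>
    <string key="concept:name" value="case-a"/>
    <event>
      <string key="concept:name" value="register"/>
      <date key="time:timestamp" value="2023-12-01T08:00:00"/>
    </event>
    <event>
      <string key="concept:name" value="approve"/>
      <date key="time:timestamp" value="2023-12-01T09:30:00"/>
    </event>
  </trace>
  <trace>
    <string key="concept:name" value="case-b"/>
    <event>
      <string key="concept:name" value="register"/>
      <date key="time:timestamp" value="2023-12-02T10:15:00"/>
    </event>
  </trace>
</log>"""


def xes_to_rows(xml_text, tid_col="tid", start_tid=1):
    """Hypothetical helper: one dict per event, with a generated trajectory id per trace."""
    root = ET.fromstring(xml_text)
    rows = []
    # Each trace becomes one trajectory; identifiers are numbered from start_tid.
    for tid, trace in enumerate(root.findall("trace"), start=start_tid):
        for event in trace.findall("event"):
            row = {tid_col: tid}
            for attribute in event:
                row[attribute.get("key")] = attribute.get("value")
            rows.append(row)
    return rows


rows = xes_to_rows(XES_SAMPLE, start_tid=100)
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed directly to `pandas.DataFrame(rows)` to obtain the tabular form.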
- matdata.converter.zip2arf(folder, file, cols, tid_col='tid', class_col='label', missing='?', opLabel='Reading CSV')
Extracts a CSV file from a ZIP archive and converts it into an ARFF (Attribute-Relation File Format) file.
Parameters:
- folder : str
The directory path where the ZIP archive is located.
- file : str
The name of the ZIP archive file (with or without extension).
- cols : list of str
A list of column names to be included in the ARFF file.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- missing : str, optional (default='?')
The placeholder for missing values in the CSV file.
- opLabel : str, optional (default='Reading CSV')
A label describing the operation, useful for logging or display purposes.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the extracted ZIP file, with missing values handled as specified and columns renamed if necessary.
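For readers unfamiliar with ARFF, the sketch below renders a few rows as ARFF text, using `?` for missing cells. It only illustrates the target format: the helper `to_arff`, the type-inference rule, and the sample data are assumptions and may differ from what `zip2arf` writes.

```python
def to_arff(relation, columns, rows, class_col="label", missing="?"):
    """Hypothetical helper: render rows (a list of dicts) as ARFF text."""
    lines = ["@relation " + relation, ""]
    for col in columns:
        values = [row[col] for row in rows if row[col] is not None]
        # Numeric columns become 'numeric'; the class column and text columns become nominal sets.
        if col != class_col and all(isinstance(v, (int, float)) for v in values):
            lines.append("@attribute " + col + " numeric")
        else:
            nominal = sorted({str(v) for v in values})
            lines.append("@attribute " + col + " {" + ",".join(nominal) + "}")
    lines += ["", "@data"]
    for row in rows:
        cells = [missing if row[col] is None else str(row[col]) for col in columns]
        lines.append(",".join(cells))
    return "\n".join(lines)


# Made-up rows; None marks a missing value.
rows = [
    {"tid": 1, "rating": 4.5, "poi": "Home", "label": "A"},
    {"tid": 1, "rating": None, "poi": "Work", "label": "A"},
    {"tid": 2, "rating": 5.0, "poi": None, "label": "B"},
]
arff_text = to_arff("trajectories", ["tid", "rating", "poi", "label"], rows)
print(arff_text)
```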
- matdata.converter.zip2csv(folder, file, cols, class_col='label', tid_col='tid', missing='?')
Extracts and compiles the trajectory CSV files from a ZIP archive and converts them into a pandas DataFrame.
Parameters:
- folder : str
The directory path where the ZIP archive is located, and where the resulting CSV file is written.
- file : str
The name of the ZIP archive file (with or without extension).
- cols : list of str
A list of column names to be included in the DataFrame.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- tid_col : str, optional (default='tid')
The name of the column to be used as the trajectory identifier.
- missing : str, optional (default='?')
The placeholder for missing values in the CSV file.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the extracted CSV file, with missing values handled as specified and columns renamed if necessary.
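Compiling several trajectory files into one table can be sketched as follows. The archive layout is an assumption (one CSV member per trajectory, all with the same header), the helper `compile_csv_members` is hypothetical, and the archive is built in memory rather than read from `folder`.

```python
import io
import zipfile

import pandas as pd

# Build a small in-memory archive; member names and contents are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("traj_1.csv", "tid,label,poi\n1,A,Home\n1,A,Work\n")
    archive.writestr("traj_2.csv", "tid,label,poi\n2,B,?\n")
    archive.writestr("README.txt", "not a trajectory file")


def compile_csv_members(zip_bytes, cols, missing="?"):
    """Hypothetical helper: concatenate every CSV member of a ZIP into one DataFrame."""
    frames = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.endswith(".csv"):
                continue  # skip non-CSV members
            data = io.BytesIO(archive.read(name))
            frames.append(pd.read_csv(data, na_values=missing, usecols=cols))
    return pd.concat(frames, ignore_index=True)


df = compile_csv_members(buffer.getvalue(), cols=["tid", "label", "poi"])
print(df)
```

Sorting the member names makes the row order of the compiled table independent of the order in which files were added to the archive.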
- matdata.converter.zip2df(url, class_col='label', tid_col='tid', missing='?', opLabel='Reading ZIP')
Extracts and converts a CSV trajectory file from a ZIP archive located at a given URL into a pandas DataFrame.
Parameters:
- url : str
The URL pointing to the ZIP archive containing the CSV file to be read.
- class_col : str, optional (default='label')
The name of the column to be treated as the class/label column.
- tid_col : str, optional (default='tid')
The name of the column to be used as the unique trajectory identifier.
- missing : str, optional (default='?')
The placeholder for missing values in the CSV file.
- opLabel : str, optional (default='Reading ZIP')
A label describing the operation, for logging purposes.
Returns:
- pandas.DataFrame
A DataFrame containing the data from the extracted CSV file, with missing values handled as specified and columns renamed if necessary.