module: obs_sequence

class obs_sequence.obs_sequence(file)

Create an obs_sequence object from an ASCII observation sequence file.

df

DataFrame containing all the observations.

Type:

pandas.DataFrame

all_obs

List of all observations. Each observation is itself a list.

Type:

list

header

Header from the ASCII file.

Type:

str

vert

Dictionary of DART vertical units.

Type:

dict

types

Dictionary of types in the observation sequence file.

Type:

dict

copie_names

Names of copies in the observation sequence file. Spelled ‘copie’ to avoid conflict with the Python built-in copy function. Spaces are replaced with underscores in copie_names.

Type:

list

Parameters:

file – the input observation sequence ASCII file

Example

Read the observation sequence from file:

obs_seq = obs_sequence('/home/data/obs_seq.final.ascii.small')

Access the resulting pandas DataFrame:

obs_seq.df

For 3D sphere models, latitude and longitude are stored in degrees in the DataFrame.

Calculations:

  • sq_err = (mean-obs)**2 (per observation)

  • bias = (mean-obs) (per observation)

  • rmse = sqrt( sum((mean-obs)**2)/n )

  • bias = sum((mean-obs)/n) (aggregated over n observations)

  • spread = sum(sd)

  • totalspread = sqrt(sum(sd+obs_err_var))
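The formulas above can be sketched with pandas as follows. This is a minimal illustration on made-up numbers, not the library's implementation; the column names `observation`, `mean`, `sd`, and `obs_err_var` are assumptions chosen for the example.

```python
import math
import pandas as pd

# Toy data; the column names are assumptions for illustration only.
toy = pd.DataFrame({
    "observation": [1.0, 2.0, 4.0],
    "mean": [2.0, 2.0, 2.0],
    "sd": [1.0, 2.0, 3.0],
    "obs_err_var": [1.0, 1.0, 1.0],
})

# Per-observation quantities
toy["sq_err"] = (toy["mean"] - toy["observation"]) ** 2
toy["bias"] = toy["mean"] - toy["observation"]

n = len(toy)

# Aggregates, written to mirror the formulas listed above
rmse = math.sqrt(toy["sq_err"].sum() / n)
mean_bias = (toy["bias"] / n).sum()
spread = toy["sd"].sum()
totalspread = math.sqrt((toy["sd"] + toy["obs_err_var"]).sum())
```

In practice these aggregates would usually be computed per observation type (for example with a groupby), which this sketch leaves out.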

create_all_obs()

Steps through the generator to create a list of all observations in the sequence.

obs_to_list(obs)

Puts a single observation into a list.

Discards obs_def.

static generate_linked_list_pattern(n)

Create a list of strings with the linked list pattern for n lines.

write_obs_seq(file, df=None)

Write the observation sequence to a file.

This function writes the observation sequence to disk. If no DataFrame is provided, it writes the obs_sequence object to a file using the header and all observations stored in the object. If a DataFrame is provided, it creates a header and linked list from the DataFrame, then writes the DataFrame observations to an obs_sequence file. Note that the DataFrame is assumed to have been created from an obs_sequence object.

Parameters:
  • file (str) – The path to the file where the observation sequence will be written.

  • df (pandas.DataFrame, optional) – A DataFrame containing the observation data. If not provided, the function uses self.header and self.all_obs.

Returns:

None

Examples

obs_seq.write_obs_seq('/path/to/output/file')

obs_seq.write_obs_seq('/path/to/output/file', df=obs_seq.df)

column_headers()

Defines the columns for the DataFrame.

static read_header(file)

Read the header of an obs_seq file and the number of lines in that header.

static collect_obs_types(header)

Create a dictionary for the observation types in the obs_seq header.

static collect_copie_names(header)

Extracts the names of the copies from the header of an obs_seq file.

Parameters:

header (list) – A list of strings representing the lines in the header of the obs_seq file.

Returns:

A tuple containing two elements:
  • copie_names (list): A list of strings representing the copy names with underscores for spaces.

  • len(copie_names) (int): The number of copy names.

Return type:

tuple

static obs_reader(file, n)

Reads the obs sequence file and returns a generator of the observations.

composite_types(composite_types='use_default')

Set up and construct composite types for the DataFrame.

This function sets up composite types based on a provided YAML configuration or a default configuration. It constructs new composite rows by combining specified components and adds them to the DataFrame.

Parameters:

composite_types (str, optional) – The YAML configuration for composite types. If ‘use_default’, the default configuration is used. Otherwise, a custom YAML configuration can be provided.

Returns:

The updated DataFrame with the new composite rows added.

Return type:

pd.DataFrame

Raises:

Exception – If there are repeat values in the components.

obs_sequence.load_yaml_to_dict(file_path)

Load a YAML file and convert it to a dictionary.

Parameters:

file_path (str) – The path to the YAML file.

Returns:

The YAML file content as a dictionary.

Return type:

dict

obs_sequence.convert_dart_time(seconds, days)

Convert from seconds and days after 1601 to a datetime object.

Note

  • The base year for the Gregorian calendar is 1601.

  • DART time is expressed as seconds and days since 1601.
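As a rough sketch of this conversion, assuming the epoch is midnight on 1 January 1601, the days and seconds can be added to that date with the standard library. The helper name `dart_time_to_datetime` is hypothetical and is not part of the module.

```python
from datetime import datetime, timedelta

def dart_time_to_datetime(seconds, days):
    """Hypothetical helper: offset 1601-01-01 00:00:00 by days and seconds."""
    return datetime(1601, 1, 1) + timedelta(days=days, seconds=seconds)

# One day and one hour after the assumed epoch
example = dart_time_to_datetime(3600, 1)
```

The argument order (seconds first, then days) follows the signature documented above.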

obs_sequence.select_by_dart_qc(df, dart_qc)

Selects rows from a DataFrame based on the DART quality control flag.

Parameters:
  • df (DataFrame) – A pandas DataFrame.

  • dart_qc (int) – The DART quality control flag to select.

Returns:

A DataFrame containing only the rows with the specified DART quality control flag.

Return type:

DataFrame

Raises:

ValueError – If the DART quality control flag is not present in the DataFrame.
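The selection described above might look like the following in plain pandas. This is a sketch, not the module's code: the column name `DART_quality_control` and the helper name `filter_by_qc` are assumptions made for the example.

```python
import pandas as pd

QC_COLUMN = "DART_quality_control"  # assumed column name

def filter_by_qc(df, dart_qc):
    """Hypothetical helper: keep rows whose QC flag equals dart_qc."""
    mask = df[QC_COLUMN] == dart_qc
    if not mask.any():
        # Mirrors the documented behaviour: an absent flag is an error
        raise ValueError(f"DART quality control flag {dart_qc} not found in DataFrame.")
    return df[mask]

# Toy data for illustration
toy = pd.DataFrame({
    "type": ["A", "B", "A"],
    "observation": [1.0, 2.0, 3.0],
    QC_COLUMN: [0, 2, 0],
})

passed = filter_by_qc(toy, 0)
```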

obs_sequence.select_failed_qcs(df)

Selects rows from a DataFrame where the DART quality control flag is greater than 0.

Parameters:

df (DataFrame) – A pandas DataFrame.

Returns:

A DataFrame containing only the rows with a DART quality control flag greater than 0.

Return type:

DataFrame
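A minimal sketch of the same filter in plain pandas, assuming the quality control flag is stored in a column named `DART_quality_control` (an assumption for this example):

```python
import pandas as pd

# Toy data; the QC column name is assumed
toy = pd.DataFrame({
    "type": ["A", "B", "A"],
    "observation": [1.0, 2.0, 3.0],
    "DART_quality_control": [0, 2, 0],
})

# Rows whose QC flag is greater than 0 are treated as failed
failed = toy[toy["DART_quality_control"] > 0]
```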

obs_sequence.possible_vs_used(df)

Calculates the count of possible vs. used observations by type.

This function takes a DataFrame containing observation data, including a ‘type’ column for the observation type and an ‘observation’ column. The number of used observations (‘used’) is the total number of observations minus those that failed quality control checks (as determined by the select_failed_qcs function). The result is a DataFrame with each observation type, the count of possible observations, and the count of used observations.

Parameters:
df (pd.DataFrame) – A DataFrame with at least two columns: ‘type’ for the observation type and ‘observation’ for the observation data. It may also contain other columns required by the select_failed_qcs function to determine failed quality control checks.

Returns:

A DataFrame with three columns: ‘type’, ‘possible’, and ‘used’. ‘type’ is the observation type, ‘possible’ is the count of all observations of that type, and ‘used’ is the count of observations of that type that passed quality control checks.

Return type:

pd.DataFrame
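The counting described above can be sketched with a groupby. This is an illustration under assumptions rather than the module's implementation: the QC column name `DART_quality_control` is assumed, and "failed" is taken to mean a flag greater than 0, as documented for select_failed_qcs.

```python
import pandas as pd

# Toy data; the QC column name is an assumption
toy = pd.DataFrame({
    "type": ["A", "A", "B", "B", "B"],
    "observation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "DART_quality_control": [0, 2, 0, 0, 7],
})

# Count every observation of each type
possible = toy.groupby("type")["observation"].count()

# Count the observations of each type that failed QC (flag > 0)
failed = (
    toy[toy["DART_quality_control"] > 0]
    .groupby("type")["observation"]
    .count()
)

# used = possible - failed; types with no failures get a failed count of 0
used = possible - failed.reindex(possible.index, fill_value=0)

summary = possible.rename("possible").to_frame()
summary["used"] = used
summary = summary.reset_index()
```

The reindex step matters when some type has no failed observations. Without it, the subtraction would produce a missing value for that type.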

obs_sequence.construct_composit(df_comp, composite, components)

Construct a composite DataFrame by combining rows from two components.

This function takes two DataFrames and combines rows from them based on matching location and time. It creates a new row with a composite type by combining specified columns using the square root of the sum of squares method.

Parameters:
  • df_comp (pd.DataFrame) – The DataFrame containing the component rows to be combined.

  • composite (str) – The type name for the new composite rows.

  • components (list of str) – A list containing the type names of the two components to be combined.

Returns:

The updated DataFrame with the new composite rows added.

Return type:

pd.DataFrame
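The combination step can be sketched as a merge on location and time followed by a square root of the sum of squares. Everything named here is an assumption for illustration: the key columns `latitude`, `longitude`, `vertical`, and `time`, the helper `combine_components`, and the type names in the toy data. The real function may combine more columns than `observation`.

```python
import pandas as pd

# Assumed columns that identify matching location and time
KEYS = ["latitude", "longitude", "vertical", "time"]

def combine_components(df, composite, components, value_cols=("observation",)):
    """Hypothetical helper: build composite rows from two component types."""
    first = df[df["type"] == components[0]]
    second = df[df["type"] == components[1]]

    # Pair up rows of the two components that share location and time
    merged = first.merge(second, on=KEYS, suffixes=("_a", "_b"))

    out = merged[KEYS].copy()
    out["type"] = composite
    for col in value_cols:
        # Square root of the sum of squares of the two components
        out[col] = (merged[f"{col}_a"] ** 2 + merged[f"{col}_b"] ** 2) ** 0.5

    # Append the composite rows to the original rows
    return pd.concat([df, out], ignore_index=True)

toy = pd.DataFrame({
    "latitude": [10.0, 10.0],
    "longitude": [20.0, 20.0],
    "vertical": [500.0, 500.0],
    "time": [0, 0],
    "type": ["U_WIND", "V_WIND"],
    "observation": [3.0, 4.0],
})

result = combine_components(toy, "HORIZONTAL_WIND", ["U_WIND", "V_WIND"])
```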