Package qcsv

Classes
  Table
Table(types, names, rows)
  Column
Column(type, name, cells)
Functions
 
read(fname, delimiter=',', skip_header=False)
read loads cell data, column headers and type information for each column given a file path to a CSV formatted file.
 
map_names(table, f)
map_names executes f on every column header in the table, with three arguments, in order: column type, column index, column name.
 
map_data(table, f)
map_data executes f on every cell of data with five arguments, in order: column type, column name, row index, column index, contents.
 
cast(table)
cast type-casts all of the values in 'rows' to their corresponding types in 'types'.
 
convert_missing_cells(table, dstr='', dint=0, dfloat=0.0)
convert_missing_cells changes the values of all NULL cells to the values specified by dstr, dint and dfloat.
 
convert_columns(table, **kwargs)
convert_columns executes converter functions on specific columns, where the parameter names for kwargs are the column names, and the parameter values are functions of one parameter that return a single value.
 
convert_types(table, fstr=None, fint=None, ffloat=None)
convert_types works just like convert_columns, but on types instead of specific columns.
 
column(table, colname)
column returns the column with name "colname", where the column returned is a triple of the column type, the column name and a NumPy array of cells in the column.
 
columns(table)
columns returns a list of all columns in the data set, where each column is a triple of its type, name and a NumPy array of cells in the column.
 
frequencies(column)
frequencies returns a dictionary where the keys are unique values in the column, and the values correspond to the frequency of each value in the column.
 
type_str(typ)
type_str returns a string representation of a column type.
 
cell_str(cell_contents)
cell_str is a convenience function for converting cell contents to a string when there are still NULL values.
 
print_data_table(table)
print_data_table is a convenience function for pretty-printing the data in tabular format, including header names and type annotations.
Variables
  __package__ = 'qcsv'
Function Details

read(fname, delimiter=',', skip_header=False)

read loads cell data, column headers and type information for each column given a file path to a CSV formatted file. A "Table" namedtuple is returned with fields "types", "names" and "rows".

All cells have left and right whitespace trimmed.

All rows MUST be the same length.

delimiter is the string that separates each field in a row.

If skip_header is set, then no column headers are read, and column names are set to their corresponding indices (as strings).
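The rules above (whitespace trimming, equal-length rows, index-named columns when skip_header is set) can be illustrated with a minimal sketch. This is not qcsv's source: read_sketch is a hypothetical name, and it takes an open file object instead of a path so it can be tried on io.StringIO. The type-inference rule (int if every non-empty cell parses as int, else float if every non-empty cell parses as float, else string, and NULL if every cell is empty) and the use of Python's str, int, float and None as type markers are assumptions; the documentation does not say how qcsv infers or represents types.

```python
import csv
import io
from collections import namedtuple

# Stand-in for qcsv's Table namedtuple, using the documented field names.
Table = namedtuple("Table", ["types", "names", "rows"])


def _cell_type(cell):
    """Narrowest type that parses `cell`, or None for an empty (NULL) cell."""
    if cell == "":
        return None
    for typ in (int, float):
        try:
            typ(cell)
            return typ
        except ValueError:
            pass
    return str


def _column_type(cells):
    """Assumed rule: str wins over float, float over int; all-empty means NULL (None)."""
    seen = {_cell_type(c) for c in cells}
    for typ in (str, float, int):
        if typ in seen:
            return typ
    return None


def read_sketch(fileobj, delimiter=",", skip_header=False):
    # Trim left and right whitespace from every cell.
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(fileobj, delimiter=delimiter)]
    if not rows:
        return Table([], [], [])
    if skip_header:
        # No header: columns are named by their indices, as strings.
        names = [str(i) for i in range(len(rows[0]))]
    else:
        names, rows = rows[0], rows[1:]
    # All rows MUST be the same length.
    for i, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError("row %d has %d cells, expected %d" % (i, len(row), len(names)))
    types = [_column_type(col) for col in zip(*rows)] if rows else [None] * len(names)
    return Table(types, names, rows)


data = io.StringIO("name, age, score\nann, 31, 9.5\nbob, , 7\n")
table = read_sketch(data)
print(table.names)  # ['name', 'age', 'score']
print(table.types)  # [<class 'str'>, <class 'int'>, <class 'float'>]
```

Cells are left as strings here; turning them into typed values is the job of cast.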

map_names(table, f)

map_names executes f on every column header in the table, with three arguments, in order: column type, column index, column name. The result of the function is placed in the corresponding header location.

A new table is returned with the new column names.
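A minimal sketch of the described contract, assuming Table is a namedtuple with fields types, names and rows. map_names_sketch is a hypothetical re-implementation for illustration, not qcsv's code, and str/int as column types are an assumed representation.

```python
from collections import namedtuple

# Stand-in for qcsv's Table namedtuple; column types are assumed to be str/int/float.
Table = namedtuple("Table", ["types", "names", "rows"])


def map_names_sketch(table, f):
    """Return a new Table whose headers are f(column type, column index, column name)."""
    new_names = [f(typ, i, name)
                 for i, (typ, name) in enumerate(zip(table.types, table.names))]
    # _replace returns a new namedtuple; the input table is not modified.
    return table._replace(names=new_names)


t = Table([str, int], ["Name ", "AGE"], [["ann", 31]])
clean = map_names_sketch(t, lambda typ, i, name: name.strip().lower())
print(clean.names)  # ['name', 'age']
print(t.names)      # ['Name ', 'AGE']
```

Because a new table is returned, the original headers remain available for comparison or error messages.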

map_data(table, f)

map_data executes f on every cell of data with five arguments, in order: column type, column name, row index, column index, contents. The result of the function is placed in the corresponding cell location.

A new table is returned with the converted values.
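The five-argument callback can be sketched as follows. map_data_sketch is a hypothetical name and the namedtuple Table and str/int type markers are assumptions; this illustrates the documented argument order rather than reproducing qcsv's implementation.

```python
from collections import namedtuple

Table = namedtuple("Table", ["types", "names", "rows"])


def map_data_sketch(table, f):
    """Return a new Table whose cells are f(type, name, row index, column index, contents)."""
    new_rows = [
        [f(table.types[c], table.names[c], r, c, cell) for c, cell in enumerate(row)]
        for r, row in enumerate(table.rows)
    ]
    return table._replace(rows=new_rows)


t = Table([str, int], ["name", "age"], [["ann", "31"], ["bob", "27"]])
# Upper-case only the cells of string-typed columns.
upper = map_data_sketch(t, lambda typ, name, r, c, cell: cell.upper() if typ is str else cell)
print(upper.rows)  # [['ANN', '31'], ['BOB', '27']]
```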

cast(table)

cast type-casts all of the values in 'rows' to their corresponding types in 'types'.

The only special case here is missing values or NULL columns. If a value is missing or a column has type NULL (i.e., all values are missing), then the value is replaced with None, which is Python's version of a NULL value.

N.B. cast is idempotent, i.e., cast(x) = cast(cast(x)).
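The NULL handling and the idempotence claim can be checked with a small sketch. cast_sketch is hypothetical; it assumes column types are Python's str, int and float with None meaning a NULL column, and that a missing cell arrives as an empty string. Neither assumption is stated by the documentation.

```python
from collections import namedtuple

Table = namedtuple("Table", ["types", "names", "rows"])


def cast_sketch(table):
    """Cast every cell to its column type; missing cells and NULL columns become None."""
    def cast_cell(typ, cell):
        # '' is a missing cell straight from the reader; None is one that was already cast.
        if typ is None or cell is None or cell == "":
            return None
        return typ(cell)

    new_rows = [[cast_cell(typ, cell) for typ, cell in zip(table.types, row)]
                for row in table.rows]
    return table._replace(rows=new_rows)


raw = Table([str, int, float, None], ["name", "age", "score", "notes"],
            [["ann", "31", "9.5", ""], ["bob", "", "7", ""]])
once = cast_sketch(raw)
print(once.rows)                  # [['ann', 31, 9.5, None], ['bob', None, 7.0, None]]
print(cast_sketch(once) == once)  # True
```

Idempotence follows because str, int and float each return an equal value when applied to a value already of that type, and None cells are passed through unchanged.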

convert_missing_cells(table, dstr='', dint=0, dfloat=0.0)

convert_missing_cells changes the values of all NULL cells to the values specified by dstr, dint and dfloat. For example, all NULL cells in columns with type "string" will be replaced with the value given to dstr.
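A sketch of the per-type defaults, assuming the table has already been through cast so that missing cells are None, and that column types are str, int and float. convert_missing_cells_sketch is a hypothetical name. What qcsv does with cells of a NULL-typed column is not documented; here they are left as None.

```python
from collections import namedtuple

Table = namedtuple("Table", ["types", "names", "rows"])


def convert_missing_cells_sketch(table, dstr="", dint=0, dfloat=0.0):
    """Replace None cells with the default for their column's type."""
    defaults = {str: dstr, int: dint, float: dfloat}
    # A NULL-typed column (type None) has no entry, so its cells stay None here.
    new_rows = [[defaults.get(typ) if cell is None else cell
                 for typ, cell in zip(table.types, row)]
                for row in table.rows]
    return table._replace(rows=new_rows)


t = Table([str, int, float], ["name", "age", "score"],
          [["ann", None, 9.5], [None, 27, None]])
print(convert_missing_cells_sketch(t).rows)
# [['ann', 0, 9.5], ['', 27, 0.0]]
```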

convert_columns(table, **kwargs)

convert_columns executes converter functions on specific columns, where the parameter names for kwargs are the column names, and the parameter values are functions of one parameter that return a single value.

e.g., convert_columns(table, colname=lambda s: s.lower()) would convert all values in the column with name 'colname' to lowercase.
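The keyword-argument dispatch can be sketched as below. convert_columns_sketch is a hypothetical stand-in, and the namedtuple Table is an assumption. Columns with no matching keyword are copied unchanged in this sketch.

```python
from collections import namedtuple

Table = namedtuple("Table", ["types", "names", "rows"])


def convert_columns_sketch(table, **kwargs):
    """Apply kwargs[name] to every cell of the column called `name`; other columns are copied."""
    funcs = [kwargs.get(name) for name in table.names]
    new_rows = [[cell if f is None else f(cell) for f, cell in zip(funcs, row)]
                for row in table.rows]
    return table._replace(rows=new_rows)


t = Table([str, int], ["colname", "age"], [["Ann", 31], ["BOB", 27]])
print(convert_columns_sketch(t, colname=lambda s: s.lower()).rows)
# [['ann', 31], ['bob', 27]]
```

Since column names become keyword names, a header that is not a valid Python identifier (for example "first name") cannot be written as name=func; passing it through dictionary unpacking, as in convert_columns_sketch(t, **{"first name": str.title}), works because Python accepts arbitrary string keys in **kwargs.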

convert_types(table, fstr=None, fint=None, ffloat=None)

convert_types works just like convert_columns, but on types instead of specific columns. This function will likely be more useful, since sanitization functions are typically type oriented rather than column oriented.

However, when there are specific kinds of columns that need special sanitization, convert_columns should be used.
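A sketch of type-oriented conversion, under the same assumptions as a typical cast result: column types are str, int and float, and NULL cells are None. convert_types_sketch is a hypothetical name. Skipping None cells, so that a converter such as str.strip is never called on a NULL, is an assumed choice; the documentation does not say how qcsv treats them.

```python
from collections import namedtuple

Table = namedtuple("Table", ["types", "names", "rows"])


def convert_types_sketch(table, fstr=None, fint=None, ffloat=None):
    """Apply fstr/fint/ffloat to every cell of str/int/float columns respectively."""
    by_type = {str: fstr, int: fint, float: ffloat}
    funcs = [by_type.get(typ) for typ in table.types]
    # Assumption: None (NULL) cells are passed through without calling the converter.
    new_rows = [[cell if f is None or cell is None else f(cell)
                 for f, cell in zip(funcs, row)]
                for row in table.rows]
    return table._replace(rows=new_rows)


t = Table([str, int, float], ["name", "age", "score"],
          [[" Ann ", 31, 9.456], ["bob ", None, 7.0]])
print(convert_types_sketch(t, fstr=str.strip, ffloat=lambda x: round(x, 1)).rows)
# [['Ann', 31, 9.5], ['bob', None, 7.0]]
```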

cell_str(cell_contents)

cell_str is a convenience function for converting cell contents to a string when there are still NULL values.

N.B. If you choose to work with data while keeping NULL values, you will likely need to write more functions similar to this one.
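A sketch of the kind of helper meant here, assuming NULL cells are represented as None. cell_str_sketch and null_safe are hypothetical names, and rendering NULL as the text "NULL" is an assumed placeholder; qcsv's actual output for NULL cells is not documented. null_safe is one example of a "similar function": it wraps a converter so that NULL cells pass through untouched.

```python
def cell_str_sketch(cell_contents):
    """Render a cell as text; None (a NULL cell) is shown as 'NULL' (assumed placeholder)."""
    if cell_contents is None:
        return "NULL"
    return str(cell_contents)


def null_safe(f):
    """Wrap a one-argument converter so that NULL (None) cells pass through unchanged."""
    def wrapper(cell):
        return None if cell is None else f(cell)
    return wrapper


row = ["ann", None, 31]
print(" | ".join(cell_str_sketch(c) for c in row))  # ann | NULL | 31
print(null_safe(str.upper)(None))                   # None
```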