Metadata-Version: 2.4
Name: csvspoon
Version: 1.3.0
Summary: A tool to manipulate csv files with headers.
Author-email: Jean-Benoist Leger <jb@leger.tf>
License: MIT
Project-URL: Homepage, https://gitlab.com/jbleger/csvspoon
Keywords: csv
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.5
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: argparse
Requires-Dist: argcomplete
Requires-Dist: colorama
Requires-Dist: more-itertools
Requires-Dist: numpy
Requires-Dist: lazy-loader
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Dynamic: license-file

# csvspoon: a tool to manipulate csv file with headers

Again, again, and again.

## Installing

From pypi:

```
pip3 install csvspoon
```

Or developer version:

```
git clone <this repo>
cd csvspoon
pip3 install -e .
```

## Enable completion (for bash or other shells using bash-completion)

```
mkdir -p ~/.local/share/bash-completion/completions
register-python-argcomplete csvspoon > ~/.local/share/bash-completion/completions/csvspoon
```

## Python module

All methods and functions are accessible in the python module.
## Cli example
### csvspoon cat: Concatenate CSV files
 - Cat two csv files:
```
csvspoon cat file1.csv file2.csv
```
 - Display a csv file with a high number of columns:
```
csvspoon cat -S wide_data.csv
```
 - Change delimiter of a csv file:
```
csvspoon cat -d "\t" -u ";" file.csv > result.csv
```
 - Change delimiter of a csv file with specified output:
```
csvspoon cat -o result.csv -d "\t" -u ";" file.csv
```
 - Reformat two columns of a csv files:
```
csvspoon cat -f a_colname:5.1f -f another_colname:04d file.csv
```
 - Cat one csv file, keeping only a column:
```
csvspoon cat file.csv:a_col
```
 - Cat two csv files, renaming a column on the second file:
```
csvspoon cat file1.csv file2.csv:new_col=old_col,another_col
```
 - Cat a csv file, keeping all columns except one:
```
csvspoon cat file.csv:-col_to_drop
```
 - Cat a csv file with a renamed column and one excluded (all other cols kept):
```
csvspoon cat file.csv:display_name=internal_name,-skip_col
```
### csvspoon apply: Apply functions to add columns
 - Combine text columns by a formula:
```
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" file.csv
```
 - Column names with spaces (use --rowvar):
```
csvspoon apply --rowvar r -a "full name" "r['first name']+' '+r['last name']" file.csv
```
 - Combine text columns by a formula and remove original columns:
```
csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" \
        -O-lastname,-firstname file.csv
```
 - Sum to integer columns:
```
csvspoon apply -t cola:int -t colb:int -a colsum "cola+colb" file.csv
```
 - Sum to integer columns and format the result:
```
csvspoon apply -t cola:int -t colb:int -a colsum:05d "cola+colb" file.csv
```
 - Compute complex expression between columns:
```
csvspoon apply \
        -b "import math" \
        -t x:float \
        -t y:float \
        -a norm "math.sqrt(x**2+y**2)" \
        file.csv
```
 - Use a custom function from a local module (current dir) for a new column:
```
csvspoon apply -p . -b "from stuff import myfun" -a new_col "myfun(col1, col2)" file.csv
```
 - Multiple computation can be done reusing newly created columns:
```
csvspoon apply -t x:int -a x2p1 "x**2+1" -a x2p1m1 "x2p1-1" file.csv
```
### csvspoon sort: Sort CSV file
 - Sort csv file using column cola:
```
csvspoon sort -k cola file.csv
```
 - Sort csv file using columns cola and colb:
```
csvspoon sort -k cola -k colb file.csv
```
 - Sort csv file using numerical mode on column numcol:
```
csvspoon sort -n -k numcol file.csv
```
 - Shuffle csv file:
```
csvspoon sort -R file.csv
```
### csvspoon filter: Filter CSV from given conditions
 - Filter csv file using two columns:
```
csvspoon filter -a "lastname!=firstname" file.csv
```
 - Column name with spaces (use --rowvar):
```
csvspoon filter --rowvar r -a "r['unit price']>10" file.csv
```
 - Chain filters on csv file:
```
csvspoon filter \
        -a "lastname.startswith('Doe')" \
        -a "firstname.starswith('John')" \
        file.csv
```
 - Filter csv file with float column price:
```
csvspoon filter -t price:float -a "price>12.5" file.csv
```
 - Filter csv file with complex expression:
```
csvspoon filter \
        -b "import math" \
        -t x:float \
        -t y:float \
        -t z:float \
        -a "math.sqrt(x**2+y**2)>z" \
        file.csv
```
### csvspoon join: Join CSV files
 - Operate NATURAL JOIN on two csv files:
```
csvspoon join file1.csv file2.csv
```
 - Operate two NATURAL JOIN on three csv files:
```
csvspoon join file1.csv file2.csv file3.csv
```
 - Operate LEFT JOIN on two csv files
```
csvspoon join -l file1.csv file2.csv
```
 - Operate RIGHT JOIN on two csv files
```
csvspoon join -r file1.csv file2.csv
```
 - Operate OUTER JOIN on two csv files
```
csvspoon join -lr file1.csv file2.csv
```
### csvspoon aggregate: Compute aggregation on CSV file
 - Keeping unique lines, one line per group:
```
csvspoon aggregate \
        -k group \
        file.csv
```
 - Column name with spaces (use --rowvar):
```
csvspoon aggregate --rowvar r -k category -a total "sum(r['unit price'])" file.csv
```
 - Computing the total mean grade:
```
csvspoon aggregate \
        --np \
        -t grade:float \
        -a meangrade "np.mean(grade)" \
        file.csv
```
 - Computing the total mean grade specifing a format:
```
csvspoon aggregate \
        --np \
        -t grade:float \
        -a meangrade:.2f "np.mean(grade)" \
        file.csv
```
 - Computing the mean grade by group:
```
csvspoon aggregate \
        --np \
        -t grade:float \
        -a meangrade "np.mean(grade)" \
        -k group \
        file.csv
```
 - Computing the mean grade, median, standard deviation by group:
```
csvspoon aggregate \
        --np \
        -t grade:float \
        -a meangrade "np.mean(grade)" \
        -a mediangrade "np.median(grade)" \
        -a stdgrade "np.std(grade)" \
        -k group \
        file.csv
```
### csvspoon sample: Sample rows in CSV file
 - Sample 10 rows without replacement:
```
csvspoon sample -k 10 file.csv
```
 - Weighted sample of 10 rows using column weight:
```
csvspoon sample -W weight -k 10 file.csv
```
 - Sample 30 rows with replacement:
```
csvspoon sample -r -k 30 file.csv
```
## Cli usage
```
usage: csvspoon [-h] {aggregate,apply,cat,filter,join,sample,sort} ...

A tool to manipulate csv files with headers.
Again, again and again.

options:
  -h, --help            show this help message and exit

subcommands:
  {aggregate,apply,cat,filter,join,sample,sort}
    aggregate           Apply a aggregation formula to compute a new column.
    apply               Apply a formula to compute a new column.
    cat                 Concatenate csv files.
    filter              Filter a csv with a formula.
    join                Operate join on csv files
    sample              Random sampling in streaming (with or without
                        replacement).
    sort                Sort csv files.

```
## `csvspoon aggregate
```
usage: csvspoon aggregate [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT]
                          [-f FORMAT] [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM]
                          [-C OUTPUTENC] [-H N | -T N] [-D] [-w TEXTWIDTH |
                          -S] [--infer-lines INFER_LINES] [-L | -N]
                          [-b BEFORE] [--np] [--sp] [-p PATH] [-t TYPE]
                          [-r NAME] [-a COLSPEC FORMULA] [-k KEYS]
                          [input]

Apply a formula to compute a new column.
The formula must be a valid python expression evaluated for each
groupped row. With --rowvar NAME, do not use column names as
variables; use the dictionary NAME[column_name] instead.
Only aggregation or column with non ambiguous values are keeped.
Warning: this method need to store in memory all the input csv file.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as input file. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -b, --before BEFORE   Run the following code before evaluate the expression
                        on each row. Can be specified multiple times. (e.g.
                        "import math").
  --np                  Shortcut to `--before "import numpy as np"`
  --sp                  Shortcut to `--np --before "import scipy as sp"`
  -p, --python-path PATH
                        Append path provided in argument to Python path.
                        Usefull in with -b. e.g. -p . -b "import myfunctions".
  -t, --type TYPE       Apply type conversion on specified command prior to
                        expression. The argument must be a column name
                        followed by a valid Python type. See "--before" to
                        define non standard type. e.g. "a_column:int" or
                        "a_column:float". This option can be specified
                        multiple time to type different columns.
  -r, --rowvar NAME     When set, row values are not exposed as local
                        variables in expressions; instead a single dictionary
                        mapping column names to values is exposed under this
                        name. Useful when column names contain spaces or
                        special characters. e.g. --rowvar row then use
                        row["col name"] in apply/aggregate/filter expressions.
  -a, --add, --add-aggregation COLSPEC FORMULA
                        Append a new column by aggregation of values. Take two
                        argument, COLSPEC and FORMULA. COLSPEC is the name of
                        the created column obtained the aggregation. The
                        COLSPEC can also contains a colon and a format
                        specifier, see "--format" for example. FORMULA must be
                        a valid python expression. For each column, list of
                        values to aggregate are accessible as local variable
                        (or via the dict given by --rowvar NAME). The formula
                        should return a single value. e.g. "sum(a_colname) +
                        sum(other_colname)" or with --rowvar r: "sum(r['a
                        col']) + sum(r['other col'])". See "--type" for typing
                        other columns and "--before" for run code before
                        evaluating expression. Can be specified multiple time.
  -k, --key KEYS        Column used groupping the aggregate. Can be specified
                        multiple time. Similar to "GROUP BY" in SQL.

Examples:
  Keeping unique lines, one line per group:
    csvspoon aggregate \
            -k group \
            file.csv

  Column name with spaces (use --rowvar):
    csvspoon aggregate --rowvar r -k category -a total "sum(r['unit price'])" file.csv

  Computing the total mean grade:
    csvspoon aggregate \
            --np \
            -t grade:float \
            -a meangrade "np.mean(grade)" \
            file.csv

  Computing the total mean grade specifing a format:
    csvspoon aggregate \
            --np \
            -t grade:float \
            -a meangrade:.2f "np.mean(grade)" \
            file.csv

  Computing the mean grade by group:
    csvspoon aggregate \
            --np \
            -t grade:float \
            -a meangrade "np.mean(grade)" \
            -k group \
            file.csv

  Computing the mean grade, median, standard deviation by group:
    csvspoon aggregate \
            --np \
            -t grade:float \
            -a meangrade "np.mean(grade)" \
            -a mediangrade "np.median(grade)" \
            -a stdgrade "np.std(grade)" \
            -k group \
            file.csv

```
## `csvspoon apply
```
usage: csvspoon apply [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                      [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                      [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                      [--infer-lines INFER_LINES] [-L | -N] [-b BEFORE] [--np]
                      [--sp] [-p PATH] [-t TYPE] [-r NAME]
                      [-a COLSPEC FORMULA]
                      [input]

Apply a formula to compute a new column.
The formula must be a valid python expression evaluated on each row.
With --rowvar NAME, do not use column names as variables; use the
dictionary NAME[column_name] instead.
This method is completely streamed and no data is stored in memory.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as input file. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -b, --before BEFORE   Run the following code before evaluate the expression
                        on each row. Can be specified multiple times. (e.g.
                        "import math").
  --np                  Shortcut to `--before "import numpy as np"`
  --sp                  Shortcut to `--np --before "import scipy as sp"`
  -p, --python-path PATH
                        Append path provided in argument to Python path.
                        Usefull in with -b. e.g. -p . -b "import myfunctions".
  -t, --type TYPE       Apply type conversion on specified command prior to
                        expression. The argument must be a column name
                        followed by a valid Python type. See "--before" to
                        define non standard type. e.g. "a_column:int" or
                        "a_column:float". This option can be specified
                        multiple time to type different columns.
  -r, --rowvar NAME     When set, row values are not exposed as local
                        variables in expressions; instead a single dictionary
                        mapping column names to values is exposed under this
                        name. Useful when column names contain spaces or
                        special characters. e.g. --rowvar row then use
                        row["col name"] in apply/aggregate/filter expressions.
  -a, --add, --add-column COLSPEC FORMULA
                        Append a new column (or update existing one). Take two
                        argument, COLSPEC and FORMULA. COLSPEC is the name of
                        the created column obtained by appling the formula.
                        The column is remplaced if already exists. The COLSPEC
                        can also contains a colon and a format specifier, see
                        "--format" for example. FORMULA must be a valid python
                        expression. For the current row, columns values are
                        accessible as local variable (or via the dict given by
                        --rowvar NAME). e.g. "a_colname + other_colname" or
                        "min(a_colname, other_colname)"; with --rowvar row:
                        row["col name"] + row["other col"]. See "--type" for
                        typing other columns and "--before" for run code
                        before evaluating expression. Can be specified
                        multiple time.

Examples:
  Combine text columns by a formula:
    csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" file.csv

  Column names with spaces (use --rowvar):
    csvspoon apply --rowvar r -a "full name" "r['first name']+' '+r['last name']" file.csv

  Combine text columns by a formula and remove original columns:
    csvspoon apply -a name "lastname.upper()+' '+firstname.lower()" \
            -O-lastname,-firstname file.csv

  Sum to integer columns:
    csvspoon apply -t cola:int -t colb:int -a colsum "cola+colb" file.csv

  Sum to integer columns and format the result:
    csvspoon apply -t cola:int -t colb:int -a colsum:05d "cola+colb" file.csv

  Compute complex expression between columns:
    csvspoon apply \
            -b "import math" \
            -t x:float \
            -t y:float \
            -a norm "math.sqrt(x**2+y**2)" \
            file.csv

  Use a custom function from a local module (current dir) for a new column:
    csvspoon apply -p . -b "from stuff import myfun" -a new_col "myfun(col1, col2)" file.csv

  Multiple computation can be done reusing newly created columns:
    csvspoon apply -t x:int -a x2p1 "x**2+1" -a x2p1m1 "x2p1-1" file.csv

```
## `csvspoon cat
```
usage: csvspoon cat [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                    [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                    [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                    [--infer-lines INFER_LINES] [-L | -N]
                    [input ...]

Concatenate csv files.
Empty fields added if some columns do not exist in all files
This method is completely streamed and no data is stored in memory.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as first input file, otherwise
                        use explicitly "-" for stdin. Can be a filename (e.g.
                        "file.csv"), or a filename followed by a semicolon and
                        column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Examples:
  Cat two csv files:
    csvspoon cat file1.csv file2.csv

  Display a csv file with a high number of columns:
    csvspoon cat -S wide_data.csv

  Change delimiter of a csv file:
    csvspoon cat -d "\t" -u ";" file.csv > result.csv

  Change delimiter of a csv file with specified output:
    csvspoon cat -o result.csv -d "\t" -u ";" file.csv

  Reformat two columns of a csv files:
    csvspoon cat -f a_colname:5.1f -f another_colname:04d file.csv

  Cat one csv file, keeping only a column:
    csvspoon cat file.csv:a_col

  Cat two csv files, renaming a column on the second file:
    csvspoon cat file1.csv file2.csv:new_col=old_col,another_col

  Cat a csv file, keeping all columns except one:
    csvspoon cat file.csv:-col_to_drop

  Cat a csv file with a renamed column and one excluded (all other cols kept):
    csvspoon cat file.csv:display_name=internal_name,-skip_col

```
## `csvspoon filter
```
usage: csvspoon filter [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                       [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                       [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                       [--infer-lines INFER_LINES] [-L | -N] [-b BEFORE]
                       [--np] [--sp] [-p PATH] [-t TYPE] [-r NAME]
                       [-a FILTER_FORMULA]
                       [input]

Evaluate a formula on each row, and keep only rows where the formula
is evaluated True.
The formula must be a valid python expression evaluated on each row.
With --rowvar NAME, do not use column names as variables; use the
dictionary NAME[column_name] instead.
This method is completely streamed and no data is stored in memory.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as input file. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -b, --before BEFORE   Run the following code before evaluate the expression
                        on each row. Can be specified multiple times. (e.g.
                        "import math").
  --np                  Shortcut to `--before "import numpy as np"`
  --sp                  Shortcut to `--np --before "import scipy as sp"`
  -p, --python-path PATH
                        Append path provided in argument to Python path.
                        Usefull in with -b. e.g. -p . -b "import myfunctions".
  -t, --type TYPE       Apply type conversion on specified command prior to
                        expression. The argument must be a column name
                        followed by a valid Python type. See "--before" to
                        define non standard type. e.g. "a_column:int" or
                        "a_column:float". This option can be specified
                        multiple time to type different columns.
  -r, --rowvar NAME     When set, row values are not exposed as local
                        variables in expressions; instead a single dictionary
                        mapping column names to values is exposed under this
                        name. Useful when column names contain spaces or
                        special characters. e.g. --rowvar row then use
                        row["col name"] in apply/aggregate/filter expressions.
  -a, --add, --add-filter FILTER_FORMULA
                        FORMULA must be a valid python expression, which is
                        casted to bool(). For the current row, columns values
                        are accessible as local variable (or via the dict
                        given by --rowvar NAME). e.g. "a_colname >
                        other_colname" or "a_colname=='fixedvalue'"; with
                        --rowvar row: "row['col name'] > row['other col']".
                        See "--type" for typing other columns and "--before"
                        for run code before evaluating filter expression. Can
                        be specified multiple time.

Examples:
  Filter csv file using two columns:
    csvspoon filter -a "lastname!=firstname" file.csv

  Column name with spaces (use --rowvar):
    csvspoon filter --rowvar r -a "r['unit price']>10" file.csv

  Chain filters on csv file:
    csvspoon filter \
            -a "lastname.startswith('Doe')" \
            -a "firstname.starswith('John')" \
            file.csv

  Filter csv file with float column price:
    csvspoon filter -t price:float -a "price>12.5" file.csv

  Filter csv file with complex expression:
    csvspoon filter \
            -b "import math" \
            -t x:float \
            -t y:float \
            -t z:float \
            -a "math.sqrt(x**2+y**2)>z" \
            file.csv

```
## `csvspoon join
```
usage: csvspoon join [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                     [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                     [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                     [--infer-lines INFER_LINES] [-L | -N] [-l] [-r] [-e]
                     input [input ...]

Natural join of csv files.
Joins are performed from left to right.
Warning: this method need to store in memory all csv except the
    first which is streamed.

If neither --left or --right specified, inner join is realized. For
complete outer join, use --left and --right together.

positional arguments:
  input                 Input file specification. If less than two input files
                        are provided, stdin is used as first input file,
                        otherwise use explicitly "-" for stdin. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -l, --left            Perform left join. If more than two files are
                        provided, each join in a left join. Can be used with
                        `-r` to obtain a outer join.
  -r, --right           Perform right join. If more than two files are
                        provided, each join in a right join. Can be used with
                        `-l` to obtain a outer join.
  -e, --empty           Indicate than empty field have to be considered as a
                        value.

Examples:
  Operate NATURAL JOIN on two csv files:
    csvspoon join file1.csv file2.csv

  Operate two NATURAL JOIN on three csv files:
    csvspoon join file1.csv file2.csv file3.csv

  Operate LEFT JOIN on two csv files
    csvspoon join -l file1.csv file2.csv

  Operate RIGHT JOIN on two csv files
    csvspoon join -r file1.csv file2.csv

  Operate OUTER JOIN on two csv files
    csvspoon join -lr file1.csv file2.csv

```
## `csvspoon sample
```
usage: csvspoon sample [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                       [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                       [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                       [--infer-lines INFER_LINES] [-L | -N] [-W COL] [-r |
                       -R] [--seed SEED] [-k K]
                       [input]

Random sampling in streaming. Two modes: without replacement (default)
returns at most k distinct rows; with replacement (-r) returns exactly
k rows (each row may appear multiple times). This method stores only in
memory up to k rows.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as input file. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -W, --weight COL      Column name used as weight. Non-numeric or negative
                        values are treated as zero. If not set, sampling is
                        uniform.
  -r, --with-replacement
                        Sample with replacement: output exactly k rows (rows
                        may repeat).
  -R, --revert          Return the complement of sampling: the k lines left
                        after selecting all others except k. Incompatible with
                        -r (with replacement).
  --seed SEED           Random seed for reproducibility (any string of
                        characters).
  -k K                  Number of rows to return. Without replacement: at most
                        k rows; if the input has fewer than k lines, all are
                        selected. With replacement (-r): exactly k rows.
                        (default: 1)

Examples:
  Sample 10 rows without replacement:
    csvspoon sample -k 10 file.csv

  Weighted sample of 10 rows using column weight:
    csvspoon sample -W weight -k 10 file.csv

  Sample 30 rows with replacement:
    csvspoon sample -r -k 30 file.csv

```
## `csvspoon sort
```
usage: csvspoon sort [-h] [-d DELIM] [-c INPUTENC] [-o OUTPUT] [-f FORMAT]
                     [-O COLS] [-F OUTPUT_FORMAT] [-u ODELIM] [-C OUTPUTENC]
                     [-H N | -T N] [-D] [-w TEXTWIDTH | -S]
                     [--infer-lines INFER_LINES] [-L | -N] [-k KEYS] [-n] [-r]
                     [-R] [--seed SEED]
                     [input]

Sort csv file.
Warning: this method need to store in memory all the input csv file.

positional arguments:
  input                 Input file specification. If no input file is
                        provided, stdin is used as input file. Can be a
                        filename (e.g. "file.csv"), or a filename followed by
                        a semicolon and column names separated by commas (e.g.
                        "file.csv:a_colname,another_colname"). A column can be
                        renamed with "new_name=old_name" (e.g.
                        "file.csv:new_col=old_col"). When column names are
                        specified (and no exclusion is used), only those
                        columns are used, in the given order. Prefixing a
                        column with "-" excludes it. When at least one column
                        is excluded, all columns from the file are kept except
                        the excluded ones; columns that are also listed
                        (possibly with a rename) appear under their (new)
                        name. Excluded columns cannot be renamed (no "=" after
                        "-col").

options:
  -h, --help            show this help message and exit

Input options:
  -d, --delim DELIM     Input delimiter. (default: ',')
  -c, --inputenc INPUTENC
                        Input encoding. (default: 'utf8')

Output options:
  -o, --output OUTPUT   Output file, else output on stdout.
  -f, --format FORMAT   Apply a format on a column on output. The argument
                        must be a column name followed by a colon and a format
                        specifier. e.g. "a_colname:5d" or "a_colname:+07.2f".
                        This option can be specified multiple time to format
                        different columns.
  -O, --output-columns COLS
                        Output only the given columns, in that order. Same
                        format as after ':' in the input: column names,
                        optional renames (newname=oldname), optional
                        exclusions (-col). e.g. "a,b,label=id,-skip". These
                        transformations are applied after all other processing
                        (including the --format option). When excluding
                        columns (-col), use -O-col or --output-columns=-col
                        (no space, or use =) so that -col is not parsed as a
                        separate option.
  -F, --output-format OUTPUT_FORMAT
                        Output format. Choices: auto, csv, terminal, ascii,
                        unicode, markdown (or md), latex (or tex). auto
                        (default): CSV except when output on stdout and stdout
                        is a tty, then terminal. csv: Raw CSV using the output
                        delimiter (-u/--output-delim). terminal: Pretty-print
                        with Unicode box-drawing for a nice display. ascii:
                        Pretty-print as text table with ASCII characters only.
                        unicode: Pretty-print as text table with Unicode box-
                        drawing characters. markdown (md): Pretty-print as a
                        Markdown table. latex (tex): Pretty-print as a LaTeX
                        tabular (booktabs).
  -u, --output-delim ODELIM
                        Output delimiter. (default: ',')
  -C, --outputenc OUTPUTENC
                        Output encoding. (default: 'utf8')
  -H, --head N          Output only the first N rows.
  -T, --tail N          Output only the last N rows.
  -D, --dark-background
                        Use dark background colors for terminal output. This
                        option only has an effect when terminal pretty-print
                        is used.
  -w, --textwidth TEXTWIDTH
                        Maximum total width of the table in characters
                        (including borders and padding) for pretty output
                        (terminal, ascii, unicode, markdown). Content is
                        truncated with "…" when needed. (default: terminal
                        width when stdout is a tty, else no limit)
  -S, --keep-long-lines
                        Do not wrap or truncate long lines (mutually exclusive
                        with -w). Implies -L in terminal output mode except
                        when -N is specified.
  --infer-lines INFER_LINES
                        Number of rows used to infer column types and sizes
                        for pretty output (markdown, terminal, etc.). Use
                        "inf" for no limit (infer on all rows). (default:
                        1000)
  -L, --less            Pipe output through less (only when writing to stdout
                        on a tty). Automatically activated if the number of
                        lines exceed the terminal size (except when '--no-
                        less' or '-N' is used).
  -N, --no-less         Never output through less (only when writing to stdout
                        on a tty).

Processing options:
  -k, --key KEYS        Column used for sorting. Can be specified multiple
                        time.
  -n, --numeric-sort    Compare according to numerical value.
  -r, --reverse         Reverse the result of comparisons.
  -R, --random-sort     Shuffle. If key specified, shuffle is performed inside
                        lines with the same key.
  --seed SEED           Random seed for reproducibility (any string of
                        characters).

Examples:
  Sort csv file using column cola:
    csvspoon sort -k cola file.csv

  Sort csv file using columns cola and colb:
    csvspoon sort -k cola -k colb file.csv

  Sort csv file using numerical mode on column numcol:
    csvspoon sort -n -k numcol file.csv

  Shuffle csv file:
    csvspoon sort -R file.csv

```
