Loading a csv file

We load a tab-separated data file using the load_table() function. The format is inferred from the filename suffix. Note that in this case the file is a .tsv file, not a csv file.

from cogent3 import load_table

table = load_table("data/stats.tsv")
table
Locus      Region          Ratio
NP_003077  Con            2.5386
NP_004893  Con       121351.4264
NP_005079  Con      9516594.9789
NP_005500  NonCon         0.0000
NP_055852  NonCon  10933217.7090

5 rows x 3 columns

Note

The known filename suffixes for reading are .csv, .tsv and .pkl or .pickle (Python’s pickle format).
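As an illustration only, suffix-based inference can be pictured as a lookup from the file suffix to a format. The sketch below uses just the standard library; infer_format and its mapping are hypothetical and are not cogent3's implementation.

```python
from pathlib import Path

# Hypothetical mapping from the suffixes listed above to (kind, delimiter).
# cogent3's own logic may differ; this only illustrates the idea.
_SUFFIX_FORMATS = {
    ".csv": ("delimited", ","),
    ".tsv": ("delimited", "\t"),
    ".pkl": ("pickle", None),
    ".pickle": ("pickle", None),
}


def infer_format(path):
    """Return (kind, delimiter) for a known suffix, else raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return _SUFFIX_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unrecognised suffix: {suffix!r}") from None


print(infer_format("data/stats.tsv"))  # ('delimited', '\t')
```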

Note

If you set the static column types argument, i.e. load_table(…, static_column_types=True), and the data in a column are not static (not all of the same type), that column is left as strings.
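The idea behind this note can be sketched as follows: a type is chosen from the first value of a column and applied to every value, and if any value does not fit, the column stays as strings. cast_static_column is a hypothetical standard-library helper written for illustration, not cogent3's code, and it only considers int and float.

```python
def cast_static_column(values):
    """Cast a column of strings using a type inferred from its first value.

    Hypothetical sketch: if any value cannot be converted to that type,
    the column is returned unchanged, as strings.
    """
    values = list(values)
    if not values:
        return values
    # Pick the candidate type from the first value only (int, then float).
    for kind in (int, float):
        try:
            kind(values[0])
        except ValueError:
            continue
        break
    else:
        return values  # first value is not numeric, keep strings
    try:
        return [kind(value) for value in values]
    except ValueError:
        return values  # the column is not "static": leave it as str


print(cast_static_column(["1", "2", "3"]))  # [1, 2, 3]
print(cast_static_column(["1", "2.5"]))  # ['1', '2.5']
```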

Loading from a URL

The cogent3 load functions support loading from a URL. Here we load the same stats.tsv file used above directly from GitHub.

from cogent3 import load_table

table = load_table("https://raw.githubusercontent.com/cogent3/cogent3/develop/doc/data/stats.tsv")

Loading delimited data by specifying the format

Although unnecessary in this case, you can override the format inferred from the suffix by specifying the delimiter with the sep argument.

from cogent3 import load_table

table = load_table("data/stats.tsv", sep="\t")
table
Locus      Region          Ratio
NP_003077  Con            2.5386
NP_004893  Con       121351.4264
NP_005079  Con      9516594.9789
NP_005500  NonCon         0.0000
NP_055852  NonCon  10933217.7090

5 rows x 3 columns
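Outside cogent3, the same idea appears in the standard library's csv module, where the delimiter is passed explicitly instead of being inferred from a file name. This sketch uses a small in-memory sample laid out like data/stats.tsv (two rows copied from the table above); it is only an analogy for what sep does, not cogent3's parser.

```python
import csv
import io

# A small in-memory sample laid out like data/stats.tsv (tab separated).
sample = "Locus\tRegion\tRatio\nNP_003077\tCon\t2.5386\nNP_005500\tNonCon\t0.0000\n"

# The delimiter is stated explicitly, so nothing is inferred from a suffix.
reader = csv.reader(io.StringIO(sample), delimiter="\t")
header, *rows = list(reader)
print(header)  # ['Locus', 'Region', 'Ratio']
print(rows[0])  # ['NP_003077', 'Con', '2.5386']
```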

Loading delimited data without a header line

To create a table from the rows produced in the following examples, supply your own header and use make_table().

Using load_delimited()

This is a standard parsing function that does no filtering and does not convert elements to non-string types.

from cogent3.parse.table import load_delimited

header, rows, title, legend = load_delimited("data/CerebellumDukeDNaseSeq.pk", header=False, sep="\t")
rows[:4]
[['chr1',
  '29214',
  '29566',
  'chr1.1',
  '626',
  '.',
  '0.0724',
  '3.9',
  '-1',
  '159'],
 ['chr1',
  '89933',
  '90118',
  'chr1.2',
  '511',
  '.',
  '0.0313',
  '1.59',
  '-1',
  '94'],
 ['chr1',
  '545979',
  '546193',
  'chr1.3',
  '543',
  '.',
  '0.0428',
  '2.23',
  '-1',
  '100'],
 ['chr1',
  '713797',
  '714639',
  'chr1.4',
  '1000',
  '.',
  '0.3215',
  '16.0',
  '-1',
  '380']]
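Because the file has no header line, the column names have to come from you. As a standard-library sketch, the string rows can be paired with a header to form a dict of columns, which is one of the forms make_table(data=...) accepts (see "Make a table from a dict"). The names col0 to col9 are hypothetical placeholders, since the file does not name its columns, and only the first two rows shown above are used.

```python
# First two rows as returned above (all elements are str).
rows = [
    ["chr1", "29214", "29566", "chr1.1", "626", ".", "0.0724", "3.9", "-1", "159"],
    ["chr1", "89933", "90118", "chr1.2", "511", ".", "0.0313", "1.59", "-1", "94"],
]

# Hypothetical column names: the file has no header, so these are placeholders.
header = [f"col{i}" for i in range(len(rows[0]))]

# Transpose rows into columns and key each column by its header entry.
columns = {name: list(values) for name, values in zip(header, zip(*rows))}
print(columns["col0"])  # ['chr1', 'chr1']
print(columns["col1"])  # ['29214', '89933']
```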

Using FilteringParser

from cogent3.parse.table import FilteringParser

reader = FilteringParser(with_header=False, sep="\t")
rows = list(reader("data/CerebellumDukeDNaseSeq.pk"))
rows[:4]
[['chr1',
  '29214',
  '29566',
  'chr1.1',
  '626',
  '.',
  '0.0724',
  '3.9',
  '-1',
  '159'],
 ['chr1',
  '89933',
  '90118',
  'chr1.2',
  '511',
  '.',
  '0.0313',
  '1.59',
  '-1',
  '94'],
 ['chr1',
  '545979',
  '546193',
  'chr1.3',
  '543',
  '.',
  '0.0428',
  '2.23',
  '-1',
  '100'],
 ['chr1',
  '713797',
  '714639',
  'chr1.4',
  '1000',
  '.',
  '0.3215',
  '16.0',
  '-1',
  '380']]

Selectively loading parts of a big file

Loading a set number of lines from a file

The limit argument specifies the number of rows to read. As the output below shows, the header line is not counted toward that number.

from cogent3 import load_table

table = load_table("data/stats.tsv", limit=2)
table
Locus      Region        Ratio
NP_003077  Con          2.5386
NP_004893  Con     121351.4264

2 rows x 3 columns
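A limit like this can be sketched with itertools.islice, which stops reading after a fixed number of lines, so the rest of a big file is never touched. This standard-library version is an illustration, not cogent3's reader; read_limited is a hypothetical name, and it assumes, consistent with the output above, that the header line is not counted toward the limit.

```python
import io
from itertools import islice

sample = io.StringIO(
    "Locus\tRegion\tRatio\n"
    "NP_003077\tCon\t2.5386\n"
    "NP_004893\tCon\t121351.4264\n"
    "NP_005079\tCon\t9516594.9789\n"
)


def read_limited(handle, limit, sep="\t"):
    """Return the header plus at most `limit` data rows, without reading further."""
    header = next(handle).rstrip("\n").split(sep)
    rows = [line.rstrip("\n").split(sep) for line in islice(handle, limit)]
    return header, rows


header, rows = read_limited(sample, limit=2)
print(len(rows))  # 2
```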

Loading only some rows

If you want only a subset of a file's contents, use FilteringParser. It accepts a callback function that is applied to each row (as a list of strings), and rows for which the callback returns False are skipped. We illustrate this with stats.tsv, skipping any rows with "Ratio" > 10.

from cogent3 import load_table
from cogent3.parse.table import FilteringParser

reader = FilteringParser(
    lambda line: float(line[2]) <= 10, with_header=True, sep="\t"
)
table = load_table("data/stats.tsv", reader=reader, digits=1)
table
Locus      Region  Ratio
NP_003077  Con       2.5
NP_005500  NonCon    0.0

2 rows x 3 columns
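Conceptually, the callback is a predicate applied to each data row after it has been split into fields, and a row is kept when the predicate returns True. The generator below is a hypothetical standard-library sketch of that behaviour (filter_rows is not part of cogent3). It passes the header through unfiltered, since calling float() on the "Ratio" header would otherwise fail.

```python
def filter_rows(lines, predicate, sep="\t", with_header=True):
    """Yield the header (if any), then only the rows where predicate(row) is True."""
    lines = iter(lines)
    if with_header:
        yield next(lines).rstrip("\n").split(sep)
    for line in lines:
        row = line.rstrip("\n").split(sep)
        if predicate(row):
            yield row


lines = [
    "Locus\tRegion\tRatio\n",
    "NP_003077\tCon\t2.5386\n",
    "NP_004893\tCon\t121351.4264\n",
    "NP_005500\tNonCon\t0.0000\n",
]
kept = list(filter_rows(lines, lambda row: float(row[2]) <= 10))
print([row[0] for row in kept[1:]])  # ['NP_003077', 'NP_005500']
```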

You can also negate the condition with negate=True, which is useful when the condition is complex. In this example, negating means keeping the rows for which Ratio > 10.

reader = FilteringParser(
    lambda line: float(line[2]) <= 10, with_header=True, sep="\t", negate=True
)
table = load_table("data/stats.tsv", reader=reader, digits=1)
table
Locus      Region       Ratio
NP_004893  Con       121351.4
NP_005079  Con      9516595.0
NP_055852  NonCon  10933217.7

3 rows x 3 columns
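One way to read negate=True is as wrapping the callback in a logical not. In this minimal sketch (negated is a hypothetical helper, not cogent3 code), the same predicate keeps the complementary set of rows.

```python
def negated(predicate):
    """Return a predicate that is True exactly where `predicate` is False."""
    return lambda row: not predicate(row)


def keep_small(row):
    # the same condition as the lambda used with FilteringParser above
    return float(row[2]) <= 10


rows = [
    ["NP_003077", "Con", "2.5386"],
    ["NP_004893", "Con", "121351.4264"],
    ["NP_005500", "NonCon", "0.0000"],
]
big = [row[0] for row in rows if negated(keep_small)(row)]
print(big)  # ['NP_004893']
```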

Loading only some columns

Specify the columns by their names.

from cogent3 import load_table
from cogent3.parse.table import FilteringParser

reader = FilteringParser(columns=["Locus", "Ratio"], with_header=True, sep="\t")
table = load_table("data/stats.tsv", reader=reader)
table
Locus              Ratio
NP_003077         2.5386
NP_004893    121351.4264
NP_005079   9516594.9789
NP_005500         0.0000
NP_055852  10933217.7090

5 rows x 2 columns

Or by their index (negative indices count from the end, so -1 is the last column).

from cogent3 import load_table
from cogent3.parse.table import FilteringParser

reader = FilteringParser(columns=[0, -1], with_header=True, sep="\t")
table = load_table("data/stats.tsv", reader=reader)
table
Locus              Ratio
NP_003077         2.5386
NP_004893    121351.4264
NP_005079   9516594.9789
NP_005500         0.0000
NP_055852  10933217.7090

5 rows x 2 columns

Note

The negate argument applies only to the row condition; it does not change which columns are selected.
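Column selection can be sketched as resolving names or (possibly negative) indices to positions once, from the header, and then picking those fields from every kept row. In the hypothetical helpers below (resolve_columns and select_columns are not cogent3 functions), row filtering and its negation are handled separately from column selection, so negating never changes which columns are returned.

```python
def resolve_columns(header, columns):
    """Map column names or integer indices (negative allowed) to positions."""
    positions = []
    for col in columns:
        if isinstance(col, int):
            # range() indexing handles negatives and raises IndexError if out of range
            positions.append(range(len(header))[col])
        else:
            positions.append(header.index(col))  # ValueError if the name is absent
    return positions


def select_columns(header, rows, columns, predicate=None, negate=False):
    """Filter rows (optionally negated), then keep only the requested columns."""
    positions = resolve_columns(header, columns)
    selected = []
    for row in rows:
        # a row is kept when predicate(row) differs from negate
        if predicate is not None and bool(predicate(row)) == negate:
            continue
        selected.append([row[i] for i in positions])
    return [header[i] for i in positions], selected


def is_small(row):
    return float(row[2]) <= 10


header = ["Locus", "Region", "Ratio"]
rows = [
    ["NP_003077", "Con", "2.5386"],
    ["NP_004893", "Con", "121351.4264"],
    ["NP_005500", "NonCon", "0.0000"],
]
print(select_columns(header, rows, ["Locus", "Ratio"], is_small))
print(select_columns(header, rows, [0, -1], is_small, negate=True))
```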

Load raw data as a list of lists of strings

Use FilteringParser on its own, without load_table().

from cogent3.parse.table import FilteringParser

reader = FilteringParser(with_header=True, sep="\t")
data = list(reader("data/stats.tsv"))

We display only the first two lines.

data[:2]
[['Locus', 'Region', 'Ratio'], ['NP_003077', 'Con', '2.5386013224378985']]

Note

The individual elements are all str.
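Because every element is a str, any type conversion is up to you. A minimal sketch, assuming you know the Ratio column holds floats, using the two lines displayed above:

```python
data = [["Locus", "Region", "Ratio"], ["NP_003077", "Con", "2.5386013224378985"]]

header, *rows = data
ratio_idx = header.index("Ratio")
# Convert only the Ratio field; Locus and Region stay as strings.
converted = [
    [float(value) if i == ratio_idx else value for i, value in enumerate(row)]
    for row in rows
]
print(converted)
```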

Make a table from header and rows

from cogent3 import make_table

header = ["A", "B", "C"]
rows = [range(3), range(3, 6), range(6, 9), range(9, 12)]
table = make_table(header=header, data=rows)
table
A   B   C
0   1   2
3   4   5
6   7   8
9  10  11

4 rows x 3 columns

Make a table from a dict

For a dict whose keys are the column headers.

from cogent3 import make_table

data = dict(A=[0, 3, 6], B=[1, 4, 7], C=[2, 5, 8])
table = make_table(data=data)
table
A  B  C
0  1  2
3  4  5
6  7  8

3 rows x 3 columns
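A dict of columns holds the same information as a header plus rows: the keys, in insertion order, become the header, and zipping the value lists together gives the rows. A standard-library sketch of that correspondence (an illustration, not cogent3's internals):

```python
data = dict(A=[0, 3, 6], B=[1, 4, 7], C=[2, 5, 8])

header = list(data)  # dicts preserve insertion order
rows = [list(row) for row in zip(*data.values())]  # transpose columns into rows
print(header)  # ['A', 'B', 'C']
print(rows)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```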

Specify the column order when creating from a dict

table = make_table(header=["C", "A", "B"], data=data)
table
C  A  B
2  0  1
5  3  4
8  6  7

3 rows x 3 columns

Create the table with an index

A Table can be indexed like a dict if you designate a column as the index (and that column has a unique value for every row).

table = load_table("data/stats.tsv", index_name="Locus")
table["NP_055852"]
Locus      Region          Ratio
NP_055852  NonCon  10933217.7090

1 rows x 3 columns

table["NP_055852", "Region"]
'NonCon'
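Indexing by a column behaves like a dict lookup from that column's values to rows, which is why those values must be unique. Here is a hypothetical standard-library sketch (build_index is not a cogent3 function) that refuses duplicate keys; the two rows are taken from stats.tsv.

```python
def build_index(header, rows, index_name):
    """Map each value of the index column to its row; duplicates are an error."""
    pos = header.index(index_name)
    index = {}
    for row in rows:
        key = row[pos]
        if key in index:
            raise ValueError(f"duplicate value {key!r} in index column {index_name!r}")
        index[key] = row
    return index


header = ["Locus", "Region", "Ratio"]
rows = [["NP_005500", "NonCon", 0.0], ["NP_055852", "NonCon", 10933217.709]]
index = build_index(header, rows, "Locus")
# Analogous to table["NP_055852", "Region"]
print(index["NP_055852"][header.index("Region")])  # NonCon
```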

Note

The index_name argument also applies when using make_table().

Create a table from a pandas.DataFrame

from pandas import DataFrame

from cogent3 import make_table

data = dict(a=[0, 3], b=["a", "c"])
df = DataFrame(data=data)
table = make_table(data_frame=df)
table
a  b
0  a
3  c

2 rows x 2 columns

Create a table from header and rows

from cogent3 import make_table

table = make_table(header=["a", "b"], data=[[0, "a"], [3, "c"]])
table
a  b
0  a
3  c

2 rows x 2 columns

Create a table from a dict

make_table() is the utility function for creating Table objects from standard Python objects.

from cogent3 import make_table

data = dict(a=[0, 3], b=["a", "c"])
table = make_table(data=data)
table
a  b
0  a
3  c

2 rows x 2 columns

Create a table from a 2D dict

from cogent3 import make_table

d2D = {
    "edge.parent": {
        "NineBande": "root",
        "edge.1": "root",
        "DogFaced": "root",
        "Human": "edge.0",
    },
    "x": {
        "NineBande": 1.0,
        "edge.1": 1.0,
        "DogFaced": 1.0,
        "Human": 1.0,
    },
    "length": {
        "NineBande": 4.0,
        "edge.1": 4.0,
        "DogFaced": 4.0,
        "Human": 4.0,
    },
}
table = make_table(
    data=d2D,
)
table
edge.parent       x  length
root         1.0000  4.0000
root         1.0000  4.0000
root         1.0000  4.0000
edge.0       1.0000  4.0000

4 rows x 3 columns
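Judging from the output above, the outer keys of d2D become the column headers, each inner dict supplies one column, and the inner keys (NineBande, edge.1, DogFaced, Human) only line the values up into rows; they do not appear as a column. The standard-library sketch below reproduces that reshaping under those assumptions; it is not how make_table() is implemented.

```python
d2D = {
    "edge.parent": {
        "NineBande": "root",
        "edge.1": "root",
        "DogFaced": "root",
        "Human": "edge.0",
    },
    "x": {"NineBande": 1.0, "edge.1": 1.0, "DogFaced": 1.0, "Human": 1.0},
    "length": {"NineBande": 4.0, "edge.1": 4.0, "DogFaced": 4.0, "Human": 4.0},
}

header = list(d2D)  # outer keys -> column names
# the inner keys of the first column fix the row order
row_keys = list(next(iter(d2D.values())))
rows = [[d2D[col][key] for col in header] for key in row_keys]
print(header)  # ['edge.parent', 'x', 'length']
print(rows[-1])  # ['edge.0', 1.0, 4.0]
```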

Create a table that has complex python objects as elements

from cogent3 import make_table

table = make_table(
    header=["abcd", "data"],
    data=[[range(1, 6), "0"], ["x", 5.0], ["y", None]],
    missing_data="*",
    digits=1,
)
table
abcd         data
range(1, 6)  0
x            5.0
y            None

3 rows x 2 columns

Create an empty table

from cogent3 import make_table

table = make_table()
table

0 rows x 0 columns