Skip to content

Snailz

snail logo

snailz is a synthetic data generator that models a study of snails in the Pacific Northwest which are growing to unusual size as a result of exposure to pollution. The package generates fully-reproducible datasets of varying sizes and with varying statistical properties, and is intended for classroom use. For example, an instructor can give each learner a unique dataset to analyze, while learners can test their analysis pipelines using datasets they generate themselves.

The Story

Years ago, logging companies dumped toxic waste in a remote region of Vancouver Island. As the containers leaked and the pollution spread, some snails in the region began growing unusually large. Your team is now collecting and analyzing specimens from affected regions to determine if exposure to pollution is responsible.

Usage:

usage: snailz [-h]
              [--defaults]
          [--outdir OUTDIR]
              [--override OVERRIDE [OVERRIDE ...]]
          [--params PARAMS]
              [--profile]

options:
  -h, --help            show this help message and exit
  --defaults            show default parameters as JSON
  --outdir OUTDIR       output directory
  --override OVERRIDE [OVERRIDE ...]
                        name=value parameters to override defaults
  --params PARAMS       specify JSON parameter file
  --profile             enable profiling

Schema

snailz schema

Colophon

snailz was inspired by the Palmer Penguins dataset and by conversations with Rohan Alexander about his book Telling Stories with Data.

My thanks to everyone who built the tools this project relies on, including:

The snail logo was created by sunar.ko.

Acknowledgments

  • Greg Wilson is a programmer, author, and educator based in Toronto. He was the co-founder and first Executive Director of Software Carpentry and received ACM SIGSOFT's Influential Educator Award in 2020.