# Gene Expression — Test Project Configuration
#
# Data size: 80 rows x 202 columns (synthetic)
# Download: python dev/test-datasets/download.py --dataset gene-expression
#
# Project name:
gene-expression-analysis

# Data path:
/home/mrichardson/Projects/Urika/dev/test-datasets/gene-expression/data

# Description:
Gene expression microarray data from tumour and normal tissue samples: 80 samples (40 tumour,
40 normal), each profiled across 200 genes. Expression values are log2-normalised intensities.
The goal is to identify which genes are differentially expressed between tumour and normal
tissue, and whether a classifier can reliably distinguish the two sample types. Key challenges
are the high dimensionality (200 features for only 80 samples) and the resulting risk of
overfitting.

# Research question:
Which genes are differentially expressed between tumour and normal tissue, and can a
classifier reliably predict tissue type from expression profiles?

# Mode:
exploratory

# Web search:
no

# Venv:
no

# Knowledge suggestions:
Add the data-description.md from dev/test-datasets/gene-expression/knowledge/. Optionally add
references on differential expression analysis, multiple testing correction, or regularised
classification methods for high-dimensional data.
