Metadata-Version: 2.2
Name: NeEDS4BigDataPy
Version: 1.0.1
Summary: Python implementation of subsampling methods for big data under GLMs from NeEDS4BigData.
Author: Amalan Mahendran
License: MIT
Project-URL: Homepage, https://amalan-constat.github.io/NeEDS4BigData/index.html
Project-URL: Repository, https://github.com/Amalan-ConStat/NeEDS4BigDataPy
Project-URL: Issues, https://github.com/Amalan-ConStat/NeEDS4BigDataPy/issues
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: statsmodels
Requires-Dist: pygam
Requires-Dist: matplotlib
Dynamic: requires-python

# NeEDS4BigDataPy

<!-- badges: start -->

[![Project Status: Active - The project has reached a stable, usable
state and is being actively
developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![GitHub
issues](https://img.shields.io/github/issues/Amalan-ConStat/NeEDS4BigDataPy.svg?style=popout)](https://github.com/Amalan-ConStat/NeEDS4BigDataPy/issues)

[![MIT
license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)
[![](https://img.shields.io/badge/doi-10.1007/s00362--023--01446--9-green.svg)](https://doi.org/10.1007/s00362-023-01446-9)
<!-- badges: end -->

*The Python library “NeEDS4BigDataPy” implements subsampling methods
for analysing big data and is the Python version of NeEDS4BigData.*

### What is “NeEDS4BigData” an abbreviation for?

*Ne*w *E*xperimental *D*esign based *S*ubsampling methods *for Big Data*.

### How to engage with `NeEDS4BigDataPy` for the first time?

```bash
# Install from PyPI
pip install NeEDS4BigDataPy
```

```python
# Importing the package
import NeEDS4BigDataPy
```
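
To confirm which release was installed, the standard library can be
queried for the distribution's version. The sketch below is only an
illustration: the helper name `installed_version` is hypothetical and is
not part of `NeEDS4BigDataPy`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("NeEDS4BigDataPy"))
```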

### Subsampling Methods

1.  A- and L-optimality based subsampling for GLMs.
2.  A-optimality based subsampling for Gaussian Linear Models.
3.  Leverage sampling for GLMs.
4.  Local case control sampling for logistic regression.
5.  A-optimality based subsampling under measurement constraints for
    GLMs.
6.  Model robust subsampling method for GLMs.
7.  Subsampling method for GLMs when the model is potentially
    misspecified.

These seven methods are described in articles covering the following
topics:

1.  Introduction - explains the need for subsampling methods.
2.  Model based subsampling
3.  Model robust and misspecification
4.  Benchmarking Functions

For topic 2 we assume that the main effects model can describe the
data. For topic 3 we first consider the case where several models could
describe the big data, and then assume that the given main effects model
is misspecified. Under the conditions of topics 2 and 3 we explore
subsampling for four given big data sets. Finally, in topic 4 we run
simulations for the scenarios of topics 2 and 3 to compare the
computation time of our subsampling functions against full data
modelling.
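
To give a rough sense of what subsampling against full data modelling
looks like, here is a minimal sketch of leverage-based subsampling for a
Gaussian linear model on simulated data. It uses only NumPy and does not
call the `NeEDS4BigDataPy` API; the function names, data, and sizes are
assumptions made for illustration and may differ from how the package
implements the method.

```python
import numpy as np


def leverage_subsample(X, r, rng):
    """Draw r row indices with probability proportional to leverage."""
    # Thin QR gives an orthonormal basis Q for the column space of X;
    # the leverage score of row i is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q**2, axis=1)
    probs = leverage / leverage.sum()
    idx = rng.choice(X.shape[0], size=r, replace=True, p=probs)
    # Inverse-probability weights correct for the unequal sampling.
    weights = 1.0 / probs[idx]
    return idx, weights


def weighted_least_squares(X, y, w):
    """Solve weighted least squares by rescaling rows with sqrt(w)."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta


# Simulated "big" data (hypothetical sizes and coefficients).
rng = np.random.default_rng(0)
N, r = 100_000, 1_000
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta_true = np.array([1.0, -0.5, 2.0])
y = X @ beta_true + rng.normal(size=N)

# Fit on the subsample and on the full data for comparison.
idx, w = leverage_subsample(X, r, rng)
beta_sub = weighted_least_squares(X[idx], y[idx], w)
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

print("subsample estimate:", beta_sub)
print("full-data estimate:", beta_full)
```

The subsample fit uses 1% of the rows, so its estimate is noisier than
the full-data fit but far cheaper to compute once the sampling
probabilities are available.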

#### Thank You
