Metadata-Version: 2.2
Name: fandango-learn
Version: 0.0.2
Author-email: Martin Eberlein <martin.eberlein@hu-berlin.de>
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.2.3
Requires-Dist: shap>=0.45
Requires-Dist: lightgbm>=4.3.0
Requires-Dist: numpy>=1.23.2
Requires-Dist: scikit-learn>=1.1.2
Provides-Extra: test
Requires-Dist: pytest>=7.2.2; extra == "test"
Requires-Dist: pytest-cov>=4.1.0; extra == "test"
Requires-Dist: pytest-html>=3.2.0; extra == "test"
Requires-Dist: pytest-rerunfailures>=11.1.2; extra == "test"
Requires-Dist: parameterized>=0.8.1; extra == "test"

# fandango-learn

This repository contains the code for the Fandango Learn project.
The goal is to automatically learn _fandango_ constraints form a set of inputs.

Overall, the learner will be integrated into **Avicenna**, which provides the necessary infrastructure of the hypothesis refinement loop.
Furthermore, **Avicenna** provides the means to automatically learn the set of relevant non-terminals, reducing the search space for the learner.

### Table of Contents

- [Quick Start (Prototype)](#quick-start-prototype)
- [Installation](#install-development-testing)
- [Notebooks](#notebooks)
- [Reusability](#reusability)
- [Benchmark-Results](#benchmark)

---

## Quick Start (Prototype)

### Introduction to FandangoLearner

This notebook demonstrates how to use **FandangoLearner**, a pattern based approach that automatically learns constraints to explain why a program fails.

The core idea of FandangoLearner is to identify patterns in inputs 
that lead to program errors or unexpected behaviors. Using these patterns, 
it generates constraints in the Fandango language to help developers 
understand input-related bugs.

### Step 1: Define the Grammar
We start by defining the grammar for our input language.
This example focuses on arithmetic expressions using trigonometric and square root functions.

```python
from fdlearn.interface.fandango import parse_contents

grammar = """
<start> ::= <arithexp>;
<arithexp> ::= <function>"("<number>")";
<function> ::= "sqrt" | "cos" | "sin" | "tan";
<number> ::= <maybeminus><onenine><maybedigits> | "0";
<maybeminus> ::= "-" | "";
<onenine> ::= "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9";
<maybedigits> ::= <digit>*;
<digit>::=  "0" | <onenine>;
"""

grammar, _ = parse_contents(grammar)
```

### Step 2: Provide Initial Inputs

We supply a set of example inputs along with their expected outcomes (`True` for failure, `False` otherwise).

```python
initial_inputs = {
    ("sqrt(-900)", True),  # This input causes a failure.
    ("sqrt(-10)", True),   # Another failure case.
    ("sqrt(0)", False),    # This input works correctly.
    ("sin(-900)", False),  # Works correctly.
    ("sqrt(2)", False),    # Works correctly.
    ("cos(10)", False),    # Works correctly.
}
```

Convert inputs to FandangoInput objects

```python
from fdlearn.learner import FandangoInput

initial_inputs = {
  FandangoInput.from_str(grammar, inp, oracle)
  for inp, oracle in initial_inputs
}
```

### Step 3: Select Relevant Non-Terminals (Optional)

We specify the non-terminals in the grammar that are likely related to the program's failure behavior.
This step is optional but can help focus the learning process on specific parts of the grammar.
Later, we will see that **Avicenna** can automatically learn relevant non-terminals.

```python
from fdlearn.learner import NonTerminal

relevant_non_terminals = {
  NonTerminal("<number>"),
  NonTerminal("<maybeminus>"),
  NonTerminal("<function>"),
}
```

## Step 4: Learn Constraints

Using the `FandangoLearner`, we learn constraints that explain why certain inputs fail.

```python
from fdlearn.learner import FandangoLearner

learner = FandangoLearner(grammar)

learned_constraints = learner.learn_constraints(
  initial_inputs,
  relevant_non_terminals=relevant_non_terminals
)
```

## Step 5: Analyze Results

Finally, we analyze the constraints generated by FandangoLearner to understand the root cause of failures.

```python
print("Learned Constraints:")
for constraint in learner.get_best_candidates():
    print(constraint)
```

The output will show the constraints that best explain the failures in the initial inputs.

```plaintext
(int(<number>) <= -10 and str(<function>) == 'sqrt'), 
Precision: 1.0, Recall: 1.0 (based on 2 failing and 4 passing inputs)
```

We can see that the constraint `(int(<number>) <= -10 and str(<function>) == 'sqrt')` explains why the inputs `sqrt(-900)` and `sqrt(-10)` fail.
However, this constraint is too specific and does not generalize well to other inputs.
Thus, we need a feedback loop that automatically refines these constraints to generate general constraints that captures the essence of the failure.
We will use **Avicenna** to provide this feedback loop.

---

## Install, Development, Testing

### Installation

To use **FandangoLearn**, you need to install both **FandangoFuzzer** (from its `learner` branch) and **FandangoLearn**. We recommend using a dedicated virtual environment:

```bash
python3.12 -m venv venv
source venv/bin/activate
pip install --upgrade pip
```

Now, install **FandangoFuzzer** from the `learner` branch followed by **FandangoLearn**:

```bash
pip install git+https://github.com/fandango-fuzzer/fandango.git@learner
pip install fandangolearn
```

### Development

If you plan to extend or evaluate **FandangoLearn** locally, clone this repository:

```bash
git clone https://github.com/fandango-fuzzer/fandango-learn.git
cd fandango-learn
```

Create and activate a virtual environment (recommended):

```bash
python3.12 -m venv venv
source venv/bin/activate

pip install --upgrade pip
pip install -r requirements.txt
pip install -e .
```

### Testing

To install the test dependencies and run the tests:

```bash
make install-tests
make tests
```

That’s it! You’re now set up to use, develop, and test **FandangoLearn**. If you encounter any issues or have suggestions, feel free to open an issue or submit a pull request.

## Notebooks

**Work in progress.**

This repository contains jupyter notebooks that demonstrate how to use the FandangoLearner to learn constraints from a set of inputs.
Furthermore, the notebooks show how FandangoLearner works and how it can be, for instance, integrated into Avicenna.

Current notebooks:

- [Introduction to FandangoLearner](./doc/01_fandango-learner.ipynb): Demonstrates how to use the FandangoLearner to learn constraints from a set of inputs.
- [Automated Refinement](./doc/02_refinement.ipynb): Shows how we can automatically refine constraints learned by FandangoLearner.
- [Adding Patterns](./doc/03_patterns.ipynb): Demonstrates how to add new patterns to the FandangoLearner to improve the learning process.

More notebooks will be added soon.

---

# Development Notes

## Reusability

The code should be reusable for other projects, such as Avicenna.
Therefore, the code uses many abstract classes that are already implemented in Avicenna and AvicennaISLearn.
This makes comparing both approaches FandangoLearn and ISLearn extremely easy.


# Benchmark

**Date:** 2025-01-27 10:45:40  
**Git Branch:** `dev`  
**Last Commit:** `76d77e252f2dcc848e705e62ab7fc56485b008f6`

| **Subject**      | **Total** | **Correct** | **Percentage** | **#Seeds** | **Mean Precision** | **P-StdDev** | **Mean Recall** | **R-StdDev** | **Time (s)** |
|-------------------|-----------|-------------|----------------|------------|--------------------|--------------|-----------------|--------------|-------------|
| Calculator        | 1         | 1           | 100.00         | 5          | 1.0000            | 0.0000       | 0.9174          | 0.0000       | 0.0326      |
| CalculatorRE      | 1         | 1           | 100.00         | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 3.2854      |
| Heartbleed        | 4         | 2           | 50.00          | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 0.0121      |
| HeartbleedRE      | 2         | 2           | 100.00         | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 4.1509      |
| Middle            | 84        | 1           | 1.19           | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 0.0395      |
| MiddleRE          | 1         | 1           | 55.56          | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 32.1872     |
| Pysnooper2        | 2         | 1           | 50.00          | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 0.0492      |
| Pysnooper3        | 2         | 2           | 100.00         | 5          | 1.0000            | 0.0000       | 1.0000          | 0.0000       | 0.0592      |


## First Ideas

- Use Pattern Based Approach
   - Will likely lead to combinatorial explosion
   - Requires similar ideas as scatched in the Avicenna paper, i.e. reduce the number of relevant non-terminals
- Build atomic constraints
   - Use the atomic constraints to build more complex constraints
   - Atomic constraints will be combined to more complex constraints with conjunctions and disjunctions.
- Implement different filter mechanisms 
   - Allow to rank constraints based on different criteria like precision, recall, etc.

new idea:
- exists and forall constraints can be quite similar to each other, when that use bounded nonterminals that are applied only once.
   - we might want to use forall constraints only when they are really needed, i.e. when the nonterminal is used multiple times.



## TODOs

- Only add new constraints if they are better than the best one so far.
- InputReducer
- AttributeSearches
- Better Negation
- CLI
- Documentation
- Extend Eval
