Metadata-Version: 2.4
Name: libinspector
Version: 1.0.19
Summary: Library for core functionalities of IoT Inspector. It captures packets and stores them in a database for real-time network traffic analysis.
Author-email: Danny Huang <dhuang@nyu.edu>, Andrew Quijano <andrew.quijano@nyu.edu>
License-Expression: Apache-2.0
Project-URL: Homepage, https://inspector.engineering.nyu.edu/
Project-URL: Source, https://github.com/nyu-mlab/inspector-core-library
Project-URL: Tracker, https://github.com/nyu-mlab/inspector-core-library/issues
Project-URL: Documentation, https://github.com/nyu-mlab/iot-inspector-client/wiki
Project-URL: Download, https://github.com/nyu-mlab/inspector-core-library/releases/latest
Keywords: iot-inspector,network-traffic-analysis,network-monitoring,iot-security
Classifier: Topic :: System :: Networking :: Monitoring
Classifier: Development Status :: 5 - Production/Stable
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: netaddr==1.3.0
Requires-Dist: psutil==7.2.2
Requires-Dist: scapy==2.7.0
Requires-Dist: requests==2.32.5
Requires-Dist: zeroconf==0.148.0
Requires-Dist: geoip2==5.2.0
Dynamic: license-file

# inspector-core-library
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)
[![libinspector_test](https://github.com/nyu-mlab/inspector-core-library/actions/workflows/libinspector_test.yml/badge.svg)](https://github.com/nyu-mlab/inspector-core-library/actions/workflows/libinspector_test.yml)
[![codecov](https://codecov.io/gh/nyu-mlab/inspector-core-library/graph/badge.svg?token=TBYF5KVBT7)](https://codecov.io/gh/nyu-mlab/inspector-core-library)

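`libinspector` is the core library of IoT Inspector. It captures network packets and stores them in a database for real-time network traffic analysis.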

## Installation

To install the `libinspector` module via pip, use the following command:

```sh
pip install libinspector
```


## Usage

### Running the Inspector

For debugging purposes, you can set the following environment variables to control the behavior of the Inspector Core:

| Variable           | Description                                                                                          | Default |
|:-------------------|:-----------------------------------------------------------------------------------------------------|:--------|
| `USE_IN_MEMORY_DB` | Set to `false` to use a physical `.db` file on disk. Useful for debugging the core library/database. | `true`  |
| `SCAN_ALL_DEVICES` | Set to `true` to ARP-spoof all devices on the network by default.                                    | `false` |
| `ARP_SPOOF_ROUTER` | Set to `false` to NOT ARP-spoof the router.                                                          | `true`  |
| `ARP_SPOOF_DEVICE` | Set to `false` to NOT ARP-spoof the device.                                                          | `true`  |

To run the Inspector, activate the virtual environment first and then run the following command. Because `sudo` typically does not preserve exported variables, pass the environment variables on the command line as well:

```sh
sudo USE_IN_MEMORY_DB=false SCAN_ALL_DEVICES=true PYTHONPATH=~/.local/lib/python3.11/site-packages python3 -m libinspector.core
```

#### How to set environment variables (Linux/macOS):
```bash
export USE_IN_MEMORY_DB=false
export SCAN_ALL_DEVICES=true
export ARP_SPOOF_ROUTER=false
export ARP_SPOOF_DEVICE=false
```

#### How to set environment variables (Windows):
```powershell
$env:USE_IN_MEMORY_DB = "false"
$env:SCAN_ALL_DEVICES = "true"
$env:ARP_SPOOF_ROUTER = "false"
$env:ARP_SPOOF_DEVICE = "false"
```
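
When embedding the library, the same variables can also be set from Python before the core starts. The sketch below assumes that `libinspector` reads these variables from the process environment, which this README does not spell out. `env_flag` is a hypothetical helper, not part of the library, that mirrors the defaults in the table above.

```python
import os

# Defaults copied from the table above.
DEFAULTS = {
    "USE_IN_MEMORY_DB": True,
    "SCAN_ALL_DEVICES": False,
    "ARP_SPOOF_ROUTER": True,
    "ARP_SPOOF_DEVICE": True,
}


def env_flag(name: str) -> bool:
    """Hypothetical helper: interpret an environment variable as a boolean.

    Falls back to the documented default when the variable is unset.
    How libinspector itself parses these values is an assumption.
    """
    raw = os.environ.get(name)
    if raw is None:
        return DEFAULTS[name]
    return raw.strip().lower() == "true"


# Set the variables for this process before starting the Inspector core.
os.environ["USE_IN_MEMORY_DB"] = "false"
os.environ["SCAN_ALL_DEVICES"] = "true"

for name in DEFAULTS:
    print(f"{name} = {env_flag(name)}")
```

Variables set this way apply only to the current process and its children, so they do not need to be exported in the shell.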

### Embedding in Your Own Python Application

The preferred way to use `libinspector` is to embed it within your own Python application. You can do this by importing `libinspector.core` and calling the `start_threads()` function, which returns almost immediately. Your Python script then reads the in-memory SQLite database for information about the devices and the network traffic flows.

```python
import time
import libinspector.core
import libinspector.global_state

# This method returns almost instantaneously
libinspector.core.start_threads()

# Make sure to sleep and/or do other work here, such as analyzing the
# in-memory SQLite database. For example, you can keep printing the
# device list from the `devices` table.
db_conn, rwlock = libinspector.global_state.db_conn_and_lock

while True:
    with rwlock:
        for device in db_conn.execute('SELECT mac_address, ip_address FROM devices').fetchall():
            print(f'MAC: {device["mac_address"]}, IP: {device["ip_address"]}')
    time.sleep(5)

```

If you want to add packet parsing capabilities, you can specify a custom callback when you start Inspector. Here's an example that prints out the summary of each captured packet:

```python
import libinspector.core

# The callback is invoked with each captured packet.
libinspector.core.start_threads(
    custom_packet_callback_func=lambda pkt: print(f'Packet captured: {pkt.summary()}')
)

```
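
A callback can also keep state between packets. The following is a minimal sketch, assuming the callback receives a packet object that exposes `summary()` (as in the example above) and that it may be invoked from a capture thread, hence the lock. `PacketTally` is a hypothetical name and is not part of `libinspector`.

```python
import threading
from collections import Counter


class PacketTally:
    """Hypothetical callback object that counts packets by summary prefix."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._counts = Counter()

    def __call__(self, pkt) -> None:
        # Use the first token of the summary (e.g. "Ether") as a rough label.
        # Assumption: pkt has a summary() method, like a Scapy packet.
        summary = pkt.summary()
        label = summary.split(" ", 1)[0] if summary else "unknown"
        with self._lock:
            self._counts[label] += 1

    def snapshot(self) -> dict:
        # Return a copy so callers never see a half-updated counter.
        with self._lock:
            return dict(self._counts)


tally = PacketTally()

# To use it with the Inspector (requires libinspector and root privileges):
#   import libinspector.core
#   libinspector.core.start_threads(custom_packet_callback_func=tally)
```

The main thread can then call `tally.snapshot()` periodically, for example inside the `while True` loop shown earlier.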

### Data Schema

The data schema is defined in `mem_db.py` and includes the following tables:

- `devices`: Stores information about devices on the network.
  - `mac_address` (TEXT, PRIMARY KEY): The MAC address of the device.
  - `ip_address` (TEXT, NOT NULL): The IP address assigned to the device.
  - `is_inspected` (INTEGER, DEFAULT 0): Indicates whether the device is being inspected (1) or not (0).
  - `is_gateway` (INTEGER, DEFAULT 0): Indicates whether the device is a gateway (1) or not (0).
  - `updated_ts` (INTEGER, DEFAULT 0): The timestamp of the last update.
  - `metadata_json` (TEXT, DEFAULT '{}'): Additional metadata in JSON format.

- `hostnames`: Stores hostnames associated with IP addresses.
  - `ip_address` (TEXT, PRIMARY KEY): The IP address associated with the hostname.
  - `hostname` (TEXT, NOT NULL): The hostname of the device.
  - `updated_ts` (INTEGER, DEFAULT 0): The timestamp of the last update.
  - `data_source` (TEXT, NOT NULL): The source of the hostname data.
  - `metadata_json` (TEXT, DEFAULT '{}'): Additional metadata in JSON format.

- `network_flows`: Stores information about network flows.
  - `timestamp` (INTEGER): The timestamp of the network flow.
  - `src_ip_address` (TEXT): The source IP address of the flow.
  - `dest_ip_address` (TEXT): The destination IP address of the flow.
  - `src_hostname` (TEXT): The source hostname of the flow.
  - `dest_hostname` (TEXT): The destination hostname of the flow.
  - `src_mac_address` (TEXT): The source MAC address of the flow.
  - `dest_mac_address` (TEXT): The destination MAC address of the flow.
  - `src_port` (TEXT): The source port of the flow.
  - `dest_port` (TEXT): The destination port of the flow.
  - `protocol` (TEXT): The protocol used in the flow.
  - `byte_count` (INTEGER, DEFAULT 0): The number of bytes transferred in the flow.
  - `packet_count` (INTEGER, DEFAULT 0): The number of packets transferred in the flow.
  - `metadata_json` (TEXT, DEFAULT '{}'): Additional metadata in JSON format.
  - PRIMARY KEY (`timestamp`, `src_mac_address`, `dest_mac_address`, `src_ip_address`, `dest_ip_address`, `src_port`, `dest_port`, `protocol`): The composite primary key for the table.
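
To illustrate how these tables can be queried, the sketch below rebuilds the `network_flows` table in a standalone in-memory SQLite database and totals bytes per destination hostname. The `CREATE TABLE` statement is reconstructed from the column list above and may differ from the actual DDL in `mem_db.py`; the inserted rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access by column name

# Reconstructed from the column list above; not copied from mem_db.py.
conn.execute("""
    CREATE TABLE network_flows (
        timestamp INTEGER,
        src_ip_address TEXT,
        dest_ip_address TEXT,
        src_hostname TEXT,
        dest_hostname TEXT,
        src_mac_address TEXT,
        dest_mac_address TEXT,
        src_port TEXT,
        dest_port TEXT,
        protocol TEXT,
        byte_count INTEGER DEFAULT 0,
        packet_count INTEGER DEFAULT 0,
        metadata_json TEXT DEFAULT '{}',
        PRIMARY KEY (timestamp, src_mac_address, dest_mac_address,
                     src_ip_address, dest_ip_address,
                     src_port, dest_port, protocol)
    )
""")

# Made-up sample flows for illustration only.
sample_flows = [
    (1700000000, "192.168.1.20", "203.0.113.5", "", "example.com",
     "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:ff", "50000", "443", "tcp", 1200, 4),
    (1700000001, "192.168.1.20", "203.0.113.5", "", "example.com",
     "aa:bb:cc:dd:ee:01", "aa:bb:cc:dd:ee:ff", "50000", "443", "tcp", 800, 2),
    (1700000001, "192.168.1.21", "198.51.100.7", "", "example.org",
     "aa:bb:cc:dd:ee:02", "aa:bb:cc:dd:ee:ff", "50001", "53", "udp", 90, 1),
]
conn.executemany(
    """INSERT INTO network_flows (
           timestamp, src_ip_address, dest_ip_address, src_hostname,
           dest_hostname, src_mac_address, dest_mac_address,
           src_port, dest_port, protocol, byte_count, packet_count)
       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
    sample_flows,
)

# Total bytes and packets per destination hostname, largest first.
totals = conn.execute("""
    SELECT dest_hostname,
           SUM(byte_count) AS total_bytes,
           SUM(packet_count) AS total_packets
    FROM network_flows
    GROUP BY dest_hostname
    ORDER BY total_bytes DESC
""").fetchall()

for row in totals:
    print(f'{row["dest_hostname"]}: {row["total_bytes"]} bytes, '
          f'{row["total_packets"]} packets')
```

With a running Inspector, the same `SELECT` would be issued against the connection from `libinspector.global_state.db_conn_and_lock` while holding its lock, as in the embedding example.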

### How `libinspector` Works

The `libinspector` module works by starting various threads to monitor and inspect network traffic. Here is a high-level overview of the `start_threads` function in `core.py`:

1. **Ensure Single Instance**: The function first ensures that only one instance of the Inspector core is running.
2. **Initialize Database**: It initializes the database by calling `mem_db.initialize_db()`.
3. **Initialize Networking Variables**: It enables IP forwarding and updates the network information.
4. **Start Threads**: It starts several threads to perform various tasks:
   - Update network info from the OS every 60 seconds.
   - Discover devices on the network every 10 seconds.
   - Collect and process packets from the network.
   - ARP-spoof devices so that their internet traffic can be inspected.
   - Start the mDNS and UPnP scanner threads.
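
The periodic tasks in step 4 follow a common pattern: a daemon thread that calls a function at a fixed interval until asked to stop. The sketch below illustrates that pattern in general terms. It is not the code in `core.py`, and `start_periodic_thread`, `update_network_info`, and `discover_devices` are hypothetical names used only for illustration.

```python
import threading
import time


def start_periodic_thread(func, interval_sec: float,
                          stop_event: threading.Event) -> threading.Thread:
    """Run func in a daemon thread, pausing interval_sec between calls."""

    def loop() -> None:
        while True:
            try:
                func()
            except Exception as exc:  # keep the thread alive on errors
                print(f"{func.__name__} failed: {exc}")
            # wait() returns True as soon as stop_event is set,
            # so shutdown does not have to wait out the full interval.
            if stop_event.wait(interval_sec):
                break

    thread = threading.Thread(target=loop, name=func.__name__, daemon=True)
    thread.start()
    return thread


# Hypothetical stand-ins for the real tasks; each just counts its calls.
calls = {"update_network_info": 0, "discover_devices": 0}


def update_network_info() -> None:
    calls["update_network_info"] += 1


def discover_devices() -> None:
    calls["discover_devices"] += 1


stop = threading.Event()
threads = [
    start_periodic_thread(update_network_info, 60, stop),  # every 60 seconds
    start_periodic_thread(discover_devices, 10, stop),     # every 10 seconds
]

time.sleep(0.2)  # the main thread is free to do other work here
stop.set()       # signal all threads to exit
for t in threads:
    t.join(timeout=2)
```

Because the worker threads are daemons and the starter returns right after `thread.start()`, this pattern is consistent with `start_threads()` returning almost immediately while the work continues in the background.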


### Testing and Development

To set up a local environment for testing and development, run these commands from the repository root:

```sh
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install .
```


## Notes

TODO:
 - Create more test cases to obtain higher code coverage.


## Contributing

Contributions are welcome! Please feel free to submit a pull request or open an issue.

## License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

## Contact

For questions, contact Prof. Danny Y. Huang (dhuang@nyu.edu) or Andrew Quijano (andrew.quijano@nyu.edu).

