filoma: Quick interactive examples

This notebook demonstrates key filoma capabilities and includes lightweight checks to see if it works in your environment.

It covers: imports and version checks, probing a file and a directory, working with the filoma.DataFrame wrapper, using probe_to_df, a small image probe example, and saving a CSV export.

Note: cells wrap operations in try/except so the notebook still runs if optional dependencies (e.g. polars, numpy, or image backends) are missing.

In [ ]:
# Basic environment and import checks
from pathlib import Path

import filoma
from filoma import DataFrame


def check_imports():
    results = {}
    try:
        import filoma

        results["filoma"] = getattr(filoma, "__version__", "unknown")
    except Exception as e:
        results["filoma"] = f"IMPORT ERROR: {e}"

    for pkg in ("polars", "numpy", "PIL"):
        try:
            __import__(pkg if pkg != "PIL" else "PIL.Image")
            results[pkg] = "available"
        except Exception as e:
            results[pkg] = f"missing ({e})"

    # show where we are running the notebook from
    results["cwd"] = str(Path(".").resolve())
    return results


check_imports()
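The check above imports each package to see whether it is present. If you only need a yes/no answer, the standard library's importlib.util.find_spec can locate a top-level package without importing it. A minimal sketch; the helper name optional_available is hypothetical and not part of filoma:

```python
from importlib.util import find_spec


def optional_available(*names):
    """Map each top-level package name to True if it can be located, without importing it."""
    return {name: find_spec(name) is not None for name in names}


# "PIL" is the import name provided by the pillow distribution
print(optional_available("polars", "numpy", "PIL"))
```

This only confirms that a package can be found; a broken installation can still fail at import time, so the try/except pattern used in the cell remains the stricter check.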

1) Quick probe: a single file and a directory

Try probing a README or small file, then probe a lightweight sample directory from the repo's tests/ tree.

In [ ]:
file_candidate = "../README.md"
dir_candidate = "../tests/"

print("probing file ->", file_candidate)
if Path(file_candidate).exists():
    try:
        file_report = filoma.probe(file_candidate)
        print("file probe result type:", type(file_report))
        try:
            # many filoma dataclasses implement a nice repr or to-dict
            print(file_report)
        except Exception:
            pass
    except Exception as e:
        print("file probe failed:", e)
else:
    print("No small file found to probe in the repository root.")

print("probing directory ->", dir_candidate)
if Path(dir_candidate).is_dir():
    try:
        dir_report = filoma.probe(dir_candidate, max_depth=2, threads=2)
        print("directory probe returned an object of type:", type(dir_report))
        # If it exposes a to_df() method we can inspect a little
        if hasattr(dir_report, "to_df"):
            try:
                dfw = dir_report.to_df()
                print("to_df() -> wrapper type:", type(dfw))
            except Exception as e:
                print("to_df() raised:", e)
    except Exception as e:
        print("directory probe failed:", e)
else:
    print("No small directory found to probe in tests/; adjust the path and re-run.")
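The paths in the cell above are hard-coded relative to the notebook's location. If your checkout is laid out differently, a small helper can pick the first candidate that actually exists. This is a sketch using only pathlib; first_existing and the candidate lists are hypothetical, not part of filoma:

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate that exists on disk as a Path, or None if none do."""
    for candidate in candidates:
        path = Path(candidate)
        if path.exists():
            return path
    return None


# Hypothetical candidate lists; adjust them to your checkout.
file_candidate = first_existing(["../README.md", "README.md"])
dir_candidate = first_existing(["../tests", "tests"])
print("file:", file_candidate, "| dir:", dir_candidate)
```

Either result can be None, in which case the corresponding probe should be skipped.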

2) Working with filoma.DataFrame wrapper

Construct a filoma.DataFrame from a list of paths and run the convenience enrichers: .add_path_components(), .add_file_stats_cols(), and .add_depth_col().

In [ ]:
sample_paths = [p for p in (Path("../README.md"), Path("../pyproject.toml"), Path("../Cargo.toml")) if p.exists()]
if not sample_paths:
    # fallback to a couple of files from tests if present
    sample_paths = [p for p in (Path("../tests/test_basic_dataframe.py"), Path("../tests/test_rust_comprehensive.py")) if p.exists()]

print("sample paths used:", sample_paths)
dfw = DataFrame(sample_paths)
print("Initial wrapper and head:")
print(dfw.head(10))

print("With path components:")
try:
    df_components = dfw.add_path_components()
    print(df_components.head(10))
except Exception as e:
    print("add_path_components failed:", e)

print("With file stats:")
try:
    df_stats = dfw.add_file_stats_cols()
    print(df_stats.head(10))
except Exception as e:
    print("add_file_stats_cols failed:", e)

print("Add depth column relative to repo root:")
try:
    # the sample paths live under "..", so that is the repo root here
    df_depth = dfw.add_depth_col(Path(".."))
    print(df_depth.head(10))
except Exception as e:
    print("add_depth_col failed:", e)
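To sanity-check the enriched columns by hand, the same kind of information can be computed for a single file with pathlib. This is not filoma's implementation: the key names, the describe_path helper, and the depth convention (number of path segments below the root) are assumptions made for illustration, and filoma's column names and depth counting may differ.

```python
from pathlib import Path


def describe_path(path, root):
    """Collect path components, basic stat info, and depth below root for one file."""
    path = Path(path).resolve()
    root = Path(root).resolve()
    info = path.stat()
    return {
        "parent": str(path.parent),
        "name": path.name,
        "stem": path.stem,
        "suffix": path.suffix,
        "size_bytes": info.st_size,
        "modified_ts": info.st_mtime,
        # segments between root and the file, e.g. root/a/b.txt -> 2;
        # relative_to raises ValueError if path is not under root
        "depth": len(path.relative_to(root).parts),
    }


if Path("../README.md").exists():
    print(describe_path("../README.md", ".."))
```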

3) Build a DataFrame from a directory using probe_to_df

This uses filoma's probe_to_df convenience function, which builds a Polars DataFrame if polars is installed. We scan the tests/ folder with a small max_depth to keep the runtime short.

In [ ]:
from filoma import probe_to_df

dir_path = "../tests"
if not Path(dir_path).is_dir():
    print("No test directory available for probe_to_df; skip this cell.")
else:
    try:
        pl_df = probe_to_df(dir_path, to_pandas=False, enrich=True, max_depth=2, threads=2)
        print("probe_to_df returned a Polars DataFrame with shape:", pl_df.shape)
        # Show a small sample and a group_by_extension summary when available
        try:
            print("Sample rows:")
            print(pl_df.head(5))
        except Exception:
            pass
        try:
            print("Extension counts:")
            # wrap it in a DataFrame wrapper if needed
            from filoma import DataFrame as DFWrap

            wrapper = DFWrap(pl_df)
            print(wrapper.group_by_extension().head(10))
        except Exception as e:
            print("group_by_extension failed:", e)
    except Exception as e:
        print("probe_to_df failed:", e)
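If polars is not installed, a rough extension summary can still be produced with the standard library, which is handy for cross-checking the counts above. The sketch below is independent of filoma; count_extensions is a hypothetical helper, and its depth convention (files directly in the root are at depth 1) is an assumption that may not match how filoma interprets max_depth.

```python
import os
from collections import Counter
from pathlib import Path


def count_extensions(root, max_depth=2):
    """Count file suffixes under root, including files at most max_depth levels down."""
    root = Path(root)
    counts = Counter()
    for current, dirnames, filenames in os.walk(root):
        level = len(Path(current).relative_to(root).parts)  # 0 for root itself
        if level + 1 >= max_depth:
            dirnames[:] = []  # prune in place so os.walk does not descend further
        for filename in filenames:
            counts[Path(filename).suffix or "<none>"] += 1
    return counts


if Path("../tests").is_dir():
    print(count_extensions("../tests").most_common(10))
```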

4) Image probing (in-memory)

Create a small numpy array and pass it to filoma.probe_image to exercise the image-probing code path that accepts arrays. This avoids needing image files or heavy image backends.

In [ ]:
try:
    import numpy as np

    arr = np.random.randn(16, 16)
    img_report = filoma.probe_image(arr)
    print("probe_image on numpy array returned type:", type(img_report))
    try:
        print(img_report)
    except Exception:
        pass
except Exception as e:
    print("Skipping image probe; numpy unavailable or probe failed:", e)
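To have something to compare an image report against, basic statistics of the same array can be computed directly with NumPy. The fields below are an assumption about what is useful to check, not a description of what filoma reports, and summarize_array is a hypothetical helper. A seeded generator is used so the random test image is reproducible.

```python
import numpy as np


def summarize_array(arr):
    """Return basic shape, dtype, and value statistics for a numeric array."""
    arr = np.asarray(arr)
    return {
        "shape": arr.shape,
        "dtype": str(arr.dtype),
        # nan-aware reductions ignore NaN entries instead of propagating them
        "min": float(np.nanmin(arr)),
        "max": float(np.nanmax(arr)),
        "mean": float(np.nanmean(arr)),
        "nans": int(np.isnan(arr).sum()),
    }


rng = np.random.default_rng(0)
print(summarize_array(rng.standard_normal((16, 16))))
```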

5) Save a small CSV export (if polars is available)

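This cell tries to save the probe_to_df result to /tmp/filoma_example.csv, falling back to the small DataFrame example from section 2 if that fails. The path is Unix-style, so adjust it on Windows. The cell then prints the first lines of the file as a quick check.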

In [ ]:
out_path = Path("/tmp/filoma_example.csv")
saved = False
try:
    if "pl_df" in globals():
        # Try write via polars if present
        try:
            pl_df.write_csv(str(out_path))
            saved = True
        except Exception:
            pass
    if not saved and "df_stats" in globals():
        try:
            df_stats.save_csv(out_path)
            saved = True
        except Exception:
            pass
    if saved:
        print("Saved CSV to", out_path)
        try:
            print("CSV sample:", out_path.read_text().splitlines()[:10])
        except Exception:
            pass
    else:
        print("Could not save CSV; polars or file-writer not available.")
except Exception as e:
    print("Saving CSV failed:", e)
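The hard-coded /tmp path in the cell above does not exist on every platform. A portable variant can ask the standard library for the temp directory, and the csv module can serve as a last-resort writer when neither polars nor the wrapper is usable. This is a sketch only: write_rows_csv is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path


def write_rows_csv(rows, out_path):
    """Write a non-empty list of dicts to CSV and return the first lines for verification."""
    out_path = Path(out_path)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out_path.read_text(encoding="utf-8").splitlines()[:10]


# gettempdir() returns the platform's temp directory instead of assuming /tmp
out_path = Path(tempfile.gettempdir()) / "filoma_example.csv"
# made-up rows, for illustration only
sample_rows = [
    {"path": "README.md", "size_bytes": 1024},
    {"path": "pyproject.toml", "size_bytes": 512},
]
print(write_rows_csv(sample_rows, out_path))
```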

Notes and next steps

  • If a cell printed an error because a dependency is missing, install polars, numpy, and optionally pillow, then re-run it.
  • To run longer scans, increase max_depth and threads in the probe() calls.
  • Use probe_to_df(..., to_pandas=True) to get a pandas.DataFrame if you prefer pandas.