🚀 filoma demo¶

Fast, multi-backend file analysis with a tiny API surface

In [ ]:
import filoma

print(f"filoma version: {filoma.__version__}")

Let's start with something simple, like getting a handy dataclass for a single file:

🔍📄 File analysis¶

📄 Single file (any kind)¶

In [ ]:
from filoma import probe_file

file_info = probe_file("../README.md")
print(f"Path: {file_info.path}")
print(f"Size: {file_info.size}")
print(f"Modified: {file_info.modified}")
print(f"file_info: {[i for i in dir(file_info) if not i.startswith('_')]}")
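
For intuition, the path, size, and modified fields printed above correspond to what a plain `os.stat` call reports. The sketch below uses only the standard library; `SimpleFileInfo` and `simple_probe` are hypothetical names made up for illustration and are not part of filoma's API, which may collect more and differ in detail.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class SimpleFileInfo:
    """Hypothetical stand-in for a file-metadata record (not filoma's class)."""

    path: Path
    size: int  # size in bytes
    modified: datetime  # last modification time, local time


def simple_probe(path):
    """Collect basic metadata for one file via Path.stat()."""
    p = Path(path)
    st = p.stat()
    return SimpleFileInfo(
        path=p,
        size=st.st_size,
        modified=datetime.fromtimestamp(st.st_mtime),
    )


# Demo on a throwaway file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_text("hello filoma")
    info = simple_probe(demo)
    print(f"Path: {info.path.name}")
    print(f"Size: {info.size}")
    print(f"Modified: {info.modified}")
```

A dataclass keeps the result easy to inspect and to turn into a dict with `dataclasses.asdict` when needed.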

For image files specifically, use probe_image:

🖼️ Image file analysis¶

In [ ]:
from filoma import probe_image

img = probe_image("../images/logo.png")
print(f"Type of file: {img.file_type}, Type of img object: {type(img)}")
print(f"Shape: {img.shape}")
print(f"Data range: {img.min} - {img.max}")
print(f"img info: {img.as_dict()}")

🔍📁 Directory Analysis¶

Want to analyze a whole directory of files and extract metadata, text content, and other useful information?
filoma does it in just a few lines of code.

In [ ]:
from filoma.directories import DirectoryProfiler, DirectoryProfilerConfig

# Create a profiler using the typed config dataclass
config = DirectoryProfilerConfig(use_rust=True)
dir_prof = DirectoryProfiler(config)

analysis = dir_prof.probe("../")
dir_prof.print_summary(analysis)

Want to quickly see a report of your findings? filoma has you covered.

In [ ]:
dir_prof.print_report(analysis)
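
To make the idea of a directory profile concrete, here is a minimal standard-library sketch that walks a tree and counts files, folders, and extensions. It is an assumption about the kind of information such a summary contains, not filoma's implementation; `summarize_tree` is a hypothetical helper, and the Rust backend selected by use_rust=True is not modelled here.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root):
    """Walk `root` and count files, folders, and file extensions."""
    n_files = 0
    n_dirs = 0
    extensions = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        # Normalise extensions to lower case; bucket extensionless files together.
        extensions.update(Path(name).suffix.lower() or "<none>" for name in filenames)
    return {"files": n_files, "folders": n_dirs, "extensions": dict(extensions)}


# Build a tiny throwaway tree so the example is reproducible.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "docs").mkdir()
    (base / "README.md").write_text("# demo")
    (base / "LICENSE").write_text("MIT")
    (base / "docs" / "guide.md").write_text("guide")
    (base / "docs" / "logo.PNG").write_bytes(b"")
    summary = summarize_tree(base)
    print(summary)
```

A pure-Python walk like this is fine for small trees; a compiled backend is mainly useful once a tree holds a very large number of files.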

📁 Directory of files --> DataFrame --> Data Exploration¶

Now that you've seen what's up with your files, you might want to explore the data in a familiar format.
filoma can convert the analysis results into a Polars (or Pandas) DataFrame.
NOTE: Pandas support requires the pd extra, which you can install by running uv sync --extra pd in your terminal.

In [ ]:
from filoma import probe_to_df

df = probe_to_df("../", max_depth=2, enrich=True)
print(f"Found {len(df)} files")
df.head()

In [ ]:
!uv pip install pandas pyarrow

In [ ]:
print(f"Type of df:\t{type(df.to_pandas())}, \nShape of df:\t{df.to_pandas().shape}")

⚡ DataFrame enrichment¶

You're probably wondering, "What is enrich=True?"
Well, since filoma gathers the paths of your files in a DataFrame, why not enrich that DataFrame with additional metadata? filoma's own DataFrame class has convenience methods such as add_path_components(), add_file_stats_cols(), and add_depth_col().

Let's see it in action:

In [ ]:
from rich.console import Console
from rich.panel import Panel

console = Console()

cfg = DirectoryProfilerConfig(build_dataframe=True, use_rust=True, return_absolute_paths=True)
dprof = DirectoryProfiler(cfg)
res = dprof.probe("../")

orig_cols = list(res.dataframe.columns)
console.print(Panel(f"Columns before enrich: [bold]{', '.join(orig_cols)}[/]"))
console.print(Panel(res.dataframe.head(3).to_pandas().to_string(index=False), title="DataFrame head (before enrich)"))

df = res.dataframe.enrich()
new_cols = sorted(set(df.columns) - set(orig_cols))
console.print(Panel(f"New columns after enrich: [bold]{', '.join(new_cols)}[/]"))
console.print(Panel(df.head(3).to_pandas().to_string(index=False), title="DataFrame head (after enrich)"))
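
As a rough mental model of enrichment, the sketch below derives path components, depth, and file stats from bare paths using only the standard library. The helper `enrich_row` and the column names are hypothetical, chosen for illustration; the columns filoma actually adds are the ones printed by the cell above.

```python
import tempfile
from pathlib import Path


def enrich_row(path, root):
    """Return one enriched record for `path`, with depth measured from `root`."""
    p = Path(path)
    st = p.stat()
    return {
        "path": str(p),
        # Path components
        "parent": str(p.parent),
        "name": p.name,
        "stem": p.stem,
        "suffix": p.suffix,
        # Depth: number of path parts below the root
        "depth": len(p.relative_to(root).parts),
        # File stats
        "size_bytes": st.st_size,
        "modified_ts": st.st_mtime,
    }


# Build a small throwaway tree and enrich every file in it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data" / "raw").mkdir(parents=True)
    sample = root / "data" / "raw" / "scan_001.tif"
    sample.write_bytes(b"\x00" * 16)
    rows = [enrich_row(f, root) for f in sorted(root.rglob("*")) if f.is_file()]
    for row in rows:
        print(row["name"], row["suffix"], row["depth"], row["size_bytes"])
```

A list of dicts like `rows` can be handed directly to a DataFrame constructor, which is why derived columns of this kind are cheap to add once the paths are already collected.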

✅ Conclusion¶

That's how filoma takes you from a messy directory tree to a clean DataFrame with enriched metadata, ready for analysis and downstream workflows.