🚀 filoma demo
Fast, multi-backend file analysis with a tiny API surface
import filoma
print(f"filoma version: {filoma.__version__}")
Let's start with something simple, like getting a handy dataclass for a single file:
🔍📄 File analysis
📄 Single file (any kind)
from filoma import probe_file
file_info = probe_file("../README.md")
print(f"Path: {file_info.path}")
print(f"Size: {file_info.size}")
print(f"Modified: {file_info.modified}")
print(f"Public attributes: {[name for name in dir(file_info) if not name.startswith('_')]}")
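To make the idea of a per-file dataclass concrete, here is a minimal standard-library sketch of gathering the same kind of fields (path, size, modification time). The names `FileSummary` and `summarize_file` are hypothetical and are not part of filoma; this only illustrates the concept, not how `probe_file` is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class FileSummary:  # hypothetical name, not a filoma class
    path: str
    size: int  # size in bytes
    modified: datetime  # last modification time, in UTC


def summarize_file(path):
    """Collect a few os.stat() fields for a single file."""
    p = Path(path)
    st = p.stat()
    return FileSummary(
        path=str(p),
        size=st.st_size,
        modified=datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    )
```

Calling `summarize_file("../README.md")` would return one immutable record whose fields can be read the same way as in the cell above.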
Or, specifically for image files:
🖼️ Image file analysis
from filoma import probe_image
img = probe_image("../images/logo.png")
print(f"Type of file: {img.file_type}, Type of img object: {type(img)}")
print(f"Shape: {img.shape}")
print(f"Data range: {img.min} - {img.max}")
print(f"img info: {img.as_dict()}")
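For a feel of where shape information can come from, the sketch below reads the width and height straight out of a PNG header using only the standard library. It is an illustration of the file format, not a description of how `probe_image` works internally, and `png_dimensions` is a hypothetical helper name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Return (width, height) from the IHDR chunk at the start of PNG bytes."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Bytes 16..24 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

Only the first 24 bytes are needed, so a call such as `png_dimensions(open("../images/logo.png", "rb").read(24))` avoids decoding any pixel data. Values like the data range do require decoding the pixels, which this sketch does not attempt.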
🔍📁 Directory Analysis
Do you want to analyze a directory of files and extract metadata, text content, and other useful information?
filoma makes it super easy to do so with just a few lines of code.
from filoma.directories import DirectoryProfiler, DirectoryProfilerConfig
# Create a profiler using the typed config dataclass
config = DirectoryProfilerConfig(use_rust=True)
dir_prof = DirectoryProfiler(config)
analysis = dir_prof.probe("../")
dir_prof.print_summary(analysis)
Want to quickly see a report of your findings? filoma has you covered.
dir_prof.print_report(analysis)
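The kind of counts that a directory summary reports can be approximated with a plain `os.walk` loop. The sketch below is a pure-Python illustration under that assumption; `summarize_tree` is a hypothetical helper, and it makes no claim about what filoma's backends compute or how.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk root and count files, folders, and file extensions."""
    n_files = 0
    n_dirs = 0
    extensions = Counter()
    for _dirpath, dirnames, filenames in os.walk(root):
        n_dirs += len(dirnames)
        n_files += len(filenames)
        for name in filenames:
            # Files without an extension are grouped under "<none>".
            ext = os.path.splitext(name)[1].lower() or "<none>"
            extensions[ext] += 1
    return {"files": n_files, "folders": n_dirs, "extensions": extensions}
```

`summarize_tree("../")["extensions"].most_common(5)` would list the five most frequent extensions under the parent directory.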
📁 Directory of files --> DataFrame --> Data Exploration
Now that you've seen what's in your files, you might want to explore the data in a familiar format.
filoma can convert the analysis results into a Polars (or Pandas) DataFrame real quick.
NOTE: Pandas support requires the pd extra, which you can install by running uv sync --extra pd in your terminal.
from filoma import probe_to_df
df = probe_to_df("../", max_depth=2, enrich=True)
print(f"Found {len(df)} files")
df.head()
!uv pip install pandas pyarrow
pdf = df.to_pandas()  # convert once and reuse the result
print(f"Type of df:\t{type(pdf)}\nShape of df:\t{pdf.shape}")
⚡ DataFrame enrichment
You might be wondering what enrich=True does.
Since filoma gathers the paths of your files in a DataFrame, why not enrich that DataFrame with additional metadata? filoma's own DataFrame class has convenience methods such as add_path_components(), add_file_stats_cols(), and add_depth_col().
Let's see it in action:
from rich.console import Console
from rich.panel import Panel
console = Console()
cfg = DirectoryProfilerConfig(build_dataframe=True, use_rust=True, return_absolute_paths=True)
dprof = DirectoryProfiler(cfg)
res = dprof.probe("../")
orig_cols = list(res.dataframe.columns)
console.print(Panel(f"Columns before enrich: [bold]{', '.join(orig_cols)}[/]"))
console.print(Panel(res.dataframe.head(3).to_pandas().to_string(index=False), title="DataFrame head (before enrich)"))
df = res.dataframe.enrich()
new_cols = sorted(set(df.columns) - set(orig_cols))
console.print(Panel(f"New columns after enrich: [bold]{', '.join(new_cols)}[/]"))
console.print(Panel(df.head(3).to_pandas().to_string(index=False), title="DataFrame head (after enrich)"))
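Conceptually, enrichment means deriving extra columns from the path column. The standard-library sketch below shows that idea on plain dictionaries; the column names (`parent`, `name`, `stem`, `suffix`, `depth`) and the helper `enrich_rows` are assumptions made for illustration, so the columns filoma actually adds may differ.

```python
from pathlib import PurePosixPath


def enrich_rows(paths):
    """Derive path components and depth for each path string."""
    rows = []
    for raw in paths:
        p = PurePosixPath(raw)
        rows.append({
            "path": raw,
            "parent": str(p.parent),
            "name": p.name,
            "stem": p.stem,
            "suffix": p.suffix,
            # Depth here means the number of path parts, e.g. "a/b/c.txt" has 3.
            "depth": len(p.parts),
        })
    return rows
```

Each returned dictionary corresponds to one row, so the list can be passed to a DataFrame constructor if a tabular view is wanted.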
✅ Conclusion
So this is how you can use filoma to go from a messy directory tree to a clean DataFrame with enriched metadata, ready for analysis and downstream workflows.