Repository Detection#
The repository detection module provides comprehensive analysis of git repositories, including language detection, project classification, branch analysis, CI/CD detection, and metrics collection.
Features#
- Language Detection: Identifies primary and secondary programming languages, frameworks, and build tools
- Project Classification: Determines project type (application, library, CLI, etc.) and domain
- Branch Analysis: Detects branching strategies and analyzes branch patterns
- CI/CD Detection: Identifies CI/CD configurations and platforms
- Metrics Collection: Gathers repository metrics including stars, forks, issues, and contributor information
- Git Provider Integration: Supports real-time metrics from GitHub and GitLab APIs
- AppConfig Integration: Dynamic provider configuration through application settings
Usage#
Basic Repository Analysis#
from metagit.core.detect.repository import RepositoryAnalysis
# Analyze a local repository
analysis = RepositoryAnalysis.from_path("/path/to/repo")
# Analyze a remote repository (clones it temporarily)
analysis = RepositoryAnalysis.from_url("https://github.com/username/repo")
# Generate summary
summary = analysis.summary()
print(summary)
# Convert to MetagitConfig
config = analysis.to_metagit_config()
CLI Usage#
# Analyze current directory
metagit detect repository
# Analyze specific path
metagit detect repository --path /path/to/repo
# Analyze remote repository
metagit detect repository --url https://github.com/username/repo
# Save configuration to .metagit.yml
metagit detect repository --save
# Output in different formats
metagit detect repository --output yaml
metagit detect repository --output json
Git Provider Plugins#
The repository detection system supports git provider plugins that enable fetching real-time metrics from hosting platforms like GitHub and GitLab.
Supported Providers#
- GitHub: Fetches stars, forks, issues, pull requests, and contributor data
- GitLab: Fetches project statistics, merge requests, and member information
Configuration Methods#
1. AppConfig (Recommended)#
Configure providers through the application configuration file:
# ~/.config/metagit/config.yml or metagit.config.yml
config:
  providers:
    github:
      enabled: true
      api_token: "ghp_your_github_token_here"
      base_url: "https://api.github.com" # Override for GitHub Enterprise
    gitlab:
      enabled: false
      api_token: "glpat-your_gitlab_token_here"
      base_url: "https://gitlab.com/api/v4" # Override for self-hosted GitLab
Benefits:
- Persistent configuration across sessions
- No need to set environment variables
- Easy to manage multiple environments
- Supports enterprise instances
2. Environment Variables#
Set API tokens as environment variables:
export GITHUB_TOKEN="your_github_personal_access_token"
export GITLAB_TOKEN="your_gitlab_personal_access_token"
3. CLI Options#
Override configuration for specific commands:
# Use GitHub token
metagit detect repository --github-token "your_token"
# Use GitLab token
metagit detect repository --gitlab-token "your_token"
# Custom API URLs (for self-hosted instances)
metagit detect repository --github-url "https://github.company.com/api/v3"
metagit detect repository --gitlab-url "https://gitlab.company.com/api/v4"
# Disable AppConfig and use environment variables only
metagit detect repository --use-app-config=false
Configuration Priority#
The system uses the following priority order for provider configuration:
1. CLI Options (highest priority): Override all other settings
2. AppConfig: Persistent configuration from config files
3. Environment Variables: Fallback for legacy support
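The precedence above can be sketched as a small resolver. This is an illustrative sketch only, not metagit's actual implementation; `resolve_setting` and its parameters are hypothetical names introduced here.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    cli_value: Optional[str],
    app_config_value: Optional[str],
    env_var: str,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the first non-empty value in priority order.

    Order (highest first): CLI option, AppConfig value, environment variable.
    Hypothetical helper for illustration only.
    """
    for candidate in (cli_value, app_config_value, environ.get(env_var)):
        if candidate:  # skip None and empty strings
            return candidate
    return None


# A CLI token wins over both AppConfig and the environment.
token = resolve_setting(
    "cli-token",
    "config-token",
    "GITHUB_TOKEN",
    environ={"GITHUB_TOKEN": "env-token"},
)
print(token)  # cli-token
```

Treating empty strings as "not set" means an empty `api_token` in a config file does not mask a token supplied through the environment.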
Provider Features#
GitHub Provider#
- Authentication: Personal Access Token
- Metrics: Stars, forks, open issues, pull requests, contributors
- Metadata: Repository description, topics, creation date, license
- URL Support: github.com, GitHub Enterprise
- Configuration: providers.github.enabled, providers.github.api_token, providers.github.base_url
GitLab Provider#
- Authentication: Personal Access Token
- Metrics: Star count, forks, open issues, merge requests, members
- Metadata: Project description, topics, visibility, namespace
- URL Support: gitlab.com, self-hosted GitLab
- Configuration: providers.gitlab.enabled, providers.gitlab.api_token, providers.gitlab.base_url
Fallback Behavior#
When no provider is available or API calls fail, the system falls back to git-based metrics:
- Contributors: Counted from git commit history
- Commit Frequency: Calculated from recent commit patterns
- Stars/Forks/Issues: Set to 0 (requires API access)
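As a rough illustration of the git-based fallback, contributors can be counted from commit history by collecting distinct author emails. The function names below are hypothetical, and metagit may count contributors differently (for example by name, or with mailmap handling).

```python
import subprocess
from typing import Set


def unique_authors(log_output: str) -> Set[str]:
    """Collect distinct author emails from `git log --format=%ae` output."""
    return {
        line.strip().lower()
        for line in log_output.splitlines()
        if line.strip()
    }


def count_contributors(repo_path: str) -> int:
    """Count distinct commit authors in the repository at repo_path.

    Requires the `git` executable and a valid repository; raises
    subprocess.CalledProcessError otherwise.
    """
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return len(unique_authors(result.stdout))
```

Splitting the parsing from the subprocess call keeps the counting logic testable without a real repository.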
Detection Components#
Language Detection#
Analyzes file extensions and content to identify:
- Primary Language: Most dominant programming language
- Secondary Languages: Other languages present
- Frameworks: React, Vue, Angular, Terraform, Kubernetes, etc.
- Package Managers: npm, pip, cargo, go.mod, etc.
- Build Tools: Make, Gradle, Maven, etc.
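A minimal sketch of the extension-based part of this step is shown below. It ranks languages by file count only; the extension map and the `rank_languages` name are assumptions for illustration, and the real detector also inspects file content.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Hypothetical, deliberately small extension map; a real detector covers far more.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".go": "Go",
    ".rs": "Rust",
    ".sh": "Shell",
}


def rank_languages(paths: Iterable[str]) -> Tuple[Optional[str], List[str]]:
    """Return (primary language, secondary languages) ranked by file count."""
    counts: Counter = Counter()
    for path in paths:
        language = EXTENSION_LANGUAGES.get(Path(path).suffix.lower())
        if language:
            counts[language] += 1
    ranked = [language for language, _ in counts.most_common()]
    if not ranked:
        return None, []
    return ranked[0], ranked[1:]


primary, secondary = rank_languages(["app.py", "cli.py", "ui.js", "README.md"])
print(primary, secondary)  # Python ['JavaScript']
```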
Project Type Detection#
Classifies projects based on file patterns:
- Application: Web apps, mobile apps, desktop apps
- Library: Reusable code libraries
- CLI: Command-line tools
- Microservice: Containerized services
- Data Science: ML/AI projects with notebooks
- Infrastructure as Code: Terraform, CloudFormation, etc.
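Classification by file patterns can be pictured as an ordered list of rules, where the first match wins. The rules and labels below are assumptions chosen for illustration, not the patterns metagit actually uses.

```python
from typing import Iterable


def classify_project(filenames: Iterable[str]) -> str:
    """Guess a project type from top-level file names (hypothetical rules)."""
    names = {name.lower() for name in filenames}
    if any(name.endswith(".tf") for name in names):
        return "infrastructure-as-code"
    if any(name.endswith(".ipynb") for name in names):
        return "data-science"
    if "dockerfile" in names:
        return "microservice"
    if "setup.py" in names or "pyproject.toml" in names:
        return "library"
    # Nothing more specific matched, so fall back to the most general type.
    return "application"


print(classify_project(["Dockerfile", "app.py"]))  # microservice
```

Rule order matters here: a repository containing both Terraform files and a Dockerfile is reported as infrastructure as code, which may or may not match the behavior of the real classifier.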
Branch Analysis#
Detects branching strategies:
- Git Flow: Feature, develop, release, hotfix branches
- GitHub Flow: Simple main branch with feature branches
- GitLab Flow: Environment-based branching
- Trunk-Based Development: Single main branch
- Custom: Other branching patterns
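One way to picture this detection is a heuristic over branch names. The sketch below is an assumption about how such a heuristic might look; the function name, branch-name conventions, and returned labels are illustrative rather than metagit's actual logic.

```python
from typing import Iterable


def guess_branch_strategy(branches: Iterable[str]) -> str:
    """Guess a branching strategy from branch names (hypothetical heuristic)."""
    names = set(branches)
    has_develop = "develop" in names
    has_release = any(b.startswith(("release/", "hotfix/")) for b in names)
    has_feature = any(b.startswith("feature/") for b in names)
    has_env = bool(names & {"staging", "production", "pre-production"})

    if has_develop and has_release:
        return "git_flow"
    if has_env:
        return "gitlab_flow"
    if has_feature:
        return "github_flow"
    if names and names <= {"main", "master", "trunk"}:
        return "trunk_based"
    return "custom"


print(guess_branch_strategy(["main", "develop", "release/1.0"]))  # git_flow
```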
CI/CD Detection#
Identifies CI/CD configurations:
- GitHub Actions: .github/workflows/
- GitLab CI: .gitlab-ci.yml
- CircleCI: .circleci/config.yml
- Jenkins: Jenkinsfile
- Travis CI: .travis.yml
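Since each platform is identified by a well-known path, the check can be sketched as a lookup over marker files. The paths come from the list above; `CI_MARKERS` and `detect_ci_platforms` are hypothetical names.

```python
from pathlib import Path
from typing import List

# Marker paths taken from the list above, relative to the repository root.
CI_MARKERS = {
    "GitHub Actions": ".github/workflows",
    "GitLab CI": ".gitlab-ci.yml",
    "CircleCI": ".circleci/config.yml",
    "Jenkins": "Jenkinsfile",
    "Travis CI": ".travis.yml",
}


def detect_ci_platforms(repo_path: str) -> List[str]:
    """Return the CI/CD platforms whose marker path exists in the repository."""
    root = Path(repo_path)
    return [
        platform
        for platform, marker in CI_MARKERS.items()
        if (root / marker).exists()
    ]
```

A repository can match more than one platform, so the sketch returns a list rather than a single value.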
Metrics Collection#
Gathers repository statistics:
- Stars: Repository stars/watches
- Forks: Repository forks
- Open Issues: Number of open issues
- Pull Requests: Open and recently merged PRs
- Contributors: Number of contributors
- Commit Frequency: Daily, weekly, or monthly activity
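The commit frequency label could be derived by bucketing the average number of recent commits per week. The 90-day window and the thresholds below are assumptions for illustration; the source does not state how metagit draws these boundaries.

```python
from datetime import datetime, timedelta
from typing import Sequence


def classify_commit_frequency(
    commit_times: Sequence[datetime],
    now: datetime,
    window_days: int = 90,
) -> str:
    """Label activity as daily, weekly, or monthly (hypothetical thresholds)."""
    cutoff = now - timedelta(days=window_days)
    recent = [t for t in commit_times if cutoff <= t <= now]
    per_week = len(recent) / (window_days / 7)
    if per_week >= 5:
        return "daily"
    if per_week >= 1:
        return "weekly"
    # Anything sparser, including no recent commits, is reported as monthly.
    return "monthly"
```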
Output Formats#
Summary Output#
Human-readable summary of all detected information:
Repository Analysis Summary
Path: /path/to/repo
URL: https://github.com/username/repo
Git Repository: True
Primary Language: Python
Secondary Languages: JavaScript, Shell
Frameworks: React, Terraform
Package Managers: pip, npm
Project Type: application
Domain: web
Confidence: 0.85
Branch Strategy: GitHub Flow
Number of Branches: 3
CI/CD Tool: GitHub Actions
Contributors: 5
Commit Frequency: weekly
Stars: 42
Forks: 12
Open Issues: 3
Open PRs: 1
PRs Merged (30d): 8
Metrics Source: GitHub API
Has Docker: True
Has Tests: True
Has Documentation: True
Has Infrastructure as Code: True
YAML Output#
Structured YAML configuration:
name: "My Project"
description: "A sample project"
url: "https://github.com/username/repo"
kind: "application"
license:
  kind: "mit"
  file: "LICENSE"
maintainers:
  - name: "John Doe"
    email: "john@example.com"
    role: "Maintainer"
branch_strategy: "github_flow"
taskers:
  - kind: "taskfile"
branches:
  - name: "main"
  - name: "develop"
  - name: "feature/new-feature"
cicd:
  platform: "github"
  pipelines:
    - name: "CI"
      ref: ".github/workflows/ci.yml"
metrics:
  stars: 42
  forks: 12
  open_issues: 3
  pull_requests:
    open: 1
    merged_last_30d: 8
  contributors: 5
  commit_frequency: "weekly"
metadata:
  default_branch: "main"
  has_ci: true
  has_tests: true
  has_docs: true
  has_docker: true
  has_iac: true
  created_at: "2024-01-01T00:00:00"
  last_commit_at: "2024-01-15T12:00:00"
workspace:
  projects:
    - name: "default"
      repos:
        - name: "My Project"
          path: "/path/to/repo"
          url: "https://github.com/username/repo"
Examples#
Basic Analysis#
from metagit.core.detect.repository import RepositoryAnalysis
# Analyze current directory
analysis = RepositoryAnalysis.from_path(".")
# Print summary
print(analysis.summary())
# Get configuration
config = analysis.to_metagit_config()
With AppConfig Integration#
from metagit.core.detect.repository import RepositoryAnalysis
from metagit.core.appconfig import AppConfig
from metagit.core.providers import registry
# Load AppConfig and configure providers
app_config = AppConfig.load()
registry.configure_from_app_config(app_config)
# Analyze repository (will use configured providers for metrics)
analysis = RepositoryAnalysis.from_path(".")
print(analysis.summary())
With Manual Provider Configuration#
from metagit.core.detect.repository import RepositoryAnalysis
from metagit.core.providers.github import GitHubProvider
from metagit.core.providers import registry
# Setup GitHub provider manually
provider = GitHubProvider(api_token="ghp_...")
registry.register(provider)
# Analyze repository
analysis = RepositoryAnalysis.from_path(".")
print(analysis.summary())
CLI with AppConfig#
# Create AppConfig file
mkdir -p ~/.config/metagit
cat > ~/.config/metagit/config.yml << EOF
config:
  providers:
    github:
      enabled: true
      api_token: "ghp_..."
    gitlab:
      enabled: false
      api_token: ""
EOF
# Analyze with AppConfig providers
metagit detect repository --path /path/to/repo --output summary
# Save configuration with real metrics
metagit detect repository --path /path/to/repo --save
CLI with Environment Variables#
# Set environment variables
export GITHUB_TOKEN="ghp_..."
export GITLAB_TOKEN="glpat-..."
# Analyze with environment providers
metagit detect repository --path /path/to/repo --output summary
# Disable AppConfig and use environment only
metagit detect repository --use-app-config=false --path /path/to/repo
Error Handling#
The detection system gracefully handles errors:
- Missing Files: Skips analysis of missing files/directories
- API Failures: Falls back to git-based metrics
- Invalid Repositories: Returns appropriate error messages
- Network Issues: Continues with local analysis
- Configuration Errors: Falls back to environment variables or defaults
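The API-failure fallback described above might look like the following sketch. `collect_metrics` and its callable parameters are hypothetical and stand in for whatever provider and git helpers the real module uses.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def collect_metrics(
    fetch_from_api: Callable[[], Dict[str, object]],
    fetch_from_git: Callable[[], Dict[str, object]],
) -> Dict[str, object]:
    """Prefer provider metrics; fall back to git-based metrics on any failure."""
    try:
        metrics = dict(fetch_from_api())
        metrics["source"] = "api"
    except Exception as exc:  # broad on purpose: any API problem triggers the fallback
        logger.warning("Provider API unavailable, using git metrics: %s", exc)
        # Values that need API access default to 0, as in the fallback rules.
        metrics = {"stars": 0, "forks": 0, "open_issues": 0}
        metrics.update(fetch_from_git())
        metrics["source"] = "git"
    return metrics
```

Recording the source alongside the values makes it clear to the caller whether a zero means "no stars" or "not available".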
Performance Considerations#
- Local Analysis: Fast, no network required
- Provider API Calls: May add 1-3 seconds for metrics
- Large Repositories: Analysis time scales with repository size
- Caching: No built-in caching (consider implementing for repeated analysis)
- Configuration Loading: AppConfig is loaded once per command execution
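For repeated analysis, a caller could add a simple cache of its own, as the caching note above suggests. The sketch below assumes results can be keyed by repository path and HEAD commit hash; `AnalysisCache` is a hypothetical class and not part of metagit.

```python
from typing import Callable, Dict, Tuple


class AnalysisCache:
    """In-memory cache keyed by (repository path, HEAD commit hash)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], object] = {}

    def get_or_compute(
        self,
        repo_path: str,
        head_commit: str,
        compute: Callable[[], object],
    ) -> object:
        key = (repo_path, head_commit)
        if key not in self._entries:
            # Only run the expensive analysis when this commit has not been seen.
            self._entries[key] = compute()
        return self._entries[key]
```

The HEAD commit hash can be read with `git rev-parse HEAD`. Keying on it invalidates the cached result when new commits land, although provider metrics such as stars can still go stale between commits.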