Metadata-Version: 2.4
Name: moroccan-digital-factory
Version: 1.0.0
Summary: A Phased Implementation Framework for Moroccan Digital and Knowledge Sovereignty
Author-email: Samir Baladi <gitdeeper@gmail.com>
License: MIT
Project-URL: Documentation, https://moroccan-ai.netlify.app/documentation
Project-URL: Source, https://gitlab.com/gitdeeper4/mdpf
Project-URL: DOI, https://doi.org/10.5281/zenodo.20496255
Keywords: morocco,digital-sovereignty,cloud-first,darija,nlp
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: simplejson>=3.19.0
Dynamic: license-file

# 🇲🇦 Moroccan Digital–Physical Factory (MDPF) v1.0.0

<div align="center">

**A Phased Implementation Framework for Moroccan Digital and Knowledge Sovereignty**

*From Regional Cloud Hosting to National Knowledge Sovereignty*

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![DOI Paper](https://img.shields.io/badge/DOI-pending-lightgrey)](https://doi.org/10.5281/zenodo.20496255)
[![GitLab](https://img.shields.io/badge/GitLab-MDPF-orange?logo=gitlab)](https://gitlab.com/gitdeeper4/mdpf)
[![GitHub](https://img.shields.io/badge/GitHub-mirror-black?logo=github)](https://github.com/gitdeeper4/mdpf)
[![ORCID](https://img.shields.io/badge/ORCID-0009--0003--8903--0029-A6CE39?logo=orcid)](https://orcid.org/0009-0003-8903-0029)

---

**A Phased, Low-Capital-First Pathway to Moroccan Digital and Knowledge Sovereignty,**
**Grounded in the Maroc Digital 2030 / Cloud First Policy Landscape**

*Independent research paper · June 2026*

</div>

---

## 📋 Table of Contents

- [Overview](#-overview)
- [Why This Framework](#-why-this-framework)
- [The Five-Layer Framework](#-the-five-layer-framework)
- [Phased Implementation Roadmap](#-phased-implementation-roadmap)
- [Institutional and Policy Context](#-institutional-and-policy-context)
- [Risks and Limitations](#-risks-and-limitations)
- [Project Structure](#-project-structure)
- [Research Agenda](#-research-agenda)
- [Contributing](#-contributing)
- [Citation](#-citation)
- [Author](#-author)
- [License](#-license)

---

## 🌍 Overview

The **Moroccan Digital–Physical Factory (MDPF)** is a conceptual and practical framework proposing how Morocco's pursuit of digital and knowledge sovereignty can be built incrementally rather than all at once. Sovereignty in this domain rests on a pyramid of five interdependent capacities — physical infrastructure, processing, semantic representation, cultural identity, and governance — that cannot realistically be constructed simultaneously without prohibitive capital expenditure.

This project proposes a **phased, low-capital-first implementation pathway**: instead of beginning with nationally owned data-center construction, independent researchers and small institutions can start with regionally hosted or Morocco-resident cloud computing, deferring sovereign physical infrastructure to a later maturation stage. The highest-leverage, lowest-cost entry points are the **processing layer** (classification, clustering, retrieval) and the **cultural-identity layer** (Moroccan-language and dialectal corpora, particularly Darija) — both of which require methodological and linguistic expertise rather than capital.

> 🧠 **Core argument:** The conventional bottom-up build order — infrastructure before processing before representation before culture before governance — should be *inverted in execution* while preserved as a conceptual hierarchy. Demonstrated value at the processing and cultural-identity layers is what subsequently attracts infrastructure partnerships and institutional attention, not the reverse.

The framework is grounded in Morocco's existing institutional landscape: the national high-performance computing capacity already operated by **Mohammed VI Polytechnic University (UM6P)**, the government's **2025–2030 "Cloud First" roadmap** under the **Maroc Digital 2030** (Digital Morocco 2030) strategy, and emerging sovereign data-center investment such as the **EcoDar** facility in Dakhla.

---

## 🎯 Why This Framework

Digital sovereignty has moved from an abstract policy aspiration to an operational requirement for national governments. For Morocco, this is codified in the **Maroc Digital 2030** strategy (launched September 2024 under the Ministry of Digital Transition and Administrative Reform), which sets explicit targets for AI adoption, public-service digitalization, and specialized digital infrastructure. A central pillar, formalized in December 2025, is the national **"Cloud First" roadmap** for 2025–2030, which frames cloud adoption as a lever of national sovereignty rather than a purely technical choice.

A practical gap exists between this national-level policy ambition and the operational reality faced by independent researchers, small laboratories, and emerging technology teams who want to contribute to Moroccan knowledge infrastructure but cannot deploy capital at the scale of state-backed data-center programs. MDPF addresses that gap directly.

---

## 🏗️ The Five-Layer Framework

| # | Layer | Approach | Who Leads |
|---|---|---|---|
| 1 | **Data Infrastructure** | Deferred, rented — not built | State / quasi-state institutions (long term) |
| 2 | **Processing** (classification, clustering, retrieval) | **Recommended entry point** — rented compute, open-source tooling | Independent researchers, small teams |
| 3 | **Digital / Semantic Representation** | Fine-tune open-weight models on Moroccan corpora | Researchers, university partnerships (ENSIAS, INPT) |
| 4 | **Cultural & Knowledge Identity** | Structure Moroccan legal, historical, cultural data into machine-usable datasets | **Best suited to individuals / small teams** |
| 5 | **Governance & Sovereignty** | National education, research, and policy integration | Ministry of Digital Transition, ANRT, ADD |

### Layer 1 — Data Infrastructure (Deferred, Rented, Not Built)

Entry-level implementation rents compute from one of three sources: Morocco-resident commercial cloud regions (e.g., Oracle's Casablanca and Settat regions), regional North African or European cloud zones with low-latency connectivity to Morocco, or general-purpose international providers, paired with an explicit migration plan toward in-country hosting once it is available. Renting removes the largest capital barriers: energy, GPU/TPU procurement, and physical security.

### Layer 2 — Processing (Classification, Clustering, Retrieval)

This layer is the recommended primary entry point. Classification (sorting data by Moroccan dialectal and administrative context), clustering (grouping related documents across law, culture, and education), and retrieval (surfacing stored knowledge in response to a query) require methodological expertise and curated data rather than capital. A working prototype can be built on rented compute with open-source tooling.
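
As a rough illustration of the retrieval step, the sketch below ranks a few documents against a query using TF-IDF weighting and cosine similarity, with only the Python standard library. The technique, the function names (`tokenize`, `build_index`, `search`), and the three placeholder documents are assumptions made for this example; they are not taken from the MDPF pipelines, and real use would likely need Darija-aware tokenization.

```python
import math
import re
from collections import Counter

# Placeholder documents (hypothetical English stand-ins for real corpus entries).
DOCS = {
    "legal-001": "procedure for renewing a national identity card at the local administration",
    "culture-001": "history of the zellige tilework tradition in Fez",
    "edu-001": "registration calendar for public university admission",
}


def tokenize(text: str) -> list[str]:
    # \w+ is Unicode-aware for str patterns, so Arabic-script tokens match too.
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> tuple[dict[str, dict[str, float]], dict[str, float]]:
    """Return (TF-IDF vector per document, IDF weight per term)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter(term for tokens in tokenized.values() for term in set(tokens))
    n_docs = len(tokenized)
    # Smoothed IDF so that terms present in every document keep a positive weight.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    vectors = {
        doc_id: {term: count * idf[term] for term, count in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, vectors: dict[str, dict[str, float]],
           idf: dict[str, float], top_k: int = 3) -> list[tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs with a non-zero score, best first."""
    counts = Counter(tokenize(query))
    # Query terms never seen in the corpus carry no weight and are dropped.
    query_vec = {term: count * idf[term] for term, count in counts.items() if term in idf}
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(doc_id, score) for doc_id, score in scored[:top_k] if score > 0.0]


if __name__ == "__main__":
    vectors, idf = build_index(DOCS)
    for doc_id, score in search("identity card renewal", vectors, idf):
        print(doc_id, round(score, 3))
```

Because it has no third-party dependencies, a baseline like this can run on local hardware before any compute is rented. It matches exact tokens only, so "renewal" does not match "renewing".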

### Layer 3 — Digital Representation (Semantic Models)

Fine-tuning existing open-weight language models on Moroccan-specific corpora (Darija text, administrative/legal documents, regional news) is computationally lighter than foundation-model training and runs on the same rented compute as Layer 2.

### Layer 4 — Cultural and Knowledge Identity

Structuring and documenting Moroccan legal, historical, and cultural knowledge into machine-usable datasets, published on open-science platforms (Zenodo, OSF, HuggingFace) under DOI and preregistration practices, creates durable, citable infrastructure without requiring institutional permission or capital.
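
To make "machine-usable" more concrete, the sketch below shows one possible shape for a dataset record, serialized as JSON Lines. The `KnowledgeRecord` class, its field names, the allowed domain values, and the validation rules are all hypothetical and chosen for illustration; the project does not prescribe this schema. The language tag `ary` is the ISO 639-3 code for Moroccan Arabic.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical domain vocabulary, mirroring the three kinds of knowledge named above.
ALLOWED_DOMAINS = {"legal", "historical", "cultural"}


@dataclass(frozen=True)
class KnowledgeRecord:
    record_id: str
    domain: str              # one of ALLOWED_DOMAINS
    language: str            # language tag, e.g. "ary" (Moroccan Arabic), "fr", "ar"
    text: str
    source: str              # where the text came from (archive, gazette, interview)
    license: str             # licence under which the record is released
    tags: tuple[str, ...] = ()


def validate(record: KnowledgeRecord) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    if record.domain not in ALLOWED_DOMAINS:
        problems.append(f"unknown domain: {record.domain!r}")
    if not record.text.strip():
        problems.append("empty text")
    if not record.license:
        problems.append("missing license")
    return problems


def to_jsonl(records: list[KnowledgeRecord]) -> str:
    """Serialize records as JSON Lines, one record per line."""
    # ensure_ascii=False keeps Arabic-script text readable in the output file.
    return "\n".join(
        json.dumps(asdict(record), ensure_ascii=False, sort_keys=True)
        for record in records
    )


if __name__ == "__main__":
    example = KnowledgeRecord(
        record_id="demo-0001",
        domain="cultural",
        language="ary",
        text="placeholder text",
        source="hypothetical archive",
        license="CC-BY-4.0",
        tags=("example",),
    )
    print(validate(example))
    print(to_jsonl([example]))
```

Recording the source and licence on every record is a design choice that makes later publication with a DOI easier, since each record carries its own provenance.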

### Layer 5 — Governance and Sovereignty

Embedding this work into national education, research, and policy structures requires engagement with state institutions — the Ministry of Digital Transition, ANRT, the Agence de Développement du Digital (ADD). This is explicitly **out of scope** for direct execution by independent researchers; demonstrated output at Layers 2 and 4 is the credible route to institutional attention.

---

## 🗺️ Phased Implementation Roadmap

| Phase | Name | Description |
|---|---|---|
| **0** | Data & corpus assembly | Collect and clean Moroccan-language and domain-specific datasets (Darija text, legal/administrative documents, cultural archives); local hardware only |
| **1** | Rented processing prototype | Deploy classification, clustering, retrieval pipelines on regional/international cloud compute; validate and publish methodology |
| **2** | Semantic fine-tuning | Fine-tune an open-weight LLM on assembled corpora; benchmark against general-purpose models on Moroccan-specific tasks |
| **3** | Open publication & dataset release | Publish datasets, model weights (where licensing permits), and methodology papers with DOIs |
| **4** | Institutional partnership | Approach UM6P or in-country cloud regions (Oracle Casablanca/Settat, future sovereign facilities like EcoDar) for scaled compute |
| **5** | Policy engagement | Engage ANRT, the Ministry of Digital Transition, or ADD regarding inclusion in national education, research, or policy frameworks |
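
A minimal sketch of the Phase 0 cleaning step follows, assuming Arabic-script Darija text held as plain strings. The specific choices here (Unicode NFKC normalization, removing short-vowel diacritics and tatweel, exact-match deduplication by SHA-256 hash) and the function names `normalize` and `deduplicate` are assumptions for illustration, not the project's documented method. Stripping diacritics loses information and may not suit every downstream task.

```python
import hashlib
import re
import unicodedata

# Arabic short-vowel marks (U+064B to U+0652) plus tatweel, the elongation character (U+0640).
_HARAKAT_AND_TATWEEL = re.compile("[\u064B-\u0652\u0640]")
_WHITESPACE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Return a canonical form of one line of text."""
    # NFKC folds compatibility forms (e.g. Arabic presentation forms) to base letters.
    text = unicodedata.normalize("NFKC", text)
    text = _HARAKAT_AND_TATWEEL.sub("", text)
    # Collapse runs of whitespace and trim the ends.
    return _WHITESPACE.sub(" ", text).strip()


def deduplicate(lines: list[str]) -> list[str]:
    """Normalize each line, drop empties and exact duplicates, keep first-seen order."""
    seen: set[str] = set()
    kept: list[str] = []
    for line in lines:
        cleaned = normalize(line)
        if not cleaned:
            continue
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(cleaned)
    return kept


if __name__ == "__main__":
    raw = ["سلام", "س\u0640لام", "   ", "hello   world"]
    print(deduplicate(raw))
```

Hashing the normalized line rather than the raw line means that spelling variants differing only in diacritics or elongation collapse to one entry, which fits the "local hardware only" constraint of Phase 0.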

---

## 🏛️ Institutional and Policy Context

**National Cloud Policy** — The Cloud First roadmap, presented to the House of Representatives in late 2025, mandates a phased transition of government digital platforms toward cloud architectures, tying data residency and processing location to sovereignty, and is accompanied by forthcoming "Digital X.0" legislation on data flows, AI ethics, and interoperability.

**Emerging Sovereign and Commercial Cloud Capacity** — Between 2024 and 2026, approximately $1.1 billion has been allocated to the broader Digital Morocco 2030 strategy. Oracle announced a $140 million investment for cloud regions in Casablanca and Settat; Microsoft, AWS, and Google Cloud are reportedly evaluating Moroccan footholds. On the sovereign side, the EcoDar green data center in Dakhla is intended to keep sensitive national data resident within Moroccan jurisdiction.

**Existing HPC Capacity** — UM6P's African Supercomputing Center (inaugurated 2021), home to the **Toubkal** supercomputer, provides 3.15 petaflops of compute and ranks among the global TOP500 systems. UM6P has formalized cloud/AI partnerships with Oracle (first Oracle Lab in Africa, Casablanca 2022) and Microsoft's Africa Transformation Office — a precedent for academic–commercial cloud collaboration inside Morocco.

---

## ⚠️ Risks and Limitations

| Risk | Description | Mitigation |
|---|---|---|
| **Data residency** | Renting non-Moroccan cloud compute, even temporarily, may conflict with sovereignty objectives | Encryption at rest/in transit, contractual data-deletion guarantees, avoidance of adverse jurisdictions, clear migration plan to in-country hosting |
| **Vendor dependency** | Reliance on commercial cloud providers introduces lock-in and pricing exposure | Build on portable, open-source tooling rather than provider-proprietary services |
| **Legitimacy** | Work produced outside institutional channels may face slower state adoption | Phase 4–5 sequencing treats institutional/policy engagement as a consequence of demonstrated output, not a precondition |

---

## 🗂️ Project Structure

```
mdpf/
│
├── README.md                          # This file
├── README_ar.md                       # Arabic version
├── LICENSE                            # MIT License
├── CONTRIBUTING.md                    # Contribution guidelines
├── CHANGELOG.md                       # Version history
│
├── paper/                             # Source paper and references
│   ├── moroccan_digital_factory_paper.docx
│   └── references.bib
│
├── docs/                              # Documentation
│   ├── index.md
│   ├── framework/                     # Per-layer documentation
│   │   ├── layer1_infrastructure.md
│   │   ├── layer2_processing.md
│   │   ├── layer3_representation.md
│   │   ├── layer4_cultural_identity.md
│   │   └── layer5_governance.md
│   └── roadmap/                       # Per-phase documentation
│       ├── phase0_corpus.md
│       ├── phase1_prototype.md
│       ├── phase2_finetuning.md
│       ├── phase3_publication.md
│       ├── phase4_partnership.md
│       └── phase5_policy.md
│
├── corpora/                           # Moroccan-language datasets (Phase 0)
│   ├── darija/
│   ├── legal_administrative/
│   └── cultural_archives/
│
├── pipelines/                         # Processing pipelines (Phase 1)
│   ├── classification/
│   ├── clustering/
│   └── retrieval/
│
└── models/                            # Fine-tuned semantic models (Phase 2)
```

---

## 🔬 Research Agenda

Empirical validation of this framework is an open research agenda, including:

- Benchmarking classification/clustering/retrieval pipelines on Darija and Moroccan administrative corpora against general-purpose multilingual baselines
- Quantifying the cost differential between rented regional cloud compute and projected in-country sovereign hosting
- Documenting reuse and citation metrics for openly published Moroccan-language datasets as a proxy for "demonstrated value" preceding institutional partnership
- Formalizing data-residency and data-deletion contractual templates suitable for independent researchers operating on commercial cloud infrastructure
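
The first agenda item could be scored in many ways. As one hedged example, the sketch below compares a baseline and a fine-tuned classifier on the same gold labels using macro-averaged F1, which weights every class equally regardless of its size. The choice of metric, the function names `macro_f1` and `compare`, and the toy labels are assumptions for illustration; no benchmark results are implied.

```python
def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Macro-averaged F1 over every label that appears in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    labels = sorted(set(gold) | set(predicted))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(gold, predicted))
        true_pos = sum(1 for g, p in pairs if g == label and p == label)
        false_pos = sum(1 for g, p in pairs if g != label and p == label)
        false_neg = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * true_pos + false_pos + false_neg
        # Per-class F1 = 2*TP / (2*TP + FP + FN).
        scores.append(2 * true_pos / denom if denom else 0.0)
    return sum(scores) / len(scores)


def compare(gold: list[str], baseline_pred: list[str],
            finetuned_pred: list[str]) -> dict[str, float]:
    """Score both systems on the same gold labels and report the difference."""
    baseline = macro_f1(gold, baseline_pred)
    fine_tuned = macro_f1(gold, finetuned_pred)
    return {"baseline": baseline, "fine_tuned": fine_tuned, "delta": fine_tuned - baseline}


if __name__ == "__main__":
    # Toy labels only; these are not measurements from any real model.
    gold = ["legal", "legal", "culture", "culture"]
    baseline_pred = ["legal", "culture", "culture", "culture"]
    finetuned_pred = ["legal", "legal", "culture", "culture"]
    print(compare(gold, baseline_pred, finetuned_pred))
```

Macro averaging is used here because dialectal and administrative categories are likely to be imbalanced, and a plain accuracy figure would mostly reflect the largest class.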

---

## 🤝 Contributing

We welcome contributions from NLP researchers, Moroccan dialectology and Darija specialists, cloud/infrastructure engineers, and policy researchers.

```bash
# 1. Fork and clone
git clone https://gitlab.com/YOUR_USERNAME/mdpf.git
cd mdpf

# 2. Create a feature branch
git checkout -b feature/your-feature-name

# 3. Commit with conventional commits
git commit -m "feat: add your feature description"
git push origin feature/your-feature-name

# 4. Open a Merge Request on GitLab
```

**Priority contribution areas:**
- Darija and Moroccan dialectal corpus collection and cleaning
- Open-weight model fine-tuning recipes for Moroccan administrative/legal text
- Classification, clustering, and retrieval pipeline implementations (Layer 2)
- Cultural and historical knowledge dataset structuring (Layer 4)
- Documentation translation (Arabic, French, English)

---

## 📖 Citation

```bibtex
@misc{Baladi2026MDPF,
  title   = {A Phased Implementation Framework for the Moroccan Digital--Physical
             Factory: From Regional Cloud Hosting to National Knowledge Sovereignty},
  author  = {Baladi, Samir},
  year    = {2026},
  month   = {June},
  doi     = {10.5281/zenodo.20496255},
  note    = {Independent research paper}
}
```

---

## 👤 Author

| Name | Role | Affiliation |
|---|---|---|
| **Samir Baladi** | Author · Framework design · Analysis | Independent Researcher, Ronin Institute / Rite of Renaissance |

**Corresponding author:** Samir Baladi · [gitdeeper@gmail.com](mailto:gitdeeper@gmail.com) · ORCID: [0009-0003-8903-0029](https://orcid.org/0009-0003-8903-0029)

---

## 📄 License

This project is licensed under the **MIT License** — see [LICENSE](LICENSE) for details.

---

<div align="center">

**🇲🇦 MDPF — Building Moroccan digital sovereignty layer by layer, starting where capital isn't the constraint.**

*From rented compute to national infrastructure — an inverted, achievable build order.*

---

ORCID: [0009-0003-8903-0029](https://orcid.org/0009-0003-8903-0029)

</div>
