# Cross-Project References in projspec

## Context

projspec is a parsing/modeling library that reads existing metadata files (prefect.yaml,
dvc.yaml, datapackage.json, etc.) and produces a structured representation of a project.
It has no engine of its own and no central registry. A project is a directory + one or
more parsed spec files; a single directory may match many spec types simultaneously.

## What the codebase shows

The relevant model units are:

- `Project` — a directory + `specs`, `contents`, `artifacts`, `children`
- `ProjectSpec` — a parsed spec file (Prefect, DVC, Airflow, etc.)
- `BaseContent` — a dataclass describing *what exists* in a project
- `BaseArtifact` — something a project *can do/make*
- `children` — already models sub-directories as nested `Project` objects

`children` is already a form of cross-project reference, but only downward (into
subdirectories). There is currently no upward or sideways reference concept.

## Where cross-project references already appear but are not yet modeled

Looking at the existing spec parsers:

- `DVCRepo` reads the remote storage config but keeps it only as a raw list of strings. It
  already has the concept of external data locations but does not model them as typed objects.
- `DataPackage` / `TabularData` have `path` fields that could escape the project root.
- `IntakeCatalog` sources have URLs that are fully external.
- `Prefect`/`Airflow` deployments have `entrypoint` fields that could point elsewhere.

None of these are currently typed as cross-project references — they are just strings
or opaque dicts.

## Proposed design

The natural fit for projspec's existing structure is a new `BaseContent` subclass —
`ProjectReference` — living in `content/`:

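```python
from __future__ import annotations  # keeps the `Project | None` annotation unevaluated
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

from projspec.content.base import BaseContent  # assumed module path

if TYPE_CHECKING:
    from projspec import Project  # assumed import; needed only for the annotation

@dataclass
class ProjectReference(BaseContent):
    """A declared reference to another project, from within a spec."""
    path: str              # as declared in the source spec (may be relative, a remote URL, etc.)
    ref_type: str          # "data_dep" | "subproject" | "upstream" | "trigger", or "" if unspecified
    source_spec: str       # which spec declared this, e.g. "dvc_repo", "prefect"
    resolved: Project | None = field(default=None, repr=False)  # starts as None; resolution is left to the caller
```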

This sits cleanly inside `ProjectSpec._contents`, populated by individual spec parsers
that encounter external references. The `resolved` field starts as `None` and can be
populated later by the caller — projspec surfaces it but does not resolve it.
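A minimal sketch of how the `DVCRepo` parser could turn the raw remote strings it already reads into typed references. `references_from_remotes` is a hypothetical helper, and the `ProjectReference` below is a standalone stand-in (it does not subclass `BaseContent`) so the example runs without projspec installed. Tagging every DVC remote as `data_dep` is an assumption made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ProjectReference:
    """Standalone stand-in for the proposed class (the real one would subclass BaseContent)."""
    path: str
    ref_type: str
    source_spec: str
    resolved: Optional[Any] = field(default=None, repr=False)  # excluded from repr


def references_from_remotes(remotes: list[str]) -> list[ProjectReference]:
    """Hypothetical helper: wrap raw remote locations as typed, unresolved references."""
    # Tagging every DVC remote as a data dependency is an assumption for this sketch.
    return [
        ProjectReference(path=url, ref_type="data_dep", source_spec="dvc_repo")
        for url in remotes
    ]


refs = references_from_remotes(["s3://bucket/data", "../shared-data"])
print(refs[0])
# prints: ProjectReference(path='s3://bucket/data', ref_type='data_dep', source_spec='dvc_repo')
```

Every reference leaves the parser with `resolved=None`; a caller that wants more can iterate over them and fill `resolved` in on its own terms.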

## Detection approach: implicit vs explicit

Because projspec parses existing formats, implicit detection is the right first approach:
detect a cross-project reference by noticing that a path or URL in a spec escapes the
project root or is external to it. No new file format is needed.
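A minimal sketch of the implicit check for local paths, assuming a declared path is interpreted relative to the directory of the spec file that declared it (spec formats differ on this, so each parser would need to confirm its own convention). `escapes_root` is a hypothetical helper, not a projspec function. It is purely lexical: `os.path.normpath` collapses `..` without touching the filesystem, and `os.path.commonpath` compares whole path components, so `../proj-b` is not mistaken for a path inside `proj` the way a plain string-prefix test would be.

```python
import os


def escapes_root(declared: str, spec_dir: str, root: str) -> bool:
    """Return True if `declared`, taken relative to `spec_dir`, lies outside `root`.

    Lexical only: symlinks are not followed and nothing needs to exist on disk.
    An absolute `declared` path replaces `spec_dir` entirely (os.path.join semantics).
    """
    root_abs = os.path.normpath(os.path.abspath(root))
    target = os.path.normpath(os.path.join(os.path.abspath(spec_dir), declared))
    # commonpath works per component, so "/work/proj-b" does not share "/work/proj" as a prefix.
    return os.path.commonpath([root_abs, target]) != root_abs


print(escapes_root("../other-project/data.csv", "/work/proj", "/work/proj"))  # True
print(escapes_root("data/raw.csv", "/work/proj", "/work/proj"))  # False
```

On Windows, `os.path.commonpath` raises `ValueError` for paths on different drives; a real implementation would treat that case as escaping.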

Explicit declaration (e.g. a `projspec.toml` with `depends_on: [../other-project]`)
could come later as an extension.

## Resolution levels

- Level 1 (surface only): record that the path escapes the root, emit it as a typed
  `ProjectReference` with `resolved=None`. Leave resolution entirely to the caller.
  This is safe and consistent with projspec's current scope.

- Level 2 (root discovery): given the escaped path, walk up from it to find whether it lies
  inside a recognised project root (i.e. does the target directory, or one of its ancestors,
  contain a known spec file?). If so, populate `resolved` with a parsed `Project`. Still
  filesystem-local, no registry needed.

- Level 3 (full graph): load the target project recursively and build a multi-project
  dependency graph. This starts to feel like an engine and is out of scope for now.

Level 1 is the recommended starting point. Level 2 is useful and achievable without
introducing engine-like behaviour.
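A minimal sketch of the Level 2 root-discovery step, assuming a directory counts as a project root when it contains one of a few well-known spec files. `KNOWN_SPEC_FILES` and `find_project_root` are hypothetical; in projspec each spec class decides for itself whether it matches a directory, and a real implementation would parse the found directory as a `Project` and store that in `resolved`. The sketch stops at locating the directory.

```python
import os
import tempfile
from typing import Optional

# Hypothetical marker list; real matching would defer to projspec's spec classes.
KNOWN_SPEC_FILES = ("prefect.yaml", "dvc.yaml", "datapackage.json")


def find_project_root(target: str, stop_at: Optional[str] = None) -> Optional[str]:
    """Walk up from `target` to the nearest directory containing a known spec file.

    Returns that directory, or None if nothing qualifies before reaching the
    filesystem root or the optional `stop_at` boundary.
    """
    current = os.path.abspath(target)
    if not os.path.isdir(current):
        current = os.path.dirname(current)  # file (or missing) target: start at its parent
    stop = os.path.abspath(stop_at) if stop_at else None
    while True:
        if any(os.path.isfile(os.path.join(current, n)) for n in KNOWN_SPEC_FILES):
            return current
        parent = os.path.dirname(current)
        if parent == current or current == stop:
            return None  # hit the filesystem root or the caller's boundary
        current = parent


# Usage: a throwaway layout containing project-b/dvc.yaml and a project-b/data/ folder.
with tempfile.TemporaryDirectory() as tmp:
    proj_b = os.path.join(tmp, "project-b")
    os.makedirs(os.path.join(proj_b, "data"))
    open(os.path.join(proj_b, "dvc.yaml"), "w").close()
    found = find_project_root(os.path.join(proj_b, "data", "x.csv"), stop_at=tmp)
    missing = find_project_root(os.path.join(tmp, "plain"), stop_at=tmp)
    expected = os.path.abspath(proj_b)
```

Walking up can also land on a monorepo root that encloses the referring project itself, which is one reason to let the caller pass a `stop_at` boundary.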

## Remote paths

Paths could be remote (s3://, gs://, https://, etc.). Path-escape logic does not apply to
these, since they are external by definition. They should be treated as opaque
`ProjectReference` objects, parsed and typed but not resolved, with `resolved=None`.
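A minimal sketch of how remote paths might be separated out before any path-escape logic runs, assuming "has a URL scheme" is a good enough test. `is_remote` is a hypothetical helper; the parsing is done by the standard library's `urllib.parse.urlsplit`. Single-letter schemes are ignored so that a Windows drive letter such as `C:` is not mistaken for a protocol.

```python
from urllib.parse import urlsplit


def is_remote(declared: str) -> bool:
    """Treat any declared path with a multi-character URL scheme as remote."""
    scheme = urlsplit(declared).scheme  # "" for plain relative or absolute paths
    # Skip one-letter schemes: urlsplit reads "C:\\data" as scheme "c".
    return len(scheme) > 1


declared = ["s3://bucket/raw", "../project-b", "https://example.com/catalog.yaml", "data/x.csv"]
remote = [p for p in declared if is_remote(p)]
print(remote)  # ['s3://bucket/raw', 'https://example.com/catalog.yaml']
```

Under this rule `file://` URLs also count as remote, which errs on the side of leaving them opaque rather than resolving them.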

## Open design questions

1. **`ref_type` vocabulary** — should projspec define a controlled vocabulary for the
   nature of the reference (`data_dep`, `trigger`, `upstream`, `subproject`, etc.), or
   leave it as a free string for now? A controlled vocabulary aids filtering via
   `library.filter()`, but may be premature before more spec parsers are updated.

2. **Resolution scope** — `Project` already walks `children` (downward). Should
   projspec also support a `walk_upward` or `siblings` resolution mode, so that if
   project A declares `path: ../project-b`, projspec can optionally parse project B and
   populate `resolved`? Or is that firmly the caller's responsibility?
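To make the trade-off in question 1 concrete, here is a minimal sketch of one possible controlled vocabulary, assuming `ref_type` stays a `str` field. `RefType` and `parse_ref_type` are hypothetical names, not part of projspec. Because `RefType` subclasses `str`, its members compare equal to the plain strings parsers would emit today, so a free-string field could be narrowed later without breaking existing values.

```python
from enum import Enum


class RefType(str, Enum):
    """Hypothetical controlled vocabulary for ProjectReference.ref_type."""
    DATA_DEP = "data_dep"
    SUBPROJECT = "subproject"
    UPSTREAM = "upstream"
    TRIGGER = "trigger"
    UNKNOWN = ""  # mirrors the "" option in the proposed dataclass comment


def parse_ref_type(value: str) -> RefType:
    """Map a free string onto the vocabulary, falling back to UNKNOWN."""
    try:
        return RefType(value)
    except ValueError:
        return RefType.UNKNOWN


print(parse_ref_type("data_dep") is RefType.DATA_DEP)  # True
print(parse_ref_type("depends-on") is RefType.UNKNOWN)  # True
print(RefType.TRIGGER == "trigger")  # True
```

Falling back to `UNKNOWN` rather than raising keeps parsers tolerant of spec formats whose relationships do not fit the vocabulary; the cost is that a misspelled value is silently absorbed instead of reported.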
