Metadata-Version: 2.1
Name: swh.provenance
Version: 0.1.0
Summary: Software Heritage provenance
Author-email: Software Heritage developers <swh-devel@inria.fr>
Project-URL: Homepage, https://gitlab.softwareheritage.org/swh/devel/swh-provenance
Project-URL: Bug Reports, https://gitlab.softwareheritage.org/swh/devel/swh-provenance/-/issues
Project-URL: Funding, https://www.softwareheritage.org/donate
Project-URL: Documentation, https://docs.softwareheritage.org/devel/swh-provenance/
Project-URL: Source, https://gitlab.softwareheritage.org/swh/devel/swh-provenance.git
Classifier: Programming Language :: Python :: 3
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.9
Description-Content-Type: text/x-rst
License-File: LICENSE
License-File: AUTHORS
Requires-Dist: swh.core[db,http] >=2
Requires-Dist: swh.model >=2.6.1
Requires-Dist: swh.storage
Requires-Dist: swh.graph >=2.0.0
Provides-Extra: testing
Requires-Dist: swh.core[db,http] >=2 ; extra == 'testing'
Requires-Dist: swh.model >=2.6.1 ; extra == 'testing'
Requires-Dist: swh.storage ; extra == 'testing'
Requires-Dist: swh.graph >=2.0.0 ; extra == 'testing'
Requires-Dist: grpcio ; extra == 'testing'
Requires-Dist: pytest >=8.1 ; extra == 'testing'
Requires-Dist: swh.graph[testing] >=1.0.1 ; extra == 'testing'
Requires-Dist: types-click ; extra == 'testing'
Requires-Dist: types-PyYAML ; extra == 'testing'
Requires-Dist: types-Deprecated ; extra == 'testing'

Software Heritage - Provenance
==============================

This package provides a provenance query service for the Software Heritage
Archive. Provenance is the ability to ask, for a given object stored in the
Archive: "where does it come from?"

This question generally does not have a simple and unambiguous answer. Among
other interpretations, it can mean:

- what is the oldest revision in which this object has been found?
- what is the "better" origin in which this object can be found?

Answering this kind of question requires complex queries on the Merkle DAG on
which the Software Heritage Archive is built. These queries mostly traverse the
graph bottom-up, i.e. from Content objects towards Origin objects.
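
A minimal sketch of such a bottom-up query follows. It assumes a toy in-memory
graph in which each node maps to its predecessors, i.e. the nodes that point to
it. This is not the swh-graph API. ``PREDECESSORS``, ``upward_reachable`` and
``node_type`` are hypothetical names, and the identifiers are fake, shortened
SWHIDs.

.. code-block:: python

    from collections import deque

    # Hypothetical toy graph: node -> nodes with an edge pointing to it
    # (dir -> cnt, rev -> dir, child rev -> parent rev, snp -> rev, ori -> snp).
    PREDECESSORS = {
        "swh:1:cnt:01": ["swh:1:dir:01"],
        "swh:1:dir:01": ["swh:1:rev:01", "swh:1:dir:02"],
        "swh:1:dir:02": ["swh:1:rev:02"],
        "swh:1:rev:01": ["swh:1:rev:02", "swh:1:snp:01"],
        "swh:1:rev:02": ["swh:1:snp:01"],
        "swh:1:snp:01": ["swh:1:ori:01"],
    }


    def upward_reachable(start, predecessors):
        """Breadth-first search following edges backwards from ``start``."""
        seen = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for pred in predecessors.get(node, []):
                if pred not in seen:
                    seen.add(pred)
                    queue.append(pred)
        seen.discard(start)  # report ancestors only, not the start node
        return seen


    def node_type(swhid):
        # Identifiers look like "swh:1:<type>:<hash>"; the type is the third field.
        return swhid.split(":")[2]


    reachable = upward_reachable("swh:1:cnt:01", PREDECESSORS)
    revisions = sorted(n for n in reachable if node_type(n) == "rev")
    origins = sorted(n for n in reachable if node_type(n) == "ori")

On the real Archive, the set of ancestors of a widely reused content can be
very large. This cost is what makes precomputing part of the answer attractive.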

The idea is to combine the compressed graph representation of the Archive
(swh-graph) with a preprocessed provenance index to speed up some of these
provenance queries.
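
The following sketch illustrates how a preprocessed index can speed up a query
such as "oldest revision containing this object". It assumes a toy mapping from
each revision to its commit date and to the set of contents in its tree.
``REVISIONS``, ``build_earliest_index`` and ``oldest_revision`` are
hypothetical names. The actual layout of the swh-provenance index is not
described here. The cost is paid once, top-down, so each query becomes a
dictionary lookup.

.. code-block:: python

    from datetime import datetime, timezone

    # Hypothetical input: revision SWHID -> (commit date, contents in its tree).
    REVISIONS = {
        "swh:1:rev:01": (datetime(2020, 1, 1, tzinfo=timezone.utc),
                         {"swh:1:cnt:01"}),
        "swh:1:rev:02": (datetime(2021, 6, 1, tzinfo=timezone.utc),
                         {"swh:1:cnt:01", "swh:1:cnt:02"}),
    }


    def build_earliest_index(revisions):
        """Map each content to the (date, revision) of the oldest revision holding it."""
        index = {}
        for rev, (date, contents) in revisions.items():
            for cnt in contents:
                # Tuples compare by date first, then by SWHID as a tie-breaker.
                if cnt not in index or (date, rev) < index[cnt]:
                    index[cnt] = (date, rev)
        return index


    INDEX = build_earliest_index(REVISIONS)


    def oldest_revision(cnt_swhid):
        """Answer with a single lookup; None if the content is not indexed."""
        entry = INDEX.get(cnt_swhid)
        return entry[1] if entry else None

The trade-off is the usual one for indexes. The index must be built and kept up
to date as the Archive grows, and in exchange a query no longer walks the graph
at all.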


API Description
===============

For a single object::

    Input: SWHID (core SWHID of an artifact found in the user's code base)

    Output: SWHID or origin URI where input SWHID was found + context information
        Context information, a subset of:
            snapshot (snp SWHID)
            release (rel)
            revision (rev)
            path (filesystem-style path)

    Non-functional requirements: TODO something about the fact that both the
    answer and the context information should be "as high as possible" in the
    graph
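
The input/output contract above could be modelled, for instance, as a small
Python data structure. This sketch rests on assumptions: the class and field
names (``WhereIsResult``, ``found_in``, ``snapshot``, ...) are hypothetical, and
the actual service may represent its answers differently.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Optional


    @dataclass(frozen=True)
    class WhereIsResult:
        """Hypothetical answer to a single-object provenance query."""

        swhid: str  # the queried core SWHID
        found_in: str  # SWHID or origin URI where ``swhid`` was found
        # Optional context information; any subset may be present.
        snapshot: Optional[str] = None  # snp SWHID
        release: Optional[str] = None  # rel SWHID
        revision: Optional[str] = None  # rev SWHID
        path: Optional[str] = None  # filesystem-style path

        def context(self):
            """Return only the context fields that are actually set."""
            candidates = {
                "snapshot": self.snapshot,
                "release": self.release,
                "revision": self.revision,
                "path": self.path,
            }
            return {k: v for k, v in candidates.items() if v is not None}

For example, ``WhereIsResult("swh:1:cnt:01", "swh:1:rev:01", path="/src/main.c")``
carries only a ``path`` in its context.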


Public API
----------

::

    GET /whereis/:swhid

    GET /whereis_all/

    POST /whereare/TODO
      :swhids
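
A minimal client sketch for the ``GET /whereis/:swhid`` endpoint follows. It
uses only the Python standard library. The base URL and the assumption that the
server replies with a JSON body are hypothetical, because the response format
is not specified above. The URL builder percent-encodes the SWHID while keeping
``:`` readable.

.. code-block:: python

    import json
    import urllib.parse
    import urllib.request

    # Hypothetical deployment URL; adjust to wherever the service runs.
    BASE_URL = "http://localhost:5080"


    def whereis_url(swhid, base_url=BASE_URL):
        """Build the URL for GET /whereis/:swhid."""
        quoted = urllib.parse.quote(swhid, safe=":")
        return f"{base_url.rstrip('/')}/whereis/{quoted}"


    def whereis(swhid, base_url=BASE_URL, timeout=10.0):
        """Query the service, assuming (hypothetically) a JSON response body."""
        with urllib.request.urlopen(whereis_url(swhid, base_url), timeout=timeout) as resp:
            return json.load(resp)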
