Metadata-Version: 2.2
Name: project-fleming
Version: 0.0.4
Home-page: https://github.com/sede-open/Fleming
Project-URL: Issue Tracker, https://github.com/sede-open/Fleming/issues
Project-URL: Source, https://github.com/sede-open/Fleming
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.9, <3.12
Description-Content-Type: text/markdown
License-File: LICENSE.md
Requires-Dist: pytest==7.4.0
Requires-Dist: pyspark<3.6.0,>=3.3.0
Requires-Dist: pytest-mock>=3.14.0
Requires-Dist: sentence-transformers>=3.2.0
Requires-Dist: mlflow>=2.0.1
Requires-Dist: black>=24.1.0
Requires-Dist: nltk>=3.8.2
Requires-Dist: torch>=2.4.1
Requires-Dist: tiktoken>=0.8.0
Requires-Dist: time>=1.0.0
Requires-Dist: databricks-sdk<1.0.0,>=0.20.0
Requires-Dist: beautifulsoup4==4.12.3
Requires-Dist: PyGithub==2.5.0
Requires-Dist: jwt==1.3.1
Requires-Dist: pytest-mock==3.14.0
Requires-Dist: requests<=2.32.3
Requires-Dist: numpy<2.0.0,>=1.23.4
Requires-Dist: pandas<2.2.0,>=1.5.2
Requires-Dist: mkdocs-material==9.5.20
Requires-Dist: mkdocs-material-extensions==1.3.1
Requires-Dist: mkdocstrings==0.25.0
Requires-Dist: mkdocstrings-python==1.10.8
Requires-Dist: mkdocs-macros-plugin==1.0.1
Provides-Extra: pyspark
Requires-Dist: pyspark<3.6.0,>=3.3.0; extra == "pyspark"
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python

# Fleming

<img align="right" src="docs/images/logo.png" title="Logo Discovery" alt="Logo Discovery" width="33%">

An open-source project providing the "brain" of the AI Discovery Tool, including technical scripts to build, register, serve and query models on Databricks that use semantic search. These models can run on CPU rather than GPU, which significantly reduces cost.
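
Semantic search ranks documents by how close their embeddings are to the embedding of a query. The sketch below is illustrative and is not taken from the Fleming code base. It assumes the vectors have already been produced by an embedding model (for example one loaded through `sentence-transformers`, a listed dependency) and ranks them by cosine similarity using only the standard library. The function names `cosine_similarity` and `semantic_search` and the toy three-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_embedding: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank corpus entries by cosine similarity to the query."""
    scored = [
        (doc_id, cosine_similarity(query_embedding, embedding))
        for doc_id, embedding in corpus.items()
    ]
    # Highest similarity first; Python's sort is stable, so ties keep insertion order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings standing in for real model output (assumed values).
    corpus = {
        "repo-forecasting": [0.9, 0.1, 0.0],
        "repo-nlp-search": [0.1, 0.9, 0.2],
        "repo-dashboards": [0.0, 0.2, 0.9],
    }
    print(semantic_search([0.2, 0.8, 0.1], corpus, top_k=2))
```

In a real deployment the query and corpus vectors must come from the same embedding model; scores computed across different models are not comparable.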

[Databricks](https://www.databricks.com), a popular big data processing and analytics platform, is used to build and train machine learning models on the ingested data.

By combining data ingestion from GitHub with Databricks' model training and serving capabilities, these pipelines provide an end-to-end solution for processing and analyzing data from GitHub repositories.
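
The ingestion step has to turn GitHub repository records into text that an embedding model can index. The following is a minimal sketch under stated assumptions, not the project's actual pipeline: it takes a JSON array of repository objects as returned by the GitHub REST API (fields such as `name`, `full_name`, `description`, `topics` and `html_url`) and flattens each one into a searchable document. The helper names `repo_to_document` and `build_corpus`, and the output field names, are hypothetical.

```python
import json
from typing import Any, Dict, List


def repo_to_document(repo: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: flatten one GitHub repository record into id, URL and text."""
    parts = [repo.get("name") or ""]
    # description may be missing or null in API responses, so only add it when present.
    if repo.get("description"):
        parts.append(repo["description"])
    topics = repo.get("topics") or []
    if topics:
        parts.append("Topics: " + ", ".join(topics))
    return {
        "id": repo["full_name"],
        "url": repo.get("html_url", ""),
        # This text is what would later be passed to the embedding model.
        "text": ". ".join(p for p in parts if p),
    }


def build_corpus(raw_json: str) -> List[Dict[str, str]]:
    """Parse a JSON array of repository records (e.g. a saved API response) into documents."""
    return [repo_to_document(repo) for repo in json.loads(raw_json)]
```

The records themselves could be fetched with PyGithub, which is a listed dependency, or saved from the REST API; keeping the flattening step as a pure function makes it easy to test without network access.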

The serving endpoint is designed to process and analyze large volumes of data, enabling efficient data discovery and insights.
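
Databricks Model Serving endpoints are queried by sending a JSON `POST` request to `/serving-endpoints/{name}/invocations` on the workspace URL, authenticated with a bearer token. The sketch below builds such a request with the standard library only. The workspace URL, the endpoint name and the input schema (one record per query with a single `query` column) are assumptions for illustration; the actual Fleming model may expect a different payload.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_query_request(
    workspace_url: str, endpoint_name: str, token: str, queries: List[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request to a Databricks model serving endpoint."""
    url = f"{workspace_url.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    # Assumed input schema: one record per query with a hypothetical "query" column.
    payload = {"dataframe_records": [{"query": q} for q in queries]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query_endpoint(request: urllib.request.Request, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload format testable offline. The `databricks-sdk` package, also a listed dependency, offers a higher-level client for the same operation.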


# Support and contacts

If you encounter any issues or have questions, please reach out to the team by raising an issue on the repo. They will be happy to assist you and provide further information about the project.

# Contributing

Contributions to this project are welcome! If you would like to contribute, please refer to our [Contributing Guide](CONTRIBUTION.md) for guidelines on how to get started. We appreciate your support in making this project even better.

# Licensing

The code in this repository is licensed under the Apache Software License, the full text of which can be found in the [LICENSE](LICENSE.md) file. Please review the license before using or distributing the code.
