{% extends "base.html" %} {% block title %} About the Knowledge Repo {% endblock %} {% block content %} {{ super() }}
The knowledge repository is a git repository, a web app, and a set of tools that together enable the sharing of knowledge between data scientists and people in other technical roles. It acts as a hub where people can submit their work and have it go through a standard QA review process, using data formats that make sense in these professions. Currently, the knowledge repository supports IPython notebooks (ipynb), R Markdown (Rmd), and Markdown (md) files.
Users add these files to the knowledge repository through the knowledge_repo tool, as described below, which converts them into a standard format and allows them to be rendered and curated in the web app.
To install the knowledge repository tooling, simply run:
pip install git+ssh://git@github.com/airbnb/knowledge-repo.git
If your organization already has a knowledge data repository set up, check it out onto your computer as you normally would; for example:
git clone git@example.com:example_data_repo.git
If not, or for fun, you can create a new knowledge repository using:
knowledge_repo --repo <repo_path> init
Running this same command when a repo already exists at <repo_path> has no effect.
You can drop the --repo option if you set the $KNOWLEDGE_REPO environment variable to the location of that repository.
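The fallback from --repo to $KNOWLEDGE_REPO can be pictured with a minimal Python sketch. This is not the tool's actual code: the function name resolve_repo_path is hypothetical, and the assumption that an explicit --repo value wins over the environment variable is ours, not something stated above.

```python
import os


def resolve_repo_path(repo_option=None, environ=None):
    """Return the repository path from --repo, else from $KNOWLEDGE_REPO.

    Hypothetical helper for illustration only; not part of knowledge_repo.
    """
    environ = os.environ if environ is None else environ
    # Assumption: an explicit --repo value takes precedence over the environment.
    if repo_option:
        return repo_option
    path = environ.get("KNOWLEDGE_REPO")
    if not path:
        raise ValueError("pass --repo or set $KNOWLEDGE_REPO")
    return path
```

With the variable exported in your shell, a call such as resolve_repo_path() would return its value, which corresponds to running knowledge_repo without the --repo option.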
For more details about the structure of a knowledge repository, see the technical details section below.
The whole point of a knowledge repository is to host knowledge posts. You can add a knowledge post using:
knowledge_repo --repo <repo_path> add <supported knowledge format> <location in knowledge repo>
For example, if my knowledge repository is in a folder named test_repo, I have an IPython notebook at Documents/notebook.ipynb, and I want it to be added to the knowledge repository under projects/test_knowledge, I can run:
knowledge_repo --repo test_repo add Documents/notebook.ipynb projects/test_knowledge
If you look in test_repo you will see a new folder test_repo/projects/test_knowledge.kp, and you are set to use git commit and git push to submit it for review.
Note that the folder name ends in ‘.kp’. This suffix is added automatically to indicate that the folder is a knowledge post, so adding ‘.kp’ explicitly is optional. Also note that knowledge_repo does not automatically create a branch for you, so if that is the way in which your organisation works, be sure to manually create a branch before pushing into the repo.
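The suffix rule described above is simple enough to sketch in a few lines of Python. The helper name normalize_post_path is hypothetical and the code only mirrors the described behaviour; it is not taken from the tool itself.

```python
def normalize_post_path(location):
    """Return the post location with a '.kp' suffix, adding it if absent.

    Illustrative only; knowledge_repo's own handling may differ in detail.
    """
    location = location.rstrip("/")
    if location.endswith(".kp"):
        return location
    return location + ".kp"
```

Under this sketch, projects/test_knowledge and projects/test_knowledge.kp both map to the same folder name, which is why typing the suffix yourself is optional.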
Currently ‘ipynb’, ‘Rmd’ and ‘md’ files are supported. See the “Contributing” section below for how to add support for more formats.
To update an existing knowledge post, pass the --update option, which allows the add operation to overwrite an existing knowledge post. For example:
knowledge_repo --repo <repo_path> add --update <supported knowledge format> <location in knowledge repo>
Running the web app allows you to locally view all the knowledge posts in the repository, or to serve it for others to view. It is also useful when developing on the web app.
Running the web app in development/local/private mode is as simple as running:
knowledge_repo --repo <repo_path> runserver
Supported options are --port and --dburi, which respectively change the local port on which the server runs and the SQLAlchemy URI where the database can be found and/or initiated. The default port is 7000, and the default dburi is sqlite:////tmp/knowledge.db. If the database does not exist, it is created (if that is possible) and initialised. Database migrations are not automatic (to prevent accidental data loss), but can be performed using:
knowledge_repo --repo <repo_path> db_migrate --dburi <db>
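The four slashes in the default dburi are easy to mistype: in SQLAlchemy-style SQLite URIs, sqlite:/// is the prefix and an absolute POSIX path contributes the fourth slash. The sketch below builds such a URI and the matching runserver command line. Both helper names are hypothetical, and the code uses only the options and defaults named above.

```python
DEFAULT_PORT = 7000
DEFAULT_DBURI = "sqlite:////tmp/knowledge.db"


def sqlite_dburi(db_path):
    """Build a SQLite dburi from an absolute POSIX path (illustrative helper)."""
    if not db_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # "sqlite:///" + "/tmp/knowledge.db" yields four consecutive slashes.
    return "sqlite:///" + db_path


def runserver_command(repo_path, port=DEFAULT_PORT, dburi=DEFAULT_DBURI):
    """Return the argument list for running the development server."""
    return [
        "knowledge_repo", "--repo", repo_path, "runserver",
        "--port", str(port),
        "--dburi", dburi,
    ]
```

The returned list can be handed to subprocess.run if you want to launch the server from a script rather than typing the command by hand.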
Deploying the web app is much like running the development server, except that the web app is deployed on top of gunicorn. It also allows for enabling server-side components such as sending emails to subscribed users.
Deploying is as simple as:
knowledge_repo --repo <repo_path> deploy
Supported options are --port, --dburi, --workers, --timeout and --config. The --config option allows you to specify a Python config file from which to load the extended configuration. A template config file is provided in resources/server_config.py. The --port and --dburi options are as before. The --workers option sets the number of gunicorn worker processes used to serve the app, and --timeout sets the time after which an unresponsive worker is presumed to have died and is restarted.
We would love to work with you to create the best knowledge repository software possible. If you have ideas or would like to have your own code included, add an issue or pull request and we will review it.
Support for conversion of a particular filetype to a knowledge post is added by writing a new KnowledgePostConverter subclass. Each converter should live in its own file in knowledge_repo/converters. Refer to the implementations for ipynb, Rmd, and md for more details. If your conversion is site-specific, you can define these subclasses in .knowledge_repo_config, whereupon they will be picked up by the conversion code.
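To make the idea of per-filetype converters concrete, here is a self-contained sketch of an extension-keyed registry. It does not use or reproduce the real KnowledgePostConverter API: the Converter base class, the register decorator, the to_markdown method, and the ‘txt’ format are all hypothetical stand-ins, and the assumption that the target format is Markdown is inferred from the knowledge.md file described elsewhere on this page.

```python
import os

# Hypothetical registry mapping file extensions to converter classes.
CONVERTERS = {}


def register(*extensions):
    """Class decorator that records a converter for the given extensions."""
    def decorator(cls):
        for ext in extensions:
            CONVERTERS[ext.lower()] = cls
        return cls
    return decorator


class Converter:
    """Stand-in base class; the real KnowledgePostConverter may differ."""

    def to_markdown(self, text):
        raise NotImplementedError


@register("md")
class MarkdownConverter(Converter):
    def to_markdown(self, text):
        # Markdown is already in the assumed target format.
        return text


@register("txt")
class PlainTextConverter(Converter):
    def to_markdown(self, text):
        # Indent every line by four spaces so it renders as a literal block.
        return "\n".join("    " + line for line in text.splitlines())


def converter_for(filename):
    """Instantiate the converter registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if ext not in CONVERTERS:
        raise ValueError("no converter registered for %r" % ext)
    return CONVERTERS[ext]()
```

The design point is that adding a format means adding one class in one file; the dispatch code never changes. Consult the existing ipynb, Rmd, and md converters for the actual base-class contract before writing your own.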
When a KnowledgePost is constructed by converting from a supported filetype, the resulting post is then passed through a series of postprocessors (defined in knowledge_repo/postprocessors). This allows one to modify the knowledge post, upload images to remote storage facilities (such as S3), and/or verify some additional structure of the knowledge posts. As above, defining these classes in .knowledge_repo_config allows postprocessors to be used locally.
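The chaining of postprocessors can be sketched as a simple pipeline in which each step receives the post and returns a possibly modified post. This sketch models a post as a plain dictionary with hypothetical ‘title’ and ‘body’ keys and uses plain functions; the real postprocessors are classes whose interface is not shown here.

```python
def strip_trailing_whitespace(post):
    """Modify step: return a copy of the post with tidied body lines."""
    cleaned = dict(post)
    cleaned["body"] = "\n".join(
        line.rstrip() for line in post["body"].splitlines()
    )
    return cleaned


def require_title(post):
    """Verification step: reject posts that lack a title."""
    if not post.get("title"):
        raise ValueError("knowledge post is missing a title")
    return post


def run_postprocessors(post, postprocessors):
    """Pass the post through each postprocessor in order."""
    for process in postprocessors:
        post = process(post)
    return post
```

A step that uploads images to remote storage would fit the same shape: take the post, perform the upload, rewrite the image references, and return the updated post.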
Is the Knowledge Repository missing something else that you would like to see? Let us know, and we’ll see whether we can help.
A knowledge repository is a git repository with the following structure:
<repo>
    + .git  # The git repository metadata
    + .resources  # A folder into which the knowledge_repo repository is checked out (as a git submodule)
    - .knowledge_repo_config  # Local configuration for this knowledge repository
    - <knowledge posts>
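Given this layout, the knowledge posts in a checkout can be listed by walking the tree and collecting folders that end in ‘.kp’. The sketch below uses only the Python standard library; find_knowledge_posts is a hypothetical helper, not a knowledge_repo function, and skipping every dot-folder (including .git and .resources) is our assumption.

```python
import os


def find_knowledge_posts(repo_path):
    """Return repo-relative paths of all '.kp' folders, sorted."""
    posts = []
    for root, dirs, _files in os.walk(repo_path):
        # Assumption: hidden folders such as .git and .resources hold no posts.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in list(dirs):
            if name.endswith(".kp"):
                full = os.path.join(root, name)
                posts.append(os.path.relpath(full, repo_path))
                dirs.remove(name)  # do not descend into the post itself
    return sorted(posts)
```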
The use of a git submodule to check out knowledge_repo into .resources allows us to ensure that the client and server are using the same version of the code. When one uses the knowledge_repo script, it actually passes the options to the version of the knowledge_repo script in .resources/scripts/knowledge_repo. Thus, updating the version of knowledge_repo used by client and server alike is as simple as changing which revision is checked out by git submodule in the usual way. That is:
pushd .resources
git pull
git checkout <revision>/<branch>
popd
git commit -a -m 'Updated version of the knowledge_repo'
git push
Then all users and servers associated with this repository will be updated to the new version. This prevents version mismatches between client and server, and among the users of the repository.
In development, it is often useful to disable this chaining. To use the local code instead of the code in the checked-out knowledge repository, pass the --dev option as:
knowledge_repo --repo <repo_path> --dev <action>
...
A knowledge post is a directory, with the following structure:
<knowledge_post>
    - knowledge.md
    + images/* [Optional]
    + orig_src/* [Optional; stores the original converted file]
Images are automatically extracted from the local paths on your computer and placed into images. orig_src contains the file(s) from which the knowledge post was converted.
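The structure above lends itself to a quick sanity check before committing a post. The following sketch is a hypothetical helper, not part of knowledge_repo; it checks only what this page states: the ‘.kp’ suffix, a required knowledge.md, and optional images and orig_src folders.

```python
import os


def check_knowledge_post(post_dir):
    """Return a list of problems found in a knowledge post directory."""
    problems = []
    if not post_dir.rstrip(os.sep).endswith(".kp"):
        problems.append("directory name should end in .kp")
    if not os.path.isfile(os.path.join(post_dir, "knowledge.md")):
        problems.append("missing knowledge.md")
    # images/ and orig_src/ are optional, but must be folders when present.
    for optional in ("images", "orig_src"):
        path = os.path.join(post_dir, optional)
        if os.path.exists(path) and not os.path.isdir(path):
            problems.append("%s should be a directory" % optional)
    return problems
```

An empty list means the folder matches the described layout; it says nothing about whether the contents of knowledge.md are themselves valid.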