Metadata-Version: 2.4
Name: yasmot
Version: 0.10
Summary: Object tracker for stereo images and video
Author: Ketil Malde
Author-email: ketil@malde.org
License: LGPL
Keywords: object tracking
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Requires-Python: >=3.8
Requires-Dist: parse
Requires-Dist: scipy

# YASMOT: Yet Another Stereo Image Multi-Object Tracker

This is a program that parses the output from a typical object
detector (YOLO and RetinaNet are supported) applied to video or a
series of still images, and tracks the objects over time.

The usage is probably best illustrated by the test cases or the help
text generated by `yasmot --help`.

## Tracking

As usual, tracking consists of defining a distance measure between
object detection bounding boxes, and then using the Hungarian
algorithm to find the best pairing.
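As a rough illustration of the pairing step, the sketch below uses `1 - IoU` as the distance between boxes and finds the lowest-cost pairing by exhaustive search. Both choices are assumptions made for the example: yasmot's actual distance measure is described under "Controlling tracking parameters", and the Hungarian algorithm (available as `scipy.optimize.linear_sum_assignment`) finds the same optimum far more efficiently. The function names and the `(x1, y1, x2, y2)` box layout are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_pairing(prev, curr):
    """Pair each box in prev with a distinct box in curr, minimizing
    the total distance 1 - IoU.  Assumes len(prev) <= len(curr).
    Exhaustive search: only practical for a handful of boxes."""
    if len(prev) > len(curr):
        raise ValueError("this sketch assumes len(prev) <= len(curr)")
    best, best_cost = (), float("inf")
    for perm in permutations(range(len(curr)), len(prev)):
        cost = sum(1.0 - iou(prev[i], curr[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    return list(enumerate(best))


# Two objects in the previous frame, listed in a different order
# in the current frame.
prev = [(0.0, 0.0, 0.2, 0.2), (0.5, 0.5, 0.7, 0.7)]
curr = [(0.52, 0.5, 0.72, 0.7), (0.01, 0.0, 0.21, 0.2)]
pairs = best_pairing(prev, curr)
```

Here `pairs` links the first previous box to the second current box and vice versa, since those are the overlapping ones.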

The input is either one or more directories of individual frame
annotations (as produced by YOLO), or one or more CSV-files with a
table of annotations (as produced by e.g. RetinaNet).

Output is a list of tracks (i.e. the set of detections for each
object), with a summary of the class detections for each track.  Then
follows a list of frames with the object detections labeled with
track ID, which may be redirected to a file with `-o`.

## Interpolation

Missing detections can be interpolated by specifying the
`--interpolate` option.  The interpolated (inferred) detections will
have a probability of 0.0000.
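The following sketch shows one way such gap filling could work. It assumes plain linear interpolation between the detections on either side of the gap, which may differ from what yasmot does internally; the function name and the `(x, y, w, h)` box layout are hypothetical.

```python
def interpolate_boxes(first, last, n_missing):
    """Linearly interpolate n_missing boxes between two detections.

    Boxes are (x, y, w, h) tuples.  Each interpolated box is returned
    together with probability 0.0 to mark it as inferred."""
    result = []
    for k in range(1, n_missing + 1):
        t = k / (n_missing + 1)  # fraction of the way from first to last
        box = tuple(a + t * (b - a) for a, b in zip(first, last))
        result.append((box, 0.0))
    return result


# One missing frame between two detections of the same object.
filled = interpolate_boxes((0.0, 0.0, 0.2, 0.2), (0.4, 0.2, 0.2, 0.2), 1)
```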

## Stereo images

The `-s` option links detections of the same object between the two
cameras of a stereoscopic setup.  By default, tracks are generated as
well, but the `--no-track` option can be specified to only link
detections between the cameras, and not in time.

## Using pixel-based coordinates

YOLO outputs image coordinates as fractions of the image size, i.e.
values range from 0 to 1.  Some object detectors (including RetinaNet)
output a CSV file with pixel-based coordinates.  Since yasmot may not
have the images available, you then need to specify the pixel size of
the images, e.g. as `--shape 1228,1027`.
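To make the relation between the two coordinate systems concrete, here is a minimal sketch of the conversion from pixel to fractional coordinates. It assumes the shape is given as width followed by height and that boxes are `(x1, y1, x2, y2)` corner tuples; neither is confirmed by the text above, and the function name is hypothetical.

```python
def to_fractional(box, shape):
    """Convert a pixel box (x1, y1, x2, y2) to fractions of the image size.

    shape is assumed to be (width, height) in pixels."""
    width, height = shape
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


# A box covering the right half of a 1228 x 1027 image.
frac = to_fractional((614, 0, 1228, 1027), (1228, 1027))
```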

## Consensus predictions

If you run multiple object detectors, it may be useful to combine the
outputs into a consensus set of predictions.  This can be achieved
by specifying the `-c` option.  Again, you can use `--no-track` if you
just want the frame-by-frame consensus without tracking.

## Controlling tracking parameters

The `--scale` parameter controls how the different bounding box pairs
are ranked when considered for tracking (or stereo matching).  The
algorithm uses a Gaussian score for position and size, and this
parameter controls the sharpness (or temperature) of the Gaussian.
Generally, if you have large changes between frames (rapidly moving
objects or low frame rate), you can try reducing this parameter.
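The effect of such a parameter can be sketched as follows. The exact formula yasmot uses is not given here, so this example assumes the simplest form, `exp(-scale * d²)`, where `d²` is the squared difference in position and size between two boxes. Under this assumed form, a smaller `scale` gives a flatter Gaussian that penalizes large displacements less. The function name and the `(x, y, w, h)` box layout are hypothetical.

```python
import math


def gaussian_score(a, b, scale):
    """Similarity in (0, 1] between two boxes given as (x, y, w, h).

    Assumed form: exp(-scale * d2), where d2 is the squared Euclidean
    distance over position and size.  Not necessarily yasmot's formula."""
    d2 = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-scale * d2)


box_a = (0.30, 0.40, 0.10, 0.10)
box_b = (0.45, 0.40, 0.10, 0.10)  # same size, shifted to the right

sharp = gaussian_score(box_a, box_b, scale=100.0)
flat = gaussian_score(box_a, box_b, scale=10.0)
```

With these inputs `flat` is larger than `sharp`: the lower scale is more forgiving of the shift between frames.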

Tracks are maintained across missing detections; for how long is
controlled by the parameter `--max_age`.  The age is determined based
on the frame name, and unless the frame name is a plain number, the
extraction can be specified with `--time_pattern`.
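The idea of a maximum age can be sketched as below. The example assumes that a track stays alive as long as the time since its latest detection does not exceed `max_age`; whether yasmot uses an inclusive or exclusive bound is not stated, and the function name and data layout are hypothetical.

```python
def prune_tracks(last_seen, current_time, max_age):
    """Return the tracks that are still alive at current_time.

    last_seen maps a track ID to the time (frame number) of its latest
    detection.  A track is kept while its age is at most max_age."""
    return {
        track_id: t
        for track_id, t in last_seen.items()
        if current_time - t <= max_age
    }


# Track 1 was seen two frames ago, track 2 eight frames ago.
alive = prune_tracks({1: 10, 2: 4}, current_time=12, max_age=5)
```

Only track 1 survives here, so a detection in frame 12 could still be appended to it, while track 2 is considered finished.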

In case there are classes representing an unknown or unidentified
object, it is possible to specify the label with the `--unknown`
parameter to avoid having this class be called as a consensus class.
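A minimal sketch of what excluding an unknown label could look like is given below. It assumes the consensus class is the label with the highest summed probability over a track's detections, which is an illustrative choice and not necessarily how yasmot computes it; the function name and the labels are hypothetical.

```python
from collections import defaultdict


def consensus_class(detections, unknown=None):
    """Pick the label with the highest summed probability.

    detections is a list of (label, probability) pairs.  Detections
    carrying the `unknown` label are ignored; if nothing else remains,
    `unknown` itself is returned."""
    totals = defaultdict(float)
    for label, prob in detections:
        if label != unknown:
            totals[label] += prob
    return max(totals, key=totals.get) if totals else unknown


track = [("unknown", 0.9), ("cod", 0.4), ("unknown", 0.8),
         ("saithe", 0.3), ("cod", 0.3)]

with_unknown = consensus_class(track)
without_unknown = consensus_class(track, unknown="unknown")
```

Without the exclusion, the placeholder label dominates the track; with it, the most likely real class is chosen.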
