Metadata-Version: 2.4
Name: cs-mediainfo
Version: 20260531
Summary: Simple minded facilities for media information inferred from filenames. This contains mostly lexical functions for extracting information from strings or constructing media filenames from metadata and a few classes like `EpisodeInfo` and `SeriesEpisodeInfo` for common descriptions.
Keywords: python2,python3
Author-email: Cameron Simpson <cs@cskk.id.au>
Description-Content-Type: text/markdown
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Requires-Dist: cs.deco>=20260525
Requires-Dist: cs.gimmicks>=20260311
Requires-Dist: cs.lex>=20260526
Requires-Dist: cs.pfx>=20250914
Requires-Dist: cs.tagset>=20260531
Project-URL: MonoRepo Commits, https://bitbucket.org/cameron_simpson/css/commits/branch/main
Project-URL: Monorepo Git Mirror, https://github.com/cameron-simpson/css
Project-URL: Monorepo Hg/Mercurial Mirror, https://hg.sr.ht/~cameron-simpson/css
Project-URL: Source, https://github.com/cameron-simpson/css/blob/main/lib/python/cs/mediainfo.py

Simple minded facilities for media information inferred from filenames.
This contains mostly lexical functions for extracting information from strings
or constructing media filenames from metadata and a few classes
like `EpisodeInfo` and `SeriesEpisodeInfo` for common descriptions.

*Latest release 20260531*:
Bugfix a parse return value, small doc updates.

The default filename parsing rules are based on my personal convention,
which is to name media files as:

    series_name--episode_info--title--source--etc-etc.ext

where the components are:
* `series_name`:
  the programme series name downcased and with whitespace replaced by dashes;
  in the case of standalone items like movies this is often the studio.
* `episode_info`: a structured field with episode information:
  `s`_n_ is a series/season,
  `e`_n_ is an episode number within the season,
  `x`_n_ is an "extra" - additional material supplied with the season,
  etc.
* `title`: the episode title downcased and with whitespace replaced by dashes.
* `source`: the source of the media.
* `ext`: filename extension such as `mp4`.
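
The convention above can be sketched in a few lines. This is only an illustration of the naming scheme, not code from this module: `make_media_filename` and its `to_part` helper are hypothetical names, and the sample values are made up.

```python
def make_media_filename(series_name, episode_info, title, source, ext):
    """Hypothetical helper: join the components with "--" and append the extension."""

    def to_part(text):
        # Downcase and replace runs of whitespace with single dashes.
        return "-".join(text.lower().split())

    parts = [to_part(series_name), episode_info, to_part(title), to_part(source)]
    return "--".join(parts) + "." + ext


filename = make_media_filename(
    "Some Show", "s01e02", "The Episode Title", "dvd", "mp4"
)
print(filename)  # some-show--s01e02--the-episode-title--dvd.mp4
```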

As you may imagine,
as a rule I dislike mixed case filenames
and filenames with embedded whitespace.
I also like a media filename to contain enough information
to identify the file contents in a compact and human readable form.

Short summary:
* `EpisodeDatumDefn`: An `EpisodeInfo` marker definition with the following components: `name`: the marker name, such as `"series"` or `"episode"`; `prefix`: the stub used in a filename, such as `"s"` or `"e"`; `re`: a regular expression to match the `prefix` and some digits.
* `EpisodeInfo`: Trite class for episodic information, used to store, match or transcribe series/season, episode, etc values.
* `main`: Main command line running some test code.
* `parse_name`: Parse the descriptive part of a filename (the portion remaining after stripping the file extension) and yield `(part,fields)` for each part as delineated by `sep`.
* `part_to_title`: Convert a filename part into a title string.
* `pathname_info`: Parse information from the basename of a file pathname. Return a mapping of field => values in the order parsed.
* `scrub_title`: Strip redundant text from the start of an episode title.
* `SeriesEpisodeInfo`: Episode information from a TV series episode.
* `title_to_part`: Convert a title string into a filename part. This is lossy; the `part_to_title` function cannot completely reverse this.
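
As a rough picture of the first step these functions build on, a pathname can be reduced to its basename, stripped of its extension and split on the `--` separator. The sketch below uses only the standard library; `split_media_basename` is a hypothetical name and this is not the implementation of `parse_name` or `pathname_info`, which also interpret the individual parts.

```python
import os.path


def split_media_basename(pathname, sep="--"):
    """Hypothetical helper: return (parts, ext) for a media file pathname."""
    base = os.path.basename(pathname)
    # Split off the extension, then break the remainder on the separator.
    name, ext = os.path.splitext(base)
    return name.split(sep), ext.lstrip(".")


parts, ext = split_media_basename(
    "/media/tv/some-show--s01e02--the-episode-title--dvd.mp4"
)
print(parts)  # ['some-show', 's01e02', 'the-episode-title', 'dvd']
print(ext)    # mp4
```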

Module contents:
- <a name="EpisodeDatumDefn"></a>`class EpisodeDatumDefn(EpisodeDatumDefn)`: An `EpisodeInfo` marker definition with the following components:
  - `name`: the marker name, such as `"series"` or `"episode"`
  - `prefix`: the stub used in a filename, such as `"s"` or `"e"`
  - `re`: a regular expression to match the `prefix` and some digits

*`EpisodeDatumDefn.parse(self, s, offset=0)`*:
Parse an episode datum from a string, return the value and new offset.
Raise `ValueError` if the string doesn't match this definition.

Parameters:
* `s`: the string
* `offset`: parse offset, default 0
- <a name="EpisodeInfo"></a>`class EpisodeInfo(types.SimpleNamespace)`: Trite class for episodic information, used to store, match
  or transcribe series/season, episode, etc values.

*`EpisodeInfo.__getitem__(self, name)`*:
We can look up values by name.

*`EpisodeInfo.as_dict(self)`*:
Return the episode info as a `dict`.

*`EpisodeInfo.as_tags(self, prefix=None)`*:
Generator yielding the episode info as `Tag`s.

*`EpisodeInfo.from_filename_part(s, offset=0)`*:
Factory to return an `EpisodeInfo` from a filename episode field.

Parameters:
* `s`: the string containing the episode information
* `offset`: the start of the episode information, default 0

The episode information must extend to the end of the string
because the factory returns just the information. See the
`parse_filename_part` class method for the core parse.

*`EpisodeInfo.get(self, name, default=None)`*:
Look up value by name with default.

*`EpisodeInfo.parse_filename_part(s, offset=0)`*:
Parse episode information from a string,
returning the matched fields and the new offset.

Parameters:
* `s`: the string containing the episode information.
* `offset`: the starting offset of the information, default 0.
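
The "fields plus new offset" style of parse can be illustrated with a small standalone sketch. It assumes only the `s`, `e` and `x` markers described earlier; `parse_episode_field`, `MARKERS` and `MARKER_RE` are hypothetical names, and the real method may recognise other markers and handle errors differently.

```python
import re

# Assumed markers, for illustration only: s=series, e=episode, x=extra.
MARKERS = {"s": "series", "e": "episode", "x": "extra"}
MARKER_RE = re.compile(r"([sex])(\d+)", re.IGNORECASE)


def parse_episode_field(s, offset=0):
    """Hypothetical sketch: return (fields, new_offset)."""
    fields = {}
    while True:
        # Match one marker and its digits at the current offset.
        m = MARKER_RE.match(s, offset)
        if m is None:
            break
        fields[MARKERS[m.group(1).lower()]] = int(m.group(2))
        offset = m.end()
    return fields, offset


fields, offset = parse_episode_field("s01e02--title")
print(fields, offset)  # {'series': 1, 'episode': 2} 6
```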

*`EpisodeInfo.season`*:
The `.season` property, a synonym for `.series`.
- <a name="main"></a>`main(argv=None)`: Main command line running some test code.
- <a name="parse_name"></a>`parse_name(name, sep='--')`: Parse the descriptive part of a filename
  (the portion remaining after stripping the file extension)
  and yield `(part,fields)` for each part as delineated by `sep`.
- <a name="part_to_title"></a>`part_to_title(part)`: Convert a filename part into a title string.

  Example:

      >>> part_to_title('episode-name')
      'Episode Name'
- <a name="pathname_info"></a>`pathname_info(pathname)`: Parse information from the basename of a file pathname.
  Return a mapping of field => values in the order parsed.
- <a name="scrub_title"></a>`scrub_title(title: str, *, season=None, episode=None) -> str`: Strip redundant text from the start of an episode title.

  I frequently get "title" strings with leading season/episode information.
  This function cleans up these strings to return the unadorned title.
- <a name="SeriesEpisodeInfo"></a>`class SeriesEpisodeInfo(cs.deco.Promotable)`: Episode information from a TV series episode.

*`SeriesEpisodeInfo.as_dict(self)`*:
Return the non-`None` values as a `dict`.
Note that this uses `dataclasses.asdict()` and as such is a deep copy.

*`SeriesEpisodeInfo.from_str(episode_title: str, series=None)`*:
Infer a `SeriesEpisodeInfo` from an episode title.

This recognises the common `'sSSeEE - Episode Title'` format
and variants like `'Series Name - sSSeEE - Episode Title'`
or `'sSSeEE - Episode Title - Part: One'`.
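
To make the formats above concrete, here is a minimal standalone sketch of recognising them with a regular expression. It is not the `from_str` implementation: `split_episode_title` and `SE_RE` are hypothetical names, and the real method returns a `SeriesEpisodeInfo` rather than a tuple.

```python
import re

# "sSSeEE" followed by a dash separator, matched case-insensitively.
SE_RE = re.compile(r"s(\d+)e(\d+)\s*-\s*", re.IGNORECASE)


def split_episode_title(text):
    """Hypothetical sketch: return (series_name, season, episode, title)."""
    m = SE_RE.search(text)
    if m is None:
        return None, None, None, text
    # Anything before the marker is taken to be the series name.
    series_name = text[: m.start()].rstrip(" -") or None
    return series_name, int(m.group(1)), int(m.group(2)), text[m.end():]


print(split_episode_title("s03e07 - Episode Title"))
# (None, 3, 7, 'Episode Title')
print(split_episode_title("Series Name - s03e07 - Episode Title"))
# ('Series Name', 3, 7, 'Episode Title')
```
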
- <a name="title_to_part"></a>`title_to_part(title)`: Convert a title string into a filename part.
  This is lossy; the `part_to_title` function cannot completely reverse this.

  Example:

      >>> title_to_part('Episode Name')
      'episode-name'
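
The lossiness is easy to see with a naive pair of conversions. The functions below are simplified stand-ins written for this illustration (the `sketch_` names are hypothetical); the module's own `title_to_part` and `part_to_title` may treat punctuation and other details differently.

```python
def sketch_title_to_part(title):
    # Downcase and replace runs of whitespace with single dashes.
    return "-".join(title.lower().split())


def sketch_part_to_title(part):
    # Replace dashes with spaces and capitalise each word.
    return " ".join(word.capitalize() for word in part.split("-"))


original = "The BBC Proms"
part = sketch_title_to_part(original)
print(part)                        # the-bbc-proms
print(sketch_part_to_title(part))  # The Bbc Proms
```

The original capitalisation of `BBC` is discarded when the title is downcased, so the reverse conversion can only guess at it.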

# Release Log



*Release 20260531*:
Bugfix a parse return value, small doc updates.

*Release 20240519*:
Initial PyPI release, particularly for SeriesEpisodeInfo which I use in cs.app.playon.
