Changelog

This changelog mostly follows Keep a Changelog. Release numbering mostly follows Semantic Versioning.

Unreleased

  • None

Contributions always welcomed!

Version 3.0.0 (2020-09-15)

Milestone

Initial release of webchanges as an updated fork of urlwatch 2.21. Changes below are relative to urlwatch 2.21.

Added

  • You can now specify just a url; in line with the “just works for web” philosophy, the monitoring of text in webpages is optimized by applying all necessary filters TODO

  • If no job name is provided, the title of an HTML page will be used for a job name in reports

  • The Python html2text package (used by the html2text filter, previously known as pyhtml2text) is now initialized with the following purpose-optimized non-default options: unicode_snob = True, body_width = 0, single_line_break = True, and ignore_images = True

  • The output from html2text filter is reconstructed into HTML (for html reports), preserving basic formatting such as bolding, italics, list bullets, etc. as well as making links clickable

  • The formatting of HTML has been made radically more legible and useful, including the wrapping of long lines

  • Reports are now rendered correctly by HTML email clients that reformat style sheets

  • New format-xml filter reformats (pretty-prints) XML

  • webchanges --errors will run all jobs and list all errors and empty responses (after filtering)

  • Browser jobs now recognize cookies, headers, http_proxy, https_proxy, and timeout

  • The revision number of the Chromium browser to use can be selected with chromium_revision

  • The user directory for the Chromium browser can be set with user_data_dir

  • Chromium can be directed to ignore HTTPS errors with ignore_https_errors

  • Chromium can be directed as to when to consider a page loaded with wait_until

  • Additional command line switches can be passed to Chromium with switches

  • New report filters additions_only and deletions_only allow tracking only content that was added to (or deleted from) the source

  • Support for Python 3.9

  • Backward compatibility with urlwatch 2.21
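
As an illustration of what the additions_only and deletions_only report filters listed above aim to do, here is a minimal sketch built on Python's standard difflib module. The helper names added_lines and deleted_lines are hypothetical and are not part of webchanges; the actual filters may work differently.

```python
import difflib
from typing import List


def added_lines(old: str, new: str) -> List[str]:
    """Return the lines difflib marks as added going from old to new (hypothetical helper)."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes added lines with "+ " and deleted lines with "- "
    return [line[2:] for line in delta if line.startswith("+ ")]


def deleted_lines(old: str, new: str) -> List[str]:
    """Return the lines difflib marks as deleted going from old to new (hypothetical helper)."""
    delta = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in delta if line.startswith("- ")]


old_text = "alpha\nbeta\ngamma"
new_text = "alpha\ngamma\ndelta"

print(added_lines(old_text, new_text))    # lines to show with an additions-only view
print(deleted_lines(old_text, new_text))  # lines to show with a deletions-only view
```

When the result is an empty list, nothing survives the filter, so a caller could use that to decide whether a report is worth sending.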

Changed and deprecated

  • Navigation by full browser is now accomplished by specifying the url and adding the use_browser: true directive

  • The navigate directive has been deprecated for clarity and will trigger a warning *TODO*

  • The name of the default jobs configuration file has been changed to jobs.yaml; if at program launch urls.yaml is found and no jobs.yaml exists, it is copied over for backward compatibility

  • The name of the default program configuration file has been changed to config.yaml; if at program launch urlwatch.yaml is found and no config.yaml exists, it is copied over for backward compatibility

  • The location of config files in Windows has been moved to %USERPROFILE%\Documents\webchanges where they can be more easily edited (they are indexed there) and backed up

  • The html2text filter defaults to using the Python html2text package (with optimized defaults)

  • New additions_only directive to report only added lines (useful when monitoring only new content)

  • New deletions_only directive to report only deleted lines

  • keyring and cssselect Python packages are no longer installed by default

  • html2text and markdown2 Python packages are installed by default

  • Installation of Python packages required by a feature is now made easier with pip extras

  • The html2text filter’s re method has been renamed strip_tags; the old name re is deprecated and will trigger a warning

  • The grep filter has been renamed keep_lines_containing; the old name grep is deprecated and will trigger a warning

  • The grepi filter has been renamed delete_lines_containing; the old name grepi is deprecated and will trigger a warning

  • Both the keep_lines_containing and delete_lines_containing filters accept text (default) in addition to re (regular expressions)

  • --test command line switch is used to test a job (formerly --test-filter, deprecated)

  • --test-diff command line switch is used to test a job’s diff (formerly --test-diff-filter, deprecated)

  • -V command line switch added as an alias to --version

  • If a filename for --jobs, --config or --hooks is supplied without a path and the file is not present in the current directory, webchanges now looks for it in the default configuration directory

  • In Windows, --edit defaults to using built-in notepad.exe if %EDITOR% or %VISUAL% are not set

  • When using the --job command line switch, if there’s no file by that name in the specified directory, webchanges will look in the default directory before giving up

  • The use of the kind directive in jobs.yaml configuration files has been deprecated (but is, for now, still used internally)

  • The database (cache) file is backed up at every run to *.bak

  • The list of default and optional dependencies has been updated (see documentation) to enable “Just works”

  • Dependencies are now specified as PyPI extras to simplify their installation

  • Changed timing from datetime to timeit.default_timer

  • Upgraded concurrent execution loop to concurrent.futures.ThreadPoolExecutor.map

  • Reports’ elapsed time now always has at least 2 significant digits

  • Added flake8 to the test suite
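
The timing and concurrency changes listed above (timeit.default_timer and concurrent.futures.ThreadPoolExecutor.map) can be sketched roughly as follows. This is only an illustration using the standard library; run_job and run_all are hypothetical names, and the real execution loop in webchanges may be structured differently.

```python
import timeit
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def run_job(job_name: str) -> str:
    """Hypothetical stand-in for retrieving and filtering a single job."""
    return f"{job_name}: ok"


def run_all(job_names: List[str], max_workers: int = 4) -> Tuple[List[str], float]:
    """Run all jobs in a thread pool and measure the elapsed wall-clock time."""
    start = timeit.default_timer()  # high-resolution timer
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the input job names
        results = list(executor.map(run_job, job_names))
    elapsed = timeit.default_timer() - start
    return results, elapsed


results, elapsed = run_all(["news", "prices", "status"])
print(results)
print(f"elapsed: {elapsed:.4f} seconds")
```

Because executor.map preserves input order, the results can be matched back to their jobs without extra bookkeeping.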

Removed

  • The html2text filter’s lynx method is no longer supported; use html2text instead

  • Python 3.5 (obsoleted by 3.6 on December 23, 2016) is no longer supported

Fixed

  • The html2text filter’s html2text method defaults to unicode handling

  • HTML href links ending with spaces are no longer broken by xpath replacing spaces with %20

  • The initial config file no longer has directives sorted alphabetically; they are now saved in a logical order (e.g. enabled is always the first sub-directive)
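
To illustrate the kind of failure behind the href fix listed above, the sketch below shows how a trailing space in an href turns into %20 when percent-encoded, and one possible way to avoid it by trimming whitespace before resolving the link. The helper resolve_href and the example URLs are hypothetical; this is not necessarily how webchanges implements the fix.

```python
from urllib.parse import quote, urljoin


def resolve_href(base_url: str, href: str) -> str:
    """Hypothetical helper: trim stray whitespace, then resolve against the page URL."""
    return urljoin(base_url, href.strip())


raw_href = "page.html "  # note the trailing space

# Percent-encoding the untrimmed value turns the trailing space into %20
print(quote(raw_href))

# Trimming first yields a clean, usable link
print(resolve_href("https://example.com/docs/", raw_href))
```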

Security

  • None

Documentation changes

  • Complete rewrite

Known bugs

  • An empty report will still be generated for a job when no reportable changes survive the additions_only or deletions_only report filters