nhlscrapi: NHL Scraper API
Purpose
Provide a Python API for accessing NHL game data, including play-by-play, game summaries, player stats, etc. The library hides the guts of the NHL website scraping process and encapsulates not only the data gathering but also the data output. This project is inspired by the R package nhlscrapr, an all-around must-have for NHL analytics geeks and R power users.
nhlscrapi is in its early stages but will be updated regularly.
Installation
Getting started is as easy as:
pip install nhlscrapi
For more information on the setup, see nhlscrapi on PyPI. The documentation for the package can be found here.
Usage Example
Scrape data for game 1226 of the 2013-2014 regular season, Ottawa vs. Pittsburgh.
from nhlscrapi.games.game import Game, GameKey, GameType
from nhlscrapi.games.cumstats import Score, ShotCt, Corsi, Fenwick
season = 2014 # 2013-2014 season
game_num = 1226 # game number within the season
game_type = GameType.Regular # regular season game
game_key = GameKey(season, game_type, game_num)
# define stat types that will be counted as the plays are parsed
cum_stats = {
'Score': Score(),
'Shots': ShotCt(),
'Corsi': Corsi(),
'Fenwick': Fenwick()
}
game = Game(game_key, cum_stats=cum_stats)
# HTTP requests and processing are lazy
# the accumulators require play-by-play info, so accessing them parses the RTSS PBP
print('Final : {}'.format(game.cum_stats['Score'].total))
print('Shootout : {}'.format(game.cum_stats['Score'].shootout.total))
print('Shots : {}'.format(game.cum_stats['Shots'].total))
print('EV Shot Atts : {}'.format(game.cum_stats['Corsi'].total))
print('Corsi : {}'.format(game.cum_stats['Corsi'].share()))
print('FW Shot Atts : {}'.format(game.cum_stats['Fenwick'].total))
print('Fenwick : {}'.format(game.cum_stats['Fenwick'].share()))
# triggers an HTTP request for the roster report
# only the sections related to officials and coaches are parsed
print('\nRefs : {}'.format(game.refs))
print('Linesman : {}'.format(game.linesman))
print('Coaches')
print(' Home : {}'.format(game.home_coach))
print(' Away : {}'.format(game.away_coach))
# scrape all remaining reports
game.load_all()
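The comments in the example above say that HTTP requests and processing are lazy and that the accumulators are fed from the RTSS play-by-play. The sketch below illustrates that pattern in isolation; it is not nhlscrapi's implementation. LazyGame, ShotAttempts, the fetch_plays callable and the play dict keys ("event", "team" as the shooting team, "strength") are all hypothetical, and it assumes Python 3.8+ for functools.cached_property. Shot attempts follow the usual definitions (Corsi counts goals, shots on goal, missed and blocked shots; Fenwick drops blocked shots), and, like the "EV Shot Atts" label in the example, only even-strength attempts are counted.

```python
from functools import cached_property

# Hypothetical event codes counted as shot attempts. The play dict layout
# ("event", "team" = shooting team, "strength") is an assumption for this
# sketch, not the structure produced by nhlscrapi's RTSS parser.
CORSI_EVENTS = {"GOAL", "SHOT", "MISS", "BLOCK"}  # all shot attempts
FENWICK_EVENTS = {"GOAL", "SHOT", "MISS"}         # unblocked attempts only


class ShotAttempts:
    """Hypothetical accumulator: counts even-strength shot attempts per team."""

    def __init__(self, events):
        self.events = events
        self.total = {}

    def update(self, play):
        # Count the play only if it is a tracked event at even strength.
        if play["event"] in self.events and play["strength"] == "EV":
            team = play["team"]
            self.total[team] = self.total.get(team, 0) + 1

    def share(self):
        # Each team's fraction of all counted attempts (e.g. Corsi For %).
        n = sum(self.total.values())
        return {team: ct / n for team, ct in self.total.items()} if n else {}


class LazyGame:
    """Hypothetical game wrapper that defers fetching until stats are read."""

    def __init__(self, fetch_plays, cum_stats=None):
        self._fetch_plays = fetch_plays    # stands in for HTTP request + parsing
        self._cum_stats = cum_stats or {}  # tolerate cum_stats=None
        self.fetch_count = 0               # how many times plays were fetched

    @cached_property
    def plays(self):
        # Runs once, on first access; the result is cached on the instance.
        self.fetch_count += 1
        return list(self._fetch_plays())

    @cached_property
    def cum_stats(self):
        # Feed every play to every accumulator, exactly once.
        for play in self.plays:
            for acc in self._cum_stats.values():
                acc.update(play)
        return self._cum_stats


sample_plays = [
    {"event": "SHOT", "team": "OTT", "strength": "EV"},
    {"event": "BLOCK", "team": "PIT", "strength": "EV"},
    {"event": "MISS", "team": "OTT", "strength": "PP"},  # power play: skipped
    {"event": "GOAL", "team": "PIT", "strength": "EV"},
    {"event": "FAC", "team": "OTT", "strength": "EV"},   # faceoff: not a shot
]

game = LazyGame(lambda: sample_plays, cum_stats={
    "Corsi": ShotAttempts(CORSI_EVENTS),
    "Fenwick": ShotAttempts(FENWICK_EVENTS),
})
print(game.fetch_count)               # 0: nothing fetched yet
print(game.cum_stats["Corsi"].total)  # {'OTT': 1, 'PIT': 2}
print(game.fetch_count)               # 1: fetched exactly once
```

Because both properties are cached, reading cum_stats a second time reuses the already-parsed plays instead of fetching them again.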
Current Release: v0.3.5
This is a pre-release; it is not yet stable or fit for production use. The first stable release (v1.0.0) will be made available once the framework for all NHL game reports is complete. Currently, the Play-by-Play, Home/Away TOI, Roster and Face-off Comparison reports are functional.
License
The NHL Scraper API is a free Python library provided under the Apache License, version 2.0.
- Free software: Apache License, v2.0
- Documentation: (coming eventually)
Change log
v0.3.5
- dropped urllib2 dependency because it’s 2015 and I’m tired of being a dinosaur
- added requests to setup dependencies
- fully qualified the scrapr.NHLCn import in scrapr.reportloader
- consolidated cli_opts.py into gamedata.py ... that whole thing needs a rewrite anyway (TODO)
v0.3.4
- fixed setup script reference bug
v0.3.3
- true bug fix: messed up the PyPI upload setup
- forgot cfg, etc.
v0.3.2
- refactored Plays/Strength construct
- moved Plays and Strength from games.plays to games.playbyplay
- moved scrapr.rtss.playparser.PlayParser to scrapr.rtss
- deleted games/plays.py and scrapr/playparser.py
- reworked data structure of PlayParser to be purely a dict
- parsed play data isn’t converted into the proper Play object until games.playbyplay.PlayByPlay gets it
- refactored TOI/ShiftSummary construct
- moved ShiftSummary from scrapr.toirep to games.toi
- scrapr.toirep.TOIRepBase now stores by player shift info as dict
- parsed shift summary isn’t made into a ShiftSummary object until in TOI
- Goal of both big refactors was to keep scraping/raw web data as dicts and have object wrappers only exist in the games package
- added a unittest for the time on ice and shift summary info
- added docstrings to major report and scraper interfaces
- built docs using Sphinx
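The v0.3.2 goal of keeping raw web data as dicts in the scraper layer, and creating object wrappers only in the games package, can be pictured with a small sketch. The class names ShiftSummary and TOI echo the change log, but their fields, the parse_shift_rows helper and the (player_num, shift_no, start_sec, end_sec) row format are hypothetical and do not reflect nhlscrapi's actual signatures.

```python
from dataclasses import dataclass


# Scraper layer (hypothetical): raw report rows become plain dicts only.
def parse_shift_rows(rows):
    """Group (player_num, shift_no, start_sec, end_sec) tuples by player."""
    by_player = {}
    for num, shift_no, start, end in rows:
        by_player.setdefault(num, []).append(
            {"shift": shift_no, "start": start, "end": end}
        )
    return by_player


# Games layer (hypothetical): object wrappers are created only here.
@dataclass
class ShiftSummary:
    player_num: int
    shifts: list

    @property
    def toi(self):
        # Total time on ice, in seconds, summed over all shifts.
        return sum(s["end"] - s["start"] for s in self.shifts)


class TOI:
    def __init__(self, raw_by_player):
        # The dict-to-object conversion is deferred until this point.
        self.by_player = {
            num: ShiftSummary(num, shifts)
            for num, shifts in raw_by_player.items()
        }


raw = parse_shift_rows([(9, 1, 0, 45), (9, 2, 120, 170), (65, 1, 30, 80)])
toi = TOI(raw)
print(toi.by_player[9].toi)  # 95
```

Keeping the scraper output as plain dicts means the parsing code can be tested without the object model, and the wrappers can change without touching the scrapers.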
v0.3.1
- fixed play-by-play bug created when no cum_stats provided
- deprecated extractors
- refactored GameKey and GameType into nhlscrapi.games.game
- updated unittests and README to reflect the refactoring
v0.3.0
- added face off comparison report, associated report loader (scraper) and unittest
- gave Game object basic access/loading to face off comp
- reworked testing framework
- can now run tests with the standard python -m unittest discover
- made versioning counter sane. structure is v(release).(feature).(bug)
- added lxml to the install requirements in setup
- added this change log