kedro.extras.datasets.api.APIDataSet

class kedro.extras.datasets.api.APIDataSet(url, method='GET', data=None, params=None, headers=None, auth=None, json=None, timeout=60)
Bases: kedro.io.core.AbstractDataSet

APIDataSet loads the data from HTTP(S) APIs. It uses the Python requests library: https://requests.readthedocs.io/en/master/

Example:
    from kedro.extras.datasets.api import APIDataSet

    data_set = APIDataSet(
        url="https://quickstats.nass.usda.gov",
        params={
            "key": "SOME_TOKEN",
            "format": "JSON",
            "commodity_desc": "CORN",
            "statisticcat_des": "YIELD",
            "agg_level_desc": "STATE",
            "year": 2000,
        },
    )
    data = data_set.load()

Attributes

Methods

APIDataSet.exists() – Checks whether a data set’s output already exists by calling the provided _exists() method.
APIDataSet.from_config(name, config[, …]) – Create a data set instance using the configuration provided.
APIDataSet.load() – Loads data by delegation to the provided load method.
APIDataSet.release() – Release any cached data.
APIDataSet.save(data) – Saves data by delegation to the provided save method.

__init__(url, method='GET', data=None, params=None, headers=None, auth=None, json=None, timeout=60)
Creates a new instance of APIDataSet to fetch data from an API endpoint.

- Parameters
  url (str) – The API URL endpoint.
  method (str) – The method of the request: GET, POST, PUT, DELETE, HEAD, etc.
  data (Optional[Any]) – The request payload, used for POST, PUT, etc. requests. https://requests.readthedocs.io/en/master/user/quickstart/#more-complicated-post-requests
  params (Optional[Dict[str, Any]]) – The URL parameters of the API. https://requests.readthedocs.io/en/master/user/quickstart/#passing-parameters-in-urls
  headers (Optional[Dict[str, Any]]) – The HTTP headers. https://requests.readthedocs.io/en/master/user/quickstart/#custom-headers
  auth (Union[Tuple[str], AuthBase, None]) – Anything requests accepts. Normally it is either ('login', 'password'), or an AuthBase or HTTPBasicAuth instance for more complex cases.
  json (Union[List, Dict[str, Any], None]) – The request payload, used for POST, PUT, etc. requests, passed to the json keyword argument of the request. https://requests.readthedocs.io/en/master/user/quickstart/#more-complicated-post-requests
  timeout (int) – The wait time in seconds for a response; defaults to 60 seconds. https://requests.readthedocs.io/en/master/user/quickstart/#timeouts
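
To make the role of params more concrete, the following minimal sketch approximates how a dictionary of URL parameters ends up in the query string of a request. It uses only the standard library (urllib.parse) rather than requests itself, so the exact URL that requests builds may differ in detail. The helper name build_url is hypothetical and is not part of kedro or requests.

```python
from urllib.parse import urlencode


def build_url(url, params=None):
    """Append URL-encoded query parameters to a base URL (illustrative helper)."""
    if not params:
        return url
    # doseq=True expands list values into repeated keys, e.g. b=2&b=3.
    query = urlencode(params, doseq=True)
    # Reuse an existing query string if the URL already has one.
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}{query}"


full_url = build_url(
    "https://quickstats.nass.usda.gov",
    {"format": "JSON", "commodity_desc": "CORN", "year": 2000},
)
print(full_url)
```

Keeping filters in params instead of formatting them into url by hand leaves escaping of special characters to the library.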

exists()
Checks whether a data set’s output already exists by calling the provided _exists() method.

- Return type
  bool
- Returns
  Flag indicating whether the output already exists.
- Raises
  DataSetError – When the underlying exists method raises an error.

classmethod from_config(name, config, load_version=None, save_version=None)
Create a data set instance using the configuration provided.

- Parameters
  name (str) – Data set name.
  config (Dict[str, Any]) – Data set config dictionary.
  load_version (Optional[str]) – Version string to be used for the load operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
  save_version (Optional[str]) – Version string to be used for the save operation if the data set is versioned. Has no effect on the data set if versioning was not enabled.
- Return type
  AbstractDataSet
- Returns
  An instance of an AbstractDataSet subclass.
- Raises
  DataSetError – When the function fails to create the data set from its config.
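
As an illustration, a configuration dictionary for this data set might look like the sketch below. The data set name corn_yield and the parameter values are placeholders. The "type" key holding the full class path is assumed to follow the usual kedro catalog convention, so treat it as an assumption rather than a guarantee for every kedro version. The import is guarded so the snippet still runs where kedro is not installed.

```python
# Hypothetical configuration; keys other than "type" mirror the __init__ arguments.
config = {
    "type": "kedro.extras.datasets.api.APIDataSet",
    "url": "https://quickstats.nass.usda.gov",
    "params": {"key": "SOME_TOKEN", "format": "JSON", "commodity_desc": "CORN"},
    "timeout": 30,
}

try:
    from kedro.extras.datasets.api import APIDataSet
except ImportError:
    APIDataSet = None  # kedro (or requests) is not available in this environment

data_set = None
if APIDataSet is not None:
    # Constructing the data set is not expected to send a request;
    # the HTTP call should only happen on load().
    data_set = APIDataSet.from_config("corn_yield", config)
```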

load()
Loads data by delegation to the provided load method.

- Return type
  Any
- Returns
  Data returned by the provided load method.
- Raises
  DataSetError – When the underlying load method raises an error.
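
The phrase "delegation to the provided load method" can be pictured with the following minimal sketch. It is not kedro's actual implementation: DataSetError is redefined locally as a stand-in, and SketchDataSet, ConstantDataSet and FailingDataSet are hypothetical names. The sketch only shows the general pattern of a public load() that calls a subclass-provided _load() and re-raises unexpected failures as DataSetError.

```python
class DataSetError(Exception):
    """Stand-in for kedro's DataSetError, defined here to keep the sketch self-contained."""


class SketchDataSet:
    """Hypothetical base class illustrating delegation; not kedro's code."""

    def load(self):
        try:
            # Delegate the actual work to the subclass.
            return self._load()
        except DataSetError:
            raise
        except Exception as exc:
            # Wrap any other failure so callers handle a single error type.
            raise DataSetError(f"Failed while loading data: {exc}") from exc

    def _load(self):
        raise NotImplementedError


class ConstantDataSet(SketchDataSet):
    def _load(self):
        return {"status": "ok"}


class FailingDataSet(SketchDataSet):
    def _load(self):
        raise ConnectionError("endpoint unreachable")


print(ConstantDataSet().load())
try:
    FailingDataSet().load()
except DataSetError as err:
    print(f"caught: {err}")
```

With this pattern, calling code only needs to catch DataSetError, whatever the underlying cause was.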

release()
Release any cached data.

- Raises
  DataSetError – When the underlying release method raises an error.
- Return type
  None

save(data)
Saves data by delegation to the provided save method.

- Parameters
  data (Any) – The value to be saved by the provided save method.
- Raises
  DataSetError – When the underlying save method raises an error.
- Return type
  None
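
The save description is inherited from AbstractDataSet. This page does not say so, but APIDataSet in kedro.extras is understood to be a read-only data set, in which case save is expected to raise DataSetError instead of sending anything to the API. The sketch below checks that assumption without making a network call. The import is guarded so the snippet also runs where kedro is not installed, and the payload is a placeholder.

```python
try:
    from kedro.extras.datasets.api import APIDataSet
    from kedro.io.core import DataSetError
except ImportError:
    APIDataSet = None  # kedro (or requests) is not available in this environment

save_failed = None  # stays None when kedro could not be imported
if APIDataSet is not None:
    data_set = APIDataSet(url="https://quickstats.nass.usda.gov")
    try:
        # Assumption: the data set is read-only, so this should raise.
        data_set.save({"any": "payload"})
        save_failed = False
    except DataSetError:
        save_failed = True

print(save_failed)
```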