artifician.extractors package
artifician.extractors.html_extractors module
- artifician.extractors.html_extractors.count_child_nodes(node: List[Union[str, bs4.element.Tag]]) int
Counts the number of child nodes for a given node.
- Args:
node (List[Union[str, Tag]]): The node list to count children for.
- Returns:
int: The number of child nodes.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_child_attribute(node: List[Union[str, bs4.element.Tag]], attribute: str) str
Retrieves the value of a specified attribute from the first child of a given node.
- Args:
node (List[Union[str, Tag]]): The node list to get the child attribute from. attribute (str): The name of the attribute to retrieve.
- Returns:
str: The value of the attribute from the child node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_child_node_text(node: List[Union[str, bs4.element.Tag]]) str
Extracts text from the first child node of a given node.
- Args:
node (List[Union[str, Tag]]): The node list to extract child text from.
- Returns:
str: The text content of the child node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_node_attribute(node: List[Union[str, bs4.element.Tag]], attribute: str) str
Retrieves the value of a specified attribute from a given node.
- Args:
node (List[Union[str, Tag]]): The node list to get the attribute from. attribute (str): The name of the attribute to retrieve.
- Returns:
str: The value of the attribute.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_node_text(node: List[Union[str, bs4.element.Tag]]) str
Extracts text from a given node.
- Args:
node (List[Union[str, Tag]]): The node list to extract text from.
- Returns:
str: The text content of the node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag. ValueError: If the node list is empty.
- artifician.extractors.html_extractors.get_parent_attribute(node: List[Union[str, bs4.element.Tag]], attribute: str) str
Retrieves the value of a specified attribute from the parent of a given node.
- Args:
node (List[Union[str, Tag]]): The node list to get the parent attribute from. attribute (str): The name of the attribute to retrieve.
- Returns:
str: The value of the attribute from the parent node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_parent_node_text(node: List[Union[str, bs4.element.Tag]]) str
Extracts text from the parent node of a given node.
- Args:
node (List[Union[str, Tag]]): The node list to extract parent text from.
- Returns:
str: The text content of the parent node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.
- artifician.extractors.html_extractors.get_sibling_node_text(node: List[Union[str, bs4.element.Tag]]) str
Extracts text from the first sibling node of a given node.
- Args:
node (List[Union[str, Tag]]): The node list to extract sibling text from.
- Returns:
str: The text content of the sibling node.
- Raises:
TypeError: If the first element in the node list is not a bs4.element.Tag.