Metadata-Version: 2.4
Name: links-extractor-cli
Version: 1.4.0
Summary: Extract all internal and external links from a URL.
Author-email: Devharsh Trivedi <devharsh.1592@gmail.com>
License: GPL-3.0
Project-URL: Homepage, https://github.com/com-puter-tips/Links-Extractor
Project-URL: Repository, https://github.com/com-puter-tips/Links-Extractor
Project-URL: Issues, https://github.com/com-puter-tips/Links-Extractor/issues
Keywords: links,link-extractor,url,url-parsing,web-scraping,seo
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Text Processing :: Markup :: HTML
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: beautifulsoup4
Requires-Dist: lxml
Dynamic: license-file

# Links-Extractor

[![License: GPL v3](https://img.shields.io/badge/License-GPLv3-blue.svg)](https://www.gnu.org/licenses/gpl-3.0)

Extract all internal and external links from a URL in Python.

## Description

Links-Extractor fetches one or more web pages and lists the internal and external hyperlinks found on each page. A link is treated as internal when its host matches the host of the page being scanned, and external otherwise. Empty anchors and `javascript:`, `mailto:`, and `tel:` links are ignored.
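The classification rule above can be sketched roughly as follows. This is an illustration only, not the project's actual code: it uses the standard library's `html.parser` instead of `beautifulsoup4`/`lxml`, the names `AnchorCollector` and `classify_links` are hypothetical, and treating an "empty anchor" as an `<a>` tag with a missing or blank `href` is an assumption.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Schemes the description says are ignored.
IGNORED_SCHEMES = ("javascript:", "mailto:", "tel:")


class AnchorCollector(HTMLParser):
    """Collect the non-empty href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href.strip())


def classify_links(page_url, html):
    """Split the links in `html` into (internal, external) lists.

    A link is internal when its host equals the host of `page_url`.
    Relative links are resolved against `page_url` first.
    """
    page_host = urlparse(page_url).netloc.lower()
    collector = AnchorCollector()
    collector.feed(html)

    internal, external = [], []
    for href in collector.hrefs:
        # Skip blank hrefs and javascript:/mailto:/tel: links.
        if not href or href.lower().startswith(IGNORED_SCHEMES):
            continue
        absolute = urljoin(page_url, href)
        host = urlparse(absolute).netloc.lower()
        if host == page_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external
```

Calling `classify_links("https://example.com/", page_source)` returns two lists of absolute URLs. Under this exact-host comparison, a subdomain such as `blog.example.com` counts as external; whether the tool itself behaves that way is not stated here.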

## Install

```
pip install links-extractor-cli
```

This installs the `links-extractor` command. You can also run the script directly from a clone (`python3 extractor.py ...`).

## Requirements

- Python 3.8 or later
- Dependencies: `requests`, `beautifulsoup4`, `lxml`

Installing the package with `pip` pulls these in automatically. When running from a clone, install them with:

```
pip install -r requirements.txt
```

## Usage

Pass one or more URLs as arguments, using either the installed command or the script from a clone:

```
links-extractor https://example.com
python3 extractor.py https://example.com
python3 extractor.py https://example.com https://www.python.org
```

Redirect the output to a file:

```
python3 extractor.py https://example.com > out.txt
```

For each URL, the script prints the count and list of internal links, followed by the count and list of external links.

A full write-up is available at http://com.puter.tips/2016/12/extract-all-internal-and-external-links.html

You may also find the companion project useful: https://github.com/com-puter-tips/SEO-Analysis

## Citation

If you use this software, please cite it using the metadata in [CITATION.cff](CITATION.cff).

## License

Distributed under the GNU General Public License v3.0. See [LICENSE](LICENSE).
