# Web Scraping with Python Selenium

[<img src="https://img.shields.io/static/v1?label=&message=python&color=brightgreen" />](https://github.com/topics/python) [<img src="https://img.shields.io/static/v1?label=&message=selenium&color=blue" />](https://github.com/topics/selenium) [<img src="https://img.shields.io/static/v1?label=&message=Web%20Scraping&color=important" />](https://github.com/topics/web-scraping)

- [Installing Selenium](#installing-selenium)
- [Testing](#testing)
- [Scraping with Selenium](#scraping-with-selenium)

In this article, we’ll walk through web scraping with Selenium using a real-life example.

For a detailed tutorial on Selenium, see [our blog](https://oxylabs.io/blog/selenium-web-scraping).

## Installing Selenium

1. Create a virtual environment and activate it:

```sh
python3 -m venv .env
source .env/bin/activate  # on Windows: .env\Scripts\activate
```

2. Install Selenium using pip:

```sh
pip install selenium
```

3. Install the Selenium Web Driver for your browser. See [this page](https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/) for details.

## Testing

With the virtual environment activated, start the Python interactive shell by running `python3`. Then enter the following command:

```python
>>> from selenium.webdriver import Chrome
```

If there are no errors, move on to the next step. If there is an error, ensure that `chromedriver` is added to the PATH.

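If you're not sure whether `chromedriver` is discoverable, a quick way to check from Python is the standard library's `shutil.which` (a small sketch; it only inspects the PATH, it doesn't launch the driver):

```python
import shutil

# shutil.which returns the executable's full path, or None if it
# cannot be found on the current PATH.
driver_path = shutil.which("chromedriver")

if driver_path:
    print(f"chromedriver found at {driver_path}")
else:
    print("chromedriver not found on PATH")
```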
## Scraping with Selenium

Import the required modules as follows:

```python
from selenium.webdriver import Chrome, ChromeOptions
from selenium.webdriver.common.by import By
```

Add the skeleton of the script as follows:

```python
def get_data(url) -> list:
    ...


def main():
    ...


if __name__ == '__main__':
    main()
```

Create a `ChromeOptions` object and enable headless mode. Use it to create an instance of `Chrome`:

```python
    browser_options = ChromeOptions()
    # Recent Selenium versions no longer support the `headless`
    # attribute; enable headless mode via a browser argument instead.
    browser_options.add_argument("--headless=new")

    driver = Chrome(options=browser_options)
```

Call the `driver.get` method to load a URL. After that, locate the link for the Humor section by its link text and click it:

```python
    driver.get(url)

    element = driver.find_element(By.LINK_TEXT, "Humor")
    element.click()
```

Create a CSS selector to find all books on this page. Then loop over the books and extract each book's title, price, and stock availability. Store each book's information in a dictionary and append these dictionaries to a list. See the code below:

```python
    books = driver.find_elements(By.CSS_SELECTOR, ".product_pod")
    data = []
    for book in books:
        title = book.find_element(By.CSS_SELECTOR, "h3 > a")
        price = book.find_element(By.CSS_SELECTOR, ".price_color")
        stock = book.find_element(By.CSS_SELECTOR, ".instock.availability")
        book_item = {
            'title': title.get_attribute("title"),
            'price': price.text,
            'stock': stock.text
        }
        data.append(book_item)
```

Lastly, return the `data` list from this function.

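Once returned, the list of dictionaries is easy to persist. For example, `main` could write it to a CSV file with the standard library's `csv` module (a minimal sketch; the rows below are hypothetical stand-ins for real scrape results, and the `books.csv` filename is an arbitrary choice):

```python
import csv

# Hypothetical sample rows, shaped like the dictionaries built above.
data = [
    {"title": "Book One", "price": "£13.76", "stock": "In stock"},
    {"title": "Book Two", "price": "£9.99", "stock": "In stock"},
]

# DictWriter maps each dictionary's keys onto the CSV columns.
with open("books.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "price", "stock"])
    writer.writeheader()
    writer.writerows(data)
```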
For the complete code, see [main.py](src/main.py).

For a detailed tutorial on Selenium, see [our blog](https://oxylabs.io/blog/selenium-web-scraping).