-<a href="https://github.com/Smartproxy/Smartproxy"> :house: Main Repository :house: </a>
+<a href="https://github.com/Decodo/Decodo"> Main Repository </a>
</p>
## Table of contents
@@ -23,7 +21,7 @@
## Disclaimer
-The following tutorial is meant for educational purposes and introduces the basics of building a web scraping project using Smartproxy proxies. You can read more about the [Requests](https://requests.readthedocs.io/en/master/user/quickstart/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) libraries in their documentation to learn more about them and build upon this example.
+The following tutorial is meant for educational purposes and introduces the basics of building a web scraping project using Decodo proxies. You can read more about the [Requests](https://requests.readthedocs.io/en/master/user/quickstart/) and [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) libraries in their documentation to learn more about them and build upon this example.
## What is web scraping with Python?
@@ -44,13 +42,13 @@ To run the example scraper, you're going to need [Python](https://www.python.org
To install the scraper example, run the following:
@@ -87,7 +85,7 @@ Once you know exactly what you want from the site, you can inspect those element
The Chrome DevTools will open and display the HTML structure of the page. You can manually search for the item you need or use the element picker tool in the top-left corner. Select it, hover over the item you need in the page and it'll find it in the HTML code. After a quick inspection, you can see that the main information on each book is located in the article element with a class name **product_pod**.
All of the data you'll need is nested in the **article** element. Now, let's inspect the price. We can see that the price value is the text of the paragraph with the **price_color** class. If you inspect the In stock part, you can see that it's a text value of the **instock availability** paragraph. You can check out other elements on the page and see how they're represented in the HTML. Once you're done, let's build a simple web scraper to extract this data through code.
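As a rough sketch of how the elements described above map to code, the snippet below parses a small hand-written HTML sample with BeautifulSoup. The sample markup, the `parse_books` helper, and the dictionary keys are assumptions made for illustration; only the **product_pod**, **price_color**, and **instock availability** class names come from the inspection above.

```python
from bs4 import BeautifulSoup

# Hand-written sample that mimics the structure seen in DevTools (illustrative only).
SAMPLE_HTML = """
<article class="product_pod">
  <p class="price_color">$10.00</p>
  <p class="instock availability">
    In stock
  </p>
</article>
"""


def parse_books(html):
    """Return one dict per product_pod article with its price and availability."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.find_all("article", class_="product_pod"):
        price = article.find("p", class_="price_color")
        # Matching on a single class also finds elements that carry several classes.
        stock = article.find("p", class_="instock")
        if price is None or stock is None:
            continue  # skip entries that lack either field
        books.append(
            {
                "price": price.get_text(strip=True),
                "availability": stock.get_text(strip=True),
            }
        )
    return books
```

Calling `parse_books(SAMPLE_HTML)` returns a list with a single dictionary. Against a real page you would pass in the HTML returned by your request instead of the sample string.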
@@ -110,20 +108,20 @@ Then, you'll need to write a GET request to retrieve the contents of the site. A
The ```requests.get``` function has only one required argument: the URL of the site you're targeting. However, you must pass in an additional proxy parameter because you'll want to use a proxy to reach the content. Declare these variables above your ```requests.get``` statement.
-For the proxy, you first need to specify its kind, in this case, HTTP. Then, you have to enter the Smartproxy username and password, separated by a colon, and the endpoint you'll be using to connect to the proxy server. In this example, we're using residential proxies. You can get this information from the dashboard by following these steps:
+For the proxy, you first need to specify its kind, in this case, HTTP. Then, you have to enter the Decodo username and password, separated by a colon, and the endpoint you'll be using to connect to the proxy server. In this example, we're using residential proxies. You can get this information from the dashboard by following these steps:
1. Open the proxy setup tab.
2. Navigate to the Endpoint generator.
3. Configure the parameters according to your needs. Set your authentication method, location, session type, and protocol.
4. Select the number of proxy endpoints you want to generate (you'll only need 1 for now).
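Once you have the credentials and endpoint, the proxy setup described above might look roughly like the sketch below. The `build_proxies` and `fetch` helpers are hypothetical names, and `proxy.example.com`, port `8080`, `USERNAME`, and `PASSWORD` are placeholders, not real Decodo values; use whatever the Endpoint generator gives you.

```python
from urllib.parse import quote


def build_proxies(username, password, endpoint, port):
    """Return a proxies mapping in the shape requests.get accepts."""
    # Percent-encode credentials so characters like '@' or ':' don't break the URL.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    # Scheme first (HTTP), then username:password, then the endpoint and port.
    proxy_url = f"http://{user}:{pwd}@{endpoint}:{port}"
    # Route both plain HTTP and HTTPS traffic through the same proxy.
    return {"http": proxy_url, "https": proxy_url}


def fetch(url, proxies, timeout=10):
    """GET a page through the proxy and return its HTML as text."""
    import requests  # third-party; imported here so build_proxies works without it

    response = requests.get(url, proxies=proxies, timeout=timeout)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text
```

For example, `fetch(target_url, build_proxies("USERNAME", "PASSWORD", "proxy.example.com", 8080))` would send the request through the placeholder endpoint, where `target_url` is the site you're scraping. The timeout and the `raise_for_status()` call are optional additions that make connection problems easier to spot.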