Yelp for Business Scraper: Yelp.com is a reliable source of information about local businesses such as restaurants, shops, home services, and car services. You can use web scraping to extract Yelp data such as phone numbers, updates, and addresses. The scraper we build in this tutorial will collect Yelp data for any keyword and location.
First, we will create a Python scraper that extracts business listings from the Yelp search results page for a keyword and location (zip code, region, or city). This scraper will extract information such as business name, rank, rating, review count, and business URL.
Then we will create a Yelp business details scraper that takes the business URLs extracted by the first scraper and crawls each Yelp business page for information such as business name, contact details, working hours, and services.
How to Build a Scraper to Scrape Yelp for Business Data
- Construct the URL for the Yelp search results page to extract the business listings data (example: https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1).
- Download the HTML of the search result page using Python Requests and parse the page using LXML.
- Save the business listings data to a CSV file and select a business URL from the scraped data.
- Download the HTML of the selected business URL using Requests and parse it using LXML.
- Scrape Yelp data and save it as a JSON file.
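The hand-off between the two scrapers (pick a business URL from the CSV, then save the details as JSON) can be sketched with the standard library alone. This is a minimal illustration, not the tutorial's actual code: the column names, the helper names `pick_business_url` and `details_to_json`, and the sample row are all hypothetical placeholders.

```python
import csv
import io
import json


def pick_business_url(csv_text, index=0):
    """Return the 'url' column of one row from the listings CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return rows[index]["url"]


def details_to_json(details):
    """Serialize a dict of business details as readable JSON."""
    return json.dumps(details, indent=2, ensure_ascii=False)


# Placeholder CSV content, not real Yelp data; the column names are assumed.
sample_csv = (
    "business_name,url\n"
    "Example Cafe,https://www.yelp.com/biz/example-cafe\n"
)

url = pick_business_url(sample_csv)

# In the real scraper these details would come from parsing the business page.
details = {"business_name": "Example Cafe", "url": url}
print(details_to_json(details))
```

Keeping the selection and the serialization in separate functions means either step can be swapped out, for example to loop over every URL in the CSV instead of a single row.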
Yelp for Business Listings Scraper
We will be extracting the following details from a business listing page from Yelp:
- Business Name
- Number of Reviews
- Price Range
- Business URL
Below is a screenshot of the data we will be extracting from Yelp.
1. Install Python 3 and Pip:
For this web scraping tutorial using Python 3, we will need some packages for downloading and parsing the HTML. Below are the package requirements:
- PIP to install the following packages in Python (https://pip.pypa.io/en/stable/installing/)
- Python Requests, to make requests and download the HTML content of the pages (http://docs.python-requests.org/en/master/user/install/).
- Python LXML, for parsing the HTML tree structure using XPaths (installation guide: http://lxml.de/installation.html).
- UnicodeCSV for handling Unicode characters in the output file. Install it using pip install unicodecsv.
2. Constructing the Input URL
We will need to give the scraper a search results URL. For example, here is the one for restaurants in Washington, D.C.: https://www.yelp.com/search?find_desc=Restaurants&find_loc=Washington%2C+DC&ns=1
We’ll have to create this URL manually to scrape the business listings from that page.
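Rather than typing the URL by hand, it can also be assembled from a keyword and a location. The sketch below assumes the `find_desc`, `find_loc`, and `ns` query parameters behave as they appear in the example URL above; `build_search_url` is a hypothetical helper name, not part of the original script.

```python
from urllib.parse import urlencode


def build_search_url(keyword, place):
    """Build a Yelp search URL from a keyword and a location string."""
    # Parameter names are copied from the example URL in this tutorial.
    # urlencode escapes the comma as %2C and the space as +.
    query = urlencode({"find_desc": keyword, "find_loc": place, "ns": 1})
    return "https://www.yelp.com/search?" + query


print(build_search_url("Restaurants", "Washington, DC"))
```

For the inputs shown, this reproduces the example URL exactly, including the `%2C+` encoding of the comma and space.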
The Code to Scrape Yelp Data
You can download the code from the GitHub link https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f
If you would like the code in Python 2.7 check out the link at https://gist.github.com/scrapehero/bde7d6ec5f1cb62b8482f2b2b4ca1a94.
Running the Scraper
Save the script under any name; we saved ours as yelp_search.py. Running the script with the -h flag in a command prompt or terminal prints its usage:
usage: yelp_search.py [-h] place keyword

positional arguments:
  place       Location/ Address/ zip code
  keyword     Any keyword

optional arguments:
  -h, --help  show this help message and exit
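A help message of this shape is what Python's argparse module produces for two positional arguments. The following is a minimal sketch of how such a command-line interface could be declared; it is reconstructed from the help text above and is not necessarily how the linked script does it.

```python
import argparse


def build_parser():
    """Declare the two positional arguments shown in the help text."""
    parser = argparse.ArgumentParser(prog="yelp_search.py")
    parser.add_argument("place", help="Location/ Address/ zip code")
    parser.add_argument("keyword", help="Any keyword")
    return parser


# Parse an explicit list here so the sketch runs without real command-line input;
# a script would normally call parse_args() with no arguments.
args = build_parser().parse_args(["20001", "Restaurants"])
print(args.place, args.keyword)
```

Note that on Python 3.10 and later, argparse labels the last section of the help output "options:" instead of "optional arguments:".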
A keyword is any type of business. You can use any business type available on Yelp.com, such as Restaurants, Health, Home Services, Hotels, or Education.
Run the script using python with arguments for place and keyword. The argument for place can be provided as a location, address, or zip code.
Here is how to run the command to find the top 10 restaurants in Washington, D.C. Pass 20001 for place and Restaurants for keyword:
python3 yelp_search.py 20001 Restaurants
This should create a CSV file called scraped_yelp_results_for_20001.csv in the same folder as the script.
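The file name follows the pattern scraped_yelp_results_for_ plus the place argument. As a rough sketch of the saving step, the code below writes rows with Python 3's standard csv module, which handles Unicode text on its own, instead of unicodecsv. The column names and the sample row are hypothetical placeholders, and the file is written to a temporary folder here rather than next to the script.

```python
import csv
import os
import tempfile

# Hypothetical column names, chosen for illustration only.
FIELDNAMES = ["business_name", "rank", "review_count", "rating", "url"]


def output_filename(place):
    """Return the CSV file name pattern described in the tutorial."""
    return "scraped_yelp_results_for_{}.csv".format(place)


def write_results(rows, path):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder row, not real Yelp data.
rows = [{
    "business_name": "Example Café",
    "rank": 1,
    "review_count": 0,
    "rating": 0.0,
    "url": "https://www.yelp.com/biz/example",
}]

path = os.path.join(tempfile.mkdtemp(), output_filename("20001"))
write_results(rows, path)
print(path)
```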
Here is some sample data extracted from Yelp.com for the command above.
You can download the code at https://gist.github.com/scrapehero/8c61789f3f0c9d1dbc6859b635de2e4f
Let us know in the comments how this scraper worked for you.