Analytics Vidhya

Analytics Vidhya is a community of Generative AI and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Web scraping with Selenium: Why your code works but still returns an empty list

scrapingexpert.com

In this article, I’ll show how I scraped the restaurant name, type, rating, popular items and price details from a page on Doordash’s San Diego site, while maintaining data integrity so that Restaurant A did not end up with Restaurant B’s menu or Restaurant D’s prices. The article also describes, and solves, some reasons why your list keeps coming back empty even when your code looks right. It could be because:

  1. The menu and prices are hidden until a dropdown button is clicked. The script needs to click this button so that the details are present in the page and can be collected and parsed by Beautiful Soup.
  2. Unlike other parameters such as the restaurant name, which had one value per tag for each store, the menu and prices each had two or three values in one tag for each store.
  3. The multiple values per tag for the menu and prices create nested lists, so get_text() cannot be called directly on the result to get the values.
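To make points 2 and 3 concrete, here is a minimal sketch on a made-up HTML fragment. The class names `store`, `name` and `item` are my assumptions for illustration, not Doordash's real markup. It shows that `find_all()` gives back a list of tags per store, that calling `get_text()` on that list fails, and that calling it on each tag inside the list works.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one name per store, but several menu items per store.
html = """
<div class="store">
  <span class="name">Restaurant A</span>
  <span class="item">Burger</span><span class="item">Fries</span>
</div>
<div class="store">
  <span class="name">Restaurant B</span>
  <span class="item">Taco</span><span class="item">Nachos</span><span class="item">Churros</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
stores = soup.find_all("div", class_="store")

# One value per store: get_text() on the single tag works.
names = [store.find("span", class_="name").get_text() for store in stores]

# Several values per store: find_all() returns a list of tags for each store,
# so `menus` is a nested list (a list of lists of tags).
menus = [store.find_all("span", class_="item") for store in stores]

# get_text() on the inner list fails, because the list itself is not a tag.
try:
    menus[0].get_text()
    list_call_failed = False
except AttributeError:
    list_call_failed = True

# get_text() on each tag inside the inner list works.
menu_text = [[tag.get_text() for tag in items] for items in menus]
```

Note that the two stores have a different number of items, which is why the values cannot simply be flattened into one long list without losing track of which store they belong to.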

Selenium is the master key you want to have when it comes to scraping websites. Because it drives a real browser in a human-like way, it can get past many of the defences that stop scrapers built on plain HTTP requests and Beautiful Soup. Websites are like human faces: each has unique features of its own. So when you scrape with Selenium, you really need to take the HTML/element inspection step seriously, or you’ll tumble down a rabbit hole and end up madder than the Mad Hatter himself.

I handled the menu and price values by:

  1. Storing them in separate lists, without calling get_text().
  2. Transforming each list into its own data frame.
  3. Applying a function to all columns in each data frame to get the text values, with an if condition to skip None values.
  4. Combining the columns in each data frame into a single column.
  5. Merging both processed data frames, by index, with the data frame containing the remaining parameters.
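The five steps above can be sketched roughly as follows. This is an illustration on made-up HTML, not the original script: the class names and the helpers `text_or_none`, `join_row` and `to_single_column` are my assumptions.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the expanded page.
html = """
<div class="store">
  <span class="name">Restaurant A</span>
  <span class="item">Burger</span><span class="price">$8.00</span>
  <span class="item">Fries</span><span class="price">$3.50</span>
</div>
<div class="store">
  <span class="name">Restaurant B</span>
  <span class="item">Taco</span><span class="price">$3.00</span>
  <span class="item">Nachos</span><span class="price">$4.50</span>
  <span class="item">Churros</span><span class="price">$2.25</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
stores = soup.find_all("div", class_="store")

# Parameters with one value per store go straight into a data frame.
base = pd.DataFrame(
    {"Restaurant": [s.find("span", class_="name").get_text() for s in stores]}
)

# Step 1: keep the raw tags, one inner list per store, without get_text().
menu_tags = [list(s.find_all("span", class_="item")) for s in stores]
price_tags = [list(s.find_all("span", class_="price")) for s in stores]


def text_or_none(tag):
    # Step 3 helper: skip the None cells that pad the shorter rows.
    return tag.get_text() if tag is not None else None


def join_row(row):
    # Step 4 helper: join the non-missing values of one store into one string.
    return ", ".join(value for value in row if pd.notna(value))


def to_single_column(tag_lists, name):
    # Step 2: one row per store; shorter rows are padded with None.
    frame = pd.DataFrame(tag_lists)
    # Step 3: apply the text extraction to every column.
    frame = frame.apply(lambda col: col.map(text_or_none))
    # Step 4: collapse all columns into a single named column.
    return frame.apply(join_row, axis=1).to_frame(name)


menu = to_single_column(menu_tags, "Menu")
prices = to_single_column(price_tags, "Prices")

# Step 5: merge on the index, so each store keeps its own menu and prices.
final = base.merge(menu, left_index=True, right_index=True).merge(
    prices, left_index=True, right_index=True
)
```

Because every data frame is built from the same `stores` list in the same order, row 0 always refers to the same store, which is what keeps Restaurant A's menu away from Restaurant B.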

Below is the full code.

Lobatan! (It translates to ‘All done!’ in Yoruba.)

After converting the final data frame to a CSV file, I got the result in the screenshot below, where you can see the restaurant, type, location, rating, menu and corresponding prices.

Screenshot of the Doordash CSV file.

Published in Analytics Vidhya
