Web Scraping vs. API: Scraping Weather Underground Data with the Weather API

4 min read · Mar 16, 2023

In my last article, I wrote about web scraping weather data from Weather Underground using Beautiful Soup. The reasoning behind this was that the charts generated by Weather Underground didn't meet my needs for historical data analysis.

As a result, I turned to my friend Jake Fitzsimmons for advice on using the Weather Underground API key to retrieve historical weather data.

In this article, I’ll walk you through the code that Jake Fitzsimmons helped me optimise, which uses the Weather Underground API to retrieve weather data for a specific station ID and date range. We’ll also discuss how this approach compares to web scraping and the benefits of using an API.

Please visit my GitHub for the code.

import requests
import json
import pandas as pd

# function to get wind direction based on degrees
def get_wind_direction(degrees):
    direction_names = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE", "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]
    index = round(degrees / (360. / len(direction_names))) % len(direction_names)
    return direction_names[index]

# URL to access weather data. Add this from your json packet when inspecting the source.
url = 'https://################'

# station ID to get weather data for. You will need to add your own station
station_id = 'IFLETC15'

# start and end dates to get weather data for
start_date = '20230205'
end_date = '20230314'

# API key to access weather data. You will need to add your own api key
api_key = '############################'

# create a list of dates to get weather data for
dates = pd.date_range(start=start_date, end=end_date, freq='D')

# create an empty list to store all the weather data
all_data = []

# loop through all the dates and get weather data for each date
for date in dates:
    # set the parameters for the API request
    params = {
        'stationId': station_id,
        'format': 'json',
        'units': 'm',
        'date': date.strftime('%Y%m%d'),
        'numericPrecision': 'decimal',
        'apiKey': api_key
    }

    # send the API request and get the response
    response = requests.get(url, params=params)

    # check if the API request was successful
    if response.status_code == 200:
        # get the weather data from the response
        data = json.loads(response.text)["observations"]
        # loop through all the rows in the weather data and add wind direction data
        for row in data:
            metric = row.pop("metric")
            row.update(metric)
            row["wind_direction"] = get_wind_direction(row["winddirAvg"])

        # add the weather data to the list of all data
        all_data += data
    else:
        # print an error message if the API request failed
        print(f'Request failed with status code {response.status_code}')

# create a Pandas DataFrame from the weather data and save it to a CSV file
df = pd.DataFrame.from_dict(all_data)
df.to_csv('weather_data1.csv', index=False)

The code begins by importing the necessary libraries (requests, json, and pandas) for making the API calls, parsing the JSON response, and handling the data respectively. It then defines a function get_wind_direction() that takes a wind direction in degrees and returns the corresponding compass name (N, NNE, NE, ENE, E, ESE, SE, SSE, S, SSW, SW, WSW, W, WNW, NW, NNW).
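To get a feel for the rounding, each 22.5-degree slice of the compass maps to one of the sixteen names. A few example calls to the function above (a quick sketch, not part of the original script):

get_wind_direction(0)    # 'N'
get_wind_direction(95)   # 'E'
get_wind_direction(270)  # 'W'
get_wind_direction(350)  # 'N' (the index wraps back around via the modulo)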

The next few lines set the URL for the Weather Underground API, along with the station ID, start date, end date, and API key. We use pandas to create a date range for the given start and end dates.
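For anyone new to pandas, date_range produces one timestamp per day between the two endpoints, and strftime turns each one back into the YYYYMMDD string the API's date parameter expects. A small sketch using the same dates as above:

import pandas as pd

dates = pd.date_range(start='20230205', end='20230314', freq='D')
print(len(dates))                    # 38 daily timestamps, both endpoints included
print(dates[0].strftime('%Y%m%d'))   # '20230205'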

The code then initializes an empty list called all_data, which will store the retrieved weather data. It loops through the dates in the date range and sends an API request for each one, passing the station ID, format, units, date, numeric precision, and API key as parameters. If the request returns a status code of 200, the JSON response is parsed and the observations list is extracted. For each observation, the nested metric dictionary is popped out and merged into the top-level record, the wind direction name is calculated with get_wind_direction() and added, and the record is appended to the all_data list. If the request returns any other status code, an error message is printed.
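To make the flattening step concrete, here is what happens to a single observation. The only field name taken from the script is winddirAvg; the other fields are made up for illustration and will differ from the real API response:

# illustrative observation; get_wind_direction() is the function defined in the script above
row = {
    "obsTimeLocal": "2023-02-05 09:00:00",
    "winddirAvg": 270,
    "metric": {"tempAvg": 22.4, "precipTotal": 0.0},
}

metric = row.pop("metric")   # remove the nested dict from the observation
row.update(metric)           # promote its keys to the top level
row["wind_direction"] = get_wind_direction(row["winddirAvg"])

# row is now a single flat dict:
# {'obsTimeLocal': '2023-02-05 09:00:00', 'winddirAvg': 270,
#  'tempAvg': 22.4, 'precipTotal': 0.0, 'wind_direction': 'W'}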

After all of the data has been retrieved, the all_data list is converted into a pandas DataFrame and saved as a CSV file called “weather_data1.csv” without index columns.
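If you want to sanity-check the file before building anything on top of it, reading it back with pandas is enough. A small optional sketch; apart from wind_direction, the exact column names depend on what the API returns for your station:

import pandas as pd

df = pd.read_csv('weather_data1.csv')
print(df.shape)                              # (rows, columns) retrieved across all dates
print(df['wind_direction'].value_counts())   # how often each compass direction occurred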

While web scraping is a viable method for retrieving data from websites, using an API is often more efficient and reliable. APIs are designed to provide a programmatic interface for accessing data on a remote server, and the data is typically returned in a standardised format such as JSON or XML. This makes it easier to parse and manipulate than scraped HTML, which can be messy and prone to change.
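The difference is easy to see side by side. With the API, the fields arrive already named in JSON; with scraping, you first have to locate them inside the page markup. A rough sketch (the dashboard URL and CSS selector are illustrative only, and url/params refer to the script above):

import requests
from bs4 import BeautifulSoup

# API: one call returns named fields ready to use
observations = requests.get(url, params=params).json()["observations"]

# Scraping: fetch the page HTML and dig the values out of the markup
html = requests.get('https://www.wunderground.com/dashboard/pws/IFLETC15').text
soup = BeautifulSoup(html, 'html.parser')
temps = [cell.get_text(strip=True) for cell in soup.select('td.temperature')]  # hypothetical selector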

Additionally, many websites impose rate limits on web scraping to prevent excessive traffic and protect their servers. Using an API largely sidesteps these restrictions, since APIs are designed to handle a large volume of requests from multiple sources, although most still publish their own usage limits.
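That said, it still pays to be a polite client. The FAQ below mentions a cap of roughly 1500 calls per day and 30 calls per minute for Weather Underground API keys, so a short pause inside the existing download loop keeps the script comfortably inside those limits (a sketch of the change, not part of the original code):

import time

for date in dates:
    # ... build params and send the request exactly as in the script above ...
    time.sleep(2)  # roughly 30 requests per minute at most, under the documented cap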

As a next step, I will start creating my Tableau dashboard. The progress of this can be found on my Tableau Public profile.

Once I am satisfied with my Tableau build, I will start pulling in data from other local weather stations to get a greater picture of the weather in Fletcher, NSW, Australia.

In this article, we explored how to use the Weather Underground API to retrieve historical weather data for a specific station ID and date range. We compared the benefits of using an API over web scraping and discussed how the retrieved data is processed and saved using pandas. I hope this article has provided insight into the power of APIs for data retrieval and encouraged you to explore the use of APIs in your own projects.

FAQs

Is API scraping the same as web scraping?

Is Using an API Considered Web Scraping? No, using an API isn't typically considered web scraping. While both can help you retrieve data from a site, scraping is about parsing HTML content to extract data from web pages, while APIs return data directly in a semi-structured format.

What is the difference between web scraping and data scraping?

Data scraping, in its most general form, refers to a technique in which a computer program extracts data from output generated from another program. Data scraping is commonly manifest in web scraping, the process of using an application to extract valuable information from a website.

Does Weather Underground have an API?

Weather Underground is a commercial weather service that provides real-time weather information via the Internet. To use it programmatically, you will need to create an API key on the Weather Underground site.

Is web scraping API legal?

There are no specific laws prohibiting web scraping, and many companies employ it in legitimate ways to gain data-driven insights.

Which API is used for web scraping?

Wikipedia API

Let's say we want to gather some additional data about the Fortune 500 companies; since Wikipedia is a rich source of data, we decide to use the MediaWiki API to scrape this data.

Is web scraping still used?

With web scraping becoming a more common tool for companies across many industries, security providers are keeping up and constantly improving their anti-bot products. Roughly 40% of web traffic comes from bots, and that trend is not going away in 2023.

Why is data scraping illegal?

While web scraping is not inherently illegal, how it is conducted and the data's subsequent use can raise legal and ethical concerns. Actions such as scraping copyrighted content and personal information without consent or engaging in activities that disrupt the normal functioning of a website may be deemed illegal.

What is better for web scraping?

Python is widely considered to be the best programming language for web scraping. That's because it has a vast collection of libraries and tools for the job, including BeautifulSoup and Scrapy. Also, Python's simple syntax makes it a great choice for beginners.

What is another name for web scraping?

Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites.

What is the alternative to Wunderground API?

The AerisWeather API is another great weather API and mapping platform that provides access to current weather conditions and forecast data. It also provides APIs for lightning, wildfires, and advanced weather maps with unique customization options. It uses JSON as the default output format.

What is the best API for weather?

The top weather APIs in 2024 include Tomorrow.io, OpenWeatherMap, MeteoGroup, Weatherstack, Weatherbit, Weather2020, AerisWeather, Accuweather, and Visual Crossing. The all-around best Weather API for 2024 is Tomorrow.io's Weather API, offering 80+ data layers and a top-rated interface.

What is the API limit for Weather Underground?

Unfortunately, Wunderground.com does not offer unrestricted access to the data of a PWS. API keys are limited as follows: a maximum of 1500 calls per day and a maximum of 30 calls per minute.

Is API better than web scraping?

So, if flexibility and format control are crucial, scraping might be the way to go. If efficiency, reliability, and sanctioned data access are your priorities, then an API is the better choice.

Can I get sued for web scraping?

This makes navigating US privacy laws very complex, but there is one overarching theme that helps web scrapers – all the state laws have an exception for public personal data. So long as you are only scraping personal data that has been clearly made public, you will likely fall under the various US law exceptions.

Can web scraping be detected?

Application Security Manager (ASM) can identify web scraping attacks on web sites that ASM protects by using information gathered about clients through fingerprinting or persistent identification.

Is web scraping legal in apify?

Web scraping publicly available data is legal, but you should avoid scraping personal data or intellectual property.

What counts as web scraping?

Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed onscreen, web scraping extracts underlying HTML code and, with it, data stored in a database.

Why use scraper API?

Scraper APIs overcome the limitations of manual web scraping, such as dealing with website structure changes, handling blocks and CAPTCHAs, and covering the high costs associated with infrastructure maintenance.
