Cities
Scrape Wikipedia for the population and other properties of cities worldwide
scrape(cities)
Scrape the population of selected cities worldwide from Wikipedia
Parameters:
- cities (list) – List of city names to scrape
Returns:
- df (DataFrame) – A DataFrame with the columns "city_name", "country_code", "population", "timestamp_population", "longitude", "latitude", and "timezone"
Warns:
- UserWarning – If the scraping fails for a city
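The warn-and-continue behaviour described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `scrape_rows` and the `fetch_city` callable are hypothetical stand-ins, and plain dictionaries are used in place of the DataFrame so the example needs only the standard library.

```python
import warnings

# Column names as listed in the Returns section above.
COLUMNS = [
    "city_name", "country_code", "population", "timestamp_population",
    "longitude", "latitude", "timezone",
]


def scrape_rows(cities, fetch_city):
    """Collect one row per city, warning (not raising) when a city fails.

    `fetch_city` is a hypothetical callable that returns a dict for one
    city name or raises an exception when the lookup fails.
    """
    rows = []
    for name in cities:
        try:
            record = fetch_city(name)
        except Exception as exc:
            # A failed city is reported and skipped; the remaining cities
            # are still processed.
            warnings.warn(f"Scraping failed for {name!r}: {exc}", UserWarning)
            continue
        # Missing fields become None so every row has the same keys.
        rows.append({col: record.get(col) for col in COLUMNS})
    return rows
```

A caller who wants to treat a failed city as an error can wrap the call in `warnings.catch_warnings()` and set `warnings.simplefilter("error", UserWarning)`.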
get_country(soup)
Get the country name from the Wikipedia article
Parameters:
- soup (BeautifulSoup) – BeautifulSoup object for the Wikipedia article
Returns:
- country (str) – Name of the country
get_country_code(soup, country)
get_timezone(soup, country_code)
get_geo(soup)
get_population(soup)
Get the population of the city from the Wikipedia article
Parameters:
- soup (BeautifulSoup) – BeautifulSoup object for the Wikipedia article
Returns:
- population (int) – Population of the city
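Returning an `int` implies that the figure found in the article is cleaned up before conversion. A minimal sketch of that step, assuming the raw text uses comma thousands separators and may carry footnote markers such as `[1]`; the helper name `parse_population` is hypothetical and not part of this module:

```python
import re


def parse_population(text):
    """Convert a figure such as '1,234,567[2]' to an int."""
    # Drop bracketed footnote markers like [1] or [note 3].
    cleaned = re.sub(r"\[[^\]]*\]", "", text)
    # Take the first run of digits and commas.
    match = re.search(r"\d[\d,]*", cleaned)
    if match is None:
        raise ValueError(f"No population figure found in {text!r}")
    # Remove the separators before converting.
    return int(match.group().replace(",", ""))
```

This sketch does not handle abbreviated figures such as "3.5 million"; those would need separate treatment.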
get_population_year(soup)
Get the year of the population data from the Wikipedia article
Parameters:
- soup (BeautifulSoup) – BeautifulSoup object for the Wikipedia article
Returns:
- year (int) – Year of the population data
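One plausible way to obtain the year is to search the label next to the population figure for a four-digit number. The sketch below assumes such a label exists (for example "Population (2021 census)"); the helper name `extract_year` and the accepted year range are assumptions for illustration, not the module's actual logic:

```python
import re


def extract_year(label):
    """Return the first four-digit year (1800-2099) found in a label."""
    # \b keeps the pattern from matching inside longer digit runs.
    match = re.search(r"\b(1[89]\d{2}|20\d{2})\b", label)
    if match is None:
        raise ValueError(f"No year found in {label!r}")
    return int(match.group(1))
```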
get_soup(article)
Get the BeautifulSoup object for a given Wikipedia article
Parameters:
- article (str) – URL slug for Wikipedia article to parse
Returns:
- soup (BeautifulSoup) – BeautifulSoup object for the Wikipedia article
Raises:
- ValueError – If the Wikipedia article is not found
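The slug-to-page step and the `ValueError` on a missing article can be sketched with the standard library alone. The base URL (English Wikipedia), the helper names `article_url` and `fetch_article_html`, and the injectable `opener` parameter are all assumptions made for this illustration; the module may fetch pages differently.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the English-language Wikipedia is the target.
BASE_URL = "https://en.wikipedia.org/wiki/"


def article_url(article):
    """Build the article URL from a slug, percent-encoding where needed."""
    return BASE_URL + urllib.parse.quote(article.replace(" ", "_"))


def fetch_article_html(article, opener=urllib.request.urlopen):
    """Return the article's HTML, raising ValueError if it does not exist.

    `opener` defaults to urllib.request.urlopen and can be replaced in
    tests to avoid network access.
    """
    url = article_url(article)
    try:
        with opener(url) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            # Translate "page missing" into the documented exception.
            raise ValueError(f"Wikipedia article not found: {article!r}") from exc
        raise  # other HTTP errors are left for the caller to handle
```

The returned HTML could then be parsed with `BeautifulSoup(html, "html.parser")` to obtain the soup object that the other functions on this page accept.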