
Cities

Scrape Wikipedia for the population and other properties of cities worldwide

scrape(cities)

Scrape the population and other properties of selected cities worldwide from Wikipedia

Parameters:

  • cities (list) –

    List of city names to scrape

Returns:

  • df (DataFrame) –

    A DataFrame with the columns "city_name", "country_code", "population", "timestamp_population", "longitude", "latitude", and "timezone"

Warns:

get_country(soup)

Get the country name from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

Returns:

  • country (str) –

    Name of the country

get_country_code(soup, country)

Get the country code from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

  • country (str) –

    Name of the country

Returns:

  • country_code (str) –

    ISO 3166-1 alpha-2 country code
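Turning a country name into an ISO 3166-1 alpha-2 code can be sketched as a normalised dictionary lookup. The helper `country_to_alpha2` and its four-entry table are hypothetical and only illustrate the expected output format; the documented `get_country_code` also receives the article's soup, and how it uses it is not described here.

```python
# Hypothetical lookup table for illustration only; a complete mapping would
# cover every ISO 3166-1 entry (for example via the third-party pycountry package).
ISO_ALPHA2 = {
    "france": "FR",
    "germany": "DE",
    "japan": "JP",
    "united states": "US",
}


def country_to_alpha2(country: str) -> str:
    """Return the ISO 3166-1 alpha-2 code for a country name (sketch)."""
    # Collapse runs of whitespace and ignore letter case before the lookup.
    key = " ".join(country.split()).casefold()
    try:
        return ISO_ALPHA2[key]
    except KeyError:
        raise ValueError(f"Unknown country name: {country!r}") from None


print(country_to_alpha2("  United   States "))  # US
```

Raising on an unknown name, rather than returning an empty string, keeps a bad lookup from silently producing an empty "country_code" column.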

get_timezone(soup, country_code)

Get the timezone from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

  • country_code (str) –

    ISO 3166-1 alpha-2 country code

Returns:

  • tz (str) –

    Timezone for the country
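City infoboxes usually state the time zone as a UTC offset such as "UTC+01:00 (CET)", and Wikipedia often writes negative offsets with the Unicode minus sign (U+2212). The sketch below assumes that format and converts the first offset it finds into a fixed-offset `datetime.timezone`; `parse_utc_offset` is a hypothetical helper, and whether the documented `get_timezone` returns an offset string or an IANA zone name is not stated here.

```python
import re
from datetime import timedelta, timezone

# Matches "UTC", "UTC+1", "UTC+05:30" and "UTC−05:00" (U+2212 minus sign).
_UTC_OFFSET = re.compile(r"UTC(?:\s*([+\-\u2212])\s*(\d{1,2})(?::?(\d{2}))?)?")


def parse_utc_offset(text: str) -> timezone:
    """Parse the first UTC offset in an infobox string (sketch)."""
    match = _UTC_OFFSET.search(text)
    if match is None:
        raise ValueError(f"No UTC offset found in {text!r}")
    sign, hours, minutes = match.groups()
    if sign is None:
        # Plain "UTC" (or "UTC±00:00") means a zero offset.
        return timezone.utc
    delta = timedelta(hours=int(hours), minutes=int(minutes or 0))
    return timezone(-delta if sign in "-\u2212" else delta)


# str() of a fixed-offset timezone yields a string such as "UTC+01:00".
print(str(parse_utc_offset("UTC+01:00 (CET)")))
```

Because only the first match is used, a row like "UTC+01:00 (CET), Summer (DST) UTC+02:00 (CEST)" yields the standard-time offset.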

get_geo(soup)

Get the latitude and longitude from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

Returns:

  • latitude (float) –

    Latitude of the city

  • longitude (float) –

    Longitude of the city
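Wikipedia commonly displays coordinates in degrees, minutes and seconds, for example "48°51′24″N 2°21′08″E". Converting them to the signed decimal floats listed above follows decimal = degrees + minutes/60 + seconds/3600, negated for the southern and western hemispheres. The sketch below assumes the coordinate text has already been extracted from the soup; `parse_coordinates` and `dms_to_decimal` are hypothetical helpers.

```python
import re

# Degrees, optional minutes (′), optional seconds (″), then a hemisphere letter.
_DMS = re.compile(
    r"(\d+(?:\.\d+)?)°\s*(?:(\d+(?:\.\d+)?)′\s*)?(?:(\d+(?:\.\d+)?)″\s*)?([NSEW])"
)


def dms_to_decimal(degrees: str, minutes: str, seconds: str, hemisphere: str) -> float:
    """Convert one DMS component to signed decimal degrees."""
    # findall() reports missing optional groups as "", so fall back to 0.
    value = float(degrees) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemisphere in "SW" else value


def parse_coordinates(text: str) -> tuple[float, float]:
    """Return (latitude, longitude) from a DMS coordinate string (sketch)."""
    parts = _DMS.findall(text)
    if len(parts) < 2 or parts[0][3] not in "NS" or parts[1][3] not in "EW":
        raise ValueError(f"Unrecognised coordinates: {text!r}")
    return dms_to_decimal(*parts[0]), dms_to_decimal(*parts[1])


print(parse_coordinates("48°51′24″N 2°21′08″E"))
```

For Paris this gives roughly (48.856667, 2.352222).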

get_population(soup)

Get the population of the city from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

Returns:

  • population (int) –

    Population of the city
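Infobox population figures are typically formatted with thousands separators and may carry footnote markers, as in "2,102,650[3]". A minimal sketch of turning such a string into an `int`, assuming English-Wikipedia comma separators and that the first number in the cell is the population; `parse_population` is a hypothetical helper.

```python
import re

_FOOTNOTE = re.compile(r"\[[^\]]*\]")  # e.g. "[1]", "[a]", "[note 2]"
_NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")  # "2,102,650" or "2102650"


def parse_population(text: str) -> int:
    """Return the first number in a population cell as an int (sketch)."""
    cleaned = _FOOTNOTE.sub("", text)
    match = _NUMBER.search(cleaned)
    if match is None:
        raise ValueError(f"No population figure in {text!r}")
    # Drop the thousands separators before converting.
    return int(match.group().replace(",", ""))


print(parse_population("2,102,650[3]"))  # 2102650
```

Footnotes are stripped before matching so that a marker such as "[12]" cannot be mistaken for part of the figure.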

get_population_year(soup)

Get the year of the population data from the Wikipedia article

Parameters:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

Returns:

  • year (int) –

    Year of the population data
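The reference year usually appears in the population row's label, as in "Population (2021)" or "Population (1 January 2023)". The sketch below assumes that label text is available and picks the most recent plausible four-digit year; `parse_population_year` and the 1800 to 2099 range are illustrative assumptions, not the documented behaviour.

```python
import re

# Four-digit years between 1800 and 2099, as whole words.
_YEAR = re.compile(r"\b(1[89]\d{2}|20\d{2})\b")


def parse_population_year(label: str) -> int:
    """Return the latest year mentioned in a population label (sketch)."""
    years = [int(y) for y in _YEAR.findall(label)]
    if not years:
        raise ValueError(f"No year found in {label!r}")
    return max(years)


print(parse_population_year("Population (1 January 2023)"))  # 2023
```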

get_soup(article)

Get the BeautifulSoup object for a given Wikipedia article

Parameters:

  • article (str) –

    URL slug for Wikipedia article to parse

Returns:

  • soup (BeautifulSoup) –

    BeautifulSoup object for the Wikipedia article

Raises:

  • ValueError –

    If the Wikipedia article is not found
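Fetching an article from its slug and mapping a missing page to `ValueError` can be sketched with the standard library plus Beautiful Soup. The base URL assumes English Wikipedia, and `build_article_url`, `get_soup_sketch` and the User-Agent string are hypothetical; the documented `get_soup` may build its request differently.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://en.wikipedia.org/wiki/"  # assumption: English Wikipedia


def build_article_url(article: str) -> str:
    """Turn an article title or slug into a percent-encoded article URL."""
    slug = article.strip().replace(" ", "_")  # Wikipedia slugs use underscores
    return BASE_URL + quote(slug)


def get_soup_sketch(article: str):
    """Download an article and parse it; raise ValueError on HTTP 404 (sketch)."""
    # Third-party dependency, imported here so build_article_url works without it.
    from bs4 import BeautifulSoup

    request = Request(
        build_article_url(article),
        headers={"User-Agent": "cities-scraper-example/0.1"},  # hypothetical UA
    )
    try:
        with urlopen(request, timeout=10) as response:
            html = response.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 404:
            raise ValueError(f"Wikipedia article not found: {article!r}") from err
        raise
    return BeautifulSoup(html, "html.parser")


print(build_article_url("São Paulo"))
```

Other HTTP errors, such as rate limiting, are re-raised unchanged so they are not misreported as a missing article.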