Get and analyse data from AURN

Get and merge data within longitude and latitude bounds

Users can supply longitude and latitude bounds to download data for all stations that fall inside them.

Load the ObsAQ package and define the bounds

[1]:
import obsaq

lon_min = -9
lon_max = 1.8
lat_min = 49
lat_max = 61
bounds=[lon_min, lon_max, lat_min, lat_max]

Get the station metadata

NOTE: This returns the information for all stations; the next step selects the stations that lie within the bounds.

[2]:
meta = obsaq.meta()
site_table = meta.get_metadata('aurn')

Choose the stations by bounds

[3]:
final_sites = meta.get_site(bounds=bounds)
final_sites.head(5)
Site is selected by bounds: [-9, 1.8, 49, 61]
[3]:
  site_id site_name     location_type  latitude  longitude parameter  \
0     ABD  Aberdeen  Urban Background  57.15736  -2.094278        O3
1     ABD  Aberdeen  Urban Background  57.15736  -2.094278        NO
2     ABD  Aberdeen  Urban Background  57.15736  -2.094278       NO2
3     ABD  Aberdeen  Urban Background  57.15736  -2.094278  NOXasNO2
4     ABD  Aberdeen  Urban Background  57.15736  -2.094278       SO2

                        Parameter_name  start_date    end_date  \
0                                Ozone  2003-08-01  2021-09-20
1                         Nitric oxide  1999-09-18  2021-09-20
2                     Nitrogen dioxide  1999-09-18  2021-09-20
3  Nitrogen oxides as nitrogen dioxide  1999-09-18  2021-09-20
4                      Sulphur dioxide  2001-01-01  2007-09-30

  ratified_to                 zone agglomeration local_authority
0  2021-09-20  North East Scotland           NaN   Aberdeen City
1  2021-09-20  North East Scotland           NaN   Aberdeen City
2  2021-09-20  North East Scotland           NaN   Aberdeen City
3  2021-09-20  North East Scotland           NaN   Aberdeen City
4  2007-09-30  North East Scotland           NaN   Aberdeen City
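
Conceptually, selecting by bounds is a bounding-box filter on the longitude and latitude columns. The sketch below illustrates that idea with plain pandas. It is not ObsAQ's implementation: it assumes the bounds are ordered [lon_min, lon_max, lat_min, lat_max] as in cell [1] and that both ends are inclusive. The site "XX1" and its coordinates are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature metadata table. Coordinates for ABD and YK11 come
# from the outputs in this notebook; "XX1" is an invented out-of-bounds site.
sites = pd.DataFrame({
    "site_id": ["ABD", "YK11", "XX1"],
    "latitude": [57.157360, 53.951889, 48.0],
    "longitude": [-2.094278, -1.075861, 2.5],
})

# Assumed ordering: [lon_min, lon_max, lat_min, lat_max]
lon_min, lon_max, lat_min, lat_max = [-9, 1.8, 49, 61]

# Series.between is inclusive on both ends by default.
inside = (
    sites["longitude"].between(lon_min, lon_max)
    & sites["latitude"].between(lat_min, lat_max)
)
selected = sites[inside]
print(selected["site_id"].tolist())
```

Only the two sites inside the box remain; the invented site is dropped because both its longitude and latitude fall outside the bounds.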

Download the merged data for the selected stations

Download a single merged file for all selected stations, using the “memory” download mode.

  • pollutant: the pollutant(s) to download; see names of pollutants. One pollutant, several pollutants or all pollutants can be requested. Pollutants can be given as “PM2.5”, “PM10”, “O3”, “NO”, “NO2”, “NOXasNO2” or “SO2”.

  • start: the start date of data to be downloaded.

  • end: the end date of data to be downloaded.

  • year: the year of data to be downloaded. Defaults to 2010.

  • output_dir: the directory to save the downloaded data.

  • download_mode: “Stream” saves the final and intermediate files, while “memory” saves only the final file.

  • save_per_site: whether to save a separate file for each station.

  • save_merged: whether to save the merged file for all selected data.

  • add_site_id: whether to include the site id in the downloaded file.

Warnings are normal when processing data in different formats, and they can usually be ignored.
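
If the warnings clutter the notebook output, they can be silenced locally with Python's standard `warnings` module. This is a minimal sketch: `noisy_step` is a hypothetical stand-in for any call that emits warnings, not an ObsAQ function.

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a call that warns while parsing mixed formats.
    warnings.warn("mixed date formats detected", UserWarning)
    return "done"


# Filters set inside the context manager are restored on exit,
# so warnings elsewhere in the notebook are unaffected.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy_step()

print(result)
```

Scoping the filter to a `with` block keeps unrelated warnings visible, which is usually safer than disabling them globally.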

[4]:
merged_df = meta.download_sites(
    port="aurn",
    pollutant="PM2.5",
    start="2017-12-01",
    end="2018-11-30",
    output_dir="data/test_pm25_final",
    download_mode="memory",
    save_per_site=False,
    save_merged=True,
    add_site_id=True
)

     site_id        site_name     location_type   latitude  longitude  \
0        ABD         Aberdeen  Urban Background  57.157360  -2.094278
1        ABD         Aberdeen  Urban Background  57.157360  -2.094278
2        ABD         Aberdeen  Urban Background  57.157360  -2.094278
3        ABD         Aberdeen  Urban Background  57.157360  -2.094278
6        ABD         Aberdeen  Urban Background  57.157360  -2.094278
...      ...              ...               ...        ...        ...
3057    YK11  York Fishergate     Urban Traffic  53.951889  -1.075861
3058    YK11  York Fishergate     Urban Traffic  53.951889  -1.075861
3059    YK11  York Fishergate     Urban Traffic  53.951889  -1.075861
3060    YK11  York Fishergate     Urban Traffic  53.951889  -1.075861
3061    YK11  York Fishergate     Urban Traffic  53.951889  -1.075861

     parameter                             Parameter_name  start_date  \
0           O3                                      Ozone  2003-08-01
1           NO                               Nitric oxide  1999-09-18
2          NO2                           Nitrogen dioxide  1999-09-18
3     NOXasNO2        Nitrogen oxides as nitrogen dioxide  1999-09-18
6         PM10  PM10 particulate matter (Hourly measured)  1999-09-18
...        ...                                        ...         ...
3057     NV2.5       Non-volatile PM2.5 (Hourly measured)  2011-08-31
3058      V2.5           Volatile PM2.5 (Hourly measured)  2011-08-31
3059        wd                    Modelled Wind Direction  2010-08-01
3060        ws                        Modelled Wind Speed  2010-08-01
3061      temp                       Modelled Temperature  2010-08-01

        end_date ratified_to                    zone agglomeration  \
0     2021-09-20  2021-09-20     North East Scotland           NaN
1     2021-09-20  2021-09-20     North East Scotland           NaN
2     2021-09-20  2021-09-20     North East Scotland           NaN
3     2021-09-20  2021-09-20     North East Scotland           NaN
6     2021-09-20  2021-09-20     North East Scotland           NaN
...          ...         ...                     ...           ...
3057  2018-10-10  2018-10-10  Yorkshire & Humberside           NaN
3058  2018-10-10  2018-10-10  Yorkshire & Humberside           NaN
3059     ongoing       Never  Yorkshire & Humberside           NaN
3060     ongoing       Never  Yorkshire & Humberside           NaN
3061     ongoing       Never  Yorkshire & Humberside           NaN

     local_authority
0      Aberdeen City
1      Aberdeen City
2      Aberdeen City
3      Aberdeen City
6      Aberdeen City
...              ...
3057            York
3058            York
3059            York
3060            York
3061            York

[1625 rows x 13 columns]
Saved merged file: data/test_pm25_final/MERGED_aurn_PM2.5_20171201_20181130.csv

Merge data by site_id

Load the ObsAQ package and check the site information

[5]:
import obsaq

meta = obsaq.meta()
site_table = meta.get_metadata('aurn')

site_table.head(5)
[5]:
  site_id site_name     location_type  latitude  longitude parameter  \
0     ABD  Aberdeen  Urban Background  57.15736  -2.094278        O3
1     ABD  Aberdeen  Urban Background  57.15736  -2.094278        NO
2     ABD  Aberdeen  Urban Background  57.15736  -2.094278       NO2
3     ABD  Aberdeen  Urban Background  57.15736  -2.094278  NOXasNO2
4     ABD  Aberdeen  Urban Background  57.15736  -2.094278       SO2

                        Parameter_name  start_date    end_date  \
0                                Ozone  2003-08-01  2021-09-20
1                         Nitric oxide  1999-09-18  2021-09-20
2                     Nitrogen dioxide  1999-09-18  2021-09-20
3  Nitrogen oxides as nitrogen dioxide  1999-09-18  2021-09-20
4                      Sulphur dioxide  2001-01-01  2007-09-30

  ratified_to                 zone agglomeration local_authority
0  2021-09-20  North East Scotland           NaN   Aberdeen City
1  2021-09-20  North East Scotland           NaN   Aberdeen City
2  2021-09-20  North East Scotland           NaN   Aberdeen City
3  2021-09-20  North East Scotland           NaN   Aberdeen City
4  2007-09-30  North East Scotland           NaN   Aberdeen City

Choose the stations by site_id

[6]:
final_sites = meta.get_site(site_id='ABD')
final_sites.drop_duplicates(subset='site_id')
Site is selected by site_id: ABD
[6]:
  site_id site_name     location_type  latitude  longitude parameter  \
0     ABD  Aberdeen  Urban Background  57.15736  -2.094278        O3

   Parameter_name  start_date    end_date  \
0           Ozone  2003-08-01  2021-09-20

  ratified_to                 zone agglomeration local_authority
0  2021-09-20  North East Scotland           NaN   Aberdeen City
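
The metadata table holds one row per site and parameter, so `drop_duplicates(subset='site_id')` keeps only the first row for each site, which is why a single Ozone row is shown. A small pandas sketch on a hypothetical subset of rows (site ids and parameters are taken from the outputs in this notebook) shows the effect, together with a way to list every parameter per site:

```python
import pandas as pd

# Hypothetical subset of the metadata: one row per (site, parameter).
meta_rows = pd.DataFrame({
    "site_id": ["ABD", "ABD", "ABD", "YK11", "YK11"],
    "parameter": ["O3", "NO", "PM2.5", "NV2.5", "V2.5"],
})

# Keeps the first row encountered for each site_id.
one_per_site = meta_rows.drop_duplicates(subset="site_id")
print(one_per_site)

# Collect all parameters measured at each site.
params_per_site = meta_rows.groupby("site_id")["parameter"].agg(list)
print(params_per_site)
```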

Merge the data for the selected station

  • Use “start” and “end” to define a period of time.

  • Other parameters are the same as in the longitude and latitude bounds section above.

[7]:
meta.download_sites(
    port="aurn",
    pollutant="PM2.5",
    start="2017-12-01",
    end="2018-11-30",
    output_dir="data/test_pm25_siteid",
    download_mode="memory",
    save_per_site=False,
    save_merged=True,
    add_site_id=True
)
   site_id site_name     location_type  latitude  longitude parameter  \
0      ABD  Aberdeen  Urban Background  57.15736  -2.094278        O3
1      ABD  Aberdeen  Urban Background  57.15736  -2.094278        NO
2      ABD  Aberdeen  Urban Background  57.15736  -2.094278       NO2
3      ABD  Aberdeen  Urban Background  57.15736  -2.094278  NOXasNO2
6      ABD  Aberdeen  Urban Background  57.15736  -2.094278      PM10
7      ABD  Aberdeen  Urban Background  57.15736  -2.094278      NV10
8      ABD  Aberdeen  Urban Background  57.15736  -2.094278       V10
9      ABD  Aberdeen  Urban Background  57.15736  -2.094278     PM2.5
10     ABD  Aberdeen  Urban Background  57.15736  -2.094278     NV2.5
11     ABD  Aberdeen  Urban Background  57.15736  -2.094278      V2.5
12     ABD  Aberdeen  Urban Background  57.15736  -2.094278        wd
13     ABD  Aberdeen  Urban Background  57.15736  -2.094278        ws
14     ABD  Aberdeen  Urban Background  57.15736  -2.094278      temp

                                Parameter_name  start_date    end_date  \
0                                        Ozone  2003-08-01  2021-09-20
1                                 Nitric oxide  1999-09-18  2021-09-20
2                             Nitrogen dioxide  1999-09-18  2021-09-20
3          Nitrogen oxides as nitrogen dioxide  1999-09-18  2021-09-20
6    PM10 particulate matter (Hourly measured)  1999-09-18  2021-09-20
7          Non-volatile PM10 (Hourly measured)  2009-02-20  2019-11-07
8              Volatile PM10 (Hourly measured)  2009-02-20  2019-11-07
9   PM2.5 particulate matter (Hourly measured)  2009-02-20  2021-09-20
10        Non-volatile PM2.5 (Hourly measured)  2009-02-20  2019-11-07
11            Volatile PM2.5 (Hourly measured)  2009-02-20  2019-11-07
12                     Modelled Wind Direction  2010-08-01  2021-09-20
13                         Modelled Wind Speed  2010-08-01  2021-09-20
14                        Modelled Temperature  2010-08-01  2021-09-20

   ratified_to                 zone agglomeration local_authority
0   2021-09-20  North East Scotland           NaN   Aberdeen City
1   2021-09-20  North East Scotland           NaN   Aberdeen City
2   2021-09-20  North East Scotland           NaN   Aberdeen City
3   2021-09-20  North East Scotland           NaN   Aberdeen City
6   2021-09-20  North East Scotland           NaN   Aberdeen City
7   2019-11-07  North East Scotland           NaN   Aberdeen City
8   2019-11-07  North East Scotland           NaN   Aberdeen City
9   2021-09-20  North East Scotland           NaN   Aberdeen City
10  2019-11-07  North East Scotland           NaN   Aberdeen City
11  2019-11-07  North East Scotland           NaN   Aberdeen City
12       Never  North East Scotland           NaN   Aberdeen City
13       Never  North East Scotland           NaN   Aberdeen City
14       Never  North East Scotland           NaN   Aberdeen City
Saved merged file: data/test_pm25_siteid/MERGED_aurn_PM2.5_20171201_20181130.csv
[7]:
      site_id        Date   time  PM<sub>2.5</sub> particulate matter (Hourly measured)  status.7             unit.7
0         ABD  30-11-2017  24:00                                                    2.2         R  ugm-3 (TEOM FDMS)
1         ABD  01-12-2017  01:00                                                    2.1         R  ugm-3 (TEOM FDMS)
2         ABD  01-12-2017  02:00                                                    3.2         R  ugm-3 (TEOM FDMS)
3         ABD  01-12-2017  03:00                                                    4.1         R  ugm-3 (TEOM FDMS)
4         ABD  01-12-2017  04:00                                                    2.4         R  ugm-3 (TEOM FDMS)
...       ...         ...    ...                                                    ...       ...                ...
8755      ABD  30-11-2018  19:00                                                    3.0         R  ugm-3 (TEOM FDMS)
8756      ABD  30-11-2018  20:00                                                    5.6         R  ugm-3 (TEOM FDMS)
8757      ABD  30-11-2018  21:00                                                    3.0         R  ugm-3 (TEOM FDMS)
8758      ABD  30-11-2018  22:00                                                    0.3         R  ugm-3 (TEOM FDMS)
8759      ABD  30-11-2018  23:00                                                    5.7         R  ugm-3 (TEOM FDMS)

8760 rows × 6 columns
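
The 8760 rows correspond to one value per hour over the requested period: 365 days × 24 hours. A quick check with pandas, assuming that “24:00” on 30-11-2017 denotes midnight at the start of 01-12-2017. The same hourly index can be used to look for missing hours; the `observed` index here is a made-up example with one hour removed.

```python
import pandas as pd

# Every hour from the first to the last timestamp in the output above.
expected = pd.date_range("2017-12-01 00:00", "2018-11-30 23:00", freq="60min")
print(len(expected))

# Made-up example: pretend the sixth hour is absent from the download.
observed = expected.delete(5)
missing = expected.difference(observed)
print(missing)
```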

Analyse the data and draw plots

Temporal trend of the air pollutant concentration.

After filtering and preprocessing, the data can be analysed in various ways. Here, we plot a time series to examine how the PM2.5 concentration varies over time.

[8]:
import pandas as pd
import matplotlib.pyplot as plt

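# Preprocessing: merged_df is the dataframe returned by download_sites in cell [4]
date_str = merged_df["Date"].astype(str).str.strip()
time_str = merged_df["time"].astype(str).str.strip()

# Handle non-standard "24:00" timestamps: parse them as 00:00, then add one day.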
is_2400 = time_str.str.startswith("24:00")
dates = pd.to_datetime(date_str, dayfirst=True, errors="coerce")
time_fixed = time_str.where(~is_2400, "00:00")
dt = pd.to_datetime(
    dates.dt.strftime("%Y-%m-%d") + " " + time_fixed,
    errors="coerce"
)
dt = dt + pd.to_timedelta(is_2400.astype(int), unit="D")

# Write back to the dataframe
merged_df["datetime"] = dt
pm_col = "PM<sub>2.5</sub> particulate matter (Hourly measured)"

# Draw the plot
plt.figure(figsize=(14, 5))
plt.plot(
    merged_df["datetime"],
    merged_df[pm_col],
    linewidth=1
)
plt.xlabel("Datetime")
plt.ylabel("PM2.5 (µg/m³)")
plt.title("Temporal variation of PM2.5")
plt.tight_layout()
plt.show()
[Figure: line plot of PM2.5 concentration against datetime, titled “Temporal variation of PM2.5”]
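
Hourly series are often noisy, so a common next step is to aggregate them before plotting. The sketch below computes daily means with pandas `resample`. It uses synthetic values on an hourly index rather than the downloaded data, and the series name `pm25` is a placeholder.

```python
import pandas as pd

# Synthetic hourly series covering two full days (values 0.0 to 47.0).
idx = pd.date_range("2017-12-01 00:00", periods=48, freq="60min")
hourly = pd.Series([float(v) for v in range(48)], index=idx, name="pm25")

# Average all hourly values that fall within each calendar day.
daily_mean = hourly.resample("D").mean()
print(daily_mean)
```

With real data, the same call could be applied to a series built from the PM2.5 column indexed by the `datetime` column created above.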