Process Activity
Description: Ingest and prepare ERA5 data (Copernicus)
Diagram of the Process Activity
digraph Diagram { graph [ stylesheet="../_static/custom.css" fontnames = "svg" # "... rock solid standards compliant SVG", see: https://graphviz.org/faq/font/#what-about-svg-fonts rankdir="LR" nodesep="0.15" tooltip=" " ]; node [ fontname="sans-serif" ]; activity_34be5d18_c1f7_4e3d_82e7_81909e0989f4 [ shape="rect" style="filled, rounded" width="3" height="0.8" fontcolor="white" fillcolor="#4363d8" fontsize="13pt" label="ERA5 Data (Copernicus)\n(current activity)" tooltip=" " ]; EntityUsed_1 [ shape="rect" style="filled" width="2" height="0.1" fontcolor="blue" fontsize="10pt" URL="https://doi.org/10.24381/cds.adbb2d47" target="_blank" label="ERA5 website" tooltip="Entity used for current activity" fillcolor="#ffe119" ]; EntityUsed_1:e -> activity_34be5d18_c1f7_4e3d_82e7_81909e0989f4:w [ minlen="2" ]; }
Diagram of the Process Sub-Activities (in sequential order)
Note
Click on a sub-activity to go to the corresponding page.
digraph Diagram { graph [ stylesheet="../_static/custom.css" fontnames = "svg" # "... rock solid standards compliant SVG", see: https://graphviz.org/faq/font/#what-about-svg-fonts rankdir="LR" tooltip=" " ]; node [ fontname="sans-serif" ]; subgraph { rank="same"; activity_70e828d0_4231_4b9b_bb69_7882483fb591 [ label="ERA5 Get raw input" shape="rect" style="filled, rounded" width="3" fontcolor="white" fillcolor="#4363d8" URL="../activity_70e828d0-4231-4b9b-bb69-7882483fb591.html" target="_parent" tooltip=" \nGo to process activity" ]; activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68 [ label="ERA5 Marshalling data" shape="rect" style="filled, rounded" width="3" fontcolor="white" fillcolor="#4363d8" URL="../activity_8a364e23-1e2f-4f75-9b61-2ad1ad39fe68.html" target="_parent" tooltip=" \nGo to process activity" ]; activity_f75ac23d_46b5_4e7f_a52d_638ab34a7b81 [ label="ERA5 Data Processing" shape="rect" style="filled, rounded" width="3" fontcolor="white" fillcolor="#4363d8" URL="../activity_f75ac23d-46b5-4e7f-a52d-638ab34a7b81.html" target="_parent" tooltip=" \nGo to process activity" ]; } subgraph { rank="same"; activity_70e828d0_4231_4b9b_bb69_7882483fb591_Description [ label="The process involves obtaining NUTS (Nomenclature of Territorial Units for\lStatistics) polygons for the relevant regions, followed by calling a public\lAPI with GPS coordinates derived from the polygons. A single API call is\lmade per month, resulting in a gridded data response file with a default\lresolution of 0.1 degrees latitude/longitude. Each month and region\lcorresponds to one variable, resulting in over 20,000 files. The\lprocess is relatively slow, taking around a minute per\lcall, and it requires a substantial number of calls to collect the complete\ldataset. 
There are over 12,000 raw input files in NetCDF4 format covering\lthe period from 1990 to 2022.\l" tooltip=" \nDescription of activity" shape="note" width="5" height="0.1" color="grey" fontsize="10pt" ]; activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68_Description [ label="The process involves reading NetCDF files into pandas DataFrames, obtaining\lestimated population data for grids from the Global Human Settlements data\lbased on Eurostat, merging the population data with ERA5 data, and writing\lthe merged data to disk in Parquet format. External experts perform quality\lchecks on the merged data, which may be either a one-off or a regular\lquality assurance check. The process uses over 12,000 NetCDF4 files as\linput, as well as data from the GHSL (Global Human Settlement Layer). The\loutput of the process is a single Parquet file named \"Interim data for\lreview\" with its corresponding URI.\l" tooltip=" \nDescription of activity" shape="note" width="5" height="0.1" color="grey" fontsize="10pt" ]; activity_f75ac23d_46b5_4e7f_a52d_638ab34a7b81_Description [ label="The process involves creating a date variable from timestamps based on the\ltime zone of each region, considering that the data is recorded hourly. It\lalso addresses unit differences, converting Kelvin to Celsius and meters to\lmillimeters. The data is then grouped by date, variable, and region; for\leach group, the mean, maximum, and minimum temperature are computed,\lprecipitation is accumulated, and the maximum wind gust value is\lidentified. Moving averages are calculated for variables using different\ltime windows (7-day, 30-day, 90-day, 365-day). Baseline values for\ltemperature, precipitation, and wind gust, and deviations from the baseline\l(anomalies), are determined based on the period from 1991 to 2020. Data\lfrom before 2015 is removed, and a group-by operation is performed,\lcollapsing the data by region using population-weighted\laverages. 
Note that the ERA5 data may\lcontain imputed and missing values. In memory, each row corresponds to a\lregion, with mesh-blocks aggregated per day to calculate region-level\lvalues by taking the average of all variables weighted by the population of\leach block. The resulting data is written to disk in CSV, SAV, or other\lsuitable formats, as the data size remains manageable.\l" tooltip=" \nDescription of activity" shape="note" width="5" height="0.1" color="grey" fontsize="10pt" ]; } activity_70e828d0_4231_4b9b_bb69_7882483fb591 -> activity_70e828d0_4231_4b9b_bb69_7882483fb591_Description [ style="dotted" arrowhead="none" color="grey" ]; activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68 -> activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68_Description [ style="dotted" arrowhead="none" color="grey" ]; activity_f75ac23d_46b5_4e7f_a52d_638ab34a7b81 -> activity_f75ac23d_46b5_4e7f_a52d_638ab34a7b81_Description [ style="dotted" arrowhead="none" color="grey" ]; activity_70e828d0_4231_4b9b_bb69_7882483fb591 -> activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68 activity_8a364e23_1e2f_4f75_9b61_2ad1ad39fe68 -> activity_f75ac23d_46b5_4e7f_a52d_638ab34a7b81 }
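Example (illustrative only)
The core of the "ERA5 Data Processing" sub-activity described in the diagram above (local dates from hourly timestamps, unit conversion, daily aggregation, and population-weighted collapsing by region) could be sketched roughly as follows. This is a minimal sketch in plain Python and is not the project's actual code. The field names (region, cell, time_utc, t2m_k, tp_m, gust), the function names, and the use of a fixed UTC offset in place of a real per-region time zone are all assumptions made for the example. Moving averages, baselines, and anomalies are left out.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def kelvin_to_celsius(kelvin):
    return kelvin - 273.15


def metres_to_millimetres(metres):
    return metres * 1000.0


def daily_cell_values(hourly_rows, utc_offset_hours):
    """Aggregate hourly rows to one record per (region, cell, local date).

    Each row is a dict with the hypothetical keys: region, cell,
    time_utc (timezone-aware datetime), t2m_k (temperature in Kelvin),
    tp_m (precipitation in metres) and gust (wind gust).
    """
    # Assumption: a fixed offset stands in for the region's real time zone.
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    groups = defaultdict(list)
    for row in hourly_rows:
        local_date = row["time_utc"].astimezone(local_tz).date()
        groups[(row["region"], row["cell"], local_date)].append(row)

    daily = {}
    for key, rows in groups.items():
        temps = [kelvin_to_celsius(r["t2m_k"]) for r in rows]
        daily[key] = {
            "temp_mean": sum(temps) / len(temps),
            "temp_max": max(temps),
            "temp_min": min(temps),
            "precip_mm": sum(metres_to_millimetres(r["tp_m"]) for r in rows),
            "gust_max": max(r["gust"] for r in rows),
        }
    return daily


def population_weighted(daily, population):
    """Collapse cell-level daily values to one record per (region, date).

    Every variable is averaged across cells, weighted by cell population.
    Groups whose total population is zero are skipped.
    """
    sums = defaultdict(lambda: defaultdict(float))
    weights = defaultdict(float)
    for (region, cell, local_date), values in daily.items():
        weight = population[cell]
        weights[(region, local_date)] += weight
        for name, value in values.items():
            sums[(region, local_date)][name] += weight * value
    return {
        key: {name: total / weights[key] for name, total in sums[key].items()}
        for key in sums
        if weights[key] > 0
    }


if __name__ == "__main__":
    # Made-up sample values, for demonstration only.
    rows = [
        {"region": "R1", "cell": "A", "t2m_k": 283.15, "tp_m": 0.001, "gust": 5.0,
         "time_utc": datetime(2020, 1, 1, 10, tzinfo=timezone.utc)},
        {"region": "R1", "cell": "B", "t2m_k": 273.15, "tp_m": 0.0, "gust": 3.0,
         "time_utc": datetime(2020, 1, 1, 10, tzinfo=timezone.utc)},
    ]
    daily = daily_cell_values(rows, utc_offset_hours=1)
    print(population_weighted(daily, {"A": 300, "B": 100}))
```

Grouping by local date before aggregating matters because an hourly record late in the UTC day can belong to the next calendar day in the region's own time zone. A real implementation would more likely use pandas group-by operations and proper time zone data.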