This research evaluates the accuracy and completeness of wind and solar energy infrastructure data in OpenStreetMap (OSM) for Belgium and Ireland, identifying common mapping errors and proposing techniques to enrich the available data. By combining OSM data with the CORINE Land Cover inventory, we study land use patterns around renewable energy infrastructures to support environmental planning.
We are witnessing the rise of a collective awareness of the importance of building a more environmentally sustainable future. Within this context, the relevance of renewable energy sources is widely acknowledged; however, to support the energy transition, reliable data about energy supply, infrastructures, and their environmental impacts is essential [1]. OpenStreetMap (OSM) emerges as a valuable data source to meet these requirements. In this work, we describe our research on evaluating the OSM database in a study of wind and solar energy infrastructures. As a case study, we analyse two countries: Belgium and Ireland.
Data within OSM is well known to vary in completeness and granularity [2]. Given OSM's vibrant mapping communities, together with (a) the high visibility of wind and solar energy infrastructures in the landscape and (b) their relatively limited number compared to other built infrastructures, it is reasonable to assume that the majority of these installations are well mapped within OSM. OSM energy-related objects are mapped under the _“power”_ key, with key-value combinations available to identify wind and solar sources. Wind turbines are commonly tagged as _“power=generator”_ and _“generator:source=wind”_, while solar farms are identified as _“power=plant”_ and _“plant:source=solar”_.
By combining OSM data with the CORINE Land Cover (CLC) inventory, we consider two research questions. Firstly, we seek to identify common mapping errors and tagging issues associated with the representation of wind and solar energy infrastructure within OSM. This involves examining geometry and tagging mistakes while evaluating the accuracy and completeness of these infrastructures. Secondly, we perform a geographical analysis of the distribution of infrastructures across the various CLC land cover classes, seeking to detect patterns between land cover and renewable energy infrastructure. Our methodology is summarised as follows: OSM data, in PBF format, is downloaded from _GeoFabrik_. At this stage we consider only Ireland and Belgium, due to local knowledge and their manageable data sizes. Analysis is performed using _Python_ and the _osmium_ library. Pending acceptance, source code will be made openly available on GitHub in documented Jupyter notebooks.
We answer our research questions using three key steps:
1. Assess available wind and solar infrastructures in OSM for the correctness of OSM datatypes and potential geometric errors.
2. Differentiate between individual installations and larger "farms", to evaluate the quality and completeness of OSM data.
3. Investigate commonalities around land use surrounding renewable energy infrastructures using CLC.
Initially, we extract all OSM objects tagged _“power=plant”_ or _“power=generator”_, yielding a full listing of all available power sources. We then filter solar and wind sources based on the _"generator:source"_ or _"plant:source"_ tags. The accuracy of the OSM datatypes and geometries is checked before moving on to the identification of "farms". To accurately differentiate single installations from farms, we analyse solar and wind sources separately due to their different geometric nature.
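The tag filter described above can be illustrated with a minimal sketch. It assumes that the tags of each OSM object have already been read into a plain Python mapping, and that the source key is chosen to match the _"power"_ value; the function name `classify_renewable` is hypothetical and not taken from our notebooks.

```python
from typing import Mapping, Optional


def classify_renewable(tags: Mapping[str, str]) -> Optional[str]:
    """Return 'wind', 'solar', or None for the tags of one OSM object.

    Assumption: the source key follows the power value, i.e.
    power=generator -> generator:source, power=plant -> plant:source.
    """
    power = tags.get("power")
    if power not in ("plant", "generator"):
        return None  # not a power source at all
    source = tags.get(f"{power}:source")
    if source in ("wind", "solar"):
        return source
    return None  # a power source, but neither wind nor solar


# Illustrative tag sets (not real OSM objects).
turbine = {"power": "generator", "generator:source": "wind"}
solar_farm = {"power": "plant", "plant:source": "solar"}
print(classify_renewable(turbine), classify_renewable(solar_farm))
```

Objects for which the function returns `None` are simply left out of the wind and solar listings.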
We employ a point-in-polygon search with spatial indexes to assign nodes (of either wind or solar type) to intersecting areas of the same energy type. This process raises some geometric questions concerning the meaning of areas without nodes and of nodes not linked to any area. We observed several issues: (a) individual panels left unmapped within solar farms, or turbines drawn as circular areas rather than nodes; (b) nodes not associated with any area; (c) unlinked solar nodes representing private installations; and (d) missing OSM areas or relations for turbines, which frequently indicates incomplete mapping. This final issue is significant: 91.4% of wind nodes in Belgium and around 87.4% in Ireland are not located within any mapped area. Given the limited number of turbines included in OSM areas or relations, we applied spatial clustering to all turbines to identify wind farms, and validated our findings by automatically comparing the clusters with existing OSM areas and relations.
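The node-to-area assignment can be sketched as follows. This is a simplified illustration, not our implementation: it replaces a true spatial index with a bounding-box pre-filter, uses a standard ray-casting test, and assumes each area is a single ring without holes. All names and coordinates are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def bbox(ring: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Bounding box (minx, miny, maxx, maxy) of a ring of vertices."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(pt: Point, ring: Sequence[Point]) -> bool:
    """Ray-casting test; the closing edge of the ring is implied."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_nodes(nodes: Dict[str, Point],
                 areas: Dict[str, Sequence[Point]]) -> Dict[str, Optional[str]]:
    """Map each node id to the first area containing it, or None."""
    boxes = {aid: bbox(ring) for aid, ring in areas.items()}
    result: Dict[str, Optional[str]] = {}
    for nid, (x, y) in nodes.items():
        result[nid] = None
        for aid, (minx, miny, maxx, maxy) in boxes.items():
            # Cheap bounding-box rejection before the exact test.
            if minx <= x <= maxx and miny <= y <= maxy \
                    and point_in_ring((x, y), areas[aid]):
                result[nid] = aid
                break
    return result


# Illustrative data: one square area and three turbine nodes.
areas = {"farm_a": [(0, 0), (10, 0), (10, 10), (0, 10)]}
nodes = {"t1": (5, 5), "t2": (20, 5), "t3": (1, 9)}
assignment = assign_nodes(nodes, areas)
unassigned = sum(a is None for a in assignment.values()) / len(assignment)
print(assignment, f"{unassigned:.1%} unassigned")
```

The share of unassigned nodes computed at the end corresponds to the kind of figure reported above for wind nodes outside any mapped area.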
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is particularly suited to our purpose. One of its most important parameters is the maximum distance between two samples belonging to the same cluster, which we set to five times the rotor diameter [3]. Given the frequent absence of the _"rotor:diameter"_ tag, we assume a diameter of 130 meters for all turbines [3], which gives a maximum distance of 650 meters. Comparing the clustering results with OSM data allows us to evaluate the quality and completeness of areas and relations, to identify new farms, and to extend existing linkages to unassigned infrastructures.
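To make the clustering step concrete, the following is a minimal pure-Python DBSCAN sketch. The distance threshold follows the text (5 × 130 m = 650 m); the minimum cluster size `MIN_SAMPLES = 2` and the use of planar coordinates in meters are assumptions made for illustration only, and the coordinates are invented.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

ROTOR_DIAMETER_M = 130.0          # assumed diameter for all turbines
EPS_M = 5 * ROTOR_DIAMETER_M      # maximum neighbour distance: 650 m
MIN_SAMPLES = 2                   # assumption, not stated in the text
NOISE = -1


def dbscan(points: Sequence[Tuple[float, float]],
           eps: float, min_samples: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None = unvisited

    def neighbours(i: int) -> List[int]:
        xi, yi = points[i]
        # The point itself counts as its own neighbour.
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not None:
                continue               # already processed
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_samples:
                queue.extend(nb)       # j is a core point: keep expanding
    return [lab if lab is not None else NOISE for lab in labels]


# Illustrative turbine positions in meters (projected coordinates).
turbines = [(0, 0), (400, 0), (800, 0), (5000, 5000)]
labels = dbscan(turbines, EPS_M, MIN_SAMPLES)
print(labels)  # the first three form one farm; the last is isolated
```

Turbines labelled as noise would be treated as individual installations, while each cluster id is a candidate wind farm to compare against existing OSM areas and relations.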
Due to their geometric nature, the process described cannot be applied to solar infrastructures. Given that all remaining solar data are areas, our methodology proposes two different processes, based on the value of the _"power"_ tag (_"generator"_ or _"plant"_). Although OSM guidelines suggest that solar farms should be tagged as _"plant"_, deviations from this standard are possible. To avoid discarding valid data, we conducted an area-in-area search on solar generators. This approach assumes that a solar generator area encompassing multiple solar generators may denote a farm. The process enabled us to identify a few tagging mistakes where the _"generator"_ value was used instead of _"plant"_. While the same approach could be applied to solar _"plant"_ objects, it would cause significant data loss, so we opted to keep all plant objects. Ultimately, this decision depends strongly on the data accuracy requirements. The refined datasets provide a comprehensive overview of wind and solar energy sources, facilitating further analyses of renewable energy.
Our second research question concerns the analysis of land use around energy infrastructure by integrating CLC. We use an arbitrary buffer of 1000 meters for this analysis. In Belgium, around 18% of wind farms are located within 1000 meters of heterogeneous agricultural areas, while 17% and 15% are near urban fabric and arable land respectively. In Ireland, most wind farms are near pastures (~24%), scrub (~18%) and forests (~18%). Agricultural land, in the form of heterogeneous agricultural areas, pastures, or arable land, seems to prevail in both contexts. For wind energy, artificial surfaces appear relevant only in Belgium; however, both countries have solar farms near the urban fabric land cover (LC) class. Within the buffer around solar infrastructures, this LC class appears ~19% of the time in Belgium and ~12% in Ireland. In Belgium, heterogeneous agricultural areas (~19%) and forests (~12%) are also significant, while pastures predominate in Ireland (~31%), followed by arable land (~16%). However, 56% of the Irish territory is classified as pasture, so this predominance may partly reflect the national land cover baseline.
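One possible way to derive such percentages is sketched below. It assumes that, for each farm, the set of CLC classes intersecting its 1000 m buffer has already been computed, and it reads "appears X% of the time" as the share of all farm-class occurrences; both the function name and this normalisation are assumptions for illustration, and the input data is invented.

```python
from collections import Counter
from typing import Dict, Set


def land_cover_shares(classes_near_farm: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of farm-class occurrences per CLC class.

    classes_near_farm maps a farm id to the set of CLC classes found
    within its buffer (assumed to be pre-computed elsewhere).
    """
    counts: Counter = Counter()
    for classes in classes_near_farm.values():
        counts.update(classes)        # each class counted once per farm
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lc: 100.0 * n / total for lc, n in counts.most_common()}


# Illustrative input: three farms and the classes inside their buffers.
example = {
    "farm_1": {"Pastures", "Forests"},
    "farm_2": {"Pastures"},
    "farm_3": {"Pastures", "Arable land"},
}
shares = land_cover_shares(example)
print(shares)
```

Dividing each share by the class's share of the national territory would be one way to account for baselines such as the prevalence of pasture in Ireland.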
Our work seeks to assess and improve the completeness and accuracy of renewable energy infrastructure data in OSM. Our analysis distinguishes between individual installations and larger farms. Our strategy for identifying wind farms, based on density-based clustering, facilitates evaluation of the OSM data. We developed methods to associate solar panels with farms, thereby mitigating tagging errors. Overall, we highlight challenges arising from inconsistent mapping practices and deviations from OSM mapping guidelines. The refined datasets offer comprehensive insights into wind and solar energy sources, allowing additional analyses and assessment of their integration into the surrounding landscape.
The next steps for this work include validating our approach for a larger set of countries and regions while also integrating further internal and external completeness and quality evaluations.
Creative Commons Attribution 3.0 Unported https://creativecommons.org/licenses/by/3.0/