OpenStreetMap Data for Automated Labelling Machine Learning Examples: The Challenge of Road Type Imbalance

Proceedings of the OSM Science

Published On 2023

Advances in Artificial Intelligence (AI) and, specifically, in Deep Learning (DL) have fostered geospatial analysis and remote sensing, culminating in the establishment of GeoAI [1, 2] and the solidification of research on methodologies and techniques for AI-assisted mapping [3-7]. Nevertheless, a particular challenge lies in the substantial demand for training examples in DL. Manual labelling of these examples is labour-intensive, consuming a considerable amount of time and financial resources. Alternatively, semi-automated or fully automated labelling of data emerges as a prominent solution, as exemplified by the tool ohsome2label [8], which harnesses data from OpenStreetMap [9] to label satellite images. However, moving from characterising object types (road, river, building) based on geometry to categorising them by attributes might result in an imbalanced class distribution in the Machine Learning (ML) dataset used. Such imbalances are common in numerous practical applications. Learning from skewed datasets can be particularly challenging and often requires non-conventional ML techniques. A comprehensive awareness of the issues associated with class imbalance, as well as strategies for mitigating them, is essential [10]. In the context of spatial data, the distribution of classes can vary from country to country and region to region, adding a new layer of complexity and exacerbating this issue. In this context, an analysis was conducted on the distribution of road types, defined by the values of the OSM "highway" tag, in countries with diverse profiles. The aim was to evaluate the extent of class imbalance and to identify any consistent patterns in the …
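As a rough illustration of how the imbalance of "highway" values could be quantified, the Python sketch below computes class shares, the majority-to-minority ratio, and a normalised Shannon entropy, and derives inverse-frequency class weights as one common mitigation heuristic. It is not the authors' method: the function names are hypothetical, and the label list is an invented toy sample, not data from the study.

```python
import math
from collections import Counter


def class_shares(labels):
    """Return the fraction of examples carrying each label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def normalized_entropy(labels):
    """Shannon entropy divided by log(k): 1.0 is balanced, near 0 is skewed."""
    shares = class_shares(labels)
    if len(shares) < 2:
        raise ValueError("need at least two classes")
    entropy = -sum(p * math.log(p) for p in shares.values())
    return entropy / math.log(len(shares))


def balanced_class_weights(labels):
    """Inverse-frequency weights: total / (number_of_classes * class_count)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}


if __name__ == "__main__":
    # Invented toy sample of OSM "highway" values (assumption, not real data).
    sample = ["residential"] * 6 + ["service"] * 3 + ["motorway"]
    print(class_shares(sample))
    print(imbalance_ratio(sample))
    print(normalized_entropy(sample))
    print(balanced_class_weights(sample))
```

Running the same functions on the label lists of different countries or regions would give directly comparable figures, since the entropy is normalised by the number of classes present.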

Page

65-68

Authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Position

Chair of GIScience; HeiGIT (Heidelberg Institute for Geoinformation Technology)

Research Interests

Geoinformatics

GIScience

VGI

Geomatics

Geographic Information Science

Edson Augusto Melanda

Universidade Federal de São Carlos

Other Articles from authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Geo-spatial Information Science

An investigation of the temporality of OpenStreetMap data contribution activities

OpenStreetMap (OSM) is a dataset in constant change, and this dynamic needs to be better understood. Based on 12-year time series of seven OSM data contribution activities extracted from 20 large cities worldwide, we investigate the temporal dynamic of OSM data production, more specifically, the auto- and cross-correlation, temporal trend, and annual seasonality of these activities. Furthermore, we evaluate and compare nine different temporal regression methods for forecasting such activities in horizons of 1–4 weeks. Several insights could be obtained from our analyses, including that the contribution activities tend to grow linearly in a moderate intra-annual cycle. Also, the performance of the temporal forecasting methods shows that they yield in general more accurate estimations of future contribution activities than a baseline metric, i.e. the arithmetic average of recent previous observations. In particular, the …

2024/3/3

Distortions in Judged Spatial Relations in Large Language Models: The Dawn of Natural Language Geographic Data?

How to assess the needs of vulnerable population groups towards heat-sensitive routing? An evidence-based and practical approach to reducing urban heat stress

Carbon fluxes related to land use and land cover change in Baden-Württemberg

Urban Heat Island Intensity Prediction in the Context of Heat Waves: An Evaluation of Model Performance

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

Semi-supervised water tank detection to support vector control of emerging infectious diseases transmitted by Aedes Aegypti

A spatio-temporal analysis investigating completeness and inequalities of global urban building data in OpenStreetMap

Challenges and solution approach for greenhouse gas emission inventories at fine spatial resolutions–the example of the Rhine-Neckar district

Initial response to the COVID-19 pandemic on real-life well-being, social contact and roaming behavior in patients with schizophrenia, major depression and healthy controls: A …

Schistosomiasis mansoni and hydrographical conditions in São Carlos, São Paulo, Brazil

Semi-Supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation

Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

Private Vehicles Greenhouse Gas Emission Estimation at Street Level for Berlin Based on Open Data

Traffic Speed Modelling to Improve Travel Time Estimation in Openrouteservice

Analyzing and Improving the Quality and Fitness for Purpose of OpenStreetMap as Labels in Remote Sensing Applications

TÉCNICA GEOBIA PARA IDENTIFICAÇÃO SEMI-AUTOMÁTICA DO LEITO REGULAR DE CURSOS D’ÁGUA A PARTIR DE IMAGENS OBTIDAS POR RPA

UndercoverEisAgenten - Monitoring Permafrost Thaw in the Arctic using Local Knowledge and UAVs

Other articles from Proceedings of the OSM Science journal

Towards an open high-resolution land use dataset in Great Britain–Comparing and consolidating retail centre areas from open data sources

Mapping public space in urban neighbourhoods using OpenStreetMap data

OpenStreetMap Data for Automated Labelling Machine Learning Examples: The Challenge of Road Type Imbalance

Assessing bike-transit accessibility with OpenStreetMap

Developing a data validation method with OpenStreetMap Senegal and the Ministry of Health in support of accurate health facility data

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

Beyond Two Dimensions: Large-Scale Building Height Mapping in OpenStreetMap via Synthetic Aperture Radar and Street-View Imagery

Fostering OSM's Micromapping Through Combined Use of Artificial Intelligence and Street-View Imagery