OpenStreetMap Data for Automated Labelling Machine Learning Examples: The Challenge of Road Type Imbalance

Proceedings of the OSM Science

Published On 2023

Advances in Artificial Intelligence (AI) and, specifically, in Deep Learning (DL) have fostered geospatial analysis and remote sensing, culminating in the establishment of GeoAI [1, 2] and the consolidation of research on methodologies and techniques for AI-assisted mapping [3-7]. Nevertheless, a particular challenge lies in the substantial demand for training examples in DL. Manual labelling of these examples is labour-intensive, consuming a considerable amount of time and financial resources. Alternatively, semi-automated or automated labelling of data emerges as a prominent solution, as exemplified by the tool ohsome2label [8], which harnesses data from OpenStreetMap (OSM) [9] to label satellite images. However, moving from characterising object types (road, river, building) based on geometry to categorising them by attributes might result in an imbalanced class distribution in the Machine Learning (ML) dataset used. Such imbalances are common in numerous practical applications. Learning from skewed datasets can be particularly challenging and often requires non-conventional ML techniques. A comprehensive awareness of the issues associated with class imbalance, as well as strategies for mitigating them, is essential [10]. In the context of spatial data, the distribution of classes can vary from country to country and region to region, adding a new layer of complexity and exacerbating this issue. In this context, an analysis was conducted on the distribution of road types, defined by the values of the OSM "highway" tag, in countries with diverse profiles. The aim was to evaluate the extent of class imbalance and to identify any consistent patterns in the …
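As a rough illustration of what evaluating class imbalance over "highway" tag values can look like, the following Python sketch computes each road type's share and a simple imbalance ratio (most frequent class count divided by least frequent class count). The function names and the sample values are hypothetical and invented for illustration; they are not taken from the paper, and the authors' actual method may differ.

```python
from collections import Counter


def highway_class_distribution(highway_values):
    """Return each road type's share of all roads, most frequent first."""
    counts = Counter(highway_values)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.most_common()}


def imbalance_ratio(highway_values):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(highway_values)
    return max(counts.values()) / min(counts.values())


# Hypothetical sample of "highway" tag values, invented for illustration only.
sample = ["residential"] * 6 + ["service"] * 3 + ["motorway"] * 1

shares = highway_class_distribution(sample)
ratio = imbalance_ratio(sample)
print(shares)
print(ratio)
```

A ratio close to 1 would indicate balanced classes; larger values indicate stronger skew. Comparing the per-country share dictionaries is one simple way to look for consistent patterns across regions.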

Page

65-68

Authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Position

Chair of GIScience HeiGIT Heidelberg Institute for Geoinformation Technology

H-Index(all)

56

H-Index(since 2020)

39

Research Interests

Geoinformatics

GIScience

VGI

Geomatics

Geographic Information Science

Edson Augusto Melanda

Universidade Federal de São Carlos

H-Index(all)

10

H-Index(since 2020)

5

Other Articles from authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Geo-spatial Information Science

An investigation of the temporality of OpenStreetMap data contribution activities

OpenStreetMap (OSM) is a dataset in constant change and this dynamic needs to be better understood. Based on 12-year time series of seven OSM data contribution activities extracted from 20 large cities worldwide, we investigate the temporal dynamic of OSM data production, more specifically, the auto- and cross-correlation, temporal trend, and annual seasonality of these activities. Furthermore, we evaluate and compare nine different temporal regression methods for forecasting such activities in horizons of 1–4 weeks. Several insights could be obtained from our analyses, including that the contribution activities tend to grow linearly in a moderate intra-annual cycle. Also, the performance of the temporal forecasting methods shows that they yield in general more accurate estimations of future contribution activities than a baseline metric, i.e. the arithmetic average of recent previous observations. In particular, the …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

arXiv preprint arXiv:2401.04218

Distortions in Judged Spatial Relations in Large Language Models: The Dawn of Natural Language Geographic Data?

We present a benchmark for assessing the capability of Large Language Models (LLMs) to discern intercardinal directions between geographic locations and apply it to three prominent LLMs: GPT-3.5, GPT-4, and Llama-2. This benchmark specifically evaluates whether LLMs exhibit a hierarchical spatial bias similar to humans, where judgments about individual locations' spatial relationships are influenced by the perceived relationships of the larger groups that contain them. To investigate this, we formulated 14 questions focusing on well-known American cities. Seven questions were designed to challenge the LLMs with scenarios potentially influenced by the orientation of larger geographical units, such as states or countries, while the remaining seven targeted locations less susceptible to such hierarchical categorization. Among the tested models, GPT-4 exhibited superior performance with 55.3% accuracy, followed by GPT-3.5 at 47.3%, and Llama-2 at 44.7%. The models showed significantly reduced accuracy on tasks with suspected hierarchical bias. For example, GPT-4's accuracy dropped to 32.9% on these tasks, compared to 85.7% on others. Despite these inaccuracies, the models identified the nearest cardinal direction in most cases, suggesting associative learning, embodying human-like misconceptions. We discuss the potential of text-based data representing geographic relationships directly to improve the spatial reasoning capabilities of LLMs.

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

ERDKUNDE

How to assess the needs of vulnerable population groups towards heat-sensitive routing? An evidence-based and practical approach to reducing urban heat stress

Heat poses a significant risk to human health, particularly for vulnerable populations, such as pregnant women, older individuals, young children and people with pre-existing medical conditions. In view of this, we formulated a heat stress-avoidant routing approach in Heidelberg, Germany, to ensure mobility and support day-to-day activities in urban areas during heat events. Although the primary focus is on pedestrians, it is also applicable to cyclists. To obtain a nuanced understanding of the needs and demands of the wider population, especially vulnerable groups, and to address the challenge of reducing urban heat stress, we used an inter- and transdisciplinary approach. The needs of vulnerable groups, the public, and the city administration were identified through participatory methods and various tools, including interactive city walks. Solution approaches and adaptation measures to prevent heat stress were evaluated and integrated into the development of a heat-avoiding route service through a co-design process. The findings comprise the identification of perceived hotspots for heat (such as large public spaces in the city centre with low shading levels), the determination of commonly reported symptoms resulting from severe heat (e.g., fatigue or lack of concentration), and the assessment of heat adaptation measures that were rated positively, including remaining in the shade and delaying errands. Additionally, we analysed and distinguished between individual and community adaptation strategies. Overall, many respondents did not accurately perceive the risk of heat stress in hot weather, despite severe limitations. As a result, the heat …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Environmental Monitoring and Assessment

Carbon fluxes related to land use and land cover change in Baden-Württemberg

Spatially explicit information on carbon fluxes related to land use and land cover change (LULCC) is of value for the implementation of local climate change mitigation strategies. However, estimates of these carbon fluxes are often aggregated to larger areas. We estimated committed gross carbon fluxes related to LULCC in Baden-Württemberg, Germany, using different emission factors. In doing so, we compared four different data sources regarding their suitability for estimating the fluxes: (a) a land cover dataset derived from OpenStreetMap (OSMlanduse); (b) OSMlanduse with removal of sliver polygons (OSMlanduse cleaned); (c) OSMlanduse enhanced with a remote sensing time series analysis (OSMlanduse+); (d) the LULCC product of Landschaftsveränderungsdienst (LaVerDi) from the German Federal Agency of Cartography and Geodesy. We produced a high range of carbon flux estimates, mostly caused …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Engineering Proceedings

Urban Heat Island Intensity Prediction in the Context of Heat Waves: An Evaluation of Model Performance

Urban heat islands, characterized by higher temperatures in cities compared to surrounding areas, have been studied using various techniques. However, during heat waves, existing models often underestimate the intensity of these heat islands compared to empirical measurements. To address this, an hourly time-series-based model for predicting heat island intensity during heat wave conditions is proposed. The model was developed and validated using empirical data from the National Monitoring Network in Temuco, Chile. Results indicate a strong correlation (r > 0.98) between the model’s predictions and actual monitoring data. Additionally, the study emphasizes the importance of considering the unique microclimatic characteristics and built environment of each city when modelling urban heat islands. Factors such as urban morphology, land cover, and anthropogenic heat emissions interact in complex ways, necessitating tailored modelling approaches for the accurate representation of heat island phenomena.

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Proceedings of the OSM Science

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

1 GIScience Research Group, Heidelberg University, Heidelberg, Germany; francis.andorful@uni-heidelberg.de, nir.fulman@uni-heidelberg.de
2 HeiGIT-Heidelberg Institute for Geoinformation Technology, 69120 Heidelberg, Germany; sven.lautenbach@uni-heidelberg.de, christina.ludwing@uni-heidelberg.de, herfort@uni-heidelberg.de, zipf@uni-heidelberg.de

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

International Journal of Applied Earth Observation and Geoinformation

Semi-supervised water tank detection to support vector control of emerging infectious diseases transmitted by Aedes Aegypti

The disease-transmitting mosquito Aedes Aegypti is an increasing global threat. It breeds in small artificial containers such as rainwater tanks and can be characterized by a short flight range. The resulting high spatial variability of abundance is challenging to model. Therefore, we tested an approach to map water tank density as a spatial proxy for urban Aedes Aegypti habitat suitability. Water tank density mapping was performed by a semi-supervised self-training approach based on openly accessible satellite imagery for the city of Rio de Janeiro. We ran a negative binomial generalized linear regression model to evaluate the statistical significance of water tank density for modeling inner-urban Aedes Aegypti distribution measured by an entomological surveillance system between January 2019 and December 2021. Our proposed semi-supervised model outperformed a supervised model for water tank detection …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Nature Communications

A spatio-temporal analysis investigating completeness and inequalities of global urban building data in OpenStreetMap

OpenStreetMap (OSM) has evolved as a popular dataset for global urban analyses, such as assessing progress towards the Sustainable Development Goals. However, many analyses do not account for the uneven spatial coverage of existing data. We employ a machine-learning model to infer the completeness of OSM building stock data for 13,189 urban agglomerations worldwide. For 1,848 urban centres (16% of the urban population), OSM building footprint data exceeds 80% completeness, but completeness remains lower than 20% for 9,163 cities (48% of the urban population). Although OSM data inequalities have recently receded, partially as a result of humanitarian mapping efforts, a complex unequal pattern of spatial biases remains, which vary across various human development index groups, population sizes and geographic regions. Based on these results, we provide recommendations for data …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Challenges and solution approach for greenhouse gas emission inventories at fine spatial resolutions–the example of the Rhine-Neckar district

This discussion paper originated as the concluding publication of one of the pilot projects of the "Climate Action Science" research initiative at Heidelberg Center for the Environment (HCE), focusing on the Rhine-Neckar district and the city of Heidelberg. The aim of the explorative project was to generate a first overview on greenhouse gas emission data in order to initiate climate action of various actors and to provide well-founded support by using accurate information. The focus during the pilot phase was on the collection, compilation and evaluation of the quality of heterogeneous data sets and methods for a greenhouse gas emission inventory, as well as on the information preparation and evaluation of different inventory and presentation options. These should in turn be adapted to the needs of different users and fields of application. The study focused on different German approaches to greenhouse gas accounting, especially in Baden-Württemberg compared to other German states, and in detail on the City of Heidelberg compared to the surrounding municipalities in the Rhine-Neckar district. The overarching goal is to use the results beyond the case study projected here as a stimulus and preliminary work for further projects and activities in the overall "Climate Action Science" project. Several difficulties were encountered in processing the emissions inventory and compiling various data sets on emissions in the study area. Three basic situations were identified: 1. Desired data is not available (measurements required), 2. Desired data is not freely accessible (stakeholder involvement), 3. Data generation via proxy data. In the pilot phase …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

European Neuropsychopharmacology

Initial response to the COVID-19 pandemic on real-life well-being, social contact and roaming behavior in patients with schizophrenia, major depression and healthy controls: A …

The COVID-19 pandemic strongly impacted people's daily lives. However, it remains unknown how the pandemic situation affects daily-life experiences of individuals with preexisting severe mental illnesses (SMI). In this real-life longitudinal study, the acute onset of the COVID-19 pandemic in Germany did not cause the already low everyday well-being of patients with schizophrenia (SZ) or major depression (MDD) to decrease further. On the contrary, healthy participants’ well-being, anxiety, social isolation, and mobility worsened, especially in healthy individuals at risk for mental disorder, but remained above the levels seen in patients. Despite being stressful for healthy individuals at risk for mental disorder, the COVID-19 pandemic had little additional influence on daily-life well-being in psychiatric patients with SMI. This highlights the need for preventive action and targeted support of this vulnerable population.

Edson Augusto Melanda

Universidade Federal de São Carlos

Transactions of The Royal Society of Tropical Medicine and Hygiene

Schistosomiasis mansoni and hydrographical conditions in São Carlos, São Paulo, Brazil

Background: In Brazil, schistosomiasis mansoni cases still occur, even in non-endemic areas. This study aimed to evaluate schistosomiasis mansoni cases and to delimit water collections investigated for infested planorbidae in São Carlos, São Paulo, Brazil. Methods: A cross-sectional descriptive study and spatial analysis of schistosomiasis mansoni cases notified in the city from January 2005 to December 2017 was conducted. The study used geographical information system software to map residential and leisure exposures to water courses and bodies and related them to planorbidae surveys of São Paulo state. Results: During the study period, 32 cases were notified. The main forms were intestinal and hepatosplenic. Twenty-eight cases were allochthonous, two autochthonous and two indeterminate. Eleven patients (33.3%) had contact with water …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

arXiv preprint arXiv:2307.02574

Semi-Supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation

Accurate building height estimation is key to the automatic derivation of 3D city models from emerging big geospatial data, including Volunteered Geographical Information (VGI). However, an automatic solution for large-scale building height estimation based on low-cost VGI data is currently missing. The fast development of VGI data platforms, especially OpenStreetMap (OSM) and crowdsourced street-view images (SVI), offers a stimulating opportunity to fill this research gap. In this work, we propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OSM data to generate low-cost and open-source 3D city modeling in LoD1. The proposed method consists of three parts: first, we propose an SSL schema with the option of setting a different ratio of "pseudo label" during the supervised regression; second, we extract multi-level morphometric features from OSM data (i.e., buildings and streets) for the purpose of inferring building height; last, we design a building floor estimation workflow with a pre-trained facade object detection network to generate "pseudo label" from SVI and assign it to the corresponding OSM building footprint. In a case study, we validate the proposed SSL method in the city of Heidelberg, Germany and evaluate the model performance against the reference data of building heights. Based on three different regression models, namely Random Forest (RF), Support Vector Machine (SVM), and Convolutional Neural Network (CNN), the SSL method leads to a clear performance boost in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters, which …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Remote Sensing of Environment

Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks

Artificial intelligence (AI) approaches nowadays have gained remarkable success in single-modality-dominated remote sensing (RS) applications, especially with an emphasis on individual urban environments (e.g., single cities or regions). Yet these AI models tend to meet the performance bottleneck in the case studies across cities or regions, due to the lack of diverse RS information and cutting-edge solutions with high generalization ability. To this end, we build a new set of multimodal remote sensing benchmark datasets (including hyperspectral, multispectral, SAR) for the study purpose of the cross-city semantic segmentation task (called C2Seg dataset), which consists of two cross-city scenes, i.e., Berlin-Augsburg (in Germany) and Beijing-Wuhan (in China). Beyond the single city, we propose a high-resolution domain adaptation network, HighDAN for short, to promote the AI model's generalization ability from …

2023/12/15

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

ISPRS International Journal of Geo-Information

Private Vehicles Greenhouse Gas Emission Estimation at Street Level for Berlin Based on Open Data

As one of the major greenhouse gas (GHG) emitters that has not seen significant emission reductions in the previous decades, the transportation sector requires special attention from policymakers. Policy decisions therefore need to be supported by traffic emission assessments. Estimations of traffic emissions often rely on huge amounts of actual traffic data whose availability is limited, hampering the transferability of the estimation approaches in time and space. Here, we propose a high-resolution estimation of traffic emissions, which is based entirely on open data, such as the road network and points of interest derived from OpenStreetMap (OSM). We estimated the annual average daily GHG emissions from individual motor traffic for the OSM road network in Berlin by combining the estimated Annual Average Daily Traffic Volume (AADTV) with respective emission factors. The AADTV was calculated by simulating car trips with the open routing engine Openrouteservice, weighted by activity functions based on statistics of the German Mobility Panel. Our estimated total annual GHG emissions were 7.3 million t CO2 equivalent. The highest emissions were estimated for the motorways and major roads connecting the city center with the outskirts. The application of the approach to Berlin showed that the method could reflect the traffic pattern. As the input data is freely available, the approach can be applied to other study areas within Germany with little additional effort.

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

Traffic Speed Modelling to Improve Travel Time Estimation in Openrouteservice

Time-dependent traffic speed information at a street level is important for routing services to estimate accurate travel times and to recommend routes which avoid traffic congestion. Still, most open-source routing machines that use OpenStreetMap (OSM) as the primary data source rely on static driving speeds derived from OSM tags, since comprehensive traffic speed data is not openly available. In this study, a method was developed to model traffic speed by hour of day at a street level using open data from OpenStreetMap, Twitter and population data. The modelled traffic speed data was subsequently integrated into the open-source routing engine openrouteservice to improve travel time estimation in route planning. Machine learning models were trained for ten cities worldwide using traffic speed data from Uber Movement as reference data. Different indicators based on geolocation and timestamp of Twitter data as well as a geographically adapted betweenness centrality indicator were evaluated for their potential to improve prediction accuracy. In all cities, the Twitter indicators improved the model, although this effect was only visible for certain road types. The centrality indicator improved the model as well but to a lesser extent. The Google Routing API was used as reference to evaluate the accuracy in travel time estimation. Deviations in travel times were regionally different and were partly alleviated by including the raw traffic data by Uber or the modelled traffic speed data in openrouteservice.

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Analyzing and Improving the Quality and Fitness for Purpose of OpenStreetMap as Labels in Remote Sensing Applications

OpenStreetMap (OSM) is a well-known example of volunteered geographic information. It has evolved to one of the most used geographic databases. As data quality of OSM is heterogeneous both in space and across different thematic domains, data quality assessment is of high importance for potential users of OSM data. As use cases differ with respect to their requirements, it is not data quality per se that is of interest for the user but fitness for purpose. We investigate the fitness for purpose of OSM to derive land-use and land-cover labels for remote sensing-based classification models. Therefore, we evaluated OSM land-use and land-cover information by two approaches: (1) assessment of OSM fitness for purpose for samples in relation to intrinsic data quality indicators at the scale of individual OSM objects and (2) assessment of OSM-derived multi-labels at the scale of remote sensing patches (1.22 × 1.22 km) in combination with deep learning approaches. The first approach was applied to 1000 randomly selected relevant OSM objects. The quality score for each OSM object in the samples was combined with a large set of intrinsic quality indicators (such as the experience of the mapper, the number of mappers in a region, and the number of edits made to the object) and auxiliary information about the location of the OSM object (such as the continent or the ecozone). Intrinsic indicators were derived by a newly developed tool

Edson Augusto Melanda

Universidade Federal de São Carlos

GEOBIA Technique for Semi-Automatic Identification of the Regular Bed of Watercourses from Images Obtained by RPA

Human occupation of land causes changes in land use that affect the environment and the lives of the population. One consequence is the reduction of native vegetation around watercourses, which interferes with the quality and availability of this resource. In view of these considerations, the present study aimed to delineate the edge of the regular bed channel of a watercourse located in the municipality of São Carlos–SP, by means of the geographic object-based image technique, using images obtained by a remotely piloted aircraft. The results showed a Kappa index and an overall accuracy of 0.86 and 0.93, respectively, indicating that the classification was considered "very good". This study may therefore contribute to future projects that aim to calculate the width of watercourses and to correctly demarcate permanent preservation areas.

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

UndercoverEisAgenten-Monitoring Permafrost Thaw in the Arctic using Local Knowledge and UAVs

The Arctic is experiencing severe changes to its landscapes due to the thawing of permafrost, driven by a temperature increase across the Arctic that is twice the global average due to global warming. This process, which affects the livelihoods of indigenous people, is also associated with the further release of greenhouse gases and with ecological impacts on the Arctic flora and fauna. These small-scale changes and disturbances to the land surface caused by permafrost thaw have been inadequately documented. To better understand and monitor land surface changes, the project "UndercoverEisAgenten" is using a combination of local knowledge, satellite remote sensing, and data from unmanned aerial vehicles (UAVs) to study permafrost thaw impacts in Northwest Canada. The high-resolution UAV data will serve as a baseline for further analysis of optical and radar remote sensing time series data. The project aims to achieve two main goals: 1) to demonstrate the value of using UAV data in remote regions of the global north, and 2) to involve young citizen scientists from schools in Canada and Germany in the process. By involving students, the project aims not only to expand the use of remote sensing in these regions, but also to provide educational opportunities for the participating students. By using UAVs and satellite imagery, the project aims to develop a comprehensive archive of observable surface features that indicate the degree of permafrost degradation. This will be accomplished through the use of automatic image enhancement techniques, as well as classical …

Other articles from Proceedings of the OSM Science journal

Oliver O'Brien

University College London

Proceedings of the OSM Science

Towards an open high-resolution land use dataset in Great Britain–Comparing and consolidating retail centre areas from open data sources

Great Britain does not have a comprehensive and openly licenced high-resolution land use dataset that includes detail on building usage, but OpenStreetMap (OSM) has potential as a good base for creation of such a dataset [1]. OSM’s quality and completeness is highly variable, but often good and improving, including for land use mapping [2, 3]. This research evaluates use of separate open datasets to augment OSM for Great Britain. The research focuses on retail areas as these have recently been impacted both by internet shopping and the COVID pandemic [4]. This paper evaluates three generally openly available datasets showing retail centre extents across Great Britain, analysing each by areal footprint and, where available, premises counts. Firstly, the Consumer Data Research Centre (CDRC)’s Retail Centres Boundaries 2022 product, secondly non-domestic Energy Performance Certificates (EPCs) geolocated with Unique Property Reference Numbers (UPRNs), filtered for retail categories, and finally OSM land use retail polygons on their own.

Florian Ledermann

Technische Universität Wien

Proceedings of the OSM Science

Mapping public space in urban neighbourhoods using OpenStreetMap data

OpenStreetMap (OSM) enriches the exploration and study of urban landscapes. In this research project, we aim to use OSM data to investigate urban public spaces from a distributional justice perspective. While public spaces are acknowledged as an important resource for urban society, it becomes important, in light of ongoing trends towards privatization, commercialization, and festivalization, to critically observe and reflect on the extent to which resources, rights and opportunities regarding public space are distributed equally. The amount, accessibility, and character of public space can differ between cities and neighbourhoods. A quantitative analysis of public space could offer insights into the distribution and availability of public space. To this end, we propose a framework for the identification and categorization of these spaces based on OSM data. The framework aims to enable both the mapping of public spaces as well as an evaluation of the share of public space. We also hope to investigate the potential of OSM data. Some preliminary findings and an introductory overview of the research process, with an emphasis on its cartographic aspects, were presented in a previous publication [1]. The inspiration for this research is the so-called Nolli map, a map of Rome dating back over 250 years. Giovanni Battista Nolli, an Italian architect, engineer and cartographer, analysed the urban fabric beyond the structure of roads and buildings. In his work, titled 'La Nuova Topografia di Roma', Nolli mapped the interior and exterior spaces of Rome in high detail as a figure-ground map with contrasting dark and light sections. This distinction is commonly …

2023/12/29

Randall Guensler

Georgia Institute of Technology

Proceedings of the OSM Science

Assessing bike-transit accessibility with OpenStreetMap

Low-density land use, sprawl, and Euclidean zoning (i.e., separation of commercial and residential land uses) can reduce the effectiveness of public transit by reducing the number of homes, amenities, services, and jobs near transit stops [1, 2]. This gives rise to the first-last mile problem, where transit riders must travel long distances to access transit from their origin and from transit to their destination. Bicycles as a first and/or last-mile mode (henceforth referred to as bike-transit) can extend the service coverage area of a transit stop or station by allowing transit users to cover a greater distance in the same amount of time [3]. Not only can people reach transit stops faster on a bicycle than they could by walking; people using bicycles can also reach more transit stops within the same time frame. Lastly, people may be able to avoid bus feeder routes and cycle directly to higher service quality transit routes (such as rail). Despite bike-transit's potential for shortening travel times, it is not commonly modeled in traditional travel demand modeling or trip planners. This is because bike-transit trips are computationally intensive to calculate given the number of possible transit stop pairs and departure times. Our solution to this is to use bicycle and transit shortest path algorithms to demonstrate how bike-transit improves public transit's accessibility to destinations by reducing overall travel times, transit waiting times, and the number of transit transfers needed. Previous work has assessed how bike-transit improves the effectiveness of public transit [4-7]; in this case, we will take an in-depth look at three different locations that vary in …

Andy South

Liverpool School of Tropical Medicine

Proceedings of the OSM Science

Developing a data validation method with OpenStreetMap Senegal and the Ministry of Health in support of accurate health facility data

This research examines the collaboration between a local OpenStreetMap chapter and health authorities to improve health facility data accuracy. By utilizing open data and statistical methods, communities can empower Ministries of Health, address Sustainable Development Goals (SDGs) indicators, and enhance emergency response. The healthsites.io Digital Public Good [1] has been working with OpenStreetMap Senegal [2] since 2017. We have established a data collaborative focused on health facility data that lives in OpenStreetMap. The collaborative is a semi-formal network that identifies geospatial data on health and shares it to OpenStreetMap. It works to identify gaps and barriers to sharing, defines methodologies and data models for sharing, and supports stakeholders with sharing and using data, especially for decision-making. Crucially, the collaborative saves validated data to OpenStreetMap, which means that successive projects are able to benefit from the work even when programs end. Accurate health data plays a vital role in effective healthcare planning, resource allocation, and emergency response. However, existing data sources often suffer from inaccuracies and limited sharing, hindering the potential for informed decision-making and comprehensive health interventions. In response, we have developed an Emergency Health data validation method [3]. The method involves local stakeholders and the healthsites.io open data platform as a means to enhance data quality and accessibility. The Global Fund's COVID-19 response mechanism underscores the significance of accurate health facility data [4]. This mechanism …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Proceedings of the OSM Science

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

1 GIScience Research Group, Heidelberg University, Heidelberg, Germany; francis.andorful@uni-heidelberg.de, nir.fulman@uni-heidelberg.de 2 HeiGIT-Heidelberg Institute for Geoinformation Technology, 69120 Heidelberg, Germany; sven.lautenbach@uni-heidelberg.de, christina.ludwing@uni-heidelberg.de, herfort@uni-heidelberg.de, zipf@uni-heidelberg.de

Hao Li

Heidelberg University

Proceedings of the OSM Science

Beyond Two Dimensions: Large-Scale Building Height Mapping in OpenStreetMap via Synthetic Aperture Radar and Street-View Imagery

In the past decades, the world has been comprehensively mapped in 2D; however, the vertical dimension remains underexplored despite its huge potential. For instance, as of August 2023, more than 571 million buildings are mapped in OpenStreetMap (OSM) according to statistics from Taginfo, but less than 3% of them are associated with height values via the key/value pairs heights=*, though one can often estimate the height information via OSM key/value pairs such as building:levels=* and stories=*. Mapping human settlements as a 3D representation of reality requires an accurate description of vertical dimensions besides the 2D footprints and shapes. A 3D representation of human settlement is important in many aspects, including public health, urban planning, environmental monitoring, and disaster management. In this context, a list of the most relevant 3D building attributes mainly includes but is not limited to building height, building floor, and roof type [1, 2]. For instance, building height is a key and fundamental factor in post-disaster (e.g., earthquake and flood) damage and situation assessment. Similarly, the roof type information is beneficial in estimating photovoltaic electricity potential at scale. As defined in CityGML 2.0 [3], 3D building models are divided into five levels of detail (LoDs) [4]. In LoD0, only the 2D footprint information is involved in the model. In LoD1, the LoD0 footprint is extruded by the building height, and the resulting cuboid is the LoD1 model. In LoD2, the 3D roof structure information is added to the LoD1 model. The LoD3 model further contains facade elements such as windows and doors. The …

Nir Fulman

Tel Aviv University

Proceedings of the OSM Science

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

1 GIScience Research Group, Heidelberg University, Heidelberg, Germany; francis.andorful@uni-heidelberg.de, nir.fulman@uni-heidelberg.de 2 HeiGIT-Heidelberg Institute for Geoinformation Technology, 69120 Heidelberg, Germany; sven.lautenbach@uni-heidelberg.de, christina.ludwing@uni-heidelberg.de, herfort@uni-heidelberg.de, zipf@uni-heidelberg.de

Sven Lautenbach

Ruprecht-Karls-Universität Heidelberg

Proceedings of the OSM Science

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

1 GIScience Research Group, Heidelberg University, Heidelberg, Germany; francis.andorful@uni-heidelberg.de, nir.fulman@uni-heidelberg.de 2 HeiGIT-Heidelberg Institute for Geoinformation Technology, 69120 Heidelberg, Germany; sven.lautenbach@uni-heidelberg.de, christina.ludwing@uni-heidelberg.de, herfort@uni-heidelberg.de, zipf@uni-heidelberg.de

Edson Augusto Melanda

Universidade Federal de São Carlos

Proceedings of the OSM Science

OpenStreetMap Data for Automated Labelling Machine Learning Examples: The Challenge of Road Type Imbalance

Advances in Artificial Intelligence (AI) and, specifically, in Deep Learning (DL) have fostered geospatial analysis and remote sensing, culminating in the establishment of GeoAI [1, 2] and the solidification of research on methodologies and techniques for AI-assisted mapping [3-7]. Nevertheless, a particular challenge lies in the substantial demand for training examples in DL. Manual labelling of these examples is labour-intensive, consuming a considerable amount of time and financial resources. Alternatively, semi-automated or automated labelling of data emerges as a prominent solution, as exemplified by the tool ohsome2label [8], which harnesses data from OpenStreetMap [9] to label satellite images. However, moving from characterising object types (road, river, building) based on geometry to categorising them by attributes might result in an imbalanced class distribution in the utilised Machine Learning (ML) dataset. Such imbalances are common in numerous practical applications. Learning from skewed datasets can be particularly challenging and often requires non-conventional ML techniques. A comprehensive awareness of the issues associated with class imbalance, as well as strategies for mitigating them, is essential [10]. In the context of spatial data, the distribution of classes can vary from country to country and region to region, adding a new layer of complexity and exacerbating this issue. In this context, an analysis was conducted on the distribution of road types, defined by the values of the OSM "highway" tag, in nations with diverse profiles. The aim was to evaluate the extent of class imbalance and to identify any consistent patterns in the …

Kari Watkins

Georgia Institute of Technology

Proceedings of the OSM Science

Assessing bike-transit accessibility with OpenStreetMap

Low-density land use, sprawl, and Euclidean zoning (i.e., separation of commercial and residential land uses) can reduce the effectiveness of public transit by reducing the number of homes, amenities, services, and jobs near transit stops [1, 2]. This gives rise to the first-last mile problem, where transit riders must travel long distances to access transit from their origin and from transit to their destination. Bicycles as a first and/or last-mile mode (henceforth referred to as bike-transit) can extend the service coverage area of a transit stop or station by allowing transit users to cover a greater distance in the same amount of time [3]. Not only can people reach transit stops faster on a bicycle than they could by walking; people using bicycles can also reach more transit stops within the same time frame. Lastly, people may be able to avoid bus feeder routes and cycle directly to higher service quality transit routes (such as rail). Despite bike-transit's potential for shortening travel times, it is not commonly modeled in traditional travel demand modeling or trip planners. This is because bike-transit trips are computationally intensive to calculate given the number of possible transit stop pairs and departure times. Our solution to this is to use bicycle and transit shortest path algorithms to demonstrate how bike-transit improves public transit's accessibility to destinations by reducing overall travel times, transit waiting times, and the number of transit transfers needed. Previous work has assessed how bike-transit improves the effectiveness of public transit [4-7]; in this case, we will take an in-depth look at three different locations that vary in …

Silvana Camboim

Universidade Federal do Paraná

Proceedings of the OSM Science

Fostering OSM's Micromapping Through Combined Use of Artificial Intelligence and Street-View Imagery

Map scale is fundamental to cartography. The International Cartographic Association's definition of a map [1] already explicitly emphasizes the selection of specific features, pointing to the process of cartographic generalization. This operation simplifies the representation of geographic data to produce a map at a given scale [2]. In the past, geospatial data was collected in a standardized way. Representations at larger scales could then be derived. However, OpenStreetMap (OSM) has changed this approach. Each object can be captured individually, resulting in digital representations of varying and distinct accuracies. Nevertheless, products such as OSM's Slippy Map Tiles are designed for consistent scales, maintaining a uniform scale for each tile. This characteristic gives users a seemingly seamless view of the map. The tile layers on the OpenStreetMap website range in maximum scale from around 1:2000 to 1:250 [3, 4], suitable for mapping urban detail. In the context of collaborative mapping, the term Micromapping was coined as the "mapping of small geographic objects" [5] and appears as a topic of growing interest among the OpenStreetMap and general Volunteered Geographic Information community [5, 6, 7, 8]. It can be helpful in many applications, such as mapping large-scale infrastructure [5]; pedestrian security and flow prediction [6]; detailed 3D model generation and indoor mapping [7]; assistive technologies like tactile map generation [8]; and also general-purpose micromapping rendering [9]. There is also some discussion among the OSM community about its idiosyncrasies, comprising issues that may arise when there is a bigger …