Distortions in Judged Spatial Relations in Large Language Models: The Dawn of Natural Language Geographic Data?

arXiv preprint arXiv:2401.04218

Published On 2024/1/8

We present a benchmark for assessing the capability of Large Language Models (LLMs) to discern intercardinal directions between geographic locations and apply it to three prominent LLMs: GPT-3.5, GPT-4, and Llama-2. This benchmark specifically evaluates whether LLMs exhibit a hierarchical spatial bias similar to humans, where judgments about individual locations' spatial relationships are influenced by the perceived relationships of the larger groups that contain them. To investigate this, we formulated 14 questions focusing on well-known American cities. Seven questions were designed to challenge the LLMs with scenarios potentially influenced by the orientation of larger geographical units, such as states or countries, while the remaining seven targeted locations less susceptible to such hierarchical categorization. Among the tested models, GPT-4 exhibited superior performance with 55.3% accuracy, followed by GPT-3.5 at 47.3%, and Llama-2 at 44.7%. The models showed significantly reduced accuracy on tasks with suspected hierarchical bias. For example, GPT-4's accuracy dropped to 32.9% on these tasks, compared to 85.7% on others. Despite these inaccuracies, the models identified the nearest cardinal direction in most cases, suggesting associative learning, embodying human-like misconceptions. We discuss the potential of text-based data representing geographic relationships directly to improve the spatial reasoning capabilities of LLMs.
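The ground truth behind a question of this kind can be derived from coordinates. The following is a minimal sketch, not the authors' benchmark code: it computes the initial great-circle bearing between two cities and maps it to an intercardinal direction and to the nearest cardinal direction. The city pair, the approximate coordinates, and the function names are assumptions made for illustration, and the pair is not necessarily one of the 14 benchmark questions.

```python
import math

# Approximate (latitude, longitude) in decimal degrees; illustrative values,
# not taken from the benchmark.
CITIES = {
    "Reno": (39.53, -119.81),
    "San Diego": (32.72, -117.16),
}

INTERCARDINALS = ["northeast", "southeast", "southwest", "northwest"]
CARDINALS = ["north", "east", "south", "west"]


def initial_bearing(origin, target):
    """Initial great-circle bearing from origin to target, in degrees clockwise from north."""
    lat1, lon1 = map(math.radians, origin)
    lat2, lon2 = map(math.radians, target)
    dlon = lon2 - lon1
    east = math.sin(dlon) * math.cos(lat2)
    north = (math.cos(lat1) * math.sin(lat2)
             - math.sin(lat1) * math.cos(lat2) * math.cos(dlon))
    return math.degrees(math.atan2(east, north)) % 360.0


def intercardinal(origin, target):
    """Quadrant of the bearing: 0-90 is northeast, 90-180 southeast, and so on."""
    return INTERCARDINALS[int(initial_bearing(origin, target) // 90) % 4]


def nearest_cardinal(origin, target):
    """Cardinal direction whose axis lies within 45 degrees of the bearing."""
    bearing = initial_bearing(origin, target)
    return CARDINALS[int(((bearing + 45.0) % 360.0) // 90)]


reno, san_diego = CITIES["Reno"], CITIES["San Diego"]
print(intercardinal(reno, san_diego), nearest_cardinal(reno, san_diego))
```

For this pair the intercardinal answer is southeast while the nearest cardinal direction is south, which shows how a response can get the cardinal component right and still miss the intercardinal one.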

Authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Position

Chair of GIScience; HeiGIT (Heidelberg Institute for Geoinformation Technology)

H-Index(all)

56

H-Index(since 2020)

39

Research Interests

Geoinformatics

GIScience

VGI

Geomatics

Geographic Information Science

Abdulkadir Memduhoğlu

Harran Üniversitesi

Position

Geomatic Engineering

H-Index(all)

10

H-Index(since 2020)

10

Research Interests

Geospatial Semantic Web

Cartography

GIScience

Nir Fulman

Tel Aviv University

Position

PhD candidate

H-Index(all)

5

H-Index(since 2020)

5

Research Interests

Spatial modeling

Transportation

GIS

Other Articles from authors

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Geo-spatial Information Science

An investigation of the temporality of OpenStreetMap data contribution activities

OpenStreetMap (OSM) is a dataset in constant change and this dynamic needs to be better understood. Based on 12-year time series of seven OSM data contribution activities extracted from 20 large cities worldwide, we investigate the temporal dynamic of OSM data production, more specifically, the auto- and cross-correlation, temporal trend, and annual seasonality of these activities. Furthermore, we evaluate and compare nine different temporal regression methods for forecasting such activities in horizons of 1–4 weeks. Several insights could be obtained from our analyses, including that the contribution activities tend to grow linearly in a moderate intra-annual cycle. Also, the performance of the temporal forecasting methods shows that they generally yield more accurate estimations of future contribution activities than a baseline metric, i.e. the arithmetic average of recent previous observations. In particular, the …
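The baseline mentioned in this abstract, the arithmetic average of recent observations, might look like the sketch below. The window length, the weekly counts, and the function names are hypothetical, and the abstract does not say which error measure the study used; mean absolute error is an assumption here.

```python
def mean_baseline(series, window, horizon):
    """Forecast each of the next `horizon` steps as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    level = sum(series[-window:]) / window
    return [level] * horizon


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical weekly counts of one contribution activity.
history = [10, 12, 14, 16]
future = [16, 17, 18]

forecast = mean_baseline(history, window=2, horizon=len(future))
print(forecast, mean_absolute_error(future, forecast))
```

A forecasting method is only worth using if its error on held-out weeks is lower than the error of this flat forecast.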

Nir Fulman

Tel Aviv University

Cities

A project-based view of urban dynamics: Analyzing ‘leapfrogging’ and fringe development in Israel

Analyzing urban pattern dynamics based on construction projects, we classify them into three types - infilling, fringe, and leapfrogging, and focus on the role of leapfrogging projects as seeds for new developments, leading to uncontrolled urban sprawl. To study the leapfrogging phenomenon, we investigate the sprawl of three Israeli cities - Netanya, Haifa, and Safed over 54 years from 1964 to 2018 and conduct a country-wide analysis of the urban sprawl of all 66 Israeli municipalities between 2013 and 2018. Our analysis is based on a country-wide GIS database of roads, buildings, other infrastructure elements, and development plans, as well as high-resolution aerial photos covering the investigated areas and periods. We uncover and characterize a positive feedback mechanism of rapid leapfrogging developments that attract further developments in their proximity and emphasize the potential of leapfrogging …

Abdulkadir Memduhoğlu

Harran Üniversitesi

Environment and Planning B: Urban Analytics and City Science

Semantic enrichment of building functions through geospatial data integration and ontological inference

The comprehensive definition of buildings in urban spatial databases (SDBs) and geographic information systems (GISs) is crucial for city management, considering their essential role in cities. Modern urban geospatial applications such as smart cities need a rich-information infrastructure, relying upon urban SDBs and GISs, powered by multiple data sources. In this respect, many geospatial techniques are available for acquiring geometric data of buildings, but their semantic definitions require extra effort. Today, volunteered geographic information (VGI) platforms are assumed to be alternative geospatial data sources and their integration with official datasets provides a new opportunity for the enrichment of urban geospatial datasets. In this context, geospatial semantic web technologies can contribute to the semantic enrichment process. This study presents a methodology for enriching various building datasets …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

ERDKUNDE

How to assess the needs of vulnerable population groups towards heat-sensitive routing? An evidence-based and practical approach to reducing urban heat stress

Heat poses a significant risk to human health, particularly for vulnerable populations, such as pregnant women, older individuals, young children and people with pre-existing medical conditions. In view of this, we formulated a heat stress-avoidant routing approach in Heidelberg, Germany, to ensure mobility and support day-to-day activities in urban areas during heat events. Although the primary focus is on pedestrians, it is also applicable to cyclists. To obtain a nuanced understanding of the needs and demands of the wider population, especially vulnerable groups, and to address the challenge of reducing urban heat stress, we used an inter- and transdisciplinary approach. The needs of vulnerable groups, the public, and the city administration were identified through participatory methods and various tools, including interactive city walks. Solution approaches and adaptation measures to prevent heat stress were evaluated and integrated into the development of a heat-avoiding route service through a co-design process. The findings comprise the identification of perceived hotspots for heat (such as large public spaces in the city centre with low shading levels), the determination of commonly reported symptoms resulting from severe heat (e.g., fatigue or lack of concentration), and the assessment of heat adaptation measures that were rated positively, including remaining in the shade and delaying errands. Additionally, we analysed and distinguished between individual and community adaptation strategies. Overall, many respondents did not accurately perceive the risk of heat stress in hot weather, despite severe limitations. As a result, the heat …

Nir Fulman

Tel Aviv University

Epidemiology

Residential Greenness and Long-term Mortality Among Patients Who Underwent Coronary Artery Bypass Graft Surgery

Background: Studies have reported inverse associations between exposure to residential greenness and mortality. Greenness has also been associated with better surgical recovery. However, studies have had small sample sizes and have been restricted to clinical settings. We investigated the association between exposure to residential greenness and all-cause mortality among a cohort of cardiac patients who underwent coronary artery bypass graft (CABG) surgery. Methods: We studied this cohort of 3,128 CABG patients between 2004 and 2009 at seven cardiothoracic departments in Israel and followed patients until death or 1st May 2021. We collected covariate information at the time of surgery and calculated the patient-level average normalized difference vegetation index (NDVI) over the entire follow-up in a 300 m buffer from the home address. We used Cox proportional hazards regression models to …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Proceedings of the OSM Science

OpenStreetMap Data for Automated Labelling Machine Learning Examples: The Challenge of Road Type Imbalance

Advances in Artificial Intelligence (AI) and, specifically, in Deep Learning (DL) have fostered geospatial analysis and remote sensing, culminating in the establishment of GeoAI [1, 2] and the solidification of research on methodologies and techniques for AI-assisted mapping [3-7]. Nevertheless, a particular challenge lies in the substantial demand for training examples in DL. Manual labelling of these examples is labour-intensive, consuming a considerable amount of time and financial resources. Alternatively, semi or automated labelling of data emerges as a prominent solution, as exemplified by the tool ohsome2label [8], which harnesses data from the OpenStreetMap [9] to label satellite images. However, moving from characterising object types (road, river, building) based on geometry to categorising them by attributes might result in an imbalanced class distribution in the utilised Machine Learning (ML) dataset. Such imbalances are common in numerous practical applications. Learning from skewed datasets can be particularly challenging and often requires non-conventional ML techniques. A comprehensive awareness of the issues associated with class imbalance, as well as strategies for mitigating them, is essential [10]. In the context of spatial data, the distribution of classes can vary from country to country and region to region, adding a new layer of complexity and exacerbating this issue. In this context, an analysis was conducted on the distribution of road types, defined by the values of the OSM "highway" tag, in diverse-profile nations. The aim was to evaluate the extent of class imbalance and to identify any consistent patterns in the …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Environmental Monitoring and Assessment

Carbon fluxes related to land use and land cover change in Baden-Württemberg

Spatially explicit information on carbon fluxes related to land use and land cover change (LULCC) is of value for the implementation of local climate change mitigation strategies. However, estimates of these carbon fluxes are often aggregated to larger areas. We estimated committed gross carbon fluxes related to LULCC in Baden-Württemberg, Germany, using different emission factors. In doing so, we compared four different data sources regarding their suitability for estimating the fluxes: (a) a land cover dataset derived from OpenStreetMap (OSMlanduse); (b) OSMlanduse with removal of sliver polygons (OSMlanduse cleaned); (c) OSMlanduse enhanced with a remote sensing time series analysis (OSMlanduse+); (d) the LULCC product of Landschaftsveränderungsdienst (LaVerDi) from the German Federal Agency of Cartography and Geodesy. We produced a high range of carbon flux estimates, mostly caused …

Abdulkadir Memduhoğlu

Harran Üniversitesi

Intercontinental Geoinformation Days

Evaluating the ground point classification performance of Agisoft Metashape Software

This paper investigates the complex process of extracting bare land surfaces from point clouds, with a particular focus on filtering out objects such as trees, buildings, and vehicles. It underscores the importance of this task in diverse domains, including cadastral surveying, base mapping, and various geographical sciences, all while excluding specific reference to LiDAR and GIS applications. The research provides an extensive exploration of different algorithms used for point cloud filtering, culminating in a comprehensive evaluation of Agisoft's ground point filtering algorithm in contrast to the well-recognized CSF method. For this comparison, an Unmanned Aerial Vehicle (UAV) flight was performed at Harran University's Osmanbey campus to generate the necessary point cloud. The results of this assessment reveal that a significant portion of the obtained points pertains to ground points, underscoring the efficacy of the filtering process in producing Digital Terrain Models (DTMs). The numerical findings demonstrate that the overall accuracy stands at 0.002, with minimal Type I and Type II errors, reaffirming the robust performance of the filtering algorithms in producing accurate DTMs.

2023/12/19

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Engineering Proceedings

Urban Heat Island Intensity Prediction in the Context of Heat Waves: An Evaluation of Model Performance

Urban heat islands, characterized by higher temperatures in cities compared to surrounding areas, have been studied using various techniques. However, during heat waves, existing models often underestimate the intensity of these heat islands compared to empirical measurements. To address this, an hourly time-series-based model for predicting heat island intensity during heat wave conditions is proposed. The model was developed and validated using empirical data from the National Monitoring Network in Temuco, Chile. Results indicate a strong correlation (r > 0.98) between the model’s predictions and actual monitoring data. Additionally, the study emphasizes the importance of considering the unique microclimatic characteristics and built environment of each city when modelling urban heat islands. Factors such as urban morphology, land cover, and anthropogenic heat emissions interact in complex ways, necessitating tailored modelling approaches for the accurate representation of heat island phenomena.

Nir Fulman

Tel Aviv University

Transport Policy

Investigating occasional travel patterns based on smartcard transactions

Public transportation (PT) studies often neglect non-routine trips, focusing predominantly on commuting. However, recent research revealed that occasional trips make up a substantial portion of public transport journeys, and traveler preferences for non-routine trips diverge from their preferences for regular commuting. We study non-routine trips based on a database of 63 million smartcard (SC) records of PT boardings made in Israel during June 2019. The characteristics of these trips are revealed by clustering PT users’ boarding records based on the location of the boarding stops and time of day, applying an extended DBSCAN algorithm. Our major findings are that (1) conventional home-work-home commuters are a minority in Israel and constitute less than 15% of the riders; (2) at least 30% of the PT trips do not belong to any cluster and can be classified as occasional; (3) the vast majority of users make both …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Proceedings of the OSM Science

Exploring road and points of interest (POIs) associations in OpenStreetMap, a new paradigm for OSM road class prediction

1 GIScience Research Group, Heidelberg University, Heidelberg, Germany; francis.andorful@uni-heidelberg.de, nir.fulman@uni-heidelberg.de
2 HeiGIT-Heidelberg Institute for Geoinformation Technology, 69120 Heidelberg, Germany; sven.lautenbach@uni-heidelberg.de, christina.ludwing@uni-heidelberg.de, herfort@uni-heidelberg.de, zipf@uni-heidelberg.de

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

International Journal of Applied Earth Observation and Geoinformation

Semi-supervised water tank detection to support vector control of emerging infectious diseases transmitted by Aedes Aegypti

The disease transmitting mosquito Aedes Aegypti is an increasing global threat. It breeds in small artificial containers such as rainwater tanks and can be characterized by a short flight range. The resulting high spatial variability of abundance is challenging to model. Therefore, we tested an approach to map water tank density as a spatial proxy for urban Aedes Aegypti habitat suitability. Water tank density mapping was performed by a semi-supervised self-training approach based on open accessible satellite imagery for the city of Rio de Janeiro. We ran a negative binomial generalized linear regression model to evaluate the statistical significance of water tank density for modeling inner-urban Aedes Aegypti distribution measured by an entomological surveillance system between January 2019 and December 2021. Our proposed semi-supervised model outperformed a supervised model for water tank detection …

Abdulkadir Memduhoğlu

Harran Üniversitesi

Intercontinental Geoinformation Days

Determination of suitable areas for wind power plant installation in Şanlıurfa with GIS and AHP

Energy consumption is increasing day by day, and the demand for energy is growing with it. Currently, energy is obtained in two different ways: renewable and non-renewable sources. Non-renewable sources cannot meet the increasing energy demand due to their limited availability. The use of non-renewable sources results in carbon emissions, posing a risk to living beings. Therefore, people have turned to renewable energy sources in search of alternative solutions. Wind energy, as one of the renewable energy sources, stands out as a clean and sustainable energy source. To harness this source, it is necessary to establish Wind Power Plants (WPP). For the efficient operation of these plants, proper site selection is crucial. In this study, Geographic Information Systems (GIS) and the Analytic Hierarchy Process (AHP) method were utilized to determine suitable areas for WPP installation in a part of Şanlıurfa province. Seven criteria were identified to be considered in determining the suitable areas for WPP, including wind speed, land use, slope, distance to power transmission lines, distance to highways, distance to active fault lines, and distance to residential areas. These criteria were compared with each other using the AHP method to establish a priority ranking. Based on this ranking, the areas suitable for WPP installation were evaluated. The evaluation results identified a total area of 42.68 km² as highly suitable for WPP installation in the study area.
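The pairwise comparison step of AHP can be sketched as follows. This is an illustration under stated assumptions, not the study's computation: only three of the seven criteria are shown, the judgment values in `PAIRWISE` are hypothetical (the abstract does not give the comparison matrix), and the weights are obtained with the row geometric-mean approximation rather than an exact eigenvector solver.

```python
import math

# Hypothetical judgments on Saaty's 1-9 scale: PAIRWISE[i][j] states how much
# more important criterion i is than criterion j. Not taken from the study.
CRITERIA = ["wind speed", "slope", "distance to highways"]
PAIRWISE = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]


def ahp_weights(matrix):
    """Priority weights from the normalized geometric mean of each row."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_index(matrix, weights):
    """CI = (lambda_max - n) / (n - 1), with lambda_max estimated from A·w."""
    n = len(matrix)
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    return (lambda_max - n) / (n - 1)


weights = ahp_weights(PAIRWISE)
ranking = sorted(zip(CRITERIA, weights), key=lambda item: item[1], reverse=True)
for name, weight in ranking:
    print(f"{name}: {weight:.3f}")
print("consistency index:", round(consistency_index(PAIRWISE, weights), 6))
```

In a GIS workflow the resulting weights would typically multiply reclassified criterion rasters in a weighted overlay. A consistency index well above zero indicates that the pairwise judgments contradict each other and should be revisited.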

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Nature Communications

A spatio-temporal analysis investigating completeness and inequalities of global urban building data in OpenStreetMap

OpenStreetMap (OSM) has evolved as a popular dataset for global urban analyses, such as assessing progress towards the Sustainable Development Goals. However, many analyses do not account for the uneven spatial coverage of existing data. We employ a machine-learning model to infer the completeness of OSM building stock data for 13,189 urban agglomerations worldwide. For 1,848 urban centres (16% of the urban population), OSM building footprint data exceeds 80% completeness, but completeness remains lower than 20% for 9,163 cities (48% of the urban population). Although OSM data inequalities have recently receded, partially as a result of humanitarian mapping efforts, a complex unequal pattern of spatial biases remains, which vary across various human development index groups, population sizes and geographic regions. Based on these results, we provide recommendations for data …

Nir Fulman

Tel Aviv University

AGILE: GIScience Series

Exploring Non-Routine Trips Through Smartcard Transaction Analysis

Public transportation (PT) studies often overlook non-routine trips, focusing on commuting trips. However, recent research reveals that occasional trips comprise a significant portion of public transportation trips. Furthermore, traveler preferences for non-routine trips essentially differ from their preferences for regular commuting. We investigate non-routine trips based on a database of 63 million records of PT boardings made in Israel during June 2019. The behavioral patterns of PT users are revealed by clustering their boarding records based on the location of the boarding stops and time of day, applying an extended DBSCAN algorithm. Our major findings are that (1) conventional home-work-home commuters are a minority and constitute less than 15% of Israeli riders; (2) at least 30% of the PT trips do not belong to any cluster and can be classified as occasional; (3) the vast majority of users make both recurrent and occasional trips. A linear regression model provides a good estimate (R² = 0.85) of the number of occasional boardings at a stop as a function of the total number of boardings, time of day, and land use composition around the trip origin.
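The idea of treating unclustered boardings as occasional can be illustrated with plain DBSCAN. The sketch below does not implement the extended DBSCAN variant used in the study, whose details are not given here. The boarding records, the `(stop position in km, hour of day)` feature layout, and the `eps` and `min_pts` values are hypothetical, and the two features are assumed to be already scaled to comparable units.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (cluster id >= 0, or NOISE)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels


# Hypothetical boardings of one user: (stop position in km, hour of day).
boardings = [
    (1.0, 8.0), (1.1, 8.2), (0.9, 7.9), (1.0, 8.1),      # morning routine
    (5.0, 17.0), (5.1, 17.3), (4.9, 16.8), (5.0, 17.1),  # evening routine
    (9.0, 13.0),                                         # isolated boarding
]

labels = dbscan(boardings, eps=0.5, min_pts=3)
occasional_share = labels.count(NOISE) / len(labels)
print(labels, occasional_share)
```

Boardings labeled `NOISE` belong to no recurrent pattern, which corresponds to the "do not belong to any cluster" criterion in the abstract.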

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

Challenges and solution approach for greenhouse gas emission inventories at fine spatial resolutions–the example of the Rhine-Neckar district

This discussion paper originated as the concluding publication of one of the pilot projects of the "Climate Action Science" research initiative at Heidelberg Center for the Environment (HCE), focusing on the Rhine-Neckar district and the city of Heidelberg. The aim of the explorative project was to generate a first overview on greenhouse gas emission data in order to initiate climate action of various actors and to provide well-founded support by using accurate information. The focus during the pilot phase was on the collection, compilation and evaluation of the quality of heterogeneous data sets and methods for a greenhouse gas emission inventory, as well as on the information preparation and evaluation of different inventory and presentation options. These should in turn be adapted to the needs of different users and fields of application. The study focused on different German approaches to greenhouse gas accounting, especially in Baden-Württemberg compared to other German states, and in detail on the City of Heidelberg compared to the surrounding municipalities in the Rhine-Neckar district. The overarching goal is to use the results beyond the case study projected here as a stimulus and preliminary work for further projects and activities in the overall "Climate Action Science" project. Several difficulties were encountered in processing the emissions inventory and compiling various data sets on emissions in the study area. Three basic situations were identified: 1. Desired data is not available (measurements required), 2. Desired data is not freely accessible (stakeholder involvement), 3. Data generation via proxy data. In the pilot phase …

Alexander Zipf

Ruprecht-Karls-Universität Heidelberg

European Neuropsychopharmacology

Initial response to the COVID-19 pandemic on real-life well-being, social contact and roaming behavior in patients with schizophrenia, major depression and healthy controls: A …

The COVID-19 pandemic strongly impacted people's daily lives. However, it remains unknown how the pandemic situation affects daily-life experiences of individuals with preexisting severe mental illnesses (SMI). In this real-life longitudinal study, the acute onset of the COVID-19 pandemic in Germany did not cause the already low everyday well-being of patients with schizophrenia (SZ) or major depression (MDD) to decrease further. On the contrary, healthy participants’ well-being, anxiety, social isolation, and mobility worsened, especially in healthy individuals at risk for mental disorder, but remained above the levels seen in patients. Despite being stressful for healthy individuals at risk for mental disorder, the COVID-19 pandemic had little additional influence on daily-life well-being in psychiatric patients with SMI. This highlights the need for preventive action and targeted support of this vulnerable population.
