New research identifies areas for improving geospatial modeling
January 30, 2025

Geospatial modeling has become an important tool for environmental monitoring, used to manage environmental risks and track natural disaster threats. Modeling results are a key source of information for forecasting and understanding the consequences of different socio-economic development and climate change scenarios. Geospatial research increasingly draws on machine learning to monitor vegetation cover, evaluate ecosystem functioning and biodiversity, and combat fires, floods, and droughts.

Researchers publish many articles reporting improved models, solutions to fundamental problems, and new approaches, including in the natural sciences. However, these publications often suffer from methodological errors, largely due to limitations inherent in machine learning. A group of scientists from Skoltech and the AIRI Institute analyzed the academic literature, identified typical problems, and proposed solutions. The results are presented in a review article published in Nature Communications.

“We found that among the key difficulties are data imbalance, spatial autocorrelation, biases in the data, forecast errors, and difficulties in estimating model uncertainty. Although these problems are well known, existing approaches often ignore them, limiting themselves to standard training and validation procedures for machine learning models,” said one of the lead authors, Diana Koldasbayeva, a PhD student in Skoltech's Computational and Data Science and Engineering program.
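One of the difficulties named above, spatial autocorrelation, is routinely mishandled by standard random cross-validation: nearby, correlated points end up in both the training and test folds, inflating scores. A common remedy is spatial block cross-validation, where whole geographic blocks are held out together. The sketch below uses synthetic data; the block size, model, and feature set are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

# Synthetic geospatial data: 500 points with (lon, lat) plus extra features
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(500, 2))
X = np.column_stack([coords, rng.normal(size=(500, 3))])
y = coords[:, 0] + coords[:, 1] + rng.normal(size=500)  # spatially smooth target

# Assign each point to a spatial block (here, a 2x2-degree grid cell) so that
# nearby, autocorrelated points never straddle the train/test split.
blocks = (coords[:, 0] // 2).astype(int) * 10 + (coords[:, 1] // 2).astype(int)

model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=GroupKFold(n_splits=5), groups=blocks)
print(scores)  # R^2 per spatial fold, typically lower than random-split CV
```

Comparing these block-wise scores against ordinary shuffled cross-validation gives a quick estimate of how much spatial leakage inflates the apparent model quality.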

Image 1. Example of uncertainty quantification for spatial mapping. a) A map of one of the target variables, soil pH (water) in the topsoil layer. b) A map where higher values indicate greater uncertainty in the data. Source: Challenges in data-driven geospatial modeling for environmental research and practice.
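As a loose illustration of how paired prediction and uncertainty maps like those in Image 1 can be produced, one common technique (not necessarily the one used in the paper) takes the spread across an ensemble's members as an uncertainty proxy. Everything below, including the data and the stand-in target, is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: (lon, lat) features and a smooth target
# standing in for a soil property such as pH.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

forest = RandomForestRegressor(n_estimators=200, random_state=1).fit(X, y)

# Predict along a transect with every tree; the per-tree mean gives the
# value map (cf. panel a) and the per-tree spread an uncertainty map (cf. b).
grid = np.column_stack([np.linspace(0, 10, 50), np.full(50, 5.0)])
per_tree = np.stack([tree.predict(grid) for tree in forest.estimators_])
mean_map = per_tree.mean(axis=0)    # predicted value at each location
uncert_map = per_tree.std(axis=0)   # higher value = less certain prediction
```

More principled alternatives such as conformal prediction or Bayesian models exist; the ensemble spread is simply a cheap, widely used baseline.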

“To eliminate these limitations, it is necessary to develop methods that consider the unique features of environmental data and spatiotemporal processes. The article presents a unified approach to solving such problems, including tools and techniques to improve the accuracy of models, as well as recommendations for improving their quality assessment. We hope that our results will help scientists from different countries choose their research directions,” said Alexey Zaitsev, a study co-author, an assistant professor at the Skoltech AI Center, and the head of the Skoltech-Sberbank Applied Research Laboratory.

The authors also identified key areas for the development of geospatial research with the specifics of environmental data in mind and presented their own collection of advanced tools, resources, and projects that use geospatial technologies to solve environmental problems. The collection is publicly available on GitHub, and the authors invite colleagues to use and extend it.

“In the study, we introduced new datasets, models, and approaches to ensure the quality of work needed to implement applied scientific developments in the industry and solve the problem of interpretability of data-based forecasts. For example, it is extremely important to create well-organized databases. Better data naturally leads to a reduction in the distortions associated with imbalance and autocorrelation. We anticipate the emergence of self-supervised models trained on large semi-curated datasets for geospatial mapping in environmental research, similar to what we have seen in language modeling and computer vision,” commented Professor Evgeny Burnaev, the director of the Skoltech AI Center and head of the Learnable Intelligence research group at AIRI.