Swarm Robotics, Simplified UAV Control, and Autonomous Driving: Research and Projects by Skoltech Roboticists
February 07, 2025

On International Robotics Day, we will explore the work of engineers and scientists at the Intelligent Space Robotics Lab (Skoltech Center for Digital Engineering), led by Associate Professor Dzmitry Tsetserukou.




Prometheus

A system for teleoperating a manipulator or humanoid robot fitted with a two-finger gripper with force feedback. The purpose of the system is to generate datasets of various actions for neural network training. Its special feature is that it lets the operator feel the force with which the gripper is holding an object. The system works as follows: a force sensor on the manipulator is mounted together with a force-transfer mechanism. When the gripper begins to compress an object, electronics that read and filter the sensor data transmit the compression force to a second board, which processes it and drives a motor built into a special handle. The handle thus not only controls the manipulator's grip, but also conveys the sensation of the grip force back to the user.
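As a rough illustration of the loop described above, here is a minimal Python sketch. All names, constants, and interfaces are hypothetical placeholders; the actual Prometheus firmware is not shown in the article.

```python
# Hypothetical sketch of a force-feedback teleoperation loop.
import time

ALPHA = 0.2        # low-pass filter smoothing factor (assumed)
MOTOR_GAIN = 0.5   # maps filtered grip force (N) to a handle motor command (assumed)

def read_force_sensor() -> float:
    """Placeholder for the ADC read of the gripper's force sensor, in newtons."""
    return 0.0

def set_handle_motor(torque: float) -> None:
    """Placeholder for the command sent to the motor inside the operator's handle."""
    pass

def teleop_loop(steps: int = 1000, rate_hz: float = 200.0) -> None:
    filtered = 0.0
    for _ in range(steps):
        raw = read_force_sensor()
        # Exponential smoothing stands in for the filtering performed by
        # the sensor-side electronics before the data is transmitted.
        filtered = ALPHA * raw + (1.0 - ALPHA) * filtered
        # The second board converts grip force into resistive torque on the
        # handle, so the operator feels how hard the gripper is squeezing.
        set_handle_motor(MOTOR_GAIN * filtered)
        time.sleep(1.0 / rate_hz)
```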

The system is sensitive enough that even the moment of gripping a disposable plastic cup can be felt. Experiments with users found that they squeezed objects about 40 percent less forcefully than usual, making it possible to handle fragile objects safely.

The lab team designed the electrical circuits around microcontrollers, fabricated the circuit boards, developed the force sensor and handle designs, and wrote all the software for both the microcontroller and the computer. The system is ready for small-scale production.



Research on Improving Navigation Capabilities for Vehicles or Mobile Robots


In the laboratory, researchers are working on integrating natural language into an end-to-end autonomous driving model and into visual-language navigation tasks, aiming at reliable navigation for vehicles and mobile robots. Vision-language models have had a significant impact on cross-modal learning, and harnessing the knowledge in these models to serve robotics is an attractive direction for researchers.

The current work, titled “METDrive: Multi-modal End-to-end Autonomous Driving with Temporal Guidance”, will be presented at the international conference ICRA 2025 (CORE2023 A*-ranked, the No. 1 conference in robotics, indexed in Scopus and WoS, H-index (SJR) = 222).

The paper introduces METDrive, a system that enhances autonomous driving through the use of multi-modal data. It analyzes both the static and dynamic aspects of road conditions, making driving safer. METDrive uses vehicle state data (steering angle, throttle, route information) together with sensor inputs to predict future trajectories. The authors propose a loss function that takes temporal changes into account.
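The exact formulation of METDrive's loss is given in the paper; as a generic illustration of temporal weighting in a waypoint loss, here is a PyTorch sketch. The exponential decay schedule and L1 error are assumptions for the example, not the authors' formulation.

```python
# Illustrative time-weighted waypoint loss (not the METDrive paper's exact loss).
import torch

def temporal_waypoint_loss(pred: torch.Tensor,
                           target: torch.Tensor,
                           decay: float = 0.9) -> torch.Tensor:
    """pred, target: (batch, T, 2) future waypoints in the ego frame.

    Earlier waypoints get larger weights, since near-term errors
    matter more for the current control command.
    """
    T = pred.shape[1]
    weights = decay ** torch.arange(T, dtype=pred.dtype, device=pred.device)
    per_step = (pred - target).abs().sum(dim=-1)   # L1 error per waypoint
    return (weights * per_step).mean()

# Example: 8 predicted waypoints for a batch of 4 trajectories.
loss = temporal_waypoint_loss(torch.randn(4, 8, 2), torch.randn(4, 8, 2))
```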

The system was tested on the CARLA platform (Longest6 benchmark), where it demonstrated strong results: a 70% overall driving score, a 94% route completion rate, and a 0.78 infraction score. This showcases its effectiveness in challenging simulated driving conditions.


The CognitiveOS operating system


CognitiveOS is the first operating system for cognitive robots that can run on different robotic platforms. It consists of nine modules that help a robot perform complex tasks in the real world. Depending on the task, modules can be customised, modified, or disabled, and new ones can be added, which makes the system flexible and scalable compared with traditional approaches. CognitiveOS simplifies the work of researchers and developers, sparing them the complexity of building such a system from scratch.
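To make the pluggable-module idea concrete, here is a minimal Python sketch of a module registry. The module names and interfaces are hypothetical, not CognitiveOS's actual API (the authors provide the real code, as noted below).

```python
# Minimal sketch of a registry of swappable cognitive modules.
from typing import Callable, Dict

class CognitiveCore:
    """Holds named modules that can be added, replaced, or disabled per task."""

    def __init__(self) -> None:
        self.modules: Dict[str, Callable[[dict], dict]] = {}

    def register(self, name: str, module: Callable[[dict], dict]) -> None:
        self.modules[name] = module   # adding or replacing a module

    def disable(self, name: str) -> None:
        self.modules.pop(name, None)  # disabling a module for this task

    def step(self, state: dict) -> dict:
        # Each enabled module reads and enriches the shared state in turn.
        for module in self.modules.values():
            state = module(state)
        return state

core = CognitiveCore()
core.register("perception", lambda s: {**s, "objects": ["cup"]})
core.register("reasoning", lambda s: {**s, "plan": f"grasp {s['objects'][0]}"})
print(core.step({}))  # {'objects': ['cup'], 'plan': 'grasp cup'}
```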

Experiments have shown that the system performs well, adapts to different environments, and outperforms other models in the Logical Reasoning category. The authors also provide code and data to reproduce the system.


Haptic technology 


At the Intelligent Space Robotics Laboratory (ISR Lab), we are pioneering advancements in human-robot interaction (HRI) and haptics through artificial intelligence. As part of the HRI and Haptics team, my work focuses on integrating LLMs (large language models), VLMs (vision-language models), and VLAs (vision-language-action models) to enhance robotic perception, decision-making, and interaction. These AI systems allow our robots to understand and respond to their environment more intuitively, enabling applications such as:

  • Bimanual manipulation with collaborative robots, where robots generate actions based on AI-driven scene interpretation
  • Assistive navigation for blind individuals, using VLMs to enhance spatial awareness
  • Haptic interfaces combining neural networks and multimodal AI, for more immersive and natural interactions
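A common pattern behind these applications is letting a vision-language model interpret the scene and mapping its answer to a robot command. The sketch below is a hedged illustration of that pattern only; `query_vlm` is a hypothetical stand-in for any captioning/VQA model, not a specific lab API.

```python
# Hypothetical VLM-to-action pipeline sketch.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    target: str

def query_vlm(image_path: str, question: str) -> str:
    """Placeholder: send an image and question to a VLM, return its text answer."""
    return "a mug on the left edge of the table"

def interpret_and_act(image_path: str) -> Action:
    answer = query_vlm(image_path, "What graspable object is closest to the robot?")
    # A real system would ground the phrase with detection and pose estimation;
    # here we simply forward the phrase as the action target.
    return Action(name="grasp", target=answer)

print(interpret_and_act("scene.png"))
```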


This year, we are excited to present our research at IEEE/ACM Int. Conf. on Human Robot Interaction (HRI 2025) (Core2023 A) and IEEE Int. Conf. on Robotics and Automation (ICRA 2025) (Core2023 A*, No. 1 Conference in Robotics), while awaiting results from CHI 2025 and World Haptics.


LogiSAR: a heterogeneous robot swarm with deep learning for autonomous navigation in unstructured environments

The key advantage of heterogeneous robots in a swarm is their ability to distribute delivery tasks among agents equipped with different, specialized tools, allowing the team to address complex missions in partially unknown environments.

The main areas of our research include deep reinforcement learning algorithms for task allocation and decentralised route finding for multi-robot teams, as well as algorithms for swarm landing of unmanned aerial vehicles (UAVs) on mobile robots. In these research areas, Skoltech's ISR laboratory is actively cooperating with the Indian Institute of Science Education and Research, Bhopal (IISER’B) and Hamad Bin Khalifa University (HBKU), whose teams work together on algorithms for multi-agent path finding by heterogeneous robot swarms.
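To show what distributing tasks among agents with different tools means computationally, here is a classic greedy-auction baseline in Python. It is only an illustration under assumed data structures; the lab's actual approach uses deep reinforcement learning.

```python
# Illustrative greedy auction for task allocation in a heterogeneous team.
def allocate(agents: dict, tasks: dict) -> dict:
    """agents: name -> set of tools; tasks: name -> (required_tool, cost per agent)."""
    assignment = {}
    free = set(agents)
    # Process tasks cheapest-first, assigning each to the cheapest feasible agent.
    for task, (tool, costs) in sorted(tasks.items(),
                                      key=lambda kv: min(kv[1][1].values())):
        feasible = [a for a in free if tool in agents[a]]
        if feasible:
            best = min(feasible, key=lambda a: costs[a])
            assignment[task] = best
            free.discard(best)   # each agent takes at most one task here
    return assignment

agents = {"ugv1": {"gripper"}, "uav1": {"camera"}, "uav2": {"camera", "winch"}}
tasks = {"inspect": ("camera", {"uav1": 2.0, "uav2": 3.0}),
         "deliver": ("winch", {"uav2": 5.0})}
print(allocate(agents, tasks))  # {'inspect': 'uav1', 'deliver': 'uav2'}
```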


UAV-VLA

The technology generates large-scale aerial missions using AI, opening the door to end-to-end autonomous flights. At its core, it uses satellite imagery and powerful text- and image-processing algorithms to create flight routes and action plans from simple requests. This enables better mission planning and faster decision-making, making UAV operations more efficient and convenient overall. The new method demonstrated improved accuracy in locating objects on a map and shorter route lengths. The development will be a useful tool for creating datasets for VLA models. The work will be presented at the IEEE/ACM International Conference on Human-Robot Interaction in March this year.
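A minimal sketch of the UAV-VLA idea: a plain-language request plus a satellite image in, a list of actions with GPS waypoints out. Here `find_objects` is a hypothetical stand-in for the vision-language search step, not the paper's API.

```python
# Hypothetical request-to-mission pipeline sketch.
from typing import List, Tuple

def find_objects(satellite_image: str, query: str) -> List[Tuple[float, float]]:
    """Placeholder: VLM-based search returning (lat, lon) of matching objects."""
    return [(55.751, 37.617), (55.753, 37.622)]

def plan_mission(satellite_image: str, request: str) -> List[dict]:
    targets = find_objects(satellite_image, request)
    # One "fly to and photograph" action per found object, in the order returned;
    # a real planner would also optimize the visiting order to shorten the route.
    return [{"action": "photograph", "lat": lat, "lon": lon} for lat, lon in targets]

print(plan_mission("area.tif", "photograph every building with a damaged roof"))
```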


Race.AI

The project presents a new method for the autonomous navigation of racing drones, based on Visual-Language-Action (VLA) models, that emulates the behavior of a human pilot. The main goal of the research is to develop algorithms that allow drones to adapt their flight strategy in real time based on visual and language information, bringing decision-making closer to the human level. To achieve this, the model was trained on a specialized racing-drone dataset, which yielded a high degree of generalization even in complex racing scenarios. The RaceVLA project opens new horizons in autonomous navigation, allowing drones to adapt quickly to changing conditions, which is especially important on dynamic racing tracks.
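As a final illustration, here is a minimal sketch of the VLA-style control loop implied by the description: a camera frame and a language instruction in, low-level velocity commands out. `vla_policy` is a hypothetical stand-in for the trained model, not RaceVLA's actual interface.

```python
# Hypothetical VLA control-loop sketch for a racing drone.
def vla_policy(frame, instruction: str):
    """Placeholder: a trained VLA maps an image and text to a velocity command."""
    return 1.0, 0.0, 0.0, 0.1   # vx, vy, vz, yaw_rate

def race_step(frame, instruction: str = "fly through the next gate"):
    vx, vy, vz, yaw_rate = vla_policy(frame, instruction)
    # A real system would forward these to the flight controller (for example
    # over MAVLink); that transport layer is omitted here.
    return vx, vy, vz, yaw_rate
```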