CognitiveDog: Large Multimodal Model Based System to Translate Vision and Language into Action of Quadruped Robot

The project focuses on CognitiveDog, a pioneering quadruped robot system built around a Large Multimodal Model (LMM) that is capable not only of communicating with humans verbally but also of physically interacting with the environment through object manipulation. The system is realized on a Unitree Go1 robot dog equipped with a custom gripper and demonstrates autonomous decision-making, independently determining the most appropriate actions and interactions with various objects to fulfill user-defined tasks. These tasks do not necessarily include direct instructions, challenging the robot to infer and execute them from natural language input and environmental cues. Key to this development is the robot's proficiency in navigating space using Visual-SLAM, effectively manipulating and transporting objects, and providing insightful natural language commentary during task execution.
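
To illustrate the general idea of translating language and scene context into robot actions, below is a minimal, hypothetical Python sketch. The skill names (go_to, pick_up, place, say), the prompt format, and the query_lmm stub are illustrative assumptions only and do not reflect the project's actual interface or model.

import json

SKILLS = {"go_to", "pick_up", "place", "say"}  # assumed primitive robot skills

PROMPT_TEMPLATE = (
    "You control a quadruped robot with a gripper.\n"
    "Available skills: go_to(target), pick_up(object), place(target), say(text).\n"
    "Scene objects: {objects}\n"
    "Task: {task}\n"
    "Reply with a JSON list of skill calls."
)

def query_lmm(prompt: str) -> str:
    """Stand-in for the real LMM call; returns a canned plan for the demo task."""
    return json.dumps([
        {"skill": "say", "args": ["Looking for something to drink."]},
        {"skill": "go_to", "args": ["table"]},
        {"skill": "pick_up", "args": ["water bottle"]},
        {"skill": "go_to", "args": ["user"]},
        {"skill": "place", "args": ["user"]},
    ])

def plan_and_validate(task: str, objects: list[str]) -> list[dict]:
    """Ask the LMM for a plan and keep only steps that use known skills."""
    raw = query_lmm(PROMPT_TEMPLATE.format(objects=", ".join(objects), task=task))
    steps = json.loads(raw)
    return [s for s in steps if s.get("skill") in SKILLS]

if __name__ == "__main__":
    # The task gives no direct instruction; the planner must infer the goal.
    plan = plan_and_validate("I'm thirsty", ["table", "water bottle", "user"])
    for step in plan:
        print(step["skill"], step["args"])

In such a pipeline, the validated skill sequence would then be dispatched to the robot's navigation (e.g., Visual-SLAM based) and manipulation controllers, while say steps provide the natural language commentary during execution.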


Project status: implemented.
