Develop algorithms and systems for service robots to interact with the open world.
Build autonomous service robots that perform useful, socially aware tasks for humans.
Equip service robots with complex navigation skills for unstructured environments.
First Author | Accepted to ICRA 2025
We introduce Robi Butler, a novel household robotic system that enables multimodal interactions with remote users. Building on advanced communication interfaces, Robi Butler allows users to monitor the robot's status, send text or voice instructions, and select target objects by hand pointing. At the core of our system is a high-level behavior module, powered by Large Language Models (LLMs), that interprets multimodal instructions to generate action plans. These plans are composed of open-vocabulary primitives supported by Vision Language Models (VLMs) that handle both text and pointing queries. The integration of these components allows Robi Butler to ground remote multimodal instructions in real-world home environments in a zero-shot manner.
Paper
Video
Website
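For readers curious how such a pipeline fits together, here is a minimal Python sketch of the idea: an LLM turns the multimodal instruction into a plan of open-vocabulary primitives, and a VLM grounds each text or pointing query. The function names (`call_llm`, `vlm_locate`, `dispatch`) and the primitive set are placeholders chosen for illustration, not Robi Butler's actual interfaces.
```python
# Minimal sketch of an LLM-planned, VLM-grounded primitive pipeline.
# `call_llm` and `vlm_locate` are hypothetical stand-ins, not Robi Butler's actual API.
PRIMITIVES = ["go_to", "pick", "place", "open", "close", "answer"]

def call_llm(instruction, primitives):
    """Hypothetical LLM call: maps a (multimodal) instruction to a list of primitive calls."""
    raise NotImplementedError

def vlm_locate(query):
    """Hypothetical VLM call: grounds a text description or a pointing gesture to an object."""
    raise NotImplementedError

def dispatch(primitive, target):
    """Hand off to a low-level skill (navigation, grasping, ...) -- not shown here."""
    print(f"executing {primitive} on {target}")

def execute(instruction, pointing=None):
    # The behavior module turns the instruction into a plan of open-vocabulary primitives,
    # e.g. [{"name": "pick", "target": "the mug I pointed at", "use_pointing": True}, ...].
    plan = call_llm(instruction, PRIMITIVES)
    for step in plan:
        # Each primitive grounds its target with the VLM; a pointing gesture,
        # when present, takes precedence over the text description.
        query = pointing if pointing is not None and step.get("use_pointing") else step["target"]
        dispatch(step["name"], vlm_locate(query))
```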
Collaboration | RSS 2024
In this work, we investigate combining tactile perception with language, which enables embodied systems to obtain physical properties through interaction and apply common-sense reasoning. We contribute a new dataset, PHYSICLEAR, which comprises both physical/property reasoning tasks and annotated tactile videos obtained using a GelSight tactile sensor. We then introduce OCTOPI, a system that leverages both tactile representation learning and large vision-language models to predict and reason about tactile inputs with minimal language fine-tuning. Our evaluations on PHYSICLEAR show that OCTOPI effectively uses intermediate physical property predictions to improve physical reasoning on both trained tasks and zero-shot reasoning.
Paper
Website
Code
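At a high level, the approach can be read as: encode the tactile video, project the embedding into the language model's input space, predict intermediate physical properties, then reason conditioned on them. The sketch below illustrates that flow under assumed interfaces; `TactileEncoder`, `project_to_llm`, and `llm_generate` are hypothetical placeholders, not the released OCTOPI code.
```python
# Schematic of tactile-conditioned property prediction and reasoning.
# All class/function names are illustrative placeholders, not OCTOPI's API.
import numpy as np

class TactileEncoder:
    """Stand-in for a learned tactile video encoder (e.g. trained on GelSight clips)."""
    def __init__(self, dim=256):
        self.dim = dim
    def __call__(self, video):
        # video: (T, H, W, C) GelSight frames -> a single embedding vector.
        return video.reshape(video.shape[0], -1).mean(axis=0)[: self.dim]

def project_to_llm(embedding, out_dim=4096):
    """Stand-in for the learned projection into the language model's embedding space."""
    rng = np.random.default_rng(0)
    W = rng.standard_normal((out_dim, embedding.shape[0])) / np.sqrt(embedding.shape[0])
    return W @ embedding

def llm_generate(prompt, tactile_tokens):
    """Hypothetical LLM call that accepts projected tactile tokens alongside text."""
    raise NotImplementedError

def reason_about_object(video, question):
    tactile = project_to_llm(TactileEncoder()(video))
    # Stage 1: ask for intermediate physical properties (hardness, roughness, ...).
    props = llm_generate("Describe the object's hardness and roughness.", tactile)
    # Stage 2: condition the final answer on those intermediate predictions.
    return llm_generate(f"Properties: {props}\nQuestion: {question}", tactile)
```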
Collaboration | Preprint
We propose a novel, expandable state representation that continuously expands and updates object attributes by leveraging the language model's inherent capabilities for context understanding and reasoning over past actions. The representation maintains a comprehensive record of each object's attributes and changes, enabling a robust retrospective summary of the sequence of actions leading to the current state. We validate our model in simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods on a variety of tasks requiring long-horizon state tracking and reasoning.
Paper
Video
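The core idea, an object-centric state whose attribute set grows as the language model infers new properties from actions, can be pictured as a small expandable key-value store with an LLM-backed update hook. The sketch below is illustrative only; `infer_attribute_updates` and the schema are assumptions, not the paper's implementation.
```python
# Sketch of an expandable, object-centric state representation.
# `infer_attribute_updates` is a hypothetical LLM call; the schema is illustrative.
from dataclasses import dataclass, field

@dataclass
class ObjectState:
    name: str
    attributes: dict = field(default_factory=dict)   # attribute keys expand over time
    history: list = field(default_factory=list)      # actions that touched this object

def infer_attribute_updates(action, objects):
    """Hypothetical LLM-backed inference of which objects changed and how."""
    raise NotImplementedError

class WorldState:
    def __init__(self):
        self.objects = {}

    def apply_action(self, action):
        # Ask the LLM which objects the action affects and how their attributes change,
        # e.g. "put apple in fridge" -> {"apple": {"location": "fridge", "temperature": "cold"}}.
        for obj, updates in infer_attribute_updates(action, self.objects):
            state = self.objects.setdefault(obj, ObjectState(obj))
            state.attributes.update(updates)      # new attribute keys are added on the fly
            state.history.append(action)

    def summarize(self, obj):
        """Retrospective summary used for long-horizon reasoning."""
        s = self.objects[obj]
        return f"{s.name}: {s.attributes} (after {len(s.history)} actions)"
```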
Research Mentor | IROS 2023
This paper presents an autonomous nonholonomic multi-robot system and a hierarchical autonomy framework for collaborative luggage trolley transportation. The framework finds kinematically feasible paths, computes online motion plans, and provides feedback control, enabling the multi-robot system to handle long lines of luggage trolleys and to navigate around obstacles and pedestrians while satisfying multiple inherently complex and coupled constraints. We demonstrate the collaborative trolley transportation system on practical transportation tasks in complex and dynamic environments.
Paper
Video
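The hierarchy can be summarized as three layers: a global planner that searches kinematically feasible paths, an online motion planner that replans around obstacles and pedestrians, and a feedback controller that keeps the trolley line on track. The skeleton below only names assumed interfaces for these layers and is not the paper's code.
```python
# Skeleton of the hierarchical autonomy loop: global path -> online motion plan -> feedback.
# All planner/controller classes are illustrative placeholders.
import time

class GlobalPathPlanner:
    def plan(self, start, goal, static_map):
        """Search a kinematically feasible path for the nonholonomic robot-trolley train."""
        raise NotImplementedError

class OnlineMotionPlanner:
    def replan(self, path, obstacles, pedestrians, trolley_states):
        """Produce a short-horizon trajectory respecting the coupled trolley constraints."""
        raise NotImplementedError

class FeedbackController:
    def track(self, trajectory, robot_state):
        """Return wheel commands that keep the trolley line on the planned trajectory."""
        raise NotImplementedError

def run(start, goal, static_map, sense, dt=0.1):
    path = GlobalPathPlanner().plan(start, goal, static_map)
    planner, controller = OnlineMotionPlanner(), FeedbackController()
    while True:
        obstacles, pedestrians, robot_state, trolley_states = sense()
        traj = planner.replan(path, obstacles, pedestrians, trolley_states)
        controller.track(traj, robot_state)
        time.sleep(dt)
```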
Research Mentor | ICRA 2023
We propose a novel guidance robot system built around a comfort-based concept.
To guide a human safely and more comfortably to the target position in complex environments, the proposed force planner plans the forces experienced by the human using a force-based human motion model, and the proposed motion planner generates the motion commands for the robot and the controllable leash to track the planned forces.
The system has been deployed on the Unitree Laikago quadrupedal platform and validated in real-world scenarios.
Paper
Video
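To make the two-stage idea concrete, the sketch below pairs a simple force-driven human motion model (a damped point mass, an assumption for illustration) with a planner that bounds the guiding force for comfort and places the robot and leash to realize it. The gains, limits, and leash model are all illustrative, not the published system's parameters.
```python
# Sketch of the two-stage idea: plan a comfortable guiding force with a simple
# force-driven human motion model, then command the robot/leash to track it.
import numpy as np

M, DAMPING, DT = 70.0, 30.0, 0.1           # assumed human mass, damping, time step
F_MAX = 25.0                               # assumed comfort bound on leash force [N]

def human_step(pos, vel, force):
    """Force-based human motion model: a damped point mass pulled by the leash."""
    acc = (force - DAMPING * vel) / M
    return pos + vel * DT, vel + acc * DT

def plan_force(pos, vel, waypoint, kp=8.0, kd=12.0):
    """Plan the force the human should feel: PD toward the waypoint, clipped for comfort."""
    f = kp * (waypoint - pos) - kd * vel
    n = np.linalg.norm(f)
    return f if n <= F_MAX else f * (F_MAX / n)

def robot_command(human_pos, planned_force, leash_len=1.5):
    """Place the robot so the taut leash reproduces the planned force direction."""
    direction = planned_force / (np.linalg.norm(planned_force) + 1e-9)
    robot_pos = human_pos + leash_len * direction
    leash_tension = np.linalg.norm(planned_force)   # tracked by the controllable leash
    return robot_pos, leash_tension

pos, vel, goal = np.zeros(2), np.zeros(2), np.array([5.0, 2.0])
for _ in range(100):
    f = plan_force(pos, vel, goal)
    target_robot_pos, tension = robot_command(pos, f)
    pos, vel = human_step(pos, vel, f)   # simulate the human's response to the planned force
```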
First Author | ICRA 2022
We propose a novel mobile manipulation system with applications in luggage trolley collection.
The system integrates a compact hardware design, a progressive perception strategy, and an MPC-based planning framework, enabling it to collect trolleys efficiently and robustly in dynamic and complex environments.
We demonstrate the design and framework on real trolley collection tasks, experimentally validating their effectiveness and robustness.
Paper
Video
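As a rough illustration of the receding-horizon planning idea, the sketch below uses random shooting over a unicycle model in place of the paper's MPC formulation: sample control sequences, roll them out, score them against the trolley pose and detected obstacles, and execute only the first control before replanning. The model, horizon, and cost weights are assumptions.
```python
# Sampling-based stand-in for a receding-horizon planner approaching a trolley.
import numpy as np

DT, HORIZON, N_SAMPLES = 0.1, 15, 256

def rollout(state, controls):
    """Unicycle model rollout; state = (x, y, theta), controls = (v, omega) per step."""
    traj = [state]
    for v, w in controls:
        x, y, th = traj[-1]
        traj.append((x + v * np.cos(th) * DT, y + v * np.sin(th) * DT, th + w * DT))
    return np.array(traj[1:])

def cost(traj, trolley_pose, obstacles):
    goal_cost = np.linalg.norm(traj[-1, :2] - np.asarray(trolley_pose[:2]))
    # Penalize passing close to detected obstacles along the horizon.
    obs_cost = sum(np.sum(np.linalg.norm(traj[:, :2] - np.asarray(o), axis=1) < 0.5)
                   for o in obstacles)
    return goal_cost + 10.0 * obs_cost

def mpc_step(state, trolley_pose, obstacles, rng=np.random.default_rng(0)):
    best_u, best_c = None, np.inf
    for _ in range(N_SAMPLES):
        controls = np.column_stack([rng.uniform(0.0, 0.6, HORIZON),    # forward velocity
                                    rng.uniform(-1.0, 1.0, HORIZON)])  # yaw rate
        c = cost(rollout(state, controls), trolley_pose, obstacles)
        if c < best_c:
            best_u, best_c = controls, c
    return best_u[0]   # execute only the first control, then replan

u = mpc_step((0.0, 0.0, 0.0), (3.0, 1.0, np.pi), obstacles=[(1.5, 0.5)])
```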
First Author | ICRA 2021
We propose a hybrid physical human-robot interaction model that incorporates leash tension to describe the dynamic relationship between the guiding robot and the human. This hybrid model is used in a mixed-integer programming problem to develop a reactive planner that exploits slack-taut switching to guide a blindfolded person safely through a confined space.
The proposed leash-guided robot framework is deployed on a Mini Cheetah quadrupedal robot and validated in experiments.
Paper
Video
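The hybrid slack-taut idea can be illustrated with a toy planner that treats the taut/slack mode as a binary choice: in slack mode no force reaches the human, in taut mode the leash pulls like a stretched spring. For simplicity the sketch enumerates the mode and a small grid of robot moves over one step rather than solving the paper's mixed-integer program; the dynamics, stiffness, and costs are assumptions.
```python
# Toy illustration of the slack-taut hybrid idea behind a mixed-integer reactive planner.
import itertools
import numpy as np

DT, L0, K = 0.1, 1.2, 60.0        # time step, leash rest length, leash stiffness (assumed)

def tension(robot_pos, human_pos, taut):
    stretch = np.linalg.norm(robot_pos - human_pos) - L0
    if not taut or stretch <= 0.0:
        return np.zeros(2)          # slack mode: no force transmitted
    direction = (robot_pos - human_pos) / np.linalg.norm(robot_pos - human_pos)
    return K * stretch * direction  # taut mode: spring-like pulling force

def step_human(human_pos, force):
    return human_pos + DT * 0.02 * force      # simple kinematic response (assumed)

def plan_one_step(robot_pos, human_pos, waypoint):
    """Enumerate the taut/slack mode and a grid of robot moves; return the best pair."""
    moves = [np.array([dx, dy]) for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.0, 0.2)]
    best = None
    for taut, move in itertools.product((True, False), moves):
        new_robot = robot_pos + move
        new_human = step_human(human_pos, tension(new_robot, human_pos, taut))
        cost = np.linalg.norm(new_human - waypoint) + 0.1 * np.linalg.norm(move)
        if best is None or cost < best[0]:
            best = (cost, taut, new_robot)
    return best[1], best[2]        # chosen mode and next robot position

taut, robot_next = plan_one_step(np.array([0.0, 1.3]), np.zeros(2), np.array([2.0, 0.0]))
```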