About Me

I am a researcher at the UI-Venus Team, Ant Group. I received my Ph.D. in Computer Science from Shanghai Jiao Tong University in 2022, advised by Professor Liqing Zhang. Prior to that, I obtained my B.Sc. in Mathematics from SJTU in 2016.

My research benefits from collaboration with esteemed colleagues, including Mr. Haoxing Chen and Mr. Zhuoer Xu.

Research Interests

My research spans computer vision, multimodal learning, and GUI agents. Currently, I focus on the following research topics:

Multimodal Large Language Models

Building high-performance multimodal models that bridge vision and language understanding, with applications in document understanding and visual grounding tasks.

GUI Agents

Developing intelligent agents that can understand and interact with graphical user interfaces, enabling automated UI testing, element detection, and task completion on mobile and desktop platforms.

Object Detection & Segmentation

Advancing detection and segmentation methods including diffusion-based approaches, zero-shot semantic segmentation, and UI element detection with prompt tuning.

Computer Vision

Research on image harmonization, text editing, depth-privileged segmentation, and other fundamental vision tasks with practical applications.

Experiences

Researcher | UI-Venus Team, Ant Group

2022 – Present

Ph.D. in Computer Science | Shanghai Jiao Tong University

2016 – 2022. Advisor: Prof. Liqing Zhang

B.Sc. in Mathematics | Shanghai Jiao Tong University

2012 – 2016

Selected Publications (Google Scholar)

2026

UI-Venus-1.5 Technical Report
Venus-Team: Changlong Gao*, Zhangxuan Gu*, Yulin Liu*, Xinyu Qiu*, Shuheng Shen*, Yue Wen*, Tianyu Xia*, Zhenyu Xu*, Zhengwen Zeng*, Beitong Zhou*, Xingran Zhou*, et al.
arXiv, 2026

2025

VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
Beitong Zhou*, Zhexiao Huang*, Yuan Guo*, Zhangxuan Gu*, Tianyu Xia, Zichen Luo, Fei Tang, Dehan Kong, Yanyi Shang, Suling Ou, Zhenlin Guo, Changhua Meng, Shuheng Shen
arXiv, 2025
GUI-G2: Gaussian Reward Modeling for GUI Grounding
Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang
AAAI 2025 CCF-A
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Zhangxuan Gu*, Zhengwen Zeng*, Zhenyu Xu*, Xingran Zhou*, Shuheng Shen*, Yunfei Liu*, Beitong Zhou*, Changhua Meng, Tianyu Xia, et al.
arXiv, 2025

2024

DiffusionInst: Diffusion Model for Instance Segmentation
Zhangxuan Gu, Haoxing Chen, Zhuoer Xu, Jun Lan, Changhua Meng, Weiqiang Wang
ICASSP 2024 CCF-B Oral
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu^, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li
SCIS, 2024 CCF-A

2023

Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang
CVPR 2023 CCF-A
DiffUTE: Universal Text Editing Diffusion Model
Haoxing Chen, Zhuoer Xu, Zhangxuan Gu^, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang
NeurIPS 2023 CCF-A
Hierarchical Dynamic Image Harmonization
Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li
ACM MM 2023 CCF-A Oral

2022

Context-aware Feature Generation for Zero-shot Semantic Segmentation
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
ACM MM 2022 CCF-A
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang
CVPR 2022 CCF-A
From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
TNNLS, 2022 CCF-B

2020

Hard Pixel Mining for Depth Privileged Semantic Segmentation
Zhangxuan Gu, Li Niu, Haohua Zhao, Liqing Zhang
TMM, 2020 CCF-B