Zhangxuan Gu

Researcher
Ant Group

Biography

I work on computer vision, multimodal learning, and GUI agents. I received my Ph.D. from Shanghai Jiao Tong University and am now a researcher at Ant Group.

Research Interests: Computer Vision, Object Detection, Multimodal Large Language Models, GUI Agents

Education

Ph.D. in Computer Science

2016 – 2022

Shanghai Jiao Tong University

Advisor: Professor Liqing Zhang

B.Sc. in Mathematics

2012 – 2016

Shanghai Jiao Tong University

Experience

Researcher

2022 – present

Ant Group

Publications

2026
UI-Venus-1.5 Technical Report
Venus-Team: Changlong Gao*, Zhangxuan Gu*, Yulin Liu*, Xinyu Qiu*, Shuheng Shen*, Yue Wen*, Tianyu Xia*, Zhenyu Xu*, Zhengwen Zeng*, Beitong Zhou*, Xingran Zhou*, Weizhi Chen, Sunhao Dai, Jingya Dou, Yichen Gong, Yuan Guo, Zhenlin Guo, Feng Li, Qian Li, Jinzhen Lin, Yuqi Zhou, Linchao Zhu, Liang Chen, Zhenyu Guo, Changhua Meng, Weiqiang Wang
arXiv, 2026
2025
VenusBench-GD: A Comprehensive Multi-Platform GUI Benchmark for Diverse Grounding Tasks
Beitong Zhou*, Zhexiao Huang*, Yuan Guo*, Zhangxuan Gu*, Tianyu Xia, Zichen Luo, Fei Tang, Dehan Kong, Yanyi Shang, Suling Ou, Zhenlin Guo, Changhua Meng, Shuheng Shen
arXiv, 2025
GUI-G²: Gaussian Reward Modeling for GUI Grounding
Fei Tang, Zhangxuan Gu, Zhengxi Lu, Xuyang Liu, Shuheng Shen, Changhua Meng, Wen Wang, Wenqi Zhang, Yongliang Shen, Weiming Lu, Jun Xiao, Yueting Zhuang
AAAI, 2025
UI-Venus Technical Report: Building High-performance UI Agents with RFT
Zhangxuan Gu*, Zhengwen Zeng*, Zhenyu Xu*, Xingran Zhou*, Shuheng Shen*^, Yunfei Liu*, Beitong Zhou*, Changhua Meng, Tianyu Xia, Weizhi Chen, Yue Wen, Jingya Dou, Fei Tang, Jinzhen Lin, Yulin Liu, Zhenlin Guo, Yichen Gong, Heng Jia, Changlong Gao, Yuan Guo, Yong Deng, Zhenyu Guo, Liang Chen, Weiqiang Wang
arXiv, 2025
2024
DiffusionInst: Diffusion Model for Instance Segmentation
Zhangxuan Gu, Haoxing Chen, Zhuoer Xu, Jun Lan, Changhua Meng, Weiqiang Wang
ICASSP (oral), 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu^, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, Huaxiong Li
arXiv, 2024
2023
Mobile User Interface Element Detection Via Adaptively Prompt Tuning
Zhangxuan Gu, Zhuoer Xu, Haoxing Chen, Jun Lan, Changhua Meng, Weiqiang Wang
CVPR, 2023
DiffUTE: Universal Text Editing Diffusion Model
Haoxing Chen, Zhuoer Xu, Zhangxuan Gu^, Jun Lan, Xing Zheng, Yaohui Li, Changhua Meng, Huijia Zhu, Weiqiang Wang
NeurIPS, 2023
Hierarchical Dynamic Image Harmonization
Haoxing Chen, Zhangxuan Gu, Yaohui Li, Jun Lan, Changhua Meng, Weiqiang Wang, Huaxiong Li
ACM MM (oral), 2023
2022
Context-aware Feature Generation for Zero-shot Semantic Segmentation
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
ACM MM, 2022
XYLayoutLM: Towards Layout-Aware Multimodal Networks For Visually-Rich Document Understanding
Zhangxuan Gu, Changhua Meng, Ke Wang, Jun Lan, Weiqiang Wang, Ming Gu, Liqing Zhang
CVPR, 2022
From Pixel to Patch: Synthesize Context-aware Features for Zero-shot Semantic Segmentation
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, Liqing Zhang
TNNLS, 2022
2020
Hard Pixel Mining for Depth Privileged Semantic Segmentation
Zhangxuan Gu, Li Niu, Haohua Zhao, Liqing Zhang
TMM, 2020