Zhenxing Zhang (张振兴)

PhD Student | Research Intern
I am a PhD student in Computer Science and Technology at Hefei University of Technology, where I also earned my BS in Software Engineering, advised by Prof. Dan Guo and Prof. Meng Wang. I currently work as a Research Intern at Zhipu AI. My research focuses on post-training optimization for image generation models, including model distillation, reinforcement learning, and infrastructure development. I am passionate about advancing visual content understanding and generation through deep learning.

About Me

I am a researcher with a strong interest in artificial intelligence and image generation. As an intern at Zhipu AI, I focus on post-training optimization for image generation models, including model distillation, reinforcement learning, and infrastructure development. My goal is to improve the performance and efficiency of generative models and to help advance the field of image generation.

Experience

Research Intern @ Zhipu AI (智谱AI)

Sep 2024 – Present

I work on post-training optimization for image generation models. I research and develop model distillation and reinforcement learning algorithms to improve model performance and inference efficiency. I also contribute to infrastructure, including training pipeline optimization and data processing system development, providing efficient and stable technical support for model research.

Large Model Prompt Designer, Front-End Engineer @ iFlytek Spark Exploration Camp (科大讯飞星火探索营)

Aug 2023 – Nov 2023

Participated in application development for the iFlytek Spark LLM, designed prompt engineering strategies to improve model output quality, and developed interactive front-end interfaces to improve the user experience.

Volunteer Teacher (English, Music, and IT) @ Graduate Volunteer Teaching Group (研究生支教团)

Aug 2022 – Jul 2023

As a member of the Graduate Volunteer Teaching Group, I spent one year as a volunteer teacher in Danzhai County, Guizhou Province. I taught English, music, and IT courses to students in an underprivileged area, working to narrow the educational gap and broaden students' horizons.

Publications

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang*, Lechao Cheng, Zhun Zhong, Dan Guo*, Meng Wang
CVPR 2025 (IEEE/CVF Conference on Computer Vision and Pattern Recognition)
We propose the ASAP framework, which advances semantic alignment to promote multi-modal manipulation detection and grounding. The method achieves significant performance improvements on image manipulation detection tasks.

KALEIDO: Open-Sourced Multi-Subject Reference Video Generation Model

Zhenxing Zhang, Jiayan Teng†, Zhuoyi Yang, Tiankun Cao, Cheng Wang, Xiaotao Gu, Jie Tang, Dan Guo, Meng Wang*
arXiv 2025
KALEIDO is an open-source multi-subject reference video generation model that produces high-quality videos from multiple reference subjects, establishing a new benchmark for video generation.

A Survey on Image-Text Cross-Modal Retrieval

Zhenxing Zhang, Yaxiong Wang
Journal of Beijing Jiaotong University (北京交通大学学报)
This paper surveys research progress in image-text cross-modal retrieval, covering key techniques such as deep learning methods, attention mechanisms, and Transformer architectures. We compare the performance of different methods on common datasets and discuss future research directions and challenges.

Projects

GLM-Image

An autoregressive model for dense-knowledge-aware, high-fidelity image generation. By integrating multimodal knowledge, the project significantly improves the quality and semantic accuracy of generated images and achieves strong performance on text-to-image generation.
My Contributions: Model distillation and reinforcement learning

RealVideo

A real-time streaming conversational video system that uses autoregressive diffusion to turn text conversations into continuous, high-fidelity video responses in real time. The system opens new possibilities for AI video interaction applications.
My Contributions: Training dataset construction

Honors & Awards

Contact

WeChat: QuantumScope