Zhenxing Zhang - Academic Homepage / 张振兴

About Me

I am a researcher passionate about artificial intelligence and image generation. I am currently an intern at Zhipu AI, where I focus on post-training optimization for image generation models, including model distillation, reinforcement learning, and infrastructure development. My goal is to improve the performance and efficiency of generative models through technical innovation and to advance the field of image generation.

Experience

Research Intern @ Zhipu AI (智谱AI)

Sep 2024 – Present

Focus on post-training optimization for image generation models. Research and develop model distillation and reinforcement learning algorithms to improve model performance and inference efficiency. Also contribute to infrastructure work, including training pipeline optimization and data processing system development, to provide efficient and stable technical support for model research.

LLM Prompt Designer and Front-End Engineer @ iFlytek Spark Exploration Camp (科大讯飞星火探索营)

Aug 2023 – Nov 2023

Participated in application development for the iFlytek Spark LLM, designed prompt engineering strategies to improve model output quality, and developed interactive front-end interfaces to improve the user experience.

Volunteer Teacher (English, Music, and IT) @ Graduate Volunteer Teaching Group (研究生支教团)

Aug 2022 – Jul 2023

As a member of the Graduate Volunteer Teaching Group, taught for one year in Danzhai County, Guizhou Province. Taught English, music, and IT to students in an underserved area, working to narrow the educational gap and broaden students' horizons.

Publications

ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding

Zhenxing Zhang, Yaxiong Wang*, Lechao Cheng, Zhun Zhong, Dan Guo*, Meng Wang

CVPR 2025 (IEEE/CVF Conference on Computer Vision and Pattern Recognition)

We propose the ASAP framework, which advances semantic alignment to promote multi-modal manipulation detection and grounding. The method achieves significant performance improvements on image manipulation detection tasks.

Paper Code

KALEIDO: Open-Sourced Multi-Subject Reference Video Generation Model

Zhenxing Zhang, Jiayan Teng†, Zhuoyi Yang, Tiankun Cao, Cheng Wang, Xiaotao Gu, Jie Tang, Dan Guo, Meng Wang*

arXiv 2025

KALEIDO is an open-source multi-subject reference video generation model that produces high-quality videos from multiple reference subjects, establishing a new benchmark for video generation.

Paper Code - Demo

A Survey on Image-Text Cross-Modal Retrieval

Zhenxing Zhang, Yaxiong Wang

Journal of Beijing Jiaotong University (北京交通大学学报)

This paper surveys research progress in image-text cross-modal retrieval, covering key techniques such as deep learning methods, attention mechanisms, and Transformer architectures. It compares the performance of different methods on commonly used datasets and discusses future research directions and challenges.

Paper

Projects

GLM-Image

An autoregressive model for dense-knowledge-aware, high-fidelity image generation. By integrating multimodal knowledge, the project significantly improves the quality and semantic accuracy of generated images and achieves strong performance on text-to-image generation tasks.

My Contributions: Model distillation and reinforcement learning

View Project

RealVideo

An innovative real-time streaming conversational video system that uses autoregressive diffusion to convert text conversations into continuous, high-fidelity video responses in real time. The system opens new possibilities for AI video interaction applications.

My Contributions: Training dataset construction

View Project

Honors & Awards

National Endeavor Scholarship
National scholarship for academically excellent students with financial need
Undergraduate Excellence Scholarship, Master's Academic Scholarship, PhD Academic Scholarship
Hefei University of Technology
37 Interactive Entertainment Scholarship
Corporate scholarship recognizing students with outstanding performance in technological innovation
Outstanding Graduate
Hefei University of Technology
Top Ten CPC Members, Hefei University of Technology
Recognizes CPC members with outstanding performance in ideology, academics, and work
Outstanding Merit Student, Outstanding Student Cadre
University-level honors awarded for multiple consecutive years
Excellent Participant, iFlytek Spark Training Camp; Spark Third Prize
Recognized for excellent performance in LLM application development projects

Contact

Crilias.zzx@gmail.com

WeChat: QuantumScope

github.com/CriliasMiller

Google Scholar

张振兴 Zhenxing Zhang