Research Journey
Lab Experience
Lab Introduction
Professor Peng Hu's laboratory studies multimodal AI and works to advance its applications in healthcare, education, transportation, and other fields. Combining deep learning, computer vision, and natural language processing, the laboratory takes an interdisciplinary approach to multimodal data fusion and understanding, with the goal of solving complex real-world problems.
Multimodal Learning Research
Peng Hu's Laboratory, Sichuan University
February 2024 – Present
Research Project
Project: Cross-modal Retrieval Research under Multi-label Noise
Project Background
Multi-label noise is a significant challenge in cross-modal retrieval, particularly in the annotations of large-scale datasets. Such noise can substantially degrade both model training and retrieval performance. Our research focuses on developing robust methods for handling noisy labels in cross-modal retrieval systems.
Methodology
- Noise Model Design: Developed a noise generation method that keeps the dataset's label distribution stable while injecting noise at a controlled rate.
- Noise Transition Matrix: Implemented and optimized a noise transition matrix estimation approach for multi-label scenarios.
- Model Architecture: Analyzed and modified existing cross-modal retrieval models to enhance noise robustness.
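The noise model in the first item can be illustrated with a minimal sketch. It is not the project's actual algorithm: it assumes that "label distribution stability" means each class keeps the same number of positive labels, and it ignores label correlation. The function name `inject_balanced_noise` and its parameters are hypothetical.

```python
import random

def inject_balanced_noise(labels, noise_rate, seed=0):
    """Flip labels per class while keeping each class's positive count fixed.

    labels: list of rows, each a list of 0/1 ints (samples x classes).
    noise_rate: fraction of each class's positives flipped to 0; the same
        number of negatives is flipped to 1 (capped by what is available).
    Assumption: "stable distribution" means unchanged per-class positive counts.
    """
    rng = random.Random(seed)
    noisy = [row[:] for row in labels]  # copy so the clean labels stay intact
    num_classes = len(labels[0])
    for c in range(num_classes):
        pos = [i for i, row in enumerate(labels) if row[c] == 1]
        neg = [i for i, row in enumerate(labels) if row[c] == 0]
        k = min(round(noise_rate * len(pos)), len(neg))
        for i in rng.sample(pos, k):
            noisy[i][c] = 0
        for i in rng.sample(neg, k):
            noisy[i][c] = 1
    return noisy
```

Because every removed positive is matched by an added one, the per-class label frequencies before and after corruption are identical, so any change in retrieval performance can be attributed to the noise rather than to a shift in class balance.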
Technical Implementation
- Developed noise generation algorithms considering label correlation and distribution balance.
- Implemented baseline models using PyTorch, including data preprocessing and model training pipelines.
- Created evaluation frameworks for measuring model performance at noise rates from 0.1 to 0.5.
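An evaluation loop of this kind might look like the sketch below. It assumes mean average precision (mAP) as the metric and treats a gallery item as relevant when it shares at least one label with the query; both choices are common in multi-label retrieval but are assumptions here, as are the function names and the `evaluate` callback.

```python
def average_precision(relevance):
    """AP for one ranked list of 0/1 relevance flags (best match first)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(query_labels, gallery_labels, scores):
    """mAP where a gallery item is relevant if it shares a label with the query.

    scores[q][g] is the similarity between query q and gallery item g.
    """
    aps = []
    for q_lab, row in zip(query_labels, scores):
        # Rank gallery items by descending similarity to this query.
        order = sorted(range(len(row)), key=lambda g: row[g], reverse=True)
        relevance = [
            any(a and b for a, b in zip(q_lab, gallery_labels[g])) for g in order
        ]
        aps.append(average_precision(relevance))
    return sum(aps) / len(aps)

def sweep_noise_rates(evaluate, rates=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Call a user-supplied evaluate(rate) -> mAP callback for each noise rate."""
    return {rate: evaluate(rate) for rate in rates}
```

In this sketch, `evaluate` would train a model on labels corrupted at the given rate and return its mAP on a held-out set, which keeps the metric code independent of any particular model.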
Current Findings
- Observed significant performance degradation in baseline models at high noise rates (above 0.3).
- Found that a soft-label contrastive loss is effective for noise-resistant learning.
- Demonstrated the importance of maintaining label distribution stability in noise generation.
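The exact loss used in the project is not specified here. As an illustration only, one plausible form of a soft-label contrastive loss replaces the one-hot targets of a standard contrastive objective with targets proportional to label overlap. The sketch below covers the image-to-text direction; the Jaccard overlap, the temperature value, and all names are assumptions.

```python
import math

def label_overlap(a, b):
    """Jaccard overlap between two multi-hot label vectors."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return inter / union if union else 0.0

def soft_label_contrastive_loss(sim, img_labels, txt_labels, temperature=0.1):
    """Cross-entropy between softmax(sim / temperature) and soft targets.

    sim[i][j]: similarity between image i and text j.
    The target for row i is its label overlap with every text,
    normalized to sum to 1 (an assumed choice of soft target).
    """
    total, rows = 0.0, 0
    for i, row in enumerate(sim):
        weights = [label_overlap(img_labels[i], t) for t in txt_labels]
        norm = sum(weights)
        if norm == 0:
            continue  # no text shares a label with this image
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += -sum(w / norm * (l - log_z) for w, l in zip(weights, logits))
        rows += 1
    return total / rows if rows else 0.0
```

With soft targets, a single flipped label changes a pair's target weight only partially instead of switching it between positive and negative, which is one intuition for why such a loss can tolerate noisy annotations.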
Technical Challenges
- Developing accurate noise transition matrix estimation methods for multi-label scenarios.
- Balancing model complexity with computational efficiency in noise-robust architectures.
- Maintaining retrieval performance while implementing noise-resistant mechanisms.
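To make the first challenge concrete: when noise is synthetic and the clean labels are known, a reference transition matrix can be obtained by simple counting, as in the sketch below. This is not the estimation method under development, which has to work without clean labels. The sketch also assumes one independent 2x2 matrix per class, which ignores label correlation, and the function name is hypothetical.

```python
def empirical_transition_matrices(clean, noisy):
    """Per-class 2x2 matrices T[c][a][b] = P(noisy = b | clean = a), by counting.

    clean, noisy: lists of rows of 0/1 ints with identical shape.
    Assumption: classes are corrupted independently of one another.
    """
    num_classes = len(clean[0])
    matrices = []
    for c in range(num_classes):
        counts = [[0, 0], [0, 0]]
        for clean_row, noisy_row in zip(clean, noisy):
            counts[clean_row[c]][noisy_row[c]] += 1
        matrix = []
        for row in counts:
            total = sum(row)
            # Rows with no samples are left as zeros rather than guessed.
            matrix.append([n / total if total else 0.0 for n in row])
        matrices.append(matrix)
    return matrices
```

Such a reference matrix gives a ground truth against which an estimator that sees only noisy labels can be compared.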
Technical Skills & Research Capabilities
Implementation & Development
- PyTorch implementation of cross-modal retrieval models
- Custom dataset processing pipelines
- Experiment tracking and visualization using TensorBoard
Data Analysis & Experimentation
- Statistical analysis of noise effects
- Performance metrics evaluation
- Experimental design and validation
Research Methodology
- Literature review and analysis
- Problem formulation
- Scientific documentation and reporting
Current Status
I am currently a junior undergraduate student at Sichuan University and have been participating in Professor Peng Hu's laboratory meetings since February 2024. While I am still at the beginning of my research journey, this experience has been instrumental in shaping my understanding of academic research and my future aspirations.
Why Research?
My journey in computer science began with a simple yet profound desire: to create something meaningful that could make a difference in the world. As I delved deeper into my studies, I found myself increasingly drawn to the uncertainty and challenges that research presents. The possibility of contributing to the advancement of knowledge, no matter how small, continues to drive my academic pursuits.
Current Learning & Exploration
Multimodal Learning
Generative AI
Embodied Intelligence
Human-Computer Interaction
Current Work
- Conducting comprehensive experiments on NUS-WIDE and MS-COCO datasets with various noise rates.
- Implementing and evaluating different noise-robust training strategies.
- Developing improved methods for noise transition matrix estimation in multi-label scenarios.
Future Research Directions
- Exploring self-supervised learning approaches for noise-robust cross-modal retrieval.
- Investigating the integration of pre-trained vision-language models for improved robustness.
- Developing theoretical frameworks for understanding multi-label noise in cross-modal scenarios.