DCML
Data-Centric Machine Learning Group at Harvard University
We study how data—not just models—shapes the behavior and reliability of AI systems. Our research develops foundational principles and methods for characterizing, transforming, and optimizing datasets to make learning more efficient, interpretable, and adaptive.
Our work integrates tools from optimal transport, information theory, and generative modeling, applying them across domains including scientific data, language, and vision.
Research
Our group works on data-centric and trustworthy machine learning.
Dataset Characterization & Geometry
Understanding what makes data valuable for learning through geometry and optimal transport
Dataset Transformations & Training Dynamics
How data structure affects learning dynamics and model behavior during training
Dataset Optimization & Synthesis
Principled methods to reduce, enhance, and synthesize training data
Adaptive & Reconfigurable Models
Dynamically combining and adapting models based on constraints and objectives







