DCML

Data-Centric Machine Learning Group at Harvard University

We study how data—not just models—shapes the behavior and reliability of AI systems. Our research develops foundational principles and methods for characterizing, transforming, and optimizing datasets to make learning more efficient, interpretable, and adaptive.

Our work integrates tools from optimal transport, information theory, and generative modeling, applying them across domains including scientific data, language, and vision.