Dataset Transformations & Training Dynamics

Overview

How does the structure of training data shape what models learn? We study learning dynamics from a data-centric perspective: how datasets evolve under transformations, and how their properties influence model behavior throughout training.

Using tools from optimal transport and dynamical systems, we analyze how data augmentation, filtering, and other transformations affect downstream performance, and how gradient flows can model dataset evolution.
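As a concrete illustration of the optimal-transport viewpoint, the Wasserstein distance can quantify how far a transformation such as filtering moves a data distribution. The sketch below is a minimal one-dimensional example using `scipy.stats.wasserstein_distance`; the synthetic mixture data and the filtering threshold are illustrative assumptions, not details from this page.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)

# Synthetic 1-D "dataset": a mixture of two Gaussian clusters.
data = np.concatenate([rng.normal(-2.0, 0.5, 5000),
                       rng.normal(2.0, 0.5, 5000)])

# A simple filtering transformation: drop all samples below a threshold.
filtered = data[data > 0.0]

# 1-Wasserstein (earth mover's) distance between the empirical
# distributions before and after filtering.
shift = wasserstein_distance(data, filtered)
print(f"W1(original, filtered) = {shift:.3f}")
```

Because filtering removes the entire left cluster, roughly half the probability mass must be transported about four units to the right, so the distance comes out near 2.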

Key Questions

  • How do common data transformations (augmentation, filtering, mixing) affect learning outcomes?
  • What role does data structure play in phenomena like grokking, phase transitions, and emergent capabilities?
  • Can we predict how changes to training data will affect model behavior?

Methods & Tools

  • Wasserstein Gradient Flows: Modeling dataset evolution as flows in probability space
  • Equilibrium Models: Deep equilibrium architectures for processing distributional inputs
  • Training Dynamics Analysis: Understanding learning trajectories through a data-centric lens
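One way to make the gradient-flow picture concrete is a particle discretization: for a potential energy E(μ) = ∫ V dμ, the Wasserstein-2 gradient flow transports each sample along −∇V. The sketch below is a minimal illustrative example; the quadratic potential, step size, and particle count are assumptions for demonstration, not the methods used in the work above.

```python
import numpy as np

def wasserstein_flow_step(particles, grad_V, dt):
    """One explicit Euler step of the W2 gradient flow of E(mu) = ∫ V dmu.

    For a potential energy, the flow moves each particle along -∇V.
    """
    return particles - dt * grad_V(particles)

rng = np.random.default_rng(1)
particles = rng.normal(3.0, 1.0, size=1000)  # empirical "dataset"

grad_V = lambda x: x  # V(x) = x**2 / 2, so the flow contracts toward 0

for _ in range(200):
    particles = wasserstein_flow_step(particles, grad_V, dt=0.05)

print(f"mean after flow: {particles.mean():.4f}")
```

With this potential the particle cloud contracts geometrically toward the origin, mirroring how a dataset-evolution flow drives an empirical distribution toward a target.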
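The equilibrium-model bullet refers to architectures whose output is defined implicitly as a fixed point rather than by a fixed stack of layers. A minimal sketch of the forward pass, using naive fixed-point iteration on a single tanh layer (the layer form, dimensions, and tolerance are illustrative assumptions, not the architectures studied here):

```python
import numpy as np

def deq_forward(x, W, b, tol=1e-6, max_iter=500):
    """Solve z = tanh(W @ z + x + b) by fixed-point iteration.

    Converges when the map is a contraction (e.g. small spectral norm of W).
    """
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + x + b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(2)
d = 8
W = 0.1 * rng.standard_normal((d, d))  # scaled down to keep the map contractive
b = rng.standard_normal(d)
x = rng.standard_normal(d)

z_star = deq_forward(x, W, b)
print("fixed-point residual:",
      np.linalg.norm(z_star - np.tanh(W @ z_star + x + b)))
```

In practice, deep equilibrium models replace this naive iteration with accelerated root-finders and differentiate through the fixed point via the implicit function theorem.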

Selected Publications