The role of automatic differentiation
LVMT
2025-06-13
Slides available at gdalle.github.io/LVMTSeminaire2025
MATSim model of Paris (https://matsim.org/gallery/paris/)
Flatland railway simulator
Mohanty et al. (2020)
Fitting model parameters to explain observed data.
Replacing complicated models with simpler surrogates.
Using these models to inform industrial or political choices.
We focus on simpler models
They are easier to analyze and optimize.
Most transportation problems live on a graph \(G = (V, E)\):
Shortest path = sequence of edges from \(a\) to \(b\) with minimum total cost.
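Below is a minimal sketch of a single-agent shortest path query, using networkx on a hypothetical three-node graph (the data is only illustrative).

```python
# Minimal sketch: shortest path on a toy weighted graph (hypothetical data).
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("a", "b", 2.0),  # edge (a, b) with cost 2
    ("a", "c", 1.0),
    ("c", "b", 0.5),
])

# Dijkstra's algorithm returns the minimum-cost route from a to b.
path = nx.shortest_path(G, source="a", target="b", weight="weight")
print(path)  # ['a', 'c', 'b']
```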
Single agent | Interacting agents |
---|---|
Easy | Hard |
What if conflicting agents cause a slowdown?
\[ t_e(f_e) = t_e^0 \left[1 + \alpha \left(\frac{f_e}{c_e}\right)^\beta \right] \enspace \text{with} \enspace \begin{cases} \text{$t_e$: travel time} \\ \text{$t_e^0$: free-flow time} \\ \text{$f_e$: edge flow} \\ \text{$c_e$: capacity}\end{cases}\]
Everyone behaves selfishly: find a Nash equilibrium.
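A minimal sketch of this congestion function; the values \(\alpha = 0.15\) and \(\beta = 4\) below are common defaults, used here only for illustration.

```python
# Congestion function from the slide: travel time grows with the flow/capacity ratio.
# alpha = 0.15 and beta = 4 are the usual defaults, chosen here for illustration only.
def travel_time(t0: float, flow: float, capacity: float,
                alpha: float = 0.15, beta: float = 4.0) -> float:
    return t0 * (1 + alpha * (flow / capacity) ** beta)

print(travel_time(t0=10.0, flow=500.0, capacity=1000.0))   # light traffic: ~10.1
print(travel_time(t0=10.0, flow=1500.0, capacity=1000.0))  # congestion: ~17.6
```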
Xu et al. (2024)
Goal: estimate model parameters from the data.
Idea
During calibration, the traffic assignment (TA) problem is a subroutine.
Can we do better than manual fine-tuning?
Grid search does not scale.
What if conflicting agents are forbidden?
Everyone behaves selflessly: find a social optimum.
Multi-Agent Path Finding (MAPF) is difficult: do we have to choose between optimality and speed?
\[ \xrightarrow[]{\text{Input}} \boxed{\text{Encoder}} \xrightarrow[]{\text{Guidance}} \boxed{\text{Fast solver}} \xrightarrow[]{\text{Solution}} \]
Goal: learn encoder parameters to guide the solver.
Idea
During encoder learning, the MAPF problem is a subroutine.
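A rough sketch of this pipeline, under simplifying assumptions: the encoder is a hypothetical map from edge features to positive edge costs, and the fast solver is stood in for by a single shortest-path query (the real guidance and solver are more involved).

```python
# Sketch of the Encoder -> Fast solver pipeline (all names and data hypothetical).
import numpy as np
import networkx as nx

def encoder(edge_features: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Hypothetical encoder: maps edge features to positive edge costs (the guidance)."""
    return np.exp(edge_features @ theta)

def fast_solver(G: nx.DiGraph, costs: np.ndarray, source, target):
    """Stand-in for the fast solver: a single shortest-path query on the guided costs."""
    for (u, v), c in zip(G.edges, costs):
        G[u][v]["cost"] = c
    return nx.shortest_path(G, source, target, weight="cost")

G = nx.DiGraph([("a", "b"), ("a", "c"), ("c", "b")])
features = np.random.rand(G.number_of_edges(), 3)  # one feature vector per edge
theta = np.zeros(3)                                 # encoder parameters to be learned
print(fast_solver(G, encoder(features, theta), "a", "b"))
```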
Change the rules:
Idea
When we evaluate policy changes, the original problem is a subroutine.
Imagine we have a dataset of image-label pairs \((x_i, y_i)\):
\[ (🐱, \texttt{cat}) \quad (🐶, \texttt{dog}) \quad (🦆, \texttt{duck}) \]
We want to recognize animals using a function
\[ f_p : x \longmapsto y \]
This function has parameters \(p\) which need to be set.
Parameters are set by minimizing a loss function
\[ \ell(p) = \sum_i \lvert f_p(x_i) - y_i \rvert^2 \]
The gradient \(\nabla \ell(p)\) gives the direction in which the loss increases fastest.
Taking small steps with \(p\) in the opposite direction = gradient descent.
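A minimal sketch of this procedure, assuming a simple linear model and purely illustrative data; the gradient \(\nabla \ell(p)\) is obtained automatically with JAX.

```python
# Gradient descent on the loss from the slide, with the gradient computed by JAX.
import jax.numpy as jnp
from jax import grad

# Illustrative data (x_i, y_i) and a simple linear model f_p(x) = p[0] * x + p[1].
xs = jnp.array([0.0, 1.0, 2.0, 3.0])
ys = jnp.array([1.0, 3.0, 5.0, 7.0])

def f(p, x):
    return p[0] * x + p[1]

def loss(p):
    return jnp.sum((f(p, xs) - ys) ** 2)  # sum of squared errors, as in the slide

dloss = grad(loss)  # automatic differentiation of the loss
p = jnp.zeros(2)
step = 0.01
for _ in range(500):
    p = p - step * dloss(p)  # small step opposite to the gradient
print(p)  # approaches [2.0, 1.0]
```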
Neural networks are a flexible family of parametric functions.
Subroutines called layers can be assembled arbitrarily.
Gradient descent works because each layer can be differentiated automatically, and the derivatives compose through the chain rule.
Derivatives allow fast sensitivity analysis:
No need to evaluate small changes in every possible direction
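A small sketch of the contrast, on an illustrative test function: finite differences need one extra evaluation per parameter, while a single reverse-mode gradient call (here with JAX) returns every sensitivity at once.

```python
# Sensitivity analysis: finite differences vs one reverse-mode gradient (JAX).
import jax.numpy as jnp
from jax import grad

def model(p):
    # Illustrative scalar output depending on several parameters.
    return jnp.sum(jnp.sin(p) * p)

p = jnp.arange(1.0, 6.0)  # 5 parameters

# Finite differences: one extra model evaluation per parameter.
eps = 1e-5
fd = jnp.array([(model(p.at[i].add(eps)) - model(p)) / eps for i in range(p.size)])

# Automatic differentiation: every sensitivity from a single gradient call.
ad = grad(model)(p)
print(fd)
print(ad)  # the two agree up to finite-difference error
```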
A strange concept
What if we could compute derivatives of transportation problems automatically?
Machine learning 🤝 constraint satisfaction
A parametric optimization problem has the form
\[ p \quad \longmapsto \quad \min_v c(v, p) \enspace \text{subject to} \enspace v \in \mathcal{C} \]
\(v\) is the decision variable, \(c\) the cost, \(v \in \mathcal{C}\) the constraints; \(v^*(p)\) denotes an optimal solution.
| | TA | MAPF |
|---|---|---|
| Variable \(v\) | Traffic flows | Agent paths |
| Parameter \(p\) | Street capacities | Movement costs |
| | TA | MAPF |
|---|---|---|
| Problem type | Continuous | Discrete |
| Derivative \(\frac{\partial v^*}{\partial p}\) | Well-defined | Ill-defined |
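A toy illustration of this difference, with hypothetical one-dimensional problems: the continuous solution varies smoothly with \(p\), while the discrete solution jumps, so its derivative is zero almost everywhere and undefined at the jumps.

```python
# Toy contrast between a continuous and a discrete parametric problem (hypothetical).
import numpy as np

def continuous_argmin(p):
    # min_v (v - p)^2 over all real v: the solution v*(p) = p varies smoothly, dv*/dp = 1.
    return p

def discrete_argmin(p):
    # min over a finite set {0, 1, 2}: the solution jumps as p crosses 0.5 and 1.5,
    # so dv*/dp is 0 almost everywhere and undefined at the jumps.
    candidates = np.array([0.0, 1.0, 2.0])
    return candidates[np.argmin((candidates - p) ** 2)]

for p in [0.4, 0.6, 1.4, 1.6]:
    print(p, continuous_argmin(p), discrete_argmin(p))
```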
Can we reuse existing algorithms and make them differentiable?
Can we leverage modern parallel processors (GPUs) to speed things up?
Can we do it all with open-source software and reproducible experiments?
I want to know what you do!
Send me a paper you like at guillaume.dalle@enpc.fr