Mixing data, optimization and decision

The role of automatic differentiation

Guillaume Dalle

LVMT

2025-06-13

A little about me

📜 CV

  • 2019-2022: PhD at CERMICS
  • 2022: visiting student at MIT
  • 2023-2024: postdoc at EPFL
  • 2025-????: researcher here!

💕 Passions

  • By day: applied mathematics, computer science
  • By night: board games, songwriting, musicals

Motivation

Road traffic

MATSim model of Paris (https://matsim.org/gallery/paris/)

Train routing

Flatland railway simulator

Mohanty et al. (2020)

Challenges

📊 Calibration

Fitting model parameters to explain observed data.

🏃 Acceleration

Replacing complicated models with simpler surrogates.

🗺️ Decision-making

Using these models to inform industrial or political choices.

We focus on simpler models

They are easier to analyze and optimize.

Path problems with congestion

Shortest paths

Most transportation problems live on a graph \(G = (V, E)\):

  • \(V\) is a set of vertices
  • \(E\) is a set of (weighted) edges

Shortest path = sequence of edges from \(a\) to \(b\) with minimum total cost.

  • Single agent: easy
  • Interacting agents: hard
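A minimal single-agent sketch in Python: Dijkstra's algorithm on a toy dictionary-of-dictionaries graph (the graph itself is made up).

    import heapq

    def dijkstra(graph, source, target):
        """Shortest path from source to target; graph[u][v] is the edge weight."""
        dist = {source: 0.0}
        parent = {}
        queue = [(0.0, source)]  # min-heap of (distance so far, vertex)
        while queue:
            d, u = heapq.heappop(queue)
            if u == target:
                break
            if d > dist.get(u, float("inf")):
                continue  # stale queue entry, already improved
            for v, w in graph[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v] = d + w
                    parent[v] = u
                    heapq.heappush(queue, (dist[v], v))
        path, u = [target], target
        while u != source:  # walk parents back to the source
            u = parent[u]
            path.append(u)
        return list(reversed(path)), dist[target]

    graph = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 1.0}, "c": {}}
    print(dijkstra(graph, "a", "c"))  # (['a', 'b', 'c'], 2.0)

The single-agent case stays easy because this runs in near-linear time; the hard part begins when agents interact.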

Static traffic assignment (theory)

What if conflicts between agents cause slowdowns?

\[ t_e(f_e) = t_e^0 \left[1 + \alpha \left(\frac{f_e}{c_e}\right)^\beta \right] \enspace \text{with} \enspace \begin{cases} \text{$t_e$: travel time} \\ \text{$f_e$: edge flow} \\ \text{$c_e$: capacity}\end{cases}\]

Everyone behaves selfishly: find a Nash equilibrium.
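This is the classic BPR (Bureau of Public Roads) congestion curve. A minimal sketch with the customary coefficients \(\alpha = 0.15\) and \(\beta = 4\); the edge numbers are illustrative.

    def bpr_travel_time(flow, free_flow_time, capacity, alpha=0.15, beta=4.0):
        """Travel time on an edge as a function of the flow it carries."""
        return free_flow_time * (1.0 + alpha * (flow / capacity) ** beta)

    # an edge with a 10 min free-flow time and a capacity of 1000 vehicles/hour
    for flow in (0, 500, 1000, 2000):
        print(flow, round(bpr_travel_time(flow, 10.0, 1000.0), 2))
    # 0 -> 10.0, 500 -> 10.09, 1000 -> 11.5, 2000 -> 34.0: congestion bites past capacity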

Static traffic assignment (example)

Xu et al. (2024)

Static traffic assignment (calibration)

  • Model parameters: free flow times, street capacities
  • Input data: network structure, travel demand
  • Observed data: measured flows or speeds

Goal: estimate model parameters from the data.

Idea

During calibration, the TA problem is a subroutine.
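A minimal sketch of what this means, assuming a hypothetical solver solve_ta that maps capacities to equilibrium flows:

    import numpy as np

    def calibration_loss(capacities, observed_flows, solve_ta):
        """Least-squares mismatch between equilibrium and observed flows.

        solve_ta is a hypothetical TA solver: capacities -> equilibrium flows.
        Every evaluation of this loss requires a full equilibrium computation.
        """
        predicted_flows = solve_ta(capacities)
        return np.sum((predicted_flows - observed_flows) ** 2)

Minimizing this loss with gradient descent requires differentiating through solve_ta itself: that is the crux of what follows.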

Static traffic assignment (calibration)

Xu et al. (2024)

Can we do better than manual fine-tuning?

Grid search does not scale.

Multi-agent pathfinding (theory)

What if conflicts between agents are forbidden?

Everyone behaves selflessly: find a social optimum.

Wang et al. (2025)

Multi-agent pathfinding (example)

Stern et al. (2019)

Multi-agent pathfinding (acceleration)

MAPF is difficult: must we choose between optimality and speed?

\[ \xrightarrow[]{\text{Input}} \boxed{\text{Encoder}} \xrightarrow[]{\text{Guidance}} \boxed{\text{Fast solver}} \xrightarrow[]{\text{Solution}} \]

Goal: learn encoder parameters to guide the solver (sketched after the list), for instance through:

  • modified movement costs
  • custom priority ordering
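A minimal sketch of the guidance step, with a hypothetical linear encoder producing per-agent priority scores (all numbers are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    features = rng.normal(size=(5, 3))    # 5 agents, 3 features each (toy data)
    weights = rng.normal(size=3)          # encoder parameters, to be learned

    scores = features @ weights           # encoder: features -> priority scores
    priority_order = np.argsort(-scores)  # guidance handed to a fast solver
    print(priority_order)

    # a prioritized planner would now route the agents one by one in this order;
    # learning `weights` end to end means differentiating through argsort,
    # a discrete operation -- hence the theoretical issues discussed below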

Idea

During encoder learning, the MAPF problem is a subroutine.

Both models (decision-making)

🚥 Static traffic assignment

Change the rules:

  • Set tolls
  • Close streets

🗺️ Multi-agent pathfinding

Change the rules:

  • Cancel trips
  • Adapt infrastructure

Idea

When we evaluate policy changes, the original problem is a subroutine.

Machine learning primer

Supervised learning

Imagine we have a dataset of image-label pairs \((x_i, y_i)\):

\[ (🐱, \texttt{cat}) \quad (🐶, \texttt{dog}) \quad (🦆, \texttt{duck}) \]

We want to recognize animals using a function

\[ f_p : x \longmapsto y \]

This function has parameters \(p\) which need to be set.

Losses and gradients

Parameters are set by minimizing a loss function

\[ \ell(p) = \sum_i \lvert f_p(x_i) - y_i \rvert^2 \]

The gradient \(\nabla \ell(p)\) points in the direction of steepest loss increase.

Taking small steps with \(p\) in the opposite direction = gradient descent.
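A minimal sketch on a one-parameter model \(f_p(x) = p \cdot x\), with the gradient worked out by hand (the dataset is made up):

    # toy dataset, roughly y = 2x
    xs = [1.0, 2.0, 3.0]
    ys = [2.1, 3.9, 6.2]

    def loss(p):
        return sum((p * x - y) ** 2 for x, y in zip(xs, ys))

    def grad(p):
        # d/dp of sum (p*x - y)^2 is sum 2*(p*x - y)*x
        return sum(2.0 * (p * x - y) * x for x, y in zip(xs, ys))

    p, step = 0.0, 0.01
    for _ in range(100):
        p -= step * grad(p)  # small step opposite the gradient
    print(round(p, 3))  # 2.036, the least-squares slope

Here the gradient fits on one line; deep learning automates this computation for millions of parameters.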

Deep learning

Neural networks are a flexible family of parametric functions.

Subroutines called layers can be assembled arbitrarily.

Gradient descent works because each individual layer can be differentiated automatically.

Vaswani et al. (2017)

The meaning of differentiation

Derivatives allow fast sensitivity analysis:

  • ➡️ for a given input, how much does it affect every output?
  • ⬅️ for a given output, how much is it affected by every input?

No need to evaluate small changes in every possible direction.
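A minimal sketch of the ➡️ direction using dual numbers, a toy version of forward-mode automatic differentiation (real AD systems are organized differently):

    import math
    from dataclasses import dataclass

    @dataclass
    class Dual:
        val: float  # the value
        dot: float  # its derivative along the chosen input direction

        def __add__(self, other):
            return Dual(self.val + other.val, self.dot + other.dot)

        def __mul__(self, other):
            # product rule: (uv)' = u'v + uv'
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)

    def sin(x):
        return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

    # f(x) = x * sin(x); exact derivative at x = 2 is sin(2) + 2 cos(2)
    x = Dual(2.0, 1.0)  # dot = 1: differentiate with respect to x
    y = x * sin(x)
    print(y.dot)                                # 0.07700...
    print(math.sin(2.0) + 2.0 * math.cos(2.0))  # matches to machine precision

One pass gives an exact directional sensitivity, with no step size to tune.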

Optimization as a layer

The gist

A strange concept

What if we could compute derivatives of transportation problems automatically?

  • Calibration: fit a large model with gradient descent
  • Acceleration: learn an encoder to improve fast solvers
  • Decision-making: solve bi-level optimization problems

Machine learning 🤝 constraint satisfaction

Parametric optimization problems

A parametric optimization problem has the form

\[ p \quad \longmapsto \quad \min_v c(v, p) \enspace \text{subject to} \enspace v \in \mathcal{C} \]

\(v\) is the decision variable, \(c\) the cost, \(v \in \mathcal{C}\) the constraints.

                     TA                  MAPF
Variable \(v\)       Traffic flows       Agent paths
Parameter \(p\)      Street capacities   Movement costs
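A minimal concrete instance of this map: \(v^*(p) = \arg\min_{0 \le v \le 1} \, (v - p)^2\), i.e. clipping \(p\) to \([0, 1]\). The sketch below compares its hand-derived derivative with finite differences, anticipating the next slide:

    def v_star(p):
        """Solution of min_{0 <= v <= 1} (v - p)^2: clip p to [0, 1]."""
        return min(max(p, 0.0), 1.0)

    def dv_star(p):
        """Hand-derived derivative: 1 inside the constraint set, 0 where it binds."""
        return 1.0 if 0.0 < p < 1.0 else 0.0

    eps = 1e-6
    for p in (-0.5, 0.3, 0.8, 1.5):
        fd = (v_star(p + eps) - v_star(p - eps)) / (2 * eps)
        print(p, dv_star(p), round(fd, 6))  # analytic and numerical agree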

Theoretical issues

                                                 TA             MAPF
Problem type                                     Continuous     Discrete
Derivative \(\frac{\partial v^*}{\partial p}\)   Well-defined   Ill-defined

Berthet et al. (2020)
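The perturbation trick of Berthet et al. (2020), sketched on the simplest discrete solver, a one-hot \(\arg\max\): averaging the solver's output over Gaussian perturbations of its input yields a smooth surrogate (Monte Carlo toy below).

    import numpy as np

    rng = np.random.default_rng(0)

    def argmax_onehot(scores):
        """A trivial discrete 'solver': piecewise constant in its input."""
        out = np.zeros_like(scores)
        out[np.argmax(scores)] = 1.0
        return out

    def perturbed_argmax(scores, sigma=1.0, n_samples=10_000):
        """Monte Carlo estimate of E[argmax_onehot(scores + sigma * Z)],
        a smooth function of the scores, amenable to gradient descent."""
        noise = rng.normal(size=(n_samples, scores.size))
        samples = [argmax_onehot(scores + sigma * z) for z in noise]
        return np.mean(samples, axis=0)

    scores = np.array([1.0, 1.2, 0.5])
    print(argmax_onehot(scores))     # [0. 1. 0.], jumps abruptly as scores move
    print(perturbed_argmax(scores))  # soft weights over the three choices

The same recipe applies when \(\arg\max\) is replaced by a combinatorial solver such as a shortest-path algorithm.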

Practical issues

👩‍🔬 Expertise

Can we reuse existing algorithms and make them differentiable?

💻 Hardware

Can we leverage modern parallel processors (GPUs) to speed things up?

📖 Open science

Can we do it all with open-source software and reproducible experiments?

Conclusion

Recent works

  • Solving large-scale transportation problems for logistics: Bouvier et al. (2023)
  • Unifying techniques for differentiable optimization layers: Dalle et al. (2022)
  • Developing software for automatic differentiation of complex programs: Dalle and Hill (2025), Hill and Dalle (2025), Montoison, Dalle, and Gebremedhin (2025)

Research perspectives

  • Methodological: combinatorial optimization, graph machine learning, automatic differentiation, game theory
  • Practical: large-scale transportation & logistics problems
  • Industrial partners: Renault, Michelin, Califrais, ART, SNCF

I want to know what you do!

Send me a paper you like at guillaume.dalle@enpc.fr

References

Berthet, Quentin, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, and Francis Bach. 2020. “Learning with Differentiable Perturbed Optimizers.” In Advances in Neural Information Processing Systems, 33:9508–19. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/hash/6bb56208f672af0dd65451f869fedfd9-Abstract.html.
Bouvier, Louis, Guillaume Dalle, Axel Parmentier, and Thibaut Vidal. 2023. “Solving a Continent-Scale Inventory Routing Problem at Renault.” Transportation Science, October. https://doi.org/10.1287/trsc.2022.0342.
Boyles, Stephen D., Nicholas E. Lownes, and Avinash Unnikrishnan. 2025. “Transportation Network Analysis, Volume I: Static and Dynamic Traffic Assignment.” arXiv. https://doi.org/10.48550/arXiv.2502.05182.
Dalle, Guillaume, Léo Baty, Louis Bouvier, and Axel Parmentier. 2022. “Learning with Combinatorial Optimization Layers: A Probabilistic Approach.” arXiv. https://doi.org/10.48550/arXiv.2207.13513.
Dalle, Guillaume, and Adrian Hill. 2025. “A Common Interface for Automatic Differentiation.” arXiv. https://doi.org/10.48550/arXiv.2505.05542.
Hill, Adrian, and Guillaume Dalle. 2025. “Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation.” Transactions on Machine Learning Research. https://openreview.net/forum?id=GtXSN52nIW.
Mandi, Jayanta, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, and Ferdinando Fioretto. 2024. “Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities.” Journal of Artificial Intelligence Research 80 (August): 1623–1701. https://doi.org/10.1613/jair.1.15320.
Mohanty, Sharada, Erik Nygren, Florian Laurent, Manuel Schneider, Christian Scheller, Nilabha Bhattacharya, Jeremy Watson, et al. 2020. “Flatland-RL: Multi-Agent Reinforcement Learning on Trains.” http://arxiv.org/abs/2012.05893.
Montoison, Alexis, Guillaume Dalle, and Assefaw Gebremedhin. 2025. “Revisiting Sparse Matrix Coloring and Bicoloring.” arXiv. https://doi.org/10.48550/arXiv.2505.07308.
Scardapane, Simone. 2024. “Alice’s Adventures in a Differentiable Wonderland, Volume I: A Tour of the Land.” arXiv. https://doi.org/10.48550/arXiv.2404.17625.
Stern, Roni, Nathan Sturtevant, Ariel Felner, Sven Koenig, Hang Ma, Thayne Walker, Jiaoyang Li, et al. 2019. “Multi-Agent Pathfinding: Definitions, Variants, and Benchmarks.” Proceedings of the International Symposium on Combinatorial Search 10 (1): 151–58. https://doi.org/10.1609/socs.v10i1.18510.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
Wang, Shiyue, Haozheng Xu, Yuhan Zhang, Jingran Lin, Changhong Lu, Xiangfeng Wang, and Wenhao Li. 2025. “Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding.” arXiv. https://doi.org/10.48550/arXiv.2505.19219.
Xu, Xiaotong, Zhenjie Zheng, Zijian Hu, Kairui Feng, and Wei Ma. 2024. “A Unified Dataset for the City-Scale Traffic Assignment Model in 20 U.S. Cities.” Scientific Data 11 (1): 325. https://doi.org/10.1038/s41597-024-03149-8.