Publications
2025
- [Preprint] A Common Interface for Automatic Differentiation. Guillaume Dalle and Adrian Hill. May 2025.
For scientific machine learning tasks with a lot of custom code, picking the right Automatic Differentiation (AD) system matters. Our Julia package DifferentiationInterface.jl provides a common frontend to a dozen AD backends, unlocking easy comparison and modular development. In particular, its built-in preparation mechanism leverages the strengths of each backend by amortizing one-time computations. This is key to enabling sophisticated features like sparsity handling without putting additional burdens on the user.
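As a rough illustration of the interface, here is a minimal sketch based on the package's documented API (details may differ across versions):

```julia
using DifferentiationInterface
import ForwardDiff, Zygote  # two of the many supported backends

f(x) = sum(abs2, x)
x = float.(1:4)

# The same call works with any backend: only the backend object changes.
gradient(f, AutoForwardDiff(), x)
gradient(f, AutoZygote(), x)

# Preparation amortizes one-time work (taping, caching, sparsity detection)
# across repeated calls at inputs of the same shape.
backend = AutoForwardDiff()
prep = prepare_gradient(f, backend, x)
gradient(f, prep, backend, x)
```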
- [Journal] Minimax Estimation of Partially-Observed Vector Autoregressions. Guillaume Dalle and Yohann De Castro. Electronic Journal of Statistics, Jan 2025.
Multi-dimensional time series are a core ingredient of the statistical modeling toolkit, for which numerous estimation methods are known. But when observations are scarce or corrupted, the learning task becomes much harder. The question is: how much harder? In this paper, we study the properties of a partially-observed Vector AutoRegressive process, which is a state-space model endowed with a stochastic observation mechanism. Our goal is to estimate its transition matrix, but we only have access to a small and noisy subsample of the state components. We start by describing an estimator based on the Yule-Walker equation and the Dantzig selector, and we give an upper bound on its non-asymptotic error. Then, we provide a matching minimax lower bound, thus proving near-optimality of our estimator. The convergence rate we obtain sheds light on the role of several key parameters such as the sampling ratio, the amount of noise and the number of non-zero coefficients in the transition matrix, which can be dense in low dimension but is usually sparse in high dimension. These theoretical findings are discussed and illustrated through numerical experiments on simulated data.
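Schematically, for a VAR(1) process $X_{t+1} = \theta^* X_t + \varepsilon_{t+1}$, the estimator combines the Yule-Walker equation with a Dantzig-selector program. The following simplified form uses notation of our choosing; the paper's actual estimator further corrects the empirical autocovariances for partial observation and noise:

```latex
% Yule-Walker equation: the lag-1 and lag-0 autocovariances satisfy
%   \Gamma_1 = \theta^* \Gamma_0.
% Dantzig selector: among matrices approximately satisfying the empirical
% version of this equation, pick the sparsest in the l1 sense:
\hat{\theta} \in \operatorname*{arg\,min}_{\theta} \|\theta\|_1
\quad \text{subject to} \quad
\big\| \hat{\Gamma}_1 - \theta \, \hat{\Gamma}_0 \big\|_{\max} \le \lambda
```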
- [Blog post] An Illustrated Guide to Automatic Sparse Differentiation. Adrian Hill, Guillaume Dalle, and Alexis Montoison. In ICLR Blogposts 2025, Apr 2025.
In numerous applications of machine learning, Hessians and Jacobians exhibit sparsity, a property that can be leveraged to vastly accelerate their computation. While the usage of automatic differentiation in machine learning is ubiquitous, automatic sparse differentiation (ASD) remains largely unknown. This post introduces ASD, explaining its key components and their roles in the computation of both sparse Jacobians and Hessians. We conclude with a practical demonstration showcasing the performance benefits of ASD.
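In the Julia ecosystem discussed in the post, the ASD pipeline can be assembled by wrapping a dense AD backend with a sparsity detector and a coloring algorithm. A minimal sketch, assuming the documented APIs of DifferentiationInterface.jl, SparseConnectivityTracer.jl and SparseMatrixColorings.jl:

```julia
using DifferentiationInterface
using SparseConnectivityTracer, SparseMatrixColorings
import ForwardDiff

# An AutoSparse backend combines a dense AD backend with the two ASD
# ingredients: a sparsity pattern detector and a coloring algorithm.
backend = AutoSparse(
    AutoForwardDiff();
    sparsity_detector=TracerSparsityDetector(),
    coloring_algorithm=GreedyColoringAlgorithm(),
)

f(x) = diff(x .^ 2)              # function with a banded Jacobian
jacobian(f, backend, rand(10))   # returned as a sparse matrix
```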
- [Journal] Sparser, Better, Faster, Stronger: Sparsity Detection for Efficient Automatic Differentiation. Adrian Hill and Guillaume Dalle. Transactions on Machine Learning Research, Apr 2025.
From implicit differentiation to probabilistic modeling, Jacobian and Hessian matrices have many potential use cases in Machine Learning (ML), but they are viewed as computationally prohibitive. Fortunately, these matrices often exhibit sparsity, which can be leveraged to speed up the process of Automatic Differentiation (AD). This paper presents advances in sparsity detection, previously the performance bottleneck of Automatic Sparse Differentiation (ASD). Our implementation of sparsity detection is based on operator overloading; it can detect both local and global sparsity patterns and supports flexible index set representations. It is fully automatic and requires no modification of user code, making it compatible with existing ML codebases. Most importantly, it is highly performant, unlocking Jacobians and Hessians at scales where they were considered too expensive to compute. On real-world problems from scientific ML, graph neural networks and optimization, we show significant speed-ups of up to three orders of magnitude. Notably, using our sparsity detection system, ASD outperforms standard AD for one-off computations, without amortization of either sparsity detection or matrix coloring.
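A minimal sketch of the local-versus-global distinction, assuming the documented API of the authors' SparseConnectivityTracer.jl package:

```julia
using SparseConnectivityTracer

f(x) = [x[1] * x[2], x[3]^2, x[1] + x[4]]

# Global pattern: entries that can be nonzero for *some* input.
jacobian_sparsity(f, rand(4), TracerSparsityDetector())

# Local pattern: entries that are nonzero at this particular input,
# taking branches and exact zeros into account.
g(x) = [x[1] > 0 ? x[1] * x[2] : x[3]]
jacobian_sparsity(g, [1.0, 2.0, 3.0], TracerLocalSparsityDetector())
```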
- [Preprint] Revisiting Sparse Matrix Coloring and Bicoloring. Alexis Montoison, Guillaume Dalle, and Assefaw Gebremedhin. May 2025.
Sparse matrix coloring and bicoloring are fundamental building blocks of sparse automatic differentiation. Bicoloring is particularly advantageous for rectangular Jacobian matrices with at least one dense row and column. Indeed, in such cases, unidirectional row or column coloring demands a number of colors equal to the number of rows or columns. We introduce a new strategy for bicoloring that encompasses both direct and substitution-based decompression approaches. Our method reformulates the two variants of bicoloring as star and acyclic colorings of an augmented symmetric matrix. We extend the concept of neutral colors, previously exclusive to bicoloring, to symmetric colorings, and we propose a post-processing routine that neutralizes colors to further reduce the overall color count. We also present the Julia package SparseMatrixColorings, which includes these new bicoloring algorithms alongside all standard coloring methods for sparse derivative matrix computation. Compared to ColPack, the Julia package also offers enhanced implementations for star and acyclic coloring, vertex ordering, as well as decompression.
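A minimal sketch of the bicoloring workflow, assuming the documented API of SparseMatrixColorings (option names may differ slightly across versions):

```julia
using LinearAlgebra, SparseArrays, SparseMatrixColorings

# A pattern with a dense first row and first column: row- or column-only
# coloring needs about as many colors as rows or columns, whereas
# bicoloring can mix row and column colors.
n = 10
A = sparse(1.0I, n, n)
A[1, :] .= 1.0
A[:, 1] .= 1.0

problem = ColoringProblem(; structure=:nonsymmetric, partition=:bidirectional)
algo = GreedyColoringAlgorithm(; decompression=:direct)
result = coloring(A, problem, algo)
row_colors(result), column_colors(result)  # neutral colors appear as 0
```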
2024
- [Proceedings] Analysis of Bootstrap and Subsampling in High-dimensional Regularized Regression. In The 40th Conference on Uncertainty in Artificial Intelligence, Jun 2024.
We investigate popular resampling methods for estimating the uncertainty of statistical models, such as subsampling, bootstrap and the jackknife, and their performance in high-dimensional supervised regression tasks. We provide a tight asymptotic description of the biases and variances estimated by these methods in the context of generalized linear models, such as ridge and logistic regression, taking the limit where the number of samples $n$ and the dimension $d$ of the covariates grow at a comparable rate, with $\alpha = n/d$ fixed. Our findings are three-fold: i) resampling methods are fraught with problems in high dimensions and exhibit the double-descent-like behavior typical of these situations; ii) only when $\alpha$ is large enough do they provide consistent and reliable error estimations (we give convergence rates); iii) in the over-parametrized regime $\alpha < 1$ relevant to modern machine learning practice, their predictions are not consistent, even with optimal regularization.
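As a toy illustration of the setting (a hypothetical simulation of our own, not the paper's protocol), one can pair-bootstrap ridge regression at a fixed aspect ratio $\alpha = n/d$:

```julia
using LinearAlgebra, Statistics, Random

# Ridge estimator in closed form.
ridge(X, y; lambda=1.0) = (X'X + lambda * I) \ (X'y)

Random.seed!(42)
d = 200; alpha = 2.0; n = round(Int, alpha * d)
w = randn(d) / sqrt(d)
X = randn(n, d)
y = X * w + randn(n)

# Pair bootstrap: resample (x_i, y_i) rows with replacement and refit.
boots = [begin
    idx = rand(1:n, n)
    ridge(X[idx, :], y[idx])
end for _ in 1:50]

var(first.(boots))  # bootstrap variance estimate for the first coefficient
```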
- [Journal] HiddenMarkovModels.jl: Generic, Fast and Reliable State Space Modeling. Guillaume Dalle. Journal of Open Source Software, Apr 2024.
Dalle, G. (2024). HiddenMarkovModels.jl: generic, fast and reliable state space modeling. Journal of Open Source Software, 9(96), 6436. https://doi.org/10.21105/joss.06436
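A minimal usage sketch, following the package documentation (details may have evolved since publication):

```julia
using HiddenMarkovModels, Distributions

# Two-state Gaussian HMM: initial distribution, transition matrix,
# and one emission distribution per state.
init = [0.6, 0.4]
trans = [0.9 0.1; 0.2 0.8]
dists = [Normal(-1.0), Normal(+1.0)]
hmm = HMM(init, trans, dists)

sim = rand(hmm, 500)                        # simulate states and observations
best_states, _ = viterbi(hmm, sim.obs_seq)  # most likely state sequence
hmm_est, logls = baum_welch(hmm, sim.obs_seq)  # EM parameter re-estimation
```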
- [Proceedings] Optimal Performance of Graph Convolutional Networks on the Contextual Stochastic Block Model. Guillaume Dalle and Patrick Thiran. In The Third Learning on Graphs Conference, Nov 2024.
For Graph Neural Networks, oversmoothing denotes the homogenization of vertex embeddings as the number of layers increases. To better understand this phenomenon, we study community detection with a linearized Graph Convolutional Network on the Contextual Stochastic Block Model. We express the distribution of the embeddings in each community as a Gaussian mixture over a low-dimensional latent space, with explicit formulas in the case of a single layer. This yields tractable estimators for classification accuracy at finite depth. Numerical experiments suggest that modeling with a single Gaussian is insufficient and that the impact of depth may be more complex than previously anticipated.
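A toy simulation of the setting (illustrative parameters of our own choosing, not those of the paper):

```julia
using Random, Statistics

Random.seed!(1)
n = 500
y = rand([1, 2], n)                     # two communities
p_in, p_out = 0.08, 0.02                # intra-/inter-community edge probabilities
U = [i < j && rand() < (y[i] == y[j] ? p_in : p_out) for i in 1:n, j in 1:n]
A = U .| U'                             # symmetric adjacency matrix
x = (y .== 1) .- (y .== 2) .+ randn(n)  # features: community mean plus noise
deg = max.(vec(sum(A; dims=2)), 1)
h = (A * x) ./ deg                      # one linear graph convolution
mean(h[y .== 1]), mean(h[y .== 2])      # community-wise embedding means
```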
2023
- [Journal] Solving a Continent-Scale Inventory Routing Problem at Renault. Transportation Science, Oct 2023.
This paper is the fruit of a partnership with Renault. Their reverse logistics require solving a continent-scale multiattribute inventory routing problem (IRP). With an average of 30 commodities, 16 depots, and 600 customers spread across a continent, our instances are orders of magnitude larger than those in the literature. Existing algorithms do not scale, so we propose a large neighborhood search (LNS), sketched after this abstract. To make it work, (1) we generalize existing split delivery vehicle routing problem and IRP neighborhoods to this context, (2) we turn a state-of-the-art matheuristic for medium-scale IRPs into a large neighborhood, and (3) we introduce two novel perturbations: the reinsertion of a customer and that of a commodity into the IRP solution. We also derive a new lower bound based on a flow relaxation. To stimulate research on large-scale IRPs, we introduce a library of industrial instances. We benchmark our algorithms on these instances and make our code open source. Extensive numerical experiments highlight the relevance of each component of our LNS. Funding: This work was supported by Renault Group. Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2022.0342.
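For readers unfamiliar with the method, a generic destroy-and-repair LNS skeleton looks as follows (a schematic sketch, not the paper's implementation; the destroy and repair operators are where the paper's contributions live):

```julia
# Generic large neighborhood search: repeatedly destroy part of the
# incumbent solution and repair it, keeping improvements.
function lns(initial_solution, cost, destroy!, repair!; iterations=1000)
    best = deepcopy(initial_solution)
    current = deepcopy(initial_solution)
    for _ in 1:iterations
        candidate = deepcopy(current)
        destroy!(candidate)  # e.g. remove one customer or one commodity
        repair!(candidate)   # e.g. matheuristic reinsertion
        if cost(candidate) < cost(current)
            current = candidate
            cost(current) < cost(best) && (best = deepcopy(current))
        end
    end
    return best
end
```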
2022
- [Preprint] Learning with Combinatorial Optimization Layers: A Probabilistic Approach. Dec 2022.
Combinatorial optimization (CO) layers in machine learning (ML) pipelines are a powerful tool to tackle data-driven decision tasks, but they come with two main challenges. First, the solution of a CO problem often behaves as a piecewise constant function of its objective parameters. Given that ML pipelines are typically trained using stochastic gradient descent, the absence of slope information is very detrimental. Second, standard ML losses do not work well in combinatorial settings. A growing body of research addresses these challenges through diverse methods. Unfortunately, the lack of well-maintained implementations slows down the adoption of CO layers. In this paper, building upon previous works, we introduce a probabilistic perspective on CO layers, which lends itself naturally to approximate differentiation and the construction of structured losses. We recover many approaches from the literature as special cases, and we also derive new ones. Based on this unifying perspective, we present InferOpt.jl, an open-source Julia package that 1) allows turning any CO oracle with a linear objective into a differentiable layer, and 2) defines adequate losses to train pipelines containing such layers. Our library works with arbitrary optimization algorithms, and it is fully compatible with Julia's ML ecosystem. We demonstrate its abilities using a pathfinding problem on video game maps as a guiding example, as well as three other applications from operations research.
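A minimal sketch of the package's core idea, with a hypothetical toy maximizer standing in for a real CO oracle (API as documented at the time of writing):

```julia
using InferOpt

# Toy CO oracle with a linear objective: one-hot argmax.
maximizer(θ; kwargs...) = (e = zeros(length(θ)); e[argmax(θ)] = 1.0; e)

# Additive perturbation smooths the piecewise-constant oracle, and the
# Fenchel-Young loss provides a structured training objective.
layer = PerturbedAdditive(maximizer; ε=0.1, nb_samples=100)
loss = FenchelYoungLoss(layer)

θ = randn(5)
y_true = maximizer(θ)
loss(θ, y_true)  # differentiable w.r.t. θ, e.g. with Zygote
```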
- [Thesis] Machine Learning and Combinatorial Optimization Algorithms, with Applications to Railway Planning. Guillaume Dalle. École des Ponts ParisTech, Dec 2022.
This thesis investigates the frontier between machine learning and combinatorial optimization, two active areas of applied mathematics research. We combine theoretical insights with efficient algorithms, and develop several open source Julia libraries. Inspired by a collaboration with the Société nationale des chemins de fer français (SNCF), we study high-impact use cases from the railway world: train failure prediction, delay propagation, and track allocation. In Part I, we provide mathematical background and describe software implementations for various tools that will be needed later on: implicit differentiation, temporal point processes, Hidden Markov Models and Multi-Agent Path Finding. Our publicly available code fills a void in the Julia package ecosystem, aiming at ease of use without compromising on performance. In Part II, we highlight theoretical contributions related to both statistics and decision-making. We consider a Vector AutoRegressive process with partial observations, and prove matching upper and lower bounds on the estimation error. We unify and extend the state of the art for combinatorial optimization layers in deep learning, gathering various approaches in a Julia library called InferOpt.jl. We also seek to differentiate through multi-objective optimization layers, which leads to a novel theory of lexicographic convex analysis. In Part III, these mathematical and algorithmic foundations come together to tackle railway problems. We design a hierarchical model of train failures, propose a graph-based framework for delay propagation, and suggest new avenues for track allocation, with the Flatland challenge as a testing ground.
2020
- [Patent] Method for Determining a Soiling Speed of a Photovoltaic Generation Unit. Pierre Stephan and Guillaume Dalle. Jun 2020.
The invention relates to a method for determining the soiling speed of a photovoltaic generation unit. The method starts from a time series of values of an electrical variable generated by the unit, together with the corresponding values of meteorological parameters. It relies on a relationship between the electrical variable generated at a given moment, the values taken by the meteorological parameters at that same moment, and the occurrence of cleaning events; this relationship involves several relational parameters, including the soiling speed, and the occurrence of a cleaning event is modelled by a probability law involving one of these parameters. The soiling speed is then determined in iterations, by simulating vectors that are representative of the occurrence of cleaning events.