How can we ensure that a neural network returns a structured object, like a path in a graph? One option is to create a custom layer that solves a combinatorial optimization problem, and then insert it into the network. Unfortunately, backpropagation in this setting is far from trivial. This talk introduces a probabilistic approach for approximate differentiation of combinatorial optimization layers. The core idea is to spread a single solution into a distribution on the solution space, thus smoothing the input-output mapping. An implementation is provided in the state-of-the-art Julia package InferOpt.jl.
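To make the core idea concrete, here is a minimal, hypothetical sketch using InferOpt.jl's perturbation wrapper (the toy one-hot oracle is ours, and the exact keyword names of PerturbedAdditive may vary between package versions):

using InferOpt

# Toy combinatorial oracle: returns the vertex of the unit simplex that
# maximizes θ ⋅ y, i.e. a one-hot vector at the largest coordinate of θ.
function onehot_argmax(θ; kwargs...)
    y = zero(θ)
    y[argmax(θ)] = 1
    return y
end

# Spread the single argmax solution into a distribution: θ is perturbed with
# additive Gaussian noise and the solutions are averaged over many samples,
# turning a piecewise-constant mapping into a smooth one.
smoothed = PerturbedAdditive(onehot_argmax; ε=1.0, nb_samples=100)

θ = [1.0, 3.0, 2.0]
onehot_argmax(θ)  # hard output: [0.0, 1.0, 0.0]
smoothed(θ)       # smooth output: a convex combination of one-hot vectors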
@unpublished{dalleEverySolutionEverywhere2023,type={Workshop talk},title={Every Solution, Everywhere, All at Once: Turning Optimization Solvers into Probability Distributions},author={Dalle, Guillaume},date={2023-04-26},url={https://mathplus.de/topic-development-lab/tes-summer-2023/workshop/},eventtitle={Workshop "{{Exploring}} Synergies: {{Machine Learning Meets Physics}} \& {{Optimization}}"},venue={{Zuse Institute, Berlin}},keywords={talk},}
Talk
Every Solution, Everywhere, All at Once: Turning Optimization Solvers into Probability Distributions
@unpublished{dalleEverySolutionEverywhere2023a,type={Seminar},title={Every Solution, Everywhere, All at Once: Turning Optimization Solvers into Probability Distributions},author={Dalle, Guillaume},date={2023-03-30},venue={{Swiss Data Science Center, Lausanne}},keywords={talk},}
Talk
What Is the Gradient of a Linear Program? Automatic Differentiation on a Polytope
Solving combinatorial optimization problems is a routine ingredient of many industrial processes. While such problems are often intractable, it is possible to learn from past instances in order to obtain better solutions with less computation time. This is the topic of our present contribution; see our preprint for more details. We first justify the practical interest of such hybrid approaches, before focusing on a key technical hurdle: gradient computation for combinatorial solvers. Overcoming this hurdle requires a novel mix of several mathematical ideas, which we implemented in an easy-to-use Julia package.
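In notation consistent with the abstract (the symbols are ours), the title question has a blunt answer, and the standard probabilistic remedy fits in two lines:

% The LP solution is piecewise constant in the cost vector, so its
% gradient vanishes almost everywhere:
y^\star(\theta) = \operatorname*{arg\,max}_{y \in \mathcal{Y}} \theta^\top y,
\qquad \nabla_\theta \, y^\star(\theta) = 0 \quad \text{almost everywhere.}
% Averaging over random perturbations Z of the cost restores a smooth surrogate:
\hat{y}_\varepsilon(\theta) = \mathbb{E}\left[ y^\star(\theta + \varepsilon Z) \right].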
@unpublished{dalleWhatGradientLinear2023,type={Conference talk},title={What Is the Gradient of a {{Linear Program}}? {{Automatic}} Differentiation on a Polytope},author={Dalle, Guillaume},date={2023-06-30},url={https://jpoc13.sciencesconf.org/},eventtitle={Journées {{Polyèdres}} et {{Optimisation Combinatoire}}},langid={english},venue={{Clermont-Ferrand}},keywords={talk},}
2022
Talk
Large Neighborhood Search and Structured Prediction for the Inventory Routing Problem
We consider a large-scale multi-depot multi-commodity Inventory Routing Problem (IRP) to model the packaging return logistics of a major European car manufacturer. No known algorithm scales properly to our context. We propose a Large Neighborhood Search (LNS) based on common routing neighborhoods and two new ones: the reinsertion of a customer and of a commodity into the IRP solution. We also try to bypass the heavy computations of the LNS by leveraging recent ideas from machine learning for operations research, in particular structured prediction.
@unpublished{bouvierLargeNeighborhoodSearch2022,type={Conference talk},title={Large {{Neighborhood Search}} and {{Structured Prediction}} for the {{Inventory Routing Problem}}},author={Bouvier, Louis and Dalle, Guillaume and Parmentier, Axel},date={2022-02},urldate={2022-03-04},eventtitle={23ème {{Congrès Annuel}} de {{La Société Française}} de {{Recherche Opérationnelle}} et d’{{Aide}} à {{La Décision}}},langid={english},venue={{Villeurbanne - Lyon, France}},keywords={talk},}
Preprint
Solving a Continent-Scale Inventory Routing Problem at Renault
This paper is the fruit of a partnership with Renault. Their reverse logistics requires solving a continent-scale multi-attribute inventory routing problem (IRP). With an average of 30 commodities, 16 depots, and 600 customers spread across a continent, our instances are orders of magnitude larger than those in the literature. Existing algorithms do not scale. We propose a large neighborhood search (LNS). To make it work, (1) we generalize existing split delivery vehicle routing problem and IRP neighborhoods to this context, (2) we turn a state-of-the-art matheuristic for medium-scale IRPs into a large neighborhood, and (3) we introduce two novel perturbations: the reinsertion of a customer and that of a commodity into the IRP solution. We also derive a new lower bound based on a flow relaxation. In order to stimulate research on large-scale IRPs, we introduce a library of industrial instances. We benchmark our algorithms on these instances and make our code open-source. Extensive numerical experiments highlight the relevance of each component of our LNS.
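The destroy-and-repair principle behind the LNS can be illustrated on a toy problem. The following self-contained Julia sketch applies it to a tiny traveling salesman instance; it shows only the mechanism, not the paper's actual IRP neighborhoods (customer and commodity reinsertion) or its matheuristic repair step.

# Euclidean distance and tour length on a list of 2D points.
dist(p, q) = hypot(p[1] - q[1], p[2] - q[2])
tour_length(tour, pts) =
    sum(dist(pts[tour[i]], pts[tour[mod1(i + 1, length(tour))]]) for i in eachindex(tour))

# Repair: greedily reinsert each removed city at its cheapest position.
function repair!(tour, removed, pts)
    for c in removed
        costs = [tour_length(insert!(copy(tour), pos, c), pts) for pos in 1:length(tour)+1]
        insert!(tour, argmin(costs), c)
    end
    return tour
end

function lns(pts; k=2, max_iter=500)
    best = collect(1:length(pts))
    for _ in 1:max_iter
        candidate = copy(best)
        removed = [popat!(candidate, rand(1:length(candidate))) for _ in 1:k]  # destroy
        repair!(candidate, removed, pts)                                       # repair
        if tour_length(candidate, pts) < tour_length(best, pts)                # accept improvements
            best = candidate
        end
    end
    return best
end

pts = [(rand(), rand()) for _ in 1:10]
best_tour = lns(pts)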
@online{bouvierSolvingContinentScaleInventory2022,title={Solving a {{Continent-Scale Inventory Routing Problem}} at {{Renault}}},author={Bouvier, Louis and Dalle, Guillaume and Parmentier, Axel and Vidal, Thibaut},date={2022-10-10},eprint={2209.00412},eprinttype={arxiv},eprintclass={math},doi={10.48550/arXiv.2209.00412},url={http://arxiv.org/abs/2209.00412},urldate={2023-01-26},pubstate={preprint},note={Submitted to Transportation Science}}
Talk
ImplicitDifferentiation.jl: Differentiating Implicit Functions
We present a Julia package for differentiating through functions that are defined implicitly. It can be used to compute derivatives for a wide array of "black box" procedures, from optimization algorithms to fixed point iterations or systems of nonlinear equations. Since it mostly relies on defining custom chain rules, our code is lightweight and integrates nicely with Julia’s automatic differentiation and machine learning ecosystem.
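As a flavor of the package's usage, here is a minimal sketch based on its documented pattern (the constructor and calling conventions have evolved across versions, so treat this as illustrative rather than definitive):

using ImplicitDifferentiation, Zygote

# The "black box" below stands in for any iterative solver; only the
# conditions F(x, y) = 0 characterizing its output are needed for derivatives.
forward(x) = sqrt.(x)            # e.g. the result of a fixed point iteration
conditions(x, y) = y .^ 2 .- x   # optimality conditions satisfied by the output y

implicit = ImplicitFunction(forward, conditions)

x = [4.0, 9.0]
implicit(x)                   # [2.0, 3.0]
Zygote.jacobian(implicit, x)  # derivatives via the implicit function theorem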
@unpublished{dalleImplicitDifferentiationJlDifferentiating2022,type={Conference talk},title={{{ImplicitDifferentiation}}{{.jl}}: Differentiating Implicit Functions},author={Dalle, Guillaume and Tarek, Mohamed},date={2022-07-29},url={https://pretalx.com/juliacon-2022/talk/DTHTBC/},eventtitle={{{JuliaCon}}},langid={english},keywords={talk}}
Talk
InferOpt.jl: Combinatorial Optimization in ML Pipelines
We present InferOpt.jl, a generic package for combining combinatorial optimization algorithms with machine learning models. It has two purposes: increasing the expressivity of learning models thanks to new types of structured layers, and increasing the efficiency of optimization algorithms thanks to an additional inference step. Our library provides wrappers for several state-of-the-art methods in order to make them compatible with Julia’s automatic differentiation ecosystem.
@unpublished{dalleInferOptJlCombinatorial2022,type={Conference talk},title={{{InferOpt}}{{.jl}}: Combinatorial Optimization in {{ML}} Pipelines},author={Dalle, Guillaume and Bouvier, Louis and Baty, Léo},date={2022-07-29},url={https://pretalx.com/juliacon-2022/talk/P7XJCV/},eventtitle={{{JuliaCon}}},langid={english},keywords={talk}}
Preprint
Learning with Combinatorial Optimization Layers: A Probabilistic Approach
Combinatorial optimization (CO) layers in machine learning (ML) pipelines are a powerful tool to tackle data-driven decision tasks, but they come with two main challenges. First, the solution of a CO problem often behaves as a piecewise constant function of its objective parameters. Given that ML pipelines are typically trained using stochastic gradient descent, the absence of slope information is very detrimental. Second, standard ML losses do not work well in combinatorial settings. A growing body of research addresses these challenges through diverse methods. Unfortunately, the lack of well-maintained implementations slows down the adoption of CO layers. In this paper, building upon previous works, we introduce a probabilistic perspective on CO layers, which lends itself naturally to approximate differentiation and the construction of structured losses. We recover many approaches from the literature as special cases, and we also derive new ones. Based on this unifying perspective, we present InferOpt.jl, an open-source Julia package that 1) allows turning any CO oracle with a linear objective into a differentiable layer, and 2) defines adequate losses to train pipelines containing such layers. Our library works with arbitrary optimization algorithms, and it is fully compatible with Julia’s ML ecosystem. We demonstrate its abilities using a pathfinding problem on video game maps as a guiding example, as well as three other applications from operations research.
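As an illustration of the intended workflow, here is a hypothetical training step (the toy oracle and random data are ours; the wrapper and loss names follow the package's documented API, which may have evolved since):

using InferOpt, Zygote

# Toy CO oracle with a linear objective: argmax of θ ⋅ y over one-hot vectors.
function onehot_argmax(θ; kwargs...)
    y = zero(θ)
    y[argmax(θ)] = 1
    return y
end

# Probabilistic layer and matching structured loss: the Fenchel-Young loss of
# a perturbed oracle is convex in θ and differentiable, with a structured target.
layer = PerturbedAdditive(onehot_argmax; ε=1.0, nb_samples=50)
loss = FenchelYoungLoss(layer)

w = randn(3, 5)                        # toy statistical model x ↦ θ = w * x
x, y_true = randn(5), [0.0, 1.0, 0.0]  # one training pair
grad = Zygote.gradient(w -> loss(w * x, y_true), w)[1]  # usable in SGD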
@online{dalleLearningCombinatorialOptimization2022,title={Learning with {{Combinatorial Optimization Layers}}: A {{Probabilistic Approach}}},shorttitle={Learning with {{Combinatorial Optimization Layers}}},author={Dalle, Guillaume and Baty, Léo and Bouvier, Louis and Parmentier, Axel},date={2022-12-03},eprint={2207.13513},eprinttype={arxiv},eprintclass={cs, math, stat},doi={10.48550/arXiv.2207.13513},url={http://arxiv.org/abs/2207.13513},urldate={2023-01-26},pubstate={preprint},note={Submitted to the Journal of Machine Learning Research}}
Talk
Learning with Combinatorial Optimization Layers: A Probabilistic Approach
Thesis
Machine Learning and Combinatorial Optimization Algorithms, with Applications to Railway Planning
This thesis investigates the frontier between machine learning and combinatorial optimization, two active areas of applied mathematics research. We combine theoretical insights with efficient algorithms, and develop several open-source Julia libraries. Inspired by a collaboration with the Société nationale des chemins de fer français (SNCF), we study high-impact use cases from the railway world: train failure prediction, delay propagation, and track allocation.
In Part I, we provide mathematical background and describe software implementations for various tools that will be needed later on: implicit differentiation, temporal point processes, Hidden Markov Models and Multi-Agent Path Finding. Our publicly available code fills a void in the Julia package ecosystem, aiming at ease of use without compromising on performance.
In Part II, we highlight theoretical contributions related to both statistics and decision-making. We consider a Vector AutoRegressive process with partial observations, and prove matching upper and lower bounds on the estimation error. We unify and extend the state of the art for combinatorial optimization layers in deep learning, gathering various approaches in a Julia library called InferOpt.jl. We also seek to differentiate through multi-objective optimization layers, which leads to a novel theory of lexicographic convex analysis.
In Part III, these mathematical and algorithmic foundations come together to tackle railway problems. We design a hierarchical model of train failures, propose a graph-based framework for delay propagation, and suggest new avenues for track allocation, with the Flatland challenge as a testing ground.
@thesis{dalleMachineLearningCombinatorial2022,type={phdthesis},title={Machine Learning and Combinatorial Optimization Algorithms, with Applications to Railway Planning},author={Dalle, Guillaume},editora={Meunier, Frédéric and De Castro, Yohann and Parmentier, Axel},editoratype={collaborator},date={2022-12-16},institution={{École des Ponts ParisTech}},url={https://www.theses.fr/2022ENPC0047},urldate={2023-03-31},langid={english},}
Preprint
Minimax Estimation of Partially-Observed Vector AutoRegressions
High-dimensional time series are a core ingredient of the statistical modeling toolkit, for which numerous estimation methods are known. But when observations are scarce or corrupted, the learning task becomes much harder. The question is: how much harder? In this paper, we study the properties of a partially-observed Vector AutoRegressive process, which is a state-space model endowed with a stochastic observation mechanism. Our goal is to estimate its sparse transition matrix, but we only have access to a small and noisy subsample of the state components. Interestingly, the sampling process itself is random and can exhibit temporal correlations, a feature shared by many realistic data acquisition scenarios. We start by describing an estimator based on the Yule-Walker equation and the Dantzig selector, and we give an upper bound on its non-asymptotic error. Then, we provide a matching minimax lower bound, thus proving near-optimality of our estimator. The convergence rate we obtain sheds light on the role of several key parameters such as the sampling ratio, the amount of noise and the number of non-zero coefficients in the transition matrix. These theoretical findings are discussed and illustrated by numerical experiments on simulated data.
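In symbols, the setting couples a latent sparse VAR process with a random observation mechanism (the sampling operator \Pi_t and noise \eta_t below are our illustrative notation; see the paper for the precise model):

X_t = \theta X_{t-1} + \varepsilon_t
  \qquad \text{(latent VAR process with sparse transition matrix } \theta\text{)}
Y_t = \Pi_t \left( X_t + \eta_t \right)
  \qquad \text{(} \Pi_t \text{: random subsampling of components, } \eta_t \text{: observation noise)}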
@online{dalleMinimaxEstimationPartiallyObserved2022,title={Minimax {{Estimation}} of {{Partially-Observed Vector AutoRegressions}}},author={Dalle, Guillaume and De Castro, Yohann},date={2022-05-05},eprint={2106.09327},eprinttype={arxiv},eprintclass={eess, math, stat},doi={10.48550/arXiv.2106.09327},url={http://arxiv.org/abs/2106.09327},urldate={2023-01-26},pubstate={preprint},note={Submitted to the Electronic Journal of Statistics}}
Talk
Pathfinding in a Railway Network: Learning to Optimize Better
In any large railway network, efficient traffic management is essential to maximize punctuality and react to incidents. The main challenge is to build and then adjust the itinerary of each train while avoiding infrastructure conflicts: on rails, overtaking and crossing are difficult. This problem, called "Multi-Agent Path Finding", is NP-hard, so that for large instances, only heuristic methods generate solutions in reasonable time. One of them, called "Prioritized Planning", orders the trains beforehand, then computes their routes sequentially with a shortest path algorithm (typically A*). The performance of this heuristic depends heavily on the chosen permutation, which is why we propose a data-driven version of it, inspired by the work of Axel Parmentier. Indeed, timetables vary little from one day to the next, so we can rely on the history of past instances (and their solutions) to better solve future ones. The learning phase consists in estimating the parameters of the encoder, which combines relevant features to assign an importance score to each train. However, neither sorting nor Prioritized Planning is a differentiable operation, which rules out gradient methods during training. To overcome this difficulty, we interpret sorting as a Linear Program, whose cost vector we perturb with random noise to smooth its behavior. We also modify the learning target: instead of predicting the full set of paths, we only predict the permutation of the trains. This allows us to treat Prioritized Planning as a post-processing step, which no longer needs to be differentiated. In return, the dataset must be built accordingly, for instance by solving its instances with a local search on the permutahedron. We implement our algorithm in the Julia language, which lets us take advantage of its automatic differentiation ecosystem. Tests are carried out on the Flatland challenge, a simplified railway simulation environment born from a collaboration between the Swiss, German and French railways. In the short term, we plan to extend our approach to the stochastic setting, in order to ease the adaptation of the transportation plan after a rolling stock failure.
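A minimal illustration of the perturbed-sorting trick described above (toy code, not the talk's actual pipeline): since ranking is the solution of a linear program over the permutahedron, averaging the ranks obtained from noisy copies of the scores yields a smooth surrogate.

using Statistics

# Soft ranking by perturbation: add Gaussian noise to the scores, compute the
# (piecewise-constant) ranks, and average over samples to smooth them out.
function soft_ranks(scores; ε=0.1, nb_samples=100)
    n = length(scores)
    mean(invperm(sortperm(scores .+ ε .* randn(n); rev=true)) for _ in 1:nb_samples)
end

soft_ranks([0.3, 1.2, 0.9])  # ≈ [3.0, 1.0, 2.0] for small ε: higher score, smaller rank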
@unpublished{dalleRechercheItinerairesDans2022,type={Conference talk},title={Recherche d'itinéraires dans un réseau ferroviaire : apprendre à mieux optimiser},author={Dalle, Guillaume},date={2022-03-06},url={https://indico.math.cnrs.fr/event/6564/contributions/6450/},eventtitle={Journées SMAI MODE 2022},langid={french},venue={{Limoges}},keywords={talk},}
2021
Talk
Why Are Trains Always Late?
Railway networks are very unstable systems: a small incident is sometimes enough to paralyze an entire line for several hours. These propagation phenomena stem from resource conflicts between trains, so understanding them would require very precise data on the transportation plan. Unfortunately, such data is often inaccessible. We therefore propose a new delay model based on a latent variable representing track congestion. This congestion lives on the edges of the graph representing the network, and evolves according to a high-dimensional Vector AutoRegressive (VAR) process. However, it cannot be observed directly: the only available information is the travel times of the trains, which correspond to a noisy projection of the signal of interest. Under these conditions, learning the transition matrix of the VAR process can prove difficult. Our statistical analysis provides the optimal convergence rate of the estimation error, which depends on several easily interpretable parameters: the dimension of the network, the density of traffic, and the speed of delay propagation. We first prove a minimax lower bound, which applies to any estimator, and then exhibit a sparse estimation algorithm that matches this lower bound.
@unpublished{dallePourquoiTrainsSontils2021,type={Seminar},title={Pourquoi Les Trains Sont-Ils Toujours En Retard ?},author={Dalle, Guillaume},date={2021-11-16},url={https://irmar.univ-rennes.fr/evenements/pourquoi-les-trains-sont-ils-toujours-en-retard},eventtitle={Séminaire {{Gaussbusters}}},venue={{IRMAR, Rennes}},keywords={talk}}
Talk
Understanding Railway Delay Propagation through Latent Variable Models
Railway networks are very unstable systems: a single delay caused by an external incident can set off a chain reaction that ends up paralyzing an entire line. Unfortunately, we rarely have complete information about resource conflicts between trains, which makes delay propagation hard to predict. To overcome this issue, we introduce a new model based on a latent congestion variable, which lives on the network graph and evolves according to a Vector AutoRegressive process $X_t = \theta X_{t-1} + \varepsilon_t$. Since we only get partial and noisy observations of $X$, the estimation of $\theta$ can be challenging: depending on the density of traffic and the locality of propagation phenomena, how much information can we recover? We provide two complementary answers to this question: an information-theoretic minimax lower bound, and the non-asymptotic analysis of a specific sparse estimator. Combining both of them yields an optimal convergence rate for this statistical problem, which is validated on simulated data and put to the test on real railway logs.
@unpublished{dalleUnderstandingRailwayDelay2021,type={Seminar},title={Understanding Railway Delay Propagation through Latent Variable Models},author={Dalle, Guillaume},date={2021-10-14},url={https://cermics-lab.enpc.fr/seminaires/decision-algorithms-and-geometry/},eventtitle={"{{Decision}}, {{Algorithms}} and {{Geometry}}" {{Seminar}}},venue={{École des Ponts, Paris}},keywords={talk},}
2020
Talk
Delay Propagation on a Suburban Railway Network
Understanding and predicting delays is a central task for any railway system, but it is made much more difficult by interactions between trains. We propose a new model for the delay propagation phenomenon, which takes into account infrastructure constraints without assuming knowledge of the resource conflicts linking trains with one another. Our approach relies on a set of hidden variables called the network jam, structured as a Dynamic Bayesian Network to enable spatial propagation. We also present a statistical analysis of the minimax estimation error, a variational inference method, and numerical tests on simulated and real data.
@unpublished{dalleDelayPropagationSuburban2020,type={Conference talk},title={Delay Propagation on a Suburban Railway Network},author={Dalle, Guillaume and De Castro, Yohann and Parmentier, Axel},date={2020-01},url={https://easychair.org/publications/preprint/9CGf},urldate={2022-02-21},eventtitle={21ème {{Congrès Annuel}} de {{La Société Française}} de {{Recherche Opérationnelle}} et d’{{Aide}} à {{La Décision}}},langid={english},venue={{Montpellier, France}},keywords={talk},}
Patent
Method for Determining a Soiling Speed of a Photovoltaic Generation Unit
The invention relates to a method for determining a soiling speed of a photovoltaic generation unit. The method relies on values of an electrical variable generated by the unit at a plurality of moments in a time series, together with the corresponding values of meteorological parameters, and on a relationship between the electrical variable generated at one moment, the values taken by the meteorological parameters at the same moment, and the occurrence of cleaning events. This relationship comprises multiple relational parameters, including the soiling speed, and the occurrence of a cleaning event is modelled by a probability law involving a relational parameter. The soiling speed is determined over iterations in which vectors representative of the occurrence of cleaning events are simulated.
@patent{stephanMethodDeterminingSoiling2020,type={patent},title={Method for Determining a Soiling Speed of a Photovoltaic Generation Unit},author={Stephan, Pierre and Dalle, Guillaume},holder={{Electricite De France}},date={2020-06-11},number={WO2020115431A1},location={{WO}},url={https://patents.google.com/patent/WO2020115431A1/en},urldate={2021-11-18},langid={english},}
2019
Talk
Delay Propagation on a Suburban Railway Network
A new method is introduced to predict train delays on a suburban railway network. Delay propagation is accounted for using a Dynamic Bayesian Network related to the underlying infrastructure graph. We analyze a simplified model from a statistical point of view, and obtain a lower bound on the minimax estimation risk. We propose a generic implementation using Stochastic Variational Inference and the Pyro library to separate estimation from modeling. We present numerical tests on both simulated data and real event logs from the Paris RER network.
@unpublished{dalleDelayPropagationSuburban2019,type={Conference talk},title={Delay Propagation on a Suburban Railway Network},author={Dalle, Guillaume and De Castro, Yohann and Parmentier, Axel},date={2019-12-04},eventtitle={{{PGMO Days}}},langid={english},venue={{Saclay, France}},keywords={talk},}