Publications by LightOn
Technical Reports and Preprints
Randomized Numerical Linear Algebra

Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units we show that randomization can be significantly accelerated, at negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.
Authors: Daniel Hesslow, Alessandro Cappelli, Igor Carron, Laurent Daudet, Raphaël Lafargue, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli

April 2021

Technical Reports and Preprints
Neural Architecture Search

Contrastive Embeddings for Neural Architectures

The performance of algorithms for neural architecture search strongly depends on the parametrization of the search space. We use contrastive learning to identify networks across different initializations based on their data Jacobians, and automatically produce the first architecture embeddings independent from the parametrization of the search space. Using our contrastive embeddings, we show that traditional black-box optimization algorithms, without modification, can reach state-of-the-art performance in Neural Architecture Search. As our method provides a unified embedding space, we perform for the first time transfer learning between search spaces. Finally, we show the evolution of embeddings during training, motivating future studies into using embeddings at different training stages to gain a deeper understanding of the networks in a search space.
Authors: Daniel Hesslow, Iacopo Poli
Technical Reports and Preprints
Robust Machine Learning

Adversarial Robustness by Design through Analog Computing and Synthetic Gradients

We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we find we are unable to build a reliable backward differentiable approximation for obfuscated parameters. Moreover, while our model reaches a good natural accuracy with a hybrid backpropagation – synthetic gradient method, the same approach is suboptimal if employed to generate adversarial examples. We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks. Finally, our hybrid training method builds robust features against transfer attacks. We demonstrate our approach on a VGG-like architecture, placing the defense on top of the convolutional features, on CIFAR-10 and CIFAR-100. Code is available at this https URL.
Authors: Alessandro Cappelli, Ruben Ohana, Julien Launay, Laurent Meunier, Iacopo Poli, Florent Krzakala
Conference Proceedings
Machine Learning Techniques

Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural language processing.
Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
In: Proceedings of NeurIPS 2020
Conference Proceedings
Machine Learning Techniques

Reservoir Computing meets Recurrent Kernels and Structured Transforms

Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
Authors: Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan, Florent Krzakala
In: Proceedings of NeurIPS 2020
Technical Reports and Preprints
Machine Learning Techniques

The dynamics of learning with feedback alignment

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory for the success of DFA. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorize process occurs sequentially from the bottom layers of the network to the top.
Authors: Maria Refinetti, Stéphane d’Ascoli, Ruben Ohana, Sebastian Goldt
Conference Proceedings
Hardware

Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally-implemented neural networks.
Authors: Julien Launay, Iacopo Poli, Kilian Müller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
In: Proceedings of Hot Chips 2020
In: Proceedings of IEEE HotChips 2020 (Stanford, USA).
Journal Papers
Time series

NEWMA: a new method for scalable model-free online change-point detection

We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponential Weighted Moving Average algorithm (EWMA).
Authors: Nicolas Keriven, Damien Garreau, Iacopo Poli
In: IEEE Transactions on Signal Processing Vol. 68 pp. 3515 – 3528 (2020)
Conference Proceedings
Computer Vision

Kernel computations from large-scale random features obtained by Optical Processing Units

Approximating kernel functions with random features (RFs)has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain.
Authors: Ruben Ohana, Jonas Wacker, Jonathan Dong, Sébastien Marmin, Florent Krzakala, Maurizio Filippone, Laurent Daudet
In: Proceedings of ICASSP 2020
Conference Proceedings
Signal Processing

Fast Optical System Identification by Numerical Interferometry

We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. Our method simplifies the calibration of optical transmission matrices from a quadratic to a linear inverse problem by first recovering the phase of the measurements. We show that by carefully designing the probing signals, measurement phase retrieval amounts to a distance geometry problem—a multilateration—in the complex plane.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of ICASSP 2020
Technical Reports and Preprints
Time Series

Online Change Point Detection in Molecular Dynamics With Optical Random Features

Proteins are made of atoms constantly fluctuating, but can occasionally undergo large-scale changes. Such transitions are of biological interest, linking the structure of a protein to its function with a cell. Atomic-level simulations, such as Molecular Dynamics (MD), are used to study these events. However, molecular dynamics simulations produce time series with multiple observables, while changes often only affect a few of them.
Authors: Amélie Chatelain, Giuseppe Luca Tommasone, Laurent Daudet, Iacopo Poli
Journal Papers
Time Series

Optical Reservoir Computing using multiple light scattering for chaotic systems prediction

Reservoir Computing is a relatively recent computational framework based on a large Recurrent Neural Network with fixed weights. Many physical implementations of Reservoir Computing have been proposed to improve speed and energy efficiency. In this study, we report new advances in Optical Reservoir Computing using multiple light scattering to accelerate the recursive computation of the reservoir states.
Authors: Jonathan Dong, Mushegh Rafayelyan, Florent Krzakala, Sylvain Gigan
In: IEEE Journal of Selected Topics in Quantum Electronic Vol. 26(1) (2020)
Conference Proceedings
Signal Processing

Don’t take it lightly: Phasing optical random projections with unknown operators

In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of NeurIPS 2019
Technical Reports and Preprints
Machine Learning Techniques

Principled Training of Neural Networks with Direct Feedback Alignment

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradients methods -where the error gradient is only roughly approximated – have garnered interest. These methods not only better portray how biological brains are learning, but also open new computational possibilities, such as updating layers asynchronously.
Authors: Julien Launay, Iacopo Poli, Florent Krzakala
Journal Papers
Quantum Physics

Machine learning and the physical sciences

Machine learning (ML) encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, which has entered most scientific disciplines in recent years. This article reviews in a selective way the recent research on the interface between machine learning and the physical sciences.
Authors: Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, Lenka Zdeborová
In: Reviews of Modern Physics Vol. 91(4) (2019)
Conference Proceedings
Time Series 

Scaling up Echo-State Networks with multiple light scattering

Echo-State Networks and Reservoir Computing have been studied for more than a decade. They provide a simpler yet powerful alternative to Recurrent Neural Networks, every internal weight is fixed and only the last linear layer is trained. They involve many multiplications by dense random matrices. Very large networks are difficult to obtain, as the complexity scales quadratically both in time and memory.
Authors: Jonathan Dong, Sylvain Gigan, Florent Krzakala, Gilles Wainrib
In: IEEE Statistical Signal Processing Workshop 2018
Conference Proceedings
Hardware

Random Projections through multiple optical scattering: Approximating kernels at the speed of light

Random projections have proven extremely useful in many signal processing and machine learning applications. However, they often require either to store a very large random matrix, or to use a different, structured matrix to reduce the computational and memory costs. Here, we overcome this difficulty by proposing an analog, optical device, that performs the random projections literally at the speed of light without having to store any matrix in memory.
Authors: Alaa Saade, Francesco Caltagirone, Igor Carron, Laurent Daudet, Angélique Drémeau, Sylvain Gigan, Florent Krzakala
In: Proceedings of ICASSP 2016
Publications by the LightOn User Community
Publications by the LightOn Community
Machine Learning Techniques

Fast Graph Kernel with Optical Random Features

The graphlet kernel is a classical method in graph classification. It, however, suffers from a high computation cost due to the isomorphism test it includes. As a generic proxy, and in general, at the cost of losing some information, this test can be efficiently replaced by a user-defined mapping that computes various graph characteristics. In this paper, we propose to leverage kernel random features within the graphlet framework and establish a theoretical link with a mean kernel metric. If this method can still be prohibitively costly for usual random features, we then incorporate optical random features that can be computed in constant time. Experiments show that the resulting algorithm is orders of magnitude faster than the graphlet kernel for the same, or better, accuracy.
Authors: Hashem Ghanem, Nicolas Keriven, Nicolas Tremblay
Publications by the LightOn Community
Machine Learning Techniques

Total least squares phase retrieval

We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework familiar from linear inverse problems with operator errors. We show how gradient descent and the peculiar geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine when each method should perform well. We run simulations to demonstrate the benefits of our method and verify the analysis. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains sensing vector and measurement errors.
Authors: Sidharth Gupta and Ivan Dokmanic ́
Publications by the LightOn Community
ICHEP 2020 | Prague

Using an Optical Processing Unit for tracking and calorimetry at the LHC

The High Luminosity Large Hadron Collider is expected to have a 10 times higher readout rate than the current state, significantly increasing the computational load required. It is then essential to explore new hardware paradigms. In this work we consider the Optical Processing Units (OPU) from LightOn, which compute random matrix multiplications on large datasets in an analog, fast and economic way, fostering faster machine learning results on a dataset of reduced dimension. We consider two case studies.
1) “Event classification”: high energy proton collision at the Large Hadron Collider have been simulated, each collision being recorded as an image representing the energy flux in the detector. The task is to train a classifier to separate a Susy signal from the background. The OPU allows fast end-to-end classification without building intermediate objects (like jets). This technique is presented, compared with more classical particle physics approaches.
2) “Tracking”: high energy proton collisions at the LHC yield billions of records with typically 100,000 3D points corresponding to the trajectory of 10.000 particles. Using two datasets from previous tracking challenges, we investigate the OPU potential to solve similar or related problems in high-energy physics, in terms of dimensionality reduction, data representation, and preliminary results.
Authors: Aishik Ghosh (Centre National de la Recherche Scientifique (FR)), Laurent Basara (LAL/LRI, Université Paris Saclay), Biswajit Biswas (Centre National de la Recherche Scientifique (FR)), David Rousseau (IJCLab-Orsay)