Publications by LightOn
Technical Reports and Preprints – LLMs – October 2021


PAGnol: An Extra-Large French Generative Model

Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and better-performing versions of PAGnol, exploring the capabilities of French extreme-scale models.
For this first release, we focus on the pre-training and scaling calculations underlying PAGnol. We fit a scaling law for compute for the French language and compare it with its English counterpart. We find the pre-training dataset significantly conditions the quality of the outputs, with common datasets such as OSCAR leading to low-quality offensive text. We evaluate our models on discriminative and generative tasks in French, comparing them to other state-of-the-art French and multilingual models, and reaching the state of the art in the abstractive summarization task. Our research was conducted on the public GENCI Jean Zay supercomputer, and our models up to the Large are made publicly available.
Authors: Julien Launay, E.L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah
Conference Proceedings – Randomized Numerical Linear Algebra – July 2021


Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units, we show that randomization can be significantly accelerated, at a negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.
Authors: Daniel Hesslow, Alessandro Cappelli, Igor Carron, Laurent Daudet, Raphaël Lafargue, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli
In: Proceedings of IEEE HotChips 2021
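As an illustration of the two algorithm families named in the abstract above, here is a minimal NumPy sketch of a randomized SVD and of a Hutchinson-style trace estimator. It is not the authors' implementation: the function names `rand_svd` and `hutchinson_trace` are hypothetical, and the random projection is drawn digitally (Gaussian or Rademacher) as a stand-in for the step the paper delegates to the OPU.

```python
import numpy as np

def rand_svd(A, rank, oversample=10, seed=0):
    """Approximate rank-`rank` SVD of A from a random sketch of its range."""
    rng = np.random.default_rng(seed)
    # Randomization step: a dense Gaussian matrix stands in for the
    # optical random projection (assumption of this sketch).
    omega = rng.standard_normal((A.shape[1], rank + oversample))
    Y = A @ omega                       # sketch of the column space of A
    Q, _ = np.linalg.qr(Y)              # orthonormal basis of the sketch
    B = Q.T @ A                         # small (rank + oversample) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

def hutchinson_trace(A, n_probes=100, seed=0):
    """Estimate trace(A) as the average of z^T A z over random sign vectors z."""
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(A.shape[0], n_probes))
    return float(np.mean(np.sum(Z * (A @ Z), axis=0)))
```

In both routines the only products involving the large matrix are with a random matrix, which is the step the abstract identifies as a potential bottleneck on traditional hardware.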
Conference Proceedings – Randomized Numerical Linear Algebra / Hardware – July 2021


LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor

We introduce LightOn’s Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing a seamless integration within Python-based processing pipelines. We discuss a variety of use cases and hybrid network architectures, with the OPU used in combination with CPU/GPU, and draw a pathway towards “optical advantage”.
Authors: Charles Brossollet, Alessandro Cappelli, Igor Carron, Charidimos Chaintoutis, Amélie Chatelain, Laurent Daudet, Sylvain Gigan, Daniel Hesslow, Florent Krzakala, Julien Launay, Safa Mokaadi, Fabien Moreau, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli, Giuseppe L. Tommasone
In: Proceedings of IEEE HotChips 2021
Technical Reports and Preprints – Robust Machine Learning – June 2021


Photonic Differential Privacy with Direct Feedback Alignment

Optical Processing Units (OPUs) — low-power photonic chips dedicated to large scale random projections — have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs a solution of choice to provide a private-by-design training. We provide a theoretical analysis of our adaptive privacy mechanism, carefully measuring how the noise of optical random projections propagates in the process and gives rise to provable Differential Privacy. Finally, we conduct experiments demonstrating the ability of our learning procedure to achieve solid end-task performance.
Authors: Ruben Ohana, Hamlet J. Medina Ruiz, Julien Launay, Alessandro Cappelli, Iacopo Poli, Liva Ralaivola, Alain Rakotomamonjy
Technical Reports and Preprints – Quantum


Experimental Approach to Demonstrating Contextuality for Qudits

We propose a method to experimentally demonstrate contextuality with a family of tests for qudits. The experiment we propose uses a qudit encoded in the path of a single photon and its temporal degrees of freedom. We consider the impact of noise on the effectiveness of these tests, taking the approach of ontologically faithful non-contextuality. In this approach, imperfections in the experimental setup must be taken into account in any faithful ontological (classical) model, which limits how much the statistics can deviate within different contexts. In this way, we bound the precision of the experimental setup under which ontologically faithful non-contextual models can be refuted. We further consider the noise tolerance through different types of decoherence models on different types of encodings of qudits. We quantify the effect of the decoherence on the required precision for the experimental setup in order to demonstrate contextuality in this broader sense.
Authors: Adel Sohbi, Ruben Ohana, Isabelle Zaquine, Eleni Diamanti, Damian Markham
Technical Reports and Preprints – Neural Architecture Search


Contrastive Embeddings for Neural Architectures

The performance of algorithms for neural architecture search strongly depends on the parametrization of the search space. We use contrastive learning to identify networks across different initializations based on their data Jacobians, and automatically produce the first architecture embeddings independent from the parametrization of the search space. Using our contrastive embeddings, we show that traditional black-box optimization algorithms, without modification, can reach state-of-the-art performance in Neural Architecture Search. As our method provides a unified embedding space, we perform for the first time transfer learning between search spaces. Finally, we show the evolution of embeddings during training, motivating future studies into using embeddings at different training stages to gain a deeper understanding of the networks in a search space.
Authors: Daniel Hesslow, Iacopo Poli
Technical Reports and Preprints – Robust Machine Learning


Adversarial Robustness by Design through Analog Computing and Synthetic Gradients

We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we find we are unable to build a reliable backward differentiable approximation for obfuscated parameters. Moreover, while our model reaches a good natural accuracy with a hybrid backpropagation – synthetic gradient method, the same approach is suboptimal if employed to generate adversarial examples. We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks. Finally, our hybrid training method builds robust features against transfer attacks. We demonstrate our approach on a VGG-like architecture, placing the defense on top of the convolutional features, on CIFAR-10 and CIFAR-100. Code is available at this https URL.
Authors: Alessandro Cappelli, Ruben Ohana, Julien Launay, Laurent Meunier, Iacopo Poli, Florent Krzakala
Conference Proceedings – Machine Learning / Deep Learning


Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural language processing.
Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
In: Proceedings of NeurIPS 2020
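To make the idea behind Direct Feedback Alignment concrete, the following is a minimal NumPy sketch on a toy regression problem, far smaller than the tasks studied in the paper. The function `train_dfa`, its hyperparameters, and the two-layer architecture are assumptions chosen for illustration. The only difference from backpropagation is that the output error reaches the hidden layer through a fixed random matrix `B` rather than through the transpose of the output weights.

```python
import numpy as np

def train_dfa(X, Y, hidden=64, lr=0.05, steps=500, seed=0):
    """Train a one-hidden-layer tanh network with Direct Feedback Alignment."""
    rng = np.random.default_rng(seed)
    n, d_in = X.shape
    d_out = Y.shape[1]
    W1 = rng.standard_normal((d_in, hidden)) / np.sqrt(d_in)
    W2 = np.zeros((hidden, d_out))
    # Fixed random feedback matrix: never trained, never tied to W2.
    B = rng.standard_normal((d_out, hidden)) / np.sqrt(d_out)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W1)                 # forward pass
        E = H @ W2 - Y                      # output error
        losses.append(0.5 * np.mean(np.sum(E ** 2, axis=1)))
        grad_W2 = H.T @ E / n               # last layer: exact gradient
        # Hidden layer: backpropagation would use E @ W2.T here;
        # DFA projects the error with B instead.
        delta_H = (E @ B) * (1.0 - H ** 2)
        grad_W1 = X.T @ delta_H / n
        W1 -= lr * grad_W1
        W2 -= lr * grad_W2
    return W1, W2, losses
```

Because the hidden-layer update depends only on the output error and on `B`, each layer can in principle be updated without waiting for the layers above it, which is the parallelization argument made in the abstract.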
Conference Proceedings – Machine Learning Techniques


Reservoir Computing meets Recurrent Kernels and Structured Transforms

Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
Authors: Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan, Florent Krzakala
In: Proceedings of NeurIPS 2020
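The first sentence of the abstract above describes conventional Reservoir Computing: recurrent weights fixed at random, with only a linear readout trained. A minimal NumPy sketch of that baseline follows. It does not implement the Recurrent Kernel or Structured Reservoir Computing proposed in the paper, and the names `run_reservoir` and `fit_readout` as well as all hyperparameter values are assumptions.

```python
import numpy as np

def run_reservoir(u, n_res=100, spectral_radius=0.9, input_scale=0.5, seed=0):
    """Drive a fixed random tanh reservoir with the scalar series u."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_res, n_res))
    # Rescale so the largest eigenvalue modulus equals spectral_radius.
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
    W_in = input_scale * rng.standard_normal(n_res)
    x = np.zeros(n_res)
    states = np.empty((len(u), n_res))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + W_in * u_t)     # internal weights are never trained
        states[t] = x
    return states

def fit_readout(states, targets, ridge=1e-4):
    """Train the linear output layer by ridge regression."""
    A = states.T @ states + ridge * np.eye(states.shape[1])
    return np.linalg.solve(A, states.T @ targets)
```

A typical use is one-step-ahead prediction: feed `u[:-1]` to `run_reservoir`, discard an initial washout period, fit the readout against `u[1:]`, and evaluate on held-out steps.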
Technical Reports and Preprints – Machine Learning / Deep Learning


The dynamics of learning with feedback alignment

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory for the success of DFA. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorize process occurs sequentially from the bottom layers of the network to the top.
Authors: Maria Refinetti, Stéphane d’Ascoli, Ruben Ohana, Sebastian Goldt
Conference Proceedings – Hardware


Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally implemented neural networks.
Authors: Julien Launay, Iacopo Poli, Kilian Müller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
In: Proceedings of IEEE HotChips 2020 (Stanford, USA).
Journal Papers – Time Series


NEWMA: a new method for scalable model-free online change-point detection

We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponential Weighted Moving Average algorithm (EWMA).
Authors: Nicolas Keriven, Damien Garreau, Iacopo Poli
In: IEEE Transactions on Signal Processing Vol. 68 pp. 3515 – 3528 (2020)
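The recursive statistic behind this kind of detector can be sketched in a few lines of NumPy. The sketch below keeps two exponentially weighted moving averages of a random feature map, with different forgetting factors, and reports the norm of their difference. The random Fourier feature map, the fixed forgetting factors, and the name `newma_statistic` are assumptions made for illustration, and no detection threshold is computed here.

```python
import numpy as np

def newma_statistic(stream, n_features=200, bandwidth=3.0,
                    fast=0.1, slow=0.02, seed=0):
    """Norm of the gap between a fast and a slow EWMA of random features."""
    rng = np.random.default_rng(seed)
    d = stream.shape[1]
    # Random Fourier features of a Gaussian kernel (illustrative choice).
    W = rng.standard_normal((d, n_features)) / bandwidth
    b = rng.uniform(0.0, 2.0 * np.pi, n_features)

    def psi(x):
        return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

    z_fast = psi(stream[0])
    z_slow = z_fast.copy()
    stats = np.empty(len(stream))
    for t, x in enumerate(stream):
        f = psi(x)
        # Two recursive averages: memory cost is independent of stream length.
        z_fast = (1.0 - fast) * z_fast + fast * f
        z_slow = (1.0 - slow) * z_slow + slow * f
        stats[t] = np.linalg.norm(z_fast - z_slow)
    return stats
```

While the distribution is stationary the two averages track the same mean embedding and the statistic stays near its noise floor. After an abrupt change the fast average moves first, so the gap rises temporarily.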

Conference Proceedings – Computer Vision


Kernel computations from large-scale random features obtained by Optical Processing Units

Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain.
Authors: Ruben Ohana, Jonas Wacker, Jonathan Dong, Sébastien Marmin, Florent Krzakala, Maurizio Filippone, Laurent Daudet
In: Proceedings of ICASSP 2020
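A small simulation can illustrate how random features approximate a kernel. The sketch below assumes the commonly described OPU model, in which each feature is the squared modulus of a complex Gaussian random projection of the input. For real inputs and unit-variance circular complex Gaussian weights, the expected inner product of such features is ||x||²||y||² + (x·y)². The function names `optical_features` and `exact_kernel` are hypothetical, and the projection is simulated digitally rather than performed on hardware.

```python
import numpy as np

def optical_features(X, n_features, seed=0):
    """Simulated optical random features: |X U|^2 with complex Gaussian U."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Circular complex Gaussian entries with unit variance (assumption).
    U = (rng.standard_normal((d, n_features))
         + 1j * rng.standard_normal((d, n_features))) / np.sqrt(2.0)
    return np.abs(X @ U) ** 2 / np.sqrt(n_features)

def exact_kernel(X):
    """Limit kernel k(x, y) = ||x||^2 ||y||^2 + (x . y)^2 for real inputs."""
    G = X @ X.T
    sq = np.diag(G)
    return np.outer(sq, sq) + G ** 2
```

With `Phi = optical_features(X, m)`, the Gram matrix `Phi @ Phi.T` is a Monte Carlo estimate of `exact_kernel(X)` whose error shrinks as the number of features `m` grows.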

Conference Proceedings – Signal Processing


Fast Optical System Identification by Numerical Interferometry

We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. Our method simplifies the calibration of optical transmission matrices from a quadratic to a linear inverse problem by first recovering the phase of the measurements. We show that by carefully designing the probing signals, measurement phase retrieval amounts to a distance geometry problem—a multilateration—in the complex plane.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of ICASSP 2020
Technical Reports and Preprints – Time Series


Online Change Point Detection in Molecular Dynamics With Optical Random Features

Proteins are made of atoms that constantly fluctuate, but they can occasionally undergo large-scale changes. Such transitions are of biological interest, linking the structure of a protein to its function within a cell. Atomic-level simulations, such as Molecular Dynamics (MD), are used to study these events. However, molecular dynamics simulations produce time series with multiple observables, while changes often only affect a few of them.
Authors: Amélie Chatelain, Giuseppe Luca Tommasone, Laurent Daudet, Iacopo Poli
Journal Papers – Time Series


Optical Reservoir Computing using multiple light scattering for chaotic systems prediction

Reservoir Computing is a relatively recent computational framework based on a large Recurrent Neural Network with fixed weights. Many physical implementations of Reservoir Computing have been proposed to improve speed and energy efficiency. In this study, we report new advances in Optical Reservoir Computing using multiple light scattering to accelerate the recursive computation of the reservoir states.
Authors: Jonathan Dong, Mushegh Rafayelyan, Florent Krzakala, Sylvain Gigan
In: IEEE Journal of Selected Topics in Quantum Electronics Vol. 26(1) (2020)
Conference Proceedings – Signal Processing


Don’t take it lightly: Phasing optical random projections with unknown operators

In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of NeurIPS 2019
Technical Reports and Preprints – Machine Learning Techniques


Principled Training of Neural Networks with Direct Feedback Alignment

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradient methods, where the error gradient is only roughly approximated, have garnered interest. These methods not only better portray how biological brains learn, but also open new computational possibilities, such as updating layers asynchronously.
Authors: Julien Launay, Iacopo Poli, Florent Krzakala
Journal Papers – Quantum Physics


Machine learning and the physical sciences

Machine learning (ML) encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, which has entered most scientific disciplines in recent years. This article reviews in a selective way the recent research on the interface between machine learning and the physical sciences.
Authors: Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, Lenka Zdeborová
In: Reviews of Modern Physics Vol. 91(4) (2019)
Conference Proceedings – Time Series


Scaling up Echo-State Networks with multiple light scattering

Echo-State Networks and Reservoir Computing have been studied for more than a decade. They provide a simpler yet powerful alternative to Recurrent Neural Networks: every internal weight is fixed and only the last linear layer is trained. They involve many multiplications by dense random matrices. Very large networks are difficult to obtain, as the complexity scales quadratically both in time and memory.
Authors: Jonathan Dong, Sylvain Gigan, Florent Krzakala, Gilles Wainrib
In: IEEE Statistical Signal Processing Workshop 2018
Conference Proceedings – Hardware


Random Projections through multiple optical scattering: Approximating kernels at the speed of light

Random projections have proven extremely useful in many signal processing and machine learning applications. However, they often require either storing a very large random matrix or using a different, structured matrix to reduce the computational and memory costs. Here, we overcome this difficulty by proposing an analog optical device that performs the random projections literally at the speed of light, without having to store any matrix in memory.
Authors: Alaa Saade, Francesco Caltagirone, Igor Carron, Laurent Daudet, Angélique Drémeau, Sylvain Gigan, Florent Krzakala
In: Proceedings of ICASSP 2016
Publications by the LightOn User Community
Publications by the LightOn Community – Machine Learning Techniques


Fast Graph Kernel with Optical Random Features

The graphlet kernel is a classical method in graph classification. It suffers, however, from a high computational cost due to the isomorphism test it includes. As a generic proxy, and in general at the cost of losing some information, this test can be efficiently replaced by a user-defined mapping that computes various graph characteristics. In this paper, we propose to leverage kernel random features within the graphlet framework and establish a theoretical link with a mean kernel metric. Since this method can still be prohibitively costly with usual random features, we then incorporate optical random features, which can be computed in constant time. Experiments show that the resulting algorithm is orders of magnitude faster than the graphlet kernel for the same, or better, accuracy.
Authors: Hashem Ghanem, Nicolas Keriven, Nicolas Tremblay
Publications by the LightOn Community – Machine Learning Techniques


Total least squares phase retrieval

We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework familiar from linear inverse problems with operator errors. We show how gradient descent and the peculiar geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine when each method should perform well. We run simulations to demonstrate the benefits of our method and verify the analysis. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains sensing vector and measurement errors.
Authors: Sidharth Gupta, Ivan Dokmanić
Publications by the LightOn Community


Using an Optical Processing Unit for tracking and calorimetry at the LHC

The High Luminosity Large Hadron Collider is expected to have a readout rate 10 times higher than the current one, significantly increasing the computational load required. It is therefore essential to explore new hardware paradigms. In this work we consider the Optical Processing Units (OPU) from LightOn, which compute random matrix multiplications on large datasets in an analog, fast and economical way, fostering faster machine learning results on a dataset of reduced dimension. We consider two case studies.
1) “Event classification”: high-energy proton collisions at the Large Hadron Collider have been simulated, each collision being recorded as an image representing the energy flux in the detector. The task is to train a classifier to separate a SUSY signal from the background. The OPU allows fast end-to-end classification without building intermediate objects (like jets). This technique is presented and compared with more classical particle physics approaches.
2) “Tracking”: high-energy proton collisions at the LHC yield billions of records with typically 100,000 3D points corresponding to the trajectory of 10,000 particles. Using two datasets from previous tracking challenges, we investigate the OPU potential to solve similar or related problems in high-energy physics, in terms of dimensionality reduction, data representation, and preliminary results.
Authors: Aishik Ghosh (Centre National de la Recherche Scientifique (FR)), Laurent Basara (LAL/LRI, Université Paris Saclay), Biswajit Biswas (Centre National de la Recherche Scientifique (FR)), David Rousseau (IJCLab-Orsay)
In: ICHEP 2020 | Prague