Publications by LightOn
4th workshop on Neural Scaling Laws: Towards Maximally Beneficial AGI, NeurIPS 2022 – Machine Learning/NLP – LLMs, 

High Quality data need not apply: training LLMs with web data only

Abstract not available.
Authors: Julien Launay
 Machine Learning/NLP – LLMs


BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
Authors: Teven Le Scao and 300+ authors
Machine Learning/NLP – LLMs


What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization?

Large pretrained Transformer language models have been shown to exhibit zero-shot generalization, i.e. they can perform a wide variety of tasks that they were not explicitly trained on. However, the architectures and pretraining objectives used across state-of-the-art models differ significantly, and there has been limited systematic comparison of these factors. In this work, we present a large-scale evaluation of modeling choices and their impact on zero-shot generalization. In particular, we focus on text-to-text models and experiment with three model architectures (causal/non-causal decoder-only and encoder-decoder), trained with two different pretraining objectives (autoregressive and masked language modeling), and evaluated with and without multitask prompted finetuning. We train models with over 5 billion parameters for more than 170 billion tokens, thereby increasing the likelihood that our conclusions will transfer to even larger scales. Our experiments show that causal decoder-only models trained on an autoregressive language modeling objective exhibit the strongest zero-shot generalization after purely unsupervised pretraining. However, models with non-causal visibility on their input trained with a masked language modeling objective followed by multitask finetuning perform the best among our experiments. We therefore consider the adaptation of pretrained models across architectures and objectives. We find that pretrained non-causal decoder models can be adapted into performant generative causal decoder models, using autoregressive language modeling as a downstream task. Furthermore, we find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models, ultimately achieving competitive performance after multitask finetuning. Code and checkpoints are available at this https URL.
Authors: Thomas Wang, Adam Roberts, Daniel Hesslow, Teven Le Scao, Hyung Won Chung, Iz Beltagy, Julien Launay, Colin Raffel
 Technical Reports and Preprints – Machine Learning, LLMs for Biology


RITA: a Study on Scaling Up Generative Protein Sequence Models

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
Authors: Daniel Hesslow, Niccoló Zanichelli, Pascal Notin, Iacopo Poli, Debora Marks
ACL 2022 Workshop BigScience – LLMs – April 2022


A Holistic Assessment of the Carbon Footprint of Noor, a Very Large Arabic Language Model

As ever-larger language models grow more ubiquitous, it is crucial to consider their environmental impact. Characterised by extreme size and resource use, recent generations of models have been criticised for their voracious appetite for compute, and thus significant carbon footprint. Although reporting of carbon impact has grown more common in machine learning papers, this reporting is usually limited to compute resources used strictly for training. In this work, we propose a holistic assessment of the footprint of an extreme-scale language model, Noor. Noor is an ongoing project aiming to develop the largest multi-task Arabic language models–with up to 13B parameters–leveraging zero-shot generalisation to enable a wide range of downstream tasks via natural language instructions. We assess the total carbon bill of the entire project: starting with data collection and storage costs, including research and development budgets, pretraining costs, future serving estimates, and other exogenous costs necessary for this international cooperation. Notably, we find that inference costs and exogenous factors can have a significant impact on the total budget. Finally, we discuss pathways to reduce the carbon footprint of extreme-scale models.
Authors: Imad Lakim, Ebtesam Almazrouei, Merouane Debbah, Julien Launay
ACL 2022 Workshop BigScience – LLMs – April 2022


What Language Model to Train if You Have One Million GPU Hours?

As the size of language models continues to grow they become increasingly more powerful and lead to better results, but they also become more expensive to design and train. Given a compute budget that’s enough to train a multilingual transformers language model in the 100B+ parameter scale, our goal is to choose the architecture and the training setup of such a model. Specifically, we perform an ablation study comparing different modelling architectural, which can significantly impact the performance of the resulting models. We focus on the 1.3B parameter scale providing a compromise between the compute cost of the architecture search and the probability that our conclusions hold for the target 100B+ model. In addition, we study the impact of various popular pretraining corpora on the quality of the model. We also study the performance of training a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of transformer models to choose the target model size, its shape and its training setup.
Authors: Daniel Hesslow, Teven Le Scao, Lucile Saulnier, Thomas Wang, M Saiful Bari, Stas Bekman, Stella Biderman, Hady Elsahar, Jason Phang, Ofir Press, Colin Raffel, Victor Sanh, Sheng Shen, Lintang Sutawika, Jaesung Tae, Zheng Xin Yong, Julien Launay, Iz Beltagy
LREC 2022 – LLMs – Initially published: October 2021


PAGnol: An Extra-Large French Generative Model

Access to large pre-trained models of varied architectures, in many different languages, is central to the democratization of NLP. We introduce PAGnol, a collection of French GPT models. Using scaling laws, we efficiently train PAGnol-XL (1.5B parameters) with the same computational budget as CamemBERT, a model 13 times smaller. PAGnol-XL is the largest model trained to date for the French language. We plan to train increasingly large and performing versions of PAGnol, exploring the capabilities of French extreme-scale models.
For this first release, we focus on the pre-training and scaling calculations underlining PAGnol. We fit a scaling law for compute for the French language and compare it with its English counterpart. We find the pre-training dataset significantly conditions the quality of the outputs, with common datasets such as OSCAR leading to low-quality offensive text. We evaluate our models on discriminative and generative tasks in French, comparing them to other state-of-the-art French and multilingual models, and reaching the state of the art in the abstract summarization task. Our research was conducted on the public GENCI Jean Zay supercomputer, and our models up to the Large are made publicly available.
Authors: Julien Launay, E.L. Tommasone, Baptiste Pannier, François Boniface, Amélie Chatelain, Alessandro Cappelli, Iacopo Poli, Djamé Seddah
NeurIPS 2022 – Workshop: I Can’t Believe It’s Not Better, December 2022

Scaling Laws Beyond Backpropagation

Alternatives to backpropagation have long been studied to better understand how biological brains may learn. Recently, they have also garnered interest as a way to train neural networks more efficiently. By relaxing constraints inherent to backpropagation (e.g., symmetric feedforward and feedback weights, sequential updates), these methods enable promising prospects, such as local learning. However, the tradeoffs between different methods in terms of final task performance, convergence speed, and ultimately compute and data requirements are rarely outlined. In this work, we use scaling laws to study the ability of Direct Feedback Alignment~(DFA) to train causal decoder-only Transformers efficiently. Scaling laws provide an overview of the tradeoffs implied by a modeling decision, up to extrapolating how it might transfer to increasingly large models. We find that DFA fails to offer more efficient scaling than backpropagation: there is never a regime for which the degradation in loss incurred by using DFA is worth the potential reduction in compute budget. Our finding comes at variance with previous beliefs in the alternative training methods community, and highlights the need for holistic empirical approaches to better understand modeling decisions.
Authors: Matthew J. Filipovich, Alessandro Cappelli, Daniel Hesslow, Julien Launay
 Technical Reports and Preprints – Machine Learning


Shedding a PAC-Bayesian Light on Adaptive Sliced-Wasserstein Distances

The Sliced-Wasserstein distance (SW) is a computationally efficient and theoretically grounded alternative to the Wasserstein distance. Yet, the literature on its statistical properties with respect to the distribution of slices, beyond the uniform measure, is scarce. To bring new contributions to this line of research, we leverage the PAC-Bayesian theory and the central observation that SW actually hinges on a slice-distribution-dependent Gibbs risk, the kind of quantity PAC-Bayesian bounds have been designed to characterize. We provide four types of results: i) PAC-Bayesian generalization bounds that hold on what we refer as adaptive Sliced-Wasserstein distances, i.e. distances defined with respect to any distribution of slices, ii) a procedure to learn the distribution of slices that yields a maximally discriminative SW, by optimizing our PAC-Bayesian bounds, iii) an insight on how the performance of the so-called distributional Sliced-Wasserstein distance may be explained through our theory, and iv) empirical illustrations of our findings.
Authors: Ruben Ohana, Kimia Nadjahi, Alain Rakotomamonjy, Liva Ralaivola
 Technical Reports and Preprints – Quantum


A high-fidelity and large-scale reconfigurable photonic processor for NISQ applications

Reconfigurable linear optical networks are a key component for the development of optical quantum information processing platforms in the NISQ era and beyond. We report the implementation of such a device based on an innovative design that uses the mode mixing of a multimode fiber in combination with the programmable wavefront shaping of an SLM. The capabilities of the platform are explored in the classical regime. For up to a record number of 8~inputs and 38~outputs we achieve fidelities in excess of 93%, week-long stability and losses below 6.5dB. The device was built inside a standard server rack to allow for real-world use.
Authors: A. Cavaillès, P. Boucher, L. Daudet, I. Carron, S. Gigan, K. Müller
Conference proceedings – Machine Learning – November 2021


Binarization for Optical Processing Units via REINFORCE

Optical Processing Units (OPUs) are computing devices that perform random projections of input vectors by exploiting the physical phenomenon of scattering a light source through an opaque medium. OPUs have successfully been proposed to carry out approximate kernel ridge regression at scale and with low power consumption by the means of optical random features. OPUs require input vectors to be binary, and this work proposes a novel way to perform supervised data binarization. The main difficulty to develop a solution is that the OPU projection matrices are unknown which poses a challenge in deriving a binarization approach in an end-to-end fashion. Our approach is based on the REINFORCE gradient estimator, which allows us to estimate the gradient of the loss function with respect to binarization parameters by treating the OPU as a black box. Through experiments on several UCI classification and regression problems, we show that our method outperforms alternative unsupervised and supervised binarization techniques.
Authors: B. Kozyrskiy, I. Poli, R. Ohana, L. Daudet, I. Carron, M. Filippone
Conference Proceedings – Randomized Numerical Linear Algebra July 2021


Photonic co-processors in HPC: using LightOn OPUs for Randomized Numerical Linear Algebra

Randomized Numerical Linear Algebra (RandNLA) is a powerful class of methods, widely used in High Performance Computing (HPC). RandNLA provides approximate solutions to linear algebra functions applied to large signals, at reduced computational costs. However, the randomization step for dimensionality reduction may itself become the computational bottleneck on traditional hardware. Leveraging near constant-time linear random projections delivered by LightOn Optical Processing Units we show that randomization can be significantly accelerated, at a negligible precision loss, in a wide range of important RandNLA algorithms, such as RandSVD or trace estimators.
Authors: Daniel Hesslow, Alessandro Cappelli, Igor Carron, Laurent Daudet, Raphaël Lafargue, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli
In: Proceedings of IEEE HotChips 2021
Conference Proceedings – Randomized Numerical Linear Algebra / Hardware July 2021


LightOn Optical Processing Unit: Scaling-up AI and HPC with a Non von Neumann co-processor

We introduce LightOn’s Optical Processing Unit (OPU), the first photonic AI accelerator chip available on the market for at-scale Non von Neumann computations, reaching 1500 TeraOPS. It relies on a combination of free-space optics with off-the-shelf components, together with a software API allowing a seamless integration within Python-based processing pipelines. We discuss a variety of use cases and hybrid network architectures, with the OPU used in combination with CPU/GPU, and draw a pathway towards “optical advantage”.
Authors: Charles Brossollet, Alessandro Cappelli, Igor Carron, Charidimos Chaintoutis, Amélie Chatelain, Laurent Daudet, Sylvain Gigan, Daniel Hesslow, Florent Krzakala, Julien Launay, Safa Mokaadi, Fabien Moreau, Kilian Müller, Ruben Ohana, Gustave Pariente, Iacopo Poli, Giuseppe L. Tommasone
In: Proceedings of IEEE HotChips 2021
Technical Reports and Preprints – Robust Machine LearningJune 2021


Photonic Differential Privacy with Direct Feedback Alignment

Optical Processing Units (OPUs) — low-power photonic chips dedicated to large scale random projections — have been used in previous work to train deep neural networks using Direct Feedback Alignment (DFA), an effective alternative to backpropagation. Here, we demonstrate how to leverage the intrinsic noise of optical random projections to build a differentially private DFA mechanism, making OPUs a solution of choice to provide a private-by-design training. We provide a theoretical analysis of our adaptive privacy mechanism, carefully measuring how the noise of optical random projections propagates in the process and gives rise to provable Differential Privacy. Finally, we conduct experiments demonstrating the ability of our learning procedure to achieve solid end-task performance.
Authors: Ruben Ohana, Hamlet J. Medina Ruiz, Julien Launay, Alessandro Cappelli, Iacopo Poli, Liva Ralaivola, Alain Rakotomamonjy
Technical Reports and Preprints – Quantum


Experimental Approach to Demonstrating Contextuality for Qudits

We propose a method to experimentally demonstrate contextuality with a family of tests for qudits. The experiment we propose uses a qudit encoded in the path of a single photon and its temporal degrees of freedom. We consider the impact of noise on the effectiveness of these tests, taking the approach of ontologically faithful non-contextuality. In this approach, imperfections in the experimental setup must be taken into account in any faithful ontological (classical) model, which limits how much the statistics can deviate within different contexts. In this way, we bound the precision of the experimental setup under which ontologically faithful non-contextual models can be refuted. We further consider the noise tolerance through different types of decoherence models on different types of encodings of qudits. We quantify the effect of the decoherence on the required precision for the experimental setup in order to demonstrate contextuality in this broader sense.
Authors: Adel Sohbi, Ruben Ohana, Isabelle Zaquine, Eleni Diamanti, Damian Markham
Technical Reports and Preprints – Neural Architecture Search


Contrastive Embeddings for Neural Architectures

The performance of algorithms for neural architecture search strongly depends on the parametrization of the search space. We use contrastive learning to identify networks across different initializations based on their data Jacobians, and automatically produce the first architecture embeddings independent from the parametrization of the search space. Using our contrastive embeddings, we show that traditional black-box optimization algorithms, without modification, can reach state-of-the-art performance in Neural Architecture Search. As our method provides a unified embedding space, we perform for the first time transfer learning between search spaces. Finally, we show the evolution of embeddings during training, motivating future studies into using embeddings at different training stages to gain a deeper understanding of the networks in a search space.
Authors: Daniel Hesslow, Iacopo Poli
Technical Reports and Preprints – Robust Machine Learning


Adversarial Robustness by Design through Analog Computing and Synthetic Gradients

We propose a new defense mechanism against adversarial attacks inspired by an optical co-processor, providing robustness without compromising natural accuracy in both white-box and black-box settings. This hardware co-processor performs a nonlinear fixed random transformation, where the parameters are unknown and impossible to retrieve with sufficient precision for large enough dimensions. In the white-box setting, our defense works by obfuscating the parameters of the random projection. Unlike other defenses relying on obfuscated gradients, we find we are unable to build a reliable backward differentiable approximation for obfuscated parameters. Moreover, while our model reaches a good natural accuracy with a hybrid backpropagation – synthetic gradient method, the same approach is suboptimal if employed to generate adversarial examples. We find the combination of a random projection and binarization in the optical system also improves robustness against various types of black-box attacks. Finally, our hybrid training method builds robust features against transfer attacks. We demonstrate our approach on a VGG-like architecture, placing the defense on top of the convolutional features, on CIFAR-10 and CIFAR-100. Code is available at this https URL.
Authors: Alessandro Cappelli, Ruben Ohana, Julien Launay, Laurent Meunier, Iacopo Poli, Florent Krzakala
Conference Proceedings – Machine Learning/ Deep Learning


Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures

Despite being the workhorse of deep learning, the backpropagation algorithm is no panacea. It enforces sequential layer updates, thus preventing efficient parallelization of the training process. Furthermore, its biological plausibility is being challenged. Alternative schemes have been devised; yet, under the constraint of synaptic asymmetry, none have scaled to modern deep learning tasks and architectures. Here, we challenge this perspective and study the applicability of Direct Feedback Alignment to neural view synthesis, recommender systems, geometric learning, and natural language processing.
Authors: Julien Launay, Iacopo Poli, François Boniface, Florent Krzakala
In: Proceedings of NeurIPS 2020
Conference Proceedings – Machine Learning Techniques


Reservoir Computing meets Recurrent Kernels and Structured Transforms

Reservoir Computing is a class of simple yet efficient Recurrent Neural Networks where internal weights are fixed at random and only a linear output layer is trained. In the large size limit, such random neural networks have a deep connection with kernel methods. Our contributions are threefold: a) We rigorously establish the recurrent kernel limit of Reservoir Computing and prove its convergence. b) We test our models on chaotic time series prediction, a classic but challenging benchmark in Reservoir Computing, and show how the Recurrent Kernel is competitive and computationally efficient when the number of data points remains moderate. c) When the number of samples is too large, we leverage the success of structured Random Features for kernel approximation by introducing Structured Reservoir Computing. The two proposed methods, Recurrent Kernel and Structured Reservoir Computing, turn out to be much faster and more memory-efficient than conventional Reservoir Computing.
Authors: Jonathan Dong, Ruben Ohana, Mushegh Rafayelyan, Florent Krzakala
In: Proceedings of NeurIPS 2020
Technical Reports and Preprints – Machine Learning / Deep Learning


The dynamics of learning with feedback alignment

Direct Feedback Alignment (DFA) is emerging as an efficient and biologically plausible alternative to the ubiquitous backpropagation algorithm for training deep neural networks. Despite relying on random feedback weights for the backward pass, DFA successfully trains state-of-the-art models such as Transformers. On the other hand, it notoriously fails to train convolutional networks. An understanding of the inner workings of DFA to explain these diverging results remains elusive. Here, we propose a theory for the success of DFA. We first show that learning in shallow networks proceeds in two steps: an alignment phase, where the model adapts its weights to align the approximate gradient with the true gradient of the loss function, is followed by a memorisation phase, where the model focuses on fitting the data. This two-step process has a degeneracy breaking effect: out of all the low-loss solutions in the landscape, a network trained with DFA naturally converges to the solution which maximises gradient alignment. We also identify a key quantity underlying alignment in deep linear networks: the conditioning of the alignment matrices. The latter enables a detailed understanding of the impact of data structure on alignment and suggests a simple explanation for the well-known failure of DFA to train convolutional neural networks. Numerical experiments on MNIST and CIFAR10 clearly demonstrate degeneracy breaking in deep non-linear networks and show that the align-then-memorize process occurs sequentially from the bottom layers of the network to the top.
Authors: Maria Refinetti, Stéphane d’Ascoli, Ruben Ohana, Sebastian Goldt
Conference Proceedings – Hardware


Light-in-the-loop: using a photonics co-processor for scalable training of neural networks

As neural networks grow larger and more complex and data-hungry, training costs are skyrocketing. Especially when lifelong learning is necessary, such as in recommender systems or self-driving cars, this might soon become unsustainable. In this study, we present the first optical co-processor able to accelerate the training phase of digitally implemented neural networks.
Authors: Julien Launay, Iacopo Poli, Kilian Müller, Igor Carron, Laurent Daudet, Florent Krzakala, Sylvain Gigan
In: Proceedings of IEEE HotChips 2020 (Stanford, USA).
Journal Papers – Time Series


NEWMA: a new method for scalable model-free online change-point detection

We consider the problem of detecting abrupt changes in the distribution of a multi-dimensional time series, with limited computing power and memory. In this paper, we propose a new, simple method for model-free online change-point detection that relies only on fast and light recursive statistics, inspired by the classical Exponential Weighted Moving Average algorithm (EWMA).
Authors: Nicolas Keriven, Damien Garreau, Iacopo Poli
In: IEEE Transactions on Signal Processing Vol. 68 pp. 3515 – 3528 (2020)

Conference Proceedings – Computer Vision


Kernel computations from large-scale random features obtained by Optical Processing Units

Approximating kernel functions with random features (RFs) has been a successful application of random projections for nonparametric estimation. However, performing random projections presents computational challenges for large-scale problems. Recently, a new optical hardware called Optical Processing Unit (OPU) has been developed for fast and energy-efficient computation of large-scale RFs in the analog domain.
Authors: Ruben Ohana, Jonas Wacker, Jonathan Dong, Sébastien Marmin, Florent Krzakala, Maurizio Filippone, Laurent Daudet
In: Proceedings of ICASSP 2020

Conference Proceedings – Signal Processing


Fast Optical System Identification by Numerical Interferometry

We propose a numerical interferometry method for identification of optical multiply-scattering systems when only intensity can be measured. Our method simplifies the calibration of optical transmission matrices from a quadratic to a linear inverse problem by first recovering the phase of the measurements. We show that by carefully designing the probing signals, measurement phase retrieval amounts to a distance geometry problem—a multilateration—in the complex plane.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of ICASSP 2020
Technical Reports and Preprints – Time Series


Online Change Point Detection in Molecular Dynamics With Optical Random Features

Proteins are made of atoms constantly fluctuating, but can occasionally undergo large-scale changes. Such transitions are of biological interest, linking the structure of a protein to its function with a cell. Atomic-level simulations, such as Molecular Dynamics (MD), are used to study these events. However, molecular dynamics simulations produce time series with multiple observables, while changes often only affect a few of them.
Authors: Amélie Chatelain, Giuseppe Luca Tommasone, Laurent Daudet, Iacopo Poli
Journal Papers – Time Series


Optical Reservoir Computing using multiple light scattering for chaotic systems prediction

Reservoir Computing is a relatively recent computational framework based on a large Recurrent Neural Network with fixed weights. Many physical implementations of Reservoir Computing have been proposed to improve speed and energy efficiency. In this study, we report new advances in Optical Reservoir Computing using multiple light scattering to accelerate the recursive computation of the reservoir states.
Authors: Jonathan Dong, Mushegh Rafayelyan, Florent Krzakala, Sylvain Gigan
In: IEEE Journal of Selected Topics in Quantum Electronics Vol. 26(1) (2020)
Conference Proceeding – Signal Processing


Don’t take it lightly: Phasing optical random projections with unknown operators

In this paper we tackle the problem of recovering the phase of complex linear measurements when only magnitude information is available and we control the input. We are motivated by the recent development of dedicated optics-based hardware for rapid random projections which leverages the propagation of light in random media.
Authors: Sidharth Gupta, Rémi Gribonval, Laurent Daudet, Ivan Dokmanić
In: Proceedings of NeurIPS 2019
Technical Reports and Preprints – Machine Learning Techniques


Principled Training of Neural Networks with Direct Feedback Alignment

The backpropagation algorithm has long been the canonical training method for neural networks. Modern paradigms are implicitly optimized for it, and numerous guidelines exist to ensure its proper use. Recently, synthetic gradients methods -where the error gradient is only roughly approximated – have garnered interest. These methods not only better portray how biological brains are learning, but also open new computational possibilities, such as updating layers asynchronously.
Authors: Julien Launay, Iacopo Poli, Florent Krzakala
Journal Papers – Quantum Physics


Machine learning and the physical sciences

Machine learning (ML) encompasses a broad range of algorithms and modeling tools used for a vast array of data processing tasks, which has entered most scientific disciplines in recent years. This article reviews in a selective way the recent research on the interface between machine learning and the physical sciences.
Authors: Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, Lenka Zdeborová
In: Reviews of Modern Physics Vol. 91(4) (2019)
Conference Proceedings – Time Series


Scaling up Echo-State Networks with multiple light scattering

Echo-State Networks and Reservoir Computing have been studied for more than a decade. They provide a simpler yet powerful alternative to Recurrent Neural Networks, every internal weight is fixed and only the last linear layer is trained. They involve many multiplications by dense random matrices. Very large networks are difficult to obtain, as the complexity scales quadratically both in time and memory.
Authors: Jonathan Dong, Sylvain Gigan, Florent Krzakala, Gilles Wainrib
In: IEEE Statistical Signal Processing Workshop 2018
Conference Proceedings – Hardware


Random Projections through multiple optical scattering: Approximating kernels at the speed of light

Random projections have proven extremely useful in many signal processing and machine learning applications. However, they often require either to store a very large random matrix, or to use a different, structured matrix to reduce the computational and memory costs. Here, we overcome this difficulty by proposing an analog, optical device, that performs the random projections literally at the speed of light without having to store any matrix in memory.
Authors: Alaa Saade, Francesco Caltagirone, Igor Carron, Laurent Daudet, Angélique Drémeau, Sylvain Gigan, Florent Krzakala
In: Proceedings of ICASSP 2016
Publications by the LightOn User Community
Publications by the LightOn Community – Machine Learning Techniques


Fast Graph Kernel with Optical Random Features

The graphlet kernel is a classical method in graph classification. It, however, suffers from a high computation cost due to the isomorphism test it includes. As a generic proxy, and in general, at the cost of losing some information, this test can be efficiently replaced by a user-defined mapping that computes various graph characteristics. In this paper, we propose to leverage kernel random features within the graphlet framework and establish a theoretical link with a mean kernel metric. If this method can still be prohibitively costly for usual random features, we then incorporate optical random features that can be computed in constant time. Experiments show that the resulting algorithm is orders of magnitude faster than the graphlet kernel for the same, or better, accuracy.
Authors: Hashem Ghanem, Nicolas Keriven, Nicolas Tremblay
Publications by the LightOn Community – Machine Learning Techniques


Total least squares phase retrieval

We address the phase retrieval problem with errors in the sensing vectors. A number of recent methods for phase retrieval are based on least squares (LS) formulations which assume errors in the quadratic measurements. We extend this approach to handle errors in the sensing vectors by adopting the total least squares (TLS) framework familiar from linear inverse problems with operator errors. We show how gradient descent and the peculiar geometry of the phase retrieval problem can be used to obtain a simple and efficient TLS solution. Additionally, we derive the gradients of the TLS and LS solutions with respect to the sensing vectors and measurements which enables us to calculate the solution errors. By analyzing these error expressions we determine when each method should perform well. We run simulations to demonstrate the benefits of our method and verify the analysis. We further demonstrate the effectiveness of our approach by performing phase retrieval experiments on real optical hardware which naturally contains sensing vector and measurement errors.
Authors: Sidharth Gupta and Ivan Dokmanic ́
Publications by the LightOn Community


Using an Optical Processing Unit for tracking and calorimetry at the LHC

The High Luminosity Large Hadron Collider is expected to have a 10 times higher readout rate than the current state, significantly increasing the computational load required. It is then essential to explore new hardware paradigms. In this work we consider the Optical Processing Units (OPU) from LightOn, which compute random matrix multiplications on large datasets in an analog, fast and economic way, fostering faster machine learning results on a dataset of reduced dimension. We consider two case studies.
1) “Event classification”: high energy proton collision at the Large Hadron Collider have been simulated, each collision being recorded as an image representing the energy flux in the detector. The task is to train a classifier to separate a Susy signal from the background. The OPU allows fast end-to-end classification without building intermediate objects (like jets). This technique is presented, compared with more classical particle physics approaches.
2) “Tracking”: high energy proton collisions at the LHC yield billions of records with typically 100,000 3D points corresponding to the trajectory of 10.000 particles. Using two datasets from previous tracking challenges, we investigate the OPU potential to solve similar or related problems in high-energy physics, in terms of dimensionality reduction, data representation, and preliminary results.
Authors: Aishik Ghosh (Centre National de la Recherche Scientifique (FR)), Laurent Basara (LAL/LRI, Université Paris Saclay), Biswajit Biswas (Centre National de la Recherche Scientifique (FR)), David Rousseau (IJCLab-Orsay)
In: ICHEP 2020 | Prague