Europhysics News, Volume 56, Number 1, 2025 (AI for Physics)
Pages 24-26 | Section: Features
DOI: https://doi.org/10.1051/epn/2025109
Published online: 24 March 2025
Learning with artificial and natural neural networks: trade-offs in energy consumption and representations
Laboratory of Physics of the École Normale Supérieure, PSL Research and CNRS, Paris, France
Decades after Hopfield and Hinton’s seminal works on neural computation and Boltzmann machines, the use of neural networks in machine learning has revolutionized artificial intelligence. Physics, with the help of neuroscience, still has a lot to say on many issues. Here we discuss two of them: energy consumption and representations.
© European Physical Society, EDP Sciences, 2025
The works awarded this year’s Nobel Prize in Physics have greatly influenced our vision of how neural networks may compute and learn. This understanding illustrates the power and ubiquity of the key physics concept of the energy landscape. In short, J.J. Hopfield showed how the dynamics of recurrent neural networks could be viewed as the descent of an abstract energy in the high-dimensional space of neural configurations. With a simple specification of the interactions between the neuronal units, the positions of the energy minima and, hence, the fixed points of the dynamics could roughly correspond to the data points, offering a possible mechanism for the formation and retrieval of auto-associative memories in the brain. G.E. Hinton and his collaborators then formulated, in full generality, the network’s learning phase as a dynamical descent process in another energy landscape, this time defined over the even higher-dimensional space of neuron-neuron interactions [1]. After training, the network can be used to sample new neural configurations at finite temperature, guaranteed to reproduce the low-order statistics of the data. These Boltzmann Machines (BM) were, in practice, the first flexible and functional generative models. Furthermore, a fundamental benefit of this energy-based framing was that many concepts and tools of statistical physics were readily available to theoretically study the performance of neural networks. In particular, the powerful methods introduced a few years earlier by G. Parisi and his collaborators (cf. the 2021 Nobel Prize in Physics) for spin glasses, that is, magnetic materials with irregular interactions, played a decisive role in the development of a statistical-physics theory of neural networks in machine learning [2], as well as in computational neuroscience.
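To make this energy-descent picture concrete, here is a minimal numpy sketch of a Hopfield network with Hebbian couplings retrieving a stored pattern from a corrupted cue; the network size, number of patterns and noise level are arbitrary illustrative choices, not values from the original works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Hopfield network: store a few random binary patterns with the Hebb rule
N, P = 100, 5                          # number of neurons, number of patterns
patterns = rng.choice([-1, 1], size=(P, N))
J = (patterns.T @ patterns) / N        # symmetric couplings J_ij
np.fill_diagonal(J, 0.0)

def energy(s):
    """Hopfield energy E(s) = -1/2 sum_ij J_ij s_i s_j."""
    return -0.5 * s @ J @ s

# Start from a corrupted version of pattern 0 and let the dynamics descend
s = patterns[0].copy()
flip = rng.choice(N, size=N // 5, replace=False)
s[flip] *= -1                          # flip 20% of the bits

for _ in range(10):                    # asynchronous sign updates never increase E
    for i in rng.permutation(N):
        s[i] = np.sign(J[i] @ s) or 1  # sign(0) -> +1 by convention
    # energy(s) decreases toward a minimum that lies close to pattern 0

overlap = (s @ patterns[0]) / N        # close to 1 if retrieval succeeded
print(f"final energy {energy(s):.2f}, overlap with stored pattern {overlap:.2f}")
```

The energy minima of this toy model sit near the stored patterns, so descending the landscape from a noisy cue implements auto-associative memory retrieval.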
In the four decades since, fantastic progress has been made in the field of machine learning and artificial intelligence. Current approaches rely on a multitude of complex architectures and computation schemes, including deep networks, transformers, adversarial and diffusion models. Along the way, however, the original link with neuroscience and physics at the heart of Hopfield and Hinton’s works has largely been lost from sight. Yet there are solid reasons to advocate a return to these origins. Neuroscience has made fundamental progress in the understanding of how the brain works, following experimental breakthroughs in interrogation and recording techniques. Combined with concepts coming from theoretical physics, these insights can help address general issues in machine learning. We illustrate below how physics and computation meet on two of them: energy consumption and representations.
Energy consumption is undoubtedly a major concern for AI (Figure 1A). Since the birth of machine learning seventy years ago, the training compute of AI models has increased by more than 20 orders of magnitude and, with the advent of the deep-learning era, is now growing by a factor of about 4 per year. This exponential growth is only partially mitigated by advances in hardware, since the number of operations per joule performed by GPUs is estimated to increase by a factor of only about 1.2 per year. And the energy budget is even more frightening if one considers inference, i.e. the use of models after training, particularly for popular generative models of speech, text, images or video.
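Taking the growth figures quoted above at face value, a quick back-of-the-envelope estimate (a sketch, not a forecast) shows how modestly the hardware gains offset the growth in training compute:

```python
import math

# Figures quoted in the text: training compute grows ~4x per year,
# GPU operations per joule improve ~1.2x per year.
compute_growth = 4.0
efficiency_growth = 1.2

energy_growth = compute_growth / efficiency_growth      # ~3.3x per year net
years_to_100x = math.log(100) / math.log(energy_growth)

print(f"net training-energy growth: x{energy_growth:.2f} per year")
print(f"a 100-fold increase in energy takes about {years_to_100x:.1f} years at this rate")
```

At these rates the energy needed for training grows roughly a hundredfold every four years, which is why efficiency gains alone cannot close the gap.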
Fig. 1 Energy consumption in artificial and natural neural networks. A. Neural networks (middle) are fed with input data in machine learning (top left) or with sensory stimuli in an animal’s brain (bottom left). As the information is dynamically processed, an output is produced, for instance a new data item (top right) or a behavioral response (bottom right). Carrying out these computations requires sources of energy (shown in orange; bread designed by Freepik), and resource limitations affect performance. B. If power is capped, the time needed to train an AI model to achieve a specific task generally increases. However, the cumulative energy required for the training may exhibit a minimum at intermediate power levels.
There are many ways in which physics can help. From a hardware point of view, the development of neuromorphic computing devices, for example based on spintronic or optical approaches, could eventually offer massively parallel, fast and energy-efficient alternatives to conventional electronics. Conceptually, the interplay between energy and computation has a long history, beginning with Maxwell’s demon over a century ago. According to Landauer’s famous motto, “information is physical”, and thermodynamics sets a fundamental limit on the minimum energy required for the elementary manipulation of bits [3], several orders of magnitude below what today’s electronic devices require. Statistical physics has also revealed how energy-driven non-equilibrium enzymatic reactions, such as the copying of DNA by a polymerase, can achieve low error rates, providing a concrete example of the trade-offs between energy consumption, speed and accuracy in computation. Trade-offs are also found empirically in AI: for instance, capping GPU power increases the training time of large language models, but may reduce the overall energy bill (Figure 1B).
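To make the gap concrete, the following sketch evaluates the Landauer bound k_B T ln 2 at room temperature and compares it with an assumed, order-of-magnitude figure for the energy per elementary GPU operation (the latter number is an illustrative assumption, not a measured value):

```python
import math

k_B = 1.380649e-23                 # Boltzmann constant, J/K
T = 300.0                          # room temperature, K

landauer = k_B * T * math.log(2)   # minimum energy to erase one bit, ~2.9e-21 J

# Assumed order of magnitude for a modern GPU: roughly 1e12 operations per
# joule, i.e. ~1e-12 J per elementary operation (illustrative only).
e_per_op = 1e-12

print(f"Landauer limit at 300 K: {landauer:.2e} J per bit")
print(f"assumed energy per GPU operation: {e_per_op:.0e} J")
print(f"gap: about {math.log10(e_per_op / landauer):.0f} orders of magnitude")
```

Under these assumptions the gap is close to nine orders of magnitude, which is what makes the thermodynamic bound a distant but real target for future hardware.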
The necessity to carry out essential computations despite highly fluctuating resources has always been a fundamental evolutionary constraint on animals [4]. Our brain makes up approximately 2% of our body mass but consumes as much as 20% of our energy intake, showing that computation is a real burden on organisms (Figure 1A). Most of the energy consumption in brain circuits is due to the transmission of information between neurons, rather than to the generation of neural activity. An evolutionary strategy for mitigating these costs is synaptic failure, which randomly impedes transmission. How this dynamical noise in the interactions impacts performance remains unclear. The analogy with the techniques of dilution and dropout in machine learning [5] suggests that the effects could even be beneficial for learning and, possibly, for inference. In this regard, perhaps one of the lessons of biology is that energy constraints may have shaped the very way in which calculations are made. Predictive coding is a case in point: by processing only the differences between external data and a constantly updated internal model of the world, the brain requires fewer calculations than would otherwise be necessary.
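The predictive-coding idea can be caricatured in a few lines: an internal model predicts the incoming signal, and only significant prediction errors are “transmitted” and used to update the model. The signal, threshold and learning rate below are arbitrary illustrative choices, not a model of any specific brain circuit.

```python
import numpy as np

rng = np.random.default_rng(1)

# A slowly drifting 1D "sensory" signal
T = 200
signal = np.cumsum(0.1 * rng.standard_normal(T)) + np.sin(np.linspace(0, 4 * np.pi, T))

prediction, lr, threshold = 0.0, 0.5, 0.2
transmitted = 0

for x in signal:
    error = x - prediction            # mismatch between input and internal model
    if abs(error) > threshold:        # only significant surprises are transmitted
        transmitted += 1
        prediction += lr * error      # update the internal model from the error
    # small errors are ignored: no transmission, hence no transmission cost

print(f"transmitted {transmitted} of {T} samples "
      f"({100 * transmitted / T:.0f}%) under predictive coding")
```

Because the signal is predictable most of the time, only a fraction of the samples triggers a costly transmission, illustrating how a good internal model saves energy.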
How neural networks encode relevant features of the data is a central question, irrespective of energy considerations. BM account for correlations in the data through effective interactions, but how heterogeneous and strong interactions contribute to shaping the relevant (low-lying) configurations is notoriously difficult to understand; after all, this is what spin glasses are about! This difficulty was recognized from the outset by Hinton and collaborators, who introduced the concept of Restricted Boltzmann Machines (RBM), an alternative architecture that extracts representations from the data [1]. Informally speaking, an RBM infers a dictionary of features frequently encountered across the data set, e.g. small patches of pixels in Figure 2A, and each data point can then be represented by the list of features it contains. As features can be combinatorially mixed, RBM can generate a wide variety of new configurations.
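As a rough illustration, here is a minimal numpy sketch of a binary RBM trained with one step of contrastive divergence (CD-1, a standard training shortcut rather than the exact procedure of ref. [1]) on a synthetic 5×5-pixel data set; all sizes, prototypes and hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

N, M = 25, 3                 # visible units (5x5 pixels), hidden features
W = 0.01 * rng.standard_normal((N, M))
b_v, b_h = np.zeros(N), np.zeros(M)

# Synthetic data: each sample is a noisy mixture of three 5x5 "patch" prototypes
prototypes = (rng.random((3, N)) < 0.3).astype(float)

def sample_batch(size=64):
    mix = rng.integers(0, 2, size=(size, 3)).astype(float)
    probs = np.clip(mix @ prototypes, 0, 1)
    return (rng.random((size, N)) < 0.9 * probs).astype(float)

lr = 0.05
for step in range(2000):                     # CD-1 training loop
    v0 = sample_batch()
    h0 = sigmoid(v0 @ W + b_h)               # hidden activations given the data
    hs = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(hs @ W.T + b_v)             # one-step reconstruction
    h1 = sigmoid(v1 @ W + b_h)
    W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
    b_v += lr * (v0 - v1).mean(axis=0)
    b_h += lr * (h0 - h1).mean(axis=0)

# Each column of W is a learned feature over the 5x5 pixels; a data point is
# summarized by which hidden units it activates.
print(np.round(W[:, 0].reshape(5, 5), 2))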
Fig. 2 High-dimensional neural representations: nature and interpretability. A. A set of data in dimension N (here, N=25 pixels on a 2D grid) is composed of multiple items differing by their color, size and shape. The neural net learns the data and extracts M features, here M=3. Case of entangled representations: the features are activated by diverse data points with mixed color, size, shape, and are thus not easily interpretable. Case of disentangled representations: the features extracted by the neural network are aligned with the independent and defining characteristics of the data. Conditional generation of new data with prescribed color, shape or size is then easy. B. The state of a neural network with N neurons is a point in the high-dimensional space of the neuron activities r_i; here, N=3. Due to the constraints induced by the interactions between the neurons, these states can be embedded in one or more manifolds of dimension much smaller than N.
The decomposition into features is generally not unique, and some decompositions may be easier to grasp than others. In the example of Figure 2A, the representations on the left are entangled: they mix up shapes, sizes and colors, while the features on the right correspond one-to-one to these three independent and defining characteristics of the data. Achieving disentangled and meaningful representations is an important topic in the context of explainable machine learning. A natural question, illustrating the pervasive notion of trade-off (Figure 1B), is how much disentangling the representations degrades the generative performance of the model. Quantitative answers to this question can be found with statistical-physics methods in some simple settings [6], but a general framework is still missing. It is quite likely that encoding in brain areas is subject to similar constraints. The presence of pure cells, whose activity can be linked to a single piece of sensory information, and of mixed cells, which encode multiple features, may reflect the contradictory imperatives imposed by learning and by the simplicity of the readout (by downstream brain areas) [7].
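A toy proxy for this notion of alignment: generate items from three ground-truth factors, build one set of features aligned with the factors and one mixed by a random rotation, and measure how strongly each feature correlates with a single factor. This is a simplified illustration with synthetic data, not one of the disentanglement metrics used in the literature.

```python
import numpy as np

rng = np.random.default_rng(3)

# Ground-truth generative factors for 1000 items: color, size, shape (standardized)
factors = rng.standard_normal((1000, 3))

disentangled = factors + 0.1 * rng.standard_normal(factors.shape)    # aligned features
rot = np.linalg.qr(rng.standard_normal((3, 3)))[0]                   # random rotation
entangled = factors @ rot + 0.1 * rng.standard_normal(factors.shape) # mixed features

def alignment(features, factors):
    """Mean over features of the maximum |correlation| with any single factor."""
    C = np.corrcoef(features.T, factors.T)[:3, 3:]
    return np.abs(C).max(axis=1).mean()

print(f"alignment, disentangled features: {alignment(disentangled, factors):.2f}")
print(f"alignment, entangled features:    {alignment(entangled, factors):.2f}")
```

Disentangled features score close to 1 on this crude measure, while rotated (mixed) features score lower, mirroring the contrast between the two cases of Figure 2A.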
Data are not always characterized by categorical features. A human face, for instance, can be continuously deformed by internal muscle motions, or by translations, rotations or scale transformations. Such data lie on continuous manifolds, generally of much lower dimension than the data space (Figure 2B). Of particular interest are situations in which multiple manifolds coexist in the neural space. Each manifold may code for one concept, such as the set of all possible photos of yourself. Understanding how these concept-related manifolds can be embedded in neural space, as well as how invariant neurons recognizing them emerge, are important issues in neuroscience and machine learning, e.g. in the field of vision [8]. The notion of manifolds and representations is not foreign to physics: condensed-matter phases define sets in configuration space, and order parameters are measures of their characteristics. The tools of statistical physics and field theory are legitimate instruments for addressing these general questions!
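The following sketch hints at how such low-dimensional structure shows up in data: points lying on a curved two-dimensional sheet are embedded in a 50-dimensional space, and the PCA spectrum concentrates in a handful of directions. Linear PCA only bounds the dimension from above (five directions here); recovering the true intrinsic dimension of two would require nonlinear methods. All sizes and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Points on a 2D manifold (a curved sheet) embedded in N=50-dimensional space
t = rng.uniform(-1, 1, size=(2000, 2))
embedding = rng.standard_normal((5, 50))
X = np.column_stack([t[:, 0], t[:, 1], t[:, 0] ** 2,
                     t[:, 1] ** 2, t[:, 0] * t[:, 1]]) @ embedding
X += 0.01 * rng.standard_normal(X.shape)      # small measurement noise

# PCA spectrum: variance concentrates in at most 5 directions, far below N=50,
# signalling that the data occupy a low-dimensional manifold
X -= X.mean(axis=0)
eigvals = np.linalg.eigvalsh(X.T @ X / len(X))[::-1]
explained = np.cumsum(eigvals) / eigvals.sum()
print("components needed for 99% variance:", int(np.searchsorted(explained, 0.99) + 1))
```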
It goes without saying that energy considerations and representations do not exhaust the list of relevant issues in computation, be it artificial or natural. Robustness, reproducibility, over-parametrization and noise, to mention but a few, all raise tantalizing theoretical questions.
About the Authors
Simona Cocco and Rémi Monasson are both Directors of Research at CNRS, and work on interdisciplinary applications of the statistical physics of disordered systems to machine learning and computational biology/neuroscience at the École Normale Supérieure (Paris).
References
[1] D.H. Ackley, G.E. Hinton, T.J. Sejnowski, Cognitive Science 9, 147 (1985)
[2] A. Engel, C. Van den Broeck, Statistical Mechanics of Learning, Cambridge University Press (2001)
[3] R. Landauer, Physics Today 44, 23 (1991)
[4] Z. Padamsey, N.L. Rochefort, Current Opinion in Neurobiology 78, 102668 (2023)
[5] N. Srivastava, G. Hinton, A. Krizhevsky et al., Journal of Machine Learning Research 15, 1929 (2014)
[6] J. Fernandez-de-Cossio-Diaz, S. Cocco, R. Monasson, Physical Review X 13, 021003 (2023)
[7] M. Rigotti, O. Barak, M. Warden et al., Nature 497, 585 (2013)
[8] F.A. Wichmann, R. Geirhos, Annual Review of Vision Science 9, 501 (2023)