Europhysics News, Volume 56, Number 1, 2025: AI for Physics
Pages: 20-23
Section: Features
DOI: https://doi.org/10.1051/epn/2025108
Published online: 24 March 2025
The enduring relevance of simple machine learning models
Marco Baiesi and Samir Suweis, Department of Physics and Astronomy, University of Padova, Via Marzolo 8, Padua, Italy
Hopfield’s associative memory model and Hinton’s Boltzmann machines showcase the importance of simplicity and interpretability in AI. Their work urges modern AI to balance power with transparency, ensuring models remain comprehensible for research, education, and broader applications.
© European Physical Society, EDP Sciences, 2025
The 2024 Nobel Prize in Physics, awarded to John Hopfield and Geoffrey Hinton, acknowledges that they founded the science of neural networks using ideas and tools from physics. Here, we describe how their seminal works exemplify the importance of interpretability and elegance in modeling. These qualities tend to be lost in today’s complex artificial intelligence (AI) systems. The early examples by Hinton and Hopfield remind us that, especially in science, simple and interpretable models are fundamental and should be developed with the same effort as more complex and better-performing AI tools.
John Hopfield, in 1982, introduced an associative memory neural network model [1] that demonstrated how physics can inspire computational models. With a simple mapping, a model of firing neurons is translated into a system of heterogeneously coupled magnetic spins seeking one of their lowest-energy configurations. However, unlike disordered spin systems, each local minimum of the Hopfield network corresponds to a specific pattern that has been stored through the design of spin couplings according to a rule inspired by neuroscience, known as the Hebbian rule. In this model, the patterns are binary vectors, each representing a possible state of the network, which we want to retrieve based on the network’s initial condition (after learning). The Hebbian rule establishes the network’s couplings by reinforcing the connections between units that are active simultaneously and weakening them when their signs are opposed. Specifically, the spin coupling between two units is positive when both units have the same state (both 1 or both −1) and negative when their states are opposite. This means that the couplings are assigned based on the pairwise products of the entries of the binary pattern that the network is meant to memorize. The energy of the system corresponds to its Hamiltonian, as usual in statistical mechanics. When the Hopfield network starts from a given initial condition, it evolves dynamically towards the closest local minimum, corresponding to one of the stored patterns (see Figure 1). This process is similar to a Monte Carlo update at zero temperature, where the system iteratively adjusts its state to minimize the energy function. In this way, the network can retrieve a stored pattern, even when the input data is noisy or incomplete, as long as the initial condition is close enough to one of the stored patterns.
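To make the storage and retrieval dynamics concrete, here is a minimal sketch in Python, with purely illustrative parameters (N = 100 spins, P = 5 random patterns): the Hebbian rule fixes the couplings, and zero-temperature single-spin updates drive a corrupted pattern back to the stored one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store P random binary patterns (entries +1/-1) in a network of N spins
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian rule: couplings are sums of pairwise products of pattern entries
J = patterns.T @ patterns / N
np.fill_diagonal(J, 0.0)          # no self-couplings

def retrieve(state, J, sweeps=20):
    """Zero-temperature dynamics: align each spin with its local field."""
    s = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(len(s)):
            s[i] = 1 if J[i] @ s >= 0 else -1
    return s

# Corrupt one stored pattern by flipping 10% of its spins, then retrieve it
noisy = patterns[0].copy()
flip = rng.choice(N, size=N // 10, replace=False)
noisy[flip] *= -1
recovered = retrieve(noisy, J)
print(recovered @ patterns[0] / N)   # overlap with the stored pattern, close to 1
```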
Fig. 1 Memory retrieval in the Hopfield neural network: starting from a corrupted memory (left panel, and black dot in the central panel), the system evolves by lowering the energy toward the nearest minimum (arrow in the central panel), reconstructing the stored pattern (right panel).
However, the Hopfield network is not immune to challenges, particularly in terms of spurious solutions [2]. Spurious solutions refer to states that are not part of the original set of stored patterns but appear as local minima of the network’s energy landscape. These states arise because the network is finite, and the couplings between units can result in minima corresponding to random or non-representative configurations. The likelihood of spurious solutions increases with the number of patterns stored in the network, which can degrade its efficiency in memory retrieval. Advanced techniques borrowed from statistical physics and disordered systems are also used to understand the Hopfield model by studying the overlap between configurations, which allows for analysis of the network’s capacity and efficiency in memory retrieval. The overlap is used to measure the similarity between a state of the network and one of the stored patterns. Using mean-field theory, theoreticians can map the network’s behavior across various operational regimes [3]. For instance, one can study how the overlap evolves during memory retrieval, identifying distinct phases such as rapid retrieval, critical slowing down near transition points, or partial memory retrieval. These analyses provide deep insights into the functioning of the Hopfield model and establish connections with physical phenomena such as glass transitions and spin glass behavior. Such parallels enrich our understanding of both neural and complex physical systems, highlighting the power of Hopfield’s interdisciplinary approach.
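Concretely, the overlap of a network state with a stored pattern is the average, over the N units, of the product of the two configurations: a value close to 1 indicates retrieval of that pattern, while small overlaps with every stored pattern indicate a spurious or disordered state. A minimal continuation of the sketch above:

```python
# Overlap of a state s with every stored pattern (rows of `patterns`)
def overlaps(s, patterns):
    return patterns @ s / patterns.shape[1]

print(overlaps(recovered, patterns))   # close to 1 only for the retrieved pattern
```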
Recently, there has been a resurgence of interest in Hopfield networks, particularly with the development of variants that overcome some of the limitations of the original model [4]. These “modern” Hopfield networks integrate ideas from deep learning and neural networks to create more efficient and scalable models capable of storing and retrieving a much larger number of patterns. A key advancement is the introduction of continuous-state versions, which use real-valued rather than binary states and employ optimization techniques to improve memory retrieval capacity and reliability [4]. These modern variants have been shown to outperform traditional Hopfield networks in a range of applications, such as image recognition and associative memory tasks, and even in providing the foundation for attention mechanisms in deep learning models [5]. This highlights the continued relevance of Hopfield networks in modern AI research, offering a promising avenue for future advancements in neuroscience-inspired computing and machine learning applications.
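As a flavor of these continuous-state variants, the sketch below implements a one-step update of the kind proposed in [5], where the stored real-valued patterns are the columns of a matrix X and a parameter beta plays the role of an inverse temperature; the dimensions and numbers are purely illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Continuous ("modern") Hopfield update in the spirit of Ramsauer et al. [5]:
# the query state is pulled toward a softmax-weighted mixture of stored patterns.
def modern_hopfield_update(xi, X, beta=4.0):
    return X @ softmax(beta * (X.T @ xi))

# Illustrative example: 3 real-valued patterns of dimension 5
rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
query = X[:, 0] + 0.3 * rng.normal(size=5)   # noisy version of pattern 0
print(modern_hopfield_update(query, X))       # typically close to X[:, 0] for large beta
```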
In the ’80s, Geoffrey Hinton expanded on the ideas behind the Hopfield model with the Boltzmann machine [6, 7], a probabilistic neural network. Named after Ludwig Boltzmann, this machine draws concepts from statistical physics, particularly the Boltzmann distribution, to model data. Unlike the deterministic Hopfield network, the Boltzmann machine uses uncertainty and stochastic dynamics. A compelling aspect of Boltzmann machines is their transparent way of processing data features: they assign a probability to each configuration based on its energy, which directly correlates with the likelihood of data patterns. This explicit probabilistic framework allows us to understand why the model favors certain patterns over others. Restricted Boltzmann machines [8, 9, 10] (RBMs) are currently the most used version. They add a single layer of hidden nodes representing latent variables and remove the direct interactions between data units, now termed visible units. This restriction reduces computational complexity, while the hidden layer enhances the model’s explanatory power and is crucial for capturing abstract relationships within the data. As usual in AI, latent and data variables are coupled through a matrix of weights (the lines in the sketch of Figure 2). One can interpret the hidden units connecting the visible units as a replacement of the direct interactions among visible units with a mediating auxiliary set of variables, similarly to the Hubbard-Stratonovich procedure of theoretical physics [10].
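For concreteness, here is a minimal sketch (with binary 0/1 units and hypothetical names a, b, W for the visible biases, hidden biases, and weights) of the RBM energy and of the factorized conditional probabilities that make block Gibbs sampling straightforward.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Energy of a Bernoulli RBM with visible units v, hidden units h,
# visible biases a, hidden biases b, and weight matrix W:
#   E(v, h) = -a.v - b.h - v.W.h
def rbm_energy(v, h, a, b, W):
    return -(a @ v) - (b @ h) - v @ W @ h

# With no couplings inside each layer, the conditional distributions factorize,
# so each layer can be resampled in one block ("block Gibbs sampling"):
def sample_hidden(v, b, W, rng):
    p = sigmoid(b + v @ W)          # P(h_j = 1 | v)
    return (rng.random(p.shape) < p).astype(float)

def sample_visible(h, a, W, rng):
    p = sigmoid(a + h @ W.T)        # P(v_i = 1 | h)
    return (rng.random(p.shape) < p).astype(float)
```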
Fig. 2 Sketch of an RBM.
The RBM’s two-layer structure typically contains a hidden layer smaller than the visible one, which leads to an information bottleneck and forces the RBM to represent data’s salient abstract features in a low-dimensional space. This structure provides a straightforward framework for encoding and uncovering hidden relationships within data elements. Training an RBM involves adjusting the connections between layers to better align with the observed data. To understand such a process, note that the RBM is a generative model, i.e., an unsupervised machine learning method that aims to reproduce the probability distribution that generated the data. Its intuitive learning process (termed “contrastive divergence” [9]) compares the generated data to the actual data and iteratively improves the weights. Thus, the latent nodes process input, explore possible configurations by generating “fantasy particles,” and finally uncover typical multi-site patterns in the data.
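A minimal sketch of a single contrastive-divergence (CD-1) update, continuing the RBM sketch above; the learning rate lr and the minibatch V_data are illustrative, and practical implementations add refinements such as momentum or persistent chains.

```python
# One CD-1 step on a minibatch V_data (rows are binary data vectors),
# reusing sigmoid, sample_hidden and sample_visible from the sketch above.
def cd1_update(V_data, a, b, W, lr, rng):
    H_data = sample_hidden(V_data, b, W, rng)          # positive (data) phase
    V_model = sample_visible(H_data, a, W, rng)        # "fantasy particles"
    H_model = sigmoid(b + V_model @ W)                  # negative (model) phase
    W += lr * (V_data.T @ H_data - V_model.T @ H_model) / len(V_data)
    a += lr * (V_data - V_model).mean(axis=0)
    b += lr * (H_data - H_model).mean(axis=0)
```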
The two-layer structure of the RBM is relatively simple, especially in cases with few hidden units. Hence, the connections between visible and hidden units can be studied and interpreted, revealing how the network associates input data with abstract features. Thus, by examining the weights learned by the RBM, researchers can identify how the network encodes features in the hidden layer or organizes data into clusters [11, 12, 13], deciphering the system’s logic. Such interpretability contrasts with modern “black-box” AI systems with billions of parameters, where decisions are often inscrutable (note that these systems exploit other fundamental algorithms introduced by Hinton, such as backpropagation or the RMSprop gradient descent method [10]). For instance, an RBM trained on handwritten digits might develop hidden nodes corresponding to specific strokes or shapes (Figure 3). Correlations in the genome or in proteins’ amino acid sequences emerge in this way [11, 12, 13]. Moreover, interpretability is enhanced if all independent replicas in an ensemble of RBMs converge to a similar set of weights [12].
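As a hypothetical, down-scaled version of the experiment behind Figure 3, one can train scikit-learn’s BernoulliRBM on a few digit classes and inspect the learned weights, one two-dimensional map per hidden unit; here the small 8x8 digits dataset bundled with scikit-learn replaces MNIST, and all hyperparameters are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import BernoulliRBM

# Keep only the digit classes {3, 8, 9} and binarize the 8x8 pixel values
digits = load_digits()
mask = np.isin(digits.target, [3, 8, 9])
X = (digits.data[mask] / 16.0 > 0.5).astype(float)

# Small RBM with 10 hidden units, as in Figure 3
rbm = BernoulliRBM(n_components=10, learning_rate=0.05, n_iter=30, random_state=0)
rbm.fit(X)

# Each row of components_ is the weight vector of one hidden unit,
# reshaped back to the image geometry for visual inspection
weight_maps = rbm.components_.reshape(10, 8, 8)
print(weight_maps.shape)
```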
Fig. 3 Two-dimensional representation of the weights attached to each hidden unit, and bias. These weights were learned by a Bernoulli RBM with 10 hidden units trained with 6000 handwritten digits in the ensemble {3, 8, 9} from the MNIST database. The first panel is the local bias imposed to generate “fantasy” numbers, on top of which the RBM adds a combination of patterns associated with the hidden units activated at the moment.
The Nobel Committee’s decision to honor Hopfield and Hinton recognizes the impact of their work. Their models highlight the interdisciplinary nature of scientific discovery, and their transparency helps in practical applications, research, and teaching. For example, these models are studied in the master’s degree in Physics of Data, introduced in 2018 by the University of Padova to endow students with machine learning tools, in recognition of their growing relevance in physics.
The challenge for future generations is to understand AI better and to develop interpretable methods. One may think of machine learning as a frontier for physics, in which the scientific approach, based on summarizing information, simple modeling, and reproducibility, is applied to a new kind of matter: computational structures.
About the Authors
Marco Baiesi, an associate professor at the University of Padova, is an expert in statistical physics, systems out of equilibrium, soft matter, and biophysics. Recently, he became interested in approaches to understanding machine learning models.
Samir Suweis is an associate professor at the University of Padova. He is an expert in statistical physics of ecological systems, complex systems, and computational/theoretical neuroscience. He is also one of the founders of the AI & Society interdisciplinary project (https://www.aisociety-unipd.it/) and coordinator of the Master course in Physics of Data (https://physicsofdata.dfa.unipd.it/).
References
- J. J. Hopfield, Proceedings of the National Academy of Sciences 79, 2554 (1982)
- D. J. Amit, Modeling Brain Function: The World of Attractor Neural Networks (Cambridge University Press, 1989)
- A. Fachechi, E. Agliari and A. Barra, Neural Networks 112, 24 (2019)
- D. Krotov, Nature Reviews Physics 5, 366 (2023)
- H. Ramsauer et al., Hopfield Networks is All You Need, preprint arXiv:2008.02217 (2021)
- G. E. Hinton and T. J. Sejnowski, Proceedings of the Fifth Annual Conference of the Cognitive Science Society (1983)
- D. H. Ackley, G. E. Hinton and T. J. Sejnowski, Cognitive Science 9, 147 (1985)
- P. Smolensky, Information Processing in Dynamical Systems: Foundations of Harmony Theory, chap. 6 in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations (MIT Press, 1986)
- G. E. Hinton, Neural Computation 14, 1771 (2002)
- P. Mehta et al., Physics Reports 810, 1 (2019)
- J. Tubiana, S. Cocco and R. Monasson, eLife 8, e39397 (2019)
- A. Braghetto, E. Orlandini and M. Baiesi, Journal of Chemical Theory and Computation 19, 6011 (2023)
- A. Decelle, B. Seoane and L. Rosset, Physical Review E 108, 014110 (2023)