Training neural networks using Metropolis Monte Carlo and an adaptive variant

Whitelam, Stephen and Selin, Viktor and Benlolo, Ian and Casert, Corneel and Tamblyn, Isaac (2022) Training neural networks using Metropolis Monte Carlo and an adaptive variant. Machine Learning: Science and Technology, 3 (4). 045026. ISSN 2632-2153

Whitelam_2022_Mach._Learn.__Sci._Technol._3_045026.pdf - Published Version

Download (3MB)

Abstract

We examine the zero-temperature Metropolis Monte Carlo (MC) algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis MC can train a neural net with an accuracy comparable to that of gradient descent (GD), if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm (aMC) to overcome these limitations. The intrinsic stochasticity and numerical stability of the MC method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by GD. MC methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
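The sketch below illustrates the zero-temperature Metropolis scheme described in the abstract: perturb the network parameters with Gaussian noise and accept the move only if the loss does not increase. It is not the authors' implementation; the toy task, network size, and step size (sigma) are illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): zero-temperature Metropolis MC
# training of a small feed-forward network on a toy regression task.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: fit y = sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 64).reshape(-1, 1)
y = np.sin(x)

# One hidden layer of 16 tanh units; parameters stored as a flat vector.
n_hidden = 16
shapes = [(1, n_hidden), (n_hidden,), (n_hidden, 1), (1,)]
sizes = [int(np.prod(s)) for s in shapes]
theta = rng.normal(0.0, 0.5, size=sum(sizes))

def unpack(theta):
    parts, i = [], 0
    for s, n in zip(shapes, sizes):
        parts.append(theta[i:i + n].reshape(s))
        i += n
    return parts

def loss(theta):
    w1, b1, w2, b2 = unpack(theta)
    pred = np.tanh(x @ w1 + b1) @ w2 + b2
    return np.mean((pred - y) ** 2)

# Zero-temperature Metropolis: propose a Gaussian perturbation of all
# parameters and accept it only if the loss does not increase.
sigma = 0.02          # proposal step size (illustrative choice)
current = loss(theta)
for step in range(20000):
    proposal = theta + rng.normal(0.0, sigma, size=theta.shape)
    new = loss(proposal)
    if new <= current:  # zero temperature: reject any uphill move
        theta, current = proposal, new

print(f"final loss: {current:.4f}")
```

The adaptive variant (aMC) discussed in the paper modifies how moves are proposed; the sketch above shows only the plain zero-temperature scheme.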

Item Type: Article
Subjects: OA STM Library > Multidisciplinary
Depositing User: Unnamed user with email support@oastmlibrary.com
Date Deposited: 17 May 2024 10:38
Last Modified: 17 May 2024 10:38
URI: http://geographical.openscholararchive.com/id/eprint/1290
