Learning Nonlinear Dynamics in Physical Modelling Synthesis using Neural Ordinary Differential Equations

Victor Zheleznov1, Stefan Bilbao1, Alec Wright1 and Simon King2

1Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
2Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

Accompanying web page for the DAFx25 paper

arXiv | Code

Abstract

Modal synthesis methods are a long-standing approach to modelling distributed musical systems. In some cases, extensions are possible to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinearities lead to perceptually important effects, including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning (in particular, neural ordinary differential equations) has modelled lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations to model distributed musical systems. The proposed model leverages the analytical solution for the linear vibration of the system's modes and employs a neural network to account for nonlinear dynamic behaviour. The physical parameters of a system remain easily accessible after training, without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
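The modelling idea described above can be sketched as a small numerical example: the state collects modal displacements and velocities, the linear part of the vector field follows from the known modal frequencies and damping coefficients, and a neural network adds a learned nonlinear coupling term. This is a hypothetical, simplified sketch, not the paper's implementation: the mode count, frequencies, damping values, and the tiny random-weight MLP standing in for the trained network are all illustrative assumptions.

```python
import numpy as np

M = 4                                              # number of retained modes (illustrative)
omega = 2 * np.pi * 110.0 * np.arange(1, M + 1)    # modal angular frequencies (assumed harmonic)
sigma = 0.5 * np.arange(1, M + 1)                  # modal damping coefficients (assumed)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 2 * M))       # placeholder MLP weights; in practice
W2 = rng.normal(scale=0.1, size=(M, 16))           # these would be trained on data

def f(x):
    """Right-hand side of the neural ODE: analytical linear modal dynamics
    plus a neural-network correction acting on the modal velocities."""
    q, p = x[:M], x[M:]
    nn = W2 @ np.tanh(W1 @ x)                      # learned nonlinear coupling between modes
    dq = p
    dp = -(omega ** 2) * q - 2.0 * sigma * p + nn
    return np.concatenate([dq, dp])

def rk4_step(x, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Pluck-like initial condition: displace the first mode, integrate for 10 ms
# at audio rate (44.1 kHz).
x = np.zeros(2 * M)
x[0] = 1e-3
h = 1.0 / 44100.0
for _ in range(441):
    x = rk4_step(x, h)

print(x[:M])                                       # modal displacements after 10 ms
```

Because the linear part is known in closed form, the network only has to learn the residual nonlinear behaviour, and the physical parameters (here `omega` and `sigma`) remain directly interpretable after training.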

Sound Examples

Below are some selected sound examples along with string and excitation parameters for the datasets used in the paper. All sound examples can be downloaded from the accompanying repository.

Test Dataset

| $\gamma$ | $\kappa$ | $x_e$ | $x_o$ | $f_{\mathrm{amp}}$ | $T_e$ | Note |
|---|---|---|---|---|---|---|
| $155.4$ | $1.07$ | $0.79$ | $0.87$ | $2.3 \times 10^4$ | $1.1\;\mathrm{ms}$ | Largest relative MSE for audio output (illustrated example in the manuscript) |
| $233.2$ | $1.03$ | $0.31$ | $0.66$ | $2.0 \times 10^4$ | $1.3\;\mathrm{ms}$ | Lowest relative MSE for audio output |
| $161.8$ | $1.02$ | $0.89$ | $0.76$ | $2.7 \times 10^4$ | $0.5\;\mathrm{ms}$ | Selected example #1 |
| $203.5$ | $1.04$ | $0.74$ | $0.71$ | $2.9 \times 10^4$ | $1.5\;\mathrm{ms}$ | Selected example #2 |
| $168.3$ | $1.01$ | $0.46$ | $0.34$ | $2.8 \times 10^4$ | $1.1\;\mathrm{ms}$ | Selected example #3 |
| $242.0$ | $1.03$ | $0.15$ | $0.20$ | $2.2 \times 10^4$ | $1.3\;\mathrm{ms}$ | Selected example #4 |
| $139.4$ | $1.09$ | $0.57$ | $0.78$ | $2.8 \times 10^4$ | $1.3\;\mathrm{ms}$ | Selected example #5 |
| $182.7$ | $1.02$ | $0.79$ | $0.14$ | $2.7 \times 10^4$ | $1.1\;\mathrm{ms}$ | Selected example #6 |
| $221.8$ | $1.01$ | $0.41$ | $0.78$ | $2.9 \times 10^4$ | $0.8\;\mathrm{ms}$ | Selected example #7 |
| $191.3$ | $1.08$ | $0.34$ | $0.88$ | $2.9 \times 10^4$ | $0.6\;\mathrm{ms}$ | Selected example #8 |
| $235.7$ | $1.06$ | $0.26$ | $0.23$ | $2.7 \times 10^4$ | $0.8\;\mathrm{ms}$ | Selected example #9 |
| $149.3$ | $1.05$ | $0.18$ | $0.89$ | $2.1 \times 10^4$ | $0.8\;\mathrm{ms}$ | Selected example #10 |

Training and Validation Dataset

| $x_e$ | $x_o$ | $f_{\mathrm{amp}}$ | $T_e$ | Note |
|---|---|---|---|---|
| $0.80$ | $0.14$ | $2.7 \times 10^4$ | $0.9\;\mathrm{ms}$ | Largest relative MSE for audio output |
| $0.58$ | $0.31$ | $2.8 \times 10^4$ | $1.5\;\mathrm{ms}$ | Lowest relative MSE for audio output |
| $0.84$ | $0.87$ | $3.0 \times 10^4$ | $1.0\;\mathrm{ms}$ | Selected example #1 |
| $0.19$ | $0.13$ | $2.1 \times 10^4$ | $0.8\;\mathrm{ms}$ | Selected example #2 |
| $0.60$ | $0.49$ | $2.8 \times 10^4$ | $1.4\;\mathrm{ms}$ | Selected example #3 |
| $0.59$ | $0.26$ | $2.4 \times 10^4$ | $0.9\;\mathrm{ms}$ | Selected example #4 |
| $0.24$ | $0.78$ | $2.9 \times 10^4$ | $1.3\;\mathrm{ms}$ | Selected example #5 |
| $0.46$ | $0.39$ | $2.7 \times 10^4$ | $0.7\;\mathrm{ms}$ | Selected example #6 |