Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics

Victor Zheleznov¹, Stefan Bilbao², Alec Wright¹ and Simon King³

¹Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
²STMS (UMR9912), IRCAM, CNRS, Sorbonne Université, Paris, France
³Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

Accompanying web page for the JAES paper


Abstract

Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, leading to coupled nonlinear systems of ordinary differential equations. Recent work on scalar auxiliary variable techniques has enabled the construction of explicit and stable numerical solvers for such systems. On the other hand, neural ordinary differential equations have been successful in modelling nonlinear systems from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for the linear vibration of the system's modes, so that the physical parameters of the system remain easily accessible after training without the need for a parameter encoder in the model architecture. Compared with our previous work, which used multilayer perceptrons to parametrise the nonlinear dynamics, we employ gradient networks, which admit an interpretation in terms of the closed-form, non-negative potential required by scalar auxiliary variable techniques. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
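The abstract combines three ingredients: the exact solution for linear modal vibration, a scalar auxiliary variable (SAV) that makes the nonlinear update explicit, and a gradient network with a closed-form, non-negative potential. The Python/NumPy sketch below shows one way these can fit together for $\ddot{\mathbf{q}} = -\boldsymbol{\Omega}^2\mathbf{q} - \nabla V(\mathbf{q})$ with unit modal masses. It is an illustration under stated assumptions, not the scheme from the paper: damping, excitation and training are omitted, and the mode frequencies, the `log cosh` gradient network with random weights, and all function names are hypothetical. Replacing each $\omega$ by $\tilde\omega = (2/k)\sin(\omega k/2)$ makes the usual second difference reproduce the exact recurrence $q^{n+1} = 2\cos(\omega k)\,q^n - q^{n-1}$, and the SAV variable $\psi = \sqrt{2V}$ is only real when $V \ge 0$, which is why a non-negative closed-form potential is needed.

```python
import numpy as np

# Hypothetical gradient network: grad_V(q) = W^T (a * tanh(W q)) with a >= 0.
# Its closed-form potential V(q) = sum_j a_j log cosh(w_j . q) is non-negative.
def log_cosh(z):
    # numerically stable log(cosh(z)) = |z| + log(1 + exp(-2|z|)) - log(2)
    az = np.abs(z)
    return az + np.log1p(np.exp(-2.0 * az)) - np.log(2.0)

def make_gradnet(W, a):
    def V(q):
        return np.sum(a * log_cosh(W @ q))
    def grad_V(q):
        return W.T @ (a * np.tanh(W @ q))
    return V, grad_V

def sav_step(q_now, q_prev, psi_prev, omega_t, k, V, grad_V, eps=1e-12):
    """One explicit SAV step of
    (q_next - 2 q_now + q_prev) / k^2 = -omega_t^2 q_now - g (psi_next + psi_prev) / 2,
    psi_next - psi_prev = g . (q_next - q_prev) / 2."""
    g = grad_V(q_now) / np.sqrt(2.0 * V(q_now) + eps)
    # the unknown s = q_next - q_prev satisfies (I + (k^2/4) g g^T) s = b
    b = 2.0 * (q_now - q_prev) - k**2 * (omega_t**2 * q_now + g * psi_prev)
    c = 0.25 * k**2
    s = b - c * g * (g @ b) / (1.0 + c * (g @ g))  # Sherman-Morrison, no matrix solve
    return q_prev + s, psi_prev + 0.5 * (g @ s)

def energy(q_next, q_now, psi, omega_t, k):
    # discrete energy H^{n+1/2}: kinetic + linear modal + SAV (nonlinear) part
    kinetic = 0.5 * np.sum(((q_next - q_now) / k) ** 2)
    linear = 0.5 * np.sum(omega_t**2 * q_next * q_now)
    return kinetic + linear + 0.5 * psi**2

def simulate(n_steps=2000, n_modes=20, n_hidden=8, sr=44100.0, f0=100.0, seed=0):
    rng = np.random.default_rng(seed)
    k = 1.0 / sr
    m = np.arange(1, n_modes + 1)
    omega = 2.0 * np.pi * f0 * m                    # hypothetical harmonic mode frequencies
    omega_t = (2.0 / k) * np.sin(0.5 * omega * k)   # makes the linear recurrence exact
    W = 1e3 * rng.standard_normal((n_hidden, n_modes))  # untrained, illustrative weights
    a = rng.uniform(0.5, 1.5, n_hidden)                 # a >= 0 keeps V >= 0
    V, grad_V = make_gradnet(W, a)
    q_prev = 1e-3 / m            # q^0: hypothetical initial modal displacements
    q_now = q_prev.copy()        # q^1 = q^0, i.e. starting (approximately) at rest
    psi = np.sqrt(2.0 * V(q_prev))
    H = [energy(q_now, q_prev, psi, omega_t, k)]
    for _ in range(n_steps):
        q_next, psi = sav_step(q_now, q_prev, psi, omega_t, k, V, grad_V)
        H.append(energy(q_next, q_now, psi, omega_t, k))
        q_prev, q_now = q_now, q_next
    return np.array(H)

H = simulate()
print(np.max(np.abs(H - H[0])) / H[0])  # relative energy drift (round-off level)
```

Because the implicit part of the update is a rank-one correction, it is solved in closed form with the Sherman-Morrison formula, so each step stays explicit. The discrete energy returned by `energy` is conserved to round-off whatever the network weights are, and it is non-negative because $k\tilde\omega = 2\sin(\omega k/2) \le 2$, which is what bounds the solution.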

Sound Examples

Below are sound examples from the datasets used in the paper, together with the string and excitation parameters of each example. For every example, the linear, target and predicted outputs are provided. All sound examples can be downloaded from the accompanying repository.

Test Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|
| 232.5 | 1.05 | 174.4 | 0.12 | 0.90 | 4.9e+04 | 7.2e-04 | Largest relative MSE for audio output (illustrated in the manuscript) |
| 209.0 | 1.08 | 129.1 | 0.37 | 0.23 | 4.9e+04 | 1.2e-03 | Smallest relative MSE for audio output |
| 196.4 | 1.05 | 171.9 | 0.85 | 0.89 | 4.8e+04 | 1.3e-03 | Strongest nonlinear effects (illustrated in the manuscript) |
| 243.1 | 1.05 | 146.1 | 0.43 | 0.82 | 4.4e+04 | 1.4e-03 | Random example #1 |
| 180.9 | 1.08 | 157.1 | 0.13 | 0.32 | 4.9e+04 | 1.4e-03 | Random example #2 |
| 184.0 | 1.08 | 148.7 | 0.63 | 0.74 | 4.3e+04 | 6.6e-04 | Random example #3 |
| 246.9 | 1.08 | 167.4 | 0.87 | 0.64 | 4.2e+04 | 9.9e-04 | Random example #4 |
| 202.0 | 1.07 | 171.1 | 0.81 | 0.79 | 3.8e+04 | 1.1e-03 | Random example #5 |
| 196.8 | 1.08 | 140.5 | 0.29 | 0.50 | 4.2e+04 | 1.5e-03 | Random example #6 |
| 202.8 | 1.06 | 126.1 | 0.57 | 0.61 | 4.1e+04 | 5.3e-04 | Random example #7 |
| 191.7 | 1.06 | 139.1 | 0.65 | 0.87 | 3.8e+04 | 1.3e-03 | Random example #8 |
| 229.8 | 1.08 | 138.6 | 0.81 | 0.55 | 4.8e+04 | 1.1e-03 | Random example #9 |
| 190.5 | 1.08 | 125.2 | 0.87 | 0.58 | 4.8e+04 | 1.4e-03 | Random example #10 |
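The table notes rank examples by the relative MSE of the audio output. A common definition normalises the squared error by the energy of the target, $\sum_n (y_n - \hat y_n)^2 / \sum_n y_n^2$. The sketch below assumes this definition, which may differ in detail from the one used in the manuscript; the function names `relative_mse` and `rank_examples` are hypothetical.

```python
import numpy as np

def relative_mse(target, predicted):
    """Squared error normalised by the target energy (assumed definition)."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((target - predicted) ** 2) / np.sum(target ** 2)

def rank_examples(targets, predictions):
    """Return the indices of the largest and smallest relative MSE, plus all errors."""
    errors = np.array([relative_mse(t, p) for t, p in zip(targets, predictions)])
    return int(np.argmax(errors)), int(np.argmin(errors)), errors
```

Normalising by the target energy makes the error comparable across examples with different excitation amplitudes, such as the spread of $f_{\mathrm{amp}}$ values listed above.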

Validation Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|
| 203.2 | 1.07 | 171.0 | 0.83 | 0.38 | 4.7e+04 | 5.0e-04 | Largest relative MSE for audio output |
| 210.2 | 1.06 | 139.1 | 0.78 | 0.49 | 4.3e+04 | 1.4e-03 | Smallest relative MSE for audio output |
| 202.0 | 1.07 | 154.0 | 0.23 | 0.60 | 4.9e+04 | 1.4e-03 | Random example #1 |
| 231.3 | 1.06 | 160.8 | 0.39 | 0.21 | 3.5e+04 | 9.6e-04 | Random example #2 |
| 177.1 | 1.08 | 170.5 | 0.30 | 0.19 | 4.7e+04 | 1.2e-03 | Random example #3 |
| 229.6 | 1.06 | 140.5 | 0.79 | 0.27 | 4.3e+04 | 8.0e-04 | Random example #4 |
| 190.7 | 1.06 | 130.9 | 0.22 | 0.28 | 3.5e+04 | 1.4e-03 | Random example #5 |

Training Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|
| 169.5 | 1.03 | 169.9 | 0.72 | 0.31 | 2.5e+04 | 5.3e-04 | Largest relative MSE for audio output |
| 148.6 | 1.02 | 129.4 | 0.47 | 0.56 | 3.0e+04 | 1.3e-03 | Smallest relative MSE for audio output |
| 150.3 | 1.01 | 172.5 | 0.37 | 0.16 | 3.4e+04 | 1.5e-03 | Random example #1 |
| 144.9 | 1.04 | 164.9 | 0.28 | 0.71 | 2.6e+04 | 1.0e-03 | Random example #2 |
| 167.3 | 1.05 | 138.4 | 0.14 | 0.21 | 2.8e+04 | 8.1e-04 | Random example #3 |
| 141.4 | 1.02 | 163.2 | 0.67 | 0.11 | 2.9e+04 | 7.5e-04 | Random example #4 |
| 125.4 | 1.01 | 173.1 | 0.62 | 0.72 | 2.9e+04 | 7.2e-04 | Random example #5 |