Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics

Victor Zheleznov1, Stefan Bilbao1, Alec Wright1 and Simon King2

1Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
2Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK

Accompanying web page for the JAES submission

Code

Abstract

Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, including the case of high-amplitude string vibration. A modal decomposition leads to a densely coupled nonlinear system of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled the construction of explicit and stable numerical solvers for such classes of nonlinear systems. On the other hand, machine learning approaches (in particular neural ordinary differential equations) have been successful in modelling nonlinear systems automatically from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for the linear vibration of a system's modes, so that the physical parameters of a system remain easily accessible after training, without the need for a parameter encoder in the model architecture. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
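To make the structure described in the abstract concrete, the following is a minimal sketch of a modal ordinary differential equation system: each mode is a damped linear oscillator (whose linear part admits an analytical solution), plus a nonlinear term coupling the modes. The frequencies, damping values, and the cubic coupling here are illustrative placeholders, not the trained network or the stable SAV scheme from the submission; the forward-Euler step is for illustration only.

```python
import numpy as np

# Illustrative modal system with M modes; state is (q, p).
# omega and sigma are placeholder modal frequencies/damping values,
# not parameters from the submission.
M = 3
omega = 2 * np.pi * np.array([100.0, 200.0, 300.0])  # modal frequencies (rad/s)
sigma = np.array([1.0, 2.0, 3.0])                    # modal damping (1/s)

def rhs(q, p):
    """Time derivative of the modal state: independent damped linear
    oscillators plus a dense nonlinear coupling term (a stand-in for
    the learned network in a neural ODE setting)."""
    nonlinear = -q * np.sum(q**2)  # placeholder cubic coupling
    dq = p
    dp = -omega**2 * q - 2 * sigma * p + nonlinear
    return dq, dp

# One forward-Euler step purely for illustration; the submission uses
# a stable explicit SAV-based scheme instead.
dt = 1.0 / 48000.0
q = np.array([1e-3, 0.0, 0.0])
p = np.zeros(M)
dq, dp = rhs(q, p)
q, p = q + dt * dq, p + dt * dp
```

In the differentiable setting, the placeholder cubic term would be replaced by a neural network acting on the modal state, while the known linear part keeps the physical parameters directly interpretable.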

Sound Examples

Below are sound examples, along with the string and excitation parameters, for the datasets used in the submission. For each example, linear, target and predicted audio outputs are available; all sound examples can be downloaded from the accompanying repository.
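As a rough illustration of how the excitation parameters in the tables below could define a pluck-like forcing, here is a sketch of a raised-cosine excitation of amplitude $f_{\mathrm{amp}}$ and duration $T_{\mathrm{e}}$, applied at position $x_{\mathrm{e}}$ with output read at $x_{\mathrm{o}}$. The raised-cosine shape and the sample rate are assumptions for illustration; the exact excitation used in the submission may differ.

```python
import numpy as np

# Assumed parameter values taken from the first test-set row below.
f_amp = 4.9e4   # excitation amplitude
T_e = 7.2e-04   # excitation duration (s)
sr = 48000      # sample rate (assumed)

# Raised-cosine pulse: zero at the endpoints, peaking at f_amp mid-pulse.
t = np.arange(int(T_e * sr)) / sr
force = 0.5 * f_amp * (1.0 - np.cos(2.0 * np.pi * t / T_e))
```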

Test Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 232.5 | 1.05 | 174.4 | 0.12 | 0.90 | 4.9e+04 | 7.2e-04 | Largest relative MSE for audio output (illustrated in the manuscript) |
| 209.0 | 1.08 | 129.1 | 0.37 | 0.23 | 4.9e+04 | 1.2e-03 | Smallest relative MSE for audio output |
| 196.4 | 1.05 | 171.9 | 0.85 | 0.89 | 4.8e+04 | 1.3e-03 | Strongest nonlinear effects (illustrated in the manuscript) |
| 243.1 | 1.05 | 146.1 | 0.43 | 0.82 | 4.4e+04 | 1.4e-03 | Random example #1 |
| 180.9 | 1.08 | 157.1 | 0.13 | 0.32 | 4.9e+04 | 1.4e-03 | Random example #2 |
| 184.0 | 1.08 | 148.7 | 0.63 | 0.74 | 4.3e+04 | 6.6e-04 | Random example #3 |
| 246.9 | 1.08 | 167.4 | 0.87 | 0.64 | 4.2e+04 | 9.9e-04 | Random example #4 |
| 202.0 | 1.07 | 171.1 | 0.81 | 0.79 | 3.8e+04 | 1.1e-03 | Random example #5 |
| 196.8 | 1.08 | 140.5 | 0.29 | 0.50 | 4.2e+04 | 1.5e-03 | Random example #6 |
| 202.8 | 1.06 | 126.1 | 0.57 | 0.61 | 4.1e+04 | 5.3e-04 | Random example #7 |
| 191.7 | 1.06 | 139.1 | 0.65 | 0.87 | 3.8e+04 | 1.3e-03 | Random example #8 |
| 229.8 | 1.08 | 138.6 | 0.81 | 0.55 | 4.8e+04 | 1.1e-03 | Random example #9 |
| 190.5 | 1.08 | 125.2 | 0.87 | 0.58 | 4.8e+04 | 1.4e-03 | Random example #10 |

Validation Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 203.2 | 1.07 | 171.0 | 0.83 | 0.38 | 4.7e+04 | 5.0e-04 | Largest relative MSE for audio output |
| 210.2 | 1.06 | 139.1 | 0.78 | 0.49 | 4.3e+04 | 1.4e-03 | Smallest relative MSE for audio output |
| 202.0 | 1.07 | 154.0 | 0.23 | 0.60 | 4.9e+04 | 1.4e-03 | Random example #1 |
| 231.3 | 1.06 | 160.8 | 0.39 | 0.21 | 3.5e+04 | 9.6e-04 | Random example #2 |
| 177.1 | 1.08 | 170.5 | 0.30 | 0.19 | 4.7e+04 | 1.2e-03 | Random example #3 |
| 229.6 | 1.06 | 140.5 | 0.79 | 0.27 | 4.3e+04 | 8.0e-04 | Random example #4 |
| 190.7 | 1.06 | 130.9 | 0.22 | 0.28 | 3.5e+04 | 1.4e-03 | Random example #5 |

Training Dataset

| $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 169.5 | 1.03 | 169.9 | 0.72 | 0.31 | 2.5e+04 | 5.3e-04 | Largest relative MSE for audio output |
| 148.6 | 1.02 | 129.4 | 0.47 | 0.56 | 3.0e+04 | 1.3e-03 | Smallest relative MSE for audio output |
| 150.3 | 1.01 | 172.5 | 0.37 | 0.16 | 3.4e+04 | 1.5e-03 | Random example #1 |
| 144.9 | 1.04 | 164.9 | 0.28 | 0.71 | 2.6e+04 | 1.0e-03 | Random example #2 |
| 167.3 | 1.05 | 138.4 | 0.14 | 0.21 | 2.8e+04 | 8.1e-04 | Random example #3 |
| 141.4 | 1.02 | 163.2 | 0.67 | 0.11 | 2.9e+04 | 7.5e-04 | Random example #4 |
| 125.4 | 1.01 | 173.1 | 0.62 | 0.72 | 2.9e+04 | 7.2e-04 | Random example #5 |