Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics
Victor Zheleznov¹, Stefan Bilbao¹, Alec Wright¹ and Simon King²
¹ Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
² Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
Accompanying web page for the JAES submission
Abstract
Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, including the case of high-amplitude vibration of a string. A modal decomposition leads to a densely coupled nonlinear system of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled the construction of explicit and stable numerical solvers for such classes of nonlinear systems. On the other hand, machine learning approaches (in particular neural ordinary differential equations) have been successful in modelling nonlinear systems automatically from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for the linear vibration of the system's modes, so that the physical parameters of the system remain easily accessible after training, without the need for a parameter encoder in the model architecture. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
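To make the phrase "analytical solution for linear vibration" concrete, here is a minimal sketch of the closed-form free response of a single damped linear mode, $\ddot{q} + 2\sigma\dot{q} + \omega^2 q = 0$. The symbols `omega` and `sigma`, the function name, and the underdamped assumption are illustrative choices and are not taken from the manuscript, whose parametrisation may differ.

```python
import numpy as np

def linear_mode_response(q0, v0, omega, sigma, t):
    """Closed-form free response of one underdamped linear mode.

    Solves q'' + 2*sigma*q' + omega**2 * q = 0 with q(0) = q0, q'(0) = v0.
    Assumes sigma < omega (hypothetical parametrisation, for illustration only).
    """
    t = np.asarray(t, dtype=float)
    omega_d = np.sqrt(omega**2 - sigma**2)  # damped angular frequency
    envelope = np.exp(-sigma * t)           # exponential amplitude decay
    return envelope * (q0 * np.cos(omega_d * t)
                       + (v0 + sigma * q0) / omega_d * np.sin(omega_d * t))

# Example: a 100 Hz mode with light damping, sampled at 44.1 kHz for 10 ms.
t = np.arange(441) / 44100.0
q = linear_mode_response(q0=1.0, v0=0.0, omega=2 * np.pi * 100.0, sigma=5.0, t=t)
```

Because the linear part is available in closed form like this, frequencies and decay rates appear as explicit quantities rather than being buried in network weights, which is one way to read the abstract's remark that physical parameters stay accessible.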
Sound Examples
Below are some sound examples along with string and excitation parameters for the datasets used in the submission. All sound examples can be downloaded from the accompanying repository.
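The notes in the tables refer to a relative MSE for the audio output. The manuscript gives the exact definition; as a rough sketch, assuming the error energy is normalised by the energy of the target signal, it could be computed as follows. The function name `relative_mse` and this normalisation are assumptions made for illustration.

```python
import numpy as np

def relative_mse(target, predicted):
    """Squared error energy divided by target energy (assumed definition)."""
    target = np.asarray(target, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sum((target - predicted) ** 2) / np.sum(target ** 2)

# Toy example: a prediction that is uniformly 10% too quiet.
target = np.array([1.0, -1.0, 1.0, -1.0])
predicted = 0.9 * target
error = relative_mse(target, predicted)
```

Normalising by the target energy makes the metric comparable across examples with different excitation amplitudes.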
Test Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| (audio) | (audio) | (audio) | 232.5 | 1.05 | 174.4 | 0.12 | 0.90 | 4.9e+04 | 7.2e-04 | Largest relative MSE for audio output (illustrated in the manuscript) |
| (audio) | (audio) | (audio) | 209.0 | 1.08 | 129.1 | 0.37 | 0.23 | 4.9e+04 | 1.2e-03 | Smallest relative MSE for audio output |
| (audio) | (audio) | (audio) | 196.4 | 1.05 | 171.9 | 0.85 | 0.89 | 4.8e+04 | 1.3e-03 | Strongest nonlinear effects (illustrated in the manuscript) |
| (audio) | (audio) | (audio) | 243.1 | 1.05 | 146.1 | 0.43 | 0.82 | 4.4e+04 | 1.4e-03 | Random example #1 |
| (audio) | (audio) | (audio) | 180.9 | 1.08 | 157.1 | 0.13 | 0.32 | 4.9e+04 | 1.4e-03 | Random example #2 |
| (audio) | (audio) | (audio) | 184.0 | 1.08 | 148.7 | 0.63 | 0.74 | 4.3e+04 | 6.6e-04 | Random example #3 |
| (audio) | (audio) | (audio) | 246.9 | 1.08 | 167.4 | 0.87 | 0.64 | 4.2e+04 | 9.9e-04 | Random example #4 |
| (audio) | (audio) | (audio) | 202.0 | 1.07 | 171.1 | 0.81 | 0.79 | 3.8e+04 | 1.1e-03 | Random example #5 |
| (audio) | (audio) | (audio) | 196.8 | 1.08 | 140.5 | 0.29 | 0.50 | 4.2e+04 | 1.5e-03 | Random example #6 |
| (audio) | (audio) | (audio) | 202.8 | 1.06 | 126.1 | 0.57 | 0.61 | 4.1e+04 | 5.3e-04 | Random example #7 |
| (audio) | (audio) | (audio) | 191.7 | 1.06 | 139.1 | 0.65 | 0.87 | 3.8e+04 | 1.3e-03 | Random example #8 |
| (audio) | (audio) | (audio) | 229.8 | 1.08 | 138.6 | 0.81 | 0.55 | 4.8e+04 | 1.1e-03 | Random example #9 |
| (audio) | (audio) | (audio) | 190.5 | 1.08 | 125.2 | 0.87 | 0.58 | 4.8e+04 | 1.4e-03 | Random example #10 |
Validation Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| (audio) | (audio) | (audio) | 203.2 | 1.07 | 171.0 | 0.83 | 0.38 | 4.7e+04 | 5.0e-04 | Largest relative MSE for audio output |
| (audio) | (audio) | (audio) | 210.2 | 1.06 | 139.1 | 0.78 | 0.49 | 4.3e+04 | 1.4e-03 | Smallest relative MSE for audio output |
| (audio) | (audio) | (audio) | 202.0 | 1.07 | 154.0 | 0.23 | 0.60 | 4.9e+04 | 1.4e-03 | Random example #1 |
| (audio) | (audio) | (audio) | 231.3 | 1.06 | 160.8 | 0.39 | 0.21 | 3.5e+04 | 9.6e-04 | Random example #2 |
| (audio) | (audio) | (audio) | 177.1 | 1.08 | 170.5 | 0.30 | 0.19 | 4.7e+04 | 1.2e-03 | Random example #3 |
| (audio) | (audio) | (audio) | 229.6 | 1.06 | 140.5 | 0.79 | 0.27 | 4.3e+04 | 8.0e-04 | Random example #4 |
| (audio) | (audio) | (audio) | 190.7 | 1.06 | 130.9 | 0.22 | 0.28 | 3.5e+04 | 1.4e-03 | Random example #5 |
Training Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| (audio) | (audio) | (audio) | 169.5 | 1.03 | 169.9 | 0.72 | 0.31 | 2.5e+04 | 5.3e-04 | Largest relative MSE for audio output |
| (audio) | (audio) | (audio) | 148.6 | 1.02 | 129.4 | 0.47 | 0.56 | 3.0e+04 | 1.3e-03 | Smallest relative MSE for audio output |
| (audio) | (audio) | (audio) | 150.3 | 1.01 | 172.5 | 0.37 | 0.16 | 3.4e+04 | 1.5e-03 | Random example #1 |
| (audio) | (audio) | (audio) | 144.9 | 1.04 | 164.9 | 0.28 | 0.71 | 2.6e+04 | 1.0e-03 | Random example #2 |
| (audio) | (audio) | (audio) | 167.3 | 1.05 | 138.4 | 0.14 | 0.21 | 2.8e+04 | 8.1e-04 | Random example #3 |
| (audio) | (audio) | (audio) | 141.4 | 1.02 | 163.2 | 0.67 | 0.11 | 2.9e+04 | 7.5e-04 | Random example #4 |
| (audio) | (audio) | (audio) | 125.4 | 1.01 | 173.1 | 0.62 | 0.72 | 2.9e+04 | 7.2e-04 | Random example #5 |