Stable Differentiable Modal Synthesis for Learning Nonlinear Dynamics
Victor Zheleznov¹, Stefan Bilbao², Alec Wright¹ and Simon King³
¹ Acoustics and Audio Group, University of Edinburgh, Edinburgh, UK
² STMS (UMR9912), IRCAM, CNRS, Sorbonne Université, Paris, France
³ Centre for Speech Technology Research, University of Edinburgh, Edinburgh, UK
Accompanying web page for the JAES paper
Abstract
Modal methods are a long-standing approach to physical modelling synthesis. Extensions to nonlinear problems are possible, leading to coupled nonlinear systems of ordinary differential equations. Recent work in scalar auxiliary variable techniques has enabled construction of explicit and stable numerical solvers for such systems. On the other hand, neural ordinary differential equations have been successful in modelling nonlinear systems from data. In this work, we examine how scalar auxiliary variable techniques can be combined with neural ordinary differential equations to yield a stable differentiable model capable of learning nonlinear dynamics. The proposed approach leverages the analytical solution for linear vibration of the system’s modes, so that physical parameters of a system remain easily accessible after training without the need for a parameter encoder in the model architecture. Compared to our previous work, which used multilayer perceptrons to parametrise nonlinear dynamics, we employ gradient networks that allow an interpretation in terms of a closed-form and non-negative potential required by scalar auxiliary variable techniques. As a proof of concept, we generate synthetic data for the nonlinear transverse vibration of a string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.
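To make the phrase "analytical solution for linear vibration of the system's modes" concrete, the following is a minimal sketch of an exact time step for a single damped linear mode. It is not taken from the paper: the mode equation $\ddot{q} + 2\sigma\dot{q} + \omega^2 q = 0$, the function name `linear_mode_step`, and the symbols `omega`, `sigma` and `k` are assumptions chosen for illustration, and the underdamped case $\omega > \sigma \geq 0$ is assumed.

```python
import math


def linear_mode_step(q, p, omega, sigma, k):
    """Advance one damped linear mode exactly by a time step k.

    Assumed (hypothetical) mode equation: q'' + 2*sigma*q' + omega**2 * q = 0,
    with p = q' and omega > sigma >= 0 (underdamped case only).
    """
    omega_d = math.sqrt(omega**2 - sigma**2)  # damped angular frequency
    decay = math.exp(-sigma * k)              # amplitude decay over one step
    c = math.cos(omega_d * k)
    s = math.sin(omega_d * k)
    # Closed-form solution of the linear ODE evaluated at time k
    q_next = decay * (q * c + (p + sigma * q) / omega_d * s)
    p_next = decay * (p * c - (sigma * p + omega**2 * q) / omega_d * s)
    return q_next, p_next


# Usage: 100 steps at an illustrative 44.1 kHz sample rate for a 110 Hz mode
q, p = 1.0e-3, 0.0
for _ in range(100):
    q, p = linear_mode_step(q, p, omega=2.0 * math.pi * 110.0, sigma=1.0, k=1.0 / 44100.0)
```

Because the update is the closed-form solution rather than a finite-difference approximation, the linear part introduces no discretisation error in frequency or decay, and `omega` and `sigma` appear directly as parameters of the step.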
Sound Examples
Below are some sound examples along with string and excitation parameters for the datasets used in the paper. All sound examples can be downloaded from the accompanying repository.
Test Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | 232.5 | 1.05 | 174.4 | 0.12 | 0.90 | 4.9e+04 | 7.2e-04 | Largest relative MSE for audio output (illustrated in the manuscript) |
| | | | 209.0 | 1.08 | 129.1 | 0.37 | 0.23 | 4.9e+04 | 1.2e-03 | Smallest relative MSE for audio output |
| | | | 196.4 | 1.05 | 171.9 | 0.85 | 0.89 | 4.8e+04 | 1.3e-03 | Strongest nonlinear effects (illustrated in the manuscript) |
| | | | 243.1 | 1.05 | 146.1 | 0.43 | 0.82 | 4.4e+04 | 1.4e-03 | Random example #1 |
| | | | 180.9 | 1.08 | 157.1 | 0.13 | 0.32 | 4.9e+04 | 1.4e-03 | Random example #2 |
| | | | 184.0 | 1.08 | 148.7 | 0.63 | 0.74 | 4.3e+04 | 6.6e-04 | Random example #3 |
| | | | 246.9 | 1.08 | 167.4 | 0.87 | 0.64 | 4.2e+04 | 9.9e-04 | Random example #4 |
| | | | 202.0 | 1.07 | 171.1 | 0.81 | 0.79 | 3.8e+04 | 1.1e-03 | Random example #5 |
| | | | 196.8 | 1.08 | 140.5 | 0.29 | 0.50 | 4.2e+04 | 1.5e-03 | Random example #6 |
| | | | 202.8 | 1.06 | 126.1 | 0.57 | 0.61 | 4.1e+04 | 5.3e-04 | Random example #7 |
| | | | 191.7 | 1.06 | 139.1 | 0.65 | 0.87 | 3.8e+04 | 1.3e-03 | Random example #8 |
| | | | 229.8 | 1.08 | 138.6 | 0.81 | 0.55 | 4.8e+04 | 1.1e-03 | Random example #9 |
| | | | 190.5 | 1.08 | 125.2 | 0.87 | 0.58 | 4.8e+04 | 1.4e-03 | Random example #10 |
Validation Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | 203.2 | 1.07 | 171.0 | 0.83 | 0.38 | 4.7e+04 | 5.0e-04 | Largest relative MSE for audio output |
| | | | 210.2 | 1.06 | 139.1 | 0.78 | 0.49 | 4.3e+04 | 1.4e-03 | Smallest relative MSE for audio output |
| | | | 202.0 | 1.07 | 154.0 | 0.23 | 0.60 | 4.9e+04 | 1.4e-03 | Random example #1 |
| | | | 231.3 | 1.06 | 160.8 | 0.39 | 0.21 | 3.5e+04 | 9.6e-04 | Random example #2 |
| | | | 177.1 | 1.08 | 170.5 | 0.30 | 0.19 | 4.7e+04 | 1.2e-03 | Random example #3 |
| | | | 229.6 | 1.06 | 140.5 | 0.79 | 0.27 | 4.3e+04 | 8.0e-04 | Random example #4 |
| | | | 190.7 | 1.06 | 130.9 | 0.22 | 0.28 | 3.5e+04 | 1.4e-03 | Random example #5 |
Training Dataset
| Linear | Target | Predicted | $\gamma$ | $\kappa$ | $\nu$ | $x_{\mathrm{e}}$ | $x_{\mathrm{o}}$ | $f_{\mathrm{amp}}$ | $T_{\mathrm{e}}$ | Note |
|---|---|---|---|---|---|---|---|---|---|---|
| | | | 169.5 | 1.03 | 169.9 | 0.72 | 0.31 | 2.5e+04 | 5.3e-04 | Largest relative MSE for audio output |
| | | | 148.6 | 1.02 | 129.4 | 0.47 | 0.56 | 3.0e+04 | 1.3e-03 | Smallest relative MSE for audio output |
| | | | 150.3 | 1.01 | 172.5 | 0.37 | 0.16 | 3.4e+04 | 1.5e-03 | Random example #1 |
| | | | 144.9 | 1.04 | 164.9 | 0.28 | 0.71 | 2.6e+04 | 1.0e-03 | Random example #2 |
| | | | 167.3 | 1.05 | 138.4 | 0.14 | 0.21 | 2.8e+04 | 8.1e-04 | Random example #3 |
| | | | 141.4 | 1.02 | 163.2 | 0.67 | 0.11 | 2.9e+04 | 7.5e-04 | Random example #4 |
| | | | 125.4 | 1.01 | 173.1 | 0.62 | 0.72 | 2.9e+04 | 7.2e-04 | Random example #5 |
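
The notes in the tables above rank examples by "relative MSE for audio output". This page does not define that metric, so the sketch below assumes one common definition: the summed squared error between target and predicted signals, divided by the summed squared target. The function name `relative_mse` and its arguments are hypothetical, and the paper's exact normalisation may differ.

```python
def relative_mse(target, predicted):
    """Relative mean squared error between two equal-length signals.

    Assumed definition (not confirmed by the source):
    sum((target - predicted)**2) / sum(target**2).
    """
    if len(target) != len(predicted):
        raise ValueError("signals must have the same length")
    err = sum((t - y) ** 2 for t, y in zip(target, predicted))
    ref = sum(t ** 2 for t in target)
    if ref == 0.0:
        raise ValueError("target signal has zero energy")
    return err / ref
```

Under this definition a perfect prediction scores 0, and an all-zero prediction scores 1 regardless of how loud the target is, which makes examples with different excitation amplitudes comparable.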