Detailed Program

Tuesday, September 2

Time Session Speakers Venue
09:00 Registration (all day) Hall Mole
09:30
Tutorial 1: Non-Iterative Simulation: A Numerical Analysis Viewpoint
Alessia Andò (University of Udine) Auditorium Tamburi
11:00 Coffee Break Foyer Tamburi
11:30
Tutorial 2: Logarithmic Frequency Resolution Filter Design for Audio
Balázs Bank (Budapest University of Technology and Economics) Auditorium Tamburi
13:00 Lunch Foyer Tamburi
14:30
Tutorial 3: Building Flexible Audio DDSP Pipelines: A Case Study on Artificial Reverb
Gloria Dal Santo (Aalto University) Auditorium Tamburi
16:00 Coffee Break Foyer Tamburi
16:30
Tutorial 4: Plausible Editing of our Acoustic Environment
Annika Neidhardt (University of Surrey) Auditorium Tamburi
18:00 End of Sessions
18:00 DAFx Welcome Aperitivo Foyer Tamburi
20:30 End of Day
Wednesday, September 3

Time Session Speakers Venue
08:30 Registration (all day) Hall Mole
09:00 Welcome Remarks Leonardo Gabrielli and Stefania Cecchi Auditorium Tamburi
09:30 Oral Session 1: Virtual Analog

Session Chair: Alberto Bernardini

Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming

Simplifying Antiderivative Antialiasing with Lookup Table Integration

Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning

MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals

Impedance Synthesis for Hybrid Analog-Digital Audio Effects

Antiderivative Antialiasing for Recurrent Neural Networks

Various Auditorium Tamburi
11:00 Coffee Break Foyer Tamburi
11:30 Poster Session 1: Virtual Analog

Towards Neural Emulation of Voltage-Controlled Oscillators

Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling

Real-Time Virtual Analog Modelling of Diode-Based VCAs

Antialiasing in BBD Chips Using BLEP

Aliasing Reduction in Neural Amp Modeling by Smoothing Activations

Antialiased Black-Box Modeling of Audio Distortion Circuits Using Real Linear Recurrent Units

Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation

Various Sala Boxe
12:30
Keynote 1: Bridging Symbolic and Audio Data: Score-Informed Music Performance Data Estimation
Johanna Devaney (Brooklyn College and the Graduate Center, CUNY) Auditorium Tamburi
13:30 Lunch Foyer Tamburi
14:30 Oral Session 2: Physical Modeling

Session Chair: Stefan Bilbao

Distributed Single-Reed Modeling Based on Energy Quadratization and Approximate Modal Expansion

A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments

Non-Iterative Numerical Simulation in Virtual Analog: A Framework Incorporating Current Trends

Power-Balanced Drift Regulation for Scalar Auxiliary Variable Methods: Application to Real-Time Simulation of Nonlinear String Vibrations

Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates

Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations

Various Auditorium Tamburi
16:00 Coffee Break Foyer Tamburi
16:30 Poster Session 2: Physical Modeling

Physics-Informed Deep Learning for Nonlinear Friction Model of Bow-String Interaction

Comparing Acoustic and Digital Piano Actions: Data Analysis and Key Insights

Wave Pulse Phase Modulation: Hybridising Phase Modulation and Phase Distortion

Digital Morphophone Environment. Computer Rendering of a Pioneering Sound Processing Device

Various Sala Boxe
16:30 Demo Session 2: Virtual Analog and Physical Modeling

MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals

Power-Balanced Drift Regulation for Scalar Auxiliary Variable Methods: Application to Real-Time Simulation of Nonlinear String Vibrations

Various Sala Boxe
17:30 End of Sessions
20:00 DAFx Banquet MaWay
22:30 End of Day
Thursday, September 4

Time Session Speakers Venue
08:30 Registration (all day) Hall Mole
09:00 Oral Session 3: Spatial Sound, Room Acoustics and Perception

Session Chair: Karolina Prawda

Modeling the Impulse Response of Higher-Order Microphone Arrays Using Differentiable Feedback Delay Networks

A Modified Algorithm for a Loudspeaker Line Array Multi-Lobe Control

Estimation of Multi-Slope Amplitudes in Late Reverberation

Differentiable Scattering Delay Networks for Artificial Reverberation

Differentiable Attenuation Filters for Feedback Delay Networks

Perceptual Decorrelator Based on Resonators

Various Auditorium Tamburi
10:30 Coffee Break Foyer Tamburi
10:30 Sponsor Demo 1: DISCOVER THE POWER OF MODELING – PART 1: Synthesis Techniques for Acoustic Instrument Emulation
Audio Modeling Sala Boxe
11:00 Poster Session 3: Spatial Sound, Room Acoustics and Perception

Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation

Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility

Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations

Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method

DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation

Auditory Discrimination of Early Reflections in Virtual Rooms

Various Sala Boxe
11:00 Demo Session 3: Spatial Sound, Room Acoustics and Perception

Partiels – Exploring, Analyzing and Understanding Sounds

Listener-Adaptive 3D Audio with Crosstalk Cancellation

Various Sala Boxe
12:00
Keynote 2: Reverberation – Dereverberation: The promise of hybrid models
Gaël Richard (Télécom Paris, Institut Polytechnique de Paris) Auditorium Tamburi
13:00 Lunch Foyer Tamburi
14:30 Oral Session 4: Signal Processing

Session Chair: Gianpaolo Evangelista

Biquad Coefficients Optimization via Kolmogorov-Arnold Networks

Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values

A Parametric Equalizer with Interactive Poles and Zeros Control for Digital Signal Processing Education

Zero-Phase Sound via Giant FFT

Partiels – Exploring, Analyzing and Understanding Sounds

Stable Limit Cycles as Tunable Signal Sources

Various Auditorium Tamburi
16:00 Coffee Break Foyer Tamburi
16:30 Poster Session 4: Signal Processing

Lookup Table Based Audio Spectral Transformation

A Non-Uniform Subband Implementation of an Active Noise Control System for Snoring Reduction

Compositional Application of a Chaotic Dynamical System for the Synthesis of Sounds

Various Sala Boxe
16:30 Demo Session 4: Signal Processing

Delay Optimization Towards Smooth Sparse Noise

graetli: A Microcontroller-Based DSP Platform for Real-Time Audio Signal Processing

A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis

Various Sala Boxe
17:30 End of Sessions
18:30 Aperitivo Foyer Tamburi
20:00 DAFx Concert: “Macchine Nostre” – A/V Performance for Italian Synthesizers Auditorium Tamburi
22:30 End of Day
Friday, September 5

Time Session Speakers Venue
08:30 Registration (all day) Hall Mole
09:00 Oral Session 5: Deep Learning Methods, Effects and Data Analysis

Session Chair: Orchisama Das

DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions

Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss

Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search

Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects

Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches

Empirical Results for Adjusting Truncated Backpropagation Through Time While Training Neural Audio Effects

Various Auditorium Tamburi
10:30 Coffee Break Foyer Tamburi
10:30 Sponsor Demo 2: DISCOVER THE POWER OF MODELING – PART 2: Room Modeling Combining Physical and Psychoacoustic Approaches
Audio Modeling Sala Boxe
11:00 Poster Session 5: Deep Learning Methods, Effects and Data Analysis

Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer

TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration

Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains

Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning

Unsupervised Text-to-Sound Mapping via Embedding Space Alignment

Generative Latent Spaces for Neural Synthesis of Audio Textures

Various Sala Boxe
11:00 Demo Session 5: Deep Learning Methods, Effects and Data Analysis

RT-PAD-VC – Creative Applications of Neural Voice Conversion as an Audio Effect

DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions

Various Sala Boxe
12:00
Keynote 3: Effecting Audio: An Entangled Approach to Signals, Concepts and Artistic Contexts
Andrew McPherson (Imperial College London) Auditorium Tamburi
13:00 Lunch Foyer Tamburi
14:30 Oral Session 6: Deep Learning for Synthesis

Session Chair: Stefano Fasciani

SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space

Neural Sample-Based Piano Synthesis

Piano-SSM: Diagonal State Space Models for Efficient Midi-to-Raw Audio Synthesis

A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis

Various Auditorium Tamburi
15:45 Awards and Closing Session Leonardo Gabrielli and Stefania Cecchi Auditorium Tamburi
16:15 Handover Address Kurt Werner and Mark Rau Auditorium Tamburi
16:45 End of Sessions
16:45 Board Meeting Various TBA
17:45 End of Day
Saturday, September 6

Time Session Speakers Venue
09:30 Guided tour of Ancona: visit to the ancient historical town. Meeting point: Piazza della Repubblica, at the stairs under the RAI sign
Overview

Time | Tue, Sept 2 | Wed, Sept 3 | Thu, Sept 4 | Fri, Sept 5 | Sat, Sept 6
08:30 | | Registration (all day) | Registration (all day) | Registration (all day) |
09:00 | Registration (all day) | Welcome Remarks (Auditorium Tamburi) | Oral: Spatial Sound, Room Acoustics and Perception (Auditorium Tamburi) | Oral: Deep Learning Methods, Effects and Data Analysis (Auditorium Tamburi) |
09:30 | Tutorial 1, Alessia Andò (Auditorium Tamburi) | Oral: Virtual Analog (Auditorium Tamburi) | | | Guided tour of Ancona (meeting point: Piazza della Repubblica, at the stairs under the RAI sign)
10:30 | | | Coffee Break (Foyer Tamburi); Sponsor Demo 1 (Sala Boxe) | Coffee Break (Foyer Tamburi); Sponsor Demo 2 (Sala Boxe) |
11:00 | Coffee Break (Foyer Tamburi) | Coffee Break (Foyer Tamburi) | Posters and Demos: Spatial Sound, Room Acoustics and Perception (Sala Boxe) | Posters and Demos: Deep Learning Methods, Effects and Data Analysis (Sala Boxe) |
11:30 | Tutorial 2, Balázs Bank (Auditorium Tamburi) | Posters: Virtual Analog (Sala Boxe) | | |
12:00 | | | Keynote 2, Gaël Richard (Auditorium Tamburi) | Keynote 3, Andrew McPherson (Auditorium Tamburi) |
12:30 | | Keynote 1, Johanna Devaney (Auditorium Tamburi) | | |
13:00 | Lunch (Foyer Tamburi) | | Lunch (Foyer Tamburi) | Lunch (Foyer Tamburi) |
13:30 | | Lunch (Foyer Tamburi) | | |
14:30 | Tutorial 3, Gloria Dal Santo (Auditorium Tamburi) | Oral: Physical Modeling (Auditorium Tamburi) | Oral: Signal Processing (Auditorium Tamburi) | Oral: Deep Learning for Synthesis (Auditorium Tamburi) |
15:45 | | | | Awards and Closing Session (Auditorium Tamburi) |
16:00 | Coffee Break (Foyer Tamburi) | Coffee Break (Foyer Tamburi) | Coffee Break (Foyer Tamburi) | |
16:15 | | | | Handover Address (Auditorium Tamburi) |
16:30 | Tutorial 4, Annika Neidhardt (Auditorium Tamburi) | Posters: Physical Modeling; Demos: Virtual Analog and Physical Modeling (Sala Boxe) | Posters and Demos: Signal Processing (Sala Boxe) | |
16:45 | | | | Board Meeting (TBA) |
18:00 | DAFx Welcome Aperitivo (Foyer Tamburi) | | | |
18:30 | | | Aperitivo (Foyer Tamburi) | |
20:00 | | DAFx Banquet (MaWay) | DAFx Concert: “Macchine Nostre” (Auditorium Tamburi) | |

Abstracts

Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming

Miguel Zea and Luis A. Rivera



Abstract: This paper introduces a computationally efficient method for the emulation of nonlinear analog audio circuits by combining state-space representations, constraint stabilization, and convex quadratic programming (QP). Unlike traditional virtual analog (VA) modeling approaches or computationally demanding SPICE-based simulations, our approach reformulates the nonlinear differential-algebraic (DAE) systems that arise from analog circuit analysis into numerically stable optimization problems. The proposed method efficiently addresses the numerical challenges posed by nonlinear algebraic constraints via constraint stabilization techniques, significantly enhancing robustness and stability and making it suitable for real-time simulation. A canonical diode clipper circuit is presented as a test case, demonstrating that our method achieves accurate and faster emulation than conventional state-space methods. Furthermore, our method performs very well even at substantially lower sampling rates. Preliminary numerical experiments confirm that the proposed approach offers improved numerical stability and real-time feasibility, positioning it as a practical solution for high-fidelity audio applications.

Simplifying Antiderivative Antialiasing with Lookup Table Integration

Leonardo Gabrielli and Stefano Squartini



Abstract: Antiderivative Antialiasing (ADAA) has become a pivotal method for reducing aliasing when dealing with nonlinear functions at audio rate. However, its implementation requires analytical computation of the antiderivative of the nonlinear function, which in practical cases can be challenging without a symbolic solver. Moreover, when the nonlinear function is given by measurements, it must be approximated to obtain a symbolic description. In this paper, we propose a simple approach to ADAA for practical applications that employs numerical integration of lookup tables (LUTs) to approximate the antiderivative. This method eliminates the need for closed-form solutions, streamlining the ADAA implementation process in industrial applications. We analyze the trade-offs of this approach, highlighting its computational efficiency and ease of implementation while discussing the potential impact of numerical integration errors on aliasing performance. Experiments are conducted with static nonlinearities (tanh, a simple wavefolder and the Buchla 259 wavefolding circuit) and a stateful nonlinear system (the diode clipper).
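
As a rough illustration of the general recipe (a minimal sketch under our own assumptions, not the authors' implementation): first-order ADAA replaces y[n] = f(x[n]) with the difference quotient (F(x[n]) - F(x[n-1])) / (x[n] - x[n-1]) of the antiderivative F, falling back to evaluating f at the midpoint when successive inputs nearly coincide. Here F is approximated by trapezoidal integration of the LUT; the tanh nonlinearity, grid size, and epsilon threshold are illustrative choices.

```python
import numpy as np

# Hypothetical LUT: tabulate a static nonlinearity (could equally be measured data).
grid = np.linspace(-4.0, 4.0, 4096)
f_lut = np.tanh(grid)
# Numerically integrate the LUT (trapezoidal rule) to approximate the antiderivative F.
F_lut = np.concatenate(([0.0], np.cumsum(0.5 * (f_lut[1:] + f_lut[:-1]) * np.diff(grid))))

def adaa1_lut(x, eps=1e-6):
    """First-order ADAA using interpolated LUT reads for f and its antiderivative F."""
    y = np.empty_like(x)
    x_prev = 0.0
    for n, xn in enumerate(x):
        if abs(xn - x_prev) < eps:
            # Ill-conditioned difference quotient: evaluate f at the midpoint instead.
            y[n] = np.interp(0.5 * (xn + x_prev), grid, f_lut)
        else:
            y[n] = (np.interp(xn, grid, F_lut) - np.interp(x_prev, grid, F_lut)) / (xn - x_prev)
        x_prev = xn
    return y

# Usage: antialiased tanh clipping of a loud, high-pitched sine.
fs = 48000
t = np.arange(fs) / fs
out = adaa1_lut(4.0 * np.sin(2 * np.pi * 1244.5 * t))
```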

Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning

Alistair Carson, Alec Wright and Stefan Bilbao



Abstract: Neural networks have become ubiquitous in guitar distortion effect modelling in recent years. Despite their ability to yield perceptually convincing models, they are susceptible to frequency aliasing when driven by high-frequency and high-gain inputs. Nonlinear activation functions create both the desired harmonic distortion and unwanted aliasing distortion as the bandwidth of the signal is expanded beyond the Nyquist frequency. Here, we present a method for reducing aliasing in neural models via a teacher-student fine-tuning approach, where the teacher is a pretrained model with its weights frozen, and the student is a copy of this with learnable parameters. The student is fine-tuned against an aliasing-free dataset generated by passing sinusoids through the original model and removing non-harmonic components from the output spectra. Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCN). In the majority of our case studies, the reduction in aliasing was greater than that achieved by two-times oversampling. One side-effect of the proposed method is that harmonic distortion components are also affected. This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between anti-aliasing and preserving the perceived similarity to an analog reference device.
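
The dataset-construction step lends itself to a short sketch. Assuming the test frequency is chosen to land exactly on an FFT bin, and standing in for the frozen pretrained model with a simple memoryless distortion (both simplifications are ours, not the paper's):

```python
import numpy as np

fs, N = 48000, 48000                       # one-second analysis frame
k0 = 1245                                  # bin index, so f0 = k0 * fs / N falls on a bin
t = np.arange(N) / fs
x = np.sin(2 * np.pi * (k0 * fs / N) * t)

teacher = lambda sig: np.tanh(5.0 * sig)   # stand-in for the frozen teacher model
y = teacher(x)

# Keep DC and integer harmonics of f0 only; all remaining energy is aliasing/noise.
Y = np.fft.rfft(y)
clean = np.zeros_like(Y)
clean[::k0] = Y[::k0]                      # bins 0, k0, 2*k0, ... are harmonic
y_target = np.fft.irfft(clean, n=N)        # aliasing-free target for fine-tuning the student
```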

MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals

Francesco Ardan Dal Rí, Domenico Stefani, Luca Turchet and Nicola Conci



Abstract: In this paper, we present an approach to the neural modeling of overdrive guitar pedals with conditioning from a cross-circuit and cross-setting latent space. The resulting network models the behavior of multiple overdrive pedals across different settings, offering continuous morphing between real configurations and hybrid behaviors. Compact conditioning spaces are obtained through unsupervised training of a variational autoencoder with adversarial training, resulting in accurate reconstruction performance across different sets of pedals. We then compare three Hyper-Recurrent architectures for processing, including dynamic and static HyperRNNs, and a smaller model for real-time processing. Additionally, we present pOD-set, a new open dataset including recordings of 27 analog overdrive pedals, each with 36 gain and tone parameter combinations totaling over 97 hours of recordings. Precise parameter setting was achieved through a custom-deployed recording robot.

Impedance Synthesis for Hybrid Analog-Digital Audio Effects

Francisco Bernardo, Matthew Davison and Andrew McPherson



Abstract: Most real systems, from acoustics to analog electronics, are characterised by bidirectional coupling amongst elements rather than neat, unidirectional signal flows between self-contained modules. Integrating digital processing into physical domains becomes a significant engineering challenge when the application requires bidirectional coupling across the physical-digital boundary rather than separate, well-defined inputs and outputs. We introduce an approach to hybrid analog-digital audio processing using synthetic impedance: digitally simulated circuit elements integrated into an otherwise analog circuit. This approach combines the physicality and classic character of analog audio circuits with the precision and flexibility of digital signal processing (DSP). Our impedance synthesis system consists of a voltage-controlled current source and a microcontroller-based DSP system. We demonstrate our technique through modifying an iconic guitar distortion pedal, the Boss DS-1, showing the ability of the synthetic impedance to both replicate and extend the behaviour of the pedal’s diode clipping stage. We discuss the behaviour of the synthetic impedance in isolated laboratory conditions and in the DS-1 pedal, highlighting the technical and creative potential of the technique as well as its practical limitations and future extensions.

Antiderivative Antialiasing for Recurrent Neural Networks

Otto Mikkonen and Kurt James Werner



Abstract: Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment. For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness stems partly from their inherent nonlinearity, which makes them prone to aliasing. Recent work has explored mitigating aliasing by oversampling the network—an approach whose effectiveness is directly linked with the incurred computational costs. This work explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is shown to reduce the models’ tendency to alias considerably across all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results of this study can be used to improve sound quality in neural audio processing tasks that employ a suitable class of RNNs. Additional materials are provided on the accompanying webpage.

Towards Neural Emulation of Voltage-Controlled Oscillators

Riccardo Simionato and Stefano Fasciani



Abstract: Machine learning models have become ubiquitous in modeling analog audio devices. Expanding on this line of research, our study focuses on Voltage-Controlled Oscillators of analog synthesizers. We employ black-box autoregressive artificial neural networks to model the typical analog waveshapes, including triangle, square, and sawtooth. The models can be conditioned on wave frequency and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic and analog datasets to assess the accuracy of various architectural variants. The LSTM variant performed best, although lower frequency ranges present particular challenges.

Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling

Yicheng Gu, Runsong Zhang, Lauri Juvela and Zhizheng Wu



Abstract: Virtual Analog (VA) modeling aims to simulate the behavior of hardware circuits via algorithms to replicate their tone digitally. Dynamic Range Compressor (DRC) is an audio processing module that controls the dynamics of a track by reducing and amplifying the volumes of loud and quiet sounds, which is essential in music production. In recent years, neural-network-based VA modeling has shown great potential in producing high-fidelity models. However, due to the lack of data quantity and diversity, their generalization ability in different parameter settings and input sounds is still limited. To tackle this problem, we present Solid State Bus-Comp, the first large-scale and diverse dataset for modeling the classical VCA compressor — SSL 500 G-Bus. Specifically, we manually collected 175 unmastered songs from the Cambridge Multitrack Library. We recorded the compressed audio in 220 parameter combinations, resulting in an extensive 2528-hour dataset with diverse genres, instruments, tempos, and keys. Moreover, to facilitate the use of our proposed dataset, we conducted benchmark experiments in various open-sourced black-box and grey-box models, as well as white-box plugins. We also conducted ablation studies in different data subsets to illustrate the effectiveness of the improved data diversity and quantity. The dataset and demos are on our project page: https://www.yichenggu.com/SolidStateBusComp/.

Real-Time Virtual Analog Modelling of Diode-Based VCAs

Coriander V. Pines



Abstract: Some early analog voltage-controlled amplifiers (VCAs) utilized semiconductor diodes as a variable-gain element. Diode-based VCAs exhibit a unique sound quality, with distortion dependent both on signal level and gain control. In this work, we examine the behavior of a simplified circuit for a diode-based VCA and propose a nonlinear, explicit, stateless digital model. This approach avoids traditional iterative algorithms, which can be computationally intensive. The resulting digital model retains the sonic characteristics of the analog model and is suitable for real-time simulation. We present an analysis of the gain characteristics and harmonic distortion produced by this model, as well as practical guidance for implementation. We apply this approach to a set of alternative analog topologies and introduce a family of digital VCA models based on fixed nonlinearities with variable operating points.

Antialiasing in BBD Chips Using BLEP

Leonardo Gabrielli, Stefano D'Angelo and Stefano Squartini



Abstract: Several methods exist in the literature to accurately simulate Bucket Brigade Device (BBD) chips, which are widely used in analog delay-based audio effects for their characteristic lo-fi sound, which is affected by noise, nonlinearities and aliasing. The latter is a desired quality, being typical of those chips. However, when simulating BBDs in a discrete-time domain environment, additional aliasing components occur that need to be suppressed. In this work, we propose a novel method that applies the Bandlimited Step (BLEP) technique, effectively minimizing aliasing artifacts introduced by the simulation. The paper provides some insights on the design of a BBD simulation using interpolation at the input for clock rate conversion and, most importantly, shows how BLEP can be effective in reducing unwanted aliasing artifacts. Interpolation is shown to have minor importance in the reduction of spurious components.
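
BLEP itself is a standard technique; for orientation, a common two-point polynomial approximation of the bandlimited-step residual (polyBLEP) is sketched below. How such corrections are scheduled inside a BBD simulation is the paper's contribution and is not reproduced here.

```python
def polyblep(t, dt):
    """Two-point polyBLEP residual for a unit step at t = 0.

    t  -- signed distance from the discontinuity, in normalized phase (periods)
    dt -- phase increment per sample (f0 / fs)
    The residual is added to (or subtracted from, depending on the edge
    direction) the two samples surrounding each discontinuity.
    """
    if 0.0 <= t < dt:        # first sample after the step
        x = t / dt
        return 2.0 * x - x * x - 1.0
    if -dt < t < 0.0:        # last sample before the step
        x = t / dt
        return x * x + 2.0 * x + 1.0
    return 0.0
```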

Aliasing Reduction in Neural Amp Modeling by Smoothing Activations

Ryota Sato and Julius O. Smith III



Abstract: The increasing demand for high-quality digital emulations of analog audio hardware, such as vintage tube guitar amplifiers, led to numerous works on neural network-based black-box modeling, with deep learning architectures like WaveNet showing promising results. However, a key limitation in all of these models was the aliasing artifacts stemming from nonlinear activation functions in neural networks. In this paper, we investigated novel and modified activation functions aimed at mitigating aliasing within neural amplifier models. Supporting this, we introduced a novel metric, the Aliasing-to-Signal Ratio (ASR), which quantitatively assesses the level of aliasing with high accuracy. Measuring also the conventional Error-to-Signal Ratio (ESR), we conducted studies on a range of preexisting and modern activation functions with varying stretch factors. Our findings confirmed that activation functions with smoother curves tend to achieve lower ASR values, indicating a noticeable reduction in aliasing. Notably, this improvement in aliasing reduction was achievable without a substantial increase in ESR, demonstrating the potential for high modeling accuracy with reduced aliasing in neural amp models.

Antialiased Black-Box Modeling of Audio Distortion Circuits Using Real Linear Recurrent Units

Fabián Esqueda and Shogo Murai



Abstract: In this paper, we propose the use of real-valued Linear Recurrent Units (LRUs) for black-box modeling of audio circuits. A network architecture composed of real LRU blocks interleaved with nonlinear processing stages is proposed. Two case studies are presented: a second-order diode clipper and an overdrive distortion pedal. Furthermore, we show how to integrate the antiderivative antialiasing technique into the proposed method, effectively lowering oversampling requirements. Our experiments show that the proposed method generates models that accurately capture the nonlinear dynamics of the examined devices and are highly efficient, which makes them suitable for real-time operation inside Digital Audio Workstations.
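
Architecture details aside, the core building block can be sketched as a bank of first-order recursions with real poles interleaved with a static nonlinearity. Everything below (unit count, weight scaling, the tanh output stage) is an illustrative guess rather than the proposed model:

```python
import numpy as np

class RealLRUBlock:
    """Toy real-valued LRU block: parallel one-pole recursions plus a nonlinearity."""

    def __init__(self, n_units=16, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.uniform(0.5, 0.999, n_units)     # real poles, |a| < 1 for stability
        self.b = 0.1 * rng.standard_normal(n_units)   # input weights
        self.c = 0.1 * rng.standard_normal(n_units)   # output mixing weights
        self.h = np.zeros(n_units)                    # recurrent state

    def process(self, x):
        y = np.empty_like(x)
        for n, xn in enumerate(x):
            self.h = self.a * self.h + self.b * xn    # diagonal linear recurrence
            y[n] = np.tanh(self.c @ self.h + xn)      # interleaved nonlinear stage
        return y

# Usage: distort a sine with one block.
fs = 48000
y = RealLRUBlock().process(np.sin(2 * np.pi * 220.0 * np.arange(fs) / fs))
```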

Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation

Oliviero Massi, Alessandro Ilic Mezza, Riccardo Giampiccolo and Alberto Bernardini



Abstract: Neural networks have been applied within the Wave Digital Filter (WDF) framework as data-driven models for nonlinear multi-port circuit elements. Conventionally, these models are trained on wave variables obtained by sampling the current-voltage characteristic of the considered nonlinear element before being incorporated into the circuit WDF implementation. However, isolating multi-port elements for this process can be challenging, as their nonlinear behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a novel approach for training neural models of nonlinear multi-port elements directly within a circuit’s Wave Digital (WD) discrete-time implementation, relying solely on circuit input-output voltage measurements. Exploiting the differentiability of WD simulations, we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.

Distributed Single-Reed Modeling Based on Energy Quadratization and Approximate Modal Expansion

Champ C. Darabundit, Vasileios Chatziioannou and Gary Scavone



Abstract: Recently, energy quadratization and modal expansion have become popular methods for developing efficient physics-based sound synthesis algorithms. These methods have been primarily used to derive explicit schemes modeling the collision between a string and a fixed barrier. In this paper, these techniques are applied to a similar problem: modeling a distributed mouthpiece lay-reed-lip interaction in a woodwind instrument. The proposed model aims to provide a more accurate representation of how a musician’s embouchure affects the reed’s dynamics. The mouthpiece and lip are modeled as distributed static and dynamic viscoelastic barriers, respectively. The reed is modeled using an approximate modal expansion derived via the Rayleigh-Ritz method. The reed system is then acoustically coupled to a measured input impedance response of a saxophone. Numerical experiments are presented.

A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments

Gianpaolo Evangelista and Alberto Acquilino



Abstract: From the exploration of databases of instrument sounds to the self-assisted practice of musical instruments, methods for automatically and objectively assessing the quality of musical tones are in high demand. In this paper, we develop a new algorithm for estimating the duration of the attack, with particular attention to wind and bowed string instruments. In fact, for these instruments, the quality of the tones is highly influenced by the attack clarity, for which, together with pitch stability, the attack duration is an indicator often used by teachers by ear. Since the direct estimation of the attack duration from sounds is made difficult by the initial preponderance of the excitation noise, we propose a more robust approach based on the separation of the ensemble of the harmonics from the excitation noise, which is obtained by means of an improved pitch-synchronous wavelet transform. We also define a new parameter, the noise ducking time, which is relevant for detecting the extent of the noise component in the attack. In addition to the exploration of available sound databases, for testing our algorithm, we created an annotated data set in which several problematic sounds are included. Moreover, to check the consistency and robustness of our duration estimates, we applied our algorithm to sets of synthetic sounds with noisy attacks of programmable duration.

Non-Iterative Numerical Simulation in Virtual Analog: A Framework Incorporating Current Trends

Alessia Andò, Enrico Bozzo and Federico Fontana



Abstract: For their low and constant computational cost, non-iterative methods for the solution of differential problems are gaining popularity in virtual analog, provided that their stability properties and accuracy afford their use without exaggerated temporal oversampling. At least in some application case studies, one recent family of non-iterative schemes has shown promise to outperform methods that achieve accurate results at the cost of iterating several times while converging to the numerical solution. Here, this family is contextualized and studied against known classes of non-iterative methods. The results from these studies foster a more general discussion about the possibilities, role and prospective use of non-iterative methods in virtual analog.

Power-Balanced Drift Regulation for Scalar Auxiliary Variable Methods: Application to Real-Time Simulation of Nonlinear String Vibrations

Thomas Risse, Thomas Hélie and Stefan Bilbao



Abstract: Efficient stable integration methods for nonlinear systems are of great importance for physical modeling sound synthesis. Specifically, a number of musical systems of interest, including vibrating strings, bars or plates, may be written as port-Hamiltonian systems with quadratic kinetic energy and non-quadratic potential energy. Efficient schemes have been developed for such systems through the introduction of a scalar auxiliary variable. As a result, stable real-time simulation of nonlinear musical systems with up to a few thousand degrees of freedom is possible, even for nearly lossless systems. However, convergence rates can be slow and seem to be system-dependent. Specifically, at audio rates, they may suffer from numerical drift of the auxiliary variable, resulting in dramatic unwanted effects on audio output, such as pitch drifts after several impacts on the same resonator. In this paper, a novel method for mitigating this unwanted drift while preserving power balance is presented, based on a control approach. A set of modified equations is proposed to control the drift artefact by rerouting energy through the scalar auxiliary variable and potential energy state. Numerical experiments are run in order to check convergence on simulations in the case of a cubic nonlinear string. A real-time implementation is provided as a Max/MSP external. 60-note polyphony is achieved on a laptop, and some simple high-level control parameters are provided, making the proposed implementation suitable for use in artistic contexts. All code is available in a public repository, along with compiled Max/MSP externals.

Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates

Rodrigo Diaz and Mark Sandler



Abstract: Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically informed audio synthesis. However, traditional implementations, particularly for non-linear models like the von Kármán plate, are computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast, differentiable, GPU-accelerated modal framework built with the JAX library, providing efficient simulations and enabling gradient-based inverse modelling. Benchmarks show that our approach significantly outperforms CPU and GPU-based implementations, particularly for simulations with many modes. Inverse modelling experiments demonstrate that our approach can recover physical parameters, including tension, stiffness, and geometry, from both synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that fit abstract spectral parameters, it provides greater interpretability and more compact parameterisation. The code is released as open source to support future research and applications in differentiable physical modelling and sound synthesis.
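
As background (the paper's differentiable, nonlinear JAX framework is far richer), a linear modal model renders sound as a sum of exponentially decaying sinusoids. The harmonic mode frequencies and simple damping law below are our own simplifying assumptions:

```python
import numpy as np

def modal_string_ir(fs=48000, dur=1.0, f0=110.0, n_modes=40, damping=5.0):
    """Impulse response of a linear modal string: a sum of decaying partials."""
    t = np.arange(int(fs * dur)) / fs
    out = np.zeros_like(t)
    for m in range(1, n_modes + 1):
        f_m = m * f0                      # ideal-string (harmonic) mode frequencies
        if f_m >= fs / 2:                 # skip modes above Nyquist
            break
        sigma_m = damping * m             # toy frequency-dependent decay rate
        out += np.exp(-sigma_m * t) * np.sin(2 * np.pi * f_m * t) / m
    return out

ir = modal_string_ir()
```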

Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations

Victor Zheleznov, Stefan Bilbao, Alec Wright and Simon King



Abstract: Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are possible in order to handle geometric nonlinearities. One such case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important effects including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning approaches (in particular neural ordinary differential equations) has been used to model lumped dynamic systems such as electronic circuits automatically from data. In this work, we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution for linear vibration of the system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of a system remain easily accessible after the training without the need for a parameter encoder in the network architecture. As an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to reproduce the nonlinear dynamics of the system. Sound examples are presented.

Physics-Informed Deep Learning for Nonlinear Friction Model of Bow-String Interaction

Xinmeng Luan and Gary Scavone



Abstract: This study investigates the use of an unsupervised, physics-informed deep learning framework to model a one-degree-of-freedom mass-spring system subjected to a nonlinear friction bow force and governed by a set of ordinary differential equations. Specifically, it examines the application of Physics-Informed Neural Networks (PINNs) and Physics-Informed Deep Operator Networks (PI-DeepONets). Our findings demonstrate that PINNs successfully address the problem across different bow force scenarios, while PI-DeepONets perform well under low bow forces but encounter difficulties at higher forces. Additionally, we analyze the Hessian eigenvalue density and visualize the loss landscape. Overall, the presence of large Hessian eigenvalues and sharp minima indicates highly ill-conditioned optimization. These results underscore the promise of physics-informed deep learning for nonlinear modelling in musical acoustics, while also revealing the limitations of relying solely on physics-based approaches to capture complex nonlinearities. We demonstrate that PI-DeepONets, with their ability to generalize across varying parameters, are well-suited for sound synthesis. Furthermore, we demonstrate that the limitations of PI-DeepONets under higher forces can be mitigated by integrating observation data within a hybrid supervised-unsupervised framework. This suggests that a hybrid supervised-unsupervised DeepONets framework could be a promising direction for future practical applications.

Comparing Acoustic and Digital Piano Actions: Data Analysis and Key Insights

Michael Fioretti, Giuseppe Bergamino, Leonardo Gabrielli, Gianluca Ciattaglia and Susanna Spinsante



Abstract: The acoustic piano and its sound production mechanisms have been extensively studied in the field of acoustics. Similarly, digital piano synthesis has been the focus of numerous signal processing research studies. However, the role of the piano action in shaping the dynamics and nuances of piano sound has received less attention, particularly in the context of digital pianos. Digital pianos are well-established commercial instruments that typically use weighted keys with two or three sensors to measure the average key velocity—this being the only input to a sampling synthesis engine. In this study, we investigate whether this simplified measurement method adequately captures the full dynamic behavior of the original piano action. After a brief review of the state of the art, we describe an experimental setup designed to measure physical properties of the keys and hammers of a piano. This setup enables high-precision readings of acceleration, velocity, and position for both the key and hammer across various dynamic levels. Through extensive data analysis, we examine their relationships and identify the optimal key position for velocity measurement. We also analyze a digital piano key to determine where the average key velocity is measured and compare it with our proposed optimal timing. We find that the instantaneous key velocity just before let-off correlates most strongly with hammer impact velocity, indicating a target for improved sensing; however, due to the limitations of discrete velocity sensing this optimization alone may not suffice to replicate the nuanced expressiveness of acoustic piano touch. This study represents the first step in a broader research effort aimed at linking piano touch, dynamics, and sound production.

Wave Pulse Phase Modulation: Hybridising Phase Modulation and Phase Distortion

Matthew Smart



Abstract: This paper introduces Wave Pulse Phase Modulation (WPPM), a novel synthesis technique based on phase shaping. It combines two classic digital synthesis techniques: Phase Modulation (PM) and Phase Distortion (PD), aiming to overcome their respective limitations while enabling the creation of new, interesting timbres. It works by segmenting a phase signal into two regions, each independently driving the phase of a modulator waveform. This results in two distinct pulses per period that together form the signal used as the phase input to a carrier waveform, similar to PM, hence the name Wave Pulse Phase Modulation. This method provides a minimal set of parameters that enable the creation of complex, evolving waveforms, and rich dynamic textures. By modulating these parameters, WPPM can produce a wide range of interesting spectra, including those with formant-like resonant peaks. The paper examines PM and PD in detail, exploring the modifications needed to integrate them with WPPM, before presenting the full WPPM algorithm alongside its parameters and creative possibilities. Finally, it discusses scope for further research and developments into new similar phase shaping algorithms.
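
For orientation, the two classic ingredients can be sketched independently; the breakpoint warp and modulator settings below are arbitrary, and the full WPPM segmentation scheme is not reproduced here:

```python
import numpy as np

fs, f0, dur = 48000, 220.0, 1.0
n = int(fs * dur)
phase = (f0 * np.arange(n) / fs) % 1.0                  # linear phase ramp in [0, 1)

# Phase distortion (Casio CZ style): piecewise-linear warp with breakpoint d,
# rising quickly to 0.5 and then slowly to 1.0, read through a cosine carrier.
d = 0.15
warped = np.where(phase < d,
                  0.5 * phase / d,
                  0.5 + 0.5 * (phase - d) / (1.0 - d))
pd_out = np.cos(2 * np.pi * warped)

# Phase modulation: a modulator signal added to the carrier's phase.
mod = 0.8 * np.sin(2 * np.pi * 2.0 * f0 * np.arange(n) / fs)
pm_out = np.cos(2 * np.pi * phase + mod)
```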

Digital Morphophone Environment. Computer Rendering of a Pioneering Sound Processing Device

Daniel Scorranese



Abstract: This paper introduces a digital reconstruction of the morphophone, a complex magnetophonic device developed in the 1950s within the laboratories of the GRM (Groupe de Recherches Musicales) in Paris. The analysis, design, and implementation methodologies underlying the Digital Morphophone Environment are discussed. Based on a detailed review of historical sources and limited documentation – including a small body of literature and, most notably, archival images – the core operational principles of the morphophone have been modeled within the MAX visual programming environment. The main goals of this work are, on the one hand, to study and make accessible a now obsolete and unavailable tool, and on the other, to provide the opportunity for new explorations in computer music and research.

Modeling the Impulse Response of Higher-Order Microphone Arrays Using Differentiable Feedback Delay Networks

Riccardo Giampiccolo, Alessandro Ilic Mezza, Mirco Pezzoli, Shoichi Koyama, Alberto Bernardini and Fabio Antonacci



Abstract: Recently, differentiable multiple-input multiple-output Feedback Delay Networks (FDNs) have been proposed for modeling target multichannel room impulse responses by optimizing their parameters according to perceptually-driven time-domain descriptors. However, in spatial audio applications, frequency-domain characteristics and inter-channel differences are crucial for accurately replicating a given soundfield. In this article, targeting the modeling of the response of higher-order microphone arrays, we improve on the methodology by optimizing the FDN parameters using a novel spatially-informed loss function, demonstrating its superior performance over previous approaches and paving the way toward the use of differentiable FDNs in spatial audio applications such as soundfield reconstruction and rendering.
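
For readers new to FDNs, a minimal single-input, single-output time-domain loop with a Householder feedback matrix is sketched below; the paper's differentiable MIMO formulation and spatially informed loss are not shown, and the delay lengths and gain are placeholder values:

```python
import numpy as np

def fdn(x, delays=(1031, 1327, 1523, 1783), g=0.97):
    """Minimal four-line feedback delay network with a Householder feedback matrix."""
    N = len(delays)
    A = np.eye(N) - (2.0 / N) * np.ones((N, N))           # orthogonal (lossless) mixing
    bufs = [np.zeros(d) for d in delays]
    idx = [0] * N
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        taps = np.array([bufs[i][idx[i]] for i in range(N)])  # delay-line outputs
        y[n] = taps.sum()
        fb = g * (A @ taps)                               # attenuated, mixed feedback
        for i in range(N):
            bufs[i][idx[i]] = xn + fb[i]                  # write input plus feedback
            idx[i] = (idx[i] + 1) % len(bufs[i])
    return y

# Usage: impulse response of the toy reverberator.
impulse = np.zeros(48000); impulse[0] = 1.0
ir = fdn(impulse)
```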

A Modified Algorithm for a Loudspeaker Line Array Multi-Lobe Control

Stefania Cecchi, Valeria Bruschi, Michele Frati, Marco Secondini and Andrea Tanoni



Abstract: The creation of personal sound zones is an effective solution for delivering personalized auditory experiences in shared spaces. Their applications span various domains, including in-car entertainment, home and office environments, and healthcare functions. This paper presents a novel approach for the creation of personal sound zones using a modified algorithm for multi-lobe control in a loudspeaker line array. The proposed method integrates a pressure-matching beamforming algorithm with an innovative technique for reducing side lobes, enhancing the precision and isolation of sound zones. The system was evaluated through simulations and experimental tests conducted in a semi-anechoic environment and a large listening room. Results demonstrate the effectiveness of the method in creating two separate sound zones.

Estimation of Multi-Slope Amplitudes in Late Reverberation

Jeremy B. Bai and Sebastian J. Schlecht



Abstract: The common-slope model is used to model late reverberation of complex room geometries such as multiple coupled rooms. The model fits band-limited room impulse responses using a set of common decay rates, with amplitudes varying based on listener positions. This paper investigates amplitude estimation methods within the common-slope model framework. We compare several traditional least squares estimation methods and propose using LINEX regression, a Maximum Likelihood approach using log-squared RIR statistics. Through statistical analysis and simulation tests, we demonstrate that LINEX regression improves accuracy and reduces bias when compared to traditional methods.

Differentiable Scattering Delay Networks for Artificial Reverberation

Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena and Alberto Bernardini



Abstract: Scattering delay networks (SDNs) provide a flexible and efficient framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating key parameters such as scattering matrices and absorption filters as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.

Differentiable Attenuation Filters for Feedback Delay Networks

Ilias Ibnyahya and Joshua D. Reiss



Abstract: We introduce a novel method for designing attenuation filters in digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS) of Infinite Impulse Response (IIR) filters arranged as parametric equalizers (PEQ), enabling fine control over frequency-dependent reverberation decay. Unlike traditional graphic equalizer designs, which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality factor (Q) parameters are shared parameters across delay lines and only the gain is adjusted based on delay length. This design not only reduces the number of optimization parameters, but also remains fully differentiable and compatible with gradient-based learning frameworks. Leveraging principles of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
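
A parametric equalizer of this kind is typically assembled from standard peaking biquads; a sketch using the familiar Audio EQ Cookbook peaking section follows, showing the filter structure only, not the paper's shared-parameter, differentiable optimization:

```python
import numpy as np
from scipy.signal import sosfilt

def peaking_sos(f0, gain_db, q, fs):
    """Audio EQ Cookbook peaking biquad as one second-order section [b0,b1,b2,a0,a1,a2]."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * A, -2.0 * np.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * np.cos(w0), 1.0 - alpha / A]
    return np.array(b + a) / a[0]        # normalize so that a0 = 1

# Example: two peaking sections acting as one delay line's attenuation filter.
fs = 48000
sos = np.vstack([peaking_sos(250.0, -3.0, 0.7, fs),
                 peaking_sos(4000.0, -9.0, 0.7, fs)])
y = sosfilt(sos, np.random.default_rng(0).standard_normal(fs))
```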

Perceptual Decorrelator Based on Resonators

Jon Fagerström, Nils Meyer-Kahlen, Sebastian J. Schlecht and Vesa Välimäki



Abstract: Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
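
The essence of the design can be sketched as an impulse response summing many exponentially decaying sinusoids, with decay varying across frequency and initial phases differing per channel. The frequency spacing, decay law, and random (rather than optimized) phases below are our simplifications:

```python
import numpy as np

def decorrelator_ir(n_res=1000, fs=48000, dur=0.05, seed=0):
    """Toy decorrelation filter: a sum of decaying sinusoids with random phases."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * dur)) / fs
    freqs = np.geomspace(40.0, 18000.0, n_res)        # log-spaced resonator frequencies
    taus = 0.004 * np.sqrt(1000.0 / freqs) + 0.0005   # longer decays at low frequencies
    ir = np.zeros_like(t)
    for f, tau in zip(freqs, taus):
        phi = rng.uniform(0.0, 2.0 * np.pi)           # phases differ per channel/seed
        ir += np.exp(-t / tau) * np.sin(2.0 * np.pi * f * t + phi)
    return ir / np.max(np.abs(ir))

# Two differently seeded filters yield two mutually decorrelated output channels.
left_ir, right_ir = decorrelator_ir(seed=1), decorrelator_ir(seed=2)
```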

Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation

Tom Krueger and Julián Villegas



Abstract: We present a spline-based method for compressing and reconstructing Head-Related Transfer Functions (HRTFs) that preserves perceptual quality. Our approach focuses on the magnitude response and consists of four stages: (1) acquiring minimum-phase head-related impulse responses (HRIRs), (2) transforming them into the frequency domain and applying adaptive Wiener filtering to preserve important spectral features, (3) extracting a minimal set of control points using derivative-based methods to identify local maxima and inflection points, and (4) reconstructing the HRTF using piecewise cubic Hermite interpolation (PCHIP) over the refined control points. Evaluation on 301 subjects demonstrates that our method achieves an average compression ratio of 4.7:1 with spectral distortion ≤ 1.0 dB in each Equivalent Rectangular Band (ERB). The method preserves binaural cues with a mean absolute interaural level difference (ILD) error of 0.10 dB. Our method achieves about three times the compression obtained with a PCA-based method.
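
Stages (3) and (4) map naturally onto SciPy. A toy version that keeps local extrema of a magnitude response as control points and rebuilds the curve with PCHIP is shown below; the adaptive Wiener filtering and inflection-point refinement are omitted, and the synthetic response is hypothetical:

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def select_control_points(freqs, mag_db):
    """Keep endpoints plus local extrema; PCHIP reconstructs everything in between."""
    i = np.arange(1, len(mag_db) - 1)
    is_max = (mag_db[i] >= mag_db[i - 1]) & (mag_db[i] >= mag_db[i + 1])
    is_min = (mag_db[i] <= mag_db[i - 1]) & (mag_db[i] <= mag_db[i + 1])
    keep = np.unique(np.concatenate(([0], i[is_max | is_min], [len(mag_db) - 1])))
    return freqs[keep], mag_db[keep]

# Usage on a synthetic magnitude curve standing in for one measured HRTF.
freqs = np.linspace(0.0, 24000.0, 512)
mag_db = 6.0 * np.sin(freqs / 1500.0) - 12.0 * np.exp(-((freqs - 9000.0) / 1200.0) ** 2)
f_ctrl, m_ctrl = select_control_points(freqs, mag_db)        # compressed representation
reconstructed = PchipInterpolator(f_ctrl, m_ctrl)(freqs)     # decoded HRTF magnitude
```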

Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility

Giuseppe Bergamino, Michael Fioretti, Leonardo Gabrielli and Stefano Squartini



Abstract: Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This paper demonstrates how existing GUI elements can be translated into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in real-time binaural rendering over headphones. The proposed system is designed to spatialize the auditory output from VoiceOver, the built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability. A between-groups experiment was conducted to compare standard VoiceOver with the proposed spatialized version. Non-visually-impaired participants (n = 32), with no visual access to the test interface, completed a list-based exploration and then attempted to reconstruct the UI solely from auditory cues. Experimental results indicate that the head-tracked group achieved slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant differences in self-reported workload or usability. These findings suggest that potential benefits may come from the integration of head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users are needed. Although the experimental testbed uses a generic desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio approach could benefit visually impaired producers and musicians navigating plug-in controls.

Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations

Xie He, Duncan Williams and Bruno Fazenda



Abstract: This study evaluates the performance of five objective audio quality metrics—PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL, and HAAQI—in the context of digital music production. Unlike previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under fixed-level, randomly fluctuating degradations. In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic effectively tracked degradation changes, while PEAQ Advanced failed consistently and ViSQOL showed low sensitivity to hum and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021), PEAQ Basic (0.057), and PEAQ Advanced (0.065). However, ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity. These findings highlight the strengths and limitations of each metric for music production, specifically quality measurement with compressed audio. The source code and dataset will be made publicly available upon publication.

Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method

Haowen Zhao, Akihiko Suyama, Kazunobu Kondo and Damian T. Murphy



Abstract: Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced feedback delay network (FDN) is designed to generate perceptually plausible RIRs. A multiple stimuli with hidden reference and anchor (MUSHRA) test and a two-alternative-forced-choice (2AFC) discrimination task have been conducted to compare the proposed method against ground truth recordings and conventional RT-based approaches. The results show that the proposed system delivers robust performance in various scenarios, achieving highly plausible reverberation synthesis.

DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation

Gian Marco De Bortoli, Karolina Prawda, Philip Coleman and Sebastian J. Schlecht



Abstract: Reverberation is crucial in the acoustical design of physical spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of RESs strongly depends on the properties of the physical room and the architecture of the Digital Signal Processor (DSP). However, room-impulse-response (RIR) measurements and the DSP code from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present DataRES and PyRES—a RIR dataset and a Python library to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs. The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like pipeline. The replication of previous studies by the authors shows that PyRES can become a useful tool in future research on RESs.

Auditory Discrimination of Early Reflections in Virtual Rooms

Junting Chen, Duncan Williams, Bruno Fazenda



Abstract: This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with third-order Ambisonic room impulse responses under three position reversals (Left–Right, Front–Back, and Floor–Ceiling) and three reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near chance. This result can be explained by the higher acuity and lower localisation blur of the human auditory system in the left–right direction. A two-way ANOVA confirmed a significant main effect of spatial position (p = 0.00685, η² = 0.1605), with no significant effect of reverberation or interaction. Analysis of the binaural room impulse responses showed waveform and Direct-to-Reverberant Ratio (DRR) differences in the Left–Right reversal position, aligning with the perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.

Partiels – Exploring, Analyzing and Understanding Sounds

Pierre Guillot



Abstract: This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics. The application uses Vamp plug-ins to extract various information on different aspects of the sound, such as spectrum, partials, pitch, tempo, text, and chords. Partiels is the successor to AudioSculpt, offering a modern, flexible interface for visualizing, editing, and exporting analysis results, addressing a wide range of issues from musicological practice to sound creation and signal processing research. The article describes Partiels’ key features, including analysis organization, audio file management, results visualization and editing, as well as data export and sharing options, and its interoperability with other software such as Max and Pure Data. In addition, it highlights the numerous analysis plug-ins developed at IRCAM, based in particular on machine learning models, as well as the IRCAM Vamp extension, which overcomes certain limitations of the original Vamp format.

Listener-Adaptive 3D Audio with Crosstalk Cancellation

Francesco Veronesi, Filippo Fazi and Jacob Hollebon



Abstract: Crosstalk cancellation is a technology that allows the delivery of binaural audio over loudspeakers using loudspeaker beamforming, without the need for headphones. It enables spatial audio to be reproduced using practical loudspeaker distributions, for example a soundbar of loudspeakers positioned in front of the user only. Crosstalk cancellation requires the user to be positioned at a specific location in space, the 'sweet-spot'. However, by using a built-in camera or sensor, the listener's ear position relative to the audio device can be tracked in real time, enabling a mobile sweet-spot through precise beamforming and effective crosstalk cancellation no matter where the listener is positioned. This demo allows users to experience listener-adaptive crosstalk cancellation developed by Audioscenic on a multi-loudspeaker gaming soundbar. Audioscenic develops advanced crosstalk cancellation solutions for home audio, gaming, automotive, and public space applications. Founded in 2017 by Dr Marcos Simón and Professor Filippo Fazi, the company emerged from their collaborative research at the Institute of Sound and Vibration Research, University of Southampton.

Biquad Coefficients Optimization via Kolmogorov-Arnold Networks

Ayoub Malek, Donald Schulz and Felix Wuebbelmann



Abstract: Conventional Deep Learning (DL) approaches to estimating Infinite Impulse Response (IIR) filter coefficients from an arbitrary frequency response are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and limited accuracy. As an alternative, in this paper, we explore the use of Kolmogorov-Arnold Networks (KANs) to predict IIR filter coefficients—specifically biquad coefficients—effectively. By leveraging the high interpretability and accuracy of KANs, we achieve smooth optimization of the coefficients. Furthermore, by constraining the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergence and more accurate results. This offers new possibilities for integrating digital IIR filters into deep-learning frameworks.
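
The KAN predictor itself is not sketched here, but the differentiable objective it feeds can be: evaluating a biquad’s frequency response directly from its coefficients and scoring it against a target magnitude curve. This is a generic dB-domain fitting loss, not necessarily the paper’s exact formulation.

import numpy as np

def biquad_response(b, a, n_points=256):
    # Frequency response H(e^{jw}) = B(e^{-jw}) / A(e^{-jw}) of one biquad,
    # with b = (b0, b1, b2) and a = (a1, a2), a0 normalized to 1.
    w = np.linspace(0, np.pi, n_points)
    z = np.exp(-1j * w)
    num = b[0] + b[1] * z + b[2] * z**2
    den = 1.0 + a[0] * z + a[1] * z**2
    return num / den

def log_magnitude_loss(b, a, target_db):
    # Mean-squared error in dB, a common objective for response fitting.
    H = biquad_response(b, a, len(target_db))
    mag_db = 20 * np.log10(np.abs(H) + 1e-12)
    return np.mean((mag_db - target_db) ** 2)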

Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values

Côme Peladeau, Dominique Fourer and Geoffroy Peeters



Abstract: Audio effects and sound synthesizers are widely used processors in popular music. Their parameters control the quality of the output sound. Multiple combinations of parameters can lead to the same sound. While recent approaches have been proposed to estimate these parameters given only the output sound, those are deterministic, i.e. they only estimate a single solution among the many possible parameter configurations. In this work, we propose to model the parameters as probability distributions instead of deterministic values. To learn the distributions, we optimize two objectives: (1) we minimize the reconstruction error between the ground truth output sound and the one generated using the estimated parameters, as is usually done, but also (2) we maximize the parameter diversity, using entropy. We evaluate our approach through two numerical audio experiments to show its effectiveness. These results show how our approach effectively outputs multiple combinations of parameters to match one sound.
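
A minimal sketch of the two-term objective, assuming Gaussian parameter distributions trained with the reparameterization trick; render(theta) stands for a hypothetical differentiable audio processor and is not part of the paper’s code.

import math
import torch

def distribution_loss(mu, log_sigma, target, render, lam=0.1):
    # Treat each processor parameter as a Gaussian; sampling via the
    # reparameterization trick keeps the sampling step differentiable.
    theta = mu + log_sigma.exp() * torch.randn_like(mu)
    recon = torch.mean((render(theta) - target) ** 2)         # (1) match the sound
    entropy = (0.5 * math.log(2 * math.pi * math.e) + log_sigma).sum()
    return recon - lam * entropy                              # (2) reward diversity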

A Parametric Equalizer with Interactive Poles and Zeros Control for Digital Signal Processing Education

Andrea Casati, Giorgio Presti and Marco Tiraboschi



Abstract: This article presents ZePolA, a digital audio equalizer designed as an educational resource for understanding digital filter design. Unlike conventional equalization plug-ins, which define the frequency response first and then derive the filter coefficients, this software adopts an inverse approach: users directly manipulate the placement of poles and zeros on the complex plane, with the corresponding frequency response visualized in real time. This methodology provides an intuitive link between theoretical filter concepts and their practical application. The plug-in features three main panels: a filter parameter panel, a frequency response panel, and a filter design panel. It allows users to configure a cascade of first- or second-order filter elements, each parameterized by the location of its poles or zeros. The GUI supports interaction through drag-and-drop gestures, enabling immediate visual and auditory feedback. This hands-on approach is intended to enhance learning by bridging the gap between theoretical knowledge and practical application. To assess the educational value and usability of the plug-in, a preliminary evaluation was conducted with focus groups of students and lecturers. Future developments will include support for additional filter types and increased architectural flexibility. Moreover, a systematic validation study involving students and educators is proposed to quantitatively evaluate the plug-in’s impact on learning outcomes. This work contributes to the field of digital signal processing education by offering an innovative tool that merges the hands-on approach of music production with a deeper theoretical understanding of digital filters, fostering an interactive and engaging educational experience.
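
The inverse approach, deriving the frequency response from user-placed poles and zeros, can be illustrated with a direct evaluation of the pole-zero transfer function (a generic sketch, not the plug-in’s actual code):

import numpy as np

def response_from_poles_zeros(zeros, poles, n_points=512):
    # Evaluate H(e^{jw}) = prod(e^{jw} - q_i) / prod(e^{jw} - p_i)
    # directly from pole/zero placement on the complex plane.
    w = np.linspace(0, np.pi, n_points)
    ejw = np.exp(1j * w)
    H = np.ones_like(ejw)
    for q in zeros:
        H *= (ejw - q)
    for p in poles:
        H /= (ejw - p)
    return w, 20 * np.log10(np.abs(H) + 1e-12)

# A conjugate pole pair near the unit circle produces a resonant peak:
w, mag_db = response_from_poles_zeros(
    zeros=[], poles=[0.95 * np.exp(1j * 0.3), 0.95 * np.exp(-1j * 0.3)])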

Zero-Phase Sound via Giant FFT

Vesa Välimäki, Stefan Bilbao, Sebastian J. Schlecht, Roope Salmi and David Zicarelli



Abstract: Given the speedy computation of the FFT in current computer hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio signal can be obtained by zeroing the phase angle of its complex spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to enhance the resulting sound quality. As a result, a sound with the same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are retained, however. Zero-phase sounds are palindromic in the sense that they are symmetric in time. A comparison of the zero-phase conversion to the autocorrelation function helps to understand its properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same autocorrelation function as the original sound. One exciting variation is to apply the method separately to the real and imaginary parts of the spectrum to produce a stereo effect. A frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
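
The core conversion is compact enough to sketch; the quality-enhancing steps the paper recommends (zero-padding, transient suppression, gain compensation) are deliberately omitted here.

import numpy as np

def zero_phase(x):
    # Zero the phase of the full-length ("giant") FFT: keeping only the
    # magnitude spectrum and inverting yields a time-symmetric (palindromic)
    # signal with the same magnitude spectrum as the input.
    X = np.fft.fft(x)
    y = np.real(np.fft.ifft(np.abs(X)))
    return np.fft.fftshift(y)   # center the symmetric result in time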

Stable Limit Cycles as Tunable Signal Sources

Wolfram E. Weingartner



Abstract: This paper presents a method for synthesizing audio signals from nonlinear dynamical systems exhibiting stable limit cycles, with control over frequency and amplitude independent of changes to the system’s internal parameters. Using the van der Pol oscillator and the Brusselator as case studies, it is demonstrated how parameters are decoupled from frequency and amplitude by rescaling the angular frequency and normalizing amplitude extrema. Practical implementation considerations are discussed, as are the limits and challenges of this approach. The method’s validity is evaluated experimentally and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation of transients in FM synthesis by means of a van der Pol oscillator and a Supersaw oscillator bank based on the Brusselator.
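
A crude sketch of the idea for the van der Pol case, rescaling time by the target angular frequency and normalizing amplitude extrema afterwards. Forward-Euler integration and the pitch accuracy are simplifying assumptions (for larger mu the limit-cycle period deviates from 2*pi/w0, which is part of what the paper addresses).

import numpy as np

def van_der_pol(f0=220.0, mu=5.0, fs=48000, dur=1.0):
    # Integrate x'' - mu*(1 - x^2)*x' + x = 0 with forward Euler, rescaling
    # time by w0 = 2*pi*f0 so the limit cycle repeats at roughly f0 Hz.
    w0, dt = 2 * np.pi * f0, 1.0 / fs
    x, v = 1.0, 0.0
    y = np.zeros(int(fs * dur))
    for n in range(len(y)):
        dx = v
        dv = mu * (1 - x * x) * v - x
        x += w0 * dt * dx
        v += w0 * dt * dv
        y[n] = x
    return y / np.max(np.abs(y))   # normalize amplitude extrema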

Lookup Table Based Audio Spectral Transformation

Ryoho Kobayashi



Abstract: We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed approach, the audio spectrum is visualized as a two-dimensional color map of frequency versus amplitude, serving as an editable lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has the potential to streamline audio-editing workflows and encourage creative experimentation. The approach also supports real-time processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers an accessible yet powerful framework for designing and applying a broad range of spectral audio effects through intuitive visual manipulation.
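
A simplified sketch of the idea: remapping STFT magnitudes through an editable amplitude lookup table while keeping the original phase. The paper’s LUT is a two-dimensional frequency-versus-amplitude map; this sketch collapses it to a single amplitude curve shared by all frequencies.

import numpy as np
from scipy.signal import stft, istft

def lut_spectral_transform(x, fs, amp_lut, nperseg=2048):
    # Normalize STFT magnitudes to [0, 1], remap them through the LUT,
    # and resynthesize with the unmodified phase.
    f, t, Z = stft(x, fs, nperseg=nperseg)
    mag, phase = np.abs(Z), np.angle(Z)
    mmax = mag.max() + 1e-12
    curve_in = np.linspace(0.0, 1.0, len(amp_lut))
    mag2 = mmax * np.interp(mag / mmax, curve_in, amp_lut)
    _, y = istft(mag2 * np.exp(1j * phase), fs, nperseg=nperseg)
    return y

# A concave LUT acts as a spectral compressor (quiet bins boosted):
# y = lut_spectral_transform(x, 48000, np.linspace(0, 1, 256) ** 0.5)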

A Non-Uniform Subband Implementation of an Active Noise Control System for Snoring Reduction

Stefano Nobili, Alessandro Nicolini, Ferruccio Bettarelli, Valeria Bruschi and Stefania Cecchi



Abstract: Snoring noise can be extremely annoying and can negatively affect people’s social lives. To mitigate this problem, active noise control (ANC) systems can be adopted for snoring cancellation. Recently, adaptive subband systems have been developed to improve the convergence rate and reduce the computational complexity of the ANC algorithm, and several structures with different approaches have been proposed. This paper proposes a non-uniform subband adaptive filtering (SAF) structure to improve a feedforward active noise control algorithm. The non-uniform band distribution allows for a higher frequency resolution at the lower frequencies, where the snoring noise is most concentrated. Several experiments have been carried out to evaluate the proposed system in comparison with a reference ANC system that uses a uniform approach.

Compositional Application of a Chaotic Dynamical System for the Synthesis of Sounds

Costantino Rizzuti



Abstract: The paper presents a review of compositional applications developed in recent years using a chaotic dynamical system in different sound synthesis processes. The use of chaotic dynamical systems in computer music has been a widespread practice for some time. The experimentation presented in this work shows the use of a specific chaotic system, the Chua oscillator, within different sound synthesis methods. A family of new musical instruments has been developed exploiting the potential offered by this chaotic system to produce complex timbres and sounds. The instruments have been used for the creation of musical pieces and for the realization of live electronics performances.

Delay Optimization Towards Smooth Sparse Noise

Cristóbal Andrade and Sebastian J. Schlecht



Abstract: Smooth sparse noise sequences are applied to efficiently model reverberation. This paper addresses the problem of optimizing sparse noise sequences for perceptual smoothness using gradient-based methods. We demonstrate that sinc-shaped artifacts introduced by fractional delay create non-convexities in an envelope-based roughness loss function, hindering delay optimization. By temporarily removing pulse polarity and omitting envelope rectification, we obtain a convex loss suitable for gradient descent. Pulse signs are reintroduced after optimization during synthesis. Optimization results show roughness reduction across various pulse densities, with the optimized sequences approaching the perceptual smoothness of velvet noise.
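
For orientation, the unoptimized starting point, a classic velvet-noise sequence with one ±1 pulse per grid cell, can be generated as follows; the paper’s contribution, gradient-based optimization of the (fractional) pulse positions, is not shown.

import numpy as np

def velvet_noise(fs=48000, dur=1.0, density=2000, rng=None):
    # Classic velvet noise: one +/-1 pulse at a random position inside each
    # grid cell of width fs/density samples.
    rng = rng or np.random.default_rng()
    n = int(fs * dur)
    grid = fs / density                    # average pulse spacing in samples
    y = np.zeros(n)
    for m in range(int(n / grid)):
        pos = int(m * grid + rng.uniform(0, grid))
        if pos < n:
            y[pos] = rng.choice([-1.0, 1.0])
    return y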

graetli: A Microcontroller-Based DSP Platform for Real-Time Audio Signal Processing

Jonas Roth, Silvan Krebs, and Christoph Studer



Abstract: This demonstration presents graetli, a standalone digital signal processing (DSP) platform for real-time audio applications, built around the Electrosmith Daisy Seed [1] microcontroller platform. graetli features high-quality analog audio I/O, a zero-latency analog dry signal path, a user interface with programmable potentiometers, and a rugged enclosure. graetli is suitable for both performance interaction and algorithm prototyping. To showcase its capabilities, we implement a frequency domain artificial reverberation algorithm. Conference visitors are invited to interact with the platform and experience the real-time DSP reverb algorithm.

A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis

Esteban Gutiérrez, Frederic Font, Xavier Serra and Lonce Wyse



Abstract: In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds characterized by stochastic structure and perceptual stationarity. Drawing inspiration from the statistical and perceptual framework of McDermott and Simoncelli, TexStat identifies similarities between signals belonging to the same texture category without relying on temporal structure. We also propose using TexStat as a validation metric alongside the Fréchet Audio Distance (FAD) to evaluate texture sound synthesis models. In addition to TexStat, we present TexEnv, an efficient, lightweight and differentiable texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored for texture sounds. Through extensive experiments across various texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that make it effective both as a loss function for generative tasks and as a validation metric. All tools and code are provided as open-source contributions and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use in both generative tasks and as a perceptually grounded evaluation metric.

DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions

Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas and Yuki Mitsufuji



Abstract: This study introduces a novel and interpretable model, DiffVox, for matching vocal effects in music production. DiffVox, short for “Differentiable Vocal Fx”, integrates parametric equalisation, dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for parameter estimation. Vocal presets are retrieved from two datasets, comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations reveals strong relationships between effects and parameters, such as the highpass and low-shelf filters often working together to shape the low end, and the delay time correlating with the intensity of the delayed signals. Principal component analysis reveals connections to McAdams’ timbre dimensions, where the most crucial component modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms the non-Gaussian nature of the parameter distribution, highlighting the complexity of the vocal effects space. These initial findings on the parameter distributions set the foundation for future research in vocal effects modelling and automatic mixing.

Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss

Tian Cheng, Tomoyasu Nakano and Masataka Goto



Abstract: This paper addresses the task of lyrics-to-audio alignment, which involves synchronizing textual lyrics with corresponding music audio. Most publicly available datasets for this task provide annotations only at the line or word level. This poses a challenge for training lyrics-to-audio models due to the lack of frame-wise phoneme labels. However, we find that phoneme labels can be partially derived from word-level annotations: for single-phoneme words, all frames corresponding to the word can be labeled with the same phoneme; for multi-phoneme words, phoneme labels can be assigned at the first and last frames of the word. To leverage this partial information, we construct a mask for those frames and propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model, we adopt an autoencoder trained with a Connectionist Temporal Classification (CTC) loss and a reconstruction loss. We then enhance the training process by incorporating the proposed frame-wise masked CE loss. Experimental results show that incorporating the frame-wise masked CE loss improves alignment performance. In comparison to other state-of-the-art models, our model provides a comparable Mean Absolute Error (MAE) of 0.216 seconds and a top Median Absolute Error (MedAE) of 0.041 seconds on the testing Jamendo dataset.
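
The proposed masked frame-wise CE loss reduces to a few lines in PyTorch; the tensor shapes are illustrative assumptions.

import torch
import torch.nn.functional as F

def masked_frame_ce(logits, labels, mask):
    # logits: (T, C) frame-wise phoneme logits; labels: (T,) phoneme ids;
    # mask: (T,) 1.0 where the phoneme label is known (single-phoneme words,
    # or the first/last frame of a multi-phoneme word), 0.0 elsewhere.
    per_frame = F.cross_entropy(logits, labels, reduction="none")
    return (per_frame * mask).sum() / mask.sum().clamp(min=1.0)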

Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search

Michele Rossi, Giovanni Iacca and Luca Turchet



Abstract: Recent studies on classifying electric guitar effects have achieved high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting mainly of single notes rather than realistic guitar recordings. Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, making them impractical for real-time or resource-constrained applications. In this work, we recorded realistic guitar performances using four different guitars and created three datasets by applying a chain of five effects with increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and parameters. We also propose a novel Neural Architecture Search method aimed at discovering accurate yet compact convolutional neural network models to reduce power and memory consumption. We compared its performance to a basic random search strategy, showing that our custom Neural Architecture Search outperformed random search in identifying models that balance accuracy and complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity grows, while dense layers have less impact. Additionally, among the effects, tremolo was identified as the most challenging to classify.

Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects

Christopher Johann Clarke and Jatin Chowdhury



Abstract: Structured pruning is a technique for reducing the computational load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule or ranking criterion. This paper investigates the application of structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures. We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To quantify the trade-off between parameter count and audio fidelity, we construct a theoretical model of the approximation error as a function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruning-induced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
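
A minimal sketch of inference-time structured pruning for a single fully connected layer, using an L2-magnitude ranking of output neurons (one of several possible criteria) and no retraining:

import torch

def prune_linear_rows(layer, keep_ratio=0.5):
    # Rank the output neurons of a torch.nn.Linear by the L2 norm of their
    # weight rows and keep only the strongest fraction.
    norms = layer.weight.detach().norm(dim=1)
    k = max(1, int(keep_ratio * layer.out_features))
    keep = torch.topk(norms, k).indices.sort().values
    pruned = torch.nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep])
    return pruned, keep   # 'keep' lets the next layer drop matching inputs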

Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches

Eloi Moliner, Michal Švento, Alec Wright, Lauri Juvela, Pavel Rajmic and Vesa Välimäki



Abstract: Accurately estimating nonlinear audio effects without access to paired input-output signals remains a challenging problem. This work studies unsupervised probabilistic approaches for solving this task. We introduce a method, novel for this application, based on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show that the diffusion-based approach provides more stable results and is less sensitive to data availability, while the adversarial approach is superior at estimating more pronounced distortion effects. Our findings contribute to the robust unsupervised blind estimation of audio effects, demonstrating the potential of diffusion models for system identification in music technology.

Empirical Results for Adjusting Truncated Backpropagation Through Time While Training Neural Audio Effects

Yann Bourdin, Pierrick Legrand and Fanny Roche



Abstract: This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters – sequence number, batch size, and sequence length – and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments across datasets with and without conditioning by user controls. Results demonstrate that carefully tuning these parameters enhances model accuracy and training stability, while also reducing computational demands. Objective evaluations confirm improved performance with optimized settings, while subjective listening tests indicate that the revised TBPTT configuration maintains high perceptual quality.
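
The mechanics of TBPTT can be sketched as follows, assuming a recurrent model that returns its hidden state (LSTM-style tuple) and a mean-squared-error objective; the chunk length seq_len is one of the hyperparameters the paper studies.

import torch

def tbptt_train_step(model, hidden, x, y, optimizer, seq_len=2048):
    # Truncated BPTT: split a long sequence into chunks of seq_len samples,
    # backpropagate within each chunk, and detach the recurrent state between
    # chunks so gradients do not flow across chunk boundaries.
    total = 0.0
    for t0 in range(0, x.shape[1], seq_len):
        xc, yc = x[:, t0:t0 + seq_len], y[:, t0:t0 + seq_len]
        optimizer.zero_grad()
        out, hidden = model(xc, hidden)
        loss = torch.mean((out - yc) ** 2)
        loss.backward()
        optimizer.step()
        hidden = tuple(h.detach() for h in hidden)   # cut the gradient path
        total += loss.item()
    return total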

Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer

Parakrant Sarkar and Permagnus Lindborg



Abstract: We present a Neural-Driven Multi-Band Processor (NDMP), a differentiable audio processing framework that augments a static six-band Parametric Equalizer (PEQ) with per-band dynamic range compression. We optimize this processor using neural inference for two tasks: Automatic Equalization (AutoEQ), which estimates tonal and dynamic corrections without a reference, and Production Style Transfer (NDMP-ST), which adapts the processing of an input signal to match the tonal and dynamic characteristics of a reference. We train NDMP using a self-supervised strategy, where the model learns to recover a clean signal from inputs degraded with randomly sampled NDMP parameters and gain adjustments. This setup eliminates the need for paired input–target data and enables end-to-end training with audio-domain loss functions. At inference time, AutoEQ enhances previously unseen inputs in a blind setting, while NDMP-ST performs style transfer by predicting task-specific processing parameters. We evaluate our approach on the MUSDB18 dataset using both objective metrics (e.g., SI-SDR, PESQ, STFT loss) and a listening test. Our results show that NDMP consistently outperforms traditional PEQ and a PEQ+DRC (single-band) baseline, offering a robust neural framework for audio enhancement that combines learned spectral and dynamic control.

TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration

Matteo Spanio and Antonio Rodà



Abstract: The increasing complexity and real-time processing demands of audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs). Existing Digital Signal Processing (DSP) libraries often do not provide the necessary efficiency and flexibility, particularly for integrating with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on the PyTorch framework, TorchFX offers an Object-Oriented interface similar to torchaudio but enhances functionality with a novel pipe operator for intuitive filter chaining. The library provides a comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel audio, thereby facilitating the integration of DSP and AI-based approaches. Our benchmarking results demonstrate significant efficiency gains over traditional libraries like SciPy, particularly in multichannel contexts. While there are current limitations in GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation in GPU-accelerated DSP. TorchFX is publicly available on GitHub at https://github.com/matteospanio/torchfx.

Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains

Aogu Wada, Tomohiko Nakamura and Hiroshi Saruwatari



Abstract: Audio effects (AFXs) are essential tools in music production, frequently applied in chains to shape timbre and dynamics. The order of AFXs in a chain plays a crucial role in determining the final sound, particularly when non-linear (e.g., distortion) or time-variant (e.g., chorus) processors are involved. Despite its importance, most AFX-related studies have primarily focused on estimating effect types and their parameters from a wet signal. To address this gap, we formulate AFX chain recognition as the task of jointly estimating AFX types and their order from a wet signal. We propose a neural-network-based method that embeds wet signals into a hyperbolic space and classifies their AFX chains. Hyperbolic space can represent tree-structured data more efficiently than Euclidean space due to its exponential expansion property. Since AFX chains can be represented as trees, with AFXs as nodes and edges encoding effect order, hyperbolic space is well-suited for modeling the exponentially growing and non-commutative nature of ordered AFX combinations, where changes in effect order can result in different final sounds. Experiments using guitar sounds demonstrate that, with an appropriate curvature, the proposed method outperforms its Euclidean counterpart. Further analysis based on AFX type and chain length highlights the effectiveness of the proposed method in capturing AFX order.
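
The geometric ingredient is the Poincaré-ball distance, which replaces the Euclidean distance when embedding and classifying in hyperbolic space; below is a direct implementation of the standard formula (the paper’s network and curvature handling are not shown).

import torch

def poincare_distance(u, v, eps=1e-6):
    # Geodesic distance on the Poincare ball:
    # d(u, v) = arcosh(1 + 2*|u-v|^2 / ((1-|u|^2) * (1-|v|^2))),
    # for points u, v strictly inside the unit ball.
    sq = torch.sum((u - v) ** 2, dim=-1)
    du = 1.0 - torch.sum(u * u, dim=-1)
    dv = 1.0 - torch.sum(v * v, dim=-1)
    x = 1.0 + 2.0 * sq / (du * dv).clamp(min=eps)
    return torch.acosh(x.clamp(min=1.0 + eps))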

Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning

Richard Mitic and Andreas Rossholm



Abstract: Estimations of panning attributes are an important feature to extract from a piece of recorded music, with downstream uses such as classification, quality assessment, and listening enhancement. While several algorithms exist in the literature, there is currently no comparison between them and no studies to suggest which one is most suitable for any particular task. This paper compares four algorithms for extracting amplitude panning features with respect to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the tracks is analysed. The results can be used in future work to either select the most appropriate panning feature algorithm or create a version customized for a particular task.

Unsupervised Text-to-Sound Mapping via Embedding Space Alignment

Luke Dzwonczyk and Carmine-Emanuele Cella



Abstract: This work focuses on developing an artistic tool that performs an unsupervised mapping between text and sound, converting an input text string into a series of sounds from a given sound corpus. With the use of a pre-trained sound embedding model and a separate, pre-trained text embedding model, the goal is to find a mapping between the two feature spaces. Our approach is unsupervised which allows any sound corpus to be used with the system. The tool performs the task of text-to-sound retrieval, creating a soundfile in which each word in the text input is mapped to a single sound in the corpus, and the resulting sounds are concatenated to play sequentially. We experiment with three different mapping methods, and perform quantitative and qualitative evaluations on the outputs. Our results demonstrate the potential of unsupervised methods for creative applications in text-to-sound mapping.

Generative Latent Spaces for Neural Synthesis of Audio Textures

Aaron Dees and Seán O'Leary



Abstract: This paper investigates the synthesis of audio textures and the structure of generative latent spaces using Variational Autoencoders (VAEs) within two paradigms of neural audio synthesis: DSP-inspired and data-driven approaches. For each paradigm, we propose VAE-based frameworks that allow fine-grained temporal control. We introduce datasets across three categories of environmental sounds to support our investigations. We evaluate and compare the models’ reconstruction performance using objective metrics, and investigate their generative capabilities and latent space structure through latent space interpolations.

RT-PAD-VC – Creative Applications of Neural Voice Conversion as an Audio Effect

Paolo Sani, Edgar Andres Suarez Guarnizo, Kishor Kayyar Lakshminarayana, and Christian Dittmar 



Abstract: Streaming-enabled voice conversion (VC) bears the potential for many creative applications as an audio effect. This demo paper details our low-latency, real-time implementation of the recently proposed Prosody-aware Decoder Voice Conversion (PAD-VC). Building on this technical foundation, we explore and demonstrate diverse use cases in creative processing of speech and vocal recordings. Enabled by its voice cloning capabilities and fine-grained controllability, RT-PAD-VC can be used as a low-delay, quasi real-time audio effects processor for gender conversion, timbre- and formant-preserving pitch-shifting, vocal harmonization and cross-synthesis from musical instruments. The on-site demo setup will allow participants to interact in a playful way with our technology.

SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications

Maurizio Berta and Daniele Ghisi



Abstract: Machine learning for sound generation is rapidly expanding within the computer music community. However, most datasets used to train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes. To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset, distributed under Creative Commons licenses, features annotations combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.

Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space

Christian Limberg, Fares Schulz, Zhe Zhang and Stefan Weinzierl



Abstract: This paper presents a novel approach to neural instrument sound synthesis using a two-stage semi-supervised learning framework capable of generating pitch-accurate, high-quality music samples from an expressive timbre latent space. Existing approaches that achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and provide unintuitive user experiences. We address this limitation through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a Variational Autoencoder; second, we use this representation as conditioning input for a Transformer-based generative model. The learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the proposed method effectively learns a disentangled timbre space, enabling expressive and controllable audio generation with reliable pitch conditioning. Experimental results show the model’s ability to capture subtle variations in timbre while maintaining a high degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential as a step towards future music production environments that are both intuitive and creatively empowering: https://pgesam.faresschulz.com/.

Neural Sample-Based Piano Synthesis

Riccardo Simionato and Stefano Fasciani



Abstract: Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and relative accuracy, despite presenting significant memory storage requirements. This paper proposes a novel hybrid approach to sample-based piano synthesis aimed at improving the fidelity of sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound recorded from a single example of a piano key at a given velocity. The network is trained to learn the nonlinear relationship between the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves high accuracy using a specific neural architecture that is computationally efficient, presenting few trainable parameters, and requires memory for only one sample per piano key.

Piano-SSM: Diagonal State Space Models for Efficient Midi-to-Raw Audio Synthesis

Dominik Dallinger, Matthias Bittner, Daniel Schnöll, Matthias Wess and Axel Jantsch



Abstract: Deep State Space Models (SSMs) have shown remarkable performance in long-sequence reasoning tasks, such as raw audio classification and audio generation. This paper introduces Piano-SSM, an end-to-end deep SSM neural network architecture designed to synthesize raw piano audio directly from MIDI input. The network requires no intermediate representations or domain-specific expert knowledge, simplifying training and improving accessibility. Quantitative evaluations on the MAESTRO dataset show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL) of 7.02 at 16 kHz, outperforming DDSP-Piano v1 with an MSSL of 7.09. At 24 kHz, Piano-SSM maintains competitive performance with an MSSL of 6.75, closely matching DDSP-Piano v2’s result of 6.58. Evaluations on the MAPS dataset achieve an MSSL score of 8.23, demonstrating generalization capability even when training with very limited data. Further analysis highlights Piano-SSM’s ability to train on high-sampling-rate audio while synthesizing audio at lower sampling rates, explicitly linking performance loss to aliasing effects. Additionally, the proposed model facilitates real-time causal inference through a custom C++17 header-only implementation. On an Intel Core i7-12700 processor at 4.5 GHz with single-core inference, the largest network synthesizes one second of audio at 44.1 kHz in 0.44 s, with a workload of 23.1 GFLOPS/s and a 10.1 µs input/output delay, while the smallest network at 16 kHz needs only 0.04 s, with 2.3 GFLOPS/s and a 2.6 µs input/output delay. These results underscore Piano-SSM’s practical utility and efficiency in real-time audio synthesis applications.

Non-Iterative Simulation: A Numerical Analysis Viewpoint

Alessia Andò (University of Udine)



Abstract: Stiff ordinary differential equations (ODEs) frequently appear in scientific and engineering applications, necessitating numerical methods that ensure stability and efficiency. Non-iterative approaches for stiff ODEs provide an alternative to fully implicit schemes, whose computation time often features a degree of unpredictability that can be unacceptable in real-time virtual analog applications. This tutorial will focus especially on Rosenbrock–Wanner (ROW) methods and exponential integration techniques, whose origins date back to the 1960s. ROW methods are linearly implicit: they replace the solution of nonlinear systems with a finite number of linear systems per step. Exponential integrators, on the other hand, incorporate stiff dynamics by leveraging matrix exponentials, and offer advantages in problems whose stiffness or oscillatory nature is mainly driven by their linear component. We will discuss the derivation, stability properties, and practical implementation of these methods, and compare their strengths, limitations, and potential for virtual analog in real-world applications through illustrative examples.
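
The simplest member of the ROW family, the linearly implicit (Rosenbrock) Euler method, makes the predictable-cost property concrete: a single linear solve replaces the Newton iteration of implicit Euler. A minimal sketch:

import numpy as np

def rosenbrock_euler_step(f, jac, y, h):
    # One step of the linearly implicit Euler method:
    #   (I - h*J) k = h*f(y),   y_next = y + k
    # One fixed-size linear solve per step means the per-sample cost is
    # bounded and predictable, unlike an iterated nonlinear solve.
    J = jac(y)
    I = np.eye(len(y))
    k = np.linalg.solve(I - h * J, h * f(y))
    return y + k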

Dr. Alessia Andò is a postdoctoral fellow at the Department of Mathematics, Computer Science and Physics, University of Udine, where she received her PhD in 2020. She also worked as a postdoc at the Gran Sasso Science Institute (GSSI), Italy. Within the general area of numerical analysis, her main research interests are ordinary and delay differential equations and the related dynamical systems and models, with a focus on both numerical time integration and the dynamical analysis of the models, including the computation of invariant sets and the study of their asymptotic stability.

Logarithmic Frequency Resolution Filter Design for Audio

Balázs Bank (Budapest University of Technology and Economics)



Abstract: Digital filters are often used to model or equalize acoustic or electroacoustic transfer functions. Applications include headphone, loudspeaker, and room equalization, or modeling the radiation of musical instruments for sound synthesis. As the final judge of quality is the human ear, filter design should take into account the quasi-logarithmic frequency resolution of the auditory system. This tutorial presents various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters, and discusses their differences and similarities. Application examples will include physics-based sound synthesis, loudspeaker and room equalization, and the equalization of a spherical loudspeaker array.
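
The ingredient shared by warped FIR and IIR designs is the first-order allpass substitution, whose phase defines the warped frequency axis; below is a numerical sketch of that mapping (lam ≈ 0.75 roughly matches the Bark scale at a 44.1 kHz sample rate).

import numpy as np

def warped_frequency_grid(n_points=512, lam=0.75):
    # Frequency mapping of the first-order allpass
    #   A(z) = (z^-1 - lam) / (1 - lam*z^-1)
    # used in warped filter design: the warped frequency is the allpass
    # phase lag, which concentrates resolution at low frequencies.
    w = np.linspace(1e-4, np.pi, n_points)
    z_inv = np.exp(-1j * w)
    A = (z_inv - lam) / (1.0 - lam * z_inv)
    return w, -np.unwrap(np.angle(A))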

Balázs Bank is an associate professor at the Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics (BUTE), Hungary. He received his M.Sc. and Ph.D. degrees in Electrical Engineering from BUTE in 2000 and 2006, and his Hungarian Academy of Sciences (MTA) doctoral degree in 2023. In the academic year 1999/2000 and in 2007 he was with the Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland. In 2008 he was with the Department of Computer Science, Verona University, Italy. Between 2000 and 2006, and again since 2009, he has been with BUTE. He was an Associate Editor for IEEE Signal Processing Letters in 2013–2016 and for IEEE Signal Processing Magazine in 2018–2022, and the lead Guest Editor for the 2022 JAES special issue “Audio Filter Design”. His research interests include physics-based sound synthesis and filter design for audio applications.

Building Flexible Audio DDSP Pipelines: A Case Study on Artificial Reverb

Gloria Dal Santo (Aalto University)



Abstract: This tutorial focuses on Differentiable Digital Signal Processing (DDSP) for audio synthesis, an approach that applies automatic differentiation to digital signal processing operations. By implementing signal models in a differentiable manner, it becomes possible to backpropagate loss gradients through their parameters, enabling data-driven optimization without losing domain knowledge. DDSP has gained popularity due to its domain-appropriate inductive biases, yet it still presents several challenges. The parameters of differentiable models are often constrained by stability conditions, affected by non-uniqueness issues, and may belong to different domains and distributions, making optimization nontrivial. This tutorial provides an overview of these limitations and introduces FLAMO, a library designed to facilitate more flexible training pipelines. A key focus will be on loss functions: how to select appropriate ones, insights from perceptually informed losses, and techniques for validating them. Demonstrations will use FLAMO, an open-source Python library built on PyTorch’s automatic differentiation framework. Practical examples will primarily centre on recursive systems for artificial reverberation applications.

Gloria Dal Santo received the M.Sc. degree in electrical and electronic engineering from the École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, in 2022, during which she interned with the Audio Machine Learning team at Logitech. She is currently working toward a doctoral degree with the Acoustics Lab at Aalto University, Espoo, Finland. Her research interests include artificial reverberation and audio applications of machine learning, with a focus on designing more robust and psychoacoustically informed systems.

Plausible Editing of our Acoustic Environment

Annika Neidhardt (University of Surrey)



Abstract: The technology and methods for creating spatial auditory illusions have evolved phenomenally to the point where we can create illusions that cannot be distinguished from reality anymore. However, so far, such convincing quality can only be achieved with accurate knowledge about the target environment based on measurements or detailed modelling. Rendering virtual content into previously unknown environments remains a challenge. Quick automatic characterisation of its acoustic properties is necessary. Which information do we need to extract to render convincing illusions? Moreover, to what extent can we become creative in manipulating the appearance of the actual environment without compromising its plausibility and vividness? This tutorial will give insight into the perceptual requirements for rendering audio for Augmented and Extended Reality.

Annika Neidhardt is a Senior Research Fellow in Immersive Audio at the University of Surrey and has been an active researcher on related topics for more than 10 years. She holds an MSc in Electrical Engineering (Automation & Robotics) from Technische Universität Chemnitz and an MSc in Audio Engineering (Computer Music & Multimedia) from the University of Music and Performing Arts Graz. After three years in advanced development and applied science, she started her own research project at Technische Universität Ilmenau in the group of Karlheinz Brandenburg in 2017, on 6DoF binaural audio and the related perceptual requirements and evaluation. She defended her PhD thesis on the plausibility of simplified room acoustic representations in Augmented Reality in May 2023. In addition, she has conducted research on the automatic characterisation of acoustic environments and on perceptual implications for audio in social VR and XR. Since autumn 2023, Annika has continued her research at the Institute of Sound Recording in Surrey with a stronger focus on room acoustic modelling and perceptual modelling.

DISCOVER THE POWER OF MODELING – PART 1: Synthesis Techniques for Acoustic Instrument Emulation

Audio Modeling



Abstract: This presentation introduces SWAM (Synchronous Wave Acoustic Modeling), a technology for accurate acoustic instrument emulation, developed by Audio Modeling. The development process and underlying synthesis techniques are discussed, highlighting their ability to reproduce expressive nuances. VariFlute, a physical model of the flute family, is presented as a case study demonstrating high realism and detailed playability.

DISCOVER THE POWER OF MODELING – PART 2: Room Modeling Combining Physical and Psychoacoustic Approaches

Audio Modeling



Abstract: This presentation discusses the need for an efficient spatializer to represent the continuous and coherent positioning of realistic instruments in a room. Ambiente, an integrated spatializer within SWAM, is presented, combining physical modeling and psychoacoustic principles to achieve accurate and immersive room simulation.

Bridging Symbolic and Audio Data: Score-Informed Music Performance Data Estimation

Johanna Devaney (Brooklyn College and the Graduate Center, CUNY)



Abstract: The empirical study of musical performance dates back to the birth of recorded media. From the laborious manual processes used in the earliest work to the current data-hungry end-to-end models, the estimation, modelling, and generation of expressive performance data remains challenging. This talk will consider the advantages of score-aligned performance data estimation, both for guiding signal processing algorithms and leveraging musical score data and other types of linked symbolic data (such as annotations) for analysing and modelling performance-related data. While the focus of this talk will primarily be on musical performance, connections to speech data will also be discussed, as well as the resultant potential for cross-modal analysis.

Johanna Devaney is an Associate Professor at Brooklyn College and the Graduate Center, CUNY, where she teaches courses in music theory, music technology, and data analysis. Johanna’s research primarily examines the ways in which recordings can be used to study and model performance, and she has developed computational tools to facilitate this. Her research on computational methods for audio understanding has been funded by the National Endowment for the Humanities (NEH) Digital Humanities program and the National Science Foundation (NSF). Johanna currently serves as Co-Editor-in-Chief of the Journal of New Music Research.

Reverberation – Dereverberation: The promise of hybrid models

Gaël Richard (Télécom Paris, Institut Polytechnique de Paris)



Abstract: The propagation of acoustic waves within enclosed environments is inherently shaped by complex interactions with surrounding surfaces and objects, leading to phenomena such as reflections, diffractions, and the resulting reverberation. Over the years, a wide range of reverberation models have been developed, driven by both theoretical interest and practical applications, including artificial reverberation synthesis—where realistic reverberation is added to anechoic signals—and dereverberation, which aims to suppress reverberant components in recorded signals. In this keynote, we will provide a concise overview of some reverberation modeling approaches and illustrate how these models can be integrated into hybrid frameworks that combine classical signal processing, physical modeling, and machine learning techniques to advance artificial reverberation synthesis or dereverberation.

Gaël Richard received the State Engineering degree from Telecom Paris, France, in 1990, and the Ph.D. degree and Habilitation from the University of Paris-Saclay in 1994 and 2001, respectively. After the Ph.D. degree, he spent two years at Rutgers University, Piscataway, NJ, in the Speech Processing Group of Prof. J. Flanagan. From 1997 to 2001, he successively worked for Matra, Bois d’Arcy, France, and for Philips, Montrouge, France. He then joined Telecom Paris, where he is now a Full Professor in audio signal processing. He is also the co-scientific director of the Hi! PARIS interdisciplinary center on AI and Data analytics. He is a coauthor of over 250 papers and an inventor on 10 patents. His research interests are mainly in the field of speech and audio signal processing and include topics such as source separation, machine learning methods for audio/music signals, and music information retrieval. He is a Fellow of the IEEE and was the chair of the IEEE SPS Technical Committee on Audio and Acoustic Signal Processing (2021–2022). In 2020, he received the Grand Prize of the IMT–French Academy of Sciences. In 2022, he was awarded an Advanced ERC grant of the European Union for a project on machine listening and artificial intelligence for sound.

Effecting Audio: An Entangled Approach to Signals, Concepts and Artistic Contexts

Andrew McPherson (Imperial College London)



Abstract: I propose to approach audio effects not as technical objects, but as a kind of activity. The shift from noun (“audio effect”) to verb (“effecting audio”, in the sense of applying transformations to sound) calls attention to the motivations, discourses and contexts in which audio processing, analysis and synthesis take place. We build audio-technical systems for specific reasons in specific situations. No system is ever devoid of sociocultural context or human intervention, and even the simplest technologies when examined in situ can exhibit fascinating complexity. My talk will begin with a stubbornly contrarian take on some seemingly obvious premises of musical audio processing. Physicist and feminist theorist Karen Barad writes that “language has been granted too much power.” I would like to propose that as designers and researchers, we can let words about music take precedence over the messy and open-ended experience of making music, but that becoming overly preoccupied with language risks propagating clichés and reinforcing cultural stereotypes. Drawing on recent scholarship in human-computer interaction and science and technology studies, I will recount some alternative approaches and possible futures for designing digital audio technology when human and technical factors are inextricably entangled. I will illustrate these ideas with recent projects from the Augmented Instruments Laboratory, with a focus on rich bidirectional couplings between digital and analog electronics, acoustics and human creative experience.

Navigation Instructions

In the Overview Tab:
  • Clicking on any event card will bring you to the detailed daily program, scrolled to the time and day of the event (hit the Overview tab to go back to the program summary).
In the Daily Program:
  • On page reload, the program opens at the next closest date of conference events. Hit any date tab to change day or the Overview tab to reach the overview of the program.
  • Clicking on the title of any paper will open a popup window showing the abstract of that paper (close [X] the popup window or press Esc to continue navigating the program).
  • Clicking on the author(s) will open the original paper PDF in another browser tab.
  • Clicking on any tutorial or keynote title will open a popup window with a summary of the talk and a short biography of the speaker(s).
In the Abstract PopUp Window:
  • Move the horizontal slider to change the width of the window, move the right scroll bar (if available) to browse the abstract.

About

This online conference program was generated by means of scripts developed for DAFx by Gianpaolo Evangelista