Program at a Glance
Conference days: Tue, Sept 2; Wed, Sept 3; Thu, Sept 4; Fri, Sept 5; Sat, Sept 6. Parallel events from different days are grouped by time slot below.

08:30  Registration (all day)
09:00  Registration (all day); Welcome Remarks (Auditorium Tamburi); Spatial Sound, Room Acoustics and Perception (Auditorium Tamburi); Deep Learning Methods, Effects and Data Analysis (Auditorium Tamburi)
09:30  Tutorial 1, Alessia Andò (Auditorium Tamburi); Virtual Analog (Auditorium Tamburi); Guided tour of Ancona: visit to the ancient historical town (meeting point: Piazza della Repubblica, at the stairs under the RAI sign)
10:30  Coffee Break (Foyer Tamburi) and Sponsor Demo 1 (Sala Boxe); Coffee Break (Foyer Tamburi) and Sponsor Demo 2 (Sala Boxe)
11:00  Coffee Break (Foyer Tamburi); Spatial Sound, Room Acoustics and Perception (Sala Boxe); Deep Learning Methods, Effects and Data Analysis (Sala Boxe)
11:30  Tutorial 2, Balázs Bank (Auditorium Tamburi); Virtual Analog (Sala Boxe)
12:00  Keynote 2, Gaël Richard (Auditorium Tamburi); Keynote 3, Andrew McPherson (Auditorium Tamburi)
12:30  Keynote 1, Johanna Devaney (Auditorium Tamburi)
13:00  Lunch (Foyer Tamburi)
13:30  Lunch (Foyer Tamburi)
14:30  Tutorial 3, Gloria Dal Santo (Auditorium Tamburi); Physical Modeling (Auditorium Tamburi); Signal Processing (Auditorium Tamburi); Deep Learning for Synthesis (Auditorium Tamburi)
15:45  Awards and Closing Session (Auditorium Tamburi)
16:00  Coffee Break (Foyer Tamburi)
16:15  Handover Address (Auditorium Tamburi)
16:30  Tutorial 4, Annika Neidhardt (Auditorium Tamburi); Physical Modeling (Sala Boxe); Virtual Analog and Physical Modeling (Sala Boxe); Signal Processing (Sala Boxe)
16:45  Board Meeting (venue TBA)
18:00  DAFx Welcome Aperitivo (Foyer Tamburi)
18:30  Aperitivo (Foyer Tamburi)
20:00  DAFx Banquet (MaWay); DAFx Concert: “Macchine Nostre” – A/V Performance for Italian Synthesizers (Auditorium Tamburi)

Color key (session types): Oral Sessions, Poster Sessions, Demo, Social Events, Other
Towards Efficient Emulation of Nonlinear Analog Circuits for Audio Using Constraint Stabilization and Convex Quadratic Programming
Miguel Zea and Luis A. Rivera
Abstract: This paper introduces a computationally efficient method for
the emulation of nonlinear analog audio circuits by combining state-space representations, constraint stabilization, and convex quadratic programming (QP). Unlike traditional virtual analog (VA) modeling approaches or computationally demanding
SPICE-based simulations, our approach reformulates the nonlinear
differential-algebraic (DAE) systems that arise from analog circuit
analysis into numerically stable optimization problems. The proposed method efficiently addresses the numerical challenges posed
by nonlinear algebraic constraints via constraint stabilization techniques, significantly enhancing robustness and stability, suitable
for real-time simulations. A canonical diode clipper circuit is presented as a test case, demonstrating that our method achieves accurate and faster emulations compared to conventional state-space
methods. Furthermore, our method performs very well even at
substantially lower sampling rates. Preliminary numerical experiments confirm that the proposed approach offers improved numerical stability and real-time feasibility, positioning it as a practical
solution for high-fidelity audio applications.
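For orientation, the conventional iterative baseline the abstract compares against can be sketched in a few lines: a first-order diode clipper discretized with backward Euler and solved per sample by Newton iteration. Component values and tolerances below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative circuit constants (assumed, not from the paper)
R, C = 2.2e3, 10e-9          # series resistance, capacitance
Is, Vt = 2.52e-9, 25.85e-3   # diode saturation current, thermal voltage

def f(v, vin):
    # State derivative of the first-order diode clipper ODE
    return (vin - v) / (R * C) - (2 * Is / C) * np.sinh(v / Vt)

def df_dv(v):
    return -1.0 / (R * C) - (2 * Is / (C * Vt)) * np.cosh(v / Vt)

def clipper_backward_euler(x, fs, newton_iters=20, tol=1e-9):
    """Conventional baseline: backward Euler with per-sample Newton solves."""
    h = 1.0 / fs
    y = np.zeros_like(x)
    v = 0.0
    for n, vin in enumerate(x):
        v_new = v                       # warm-start from the previous sample
        for _ in range(newton_iters):
            g = v_new - v - h * f(v_new, vin)
            step = g / (1.0 - h * df_dv(v_new))
            v_new -= step
            if abs(step) < tol:
                break
        v = v_new
        y[n] = v
    return y
```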
Simplifying Antiderivative Antialiasing with Lookup Table Integration
Leonardo Gabrielli and Stefano Squartini
Abstract: Antiderivative Antialiasing (ADAA) has become a pivotal method for reducing aliasing when dealing with nonlinear functions at audio rate. However, its implementation requires analytical computation of the antiderivative of the nonlinear function, which in practical cases can be challenging without a symbolic solver. Moreover, when the nonlinear function is given by measurements, it
must be approximated to get a symbolic description. In this paper, we propose a simple approach to ADAA for practical applications that employs numerical integration of lookup tables (LUTs)
to approximate the antiderivative. This method eliminates the need
for closed-form solutions, streamlining the ADAA implementation
process in industrial applications. We analyze the trade-offs of this
approach, highlighting its computational efficiency and ease of implementation while discussing the potential impact of numerical
integration errors on aliasing performance. Experiments are conducted with static nonlinearities (tanh, a simple wavefolder and
the Buchla 259 wavefolding circuit) and a stateful nonlinear system (the diode clipper).
Anti-Aliasing of Neural Distortion Effects via Model Fine Tuning
Alistair Carson, Alec Wright and Stefan Bilbao
Abstract: Neural networks have become ubiquitous with guitar distortion
effects modelling in recent years. Despite their ability to yield
perceptually convincing models, they are susceptible to frequency
aliasing when driven by high frequency and high gain inputs.
Nonlinear activation functions create both the desired harmonic
distortion and unwanted aliasing distortion as the bandwidth of
the signal is expanded beyond the Nyquist frequency. Here, we
present a method for reducing aliasing in neural models via a
teacher-student fine tuning approach, where the teacher is a pretrained model with its weights frozen, and the student is a copy of
this with learnable parameters. The student is fine-tuned against
an aliasing-free dataset generated by passing sinusoids through
the original model and removing non-harmonic components from
the output spectra.
Our results show that this method significantly suppresses aliasing for both long short-term memory (LSTM) networks and temporal convolutional networks (TCNs). In the
majority of our case studies, the reduction in aliasing was greater
than that achieved by two times oversampling. One side-effect
of the proposed method is that harmonic distortion components
are also affected.
This adverse effect was found to be model-dependent, with the LSTM models giving the best balance between
anti-aliasing and preserving the perceived similarity to an analog
reference device.
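The aliasing-free target construction lends itself to a short sketch. Frame length and helper names are our assumptions; we also assume an integer number of periods per frame so that harmonics fall exactly on FFT bins.

```python
import numpy as np

def harmonic_target(teacher, f0, fs, periods=32):
    """Pass a sinusoid through the (frozen) teacher model and keep only the
    harmonic bins of the output spectrum; the result serves as an
    aliasing-free training target for the student."""
    N = int(round(periods * fs / f0))
    f0 = periods * fs / N                  # snap f0 onto an exact FFT bin
    n = np.arange(N)
    y = teacher(np.sin(2 * np.pi * f0 * n / fs))
    Y = np.fft.rfft(y)
    mask = np.zeros_like(Y)
    mask[::periods] = 1.0                  # DC and integer multiples of f0
    return np.fft.irfft(Y * mask, n=N)
```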
MorphDrive: Latent Conditioning for Cross-Circuit Effect Modeling and a Parametric Audio Dataset of Analog Overdrive Pedals
Francesco Ardan Dal Rí, Domenico Stefani, Luca Turchet and Nicola Conci
Abstract: In this paper, we present an approach to the neural modeling of
overdrive guitar pedals with conditioning from a cross-circuit and
cross-setting latent space. The resulting network models the behavior of multiple overdrive pedals across different settings, offering continuous morphing between real configurations and hybrid
behaviors. Compact conditioning spaces are obtained through unsupervised training of a variational autoencoder with adversarial
training, resulting in accurate reconstruction performance across
different sets of pedals. We then compare three Hyper-Recurrent
architectures for processing, including dynamic and static Hyper-RNNs, and a smaller model for real-time processing. Additionally,
we present pOD-set, a new open dataset including recordings of
27 analog overdrive pedals, each with 36 gain and tone parameter combinations totaling over 97 hours of recordings. Precise parameter setting was achieved through a custom-deployed recording
robot.
Impedance Synthesis for Hybrid Analog-Digital Audio Effects
Francisco Bernardo, Matthew Davison and Andrew McPherson
Abstract: Most real systems, from acoustics to analog electronics, are
characterised by bidirectional coupling amongst elements rather
than neat, unidirectional signal flows between self-contained modules. Integrating digital processing into physical domains becomes
a significant engineering challenge when the application requires
bidirectional coupling across the physical-digital boundary rather
than separate, well-defined inputs and outputs. We introduce an
approach to hybrid analog-digital audio processing using synthetic
impedance: digitally simulated circuit elements integrated into an
otherwise analog circuit. This approach combines the physicality and classic character of analog audio circuits with the
precision and flexibility of digital signal processing (DSP). Our
impedance synthesis system consists of a voltage-controlled current source and a microcontroller-based DSP system. We demonstrate our technique through modifying an iconic guitar distortion pedal, the Boss DS-1, showing the ability of the synthetic
impedance to both replicate and extend the behaviour of the pedal’s
diode clipping stage. We discuss the behaviour of the synthetic
impedance in isolated laboratory conditions and in the DS-1 pedal,
highlighting the technical and creative potential of the technique as
well as its practical limitations and future extensions.
Antiderivative Antialiasing for Recurrent Neural Networks
Otto Mikkonen and Kurt James Werner
Abstract: Neural networks have become invaluable for general audio processing tasks, such as virtual analog modeling of nonlinear audio equipment.
For sequence modeling tasks in particular, recurrent neural networks (RNNs) have gained widespread adoption in recent years. Their general applicability and effectiveness
stem partly from their inherent nonlinearity, which makes them
prone to aliasing. Recent work has explored mitigating aliasing
by oversampling the network—an approach whose effectiveness is
directly linked with the incurred computational costs. This work
explores an alternative route by extending the antiderivative antialiasing technique to explicit, computable RNNs. Detailed applications to the Gated Recurrent Unit and Long Short-Term Memory cell are shown as case studies. The proposed technique is evaluated
on multiple pre-trained guitar amplifier models, assessing its impact on the amount of aliasing and model tonality. The method is
shown to reduce the models’ tendency to alias considerably across
all considered sample rates while only affecting their tonality moderately, without requiring high oversampling factors. The results
of this study can be used to improve sound quality in neural audio
processing tasks that employ a suitable class of RNNs. Additional
materials are provided in the accompanying webpage.
Towards Neural Emulation of Voltage-Controlled Oscillators
Riccardo Simionato and Stefano Fasciani
Abstract: Machine learning models have become ubiquitous in modeling
analog audio devices. Expanding on this line of research, our study
focuses on Voltage-Controlled Oscillators of analog synthesizers.
We employ black box autoregressive artificial neural networks to
model the typical analog waveshapes, including triangle, square,
and sawtooth. The models can be conditioned on wave frequency
and type, enabling the generation of pitch envelopes and morphing across waveshapes. We conduct evaluations on both synthetic
and analog datasets to assess the accuracy of various architectural
variants. The LSTM variant performed best, although lower frequency ranges present particular challenges.
Solid State Bus-Comp: A Large-Scale and Diverse Dataset for Dynamic Range Compressor Virtual Analog Modeling
Yicheng Gu, Runsong Zhang, Lauri Juvela and Zhizheng Wu
Abstract: Virtual Analog (VA) modeling aims to simulate the behavior
of hardware circuits via algorithms to replicate their tone digitally.
Dynamic Range Compressor (DRC) is an audio processing module
that controls the dynamics of a track by reducing the volume of loud sounds and amplifying quiet ones, which is essential in music
production. In recent years, neural-network-based VA modeling has
shown great potential in producing high-fidelity models. However,
due to the lack of data quantity and diversity, their generalization
ability in different parameter settings and input sounds is still limited. To tackle this problem, we present Solid State Bus-Comp, the
first large-scale and diverse dataset for modeling the classical VCA
compressor — SSL 500 G-Bus. Specifically, we manually collected
175 unmastered songs from the Cambridge Multitrack Library. We
recorded the compressed audio in 220 parameter combinations,
resulting in an extensive 2528-hour dataset with diverse genres, instruments, tempos, and keys. Moreover, to facilitate the use of our
proposed dataset, we conducted benchmark experiments in various
open-sourced black-box and grey-box models, as well as white-box
plugins. We also conducted ablation studies in different data subsets to illustrate the effectiveness of the improved data diversity and
quantity. The dataset and demos are on our project page: https://www.yichenggu.com/SolidStateBusComp/.
Real-Time Virtual Analog Modelling of Diode-Based VCAs
Coriander V. Pines
Abstract: Some early analog voltage-controlled amplifiers (VCAs) utilized
semiconductor diodes as a variable-gain element. Diode-based
VCAs exhibit a unique sound quality, with distortion dependent
both on signal level and gain control. In this work, we examine the
behavior of a simplified circuit for a diode-based VCA and propose
a nonlinear, explicit, stateless digital model. This approach avoids
traditional iterative algorithms, which can be computationally intensive. The resulting digital model retains the sonic characteristics
of the analog model and is suitable for real-time simulation. We
present an analysis of the gain characteristics and harmonic distortion produced by this model, as well as practical guidance for
implementation. We apply this approach to a set of alternative
analog topologies and introduce a family of digital VCA models
based on fixed nonlinearities with variable operating points.
Antialiasing in BBD Chips Using BLEP
Leonardo Gabrielli, Stefano D'Angelo and Stefano Squartini
Abstract: Several methods exist in the literature to accurately simulate Bucket
Brigade Device (BBD) chips, which are widely used in analog
delay-based audio effects for their characteristic lo-fi sound, shaped by noise, nonlinearities and aliasing. The latter is a desired quality, being typical of those chips. However, when simulating BBDs in the discrete-time domain, additional aliasing components occur that need to be suppressed. In this work, we
propose a novel method that applies the Bandlimited Step (BLEP)
technique, effectively minimizing aliasing artifacts introduced by
the simulation. The paper provides some insights on the design
of a BBD simulation using interpolation at the input for clock rate
conversion and, most importantly, shows how BLEP can be effective in reducing unwanted aliasing artifacts. Interpolation is shown
to have minor importance in the reduction of spurious components.
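For readers unfamiliar with BLEP, the snippet below shows the standard two-sample polyBLEP residual correcting a naive sawtooth. This is the generic building block only; how the paper applies it to the step discontinuities of a BBD simulation differs in the details.

```python
import numpy as np

def polyblep(t, dt):
    """Two-sample polynomial BLEP residual around a unit step.
    t is the phase in [0, 1), dt the phase increment per sample."""
    if t < dt:                    # sample just after the discontinuity
        x = t / dt
        return 2 * x - x * x - 1.0
    if t > 1.0 - dt:              # sample just before the discontinuity
        x = (t - 1.0) / dt
        return x * x + 2 * x + 1.0
    return 0.0

def saw_polyblep(f0, fs, num):
    """Naive sawtooth with the BLEP residual subtracted at each phase wrap."""
    dt = f0 / fs
    phase, out = 0.0, np.zeros(num)
    for n in range(num):
        out[n] = 2.0 * phase - 1.0 - polyblep(phase, dt)
        phase += dt
        if phase >= 1.0:
            phase -= 1.0
    return out
```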
Aliasing Reduction in Neural Amp Modeling by Smoothing Activations
Ryota Sato and Julius O. Smith III
Abstract: The increasing demand for high-quality digital emulations of analog audio hardware, such as vintage tube guitar amplifiers, led
to numerous works on neural network-based black-box modeling,
with deep learning architectures like WaveNet showing promising
results. However, a key limitation in all of these models was the
aliasing artifacts stemming from nonlinear activation functions in
neural networks. In this paper, we investigated novel and modified activation functions aimed at mitigating aliasing within neural
amplifier models. Supporting this, we introduced a novel metric,
the Aliasing-to-Signal Ratio (ASR), which quantitatively assesses
the level of aliasing with high accuracy. Measuring also the conventional Error-to-Signal Ratio (ESR), we conducted studies on a
range of preexisting and modern activation functions with varying
stretch factors. Our findings confirmed that activation functions
with smoother curves tend to achieve lower ASR values, indicating a noticeable reduction in aliasing. Notably, this improvement
in aliasing reduction was achievable without a substantial increase
in ESR, demonstrating the potential for high modeling accuracy
with reduced aliasing in neural amp models.
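One plausible way to measure an aliasing-to-signal ratio of this kind is sketched below; the paper's exact ASR definition may differ. The excitation frequency is snapped so that every harmonic falls on an FFT bin.

```python
import numpy as np

def aliasing_to_signal_ratio(model, f0, fs, periods=64):
    """Drive the model with a sinusoid spanning an exact number of periods,
    then compare non-harmonic (aliased) to harmonic energy, in dB."""
    N = int(round(periods * fs / f0))
    f0 = periods * fs / N                    # snap f0 onto an FFT bin
    y = model(np.sin(2 * np.pi * f0 * np.arange(N) / fs))
    Y = np.abs(np.fft.rfft(y)) ** 2
    harmonic = np.zeros(len(Y), dtype=bool)
    harmonic[::periods] = True               # DC and multiples of f0
    return 10 * np.log10(Y[~harmonic].sum() / Y[harmonic].sum())

# Example with a memoryless nonlinearity:
# aliasing_to_signal_ratio(np.tanh, 2093.0, 44100)
```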
Antialiased Black-Box Modeling of Audio Distortion Circuits Using Real Linear Recurrent Units
Fabián Esqueda and Shogo Murai
Abstract: In this paper, we propose the use of real-valued Linear Recurrent
Units (LRUs) for black-box modeling of audio circuits. A network architecture composed of real LRU blocks interleaved with
nonlinear processing stages is proposed.
Two case studies are presented: a second-order diode clipper and an overdrive distortion pedal. Furthermore, we show how to integrate the antiderivative antialiasing technique into the proposed method, effectively
lowering oversampling requirements. Our experiments show that
the proposed method generates models that accurately capture the
nonlinear dynamics of the examined devices and are highly efficient, which makes them suitable for real-time operation inside
Digital Audio Workstations.
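A minimal sketch of a real-valued linear recurrent block of the kind the title suggests: a bank of parallel first-order real recursions with a linear readout. Names and structure are our assumptions; in the paper such blocks are interleaved with nonlinear stages.

```python
import numpy as np

def real_lru_block(x, a, b, c, d):
    """h[n] = a * h[n-1] + b * x[n] (elementwise over parallel sections),
    y[n] = c . h[n] + d * x[n]. Stability requires |a_i| < 1."""
    h = np.zeros_like(a)
    y = np.zeros_like(x)
    for n, xn in enumerate(x):
        h = a * h + b * xn
        y[n] = c @ h + d * xn
    return y

# Example: 8 parallel sections with random stable real poles
rng = np.random.default_rng(0)
a = rng.uniform(-0.99, 0.99, 8)
b, c = rng.standard_normal(8), rng.standard_normal(8)
y = real_lru_block(np.sin(np.linspace(0, 20, 1000)), a, b, c, d=0.5)
```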
Training Neural Models of Nonlinear Multi-Port Elements Within Wave Digital Structures Through Discrete-Time Simulation
Oliviero Massi, Alessandro Ilic Mezza, Riccardo Giampiccolo and Alberto Bernardini
Abstract: Neural networks have been applied within the Wave Digital Filter
(WDF) framework as data-driven models for nonlinear multi-port
circuit elements. Conventionally, these models are trained on wave
variables obtained by sampling the current-voltage characteristic
of the considered nonlinear element before being incorporated into
the circuit WDF implementation. However, isolating multi-port
elements for this process can be challenging, as their nonlinear
behavior often depends on dynamic effects that emerge from interactions with the surrounding circuit. In this paper, we propose a
novel approach for training neural models of nonlinear multi-port
elements directly within a circuit’s Wave Digital (WD) discrete-time implementation, relying solely on circuit input-output voltage
measurements. Exploiting the differentiability of WD simulations,
we embed the neural network into the simulation process and optimize its parameters using gradient-based methods by minimizing
a loss function defined over the circuit output voltage. Experimental results demonstrate the effectiveness of the proposed approach
in accurately capturing the nonlinear circuit behavior, while preserving the interpretability and modularity of WDFs.
Distributed Single-Reed Modeling Based on Energy Quadratization and Approximate Modal Expansion
Champ C. Darabundit, Vasileios Chatziioannou and Gary Scavone
Abstract: Recently, energy quadratization and modal expansion have become popular methods for developing efficient physics-based
sound synthesis algorithms. These methods have been primarily
used to derive explicit schemes modeling the collision between
a string and a fixed barrier. In this paper, these techniques are
applied to a similar problem: modeling a distributed mouthpiece
lay-reed-lip interaction in a woodwind instrument. The proposed
model aims to provide a more accurate representation of how a musician’s embouchure affects the reed’s dynamics. The mouthpiece
and lip are modeled as distributed static and dynamic viscoelastic
barriers, respectively. The reed is modeled using an approximate
modal expansion derived via the Rayleigh-Ritz method. The reed
system is then acoustically coupled to a measured input impedance
response of a saxophone. Numerical experiments are presented.
A Wavelet-Based Method for the Estimation of Clarity of Attack Parameters in Non-Percussive Instruments
Gianpaolo Evangelista and Alberto Acquilino
Abstract: From the exploration of databases of instrument sounds to the self-assisted practice of musical instruments, methods for automatically
and objectively assessing the quality of musical tones are in high
demand. In this paper, we develop a new algorithm for estimating
the duration of the attack, with particular attention to wind and
bowed string instruments. In fact, for these instruments, the quality
of the tones is highly influenced by the attack clarity, for which,
together with pitch stability, the attack duration is an indicator often
used by teachers by ear. Since the direct estimation of the attack
duration from sounds is made difficult by the initial preponderance of the excitation noise, we propose a more robust approach
based on the separation of the ensemble of the harmonics from the
excitation noise, which is obtained by means of an improved pitch-synchronous wavelet transform. We also define a new parameter,
the noise ducking time, which is relevant for detecting the extent of
the noise component in the attack. In addition to the exploration of
available sound databases, for testing our algorithm, we created an
annotated data set in which several problematic sounds are included.
Moreover, to check the consistency and robustness of our duration
estimates, we applied our algorithm to sets of synthetic sounds with
noisy attacks of programmable duration.
Non-Iterative Numerical Simulation in Virtual Analog: A Framework Incorporating Current Trends
Alessia Andò, Enrico Bozzo and Federico Fontana
Abstract: Thanks to their low and constant computational cost, non-iterative methods for the solution of differential problems are gaining popularity in virtual analog, provided that their stability properties and accuracy afford their use without excessive temporal oversampling. At
least in some application case studies, one recent family of noniterative schemes has shown promise to outperform methods that
achieve accurate results at the cost of iterating several times while
converging to the numerical solution. Here, this family is contextualized and studied against known classes of non-iterative methods.
The results from these studies foster a more general discussion
about the possibilities, role and prospective use of non-iterative
methods in virtual analog.
Power-Balanced Drift Regulation for Scalar Auxiliary Variable Methods: Application to Real-Time Simulation of Nonlinear String Vibrations
Thomas Risse, Thomas Hélie and Stefan Bilbao
Abstract: Efficient stable integration methods for nonlinear systems are
of great importance for physical modeling sound synthesis. Specifically, a number of musical systems of interest, including vibrating
strings, bars or plates may be written as port-Hamiltonian systems
with quadratic kinetic energy and non-quadratic potential energy.
Efficient schemes have been developed for such systems through
the introduction of a scalar auxiliary variable. As a result, stable real-time simulation of nonlinear musical systems of up to a few thousand degrees of freedom is possible, even for nearly
lossless systems. However, convergence rates can be slow and
seem to be system-dependent. Specifically, at audio rates, they
may suffer from numerical drift of the auxiliary variable, resulting
in dramatic unwanted effects on audio output, such as pitch drifts
after several impacts on the same resonator.
In this paper, a novel method for mitigating this unwanted drift
while preserving power balance is presented, based on a control
approach. A set of modified equations is proposed to control the
drift artefact by rerouting energy through the scalar auxiliary variable and potential energy state. Numerical experiments are run
in order to check convergence on simulations in the case of a cubic nonlinear string. A real-time implementation is provided as
a Max/MSP external. 60-note polyphony is achieved on a laptop, and some simple high-level control parameters are provided, making the proposed implementation suitable for use in artistic contexts. All code is available in a public repository, along with compiled Max/MSP externals.
Fast Differentiable Modal Simulation of Non-Linear Strings, Membranes, and Plates
Rodrigo Diaz and Mark Sandler
Abstract: Modal methods for simulating vibrations of strings, membranes, and plates are widely used in acoustics and physically
informed audio synthesis. However, traditional implementations,
particularly for non-linear models like the von Kármán plate, are
computationally demanding and lack differentiability, limiting inverse modelling and real-time applications. We introduce a fast,
differentiable, GPU-accelerated modal framework built with the
JAX library, providing efficient simulations and enabling gradientbased inverse modelling.
Benchmarks show that our approach
significantly outperforms CPU- and GPU-based implementations,
particularly for simulations with many modes. Inverse modelling
experiments demonstrate that our approach can recover physical
parameters, including tension, stiffness, and geometry, from both
synthetic and experimental data. Although fitting physical parameters is more sensitive to initialisation compared to methods that
fit abstract spectral parameters, it provides greater interpretability
and more compact parameterisation. The code is released as open
source to support future research and applications in differentiable
physical modelling and sound synthesis.
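A toy version of the differentiable-modal workflow, written with JAX as in the paper but far simpler than its nonlinear models: a linear harmonic modal synthesizer whose physical-flavoured parameters can be fitted by gradient descent.

```python
import jax
import jax.numpy as jnp

def modal_tone(params, t, num_modes=40):
    """Toy differentiable modal synth: harmonic modal frequencies f0*m with a
    shared T60 decay. The paper's nonlinear (e.g. von Karman) models go well
    beyond this linear sketch."""
    f0, t60 = params
    m = jnp.arange(1, num_modes + 1)
    decay = jnp.log(1000.0) / t60                    # 60 dB decay over T60
    modes = (jnp.exp(-decay * t)[None, :]
             * jnp.sin(2 * jnp.pi * f0 * m[:, None] * t[None, :])
             / m[:, None])                           # assumed 1/m amplitudes
    return modes.sum(axis=0)

def loss(params, t, target):
    return jnp.mean((modal_tone(params, t) - target) ** 2)

# Gradient-based inverse modelling of (f0, T60) from a target signal
t = jnp.linspace(0.0, 1.0, 16000)
target = modal_tone(jnp.array([110.0, 0.8]), t)
grads = jax.grad(loss)(jnp.array([100.0, 1.0]), t, target)
```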
Learning Nonlinear Dynamics in Physical Modelling Synthesis Using Neural Ordinary Differential Equations
Victor Zheleznov, Stefan Bilbao, Alec Wright and Simon King
Abstract: Modal synthesis methods are a long-standing approach for modelling distributed musical systems. In some cases extensions are
possible in order to handle geometric nonlinearities. One such
case is the high-amplitude vibration of a string, where geometric nonlinear effects lead to perceptually important phenomena including pitch glides and a dependence of brightness on striking amplitude. A modal decomposition leads to a coupled nonlinear system of ordinary differential equations. Recent work in applied machine learning (in particular on neural ordinary differential equations) has shown that lumped dynamic systems such as electronic circuits can be modelled automatically from data. In this work,
we examine how modal decomposition can be combined with neural ordinary differential equations for modelling distributed musical systems. The proposed model leverages the analytical solution
for linear vibration of the system’s modes and employs a neural network to account for nonlinear dynamic behaviour. Physical parameters of the system remain easily accessible after training, without
the need for a parameter encoder in the network architecture. As
an initial proof of concept, we generate synthetic data for a nonlinear transverse string and show that the model can be trained to
reproduce the nonlinear dynamics of the system. Sound examples
are presented.
Physics-Informed Deep Learning for Nonlinear Friction Model of Bow-String Interaction
Xinmeng Luan and Gary Scavone
Abstract: This study investigates the use of an unsupervised, physics-informed deep learning framework to model a one-degree-of-freedom mass-spring system subjected to a nonlinear friction bow
force and governed by a set of ordinary differential equations.
Specifically, it examines the application of Physics-Informed Neural Networks (PINNs) and Physics-Informed Deep Operator Networks (PI-DeepONets). Our findings demonstrate that PINNs successfully address the problem across different bow force scenarios,
while PI-DeepONets perform well under low bow forces but encounter difficulties at higher forces. Additionally, we analyze the
Hessian eigenvalue density and visualize the loss landscape. Overall, the presence of large Hessian eigenvalues and sharp minima
indicates highly ill-conditioned optimization.
These results underscore the promise of physics-informed
deep learning for nonlinear modelling in musical acoustics, while
also revealing the limitations of relying solely on physics-based
approaches to capture complex nonlinearities. We demonstrate
that PI-DeepONets, with their ability to generalize across varying parameters, are well-suited for sound synthesis. Furthermore,
we demonstrate that the limitations of PI-DeepONets under higher
forces can be mitigated by integrating observation data within a
hybrid supervised-unsupervised framework. This suggests that a
hybrid supervised-unsupervised DeepONets framework could be
a promising direction for future practical applications.
Comparing Acoustic and Digital Piano Actions: Data Analysis and Key Insights
Michael Fioretti, Giuseppe Bergamino, Leonardo Gabrielli, Gianluca Ciattaglia and Susanna Spinsante
Abstract: The acoustic piano and its sound production mechanisms have been
extensively studied in the field of acoustics. Similarly, digital piano synthesis has been the focus of numerous signal processing
research studies. However, the role of the piano action in shaping the dynamics and nuances of piano sound has received less
attention, particularly in the context of digital pianos. Digital pianos are well-established commercial instruments that typically use
weighted keys with two or three sensors to measure the average
key velocity—this being the only input to a sampling synthesis
engine. In this study, we investigate whether this simplified measurement method adequately captures the full dynamic behavior of
the original piano action. After a brief review of the state of the art,
we describe an experimental setup designed to measure physical
properties of the keys and hammers of a piano. This setup enables
high-precision readings of acceleration, velocity, and position for
both the key and hammer across various dynamic levels. Through
extensive data analysis, we examine their relationships and identify
the optimal key position for velocity measurement. We also analyze
a digital piano key to determine where the average key velocity is
measured and compare it with our proposed optimal timing. We
find that the instantaneous key velocity just before let-off correlates
most strongly with hammer impact velocity, indicating a target
for improved sensing; however, due to the limitations of discrete
velocity sensing, this optimization alone may not suffice to replicate
the nuanced expressiveness of acoustic piano touch. This study
represents the first step in a broader research effort aimed at linking
piano touch, dynamics, and sound production.
Wave Pulse Phase Modulation: Hybridising Phase Modulation and Phase Distortion
Matthew Smart
Abstract: This paper introduces Wave Pulse Phase Modulation (WPPM), a
novel synthesis technique based on phase shaping. It combines
two classic digital synthesis techniques: Phase Modulation (PM)
and Phase Distortion (PD), aiming to overcome their respective
limitations while enabling the creation of new, interesting timbres.
It works by segmenting a phase signal into two regions, each independently driving the phase of a modulator waveform. This results
in two distinct pulses per period that together form the signal used
as the phase input to a carrier waveform, similar to PM, hence the
name Wave Pulse Phase Modulation. This method provides a minimal set of parameters that enable the creation of complex, evolving waveforms, and rich dynamic textures. By modulating these
parameters, WPPM can produce a wide range of interesting spectra, including those with formant-like resonant peaks. The paper
examines PM and PD in detail, exploring the modifications needed
to integrate them with WPPM, before presenting the full WPPM
algorithm alongside its parameters and creative possibilities. Finally, it discusses scope for further research and developments into
new similar phase shaping algorithms.
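A loose illustration of our reading of the description (parameter names and details are assumptions, not the published algorithm): a phase ramp is split into two segments, each renormalized to drive one pulse, and the two pulses per period phase-modulate a sine carrier.

```python
import numpy as np

def wppm_like(f0, fs, dur, split=0.5, index=2.0):
    """Split each phase period at 'split'; each segment is renormalized to
    [0, 1] and shaped into a half-sine pulse; the two pulses per period then
    phase-modulate a sine carrier."""
    n = np.arange(int(dur * fs))
    phase = (f0 * n / fs) % 1.0
    seg1 = np.clip(phase / split, 0.0, 1.0)
    seg2 = np.clip((phase - split) / (1.0 - split), 0.0, 1.0)
    mod = np.sin(np.pi * seg1) + np.sin(np.pi * seg2)   # two pulses per period
    return np.sin(2.0 * np.pi * phase + index * mod)
```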
Digital Morphophone Environment. Computer Rendering of a Pioneering Sound Processing Device
Daniel Scorranese
Abstract: This paper introduces a digital reconstruction of the morphophone,
a complex magnetophonic device developed in the 1950s within
the laboratories of the GRM (Groupe de Recherches Musicales)
in Paris. The analysis, design, and implementation methodologies
underlying the Digital Morphophone Environment are discussed.
Based on a detailed review of historical sources and limited
documentation – including a small body of literature and, most
notably, archival images – the core operational principles of the
morphophone have been modeled within the Max visual programming environment. The main goals of this work are, on the one
hand, to study and make accessible a now obsolete and unavailable
tool, and on the other, to provide the opportunity for new explorations in computer music and research.
Modeling the Impulse Response of Higher-Order Microphone Arrays Using Differentiable Feedback Delay Networks
Riccardo Giampiccolo, Alessandro Ilic Mezza, Mirco Pezzoli, Shoichi Koyama, Alberto Bernardini and Fabio Antonacci
Abstract: Recently, differentiable multiple-input multiple-output Feedback
Delay Networks (FDNs) have been proposed for modeling target multichannel room impulse responses by optimizing their parameters according to perceptually-driven time-domain descriptors. However, in spatial audio applications, frequency-domain
characteristics and inter-channel differences are crucial for accurately replicating a given soundfield. In this article, targeting the
modeling of the response of higher-order microphone arrays, we
improve on the methodology by optimizing the FDN parameters
using a novel spatially-informed loss function, demonstrating its
superior performance over previous approaches and paving the
way toward the use of differentiable FDNs in spatial audio applications such as soundfield reconstruction and rendering.
A Modified Algorithm for a Loudspeaker Line Array Multi-Lobe Control
Stefania Cecchi, Valeria Bruschi, Michele Frati, Marco Secondini and Andrea Tanoni
Abstract: The creation of personal sound zones is an effective solution
for delivering personalized auditory experiences in shared spaces.
Their applications span various domains, including in-car entertainment, home and office environments, and healthcare functions.
This paper presents a novel approach for the creation of personal
sound zones using a modified algorithm for multi-lobe control in
a loudspeaker line array. The proposed method integrates a pressure-matching beamforming algorithm with an innovative technique for
reducing side lobes, enhancing the precision and isolation of sound
zones.
The system was evaluated through simulations and experimental tests conducted in a semi-anechoic environment and a
large listening room. Results demonstrate the effectiveness of the
method in creating two separate sound zones.
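The textbook pressure-matching core that such systems build on can be sketched as a regularized least-squares solve; the paper's side-lobe reduction technique is not reproduced here.

```python
import numpy as np

def pressure_matching_weights(G, p_target, reg=1e-3):
    """Solve w = argmin ||G w - p||^2 + reg * ||w||^2 at one frequency.
    G: (num_control_points, num_drivers) complex transfer matrix;
    p_target: desired pressures (e.g. ones in the bright zone, zeros in
    the dark zone)."""
    L = G.shape[1]
    A = G.conj().T @ G + reg * np.eye(L)
    return np.linalg.solve(A, G.conj().T @ p_target)
```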
Estimation of Multi-Slope Amplitudes in Late Reverberation
Jeremy B. Bai and Sebastian J. Schlecht
Abstract: The common-slope model is used to model late reverberation of
complex room geometries such as multiple coupled rooms. The
model fits band-limited room impulse responses using a set of
common decay rates, with amplitudes varying based on listener
positions. This paper investigates amplitude estimation methods
within the common-slope model framework. We compare several traditional least squares estimation methods and propose using
LINEX regression, a Maximum Likelihood approach using log-squared RIR statistics. Through statistical analysis and simulation
tests, we demonstrate that LINEX regression improves accuracy
and reduces bias when compared to traditional methods.
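For orientation, the simplest traditional estimator in this setting is ordinary least squares on the energy decay with known common decay rates, as sketched below; this is a baseline of the kind compared against, not the proposed LINEX estimator.

```python
import numpy as np

def common_slope_amplitudes_ls(edc, t, t60s):
    """Fit EDC(t) ~ sum_i A_i * exp(-t / tau_i) by ordinary least squares,
    with the common decay rates fixed (given here as T60 values in seconds)."""
    taus = np.asarray(t60s, dtype=float) / np.log(1e6)   # T60 -> time constant
    B = np.exp(-t[:, None] / taus[None, :])              # (samples, slopes)
    A, *_ = np.linalg.lstsq(B, edc, rcond=None)
    return A
```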
Differentiable Scattering Delay Networks for Artificial Reverberation
Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena and Alberto Bernardini
Abstract: Scattering delay networks (SDNs) provide a flexible and efficient
framework for artificial reverberation and room acoustic modeling. In this work, we introduce a differentiable SDN, enabling
gradient-based optimization of its parameters to better approximate the acoustics of real-world environments. By formulating
key parameters such as scattering matrices and absorption filters
as differentiable functions, we employ gradient descent to optimize an SDN based on a target room impulse response. Our approach minimizes discrepancies in perceptually relevant acoustic
features, such as energy decay and frequency-dependent reverberation times. Experimental results demonstrate that the learned SDN
configurations significantly improve the accuracy of synthetic reverberation, highlighting the potential of data-driven room acoustic modeling.
Differentiable Attenuation Filters for Feedback Delay Networks
Ilias Ibnyahya and Joshua D. Reiss
Abstract: We introduce a novel method for designing attenuation filters in
digital audio reverberation systems based on Feedback Delay Networks (FDNs). Our approach uses Second Order Sections (SOS)
of Infinite Impulse Response (IIR) filters arranged as parametric
equalizers (PEQ), enabling fine control over frequency-dependent
reverberation decay. Unlike traditional graphic equalizer designs,
which require numerous filters per delay line, we propose a scalable solution where the number of filters can be adjusted. The frequency, gain, and quality-factor (Q) parameters are shared across delay lines; only the gain is adjusted based on delay
length. This design not only reduces the number of optimization
parameters, but also remains fully differentiable and compatible
with gradient-based learning frameworks. Leveraging principles
of analog filter design, our method allows for efficient and accurate filter fitting using supervised learning. Our method delivers
a flexible and differentiable design, achieving state-of-the-art performance while significantly reducing computational cost.
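The classical broadband rule that such frequency-dependent designs generalize assigns each delay line a gain tied to its length, so all lines decay at the same rate (standard FDN practice, not the paper's full PEQ design):

```python
import numpy as np

def delay_line_gains(delays_samples, t60_s, fs):
    """Per-line broadband gain g_i = 10 ** (-3 * d_i / (fs * T60)), i.e.
    each line loses 60 dB over T60 regardless of its delay length."""
    d = np.asarray(delays_samples, dtype=float)
    return 10.0 ** (-3.0 * d / (fs * t60_s))

# Example: delay_line_gains([1021, 1399, 1747, 2203], t60_s=2.0, fs=48000)
```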
Perceptual Decorrelator Based on Resonators
Jon Fagerström, Nils Meyer-Kahlen, Sebastian J. Schlecht and Vesa Välimäki
Abstract: Decorrelation filters transform mono audio into multiple decorrelated copies. This paper introduces a novel decorrelation filter design based on a resonator bank, which produces a sum of over a thousand exponentially decaying sinusoids. A headphone listening test was used to identify the minimum inter-channel time delays that perceptually match ERB-filtered coherent noise to corresponding incoherent noise. The decay rate of each resonator is set based on a group delay profile determined by the listening test results at its corresponding frequency. Furthermore, the delays from the test are used to refine frequency-dependent windowing in coherence estimation, which we argue represents the perceptually most accurate way of assessing interaural coherence. This coherence measure then guides an optimization process that adjusts the initial phases of the sinusoids to minimize the coherence between two instances of the resonator-based decorrelator. The delay results establish the necessary group delay per ERB for effective decorrelation, revealing higher-than-expected values, particularly at higher frequencies. For comparison, the optimization is also performed using two previously proposed group-delay profiles: one based on the period of the ERB band center frequency and another based on the maximum group-delay limit before introducing smearing. The results indicate that the perceptually informed profile achieves equal decorrelation to the latter profile while smearing less at high frequencies. Overall, optimizing the phase response of the proposed decorrelator yields significantly lower coherence compared to using a random phase.
Compression of Head-Related Transfer Functions Using Piecewise Cubic Hermite Interpolation
Tom Krueger and Julián Villegas
Abstract: We present a spline-based method for compressing and reconstructing Head-Related Transfer Functions (HRTFs) that preserves perceptual quality. Our approach focuses on the magnitude response and consists of four stages: (1) acquiring minimum-phase head-related impulse responses (HRIRs), (2) transforming
them into the frequency domain and applying adaptive Wiener
filtering to preserve important spectral features, (3) extracting a
minimal set of control points using derivative-based methods to
identify local maxima and inflection points, and (4) reconstructing
the HRTF using piecewise cubic Hermite interpolation (PCHIP)
over the refined control points. Evaluation on 301 subjects demonstrates that our method achieves an average compression ratio of
4.7:1 with spectral distortion ≤ 1.0 dB in each Equivalent Rectangular Band (ERB). The method preserves binaural cues with a
mean absolute interaural level difference (ILD) error of 0.10 dB.
Our method achieves about three times the compression obtained
with a PCA-based method.
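Stages (3) and (4) admit a compact SciPy sketch; the selection rule and names below are our assumptions.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def control_points(f, mag_db):
    """Sketch of stage (3): keep band edges plus local extrema and inflection
    points of the smoothed magnitude response (selection rule assumed)."""
    d1 = np.gradient(mag_db, f)
    d2 = np.gradient(d1, f)
    keep = np.zeros(len(f), dtype=bool)
    keep[[0, -1]] = True
    keep[1:-1] |= np.sign(d1[:-2]) != np.sign(d1[2:])   # local maxima/minima
    keep[1:-1] |= np.sign(d2[:-2]) != np.sign(d2[2:])   # inflection points
    return f[keep], mag_db[keep]

def reconstruct(f_ctrl, mag_ctrl, f_grid):
    """Stage (4): shape-preserving piecewise cubic Hermite reconstruction."""
    return PchipInterpolator(f_ctrl, mag_ctrl)(f_grid)
```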
Spatializing Screen Readers: Extending VoiceOver via Head-Tracked Binaural Synthesis for User Interface Accessibility
Giuseppe Bergamino, Michael Fioretti, Leonardo Gabrielli and Stefano Squartini
Abstract: Traditional screen-based graphical user interfaces (GUIs) pose significant accessibility challenges for visually impaired users. This
paper demonstrates how existing GUI elements can be translated
into an interactive auditory domain using high-order Ambisonics and inertial sensor-based head tracking, culminating in a realtime binaural rendering over headphones. The proposed system
is designed to spatialize the auditory output from VoiceOver, the
built-in macOS screen reader, aiming to foster clearer mental mapping and enhanced navigability.
A between-groups experiment
was conducted to compare standard VoiceOver with the proposed
spatialized version. Non-visually-impaired participants (n = 32),
with no visual access to the test interface, completed a list-based
exploration and then attempted to reconstruct the UI solely from
auditory cues. Experimental results indicate that the head-tracked
group achieved a slightly higher accuracy in reconstructing the interface, while user experience assessments showed no significant
differences in self-reported workload or usability. These findings
suggest that potential benefits may come from the integration of
head-tracked binaural audio into mainstream screen-reader workflows, but future investigations involving blind and low-vision users
are needed.
Although the experimental testbed uses a generic
desktop app, our ultimate goal is to tackle the complex visual layouts of music-production software, where a head-tracked audio
approach could benefit visually impaired producers and musicians
navigating plug-in controls.
Evaluating the Performance of Objective Audio Quality Metrics in Response to Common Audio Degradations
Xie He, Duncan Williams and Bruno Fazenda
Abstract: This study evaluates the performance of five objective audio quality metrics—PEAQ Basic, PEAQ Advanced, PEMO-Q, ViSQOL,
and HAAQI—in the context of digital music production. Unlike
previous comparisons, we focus on their suitability for production environments, an area currently underexplored in existing research. Twelve audio examples were tested using two evaluation
types: an effectiveness test under progressively increasing degradations (hum, hiss, clipping, glitches) and a robustness test under
fixed-level, randomly fluctuating degradations.
In the effectiveness test, HAAQI, PEMO-Q, and PEAQ Basic
effectively tracked degradation changes, while PEAQ Advanced
failed consistently and ViSQOL showed low sensitivity to hum
and glitches. In the robustness test, ViSQOL and HAAQI demonstrated the highest consistency, with average standard deviations
of 0.004 and 0.007, respectively, followed by PEMO-Q (0.021),
PEAQ Basic (0.057), and PEAQ Advanced (0.065).
However,
ViSQOL also showed low variability across audio examples, suggesting limited genre sensitivity.
These findings highlight the strengths and limitations of each
metric for music production, specifically quality measurement with
compressed audio. The source code and dataset will be made publicly available upon publication.
Room Acoustic Modelling Using a Hybrid Ray-Tracing/Feedback Delay Network Method
Haowen Zhao, Akihiko Suyama, Kazunobu Kondo and Damian T. Murphy
Abstract: Combining different room acoustic modelling methods could provide a better balance between perceptual plausibility and computational efficiency than using a single and potentially more computationally expensive model. In this work, a hybrid acoustic modelling system that integrates ray tracing (RT) with an advanced
feedback delay network (FDN) is designed to generate perceptually plausible RIRs. A multiple stimuli with hidden reference
and anchor (MUSHRA) test and a two-alternative-forced-choice
(2AFC) discrimination task have been conducted to compare the
proposed method against ground truth recordings and conventional
RT-based approaches. The results show that the proposed system
delivers robust performance in various scenarios, achieving highly
plausible reverberation synthesis.
DataRES and PyRES: A Room Dataset and a Python Library for Reverberation Enhancement System Development, Evaluation, and Simulation
Gian Marco De Bortoli, Karolina Prawda, Philip Coleman and Sebastian J. Schlecht
Abstract: Reverberation is crucial in the acoustical design of physical
spaces, especially halls for live music performances. Reverberation Enhancement Systems (RESs) are active acoustic systems that
can control the reverberation properties of physical spaces, allowing them to adapt to specific acoustical needs. The performance of
RESs strongly depends on the properties of the physical room and
the architecture of the Digital Signal Processor (DSP). However,
room-impulse-response (RIR) measurements and the DSP code
from previous studies on RESs have never been made open access, leading to non-reproducible results. In this study, we present
DataRES and PyRES—an RIR dataset and a Python library to increase the reproducibility of studies on RESs. The dataset contains RIRs measured in RES research and development rooms and
professional music venues. The library offers classes and functionality for the development, evaluation, and simulation of RESs.
The implemented DSP architectures are made differentiable, allowing their components to be trained in a machine-learning-like
pipeline. The replication of previous studies by the authors shows
that PyRES can become a useful tool in future research on RESs.
Auditory Discrimination of Early Reflections in Virtual Rooms
Junting Chen, Duncan Williams and Bruno Fazenda
Abstract: This study investigates the perceptual sensitivity to early reflection changes across different spatial directions in a virtual
reality (VR) environment. Using an ABX discrimination paradigm, participants evaluated speech stimuli convolved with thirdorder Ambisonic room impulse responses under three position
reversal conditions (Left–Right, Front–Back, and Floor–Ceiling) and three
reverberation conditions (RT60 = 1.0 s, 0.6 s, and 0.2 s). Binomial tests revealed that participants consistently detected early reflection differences in the Left–Right reversal, while discrimination performance in the other two directions remained at or near
chance. This result can be explained by the higher acuity and lower localisation blur of the human auditory system in the lateral (left–right) dimension. A
two-way ANOVA confirmed a significant main effect of spatial
position (p = 0.00685, η² = 0.1605), with no significant effect of
reverberation or interaction. The analysis of the binaural room
impulse responses showed waveform and Direct-to-Reverberant Ratio (DRR) differences in the Left–Right reversal position, aligning
with perceptual results. However, no definitive causal link between DRR variations and perceptual outcomes can yet be established.
Partiels – Exploring, Analyzing and Understanding Sounds
Pierre Guillot
Abstract: This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics. The application uses Vamp plug-ins to extract various kinds of information on different aspects of the sound, such
as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.
Listener-Adaptive 3D Audio with Crosstalk Cancellation
Francesco Veronesi, Filippo Fazi and Jacob Hollebon
Abstract: Crosstalk cancellation is a technology that allows the delivery of binaural audio over loudspeakers using loudspeaker beamforming, without the need for headphones. It enables spatial audio to be reproduced using practical loudspeaker distributions, for example a soundbar of loudspeakers positioned in front of the user only.
Crosstalk cancellation requires the user to be positioned at a specific location in space, the 'sweet-spot'. However, by using a built-in camera or sensor, the listener's ear position relative to the audio device can be tracked in real time, enabling a mobile sweet-spot through precise beamforming and effective crosstalk cancellation no matter where the listener is positioned.
This demo allows users to experience listener-adaptive crosstalk cancellation developed by Audioscenic, on a multi-loudspeaker gaming soundbar. Audioscenic develops advanced crosstalk cancellation solutions for home audio, gaming, automotive, and public space applications. Founded in 2017 by Dr Marcos Simón and Professor Filippo Fazi, the company emerged from their collaborative research at the Institute of Sound and Vibration Research, University of Southampton.
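The classical frequency-domain core of crosstalk cancellation is a regularized per-bin matrix inversion, sketched generically below (not Audioscenic's implementation):

```python
import numpy as np

def ctc_filters(H, beta=1e-2):
    """Per-bin regularized inversion of the 2x2 plant:
    C(w) = (H^H H + beta*I)^-1 H^H, so that H @ C ~ I at the ears.
    H: (num_bins, 2, 2) complex, ears x loudspeakers."""
    Hh = np.conj(np.swapaxes(H, -1, -2))
    A = Hh @ H + beta * np.eye(2)
    return np.linalg.solve(A, Hh)            # (num_bins, 2, 2)

# Speaker feeds from binaural target spectra D of shape (num_bins, 2):
# feeds = (ctc_filters(H) @ D[..., None])[..., 0]
```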
Biquad Coefficients Optimization via Kolmogorov-Arnold Networks
Ayoub Malek, Donald Schulz and Felix Wuebbelmann
Abstract: Conventional Deep Learning (DL) approaches to Infinite Impulse
Response (IIR) filter coefficient estimation from arbitrary frequency responses are quite limited. They often suffer from inefficiencies such as tight training requirements, high complexity, and
limited accuracy. As an alternative, in this paper, we explore the
use of Kolmogorov-Arnold Networks (KANs) to predict the IIR
filter—specifically biquad coefficients—effectively. By leveraging the high interpretability and accuracy of KANs, we achieve
smooth optimization of the coefficients. Furthermore, by constraining
the search space and exploring different loss functions, we demonstrate improved performance in speed and accuracy. Our approach
is evaluated against other existing differentiable IIR filter solutions. The results show significant advantages of KANs over existing methods, offering steadier convergences and more accurate
results. This offers new possibilities for integrating digital IIR filters into deep-learning frameworks.
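The differentiable quantity at the heart of such estimation is the biquad's frequency response as a function of its coefficients. A generic NumPy version is shown below; a training setup would express the same formula in an autodiff framework.

```python
import numpy as np

def biquad_response(coeffs, w):
    """Frequency response of one biquad section from its coefficients.
    Convention: H(z) = (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2),
    with w given in radians/sample."""
    b0, b1, b2, a1, a2 = coeffs
    zi = np.exp(-1j * w)                      # z^-1 on the unit circle
    return (b0 + b1 * zi + b2 * zi**2) / (1.0 + a1 * zi + a2 * zi**2)

# Example: magnitude in dB on a log-spaced grid, as a matching target
w = np.logspace(-3, 0, 256) * np.pi
mag_db = 20 * np.log10(np.abs(biquad_response((0.2, 0.4, 0.2, -0.5, 0.3), w)))
```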
Audio Processor Parameters: Estimating Distributions Instead of Deterministic Values
Côme Peladeau, Dominique Fourer and Geoffroy Peeters
Abstract: Audio effects and sound synthesizers are widely used processors in popular music. Their parameters control the quality of the output sound, and multiple combinations of parameters can lead to the same sound. While recent approaches have been proposed to estimate these parameters given only the output sound, they are deterministic, i.e., they only estimate a single solution among the many possible parameter configurations. In this work, we propose to model the parameters as probability distributions instead of deterministic values. To learn the distributions, we optimize two objectives: (1) we minimize the reconstruction error between the ground-truth output sound and the one generated using the estimated parameters, as is usually done, but also (2) we maximize the parameter diversity, using entropy. We evaluate our approach through two numerical audio experiments to show its effectiveness. These results show how our approach effectively outputs multiple combinations of parameters to match one sound.
A Parametric Equalizer with Interactive Poles and Zeros Control for Digital Signal Processing Education
Andrea Casati, Giorgio Presti and Marco Tiraboschi
Abstract: This article presents ZePolA, a digital audio equalizer designed
as an educational resource for understanding digital filter design.
Unlike conventional equalization plug-ins, which define the frequency response first and then derive the filter coefficients, this
software adopts an inverse approach: users directly manipulate the
placement of poles and zeros on the complex plane, with the corresponding frequency response visualized in real time. This methodology provides an intuitive link between theoretical filter concepts
and their practical application. The plug-in features three main
panels: a filter parameter panel, a frequency response panel, and a
filter design panel. It allows users to configure a cascade of first- or second-order filter elements, each parameterized by the location of its poles or zeros. The GUI supports interaction through
drag-and-drop gestures, enabling immediate visual and auditory
feedback. This hands-on approach is intended to enhance learning
by bridging the gap between theoretical knowledge and practical
application. To assess the educational value and usability of the
plug-in, a preliminary evaluation was conducted with focus groups
of students and lecturers. Future developments will include support for additional filter types and increased architectural flexibility. Moreover, a systematic validation study involving students
and educators is proposed to quantitatively evaluate the plug-in’s
impact on learning outcomes. This work contributes to the field
of digital signal processing education by offering an innovative
tool that merges the hands-on approach of music production with
a deeper theoretical understanding of digital filters, fostering an
interactive and engaging educational experience.
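For readers new to this inverse approach, a minimal sketch of how the frequency response follows directly from user-placed poles and zeros (illustrative only, not the plug-in's implementation):

```python
import numpy as np

def freq_response(zeros, poles, gain=1.0, n=512):
    w = np.linspace(0, np.pi, n)
    z = np.exp(1j * w)                 # evaluate H(z) on the unit circle
    H = gain * np.ones_like(z)
    for q in zeros:                    # each zero contributes a factor (1 - q z^-1)
        H *= 1 - q / z
    for p in poles:                    # each pole contributes 1 / (1 - p z^-1);
        H /= 1 - p / z                 # |p| < 1 keeps the filter stable
    return w, H

# A conjugate pole pair near the unit circle creates a resonant peak:
w, H = freq_response(zeros=[0.9j, -0.9j], poles=[0.5 + 0.4j, 0.5 - 0.4j])
```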
Zero-Phase Sound via Giant FFT
Vesa Välimäki, Stefan Bilbao, Sebastian J. Schlecht, Roope Salmi and David Zicarelli
Abstract: Given the speedy computation of the FFT in current computer
hardware, there are new possibilities for examining transformations for very long sounds. A zero-phase version of any audio
signal can be obtained by zeroing the phase angle of its complex
spectrum and taking the inverse FFT. This paper recommends additional processing steps, including zero-padding, transient suppression at the signal’s start and end, and gain compensation, to
enhance the resulting sound quality. As a result, a sound with the
same spectral characteristics as the original one, but with different temporal events, is obtained. Repeating rhythm patterns are
retained, however. Zero-phase sounds are palindromic in the sense
that they are symmetric in time. A comparison of the zero-phase
conversion to the autocorrelation function helps to understand its
properties, such as why the rhythm of the original sound is emphasized. It is also argued that the zero-phase signal has the same
autocorrelation function as the original sound. One exciting variation is to apply the method separately to the real
and imaginary parts of the spectrum to produce a stereo effect. A
frame-based technique enables the use of the zero-phase conversion in real-time audio processing. The zero-phase conversion is
another member of the giant FFT toolset, allowing the modification of sampled sounds, such as drum loops or entire songs.
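The basic transform is compact enough to sketch (zero-padding included; the paper's transient suppression and gain compensation steps are omitted here):

```python
import numpy as np

def zero_phase(x, pad_factor=1):
    N = len(x) * (1 + pad_factor)      # zero-pad before the giant FFT
    X = np.fft.rfft(x, n=N)
    y = np.fft.irfft(np.abs(X), n=N)   # zero phase: keep only the magnitude
    return np.fft.fftshift(y)          # center the time-symmetric (palindromic) result

x = np.random.randn(2**16)             # stand-in for a drum loop or a whole song
y = zero_phase(x)
```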
Partiels – Exploring, Analyzing and Understanding Sounds
Pierre Guillot
Abstract: This article presents Partiels, an open-source application developed at IRCAM to analyze digital audio files and explore sound characteristics.
The application uses Vamp plug-ins to
extract various information on different aspects of the sound, such
as spectrum, partials, pitch, tempo, text, and chords. Partiels is the
successor to AudioSculpt, offering a modern, flexible interface for
visualizing, editing, and exporting analysis results, addressing a
wide range of issues from musicological practice to sound creation
and signal processing research. The article describes Partiels’ key
features, including analysis organization, audio file management,
results visualization and editing, as well as data export and sharing
options, and its interoperability with other software such as Max
and Pure Data. In addition, it highlights the numerous analysis
plug-ins developed at IRCAM, based in particular on machine
learning models, as well as the IRCAM Vamp extension, which
overcomes certain limitations of the original Vamp format.
Stable Limit Cycles as Tunable Signal Sources
Wolfram E. Weingartner
Abstract: This paper presents a method for synthesizing audio signals from
nonlinear dynamical systems exhibiting stable limit cycles, with
control over frequency and amplitude independent of changes to
the system’s internal parameters. Using the van der Pol oscillator
and the Brusselator as case studies, it is demonstrated how parameters are decoupled from frequency and amplitude by rescaling the
angular frequency and normalizing amplitude extrema. Practical
implementation considerations are discussed, as are the limits and
challenges of this approach. The method’s validity is evaluated experimentally and synthesis examples show the application of tunable nonlinear oscillators in sound design, including the generation
of transients in FM synthesis by means of a van der Pol oscillator
and a Supersaw oscillator bank based on the Brusselator.
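A minimal sketch of the tuning idea on the van der Pol oscillator (the explicit integration scheme and constants are assumptions; the paper's exact period correction and amplitude normalization are omitted):

```python
import numpy as np

def van_der_pol(f0=220.0, mu=1.0, sr=48000, dur=1.0):
    w0 = 2 * np.pi * f0                # time rescaling by the target frequency
    dt = 1.0 / sr
    x, y = 1e-3, 0.0
    out = np.empty(int(sr * dur))
    for n in range(out.size):
        # rescaled van der Pol: x' = w0*y,  y' = w0*(mu*(1 - x^2)*y - x)
        x += dt * w0 * y
        y += dt * w0 * (mu * (1.0 - x * x) * y - x)
        out[n] = x
    return out

sig = van_der_pol()                    # ~220 Hz for small mu
```

For small mu the limit cycle completes close to f0 cycles per second regardless of mu; for strongly nonlinear settings the period drifts, which is precisely why full decoupling requires the rescaling and normalization the paper describes.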
Lookup Table Based Audio Spectral Transformation
Ryoho Kobayashi
Abstract: We present a unified visual interface for flexible spectral audio manipulation based on editable lookup tables (LUTs). In the proposed
approach, the audio spectrum is visualized as a two-dimensional
color map of frequency versus amplitude, serving as an editable
lookup table for modifying the sound. This single tool can replicate common audio effects such as equalization, pitch shifting, and
spectral compression, while also enabling novel sound transformations through creative combinations of adjustments. By consolidating these capabilities into one visual platform, the system has
the potential to streamline audio-editing workflows and encourage
creative experimentation. The approach also supports real-time
processing, providing immediate auditory feedback in an interactive graphical environment. Overall, this LUT-based method offers
an accessible yet powerful framework for designing and applying
a broad range of spectral audio effects through intuitive visual manipulation.
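A sketch of the core mechanism (an illustrative reimplementation, not the paper's tool): an editable table indexed by frequency bin and quantized magnitude returns a gain applied to each STFT frame.

```python
import numpy as np

def apply_spectral_lut(frame, lut):
    X = np.fft.rfft(frame * np.hanning(frame.size))
    mag, phase = np.abs(X), np.angle(X)
    n_levels = lut.shape[1]
    # quantize each bin's normalized magnitude to index the editable table
    idx = np.minimum((mag / (mag.max() + 1e-12) * (n_levels - 1)).astype(int),
                     n_levels - 1)
    gain = lut[np.arange(mag.size), idx]          # per-bin, per-level gain
    return np.fft.irfft(gain * mag * np.exp(1j * phase))

frame_len = 1024
lut = np.ones((frame_len // 2 + 1, 64))           # all-ones table = pass-through
y = apply_spectral_lut(np.random.randn(frame_len), lut)
```

Editing rows of the table reshapes the spectrum (equalization-like behavior), while editing along the magnitude axis yields level-dependent mappings such as spectral compression.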
A Non-Uniform Subband Implementation of an Active Noise Control System for Snoring Reduction
Abstract: Snoring noise can be extremely annoying and can negatively affect people’s social lives. To reduce this problem, active noise
affect people’s social lives. To reduce this problem, active noise
control (ANC) systems can be adopted for snoring cancellation.
Recently, adaptive subband systems have been developed to improve the convergence rate and reduce the computational complexity of the ANC algorithm. Several structures have been proposed
with different approaches. This paper proposes a non-uniform subband adaptive filtering (SAF) structure to improve a feedforward
active noise control algorithm. The non-uniform band distribution
allows for a higher frequency resolution of the lower frequencies,
where the snoring noise is most concentrated. Several experiments
have been carried out to evaluate the proposed system in comparison with a reference ANC system which uses a uniform approach.
Compositional Application of a Chaotic Dynamical System for the Synthesis of Sounds
Costantino Rizzuti
Abstract: The paper presents a review of compositional applications developed in recent years using a chaotic dynamical system in different
sound synthesis processes. The use of chaotic dynamical systems
in computer music has been a widespread practice for some time
now. The experimentation presented in this work shows the use
of a specific chaotic system, Chua’s oscillator, within different
sound synthesis methods. A family of new musical instruments
has been developed exploiting the potential offered by the use of
this chaotic system to produce complex timbres and sounds. The
instruments have been used for the creation of musical pieces and
for the realization of live electronics performances.
Delay Optimization Towards Smooth Sparse Noise
Cristóbal Andrade and Sebastian J. Schlecht
Abstract: Smooth sparse noise sequences are applied to efficiently model
reverberation. This paper addresses the problem of optimizing
sparse noise sequences for perceptual smoothness using gradient-
based methods. We demonstrate that sinc-shaped artifacts introduced by fractional delay create non-convexities in an envelope-based roughness loss function, hindering delay optimization. By temporarily removing pulse polarity and omitting envelope rectification, we obtain a convex loss suitable for gradient descent. Pulse
signs are reintroduced after optimization during synthesis. Optimization results show roughness reduction across various pulse densities, with the optimized sequences approaching the perceptual smoothness of velvet noise.
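A sketch of this reformulation (the moving-average envelope and variance-based roughness proxy are assumptions, not the paper's loss):

```python
import torch

L, K = 2048, 16                                     # sequence length, pulse count
t = torch.arange(L, dtype=torch.float32)
delays = (torch.rand(K) * L).requires_grad_(True)   # fractional pulse positions
opt = torch.optim.Adam([delays], lr=1.0)
for _ in range(300):
    # band-limited (sinc-interpolated) pulses with polarity removed
    pulses = torch.sinc(t[None, :] - delays[:, None]).sum(dim=0)
    env = torch.nn.functional.avg_pool1d(pulses[None, None, :], 128, stride=1)
    loss = env.var()                                # flat envelope = smoother noise
    opt.zero_grad(); loss.backward(); opt.step()

signs = torch.sign(torch.rand(K) - 0.5)             # polarity restored at synthesis
noise = (signs[:, None] * torch.sinc(t - delays[:, None].detach())).sum(dim=0)
```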
graetli: A Microcontroller-Based DSP Platform for Real-Time Audio Signal Processing
Jonas Roth, Silvan Krebs, and Christoph Studer
Abstract: This demonstration presents graetli, a standalone digital signal processing (DSP) platform for real-time audio applications,
built around the Electrosmith Daisy Seed [1] microcontroller platform. graetli features high-quality analog audio I/O, a zero-latency
analog dry signal path, a user interface with programmable potentiometers, and a rugged enclosure. graetli is suitable for both
performance interaction and algorithm prototyping. To showcase
its capabilities, we implement a frequency domain artificial reverberation algorithm. Conference visitors are invited to interact with
the platform and experience the real-time DSP reverb algorithm.
A Statistics-Driven Differentiable Approach for Sound Texture Synthesis and Analysis
Esteban Gutiérrez, Frederic Font, Xavier Serra and Lonce Wyse
Abstract: In this work, we introduce TexStat, a novel loss function specifically designed for the analysis and synthesis of texture sounds
characterized by stochastic structure and perceptual stationarity.
Drawing inspiration from the statistical and perceptual framework
of McDermott and Simoncelli, TexStat identifies similarities
between signals belonging to the same texture category without
relying on temporal structure. We also propose using TexStat
as a validation metric alongside the Fréchet Audio Distance (FAD) to
evaluate texture sound synthesis models. In addition to TexStat,
we present TexEnv, an efficient, lightweight and differentiable
texture sound synthesizer that generates audio by imposing amplitude envelopes on filtered noise. We further integrate these components into TexDSP, a DDSP-inspired generative model tailored
for texture sounds. Through extensive experiments across various
texture sound types, we demonstrate that TexStat is perceptually meaningful, time-invariant, and robust to noise, features that
make it effective both as a loss function for generative tasks and as
a validation metric. All tools and code are provided as open-source
contributions, and our PyTorch implementations are efficient, differentiable, and highly configurable, enabling their use both in generative tasks and as a perceptually grounded evaluation metric.
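An illustrative statistics-matching loss in the spirit of McDermott and Simoncelli (a stand-in for intuition, not the actual TexStat definition): signals are compared through time-averaged moments of their spectral envelopes rather than sample by sample.

```python
import torch

def texture_stats(x, n_fft=512):
    S = torch.stft(x, n_fft, window=torch.hann_window(n_fft),
                   return_complex=True).abs()        # (freq bins, frames)
    mean = S.mean(dim=-1)                            # time-averaged moments;
    var = S.var(dim=-1)                              # temporal order is discarded
    skew = ((S - mean[:, None]) ** 3).mean(dim=-1) / var.clamp(min=1e-8) ** 1.5
    return torch.cat([mean, var, skew])

def texture_loss(x, y):
    return torch.nn.functional.mse_loss(texture_stats(x), texture_stats(y))

loss = texture_loss(torch.randn(48000), torch.randn(48000))
```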
DiffVox: A Differentiable Model for Capturing and Analysing Vocal Effects Distributions
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Ben Hayes, Wei-Hsiang Liao, György Fazekas and Yuki Mitsufuji
Abstract: This study introduces a novel and interpretable model, DiffVox,
for matching vocal effects in music production. DiffVox, short
for “Differentiable Vocal Fx”, integrates parametric equalisation,
dynamic range control, delay, and reverb with efficient differentiable implementations to enable gradient-based optimisation for
parameter estimation. Vocal presets are retrieved from two datasets,
comprising 70 tracks from MedleyDB and 365 tracks from a private collection. Analysis of parameter correlations reveals strong
relationships between effects and parameters, such as the highpass and low-shelf filters often working together to shape the low
end, and the delay time correlating with the intensity of the delayed signals. Principal component analysis reveals connections to
McAdams’ timbre dimensions, where the most crucial component
modulates the perceived spaciousness while the secondary components influence spectral brightness. Statistical testing confirms
the non-Gaussian nature of the parameter distribution, highlighting
the complexity of the vocal effects space. These initial findings on
the parameter distributions set the foundation for future research
in vocal effects modelling and automatic mixing.
Improving Lyrics-to-Audio Alignment Using Frame-wise Phoneme Labels with Masked Cross Entropy Loss
Tian Cheng, Tomoyasu Nakano and Masataka Goto
Abstract: This paper addresses the task of lyrics-to-audio alignment, which
involves synchronizing textual lyrics with corresponding music
audio. Most publicly available datasets for this task provide annotations only at the line or word level. This poses a challenge
for training lyrics-to-audio models due to the lack of frame-wise
phoneme labels. However, we find that phoneme labels can be
partially derived from word-level annotations: for single-phoneme
words, all frames corresponding to the word can be labeled with
the same phoneme; for multi-phoneme words, phoneme labels can
be assigned at the first and last frames of the word. To leverage
this partial information, we construct a mask for those frames and
propose a masked frame-wise cross-entropy (CE) loss that considers only frames with known phoneme labels. As a baseline model,
we adopt an autoencoder trained with a Connectionist Temporal
Classification (CTC) loss and a reconstruction loss. We then enhance the training process by incorporating the proposed framewise masked CE loss. Experimental results show that incorporating the frame-wise masked CE loss improves alignment performance. In comparison to other state-of-the art models, our model
provides a comparable Mean Absolute Error (MAE) of 0.216 seconds and a top Median Absolute Error (MedAE) of 0.041 seconds
on the testing Jamendo dataset.
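The proposed loss is easy to state in code; a minimal sketch (tensor shapes are assumptions):

```python
import torch
import torch.nn.functional as F

def masked_frame_ce(logits, labels, mask):
    # logits: (T, n_phonemes); labels: (T,); mask: (T,), 1 where the phoneme
    # label could be derived from the word-level annotation, 0 elsewhere.
    per_frame = F.cross_entropy(logits, labels, reduction="none")
    return (per_frame * mask).sum() / mask.sum().clamp(min=1)

T, P = 100, 40
loss = masked_frame_ce(torch.randn(T, P), torch.randint(0, P, (T,)),
                       (torch.rand(T) > 0.7).float())
```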
Automatic Classification of Chains of Guitar Effects Through Evolutionary Neural Architecture Search
Michele Rossi, Giovanni Iacca and Luca Turchet
Abstract: Recent studies on classifying electric guitar effects have achieved
high accuracy, particularly with deep learning techniques. However, these studies often rely on simplified datasets consisting
mainly of single notes rather than realistic guitar recordings.
Moreover, in the specific field of effect chain estimation, the literature tends to rely on large models, making them impractical for
real-time or resource-constrained applications. In this work, we
recorded realistic guitar performances using four different guitars
and created three datasets by applying a chain of five effects with
increasing complexity: (1) fixed order and parameters, (2) fixed order with randomly sampled parameters, and (3) random order and
parameters. We also propose a novel Neural Architecture Search
method aimed at discovering accurate yet compact convolutional
neural network models to reduce power and memory consumption.
We compared its performance to a basic random search strategy,
showing that our custom Neural Architecture Search outperformed
random search in identifying models that balance accuracy and
complexity. We found that the number of convolutional and pooling layers becomes increasingly important as dataset complexity
grows, while dense layers have less impact. Additionally, among
the effects, tremolo was identified as the most challenging to classify.
Inference-Time Structured Pruning for Real-Time Neural Network Audio Effects
Christopher Johann Clarke and Jatin Chowdhury
Abstract: Structured pruning is a technique for reducing the computational
load and memory footprint of neural networks by removing structured subsets of parameters according to a predefined schedule
or ranking criterion.
This paper investigates the application of
structured pruning to real-time neural network audio effects, focusing on both feedforward networks and recurrent architectures.
We evaluate multiple pruning strategies at inference time, without retraining, and analyze their effects on model performance. To
quantify the trade-off between parameter count and audio fidelity,
we construct a theoretical model of the approximation error as a
function of network architecture and pruning level. The resulting bounds establish a principled relationship between pruninginduced sparsity and functional error, enabling informed deployment of neural audio effects in constrained real-time environments.
Unsupervised Estimation of Nonlinear Audio Effects: Comparing Diffusion-Based and Adversarial Approaches
Eloi Moliner, Michal Švento, Alec Wright, Lauri Juvela, Pavel Rajmic and Vesa Välimäki
Abstract: Accurately estimating nonlinear audio effects without access to
paired input-output signals remains a challenging problem. This
work studies unsupervised probabilistic approaches for solving this
task. We introduce a method, novel for this application, based
on diffusion generative models for blind system identification, enabling the estimation of unknown nonlinear effects using black- and gray-box models. This study compares this method with a
previously proposed adversarial approach, analyzing the performance of both methods under different parameterizations of the
effect operator and varying lengths of available effected recordings. Through experiments on guitar distortion effects, we show
that the diffusion-based approach provides more stable results and
is less sensitive to data availability, while the adversarial approach
is superior at estimating more pronounced distortion effects. Our
findings contribute to the robust unsupervised blind estimation of
audio effects, demonstrating the potential of diffusion models for
system identification in music technology.
Empirical Results for Adjusting Truncated Backpropagation Through Time While Training Neural Audio Effects
Yann Bourdin, Pierrick Legrand and Fanny Roche
Abstract: This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in
digital audio effect modeling, with a focus on dynamic range compression. The study evaluates key TBPTT hyperparameters – sequence number, batch size, and sequence length – and their influence on model performance. Using a convolutional-recurrent architecture, we conduct extensive experiments across datasets with
and without conditioning by user controls. Results demonstrate
that carefully tuning these parameters enhances model accuracy
and training stability, while also reducing computational demands.
Objective evaluations confirm improved performance with optimized settings, while subjective listening tests indicate that the
revised TBPTT configuration maintains high perceptual quality.
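For orientation, a minimal TBPTT loop showing where the studied hyperparameters (batch size, truncation/sequence length, number of sequences) enter; the architecture and data here are placeholders:

```python
import torch

rnn = torch.nn.GRU(input_size=1, hidden_size=32, batch_first=True)
head = torch.nn.Linear(32, 1)
opt = torch.optim.Adam(list(rnn.parameters()) + list(head.parameters()))

x = torch.randn(4, 8192, 1)        # (batch size, time, features)
y = torch.randn(4, 8192, 1)        # target, e.g. compressor output
seq_len, h = 512, None             # truncation (sequence) length
for start in range(0, x.shape[1], seq_len):
    xc, yc = x[:, start:start + seq_len], y[:, start:start + seq_len]
    out, h = rnn(xc, h)
    loss = torch.nn.functional.mse_loss(head(out), yc)
    opt.zero_grad(); loss.backward(); opt.step()
    h = h.detach()                 # carry the state, but cut the gradient here
```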
Neural-Driven Multi-Band Processing for Automatic Equalization and Style Transfer
Parakrant Sarkar and Permagnus Lindborg
Abstract: We present a Neural-Driven Multi-Band Processor (NDMP), a differentiable audio processing framework that augments a static six-band Parametric Equalizer (PEQ) with per-band dynamic range
compression. We optimize this processor using neural inference
for two tasks: Automatic Equalization (AutoEQ), which estimates
tonal and dynamic corrections without a reference, and Production
Style Transfer (NDMP-ST), which adapts the processing of an input signal to match the tonal and dynamic characteristics of a reference. We train NDMP using a self-supervised strategy, where the
model learns to recover a clean signal from inputs degraded with
randomly sampled NDMP parameters and gain adjustments. This
setup eliminates the need for paired input–target data and enables
end-to-end training with audio-domain loss functions. At inference time, AutoEQ enhances previously unseen inputs in a blind setting, while NDMP-ST performs style transfer by predicting task-specific processing parameters. We evaluate our approach on the
MUSDB18 dataset using both objective metrics (e.g., SI-SDR,
PESQ, STFT loss) and a listening test.
Our results show that
NDMP consistently outperforms traditional PEQ and a PEQ+DRC
(single-band) baseline, offering a robust neural framework for audio enhancement that combines learned spectral and dynamic control.
TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration
Matteo Spanio and Antonio Rodà
Abstract: The increasing complexity and real-time processing demands of
audio signals require optimized algorithms that utilize the computational power of Graphics Processing Units (GPUs).
Existing Digital Signal Processing (DSP) libraries often do not provide
the necessary efficiency and flexibility, particularly for integrating
with Artificial Intelligence (AI) models. In response, we introduce TorchFX: a GPU-accelerated Python library for DSP, engineered to facilitate sophisticated audio signal processing. Built on
the PyTorch framework, TorchFX offers an Object-Oriented interface similar to torchaudio but enhances functionality with a novel
pipe operator for intuitive filter chaining. The library provides a
comprehensive suite of Finite Impulse Response (FIR) and Infinite Impulse Response (IIR) filters, with a focus on multichannel
audio, thereby facilitating the integration of DSP and AI-based
approaches.
Our benchmarking results demonstrate significant
efficiency gains over traditional libraries like SciPy, particularly
in multichannel contexts. While there are current limitations in
GPU compatibility, ongoing developments promise broader support and real-time processing capabilities. TorchFX aims to become a useful tool for the community, contributing to innovation
in GPU-accelerated DSP. TorchFX is publicly available on GitHub
at https://github.com/matteospanio/torchfx.
Hyperbolic Embeddings for Order-Aware Classification of Audio Effect Chains
Aogu Wada, Tomohiko Nakamura and Hiroshi Saruwatari
Abstract: Audio effects (AFXs) are essential tools in music production, frequently applied in chains to shape timbre and dynamics. The order of AFXs in a chain plays a crucial role in determining the final sound, particularly when non-linear (e.g., distortion) or time-variant (e.g., chorus) processors are involved. Despite its importance, most AFX-related studies have primarily focused on estimating effect types and their parameters from a wet signal. To
address this gap, we formulate AFX chain recognition as the task
of jointly estimating AFX types and their order from a wet signal.
We propose a neural-network-based method that embeds wet signals into a hyperbolic space and classifies their AFX chains. Hyperbolic space can represent tree-structured data more efficiently
than Euclidean space due to its exponential expansion property.
Since AFX chains can be represented as trees, with AFXs as nodes
and edges encoding effect order, hyperbolic space is well-suited
for modeling the exponentially growing and non-commutative nature of ordered AFX combinations, where changes in effect order can result in different final sounds. Experiments using guitar
sounds demonstrate that, with an appropriate curvature, the proposed method outperforms its Euclidean counterpart. Further analysis based on AFX type and chain length highlights the effectiveness of the proposed method in capturing AFX order.
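The distance underlying such embeddings has a simple closed form on the Poincaré ball; a sketch with curvature fixed to -1 (the paper treats curvature as a tunable quantity):

```python
import torch

def poincare_dist(u, v, eps=1e-6):
    # u, v: points strictly inside the unit ball
    sq = ((u - v) ** 2).sum(-1)
    den = ((1 - (u ** 2).sum(-1)).clamp(min=eps)
           * (1 - (v ** 2).sum(-1)).clamp(min=eps))
    return torch.acosh(1 + 2 * sq / den)   # grows rapidly near the boundary,
                                           # giving room for tree-like structure

d = poincare_dist(torch.tensor([0.1, 0.2]), torch.tensor([-0.3, 0.4]))
```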
Towards an Objective Comparison of Panning Feature Algorithms for Unsupervised Learning
Richard Mitic and Andreas Rossholm
Abstract: Estimations of panning attributes are an important feature to extract from a piece of recorded music, with downstream uses such
as classification, quality assessment, and listening enhancement.
While several algorithms exist in the literature, there is currently
no comparison between them and no studies to suggest which one
is most suitable for any particular task. This paper compares four
algorithms for extracting amplitude panning features with respect
to their suitability for unsupervised learning. It finds synchronicities between them and analyses their results on a small set of
commercial music excerpts chosen for their distinct panning features. The ability of each algorithm to differentiate between the
tracks is analysed. The results can be used in future work to either
select the most appropriate panning feature algorithm or create a
version customized for a particular task.
Unsupervised Text-to-Sound Mapping via Embedding Space Alignment
Luke Dzwonczyk and Carmine-Emanuele Cella
Abstract: This work focuses on developing an artistic tool that performs an
unsupervised mapping between text and sound, converting an input text string into a series of sounds from a given sound corpus.
With the use of a pre-trained sound embedding model and a separate, pre-trained text embedding model, the goal is to find a mapping between the two feature spaces. Our approach is unsupervised, which allows any sound corpus to be used with the system.
The tool performs the task of text-to-sound retrieval, creating a
soundfile in which each word in the text input is mapped to a single sound in the corpus, and the resulting sounds are concatenated
to play sequentially. We experiment with three different mapping
methods, and perform quantitative and qualitative evaluations on
the outputs. Our results demonstrate the potential of unsupervised
methods for creative applications in text-to-sound mapping.
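A minimal sketch of the retrieval step (the linear alignment map W is a hypothetical stand-in for whichever of the three mapping methods is learned):

```python
import numpy as np

def retrieve(word_embs, sound_embs, W):
    mapped = word_embs @ W                                  # text space -> sound space
    a = mapped / np.linalg.norm(mapped, axis=1, keepdims=True)
    b = sound_embs / np.linalg.norm(sound_embs, axis=1, keepdims=True)
    return (a @ b.T).argmax(axis=1)                         # nearest corpus sound per word

idx = retrieve(np.random.randn(5, 128),      # one embedding per input word
               np.random.randn(1000, 128),   # embeddings of the sound corpus
               np.random.randn(128, 128))    # hypothetical learned alignment map
```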
Generative Latent Spaces for Neural Synthesis of Audio Textures
Aaron Dees and Seán O'Leary
Abstract: This paper investigates the synthesis of audio textures and the
structure of generative latent spaces using Variational Autoencoders (VAEs) within two paradigms of neural audio synthesis:
DSP-inspired and data-driven approaches. For each paradigm, we
propose VAE-based frameworks that allow fine-grained temporal
control. We introduce datasets across three categories of environmental sounds to support our investigations. We evaluate and compare the models’ reconstruction performance using objective metrics, and investigate their generative capabilities and latent space
structure through latent space interpolations.
RT-PAD-VC – Creative Applications of Neural Voice Conversion as an Audio Effect
Paolo Sani, Edgar Andres Suarez Guarnizo, Kishor Kayyar Lakshminarayana, and Christian Dittmar
Abstract: Streaming-enabled voice conversion (VC) bears the potential for many creative applications as an audio effect. This demo paper details our low-latency, real-time implementation of the recently proposed Prosody-aware Decoder Voice Conversion (PAD-VC). Building on this technical foundation, we explore and demonstrate diverse use cases in creative processing of speech and vocal recordings. Enabled by its voice cloning capabilities and fine-grained controllability, RT-PAD-VC can be used as a low-delay, quasi real-time audio effects processor for gender conversion, timbre and formant-preserving pitch-shifting, vocal harmonization and cross-synthesis from musical instruments. The on-site demo setup will allow participants to interact in a playful way with our technology.
SCHAEFFER: A Dataset of Human-Annotated Sound Objects for Machine Learning Applications
Maurizio Berta and Daniele Ghisi
Abstract: Machine learning for sound generation is rapidly expanding within
the computer music community. However, most datasets used to
train models are built from field recordings, foley sounds, instrumental notes, or commercial music. This presents a significant
limitation for composers working in acousmatic and electroacoustic music, who require datasets tailored to their creative processes.
To address this gap, we introduce the SCHAEFFER Dataset (Spectromorphological Corpus of Human-annotated Audio with Electroacoustic Features For Experimental Research), a curated collection of 1000 sound objects designed and annotated by composers and students of electroacoustic composition. The dataset,
distributed under Creative Commons licenses, features annotations
combining technical and poetic descriptions, alongside classifications based on pre-defined spectromorphological categories.
Pitch-Conditioned Instrument Sound Synthesis From an Interactive Timbre Latent Space
Christian Limberg, Fares Schulz, Zhe Zhang and Stefan Weinzierl
Abstract: This paper presents a novel approach to neural instrument sound
synthesis using a two-stage semi-supervised learning framework
capable of generating pitch-accurate, high-quality music samples
from an expressive timbre latent space. Existing approaches that
achieve sufficient quality for music production often rely on high-dimensional latent representations that are difficult to navigate and
provide unintuitive user experiences. We address this limitation
through a two-stage training paradigm: first, we train a pitch-timbre disentangled 2D representation of audio samples using a
Variational Autoencoder; second, we use this representation as
conditioning input for a Transformer-based generative model. The
learned 2D latent space serves as an intuitive interface for navigating and exploring the sound landscape. We demonstrate that the
proposed method effectively learns a disentangled timbre space,
enabling expressive and controllable audio generation with reliable
pitch conditioning. Experimental results show the model’s ability to capture subtle variations in timbre while maintaining a high
degree of pitch accuracy. The usability of our method is demonstrated in an interactive web application, highlighting its potential
as a step towards future music production environments that are
both intuitive and creatively empowering:
https://pgesam.faresschulz.com/.
Neural Sample-Based Piano Synthesis
Riccardo Simionato and Stefano Fasciani
Abstract: Piano sound emulation has been an active topic of research and development for several decades. Although comprehensive physics-based piano models have been proposed, sample-based piano emulation is still widely utilized for its computational efficiency and
relative accuracy despite presenting significant memory storage
requirements. This paper proposes a novel hybrid approach to
sample-based piano synthesis aimed at improving the fidelity of
sound emulation while reducing memory requirements for storing samples. A neural network-based model processes the sound
recorded from a single example of a piano key at a given velocity.
The network is trained to learn the nonlinear relationship between
the various velocities at which a piano key is pressed and the corresponding sound alterations. Results show that the method achieves
high accuracy using a specific neural architecture that is computationally efficient, has few trainable parameters, and requires storing only one sample per piano key.
Piano-SSM: Diagonal State Space Models for Efficient MIDI-to-Raw Audio Synthesis
Dominik Dallinger, Matthias Bittner, Daniel Schnöll, Matthias Wess and Axel Jantsch
Abstract: Deep State Space Models (SSMs) have shown remarkable performance in long-sequence reasoning tasks such as raw audio classification and audio generation. This paper introduces Piano-SSM, an end-to-end deep SSM neural network architecture designed to synthesize raw piano audio directly from MIDI input. The network requires no intermediate representations or domain-specific expert knowledge, simplifying training and improving accessibility.
Quantitative evaluations on the MAESTRO dataset
show that Piano-SSM achieves a Multi-Scale Spectral Loss (MSSL)
of 7.02 at 16kHz, outperforming DDSP-Piano v1 with an MSSL of
7.09. At 24kHz, Piano-SSM maintains competitive performance
with an MSSL of 6.75, closely matching DDSP-Piano v2’s result of 6.58. Evaluations on the MAPS dataset achieve an MSSL
score of 8.23, which demonstrates the generalization capability
even when training with very limited data. Further analysis highlights Piano-SSM’s ability to train on high sampling-rate audio
while synthesizing audio at lower sampling rates, explicitly linking performance loss to aliasing effects. Additionally, the proposed model facilitates real-time causal inference through a custom C++17 header-only implementation. On a single core of an Intel Core i7-12700 processor at 4.5GHz, the largest network synthesizes one second of audio at 44.1kHz in 0.44s, with a workload of 23.1GFLOPS/s and a 10.1µs input/output delay, while the smallest network at 16kHz needs only 0.04s, with 2.3GFLOP/s and a 2.6µs input/output delay. These results underscore Piano-SSM’s practical utility and efficiency in
real-time audio synthesis applications.
Non-Iterative Simulation: A Numerical Analysis Viewpoint
Alessia Andò (University of Udine)
Abstract: Stiff ordinary differential equations (ODEs) frequently appear in scientific and engineering applications, necessitating numerical methods that ensure stability and efficiency. Non-iterative approaches for stiff ODEs provide an alternative to fully implicit schemes, whose computation time can be unpredictable to a degree that is unacceptable in real-time virtual analog applications.
This tutorial will focus on Rosenbrock-Wanner (ROW) methods and exponential integration techniques, whose origins date back to the 1960s. ROW methods are linearly implicit: they replace the solution of a nonlinear system with a fixed number of linear solves per step. Exponential integrators, on the other hand, incorporate stiff dynamics by leveraging matrix exponentials, and offer advantages in problems whose stiffness or oscillatory nature is mainly driven by their linear component. We will discuss the derivation, stability properties, and practical implementation of these methods, and compare their strengths, limitations, and potential for real-world virtual analog applications through illustrative examples.
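As a concrete illustration, the simplest Rosenbrock-type scheme, the linearly implicit Euler method, replaces the Newton iteration of implicit Euler with exactly one linear solve per step, so the per-step cost is fixed and predictable (a sketch, not drawn from the tutorial's material):

```python
import numpy as np

def rosenbrock_euler_step(f, jac, x, h):
    J = jac(x)                          # Jacobian of f at the current state
    A = np.eye(x.size) - h * J          # solve (I - h J) dx = h f(x)
    dx = np.linalg.solve(A, h * f(x))
    return x + dx

# Stiff linear test problem x' = -1000 x: stable even at a large step size.
f = lambda x: np.array([-1000.0 * x[0]])
jac = lambda x: np.array([[-1000.0]])
x = np.array([1.0])
for _ in range(10):
    x = rosenbrock_euler_step(f, jac, x, h=0.01)
```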
Dr. Alessia Andò is a postdoctoral fellow at the Department of Mathematics, Computer Science and Physics, University of Udine, where she received her PhD in 2020. She also worked as a postdoc at GSSI (Gran Sasso Science Institute), Italy.
Within the general area of Numerical Analysis, her main research interests are ordinary and delay differential equations and related dynamical systems and models. The focus is towards both the numerical time integration and the dynamical analysis of the models, which includes the computation of invariant sets and the study of their asymptotic stability.
Logarithmic Frequency Resolution Filter Design for Audio
Balázs Bank (Budapest University of Technology and Economics)
Abstract: Digital filters are often used to model or equalize acoustic or electroacoustic transfer functions. Applications include headphone, loudspeaker, and room equalization, or modeling the radiation of musical instruments for sound synthesis. As the final judge of quality is the human ear, filter design should take into account the quasi-logarithmic frequency resolution of the auditory system. This tutorial presents various approaches for achieving this goal, including warped FIR and IIR, Kautz, and fixed-pole parallel filters, and discusses their differences and similarities. Application examples will include physics-based sound synthesis, loudspeaker and room equalization, and the equalization of a spherical loudspeaker array.
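To make the core idea concrete, a minimal warped-FIR sketch: each unit delay of an ordinary FIR filter is replaced by a first-order allpass, which concentrates frequency resolution toward low frequencies for positive warping coefficients (lambda near 0.75 roughly matches the Bark scale at 44.1 kHz; this example is illustrative, not taken from the tutorial):

```python
import numpy as np

def warped_fir(x, coeffs, lam=0.75):
    K = len(coeffs)
    prev = np.zeros(K)                 # tap values d_0 .. d_{K-1} at sample n-1
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        new = np.empty(K)
        new[0] = xn
        for k in range(1, K):
            # allpass in place of a unit delay:
            # d_k[n] = -lam*d_{k-1}[n] + d_{k-1}[n-1] + lam*d_k[n-1]
            new[k] = -lam * new[k - 1] + prev[k - 1] + lam * prev[k]
        prev = new
        y[n] = coeffs @ new            # weighted sum of warped taps
    return y

y = warped_fir(np.random.randn(1024), coeffs=np.array([1.0, 0.5, 0.25]))
```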
Balázs Bank is an associate professor at the Department of Artificial Intelligence and Systems Engineering, Budapest University of Technology and Economics (BUTE), Hungary. He received his M.Sc. and Ph.D. degrees in Electrical Engineering from BUTE in 2000 and 2006, respectively, and his Hungarian Academy of Sciences (MTA) doctoral degree in 2023. In the academic year 1999/2000 and in 2007 he was with the Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Finland. In 2008 he was with the Department of Computer Science, Verona University, Italy. Between 2000 and 2006, and again since 2009, he has been with BUTE. He was an Associate Editor for IEEE Signal Processing Letters in 2013–2016 and for IEEE Signal Processing Magazine in 2018–2022, and the lead Guest Editor for the 2022 JAES special issue “Audio Filter Design”. His research interests include physics-based sound synthesis and filter design for audio applications.
Building Flexible Audio DDSP Pipelines: A Case Study on Artificial Reverb
Gloria Dal Santo (Aalto University)
Abstract: This tutorial focuses on Differentiable Digital Signal Processing (DDSP) for audio synthesis, an approach that applies automatic differentiation to digital signal processing operations. By implementing signal models in a differentiable manner, it becomes possible to backpropagate loss gradients through their parameters, enabling data-driven optimization without losing domain knowledge.
DDSP has gained popularity due to its domain-appropriate inductive biases, yet it still presents several challenges. The parameters of differentiable models are often constrained by stability conditions, affected by non-uniqueness issues, and may belong to different domains and distributions, making optimization nontrivial.
This tutorial provides an overview of these limitations and introduces FLAMO, a library designed to facilitate more flexible training pipelines. A key focus will be on loss functions: how to select appropriate ones, insights from perceptually informed losses, and techniques for validating them.
Demonstrations will use FLAMO, an open-source Python library built on PyTorch’s automatic differentiation framework. Practical examples will primarily centre on recursive systems for artificial reverberation applications.
Gloria Dal Santo received the M.Sc. degree in electrical and electronic engineering from the Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland in 2022, during which she interned at the Audio Machine Learning team at Logitech.
She is currently working toward a Doctoral degree with the Acoustics Lab, at Aalto University, Espoo, Finland. Her research interests include artificial reverberation and audio applications of machine learning, with a focus on designing more robust and psychoacoustically informed systems.
Plausible Editing of our Acoustic Environment
Annika Neidhardt (University of Surrey)
Abstract: The technology and methods for creating spatial auditory illusions have evolved phenomenally, to the point where we can create illusions that cannot be distinguished from reality anymore. However, so far, such convincing quality can only be achieved with accurate knowledge about the target environment based on measurements or detailed modelling. Rendering virtual content into previously unknown environments remains a challenge. Quick automatic characterisation of their acoustic properties is necessary. What information do we need to extract to render convincing illusions? Moreover, to what extent can we become creative in manipulating the appearance of the actual environment without compromising its plausibility and vividness? This tutorial will give insight into the perceptual requirements for rendering audio for Augmented and Extended Reality.
Annika Neidhardt is a Senior Research Fellow in Immersive Audio at the University of Surrey. She has been an active researcher of related topics for more than 10 years. She holds an MSc in Electrical Engineering (Automation & Robotics) from Technische Universität Chemnitz and an MSc in Audio Engineering (Computermusic & Multimedia) from the University of Music and Performing Arts Graz. After three years in advanced development and applied science, she started her own research project at Technische Universität Ilmenau in the group of Karlheinz Brandenburg in 2017 on 6DoF binaural audio and related perceptual requirements and evaluation. She defended her PhD thesis on the plausibility of simplified room acoustic representations in Augmented Reality in May 2023. In addition, she conducted research on the automatic characterisation of acoustic environments, and perceptual implications for audio in Social VR and XR. Since autumn 2023, Annika has continued her research at the Institute of Sound Recording in Surrey with more focus on room acoustic modelling and perceptual modelling.
DISCOVER THE POWER OF MODELING – PART 1: Synthesis Techniques for Acoustic Instrument Emulation
Audio Modeling
Abstract: This presentation introduces SWAM (Synchronous Wave Acoustic Modeling), a technology for accurate acoustic instrument emulation, developed by Audio Modeling. The development process and underlying synthesis techniques are discussed, highlighting their ability to reproduce expressive nuances. VariFlute, a physical model of the flute family, is presented as a case study demonstrating high realism and detailed playability.
DISCOVER THE POWER OF MODELING – PART 2: Room Modeling Combining Physical and Psychoacoustic Approaches
Audio Modeling
Abstract: This presentation discusses the need for an efficient spatializer to represent the continuous and coherent positioning of realistic instruments in a room. Ambiente, an integrated spatializer within SWAM, is presented, combining physical modeling and psychoacoustic principles to achieve accurate and immersive room simulation.
Bridging Symbolic and Audio Data: Score-Informed Music Performance Data Estimation
Johanna Devaney (Brooklyn College and the Graduate Center, CUNY)
Abstract: The empirical study of musical performance dates back to the birth of recorded media. From the laborious manual processes used in the earliest work to the current data-hungry end-to-end models, the estimation, modelling, and generation of expressive performance data remains challenging. This talk will consider the advantages of score-aligned performance data estimation, both for guiding signal processing algorithms and leveraging musical score data and other types of linked symbolic data (such as annotations) for analysing and modelling performance-related data. While the focus of this talk will primarily be on musical performance, connections to speech data will also be discussed, as well as the resultant potential for cross-modal analysis.
Johanna Devaney is an Associate Professor at Brooklyn College and the Graduate Center, CUNY, where she teaches courses in music theory, music technology, and data analysis. Johanna’s research primarily examines the ways in which recordings can be used to study and model performance, and she has developed computational tools to facilitate this. Her research on computational methods for audio understanding has been funded by the National Endowment for the Humanities (NEH) Digital Humanities program and the National Science Foundation (NSF). Johanna currently serves as the Co-Editor-in-Chief of the Journal of New Music Research.
Reverberation – Dereverberation: The promise of hybrid models
Gaël Richard (Télécom Paris, Institut Polytechnique de Paris)
Abstract: The propagation of acoustic waves within enclosed environments is inherently shaped by complex interactions with surrounding surfaces and objects, leading to phenomena such as reflections, diffractions, and the resulting reverberation. Over the years, a wide range of reverberation models have been developed, driven by both theoretical interest and practical applications, including artificial reverberation synthesis—where realistic reverberation is added to anechoic signals—and dereverberation, which aims to suppress reverberant components in recorded signals. In this keynote, we will provide a concise overview of some reverberation modeling approaches and illustrate how these models can be integrated into hybrid frameworks that combine classical signal processing, physical modeling, and machine learning techniques to advance artificial reverberation synthesis or dereverberation.
Gaël Richard received the State Engineering degree from Telecom Paris, France in 1990, and the Ph.D. degree and Habilitation from the University of Paris-Saclay in 1994 and 2001, respectively. After the Ph.D. degree, he spent two years at Rutgers University, Piscataway, NJ, in the Speech Processing Group of Prof. J. Flanagan. From 1997 to 2001, he successively worked for Matra, Bois d’Arcy, France, and for Philips, Montrouge, France. He then joined Telecom Paris, where he is now a Full Professor in audio signal processing. He is also the co-scientific director of the Hi! PARIS interdisciplinary center on AI and Data analytics. He is a coauthor of over 250 papers and inventor in 10 patents. His research interests are mainly in the field of speech and audio signal processing and include topics such as source separation, machine learning methods for audio/music signals and music information retrieval. He is a Fellow of the IEEE and was the chair of the IEEE SPS Technical Committee on Audio and Acoustic Signal Processing (2021–2022). In 2020, he received the IMT–Académie des Sciences Grand Prize. In 2022, he was awarded an ERC Advanced Grant from the European Union for a project on machine listening and artificial intelligence for sound.
Effecting Audio: An Entangled Approach to Signals, Concepts and Artistic Contexts
Andrew McPherson (Imperial College London)
Abstract: I propose to approach audio effects not as technical objects, but as a kind of activity. The shift from noun (“audio effect”) to verb (“effecting audio”, in the sense of applying transformations to sound) calls attention to the motivations, discourses and contexts in which audio processing, analysis and synthesis take place. We build audio-technical systems for specific reasons in specific situations. No system is ever devoid of sociocultural context or human intervention, and even the simplest technologies when examined in situ can exhibit fascinating complexity.
My talk will begin with a stubbornly contrarian take on some seemingly obvious premises of musical audio processing. Physicist and feminist theorist Karen Barad writes that “language has been granted too much power.” I would like to propose that as designers and researchers, we can let words about music take precedence over the messy and open-ended experience of making music, but that becoming overly preoccupied with language risks propagating clichés and reinforcing cultural stereotypes. Drawing on recent scholarship in human-computer interaction and science and technology studies, I will recount some alternative approaches and possible futures for designing digital audio technology when human and technical factors are inextricably entangled. I will illustrate these ideas with recent projects from the Augmented Instruments Laboratory, with a focus on rich bidirectional couplings between digital and analog electronics, acoustics and human creative experience.
Navigation Instructions
In the Overview Tab:
Clicking on any event card will bring you to the detailed daily program, scrolled to the time and day of the event (hit the Overview tab to go back to the program summary).
In the Daily Program:
On page reload, the program opens at the next closest date of conference events. Hit any date tab to change the day, or the Overview tab to reach the overview of the program.
Clicking on the title of any paper will open a popup window showing the abstract of that paper (close [X] the popup window or press Esc to continue navigating the program).
Clicking on the author(s) will open the original paper PDF in another tab of the browser.
Clicking on any tutorial or keynote title will open a popup window with a summary of the talk and a short biography of the speaker(s).
In the Abstract PopUp Window:
Move the horizontal slider to change the width of the window, and use the right scroll bar (if available) to browse the abstract.
About
This online conference program was generated by means of scripts developed for DAFx by Gianpaolo Evangelista