19.05.2026 | Intelligent Embedded Systems

Paper accepted at ICLR 2026

Lukas Rauch, René Heinrich, Houtan Ghaffari, Lukas Miklautz, Ilyass Moummad, Bernhard Sick and Christoph Scholz have written a conference paper titled "Unmute the Patch Tokens: Rethinking Probing in Multi-Label Audio Classification" and presented it at the International Conference on Learning Representations (ICLR) 2026.

Abstract: Although probing frozen models has become a standard evaluation paradigm, self-supervised learning in audio defaults to fine-tuning when pursuing state-of-the-art on AudioSet. A key reason is that global pooling creates an information bottleneck causing linear probes to misrepresent the embedding quality: The cls-token discards crucial token information about dispersed, localized events in audio. This weakness is rooted in the mismatch between the pretraining objective (globally) and the downstream task (localized). Across a comprehensive benchmark of 13 datasets and 6 spectrogram-based encoders, we investigate the global pooling bottleneck. We introduce binarized prototypical probes: a lightweight and simple pooling method that learns prototypes to perform class-wise information aggregation. Despite its simplicity, our method notably outperforms linear and attentive probing. Our work establishes probing as a competitive and efficient paradigm for evaluating audio SSL models, challenging the reliance on costly fine-tuning.
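The global pooling bottleneck described in the abstract can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the authors' binarized prototypical probe. Patch tokens are plain 2-D vectors, probe parameters are set by hand rather than learned, and the class-wise aggregation is a simple dot product followed by a max over tokens. All function names (`mean_pool`, `global_probe_scores`, `prototype_probe_scores`, `make_clip`) are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_pool(tokens: Sequence[Vector]) -> List[float]:
    """Global pooling: average all patch tokens into a single clip vector."""
    n = len(tokens)
    return [sum(column) / n for column in zip(*tokens)]


def global_probe_scores(tokens: Sequence[Vector], class_weights: Sequence[Vector]) -> List[float]:
    """Linear probe on the mean-pooled vector: one score per class (bias omitted)."""
    pooled = mean_pool(tokens)
    return [dot(w, pooled) for w in class_weights]


def prototype_probe_scores(
    tokens: Sequence[Vector], class_prototypes: Sequence[Sequence[Vector]]
) -> List[float]:
    """Hypothetical class-wise aggregation: each class keeps the best match between
    any of its prototypes and any patch token, so one localized event survives."""
    return [
        max(dot(proto, tok) for proto in protos for tok in tokens)
        for protos in class_prototypes
    ]


def make_clip(num_patches: int, event_patch: int) -> List[List[float]]:
    """Toy clip: silent patches are [0, 0]; a single patch holds a short event [1, 0]."""
    clip = [[0.0, 0.0] for _ in range(num_patches)]
    clip[event_patch] = [1.0, 0.0]
    return clip


if __name__ == "__main__":
    # Hand-set (not learned) parameters for one class that responds to [1, 0].
    weights = [[1.0, 0.0]]
    prototypes = [[[1.0, 0.0]]]
    for n in (10, 100):
        clip = make_clip(n, event_patch=3)
        print(n, global_probe_scores(clip, weights), prototype_probe_scores(clip, prototypes))
    # → 10 [0.1] [1.0]
    # → 100 [0.01] [1.0]
```

Under mean pooling, the score for a single-patch event shrinks as 1/N with the number of patches N, while the class-wise max over tokens is unaffected by clip length. A trained linear probe could rescale its weights for a fixed N, but the pooled vector still blends every event in the clip into one point. How the paper actually learns and binarizes its prototypes is not reproduced here.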

https://openreview.net/forum?id=FbY5Co2NWk