From Sound to Structure : Robust Localization of Sources, Sensors, and Surroundings

Från ljud till struktur : Robust lokalisering av ljudkällor, sensorer och omgivande miljö

Author

Erik Tegler

Summary, in English

The topic of this thesis is how to infer geometric information from sound data. Achieving this requires solving several subproblems. First, the recorded sound must be processed to compute measurements of primitive geometric relations. Second, robust estimation is needed to go from these primitive measurements to more useful higher-level information, such as the locations of microphones and sound sources.

In the case of an uncontrolled sound source, one of the main ways of extracting geometric information is to compute the difference between the times at which a sound arrives at two microphones. This measurement is referred to as the Time-Difference-of-Arrival (TDOA), and it defines a hyperboloid, with the two microphones as foci, on which the sound source must lie. While classical correlation-based techniques exist for computing the TDOA from two recordings, they typically struggle in reverberant environments, where the two signals are not just shifted, noisy versions of each other. One of the results of this thesis is that better time-delay estimation can be achieved with a learning-based approach. The main issue with learning-based approaches in this domain is a lack of data. However, this thesis demonstrates that the issue can be solved by using simulations of sound propagation to create synthetic data. This data can then be used to train an energy-based model, which demonstrates improved performance on real data compared to classical methods.
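
As an illustration of the classical correlation-based idea mentioned above, the following is a minimal sketch and not the method developed in the thesis. It estimates the TDOA as the lag that maximizes the cross-correlation of two recordings. The function name `estimate_tdoa`, the synthetic signals, the sampling rate, and the speed of sound (343 m/s) are all assumptions made for the example, and the test signal is echo-free, which is exactly the setting where this approach works well.

```python
import math
import random


def estimate_tdoa(x, y, fs, max_lag):
    """Estimate the delay of y relative to x (in seconds).

    Hypothetical helper: tries every integer lag in [-max_lag, max_lag]
    and keeps the one that maximizes the cross-correlation sum.
    """
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(len(x)):
            j = i + lag
            if 0 <= j < len(y):
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / fs


# Synthetic, echo-free test data (assumed values, not from the thesis).
random.seed(0)
fs = 16000                      # sampling rate in Hz
true_delay = 12                 # delay in samples between the microphones
source = [random.gauss(0.0, 1.0) for _ in range(2000)]

mic1 = source
mic2 = [0.0] * true_delay + source[:-true_delay]
mic2 = [s + 0.05 * random.gauss(0.0, 1.0) for s in mic2]  # small sensor noise

tdoa = estimate_tdoa(mic1, mic2, fs, max_lag=50)

# The TDOA constrains the source to points whose distances to the two
# microphones differ by this amount (one hyperboloid sheet).
speed_of_sound = 343.0          # m/s, assumed
range_difference = tdoa * speed_of_sound
print(tdoa, range_difference)
```

In a reverberant room the second recording also contains delayed copies of the source, so the correlation has several competing peaks, which is the failure mode the summary describes.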

After computing primitive geometric relationships from the sensor data, the goal is to convert them into more useful higher-level information, such as the locations of microphones and sound sources. The main problem here is that a fraction of the measurements are outliers, which means that robust estimation methods such as RANSAC (a hypothesis-and-test framework) are needed. Since fast hypothesis generation is key to RANSAC, this thesis shows how to construct new minimal solvers for several problems. For example, we show that sensor network self-calibration in the presence of a reverberant plane admits minimal problems with fewer microphones than the echo-free case.
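
The hypothesis-and-test loop can be sketched on a toy problem. The example below is an assumption-laden illustration and not one of the solvers from the thesis: it localizes a single source in 2D from distance measurements to microphones at known positions, where some distances are gross outliers. The names `solve_from_three` and `ransac_localize`, the microphone layout, and the inlier threshold are hypothetical. The small solver uses three measurements so that the position follows from a 2x2 linear system.

```python
import math
import random


def solve_from_three(mics, dists):
    """Closed-form 2D position from three microphones and three ranges.

    Subtracting the first range equation from the other two removes the
    quadratic terms and leaves a 2x2 linear system (solved by Cramer's rule).
    Returns None when the microphones are (nearly) collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = mics
    d0, d1, d2 = dists
    a11, a12 = 2 * (x1 - x0), 2 * (y1 - y0)
    a21, a22 = 2 * (x2 - x0), 2 * (y2 - y0)
    b1 = d0**2 - d1**2 + x1**2 + y1**2 - x0**2 - y0**2
    b2 = d0**2 - d2**2 + x2**2 + y2**2 - x0**2 - y0**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        return None
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - a21 * b1) / det)


def ransac_localize(mics, dists, threshold=0.05, iterations=200, seed=0):
    """Hypothesis-and-test: sample, solve, count inliers, keep the best."""
    rng = random.Random(seed)
    indices = range(len(mics))
    best_pos, best_inliers = None, []
    for _ in range(iterations):
        sample = rng.sample(indices, 3)
        pos = solve_from_three([mics[i] for i in sample],
                               [dists[i] for i in sample])
        if pos is None:
            continue
        inliers = [i for i in indices
                   if abs(math.dist(pos, mics[i]) - dists[i]) < threshold]
        if len(inliers) > len(best_inliers):
            best_pos, best_inliers = pos, inliers
    return best_pos, best_inliers


# Toy data (assumed): eight microphones, one source, two corrupted ranges.
mics = [(0, 0), (4, 0), (4, 3), (0, 3), (2, -1), (5, 1.5), (-1, 1), (2, 4)]
true_source = (1.0, 2.0)
dists = [math.dist(true_source, m) for m in mics]
dists[4] += 1.5   # outlier
dists[6] -= 0.8   # outlier

position, inliers = ransac_localize(mics, dists)
print(position, sorted(inliers))
```

Each iteration costs one solver call plus one pass over the measurements, which is why a fast solver on as few measurements as possible matters: smaller samples are more likely to be outlier-free, so fewer iterations are needed.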

Department/s

Publication date

2025-12-04

Language

English

Full text

Available as PDF - 33 MB

Document type

Dissertation

Publisher

Centre for Mathematical Sciences, Lund University

Topic

Computer Vision and Learning Systems

Status

Published

Research group

Computer Vision and Machine Learning

Supervisor

ISBN/ISSN/Other

ISSN: 1404-0034
ISBN: 978-91-8104-764-6
ISBN: 978-91-8104-763-9

Defence date

16 January 2026

Defence time

13:15

Defence place

Lecture Hall MH:Hörmander, Centre for Mathematical Sciences, Märkesbacken 4, Faculty of Engineering LTH, Lund University, Lund

Opponent

Tuomas Virtanen (Prof.)
