From Sound to Structure: Robust Localization of Sources, Sensors, and Surroundings

Från ljud till struktur: Robust lokalisering av ljudkällor, sensorer och omgivande miljö

Author

Summary, in English

This thesis addresses how to infer geometric information from sound data. Doing so requires solving several subproblems. First, the recorded sound must be processed to compute measurements of primitive geometric relations. Second, robust estimation is needed to go from these primitive measurements to more useful higher-level information, such as the locations of microphones and sound sources.

For an uncontrolled sound source, one of the main ways of extracting geometric information is to compute the difference between the times at which a sound arrives at each of two microphones. This measurement is referred to as the Time-Difference-of-Arrival (TDOA), and it defines a hyperboloid, with the two microphones as foci, on which the sound source must lie. While classical correlation-based techniques exist for computing the TDOA from two recordings, they typically struggle in reverberant environments, where the two signals are not just shifted, noisy versions of each other. One of the results of this thesis is that time-delay estimation can be improved by using a learning-based approach. The main obstacle to learning-based approaches in this domain is a lack of data. However, this thesis demonstrates that the obstacle can be overcome by using simulations of sound propagation to create synthetic data. This data can then be used to train an energy-based model, which demonstrates improved performance on real data compared to classical methods.
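As an illustration of the classical correlation-based baseline mentioned above (not the learning-based method of the thesis), the following minimal sketch estimates a TDOA by finding the lag that maximizes the cross-correlation of two recordings, then converts it to a difference in source-to-microphone distance. The function names, the sample rate, the speed of sound, and the synthetic noise signal are all assumptions made for this example; the setup is echo-free, which is exactly the easy case where such methods work well.

```python
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 °C


def estimate_delay_samples(x, y, max_lag):
    """Return the lag in samples that maximizes the cross-correlation of x and y.

    A positive result means y is a delayed copy of x.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, x_i in enumerate(x):
            j = i + lag
            if 0 <= j < len(y):
                score += x_i * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def range_difference(lag_samples, sample_rate):
    """Convert a delay in samples to a difference in distance to the source (m)."""
    return SPEED_OF_SOUND * lag_samples / sample_rate


# Synthetic, echo-free test case: microphone 2 hears the source 7 samples later.
rng = random.Random(0)
sample_rate = 16000  # Hz; hypothetical
true_lag = 7
source = [rng.gauss(0.0, 1.0) for _ in range(400)]
mic1 = [s + rng.gauss(0.0, 0.05) for s in source]
delayed = [0.0] * true_lag + source[:-true_lag]
mic2 = [s + rng.gauss(0.0, 0.05) for s in delayed]

estimated_lag = estimate_delay_samples(mic1, mic2, max_lag=20)
delta_d = range_difference(estimated_lag, sample_rate)
print(estimated_lag, delta_d)
```

The resulting `delta_d` is the constant distance difference that every point on the hyperboloid shares. In a reverberant room the correlation acquires additional peaks from echoes, which is the failure mode that motivates the learning-based estimator.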

After primitive geometric relationships have been computed from the sensor data, the goal is to convert them into more useful higher-level information, such as the locations of microphones and sound sources. The main difficulty is that a fraction of the measurements are outliers, so robust estimation methods such as RANSAC (a hypothesis-and-test framework) are needed. Since the speed of hypothesis creation is key when using RANSAC, this thesis shows how to construct new minimal solvers for several problems. One example is the result that sensor network self-calibration in the presence of a reverberant plane allows for minimal problems containing fewer microphones than in the echo-free case.
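The hypothesis-and-test idea can be sketched on a much simpler problem than those treated in the thesis: locating one source in 2D from distances to microphones at known positions, where some distances are gross outliers. This toy problem, the two-circle intersection used as its minimal solver, and all names and numbers below are assumptions chosen for illustration. They are not the solvers developed in the thesis.

```python
import math
import random


def intersect_circles(c0, r0, c1, r1):
    """Toy minimal solver: 2D points at distance r0 from c0 and r1 from c1.

    Returns an empty list or two points (coincident if the circles touch).
    """
    dx, dy = c1[0] - c0[0], c1[1] - c0[1]
    d = math.hypot(dx, dy)
    if d == 0.0:
        return []
    a = (r0 * r0 - r1 * r1 + d * d) / (2.0 * d)
    h_sq = r0 * r0 - a * a
    if h_sq < 0.0:
        return []  # the circles do not intersect
    h = math.sqrt(h_sq)
    mx, my = c0[0] + a * dx / d, c0[1] + a * dy / d
    return [(mx + h * dy / d, my - h * dx / d),
            (mx - h * dy / d, my + h * dx / d)]


def count_inliers(point, mics, dists, tol):
    """Count measurements whose residual at the candidate point is below tol."""
    return sum(
        abs(math.hypot(point[0] - m[0], point[1] - m[1]) - d) < tol
        for m, d in zip(mics, dists)
    )


def ransac_locate(mics, dists, tol, iterations, rng):
    """Hypothesize from random minimal samples; keep the best-supported candidate."""
    best_point, best_count = None, -1
    for _ in range(iterations):
        i, j = rng.sample(range(len(mics)), 2)
        for candidate in intersect_circles(mics[i], dists[i], mics[j], dists[j]):
            count = count_inliers(candidate, mics, dists, tol)
            if count > best_count:
                best_point, best_count = candidate, count
    return best_point, best_count


# Synthetic data: 8 microphones, slightly noisy distances, 2 gross outliers.
rng = random.Random(1)
mics = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0),
        (2.0, 0.0), (4.0, 1.5), (2.0, 3.0), (0.0, 1.5)]
true_source = (1.3, 1.1)
dists = [math.hypot(true_source[0] - m[0], true_source[1] - m[1])
         + rng.gauss(0.0, 0.002) for m in mics]
dists[2] += 1.0  # outlier
dists[5] -= 0.8  # outlier

estimate, inliers = ransac_locate(mics, dists, tol=0.05, iterations=200, rng=rng)
error = math.hypot(estimate[0] - true_source[0], estimate[1] - true_source[1])
print(estimate, inliers, error)
```

Each hypothesis here needs only two measurements, so a random sample is likely to be outlier-free. This is why smaller minimal problems, and fast solvers for them, matter in practice. A full pipeline would normally refine the best candidate using all of its inliers, a step omitted here.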

Topic

  • Computer Vision and Learning Systems

Status

Published

Research group

  • Computer Vision and Machine Learning

ISBN/ISSN/Other

  • ISSN: 1404-0034
  • ISBN: 978-91-8104-764-6
  • ISBN: 978-91-8104-763-9

Defence date

16 January 2026

Defence time

13:15

Defence place

Lecture Hall MH:Hörmander, Centre of Mathematical Sciences, Märkesbacken 4, Faculty of Engineering LTH, Lund University, Lund.

Opponent

  • Tuomas Virtanen (Prof.)