Knowledge-light Letter-to-Sound Conversion for Swedish with FST and TBL
Author
Editor
- Gilbert Ambrazaitis
- Susanne Schötz
Summary, in English
This paper describes some exploratory attempts to apply a combination of finite state
transducers (FST) and transformation-based learning (TBL, Brill 1992) to the problem of
letter-to-sound (LTS) conversion for Swedish. Following Bouma (2000) for Dutch, we employ
FST for segmentation of the textual input into groups of letters and a first transcription stage;
we feed the output of this step into a TBL system. With this setup, we reach 96.2% correctly
transcribed segments with rather restricted means (a small set of hand-crafted rules for the
FST stage; a set of 12 templates and a training set of 30k words for the TBL stage).
Observing that quantity is the major source of errors and that compound morpheme
boundaries can be useful for inferring quantity, we add, as an exploratory step, a
high-precision, low-recall compound splitter based on graphotactic constraints. With this
simple method, which targets only a subset of the compounds, performance improves to 96.9%.
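The idea of compound splitting from graphotactic constraints, as mentioned in the summary, can be illustrated with a minimal sketch. It assumes a small list of letter pairs that never occur inside a single morpheme, so that any occurrence marks a boundary. The pairs in `BOUNDARY_BIGRAMS`, the function name `split_compound`, and the example words are hypothetical illustrations, not the constraint set or implementation used in the paper.

```python
# Hypothetical letter pairs assumed never to occur inside a single morpheme.
# Illustrative only; this is not the constraint set used in the paper.
BOUNDARY_BIGRAMS = frozenset({"kh", "tb", "pk"})


def split_compound(word, boundary_bigrams=BOUNDARY_BIGRAMS):
    """Split a lowercase word between the two letters of every listed bigram.

    Words that contain none of the bigrams are returned unsplit, which is
    why such a method only reaches a subset of the compounds.
    """
    parts = []
    start = 0
    for i in range(1, len(word)):
        # word[i - 1 : i + 1] is the letter pair straddling position i.
        if word[i - 1 : i + 1] in boundary_bigrams:
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


if __name__ == "__main__":
    for example in ("bokhylla", "fotboll", "hus"):
        print(example, split_compound(example))
```

Under these assumptions, a word is split only where a listed pair occurs, so the splits that are made tend to be reliable while many compounds pass through untouched. The recovered boundaries could then be supplied to a later transcription stage as an additional cue for quantity.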
Department/s
Publishing year
2006
Language
English
Pages
141-144
Publication/Series
Proceedings of Fonetik 2006
Document type
Conference paper
Publisher
Lund University
Topic
- General Language Studies and Linguistics
Keywords
- LTS
- Swedish
- grapheme-to-phoneme conversion for Swedish
- letter-to-sound conversion for Swedish
Conference name
Fonetik 2006
Conference date
2006-06-07 - 2006-06-09
Conference place
Lund, Sweden
Status
Published