Detection of Non-native Speaker Status from Backwards and Vocoded Content-masked Speech

Arkadiusz Rojczyk
https://orcid.org/0000-0002-7328-5911
Andrzej Porzuczek
https://orcid.org/0000-0001-6398-2150

Abstract

This paper addresses the issue of speech rhythm as a cue to non-native pronunciation. In natural recordings, it is impossible to disentangle rhythm from segmental, subphonemic or suprasegmental features that may influence nativeness ratings. However, two methods of speech manipulation, that is, backwards content-masked speech and vocoded speech, allow the identification of native and non-native speech in which segmental properties are masked and become inaccessible to the listeners. In the current study, we use these two methods to compare the perception of content-masked native English speech and Polish-accented speech. Both native English and Polish-accented recordings were manipulated using backwards masked speech and 4-band white-noise vocoded speech. Fourteen listeners classified the stimuli as produced by native or Polish speakers of English. Polish and English differ in their temporal organization, so, if rhythm is a significant contributor to the status of non-native accentedness, we expected an above-chance rate of recognition of native and non-native English speech. Moreover, backwards content-masked speech was predicted to yield better results than vocoded speech, because it retains some of the indexical properties of speakers. The results show that listeners are unable to detect non-native accent in Polish learners of English from backwards and vocoded speech samples.
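As an illustration only, the following minimal Python sketch shows the two ideas the abstract relies on: time-reversing a recording to mask its content, and checking whether a classification score exceeds chance. The function names `reverse_samples` and `p_above_chance` are hypothetical, the sample values and trial counts are invented for the example, and the exact one-sided binomial test is an assumed stand-in; the abstract does not state which statistical analysis the authors used.

```python
from math import comb


def reverse_samples(samples):
    """Return waveform samples in reverse temporal order (backwards speech)."""
    return list(reversed(samples))


def p_above_chance(n_correct, n_trials, chance=0.5):
    """Exact one-sided binomial p-value: P(X >= n_correct) under pure guessing."""
    return sum(
        comb(n_trials, k) * chance ** k * (1 - chance) ** (n_trials - k)
        for k in range(n_correct, n_trials + 1)
    )


if __name__ == "__main__":
    # Toy waveform (invented values): reversal keeps durations and
    # amplitudes but plays the signal back to front.
    print(reverse_samples([0.0, 0.2, -0.1, 0.4]))

    # Invented counts: 52 of 100 correct is compatible with guessing,
    # whereas 70 of 100 would be clearly above chance.
    print(p_above_chance(52, 100))
    print(p_above_chance(70, 100))
```

With two response options (native versus Polish speaker), chance performance is 0.5, so a large p-value from such a test would be in line with the reported finding that listeners could not detect non-native accent from the masked samples.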


Keywords

accent detection; non-native accent; content-masked speech; vocoded speech; backwards speech


Published: 2021-01-18


Rojczyk, A., & Porzuczek, A. (2021). Detection of Non-native Speaker Status from Backwards and Vocoded Content-masked Speech. Theory and Practice of Second Language Acquisition, 6(2), 87–105. https://doi.org/10.31261/TAPSLA.7714

Arkadiusz Rojczyk
University of Silesia, Poland
Andrzej Porzuczek
University of Silesia, Poland




Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

The Copyright Holders of the submitted texts are the Authors. The Reader is granted the rights to use the material available on the TAPSLA websites and in pdf documents under the provisions of the Creative Commons 4.0 International License: Attribution-ShareAlike (CC BY-SA 4.0). The user is free to copy and redistribute the material in any medium or format, and to remix, transform, and build upon the material for any purpose, even commercially.
