Phonotactic Probabilities and Sub-syllabic Segmentation in Language Learning

High phonotactic probabilities are known to exert a facilitative effect on word learning in children and adults in their first language. The present study was designed to investigate the role of phonotactic probabilities when learning a foreign language. Focusing on Austrian and Korean learners of English, we investigated two hypotheses related to phonotactic frequency effects: (1) High-frequency segments have more deeply entrenched phonetic representations, with more automatized pronunciation patterns, rendering phonetic learning of homophonous segments more difficult; (2) High-frequency segments are associated with higher phonetic variability in the first language, which can facilitate phonetic learning in a foreign language. Additionally, the locus of phoneme/bigram frequency effects was analyzed in relation to left-branching and right-branching syllable structure in German and Korean. We found that proximity to English voice-onset time is correlated with phoneme and bigram frequencies in the first language, but results varied by learner group. Sub-syllabic segmentation of the first language was also shown to be an influential factor. Our study is grounded in research on frequency effects and combines its central premise with phonetic learning in a foreign language. The results show a tight relationship between first language statistical probabilities and phonetic learning in a foreign language.


Background
Phonotactic probability is defined as the position-specific frequency of segments and segment combinations (Vitevitch, 1997; Vitevitch & Sommers, 2003) and is thus a measure of how frequent (and probable) particular segments of words and sequences of phonemes are (Vitevitch & Luce, 1999). Different phonotactic constraints apply to different languages, and (first and foreign) language learners accumulate knowledge of phonotactic probabilities based on experience (Weber & Cutler, 2006). High-frequency phonotactic combinations serve an important purpose in word recognition, as words including such combinations are generally recalled faster and more accurately (Frisch, Large, & Pisoni, 2000; Luce & Large, 2001; Vitevitch, Armbruster, & Chu, 2004; Vitevitch & Luce, 1998). High phonotactic probability has been linked to more rapid word learning not only in adults but also in child language acquisition (Storkel, 2001; Storkel & Maekawa, 2005; Storkel & Rogers, 2000). The advantage in word learning involving high-probability phonotactic combinations could result from strengthened cognitive representations of the frequent phonotactic combinations (Bybee, 2007). Storkel (2001), for example, suggested that high phonotactic probability segments also influence the formation of semantic representations and the association between semantic and lexical ones, thus furthering learning.
While some studies have linked phonotactic probabilities to word learning in general (e.g., Storkel, Armbruster, & Hogan, 2006), less is known about phonetic learning. Based on previous work on word frequencies, several predictions can be inferred regarding frequency and probability effects in relation to phonotactic combinations. It has been shown that high-frequency words may be more deeply engrained in linguistic memory, and thus have more entrenched phonetic patterns (Bybee, 2007; Levy & Hanulikova, 2019; Pierrehumbert, 2001; Schweitzer et al., 2015). The special role of high-frequency distributions of particular words in connection to phonological changes has long been acknowledged in studies on linguistic change (Bybee, 2002; Phillips, 1984; Pierrehumbert, 2001). Under certain circumstances low-frequency words may be phonetically more malleable and thus more prone to sound change than high-frequency words (Phillips, 1984, 2006; Todd, Pierrehumbert, & Hay, 2019). An alternative hypothesis describes high-frequency words as having larger exemplar clouds, that is, being associated with more phonetic variation in the speaker's mind (Levy & Hanulikova, 2019; Schweitzer et al., 2015). This implies that speakers have more numerous and diverse phonetic targets associated with each high-frequency speech sound. Low-frequency sounds have smaller exemplar clouds and thus show less phonetic variability (Levy & Hanulikova, 2019). The crucial difference between these hypotheses is whether high-frequency rates limit or increase variability, and this has implications not only for sound change but also for language learning. While the abovementioned studies focused on the word level, similar tendencies may be at work at the segmental level. Lexical frequency rates and phonotactic probabilities have been shown to be correlated in English (Storkel & Maekawa, 2005), allowing the cross-fertilization of theories in the two strands of linguistic investigation.
What is suggested for phonological change may also apply to foreign language learning of novel phonetic detail in a known phonotactic combination (i.e., cross-linguistic phonotactics). When learners of a foreign language encounter a high-frequency phonotactic combination that is similar to one in their first language (e.g., /bi/), they may either be phonetically limited by their first language, or they may have access to a highly variable phonetic inventory and thus be better able to approximate the foreign-language phonetics. In contrast, low-frequency phonotactic combinations in the first language may be either more malleable to phonetic learning due to their shallow cognitive entrenchment, or learners may have a smaller phonetic inventory and face more difficulty in finding a suitable pronunciation. The two hypotheses lead to very different predictions with respect to how learners can acquire the phonetics of phonotactic combinations in the foreign language. The following study focuses on learners of English as a Foreign Language (EFL) and investigates how phonotactic probabilities of utterance-initial segments in their first languages (Korean, German) impact phonetic learning of the cross-linguistic variants of the combinations in English.
Korean and German are typologically different languages, and one key difference concerns the structure of the syllable. While syllable universals have been hard to define, the general outline of onset-rhyme (i.e., right-branching syllables) and body-coda (i.e., left-branching syllables) is an accepted categorization (Berg & Koops, 2010; J.-Y. Kim & Lee, 2011). The difference between the two types is the linkage strength between the initial segments. Whereas the onset-rhyme structure separates the initial phoneme from the rhyme in closed syllables, the body-coda system binds the initial phoneme and the following vowel together (J. Kim, 2015). For instance, a syllable such as /ban/ would be perceived with /b/ separate from /an/ in the German onset-rhyme structure, whereas in the Korean body-coda structure, /ba/ would go together and /n/ would be perceived as a separate entity (see Figure 1). Berg and Koops (2010) and Kim (2015) speculate whether the left- and right-branching preferences found across Korean and English are also related to phonotactic dependencies between segments. How robustly the nucleus vowel is formed in phonetic memory in connection with either the onset or the coda is unclear at the moment. Phonotactic probabilities have been shown to have an effect on the perception and processing of syllable structure, with Korean speakers being better at processing the onset and nucleus of a syllable rather than only the initial phoneme (J. Kim, 2015; J. Kim & Davis, 2002; Witzel, Witzel, & Choi, 2013). The sub-syllabic characteristics of Korean indicate that initial bigrams are a crucial unit in speech processing in the language. In German, the initial segment may be more influential.
The present study analyzes phonotactic probabilities of word-initial phonemes and bigrams (or biphones) in English, Korean, and German, and relates them to phonetic learning of English as a Foreign Language in speakers of Korean and German. The following two inter-linked research questions are posed:
1. Are high-frequency phonotactic combinations more difficult to adapt through learning than low-frequency phonotactic combinations?
2. Does sub-syllabic structure play a role? Specifically, is Koreans' EFL speech more strongly impacted by initial bigram frequencies, while Germans' EFL speech is more strongly influenced by initial phoneme frequencies?
Two groups of EFL learners, Korean first language (L1) users from Seoul and Austrian speakers of L1 German, are compared in terms of phonetic learning of voice onset time in word-initial fortis and lenis plosives in English. Confounding factors that may influence phonotactic probability and/or word-initial voice onset time (VOT), such as lexical frequency rates, neighborhood density, English phoneme and bigram frequency, and EFL phoneme and bigram frequency, are considered in the analysis.
TAPSLA.12468 p. 4/31 Eva Maria Luef, Pia Resnik

Voice Onset Time in English, Korean, and Austrian Plosives
English distinguishes two phonation types of plosives, commonly referred to as "lenis" and "fortis" (or "voiced" and "voiceless"). In utterance-initial position, American English lenis plosives are phonologically voiced, phonetically voiceless and unaspirated, with a mean VOT range of 8 to 17 msec. (Chodroff, Godfrey, Khudanpur, & Wilson, 2015). The utterance-initial fortis plosives are phonologically and phonetically voiceless and aspirated, with a mean VOT range of 65 to 120 msec. in American English speakers (Berry & Moyle, 2011). In other positions, including word-initial but utterance-medial, American English plosives are more likely to have voicing (Davidson, 2016). Regional differences in VOT have been noted, with speakers from Southern states displaying a tendency to pre-voice word-initial lenis plosives (Hunnicutt & Morris, 2016; Morris, 2018). Lenis VOTs of speakers of Southern British English (e.g., London) range from 10 to 22 msec. (Sonderegger, 2015), but speakers from Scotland may show significant pre-voicing of up to 100 msec. (Watt & Yurkova, 2007). British English fortis VOTs most frequently range between 50 and 100 msec. for Northern England and Scottish speakers (Docherty, Watt, Llamas, Hall, & Nycz, 2011) but are shorter for Southern England speakers, ranging between 35 and 75 msec. (Sonderegger, 2015). There is significant overlap between British and American English voice onset times, and both are different from Austrian German and Korean in certain respects.
The terms lenis and fortis are also used to describe the two phonation types of German plosives. German shows no voicing of plosives in word-initial position and has longer VOTs of lenis plosives than English. Southern German (including Austrian) plosives differ from Northern/Middle German, and a near-merger of word-initial fortis-lenis contrasts in some articulatory positions complicates the pattern (Moosmüller & Ringen, 2004; Moosmüller, Schmid, & Brandstätter, 2015). Aspiration is absent (Moosmüller, 1987; Siebs, de Boor, Moser, & Winkler, 1969) and contemporary Austrian lenis plosives are characterized by short VOTs, while fortis plosives show no aspiration in bilabial, little aspiration in alveolar and strong aspiration in velar position (Luef, 2020). In younger Austrian speakers, who are in the process of phonetically splitting the near-merger, mean lenis VOTs range between 4 and 13 msec., while fortis plosives show an average range of 33 to 68 msec.
Korean shows a three-way distinction in plosives (lax or lenis, aspirated, and tense; see J. Y. Kim, 2010; Shin, Kiaer, & Cha, 2013). The so-called fortis plosives in Korean usually refer to the tense category (e.g., [p*], characterized by very short VOT), and are thus not equivalent to the Germanic fortis plosives. The Korean lenis plosives are phonologically voiceless and show mean VOT values of approximately 55 to 70 msec.; the aspirated plosives are phonologically […] (Kang, 2014; Silva, 2004, 2006). A merger of lax and aspirated plosives in all three articulatory positions has led to VOT overlap of phrase-initial lenis and aspirated plosives (Jucker & Smith, 2006; Silva, 2006). Recent studies have shown that the VOT ranges for lax stops have increased, while those for aspirated ones have decreased, with the VOT difference between these two categories reducing accordingly (Chang & Kwon, 2020). This change in Korean has implications for the realignment of the Korean and English stop categories, with both lax and aspirated stops approximating the VOT ranges associated with English voiceless stops, as schematized in Figure 2. While the Korean plosive merger has obscured phonetic distinctions between lax and aspirated plosives, the F0 distinction at the onset of the following vowel has been amplified: the F0 values for aspirated stops are higher than those of lax stops, a trend that has led to distinct tonal levels (Kang, 2014).
The vowel environment of a word-initial plosive can influence VOT duration in different languages (Esposito, 2002; Grassegger, 1996; Klatt, 1975; Moosmüller & Ringen, 2004; Mortensen & Tøndering, 2013). Vowel height plays a role here, and constricting the air passage through the vocal tract (such as when raising the tongue) will lead to a delay in voice onset time (Fischer-Jørgensen, 1980). Thus, high vowels will cause VOT to be prolonged, while low vowels cause it to be shortened.
Mean VOT Values of Short-lag VOT (Lenis, Lax) and Long-lag VOT (Fortis, Aspirated) in American English (Based on Berry & Moyle, 2011, and Chodroff et al., 2015), Austrian German (Based on Luef, 2020), and Korean (Based on Kang, 2014, and Silva, 2004, 2006)
The mapping of Austrian and Korean plosives onto English ones is phonetically complicated. Austrian lenis and American English lenis can be regarded as corresponding; however, Austrian fortis only has small overlaps with American English fortis. The Korean lenis category ranges within the Austrian fortis category, with significant overlaps with American English fortis plosives. Korean aspirated plosives range within the American English fortis plosives. While phonetic mapping of the three languages is difficult, grapheme mapping is clear. German and English graphemes of lenis and fortis plosives are identical and German readers of English will immediately map them correspondingly. A widely used language Romanization system in South Korea ("Revised Romanization of Korean"/국어의 로마자 표기법) transcribes the word-initial lenis plosives <ㅂ>, <ㄷ>, and <ㄱ> as <b>, <d>, and <g> and the aspirated plosives <ㅍ>, <ㅌ>, and <ㅋ> as <p>, <t>, and <k> (note: tense plosives are transcribed with double consonants, e.g., <bb>). Here, grapheme correspondences between Korean and American English lenis and aspirated/fortis categories are established and may guide Korean readers of English in their mapping of plosive correspondences. The present study tests phonetic learning of Korean and Austrian learners of English and is based on reading stimuli. Therefore, grapheme mapping is expected to be influential in the process. Austrian learners certainly map their lenis and fortis contrasts onto the English lenis/fortis distinction, and Korean learners may be more inclined to map their lenis onto the English lenis and their aspirated contrasts onto the English fortis category.
According to the UCLA Phonological Segment Inventory Database (see Maddieson, 1984), plosive consonants (especially fortis) are among the most frequent phonemes in languages world-wide (also see Everett, 2018). Even though individual languages utilize them to different degrees, their articulatory and perceptual ease makes them pervasive to the human language capacity (Ohala, 1983). From such a universal view of phonological complexity (e.g., Romani, Galuzzi, Guariglia, & Goslin, 2017), it could be assumed that differences between their individual frequency rates may not lead to significant differences in foreign language learning.¹ In usage-based accounts of language acquisition and development, phonemic frequency generally plays a role, with different predictions resulting for production and perception of phonemes (Bybee, 2001). Studies have shown that VOT contributes to transfer effects in second language learners (e.g., Schoonmaker-Gates, 2015; Skarnitzl & Rumlová, 2019), suggesting an effect of language-specific phonological patterns, which impede or facilitate phonological learning in a second language.

Participants and Procedures
Speakers whose first language was Korean (N = 22, male: 5; female: 17) and Austrian German (N = 21, male: 3; female: 18) were recruited in their home countries in the cities of Seoul and Vienna, respectively, for a sentence-reading task in their foreign language, English. Participants were students whose ages ranged from 19 to 27 (mean = 23.2), and who were enrolled in foreign-language programs at their respective universities (Seoul National University, University of Vienna), where admission required English proficiency levels of B2 or higher according to the Common European Framework of Reference for Languages (Council of Europe, 2018). The majority of students were in advanced years of their program, some of them in graduate programs. They primarily reported using their first languages in their daily lives but were highly exposed to American English through online media and, in the case of Koreans, by American pronunciation teachers (Ahn, 2011). Austrian students of English may be exposed to British English to a higher degree, having travelled to Great Britain or being tutored by British pronunciation teachers. All participants were first informed about the recording procedures (but not told about the objective of the study) and their rights as participants. After having given their consent, they completed a survey that collected demographic information and details about the participants' linguistic habits (e.g., first language, dialect, exclusion of speech impediments). The participants were paid for their participation and the experiment took place between November 2018 and June 2019. The study compared two experimental groups but no control group was included in the experimental design.
The sentence-reading task consisted of 86 short English sentences or phrases (mean words per sentence = 6.3, SD = 2.1) which were read once at a comfortable speed and in the same order by each participant. The sentences were typed with a word processor and printed on a piece of paper that was given to each participant. Each sentence contained a target lexeme with a word-initial plosive consonant in sentence-initial position (e.g., 'Buffaloes are large animals' or 'Cats are active at night,' see supplementary material Table A1 for the list of carrier sentences), resulting in similar prosodic/rhythmic structure of the sentences. Participants were not familiar with the sentences and phrases before the start of their reading and were asked to assess the level of difficulty afterwards in their first language by speaking aloud the terms for 'easy,' 'medium,' and 'difficult' (Sino-Korean: 'ha': 하, 'jung': 중, 'sang': 상; German: 'leicht,' 'mittel,' 'schwer'). By having participants utter a Korean or German term after each sentence/phrase, we attempted to minimize habituation effects. The order of word-initial plosive appearance was shuffled so that no consecutive sentences started with the same plosive. All target lexemes had the primary stress on the first syllable. Each plosive type (lenis/lax and fortis/aspirated variants of bilabials, alveolars, and velars) appeared in word-initial bigrams with high vowels ([i, ɪ]), mid vowels ([e, ɛ, æ]), and low vowels ([ɑ, ʌ, a]). We grouped the vowels according to height in order to account for the VOT differences in relation to vowel height. Each bigram combination appeared a minimum of four times, resulting in each plosive type appearing at least 14 times throughout the sentence-reading task. For instance, the bigram [di, dɪ] started the five sentences 'Deans of colleges have to work long hours,' 'Dishwashers are too expensive for me,' 'Deals in the business world are hard to make,' 'Differences in opinion should not be expressed,' and 'Dill is an herb used for Italian cooking.' Sentences belonging to the same bigram class (e.g., lenis alveolar + i/ɪ) were spaced apart at a minimum of ten sentences. All sentences were semantically unrelated to their neighboring sentences and no phonological neighbors in target words were presented in consecutive sentences. Cases of deviant phonology (e.g., [ʤɪl] instead of [ɡɪl]) or stress placement (e.g., 'dessert' instead of 'desert') were removed from the sample. The possible difference in isochronous temporal patterns between Korean (Lee, Jin, Seong, Jung, & Lee, 1994) and German (Port, 1983) was negligible in the present study as only sentence-initial syllables with primary stress were the focus of analysis.
The participants' speech was recorded with a Zoom H4n digital audio recorder with an attached Sennheiser ME67 microphone. Speech was sampled at 44.1 kHz at 16-bit depth, and was subsequently saved and stored as .WAV files. Target lexemes were cut manually from the audio stream and saved as separate files, which were later processed with the open-source acoustic software Praat (Boersma & Weenink, 2019). Overall lexeme duration as well as the duration of the word-initial VOTs were manually annotated on two different tiers in the program, which allowed automated extraction of the durations (in seconds) via a script.
The start of each lexeme/VOT was marked at the burst of the stop (Abramson & Whalen, 2017); the end of VOT was determined at the onset of glottal pulsing (settings: 100 to 600 Hertz for women and 75 to 300 Hertz for men, Vogel, Maruff, Snyder, & Mundt, 2009). The majority of words (78%) ended in alveolar fricatives (of which 97% were <s>, voiced or unvoiced) and here the end point was marked when the frication had ceased (i.e., the nearest zero crossing) as visible on the waveform and spectrogram. In the case of plosives (10%), nasals (7%), liquids (3%), or vowels (2%) constituting the final phonemes of the target lexemes, the end point was determined when the waveform cycle had ceased and the sound had completely faded. VOT was normalized for speech rate by calculating a measure of syllables per second on 5% of each participant's speech (= eight sentences per participant taken from the middle of the reading texts; the sentences were the same for each participant). This value was then multiplied by VOT (in seconds) and later converted to milliseconds by multiplying it by 1000.
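The speech-rate normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names are hypothetical, the sample values are invented, and pooling syllables and durations across the reference sentences is an assumption, since the text does not state how the per-participant rate was aggregated.

```python
def syllables_per_second(syllable_counts, durations_s):
    """Speech rate over a set of reference sentences.

    Assumption: syllables and durations are pooled across sentences;
    the source does not specify the aggregation.
    """
    total_duration = sum(durations_s)
    if total_duration <= 0:
        raise ValueError("total duration must be positive")
    return sum(syllable_counts) / total_duration


def normalize_vot_ms(vot_s, rate):
    """Multiply raw VOT (in seconds) by speech rate, then scale by 1000."""
    return vot_s * rate * 1000.0


# Hypothetical participant; only two reference sentences shown for brevity.
rate = syllables_per_second([8, 10], [2.0, 2.5])  # 18 syllables in 4.5 s
normalized = normalize_vot_ms(0.050, rate)        # raw VOT of 50 ms
print(f"rate = {rate:.2f} syll/s, normalized VOT = {normalized:.1f}")
```

Because the raw duration is multiplied by the rate, a faster talker's short VOT and a slower talker's long VOT can end up with comparable normalized values.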
Approximately 7% of the data was coded for reliability by a second observer and Pearson's r along with the root mean square error (RMSE) were calculated to see whether the two coders agreed on (a) overall word duration and (b) start of VOT (= initial burst). For word durations, an excellent r value of .99 (RMSE = .023) and for VOT durations, an acceptable r value of .71 (RMSE = .014) are reported.
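As a rough illustration of the two agreement measures, the following Python sketch computes Pearson's r and the RMSE for paired measurements from two coders. The function names and the duration values are hypothetical and serve only to show the calculation.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def rmse(a, b):
    """Root mean square error between paired measurements."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical VOT durations (in seconds) annotated by two coders.
coder_1 = [0.012, 0.045, 0.060, 0.071]
coder_2 = [0.010, 0.050, 0.058, 0.075]
print(f"r = {pearson_r(coder_1, coder_2):.3f}, RMSE = {rmse(coder_1, coder_2):.4f}")
```

The two measures answer different questions: r captures whether the coders rank and scale the tokens alike, while RMSE expresses their typical disagreement in the original unit.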

Variables
VOT Distance
In order to determine the degree of similarity of the Korean and Austrian learners' VOTs to those of native English speakers, VOTs of American English speakers were extracted from the TIMIT Corpus, a collection of sentences read by American English speakers from different dialect regions, which is widely used in the phonetic sciences (Garofolo et al., 1993). Even though American English VOTs show socio-phonetic and regional stratification (see, e.g., Lipani, 2019), the present study will focus on average VOTs across the variety of American English speakers. We identified sentences starting with nouns with initial bigrams that were the focus of our study (see Participants and Procedures). Primary stress had to be on the first syllable (N = 146). We measured VOT in the identical way as described for the EFL learners. Due to an underrepresentation of the sentence-initial bigrams b, d, and g plus [i, ɪ] and d and g plus [e, ɛ, æ], selected recordings of the American radio show "This American Life" (https://www.thisamericanlife.org) were added to the corpus (N = 21). After identifying speakers whose biographical information (e.g., age) was available, bigrams representing the initial segments of sentence-initial nouns were manually cut from the .WAV files that were downloaded from the website of the show. Acoustical measurements followed the procedures as outlined for the EFL learners and the TIMIT Corpus. Speech rates of each American English sentence in the TIMIT Corpus were calculated (syllables per second) and each VOT was normalized for speech rate. See Appendix Table A2 for more information on the American English speaker data.
The phonetic distance of the Korean/Austrian VOTs from the American English target VOT spaces was assessed by calculating the Mahalanobis distance (Kartushina, Hervais-Adelman, Frauenfelder, & Golestani, 2015), which computes the distance of a test point from the distribution mean by considering the covariance matrix (Martos, Muñoz, & González, 2013). The Mahalanobis distance takes into account natural variability in speech production by calculating the number of standard deviations from a learner's VOT to the mean of the target spaces (computed per plosive type) derived from the American English speakers, along each principal component axis of the target spaces (Kartushina et al., 2015). A Mahalanobis distance of 0 indicates that a learner VOT value is at the mean of the target space. After analyzing z-scores of Mahalanobis distance scores and removing those over three standard deviations, the highest Mahalanobis distance in the present study was 14.21.
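For a single acoustic dimension, the Mahalanobis distance reduces to the absolute deviation from the target mean divided by the target standard deviation. The sketch below illustrates that univariate case together with the z-score trimming step. It rests on several assumptions not stated in the text: VOT is treated as one-dimensional, the sample standard deviation is used, and trimming removes only distances with z-scores above 3. All names and values are hypothetical.

```python
import statistics


def mahalanobis_1d(x, target_values):
    """Distance of x from the target mean, in target standard deviations.

    Assumption: one-dimensional target space with sample standard deviation.
    """
    mu = statistics.mean(target_values)
    sd = statistics.stdev(target_values)
    return abs(x - mu) / sd


def drop_outliers(distances, z_max=3.0):
    """Keep distances whose z-score does not exceed z_max (assumed rule)."""
    mu = statistics.mean(distances)
    sd = statistics.stdev(distances)
    return [d for d in distances if (d - mu) / sd <= z_max]


# Hypothetical native-speaker VOTs (ms) for one plosive type.
target_vots = [60.0, 70.0, 80.0]               # mean 70, SD 10
print(mahalanobis_1d(95.0, target_vots))       # 2.5 SDs from the target mean
print(mahalanobis_1d(70.0, target_vots))       # 0.0: exactly at the mean
```

A multivariate version would replace the division by the standard deviation with the inverse covariance matrix of the target space.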

Frequency Variables
Frequency rates of Korean initial phonemes were taken from Shin, Kiaer, and Cha (2013), who based their calculations on the Yonsei Korean Language Dictionary and the Standard Korean Language Dictionary in combination with the SLILC Spoken Language Information Lab Corpus (Shin, 2008). To determine the frequency rate of plosive-plus-vowel bigrams in word-initial position in Korean (which are not included in Shin et al., 2013), we used the Korean corpus of the Leipzig Corpora Collection/Deutscher Wortschatz Corpus, comprising over 109 million tokens and over seven million types extracted from Korean newspapers between 2011 and 2019 (Goldhahn, Eckart, & Quasthoff, 2012). We analyzed the first 100 types of each specific bigram (collapsing the nearly merged 애 and 에), noted down their token frequencies, and divided the token frequencies by the overall tokens of the corpus.
Frequencies of word-initial German phonemes and bigrams were calculated using CLEARPOND for German (GermanPOND, Marian, Bartolotti, Chabal, & Shook, 2012), which is based on the SUBTLEX-DE Corpus, a corpus of movie and TV subtitles that is considered an excellent corpus for spoken German (Brysbaert et al., 2011). Austrian German differs from Middle/Northern German; however, the majority of German corpora include only small portions of Bavarian and/or Austrian varieties. In order to establish the applicability of the GermanPOND resource for Austrian data, the only available Austrian language corpus was compared to CLEARPOND to see whether Austrian and German lexical frequency rates are correlated and CLEARPOND can be used to analyze Austrian speech data. The ANNO Corpus of the Austrian National Library ("Austrian Newspapers Online," http://anno.onb.ac.at) is a collection of 20 million pages of Austrian newspapers and magazines published between 1527 and 2014. It is the only sizable corpus of Austrian German. There is a corpus of spoken Austrian German, the GRASS Corpus (Schuppler, Hagmueller, Morales-Cordovilla, & Pressentheiner, 2014); however, it contains only spoken language and a limited number of speakers and tokens that can be analyzed with it. In the ANNO Corpus, the uninflected target words were searched for the time period between 1950 and 2000 and the number of occurrences was noted down. As this corpus does not include a total token number but only gives the number of newspapers/magazines for a search period, token frequency was calculated per newspaper/magazine. For example, the word Bank (Engl. 'bank') occurred 652 times within the corpus, which comprised 3,124 newspapers and magazines for the respective time period. Frequency was calculated by dividing 652 by 3,124. This resulted in a lexeme frequency of 0.21 for Bank. Next, uninflected target words were searched in GermanPOND and their frequencies were extracted. The database underlying the German CLEARPOND calculators is the SUBTLEX-DE Corpus. The frequency values obtained from the ANNO and CLEARPOND corpora were z-scored, and then checked for correlations. They were correlated (Pearson's r = 0.65) and thus reliability of the GermanPOND resource for Austrian speech was assumed.
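The per-publication frequency and the subsequent z-scoring can be sketched as follows. The figures for Bank are those given in the text; the function names and the remaining values are hypothetical, and the use of the sample standard deviation for z-scoring is an assumption.

```python
import math


def per_issue_frequency(occurrences, n_publications):
    """Occurrences of a word divided by the number of newspapers/magazines."""
    if n_publications <= 0:
        raise ValueError("number of publications must be positive")
    return occurrences / n_publications


def zscores(values):
    """Standardize values (assumption: sample standard deviation, n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


# 'Bank': 652 occurrences in 3,124 newspapers and magazines (from the text).
bank = per_issue_frequency(652, 3124)
print(round(bank, 2))

# Hypothetical frequencies for three further words, standardized together.
print(zscores([bank, 0.05, 0.40, 0.12]))
```

Z-scoring puts the per-publication ANNO values and the per-million CLEARPOND values on a common scale before they are compared; Pearson's r itself is unaffected by this linear rescaling.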
VOTs of EFL learners may be influenced by frequencies of items in the learned language. Thus, word-initial phoneme and bigram frequencies of EFL were calculated and compared to the frequency rates from the native languages. We used different EFL corpora from which we calculated the phoneme and bigram frequency rates for the EFL learners of Korean or German language background. For the Korean learners of English, the "ICNALE/International Corpus Network of Asian Learners of English Corpus" (Ishikawa, 2013) was used. We calculated the frequency rate of word-initial phonemes and bigrams of the sub-corpus spanning only Korean learners of English by dividing the overall occurrences of the phonemes and bigrams by the number of tokens of the Korean corpus (= 246,879).
For the Austrian learners of English, data was extracted from two corpora, the "Louvain International Database of Spoken English Interlanguage" or LINDSEI (Gilquin, de Cock, & Granger, 2010) and the "Giessen-Long Beach Chaplin Corpus/GLBCC" (Jucker et al., 2006).We selected materials produced by speakers whose first language was German and determined initial phoneme and bigram frequencies by dividing the overall occurrences of the word-initial phonemes and bigrams by the corpus tokens (combined corpus size = 489,270).
CLEARPOND for English (Marian et al., 2012) was used to obtain English lexical frequency rates of the target words, initial plosive and initial bigram frequency rates. In addition, neighborhood density (i.e., number of phonological neighbors of the English target words differing by one phoneme) was calculated, as this variable plays an important role in lexical processing of first and foreign languages (Fricke, Baese-Berk, & Goldrick, 2016). Syllable frequencies were not calculated as the majority of word-initial syllables of the stimuli do not appear in Korean or German (e.g., 'bath,' 'dance'). This was due to the fact that many target words were monosyllabic (e.g., 'bills,' 'banks') and, thus, syllable frequency would be conflated with lexical frequency.
All phoneme and bigram frequency variables (L1, EFL, English) were first log transformed [LOG(x+1)] (to account for zero values in the data) and then rescaled to range between 0 and 1 in order to account for the different frequency distributions of phonemes and bigrams in the fortis and lenis category and per learner group. This allowed a direct comparison between Koreans and Austrians and between lenis and fortis consonants.
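The two-step transformation can be sketched as below. The function names are hypothetical, and the natural logarithm is an assumption because the text does not name the base; the base does not affect the final values, since min-max rescaling cancels any constant factor.

```python
import math


def log_transform(values):
    """log(x + 1), which maps a zero frequency to zero."""
    return [math.log1p(v) for v in values]


def rescale_01(values):
    """Min-max rescaling to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant values")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw frequencies for one category and learner group.
raw = [0, 9, 99]
print(rescale_01(log_transform(raw)))  # approximately [0.0, 0.5, 1.0]
```

Applying the rescaling separately within each category and learner group, as the text describes, is what makes values comparable across groups with different raw frequency ranges.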

Statistical Analyses
First, a collinearity diagnostic was run on the independent variables (with the R packages "performance" and "car") and correlation coefficients and variance inflation factors were computed (see Table 1). English phoneme frequency was shown to be correlated with English bigram frequency, and neighborhood density was correlated with neighborhood frequency. For each correlated pair, the first principal component (PC1) was computed via Principal Components Analysis in order to combine the two variables into one that can account for the majority of the variability of the two variables (Salem & Hussein, 2019). The first principal component (PC1) of "English phoneme frequency" and "English bigram frequency" was correlated negatively at -0.71 with each of the two variables and explained 75% of the data variability. The combination variable was termed "English phoneme/bigram frequency." PC1 of "neighborhood density" and "neighborhood frequency" (termed "neighborhood density/frequency") was correlated with each of the original variables at -0.7 and was able to account for 86% of the data variability.
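For exactly two standardized variables, the first principal component has a closed form: both variables receive a weight of magnitude 1/√2 (about 0.71), and PC1 explains (1 + |r|)/2 of the total variance, where r is their correlation. The sketch below illustrates this special case; it assumes the PCA was run on standardized variables, which the text does not state, and all names and values are hypothetical. The overall sign of a principal component is arbitrary, so weights of -0.71 describe the same component.

```python
import math


def zscores(values):
    """Standardize values using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pc1_of_pair(x, y):
    """PC1 scores, weights, and explained variance for two variables."""
    zx, zy = zscores(x), zscores(y)
    r = sum(a * b for a, b in zip(zx, zy)) / (len(zx) - 1)
    sign = 1.0 if r >= 0 else -1.0
    w = 1.0 / math.sqrt(2.0)
    scores = [w * (a + sign * b) for a, b in zip(zx, zy)]
    explained = (1.0 + abs(r)) / 2.0
    return scores, (w, sign * w), explained


# Hypothetical predictor pair with a correlation of 0.6.
scores, weights, explained = pc1_of_pair([1, 2, 3, 4], [2, 1, 4, 3])
print(weights, explained)
```

Under this assumption, an explained share of 75% would correspond to a correlation of about 0.5 between the two original variables, and 86% to about 0.72.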
A series of linear mixed models was then calculated (Bates, Maechler, Bolker, & Walker, 2014), with the dependent variable being the Mahalanobis distance scores of the learners and the fixed effects being (1) L1 phoneme frequency and (2) L1 bigram frequency. As control variables we entered (3) EFL phoneme frequency, (4) EFL bigram frequency, (5) English phoneme/bigram frequency, (6) English lexical frequency rate, and (7) neighborhood density/frequency. As random effects (intercepts) we included 'subject' and 'word.' To keep type I error at the nominal level of 0.05, we included the maximal random slope structure (all fixed effects) per subject and per word (Barr, Levy, Scheepers, & Tily, 2013). Separate models were computed for the Korean and the Austrian data.
As an overall test of the effect of the fixed effects, we compared the full model with a respective null model lacking the fixed effects (but being otherwise identical to the full model) using a likelihood ratio test (Dobson, 2002; Forstmeier & Schielzeth, 2011). We also tested the significance of individual fixed effects by comparing the full model with a respective reduced model lacking the effect to be tested. Due to low variance inflation factors, collinearity did not appear to be an issue (Field, 2005; Quinn & Keough, 2002). The models were implemented in R (R Studio Team, 2020) using the function lmer of the package lme4 (Bates et al., 2014). The sample size for the models was 3,590 tokens, involving 86 types, and 43 speakers.
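The logic of a likelihood ratio test between nested models can be sketched as follows. This is not the lme4 workflow used in the study; it only shows the arithmetic, assuming the test statistic is referred to a chi-square distribution whose degrees of freedom equal the number of parameters dropped. The helper names `chi2_sf` and `likelihood_ratio_test` and the log-likelihood values are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2
    if df % 2 == 0:
        # Even df: exp(-z) * sum_{k < df/2} z^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= z / k
            total += term
        return math.exp(-z) * total
    # Odd df: erfc(sqrt(z)) + exp(-z) * sum_k z^(k - 1/2) / Gamma(k + 1/2)
    total = math.erfc(math.sqrt(z))
    term = math.sqrt(z) * math.exp(-z) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= z / (k + 0.5)
    return total

def likelihood_ratio_test(loglik_full, loglik_reduced, df):
    """Return the LRT statistic and its chi-square p-value."""
    stat = 2 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods; df = 2 if two fixed effects are dropped
stat, p = likelihood_ratio_test(-1000.0, -1003.0, df=2)
```

For a test of a single fixed effect, the same function would be called with `df=1` and the log-likelihood of the model lacking that effect.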

Results
American English speakers generally showed shorter VOTs before low vowels (see Table 2). The same pattern was true for Korean and Austrian learners of English, and these results are in agreement with previous literature on the influence of vowel height on VOT (Mortensen & Tøndering, 2013). In total, 21.8% of Koreans' and 38.5% of Austrians' VOTs had a Mahalanobis distance of less than 1, which is close to the benchmark targets of the American English VOTs for their respective plosive types. Fortis plosives generally showed larger distances from the American English VOTs and the lenis plosives of the learners were closer to the American English phonetic spaces (see Figure 3). VOT distances of the lenis plosives were larger in Korean speakers, a fact that can be explained by the larger phonetic distance between Korean lenis and American English lenis VOTs. In addition, VOT distances of Koreans' /k/ also exceeded those of the Austrians. Both learner groups achieved the best VOT results for /g/. The largest Mahalanobis distances and variability in distances were measured for /p/ in both Koreans and Austrians.

Mahalanobis Distance per Plosive Type and First Language Background
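To make the threshold of 1 concrete, the sketch below computes a Mahalanobis distance in the one-dimensional case, where it reduces to the absolute deviation from the reference mean divided by the reference standard deviation. That the distance here is univariate (VOT only) is an assumption not confirmed by this section, and the function names and all numbers are hypothetical.

```python
def mahalanobis_1d(x, ref_mean, ref_sd):
    """One-dimensional Mahalanobis distance: |x - mean| / sd."""
    return abs(x - ref_mean) / ref_sd

def share_within(distances, threshold=1.0):
    """Proportion of tokens whose distance falls below the threshold."""
    return sum(d < threshold for d in distances) / len(distances)

# Hypothetical reference distribution and learner VOTs in milliseconds
ref_mean, ref_sd = 60.0, 15.0
learner_vots = [50.0, 70.0, 95.0, 62.0]
distances = [mahalanobis_1d(v, ref_mean, ref_sd) for v in learner_vots]
share = share_within(distances)
```

Under the univariate assumption, a distance below 1 means that a token lies within one standard deviation of the reference mean for its plosive type.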

Korean Results
Results showed that Koreans' VOT distances were influenced by L1 bigram frequencies but not by L1 plosive frequencies (see Table 3 and Figure 4). Lower bigram frequencies facilitated smaller VOT distances to the American English model.

Low Bigram Frequencies in Korean Facilitated Phonetic Learning in Word-initial Bigrams in Fortis and Lenis Plosives
In addition, EFL bigram frequencies and English plosive/bigram frequencies had an effect on VOT distances in the Korean learners (see Table 3), with the latter showing the opposite effect on VOT distances from that of L1 and EFL bigram frequencies: high frequencies in the combined variable of English plosives and bigrams led to smaller VOT distances in the Korean learners.

Austrian Results
VOT distances of the Austrian learners were affected by the frequency of the word-initial plosive in Austrian German (L1 plosive frequency), but not by L1 bigram frequencies (see Table 4 and Figure 5). High-frequency plosives showed more English-like VOTs than low-frequency ones.

Austrian Learners Produced Better Approximations of American VOTs when German Phoneme Frequency of Fortis and Lenis Plosives Was High

EFL plosive and bigram frequencies also had an effect on Austrians' VOT distances, with low frequencies being indicative of shorter phonetic distances. High English plosive/bigram frequencies also had a measurable effect and minimized VOT distances. Words of high lexical frequency rate and words residing in sparser and lower-frequency neighborhoods also showed improved VOT scores.

Discussion
The experiment conducted for the present study followed two investigative threads. First, we analyzed the role of phonotactic probability of initial phonemes (plosives) and phoneme combinations (bigrams: plosive plus vowel) on phonetic learning of voice-onset time in learners of English as a Foreign Language (EFL). Two competing hypotheses were tested: (1) high frequency rates of L1 segments slow down phonetic learning, and (2) high-frequency segments have larger and more variable exemplar clouds, equipping a speaker with more phonetic variability and thus facilitating phonetic learning. We were specifically interested in analyzing the influence of the phonotactic probabilities that exist in the first language of EFL learners (Korean, German), as well as the influence of those probabilities formed through exposure to EFL of the two learner groups. Second, we tested whether sub-syllabic units play a role in phonetic learning and hypothesized that right-branching German syllable structure would influence phonetic learning of phonemes, while left-branching Korean syllable structure would influence the learning of bigrams. Thus, high-frequency German word-initial phonemes were expected to interfere with the learning of phonetic detail of equivalent structures in EFL in the Austrian group. In Koreans, high frequency rates of word-initial bigrams were proposed to be influential. The results show that frequency rates of word-initial segments were predictive of how far learners had progressed in their acquisition of English VOTs: high L1 frequencies affected phonetic learning in Austrian learners, while Korean learners were influenced by low L1 frequencies. Sub-syllabic segmentation was also shown to have an impact.
In general, the Austrian learners' English was influenced by a wider variety of factors analyzed in the present study. Neighborhood density and lexical frequency rate of target words were shown to have effects on VOT distances in Austrians but not in Koreans. The closer phonetic distance between English and German could play a role in this.
Concerning the first hypothesis, we found evidence that low-frequency items in the first language facilitate phonetic learning in English as a Foreign Language in Korean learners. In contrast, Austrians relied on high frequencies to improve their English VOTs. These findings do not neatly fit into one of the proposed hypotheses. The Austrian results could be explained in the context of the exemplar-based hypothesis, where speakers have more numerous and diverse phonetic targets associated with high-frequency speech segments. When producing a novel sound in a foreign language, the Austrian learners may have a greater choice of phonetic patterns (or exemplars) for pronunciation. The Korean learners showed better VOT approximation to the American English model when frequencies of the respective segments in their L1 Korean were low. Here, the less automatized phonetic patterns associated with low-frequency bigrams may enable the phonetic learning process. The discrepancy between Austrians and Koreans could be related to the learning potentials that are different for each learner group. Austrians' VOTs were generally closer to the English model on the distance scale, whereas Koreans' VOTs generally showed greater distances. When phonetic distances are small, the numerous phonetic competitors associated with the high-frequency segments could help home in on the exact target. When phonetic distances are large, learners may have to ignore their L1 phonetic repertoire and acquire novel phonetic patterns in order to produce good approximations of a phonetic target. Low frequency rates could facilitate that process, as they provide conditions where only a few and less deeply ingrained phonetic targets exist, making it easier to adopt a new variant that is independent of the pre-existing phonetic variants.
The second hypothesis, that sub-syllabic structure has an impact on phonetic learning in a foreign language, was supported by our results. Because left- and right-branching syllable structures differentiate the languages, we predicted that Koreans would be mainly influenced by bigram frequencies and Austrians by phoneme frequencies of their first languages. These expectations were borne out by the results: Koreans' VOTs were shown to be affected by bigram frequencies of L1 Korean, whereas Austrians' VOTs were affected by L1 German plosive frequencies. The differences in cognitive linking of segments in language users' minds may be reflected in the differences in locus of frequency effects in EFL.
In sum, VOT distance reduction (i.e., more L1-user-like pronunciation of plosives) was most successful in cases where the first language probabilities of segments and segment combinations were low in Korean and high in Austrian German. Furthermore, in Koreans, distance reduction was largest when L1 Korean bigram frequency was involved, whereas in Austrians the reduction was largest when L1 German phoneme frequency was involved. This points to a role of sub-syllabic units in the cognitive processing of phonological features of a foreign language.
For better interpretation of the findings presented here, some limitations of the study should be considered. Carrier sentences differed in subject phrase complexity, which introduced higher rhythmic variability. In addition, a few cases of secondary stress on the initial syllable of a target word (such as in "punctuation" and "pizzerias") might have contributed to differences in VOT values. In general, the phonetics of VOT are heavily influenced by a variety of factors, including language experience (Stoehr, Benders, van Hell, & Fikkert, 2017), gender (Koenig, 2000), biological (hormonal) causes (Whiteside, Hanson, & Cowell, 2004), fluency of speech production (Beckman, Helgason, McMurray, & Ringen, 2011), and dialectal region of origin in Korea (Cho, 2005) and Austria (Moosmüller, 1987). In addition, the large inter-individual variation that is generally recorded in VOT measurements (e.g., Allen, Miller, & DeSteno, 2003) complicates experimental designs that try to control for all of these factors. Future studies could compare L1 and L2 VOTs per person (paired data design) to document the exact VOT changes in a speaker switching from their first to their second language. A more detailed and separate investigation of the fortis and lenis categories may also yield interesting results that can qualify some of the findings presented here.

Conclusion
The results of the present study indicate that phonotactic probabilities in the first language exert influence over phonetic learning in a foreign language. Sub-syllabic structuring contributes to this effect by providing different segmental combinations where the frequency effects unfold.
In sum, our findings suggest an interaction between the statistical probabilities that arise in the first language, their cognitive entrenchment, and phonetic learnability in a foreign language, which is mediated by sub-syllabic segmentation of the first language.

Table 1
Correlation Matrix of Fixed Effects (Correlations Are Indicated in Bold)

Table 2
Speech-rate-adjusted Lenis VOTs (in Milliseconds) of the American, Korean, and Austrian Speakers of English for Each Bigram (Means, Standard Deviations)