Sound is at the beginning of language learning. Children have to learn to distinguish different sounds and to segment the speech stream they are exposed to into units, eventually meaningful units, in order to acquire words and sentences. One reason that speech segmentation is challenging is that written text has spaces between the words, whereas no such spaces occur between spoken words. So, if an infant hears the sound sequence "thisisacup", it has to learn to segment this stream into the distinct units "this", "is", "a", and "cup". Once the child is able to extract the sequence "cup" from the speech stream, it has to assign a meaning to this word. Furthermore, the child has to be able to distinguish the sequence "cup" from "cub" in order to learn that these are two distinct words with different meanings. Finally, the child has to learn to produce these words. The acquisition of native language phonology begins in the womb and isn't completely adult-like until the teenage years. Perceptual abilities (such as being able to segment "thisisacup" into four individual word units) usually precede production and thus aid the development of speech production.
Prelinguistic development (birth to 1 year)
Children don't utter their first words until they are about 1 year old, but already at birth they can distinguish some utterances in their native language from utterances in languages with different prosodic features.
Infants as young as 1 month perceive some speech sounds as speech categories (they display categorical perception of speech). For example, the sounds /b/ and /p/ differ in the amount of breathiness that follows the opening of the lips, a difference measured as voice onset time (VOT). Using a computer-generated continuum in breathiness between /b/ and /p/, Eimas et al. (1971) showed that English-learning infants paid more attention to differences near the boundary between /b/ and /p/ than to equal-sized differences within the /b/-category or within the /p/-category. Their measure, monitoring infant sucking rate, became a major experimental method for studying infant speech perception.
Fig. 1. Sucking rate for 20 ms VOT change across category boundary (left), 20 ms VOT change within category (middle), without VOT change (right). After Eimas et al. (1971).
Infants up to 10-12 months of age can distinguish not only native sounds but also nonnative contrasts. Older children and adults lose the ability to discriminate some nonnative contrasts. Thus, it seems that exposure to one's native language causes the perceptual system to be restructured. The restructuring reflects the system of contrasts in the native language.
At 4 months, infants still prefer infant-directed speech to adult-directed speech. Whereas 1-month-olds only exhibit this preference if the full speech signal is played to them, 4-month-old infants prefer infant-directed speech even when just the pitch contours are played. This shows that between 1 and 4 months of age, infants improve in tracking the suprasegmental information in the speech directed at them. By 4 months, infants have learned which features they have to pay attention to at the suprasegmental level.
Babies prefer to hear their own name to similar-sounding words. This indicates that they have associated the meaning "me" with their name.
With increasing exposure to the ambient language, infants learn not to pay attention to sound distinctions that are not meaningful in their native language, e.g., two acoustically different versions of the vowel /i/ that simply differ because of inter-speaker variability. By 6 months of age infants have learned to treat acoustically different sounds that are representations of the same sound category, such as an /i/ spoken by a male versus a female speaker, as members of the same phonological category /i/.
Infants are able to extract meaningful distinctions in the language they are exposed to from statistical properties of that language. For example, English-learning infants can be exposed to a continuum from prevoiced /d/ to voiceless unaspirated /t/ (similar to the /d/-/t/ distinction in Spanish). Infants who hear the majority of the tokens near the endpoints of the continuum, i.e., tokens showing extreme prevoicing versus long voice onset times (a bimodal distribution), are better at discriminating these sounds than infants who are exposed primarily to tokens from the center of the continuum (a unimodal distribution).
These results show that at the age of 6 months infants are sensitive to how often certain sounds occur in the language they are exposed to, and that they can learn from these differences in frequency of occurrence which cues are important to pay attention to. In natural language exposure this means that typical sounds in a language (such as prevoiced /d/ in Spanish) occur often, and infants can learn them from mere exposure to them in the speech they hear. All of this occurs before infants are aware of the meaning of any of the words they are exposed to. The phenomenon of statistical learning has therefore been used to argue that infants can learn sound contrasts without meaning being attached to them.
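The difference between the two exposure conditions can be made concrete with a small simulation. The following is a minimal sketch, not the procedure of any study: the eight-step continuum, the token counts, and the function names count_peaks and inferred_categories are assumptions made up for illustration. It treats each peak in the frequency distribution as evidence for one sound category.

```python
# Hypothetical exposure frequencies for an 8-step continuum running from
# prevoiced /d/ (step 1) to voiceless unaspirated /t/ (step 8).
# The counts are invented for illustration; they are not experimental data.
BIMODAL = [1, 4, 2, 1, 1, 2, 4, 1]    # most tokens near the endpoints
UNIMODAL = [1, 1, 2, 4, 4, 2, 1, 1]   # most tokens near the center


def count_peaks(counts):
    """Return the number of local maxima in a frequency distribution."""
    # Collapse runs of equal neighbours so that a flat top counts as one peak.
    collapsed = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    peaks = 0
    for i, c in enumerate(collapsed):
        left = collapsed[i - 1] if i > 0 else float("-inf")
        right = collapsed[i + 1] if i < len(collapsed) - 1 else float("-inf")
        if c > left and c > right:
            peaks += 1
    return peaks


def inferred_categories(counts):
    """Toy learner (an assumption): posit one sound category per peak."""
    return count_peaks(counts)


if __name__ == "__main__":
    print("bimodal exposure :", inferred_categories(BIMODAL))
    print("unimodal exposure:", inferred_categories(UNIMODAL))
```

Under these assumptions the bimodal exposure yields two categories and the unimodal exposure yields one, which mirrors the better discrimination reported for infants in the bimodal condition.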
At 6 months, infants are also able to make use of prosodic features of the ambient language to break the speech stream they are exposed to into meaningful units, e.g., they are better able to distinguish sounds that occur in stressed vs. unstressed syllables. This means that at 6 months infants have some knowledge of the stress patterns in the speech they are exposed to and have learned that these patterns are meaningful.
At 7.5 months, English-learning infants have been shown to be able to segment words from speech that show a strong-weak (i.e., trochaic) stress pattern, which is the most common stress pattern in the English language, but they were not able to segment out words that follow a weak-strong pattern. In the sequence "guitar is", these infants thus heard "taris" as the word unit because it follows a strong-weak pattern. The process that allows infants to use prosodic cues in speech input to learn about language structure has been termed prosodic bootstrapping.
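The strong-weak strategy described above can be sketched as a simple rule: start a new word at every strong syllable. The code below is an illustration under stated assumptions, not a model taken from the literature. It assumes the input is already divided into syllables and labeled "S" (strong) or "W" (weak), and the function name segment_at_strong_syllables is hypothetical.

```python
def segment_at_strong_syllables(syllables):
    """Group (text, stress) pairs into word units, opening a new unit
    at every strong ("S") syllable. Weak ("W") syllables attach to the
    unit that is currently open."""
    words, current = [], []
    for text, stress in syllables:
        if stress == "S" and current:
            # A strong syllable closes the open unit and starts a new one.
            words.append("".join(current))
            current = []
        current.append(text)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    # "guitar is": weak-strong "guitar" followed by weak "is"
    print(segment_at_strong_syllables([("gui", "W"), ("tar", "S"), ("is", "W")]))
    # "candle fell": strong-weak "candle" followed by strong "fell"
    print(segment_at_strong_syllables([("can", "S"), ("dle", "W"), ("fell", "S")]))
```

For "guitar is" the rule produces the units "gui" and "taris", the same missegmentation reported for 7.5-month-olds, while the strong-weak word "candle" is extracted intact.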
While children generally don't understand the meaning of most single words yet, they understand the meaning of certain phrases they hear a lot, such as "Stop it" or "Come here."
Infants can distinguish native from nonnative language input using phonetic and phonotactic patterns alone, i.e., without the help of prosodic cues. They seem to have learned their native language's phonotactics, i.e., which combinations of sounds are possible in the language.
Infants now can no longer discriminate most nonnative sound contrasts that fall within the same sound category in their native language. Their perceptual system has been tuned to the contrasts relevant in their native language. As for word comprehension, Fenson et al. (1994) tested 10- to 11-month-old children's comprehension vocabulary size and found a range from 11 words to 154 words. At this age, children normally have not yet begun to speak and thus have no production vocabulary. So clearly, comprehension vocabulary develops before production vocabulary.
Stages of pre-speech vocal development
Even though children do not produce their first words until they are approximately 12 months old, the ability to produce speech sounds starts to develop at a much younger age. Stark (1980) distinguishes five stages of early speech development:
0-6 weeks: Reflexive vocalizations
These earliest vocalizations include crying and vegetative sounds such as breathing, sucking or sneezing. For these vegetative sounds, infants' vocal cords vibrate and air passes through their vocal apparatus, thus familiarizing infants with processes involved in later speech production.
A 14-week-old infant cooing as she interacts with a caregiver (51 seconds)
6-16 weeks: Cooing and laughter
Infants produce cooing sounds when they are content. Cooing is often triggered by social interaction with caregivers and resembles the production of vowels.
16-30 weeks: Vocal play
Infants produce a variety of vowel- and consonant-like sounds that they combine into increasingly longer sequences. The production of vowel sounds (already in the first 2 months) precedes the production of consonants, with the first back consonants (e.g., [g], [k]) being produced around 2-3 months, and front consonants (e.g., [m], [n], [p]) starting to appear around 6 months of age. As for pitch contours in early infant utterances, infants between 3 and 9 months of age produce primarily flat, falling and rising-falling contours. Rising pitch contours would require the infants to raise subglottal pressure during the vocalization or to increase vocal fold length or tension at the end of the vocalization, or both. At 3 to 9 months infants don't seem to be able to control these movements yet.
6-10 months: Reduplicated babbling (or canonical babbling)
Reduplicated babbling contains consonant-vowel (CV) syllables that are repeated in reduplicated series of the same consonant and vowel (e.g., [bababa]). At this stage, infants' productions resemble speech much more closely in timing and vocal behaviors than at earlier stages. Starting around 6 months, babies also show an influence of the ambient language in their babbling, i.e., babies' babbling sounds different depending on which languages they hear. For example, French-learning 9- to 10-month-olds have been found to produce a larger proportion of prevoiced stops (which exist in French but not English) in their babbling than English-learning infants of the same age. This phenomenon of babbling being influenced by the language being acquired has been called babbling drift.
10-14 months: Nonreduplicated babbling (or variegated babbling)
Infants now combine different vowels and consonants into syllable strings. At this stage, infants also produce various stress and intonation patterns. During this transitional period from babbling to the first word, children also produce protowords, i.e., invented words that are used consistently to express specific meanings, but that are not real words in the children's target language. Around 12-14 months of age children produce their first word. Infants close to one year of age are able to produce rising pitch contours in addition to flat, falling, and rising-falling pitch contours.
Development once speech sets in (1 year and older)
At the age of 1, children only just begin to speak, and their utterances are not adult-like yet at all. Children's perceptual abilities are still developing, too. In fact, both production and perception abilities continue to develop well into the school years, with the perception of some prosodic features not being fully developed until about 12 years of age.
Children are able to distinguish newly learned words associated with objects if they are not similar sounding, such as "lif" and "neem". They cannot distinguish similar-sounding newly learned words such as "bih" and "dih", however. So, while children at this age are able to distinguish monosyllabic minimal pairs at a purely phonological level, if the discrimination task is paired with word meaning, the additional cognitive load required by learning the word meanings leaves them unable to spend the extra effort on distinguishing the similar phonology.
Children's comprehension vocabulary size ranges from about 92 to 321 words. The production vocabulary size at this age is typically around 50 words. This shows that comprehension vocabulary grows faster than production vocabulary.
At 18-20 months, infants can distinguish newly learned words even if they are phonologically similar, e.g., "bih" and "dih". While infants are able to distinguish syllables like these soon after birth, only now are they able to distinguish them if they are presented to them as meaningful words rather than just a sequence of sounds. Children are also able to detect mispronunciations such as "vaby" for "baby". Recognition has been found to be poorer for mispronounced than for correctly pronounced words. This suggests that infants' representations of familiar words are phonetically very precise. This result has also been taken to suggest that infants move from a word-based to a segment-based phonological system around 18 months of age.
Of course, children need to learn the sound distinctions of their language because they also have to learn the meanings associated with those different sounds. Young children have a remarkable ability to learn meanings for the words they extract from the speech they are exposed to, i.e., to map meaning onto the sounds. Often children already associate a meaning with a new word after only one exposure. This is referred to as fast mapping. At 20 months of age, when presented with three familiar objects (e.g., a ball, a bottle and a cup) and one unfamiliar object (e.g., an egg piercer), children are able to conclude that in the request "Can I have the zib?", "zib" must refer to the unfamiliar object, i.e., the egg piercer, even if they have never heard that pseudoword before. Children as young as 15 months can complete this task successfully if the experiment is conducted with fewer objects. This task shows that children aged 15 to 20 months can assign meaning to a new word after only a single exposure. Fast mapping is a necessary ability for children to acquire the number of words they have to learn during the first few years of life: children acquire an average of nine words per day between 18 months and 6 years of age.
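The inference in the object-selection task can be sketched as a process of elimination. The text does not say which mechanism children actually use, so the code below is only one possible account under stated assumptions: the child already has a label for every familiar object, and an unknown word is mapped to the single object in view that has no label yet. The function name infer_referent and the lexicon structure are hypothetical.

```python
def infer_referent(word, objects_in_view, lexicon):
    """Return the object a word most plausibly refers to.

    lexicon maps known words to the objects they name. A known word
    returns its object; an unknown word is assigned to the only object
    in view that has no name yet. If zero or several objects are
    unnamed, no inference is made and None is returned.
    """
    if word in lexicon:
        return lexicon[word]
    named_objects = set(lexicon.values())
    unnamed = [obj for obj in objects_in_view if obj not in named_objects]
    return unnamed[0] if len(unnamed) == 1 else None


if __name__ == "__main__":
    lexicon = {"ball": "ball", "bottle": "bottle", "cup": "cup"}
    in_view = ["ball", "bottle", "cup", "egg piercer"]
    referent = infer_referent("zib", in_view, lexicon)
    print(referent)
    # Fast mapping: store the new word after this single exposure.
    if referent is not None:
        lexicon["zib"] = referent
```

With the three familiar objects and the egg piercer in view, the sketch maps "zib" to the egg piercer and adds it to the lexicon after one exposure.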
2-6 years
At 2 years, infants show first signs of phonological awareness, i.e., they are interested in word play, rhyming, and alliteration. Phonological awareness continues to develop until the first years of school. For example, only about half of the 4- and 5-year-olds tested by Liberman et al. (1974) were able to tap out the number of syllables in multisyllabic words, but 90% of the 6-year-olds were able to do so. Most 3- to 4-year-olds are able to break simple consonant-vowel-consonant (CVC) syllables up into their constituents (onset and rime). The onset of a syllable consists of all the consonants preceding the syllable's vowel, and the rime is made up of the vowel and all following consonants. For example, the onset in the word "dog" is /d/ and the rime is /og/. Children at 3-4 years of age were able to tell that the nonwords /fol/ and /fir/ would be liked by a puppet whose favorite sound is /f/. Four-year-olds are less successful at this task if the onset of the syllable contains a consonant cluster, such as /fr/ or /fl/. Liberman et al. found that no 4-year-olds and only 17% of 5-year-olds were able to tap out the number of phonemes (individual sounds) in a word, whereas 70% of 6-year-olds were able to do so. This might mean that children are aware of syllables as units of speech early on, while they don't show awareness of individual phonemes until school age. Another explanation is that individual sounds do not easily translate into beats, which makes tapping out individual phonemes a much more difficult task than tapping out syllables. One reason why phoneme awareness improves markedly once children start school is that learning to read provides a visual aid for breaking words up into their smaller constituents.
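The onset-rime division defined above can be written as a short function. This is a minimal sketch that assumes a simplified spelling in which each letter stands for one sound and the vowels are exactly a, e, i, o, u. The function name split_onset_rime is hypothetical.

```python
VOWELS = set("aeiou")  # assumption: one letter per sound, five vowel letters


def split_onset_rime(syllable):
    """Split a syllable into (onset, rime).

    The onset is every consonant before the first vowel; the rime is
    the vowel plus everything after it. A syllable without a vowel is
    returned as all onset with an empty rime.
    """
    for i, sound in enumerate(syllable):
        if sound in VOWELS:
            return syllable[:i], syllable[i:]
    return syllable, ""


if __name__ == "__main__":
    for syllable in ["dog", "fol", "fir", "flo"]:
        onset, rime = split_onset_rime(syllable)
        print(f"{syllable}: onset /{onset}/, rime /{rime}/")
```

For "dog" this gives the onset /d/ and the rime /og/, and the nonwords /fol/ and /fir/ share the onset /f/. A made-up cluster example such as "flo" yields the two-consonant onset /fl/, the kind of onset that 4-year-olds find harder to isolate.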
Although children perceive rhythmic patterns in their native language at 7-8 months, they are not able to reliably distinguish compound words and phrases that differ only in stress placement, such as HOT dog vs. hot DOG, until around 12 years of age. Children in a study by Vogel and Raimy (2002) were asked to show which of two pictures (i.e., a dog or a sausage) was being named. Children younger than 12 years generally preferred the compound reading (i.e., the sausage) to the phrasal reading (the dog). The authors concluded from this that children start out with a lexical bias, i.e., they prefer to interpret phrases like these as single words, and the ability to override this bias develops until late in childhood.
Infants usually produce their first word around 12-14 months of age. First words are simple in structure and contain the same sounds that were used in late babbling. The lexical items they produce are probably stored as whole words rather than as individual segments that get put together online when uttering them. This is suggested by the fact that infants at this age may produce the same sounds differently in different words.
Children's production vocabulary size at this age is typically around 50 words, although there is great variation in vocabulary size among children in the same age group, with a range between 0 and 160 words for the majority of children.
Children's productions become more consistent around the age of 18 months. When their words differ from adult forms, these differences are more systematic than before. These systematic transformations are referred to as phonological processes, and often resemble processes that are common in the adult phonologies of the world's languages (cf. reduplication in adult Jamaican Creole: "yellow yellow" = "very yellow"). Some common phonological processes are listed below.
Whole word processes (until age 3 or 4)
- Weak syllable deletion: omission of an unstressed syllable in the target word, e.g., for banana
- Final consonant deletion: omission of the final consonant in the target word, e.g., for because
- Reduplication: production of two identical syllables based on one of the target word syllables, e.g., for bottle
- Consonant harmony: a target word consonant takes on features of another target word consonant, e.g., for duck
- Consonant cluster reduction: omission of a consonant in a target word cluster, e.g., for cracker
Segment substitution processes (into the early school years)
- Velar fronting: a velar is replaced by a coronal sound, e.g., for key
- Stopping: a fricative is replaced by a stop, e.g., for sea
- Gliding: a liquid is replaced by a glide, e.g., for rabbit
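Because these processes are systematic, several of them can be written as simple rewrite rules over a list of segments. The sketch below is a toy illustration, not a description of attested child forms. It uses simplified ASCII symbols instead of IPA, the substitution tables are assumptions (for instance, both liquids are mapped to "w"), and the function names are hypothetical.

```python
# Simplified ASCII segment symbols; a real analysis would use IPA.
# The substitution tables are illustrative assumptions.
VELAR_FRONTING = {"k": "t", "g": "d"}                 # velar -> coronal
STOPPING = {"s": "t", "z": "d", "f": "p", "v": "b"}   # fricative -> stop
GLIDING = {"r": "w", "l": "w"}                        # liquid -> glide
VOWELS = {"a", "e", "i", "o", "u"}


def substitute(segments, mapping):
    """Apply a segment substitution process to every matching segment."""
    return [mapping.get(seg, seg) for seg in segments]


def final_consonant_deletion(segments):
    """Drop the last segment if it is a consonant."""
    if segments and segments[-1] not in VOWELS:
        return list(segments[:-1])
    return list(segments)


if __name__ == "__main__":
    # Rough, simplified transcriptions of the target words.
    print("key    ->", substitute(["k", "i"], VELAR_FRONTING))
    print("sea    ->", substitute(["s", "i"], STOPPING))
    print("rabbit ->", substitute(["r", "a", "b", "i", "t"], GLIDING))
    print("dog    ->", final_consonant_deletion(["d", "o", "g"]))
```

One consequence visible in this sketch is that different processes can make distinct target words sound alike: velar fronting of "key" and stopping of "sea" both produce the segments t, i under these rules.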
The size of the production vocabulary ranges from about 50 to 550 words at the age of 2 years. Influences on the rate of word learning, and thus on the wide range of vocabulary sizes of children of the same age, include the amount of speech children hear from their caregivers as well as the richness of the vocabulary in that speech. Children also seem to build up their vocabulary faster if the speech they hear is more often related to their focus of attention. This would be the case if a caregiver talks about a ball the child is currently looking at.
A study by Gathercole and Baddeley (1989) showed the importance of sound for early word meaning. They tested the phonological memory of 4- and 5-year-old children, i.e., how well these children were able to remember a sequence of unfamiliar sounds. They found that children with better phonological memory also had larger vocabularies at both ages. Moreover, phonological memory at age 4 predicted the children's vocabulary at age 5, even with earlier vocabulary and nonverbal intelligence factored out.
Children produce mostly adult-like segments. Their ability to produce complex sound sequences and multisyllabic words continues to improve throughout middle childhood.
Biological foundations of infants' speech development
The developmental changes in infants' vocalizations over the first year of life are influenced by physical developments during that time. Physical growth of the vocal tract, brain development, and development of neurological structures responsible for vocalization are factors for the development of infants' vocal productions.
Infants' vocal tract
Infants' vocal tracts are smaller, and initially also shaped differently from adults' vocal tracts. The infant's tongue fills the entire mouth, thus reducing the range of movement. As the facial skeleton grows, the range for movement increases, which probably contributes to the increased variety of sounds infants start to produce. Development of muscles and sensory receptors also gives infants more control over sound production. The limited movement possible by the infant jaw and mouth might be responsible for the typical consonant-vowel (CV) alternation in babbling, and it has even been suggested that the predominance of CV syllables in the languages of the world might evolutionarily have been caused by this limited range of movements of the human vocal organs.
The differences between the vocal tract of infants and adults can be seen in figure 3 (infants) and figure 4 (adults) below.
Fig. 3. Infant vocal tract: H = hard palate, S = soft palate, T = tongue, J = jaw, E = epiglottis, G = glottis; After Vihman (1996)
Fig. 4. Adult vocal tract: H = hard palate, S = soft palate, T = tongue, J = jaw, E = epiglottis, G = glottis; After Vihman (1996)
The nervous system
Crying and vegetative sounds are controlled by the brain stem, which matures earlier than the cortex. Neurological development of higher brain structures coincides with certain developments in infants' vocalizations. For example, the onset of cooing at 6 to 8 weeks happens as some areas of the limbic system begin to function. The limbic system is known to be involved in the expression of emotion, and cooing in infants is associated with a feeling of contentedness. Further development of the limbic system might be responsible for the onset of laughter around 16 weeks of age. Finally, the motor cortex, which develops later than the abovementioned structures, may be necessary for canonical babbling, which starts around 6 to 9 months of age.