The human vocal system consists of two main parts: the vocal tract proper, an acoustic tube of variable cross section extending from the vocal cords to the lips, and the nasal tract, a secondary cavity coupled to the vocal tract proper by means of the trapdoor action of the velum (soft palate). The view at top is a median sagittal section of the entire system; the views at bottom are a transverse section (A-A) and a vertical section (B-B) of the vocal-cord region.
The vocal tract, also called the upper respiratory tract, is the part of the vocal system that lies between the glottis and the lips. It is an acoustic tube consisting of the laryngeal cavity, the pharynx, the oral cavity, and the nasal cavity. Its average length is about 16.9 centimeters in adult males and 14.1 centimeters in adult females. The cross-sectional area of the vocal tract can be varied from zero (complete closure) to about 20 square centimeters by the placement of the lips, the jaw, the tongue, and the soft palate (velum). In the vocal tract, the raw buzzing sound produced by the vocal folds is resonated (see vocal resonance) and shaped into recognizable sounds. The trapdoor action of the soft palate couples the vocal tract proper to a secondary cavity involved in voice production: the nasal cavity. The nasal cavity is about 12 centimeters long and has a volume of about 60 cubic centimeters.
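As a rough illustration of how tract length relates to resonance, the sketch below treats the tract as a uniform tube closed at the glottis and open at the lips. That idealization is an assumption and is not taken from the text above, since the real tract has a variable cross section. Under it, resonances fall at odd multiples of c/4L. The speed of sound (350 meters per second) and the name `tube_resonances` are likewise assumptions chosen for illustration.

```python
# Assumed speed of sound in warm, humid air, in centimeters per second.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def tube_resonances(length_cm, count=3, speed=SPEED_OF_SOUND_CM_PER_S):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L), n = 1, 2, 3, ...
    """
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


# Average tract lengths quoted in the text.
male = tube_resonances(16.9)
female = tube_resonances(14.1)

print("male  :", [round(f) for f in male])
print("female:", [round(f) for f in female])
```

Under these assumptions the lowest resonance comes out near 518 Hz for the 16.9-centimeter tract and near 621 Hz for the 14.1-centimeter tract. Real resonances shift as the lips, jaw, tongue, and velum change the cross section.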
The vocal system can produce three basic kinds of sounds: voiced sounds, fricative sounds, and plosive sounds. Voiced sounds, exemplified by the vowels, are produced by raising the air pressure in the lungs and forcing air to flow through the glottis (the orifice between the vocal cords), causing the vocal cords to vibrate. The vibrations interrupt the airflow and generate quasi-periodic, broad-spectrum pulses that excite the vocal tract. The vibrating ligaments of the vocal cords are some 18 millimeters long, and the glottal opening typically varies in area from zero to about 20 square millimeters.
Fricative sounds, exemplified by the consonants s, sh, f, and th, are generated when the vocal tract is partly closed at some point and air is forced through the constriction at high enough velocity to produce turbulence. Plosive sounds, typified by the consonants p, t, and k, are produced when the vocal tract is closed completely (usually with the lips or tongue), allowing air pressure to build up behind the closure, and is then abruptly opened. The sharp sound produced when the air is released is often followed by a fricative sound or aspiration. All these vocal sources, whether for periodic voiced sounds or for aperiodic voiceless (fricative or plosive) sounds, have a fairly broad spectrum of frequencies extending over the voice-frequency range from about 100 cycles per second to more than 3,000. The vocal system acts as a time-varying filter to impose its resonant characteristics on the sound waves generated by the broad-spectrum sources. Operation of the voiced and voiceless sources is not mutually exclusive. For some sounds, such as the voiced fricative consonants v and z, two sound sources act in combination.
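The idea of a broad-spectrum source shaped by the tract's resonances can be sketched as a pulse train passed through a cascade of two-pole resonators. This is a minimal illustration under stated assumptions, not a model of real speech: the 8,000 Hz sample rate, the 100 Hz pulse rate, the resonance frequencies (500, 1,500, and 2,500 Hz), and the 80 Hz bandwidths are all assumed values, and the filter here is fixed rather than time-varying. All function names are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, assumed for illustration


def impulse_train(f0_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Idealized voiced source: one unit pulse per pitch period."""
    period = int(round(sample_rate / f0_hz))
    n_samples = int(sample_rate * duration_s)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole resonator: y[n] = x[n] + 2r*cos(theta)*y[n-1] - r^2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)
    theta = 2.0 * math.pi * center_hz / sample_rate
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def dft_magnitude(signal, freq_hz, sample_rate=SAMPLE_RATE):
    """Magnitude of the signal's Fourier component at a single frequency."""
    w = -2j * math.pi * freq_hz / sample_rate
    return abs(sum(x * cmath.exp(w * n) for n, x in enumerate(signal)))


# Source: pulses at 100 Hz, whose harmonics all have equal strength.
source = impulse_train(100.0, 0.5)

# Filter: cascade of three assumed resonances standing in for the tract.
voiced = source
for formant_hz in (500.0, 1500.0, 2500.0):
    voiced = resonator(voiced, formant_hz, 80.0)

for f in (500, 1000, 1500):
    print(f, round(dft_magnitude(source, f), 1), round(dft_magnitude(voiced, f), 1))
```

The source has the same strength at every harmonic, while the filtered output is much stronger at the assumed resonances than between them. Replacing the pulse train with random noise would give a crude stand-in for a fricative source, and summing both sources would correspond to the combined case described for v and z.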