Ambitus

The ambitus of a voice is the range, or the distance, between the highest and the lowest singable note. The Staff View can show the typical ambitus of speaking and singing voices and of singable overtones.


Amplitude

The amplitude is the maximum value of a signal over a given period of time. It correlates with the intensity and with the perceived loudness of a signal. It has no unit, but is scaled into the range [-1, 1], where -1 and 1 represent the largest values that a particular file format can encode.

Analyzer View

The Analyzer View is the central window in VoceVista Video and contains one or two sub-windows that can show the Spectrogram, the Spectrum, or both.

Auto Marker

An Auto Marker is a type of Marker that is automatically created for each recorded segment. In other words, every time you press record, and then stop, a new auto marker is created to mark the recorded time period.

Bit Depth

Same as sample size.


Cent

A cent is one hundredth of a semitone of the equal-tempered scale, that is, one hundredth of the distance between two adjacent notes on the piano. In other words, two consecutive keys on the piano (regardless of whether they are black or white) are 100 cents apart. The cent is used to measure very small intervals. One octave is divided into 1200 cents.
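
The relationship between cents and frequencies can be sketched in a few lines of Python. This is only an illustration of the standard formula (1200 times the base-2 logarithm of the frequency ratio); the function name `cents_between` is made up for this example and is not part of VoceVista Video.

```python
import math

def cents_between(f1_hz: float, f2_hz: float) -> float:
    """Interval from f1_hz up to f2_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

# An octave is a doubling of the frequency.
octave = cents_between(220.0, 440.0)

# One equal-tempered semitone is a frequency ratio of 2 ** (1 / 12).
semitone = cents_between(440.0, 440.0 * 2 ** (1 / 12))
```

With these inputs, `octave` comes out as 1200 cents and `semitone` as 100 cents, up to floating-point rounding.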


Clipping

Clipping occurs when parts of the recorded audio signal are too loud to be represented by the sample format in use, and are therefore cut off. For example, the audio format may be able to represent sample values between -1.0 and 1.0. If the incoming signal contains values larger than 1.0, they will all be set to 1.0, which causes a loss of information and a distortion of the signal.
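
The loss of information can be illustrated with a small Python sketch. It assumes samples scaled to the range [-1.0, 1.0] as described above; `clip_sample` is a hypothetical helper written for this example only.

```python
def clip_sample(value: float, low: float = -1.0, high: float = 1.0) -> float:
    """Limit a sample to the range that the format can represent."""
    return max(low, min(high, value))

incoming = [0.2, 0.9, 1.4, 1.7, -1.2, -0.5]
clipped = [clip_sample(v) for v in incoming]

# 1.4 and 1.7 both end up as 1.0, so the difference between
# them can no longer be recovered from the recording.
```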

Decibel (dB)

The decibel is a logarithmic unit that indicates the ratio of an intensity relative to a reference level. When used to represent the intensity of an audio signal or of individual frequency components, the reference level is the loudest sound that can be encoded in a particular file format, and this level is defined as 0 dB. A decibel value of 0 dB equals an amplitude of 1. All intensities that are smaller than this reference level have a negative decibel value. The available range depends on the bit depth of the file format. With 16 bits, the smallest intensity that can be represented is approximately -90 dB, and with 24 bits, it is approximately -140 dB.
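
As a rough sketch, the conversion between amplitude and decibels can be written as follows. The code assumes the usual convention of 20 times the base-10 logarithm of the amplitude, and it assumes signed integer samples in which the smallest non-zero step is 1 / 2^(bits - 1); neither assumption is confirmed here as the exact method used by VoceVista Video.

```python
import math

def amplitude_to_db(amplitude: float) -> float:
    """Decibels relative to full scale (amplitude 1.0 equals 0 dB).

    An amplitude of exactly 0 has no finite decibel value and raises ValueError.
    """
    return 20.0 * math.log10(abs(amplitude))

def db_to_amplitude(db: float) -> float:
    """Inverse conversion from decibels back to amplitude."""
    return 10.0 ** (db / 20.0)

full_scale_db = amplitude_to_db(1.0)   # 0 dB
half_db = amplitude_to_db(0.5)         # roughly -6 dB

# Smallest non-zero sample value, assuming signed integer samples.
smallest_16_bit_db = amplitude_to_db(1 / 2 ** 15)   # roughly -90 dB
smallest_24_bit_db = amplitude_to_db(1 / 2 ** 23)   # roughly -138 dB
```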

Dynamic Range

The dynamic range is the ratio between the largest and the smallest value that can be represented by a given format. The dynamic range is typically measured in decibels. In digital audio, common dynamic range values are approximately 90 dB (for 16-bit audio) and 140 dB (for 24-bit audio).

Fast Fourier Transform (FFT)

The FFT is a mathematical process that converts a series of samples in the time domain (such as a digital audio recording) into a list of frequencies and their intensity.
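
To illustrate what this conversion produces, the sketch below computes a plain discrete Fourier transform in Python. The FFT is a faster way of computing the same result; this slow version is only meant to show the idea and is not the implementation used by VoceVista Video. The sampling rate, the tone, and all names are chosen for the example.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each frequency bin from 0 Hz up to half the sampling rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes

sample_rate = 800   # samples per second (example value)
n = 64              # number of samples that are analyzed
tone_hz = 100.0     # frequency of the test tone

samples = [math.sin(2 * math.pi * tone_hz * t / sample_rate) for t in range(n)]
mags = dft_magnitudes(samples)

# The strongest bin corresponds to the frequency of the test tone.
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / n
```

Here the test tone falls exactly on bin 8, so `peak_hz` is 100 Hz.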

FFT Window Function

The window function is a set of coefficients between 0 and 1 that are multiplied with a sequence of samples before taking the FFT of this sequence. The purpose of this is to reduce mathematical artifacts in the Spectrum arising from discontinuities between the beginning and the end of the signal.
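
One common window function is the Hann window, sketched below in Python. It is used here only as an example; no claim is made about which window functions VoceVista Video offers, and the helper names are hypothetical.

```python
import math

def hann_window(length: int):
    """Hann window coefficients for a sequence of at least two samples."""
    return [
        0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
        for i in range(length)
    ]

def apply_window(samples, window):
    """Multiply each sample with the matching window coefficient."""
    return [s * w for s, w in zip(samples, window)]

window = hann_window(8)
windowed = apply_window([1.0] * 8, window)

# The coefficients fall to 0 at both ends, so the windowed signal
# starts and ends at 0 and the two ends no longer form a jump.
```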

File / Marker List

The File / Marker List is a window that lists the Markers of the current file. It can also show a list of recently used files, or a list of search results. It can also be used to add and edit markers and marker descriptions.

File Description

The File Description is a special type of Marker that is automatically added to every file. Each file has a description, which is the first entry in the marker list and which has the round information icon as its symbol. It can be used to add a description to the file (such as what it contains, when it was recorded, where, with whom, and any other relevant information).


Filter

Short for Frequency Filter.


Formant

A formant is a resonance frequency in the vocal tract. The vocal tract has multiple resonance tones that will amplify sound with the frequency of that tone. The sound can come from the vocal cords, but it may also come from other sources. The literature on the voice does not always clearly distinguish between formants and overtones. Overtones are frequency components of a sound that may be amplified by the vocal tract if they match the frequency of a formant.


Frequency

The frequency is the number of cycles per second. The unit of frequency is the hertz (Hz). The frequency of a sound wave determines its pitch.

Frequency Filter

Frequency Filters are a tool to isolate individual parts of a recording in the frequency domain and make them louder or quieter. This makes it possible, for example, to listen only to specific frequencies in a recording, or to take them away entirely.

Frequency Resolution

The frequency resolution of the Spectrum is the difference in Hz between two frequencies that the analyzer can distinguish. The frequency resolution can be set on the Analyzer Settings dialog. Smaller values show more detail in the Spectrum and Spectrogram, but they also require more processing power and can make the program slower.


Fundamental

For a tone that has multiple harmonic components, the fundamental tone is the frequency that forms the base of an overtone scale that contains all these harmonics. In most cases the fundamental is the pitch that a human listener will identify when hearing the tone.


Harmonic

Harmonic is another word for overtone, with one small difference: Harmonics are counted such that the fundamental is the first harmonic, while overtones are counted such that the first overtone is the second harmonic.

Harmonic Series

The harmonic series is the set of frequencies that are all integer multiples of a fundamental frequency.
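
A short Python sketch shows the series for a fundamental of 100 Hz, together with the different counting of harmonics and overtones. The function name `harmonic_series` is made up for this illustration.

```python
def harmonic_series(fundamental_hz: float, count: int):
    """The first `count` harmonics; harmonic number 1 is the fundamental."""
    return [fundamental_hz * n for n in range(1, count + 1)]

harmonics = harmonic_series(100.0, 5)

# Overtones are counted without the fundamental, so the first
# overtone is the second harmonic.
overtones = harmonics[1:]
```

In this example, `harmonics` holds 100, 200, 300, 400, and 500 Hz, and the first entry of `overtones` is 200 Hz.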

Hertz (Hz)

Hertz is the unit of frequency to indicate the number of cycles per second of a periodic phenomenon. It is named after the German physicist Heinrich Hertz.


Intensity

The intensity is a measure of how loud or strong a signal is. The Waveform shows the intensity of the entire recording for each point in time, while the Spectrum shows the intensities of the individual frequency components. The intensity can be measured as amplitude, or in decibels.

The intensity is not identical to the loudness of the whole signal or of the frequency components, because the human ear perceives different frequencies differently. For example, if two tones are played with the same intensity, one at 100 Hz and the other at 1000 Hz, a human listener might hear one as louder than the other, even though they have the same amplitude when leaving the speaker. The intensity that VoceVista Video can show is therefore not the loudness experienced by a human listener, but the sound pressure level recorded by the microphone.


Lin

Short form of linear. Opposite of logarithmic. On a linear scale, numbers with the same distance have the same difference.


Log

Short form of logarithmic. Opposite of linear. A log scale can be useful to display numbers that range from very small to very large, especially values that represent quantities perceived by humans. On a log scale, numbers with the same distance to each other have the same ratio, whereas on a linear scale, numbers with the same distance have the same difference.

The piano has a log scale. All octaves are the same distance apart, as each octave is a doubling of the frequency. If the piano is projected on a linear scale, the piano keys become progressively wider toward the higher notes.
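
The difference between the two scales can be sketched in Python by placing four notes that are each one octave apart (110, 220, 440, and 880 Hz) on both kinds of axis. The helper functions are hypothetical and return a position between 0 and 1 along the axis.

```python
import math

def lin_position(freq_hz, low_hz, high_hz):
    """Position on a linear axis: equal differences give equal distances."""
    return (freq_hz - low_hz) / (high_hz - low_hz)

def log_position(freq_hz, low_hz, high_hz):
    """Position on a log axis: equal ratios give equal distances."""
    return math.log(freq_hz / low_hz) / math.log(high_hz / low_hz)

octave_notes_hz = [110.0, 220.0, 440.0, 880.0]
low_hz, high_hz = octave_notes_hz[0], octave_notes_hz[-1]

lin_marks = [lin_position(f, low_hz, high_hz) for f in octave_notes_hz]
log_marks = [log_position(f, low_hz, high_hz) for f in octave_notes_hz]

# Distance between neighboring octaves on each axis.
lin_steps = [b - a for a, b in zip(lin_marks, lin_marks[1:])]
log_steps = [b - a for a, b in zip(log_marks, log_marks[1:])]
```

On the log axis every octave takes up one third of the range, while on the linear axis each octave is twice as wide as the one below it.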

Long-term view

The long-term view is part of the Analyzer View and shows things that span a relatively long range of time, such as a Spectrogram, a melody, or a musical piece. The long-term view has a frequency scale and a time scale.


Marker

A marker marks a specific point in time, or a time range, in a recording. It can hold text to name and describe the area of interest. Markers can be used as searchable bookmarks to easily find specific points in a recording, and to add comments and notes.

There are four types of markers: Auto Markers, Range Markers, Point Markers, and the File Description.


MIDI

Short for Musical Instrument Digital Interface, a standard protocol to encode messages to electronic instruments. In VoceVista Video, MIDI output is used to play the keys of the piano keyboard and of overtone sliders. It can be sent to the standard MIDI synthesizer that is part of the operating system, or it can be sent to external instruments connected to the computer.

MIDI is also used as a file format to store a musical piece as a sequence of notes.


Mono

A mono recording has one channel, for example the input of a single microphone.

Note Slider

Same as Overtone Slider. The terms Note Slider and Overtone Slider are used synonymously, depending on the context.


Oscilloscope

A display that shows how a signal changes over time on a two-dimensional graph, where one axis is time, and the other axis is the intensity of the signal.

In VoceVista Video, an oscilloscope display can be shown by zooming very far into the Waveform View.


Overtone

An overtone is a tone that relates to a specific fundamental tone. Each overtone has a frequency that is a whole-number multiple of the fundamental frequency. For example, if the fundamental has a frequency of 100 Hz, its overtones have 200 Hz, 300 Hz, etc.

Also called harmonic, or partial tone.

Overtone Slider

Same as Note Slider. Overtone Sliders are a visual tool that is laid over the Spectrogram. Each slider represents a frequency. This can be interpreted as a music note, and it can be played as a sound. Sliders can be drawn out to show the overtones and undertones of the fundamental frequency. Sliders can be used to highlight a specific frequency or note, to illustrate principles of music theory and acoustics, or to transcribe a piece of music and show its notes.

Partial tone

Another word for overtone.


Pitch

Pitch is a perceptual property of a sound that corresponds to the frequency of a tone. Pitch makes it possible to classify tones as higher or lower. Pitch is not a purely objective physical property because a human listener may perceive the pitch of a tone differently from its measurable fundamental frequency. However, in VoceVista Video, pitch and frequency of a sound are used mostly synonymously.

Playback Cursor

Another word for Time Cursor, especially during Playback.

Point Marker

A Point Marker is a type of Marker which marks a specific point in time and has no range.


Profile

A profile is a set of user settings that can be stored and retrieved. Profiles can contain most settings that can be changed by the user, such as the range of the frequency scale, the arrangement of toolbar buttons, or the display configuration.

When a profile is saved, the current state of those settings is written into the profile. When the profile is later activated, all affected settings will be set to the value in the profile.

Range Marker

A Range Marker is a type of Marker that marks a period of time with a beginning and an end.


Ruler

A ruler is a visual aid that marks a specific frequency or amplitude. Over the Spectrogram, rulers are similar to Overtone Sliders in that they represent a frequency. However, contrary to sliders, rulers have no label, no overtones, and cannot be played. They are simply a visual tool.


Sample

A single measurement of sound pressure, or amplitude. In a digital recording, sound is stored as a sequence of numbers. A sound wave travels through the air and moves the membrane of a microphone. The microphone converts this mechanical movement into an electrical current, and the sound card reads out this current many times per second and stores each sample as a number that can be further processed by the computer.

Sample Size

The number of bits of each sample in a digital recording. Common values are 16, 24, and 32 bits. Larger values can represent a larger dynamic range of intensities.

Sampling Rate

The number of discrete measurements (or samples) per second stored in a digital audio recording. The sampling rate determines the frequency range that can be represented by an audio file. The highest representable frequency is half the sampling rate. For example, in a file with a sampling rate of 44100 Hz, the highest frequency that can be displayed in the Spectrum is 22050 Hz.

Common values are 44100 samples per second for CD-quality sound, or 48000, 96000, and 192000 samples per second for studio-quality sound.

Short-term view

The short-term view is part of the Analyzer View and shows things that span a relatively short range of time, such as a single Spectrum. The short-term view has a frequency scale and an intensity scale. However, the intensity scale only applies to the Spectrum, and not to the pitch value.


Spectrogram

The Spectrogram is a series of spectra. Whereas the Spectrum shows a single frequency-intensity diagram, the Spectrogram shows many such diagrams side by side. Therefore, the Spectrogram is a two-dimensional diagram where one axis shows time, and the other shows the frequency. The intensity of each frequency at a specific point in time is represented by the color of this point.


Spectrum

The Spectrum shows the strength of the individual frequency components in a piece of sound at a specific point in time. The Spectrum is a two-dimensional diagram, where one axis shows the frequency, and the other shows the intensity of each frequency.

Staff View

The Staff View shows a musical staff with treble and bass clefs. The location of the staff lines corresponds loosely to the location of the associated pitch on the frequency scale. When notes are played on the piano or the overtone sliders, they are shown as musical notes on the staff view.


Stereo

A stereo recording has two channels. To make a stereo recording, you need a recording device with two separate microphones. Stereo recordings are normally used to add depth to a recording by reproducing sound as a human listener would hear it with two ears. However, the two channels can also be used for different purposes, for example to record the sound from within an organ with one microphone, and the sound from the outside with another.

Time Cursor

Green line that indicates the time in the recording that is currently being played (or that will be played next). Also, when the Spectrogram and the Spectrum are both visible, the Time Cursor determines the time position of the Spectrum.

Time Range Slider

The Time Range Slider is a graphical interface element on the Timeline View that shows the current time range of the Spectrogram and the Waveform.

Time Resolution

The time resolution of the analyzer determines the length of a piece of a recording that the analyzer uses to calculate its Spectrum or pitch. A lower time resolution means that the analyzer can look at a longer piece of a recording. This will give more accuracy in the frequency domain at the expense of resolution in the time domain.
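
The trade-off can be sketched with a little arithmetic in Python. The sketch assumes a spectrum computed from a plain FFT without zero padding, where the spacing between frequency bins equals the sampling rate divided by the number of samples analyzed. How the settings in VoceVista Video map to these numbers is not covered here, and the names and window lengths are chosen for illustration.

```python
def analysis_window(sample_rate_hz: float, window_samples: int):
    """Duration of the analyzed piece and spacing of the resulting frequency bins."""
    duration_s = window_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / window_samples
    return duration_s, bin_spacing_hz

# A short piece: fine in time, coarse in frequency.
short_duration_s, short_spacing_hz = analysis_window(44100, 1024)

# A piece eight times longer: coarser in time, eight times finer in frequency.
long_duration_s, long_spacing_hz = analysis_window(44100, 8192)
```

With these example values, the short window covers about 0.023 seconds with bins about 43 Hz apart, while the long window covers about 0.186 seconds with bins about 5.4 Hz apart.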


Timeline View

The Timeline View shows an overview of the entire recording. It is similar to the Waveform View. The difference from the Waveform View is that the Timeline is zoomed out further than the Spectrogram and may show the whole recording, while the Waveform always shows the same time range as the Spectrogram.


Undertone

An undertone is a tone that relates to a specific fundamental tone. Each undertone has a frequency that is the fundamental frequency divided by a whole number. So undertones follow the sequence 1/2, 1/3, 1/4, 1/5, etc. For example, if the fundamental has 100 Hz, the undertones have the frequencies 50 Hz, 33.33 Hz, 25 Hz, 20 Hz, etc.

Each undertone is a tone that has the reference tone as one of its overtones.
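
The following Python sketch computes the first undertones of a 100 Hz reference tone and shows that multiplying each undertone by its divisor leads back to the reference. The function name `undertones` is hypothetical.

```python
def undertones(reference_hz: float, count: int):
    """The first `count` undertones: reference / 2, reference / 3, and so on."""
    return [reference_hz / n for n in range(2, count + 2)]

series = undertones(100.0, 4)

# Multiplying each undertone by its divisor gives the reference tone again,
# so the reference is one of the overtones of every undertone.
back_to_reference = [f * n for f, n in zip(series, range(2, 6))]
```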

Vowel Chart

The vowel chart shows the first and second resonance frequencies of the oral cavity (sometimes called Formants) that are used in many languages to form a specific vowel. The chart is a two-dimensional diagram where one axis represents the first, and the other the second formant. The vowels are shown as symbols from the International Phonetic Alphabet (IPA).


Waveform View

The Waveform View shows the samples of a digital recording. When the displayed time range is very small (in other words, when the view is zoomed in very far), the individual samples are shown, as on an oscilloscope.

When the view is zoomed out, each pixel shows an aggregate with the maximum and minimum values of the samples contained in the time range corresponding to this pixel.

The values in the vertical middle of the Waveform show the Root Mean Square (RMS) of the signal.
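
The aggregation per pixel can be sketched in Python as follows. This is an illustration under simple assumptions (the samples are split into equal time ranges, one per pixel column, and there are at least as many samples as columns), not the actual drawing code of VoceVista Video; the function name is made up.

```python
import math

def summarize_pixels(samples, pixel_count):
    """Return (minimum, maximum, RMS) for the samples behind each pixel column."""
    n = len(samples)
    columns = []
    for p in range(pixel_count):
        start = p * n // pixel_count
        end = max(start + 1, (p + 1) * n // pixel_count)
        chunk = samples[start:end]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        columns.append((min(chunk), max(chunk), rms))
    return columns

samples = [0.0, 0.5, -0.5, 1.0, -1.0, 0.0, 0.25, -0.25]
columns = summarize_pixels(samples, 2)
```

In this example the first pixel column covers the first four samples, with a minimum of -0.5 and a maximum of 1.0, and the second column covers the remaining four, with a minimum of -1.0 and a maximum of 0.25.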