Sound compression: principles and configuration. Speech synthesis and recognition

In the days when researchers were first tackling the problem of creating a speech interface for computers, they often had to build their own hardware for feeding audio into a computer and playing it back. Today such devices are of purely historical interest, since modern computers are readily equipped with audio input and output devices such as sound adapters, microphones, headphones, and speakers.

We will not go deep into the internal details of these devices, but we will explain how they work and give some recommendations on choosing computer audio equipment for use with speech recognition and synthesis systems.

As we said in the previous chapter, sound is nothing more than air vibrations whose frequency lies within the range perceived by humans. The exact limits of the audible range vary from person to person, but it is generally accepted that audible vibrations lie in the range of 16-20,000 Hz.

The task of the microphone is to convert sound vibrations into electrical oscillations, which can then be amplified, filtered to remove interference, and digitized for input into the computer.

By their principle of operation, the most common microphones are divided into carbon, dynamic, condenser, and electret types. Some of these microphones require an external current source to operate (for example, carbon and condenser microphones), while others can produce an alternating electrical voltage on their own under the influence of sound vibrations (dynamic and electret microphones).

Microphones can also be divided by purpose. There are studio microphones that can be held in the hand or mounted on a stand, radio microphones that can be clipped to clothing, and so on.

There are also microphones designed specifically for computers. Such microphones are usually mounted on a stand placed on the desk. Computer microphones can also be combined with headphones, as shown in Fig. 2-1.

Fig. 2-1. Headphones with microphone

How does one choose, from all this variety, the microphone best suited for speech recognition systems?

In principle, you can experiment with any microphone you have, as long as it can be connected to the computer's audio adapter. However, developers of speech recognition systems recommend getting a microphone that remains at a constant distance from the speaker's mouth during operation.

If the distance between the microphone and the mouth does not change, the average level of the electrical signal coming from the microphone will not change too much either. This has a positive effect on the quality of today's speech recognition systems.

What is the problem here?

A person can successfully recognize speech whose volume varies within very wide limits. The human brain is able to filter quiet speech out of interference such as the noise of cars passing down the street, other people's conversations, and music.

As for modern speech recognition systems, their abilities in this area leave much to be desired. If the microphone stands on the desk, then turning the head or shifting the body changes the distance between mouth and microphone. This changes the microphone's output level, which in turn degrades the reliability of speech recognition.

Therefore, when working with speech recognition systems, the best results are achieved with a microphone attached to a headset, as shown in Fig. 2-1. With such a microphone, the distance between mouth and microphone stays constant.

We also draw your attention to the fact that all experiments with speech recognition systems are best conducted in a quiet room. In this case the effect of interference will be minimal. Of course, if you need to choose a speech recognition system capable of working under strong interference, the tests have to be carried out differently. However, as far as the authors of this book know, the noise immunity of today's speech recognition systems is still very, very low.

The microphone converts sound vibrations into oscillations of electric current for us. These oscillations can be seen on the screen of an oscilloscope, but do not rush to the store to buy this expensive device. All our oscillographic studies can be done with an ordinary computer equipped with a sound adapter, such as a Sound Blaster. Later we will explain how to do this.

In Fig. 2-2 we show the oscillogram of a sound signal obtained while uttering a long sound "a". This oscillogram was obtained using the GoldWave program, which we will discuss later in this chapter, together with a Sound Blaster audio adapter and a microphone similar to the one shown in Fig. 2-1.

Fig. 2-2. Oscillogram of sound signal

The GoldWave program allows you to stretch the oscillogram along the time axis, making it possible to see the smallest details. In Fig. 2-3 we show a stretched fragment of the sound oscillogram mentioned above.

Fig. 2-3. Fragment of the sound signal oscillogram

Note that the magnitude of the signal coming from the microphone varies periodically, taking both positive and negative values.

If only one frequency were present in the input signal (that is, if the sound were "pure"), the waveform obtained from the microphone would be sinusoidal. However, as we have said, the spectrum of human speech sounds consists of a whole set of frequencies, so the waveform of a speech signal is far from sinusoidal.

A signal whose value changes continuously with time we will call an analog signal. This is the kind of signal that comes from the microphone. Unlike an analog signal, a digital signal is a set of numerical values that vary discretely with time.

For a computer to process a sound signal, the signal must be converted from analog form into digital form, that is, represented as a set of numerical values. This process is called digitizing the analog signal.

Digitization of a sound signal (or of any analog signal) is performed by a special device called an analog-to-digital converter (ADC). This device sits on the audio adapter board and is an ordinary microcircuit.

How does an analog-to-digital converter work?

It periodically measures the level of the input signal and outputs the measurement result as a numerical value. This process is illustrated in Fig. 2-4. Here the gray rectangles mark the input values measured at a constant time interval. The set of such values is the digitized representation of the input analog signal.

Fig. 2-4. Measuring the signal amplitude as a function of time

In Fig. 2-5 we show the connection of an analog-to-digital converter to a microphone. The analog signal is fed to input x1, and the digital signal is taken from outputs u1-un.

Fig. 2-5. Analog-Digital Converter

Analog-to-digital converters are characterized by two important parameters: the conversion (sampling) frequency and the number of quantization levels of the input signal. Choosing these parameters correctly is critical for an adequate digital representation of the analog signal.

How often must the amplitude of the input analog signal be measured so that no information about its changes is lost in the course of digitization?

It would seem that the answer is simple: the input signal must be measured as often as possible. Indeed, the more often the analog-to-digital converter takes measurements, the better the slightest changes in the amplitude of the input analog signal will be tracked.

However, excessively frequent measurements can lead to an unjustified growth of the digital data stream and a useless waste of computer resources when processing the signal.

Fortunately, choosing the right conversion frequency (sampling frequency) is simple enough. It suffices to turn to the Kotelnikov theorem, known to specialists in digital signal processing. The theorem states that the conversion frequency must be at least twice the maximum frequency in the spectrum of the signal being converted. Therefore, to digitize without loss of quality a sound signal whose frequencies lie in the range of 16-20,000 Hz, the conversion frequency must be chosen no lower than 40,000 Hz.

Note, however, that in professional sound equipment the conversion frequency is chosen several times higher than this value. This is done to achieve very high quality of the digitized sound. For speech recognition systems this level of quality is not relevant, so we will not dwell on such a choice.

And what conversion frequency is needed to digitize the sound of human speech?

Since the sounds of human speech lie in the frequency range of 300-4000 Hz, the minimum necessary conversion frequency is 8000 Hz. However, many computer speech recognition programs use the conversion frequency of 44,100 Hz, standard for ordinary audio adapters. On the one hand, this conversion frequency does not lead to an excessive growth of the digital data stream; on the other, it provides speech digitization of sufficient quality.

Back in school we were taught that any measurement involves errors that cannot be eliminated completely. Such errors arise from the limited resolution of the measuring instruments and from the fact that the measurement process itself may introduce some changes into the measured value.

An analog-to-digital converter represents the input analog signal as a stream of numbers of limited bit depth. Ordinary audio adapters contain 16-bit ADC units that can represent the amplitude of the input signal as 2^16 = 65,536 different values. ADC devices in high-end sound equipment may be 20-bit, providing greater accuracy in representing the amplitude of the audio signal.
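To make the two steps just described concrete, here is a minimal sketch (our own illustration, not code from any speech recognition product) of sampling a pure tone at the standard audio-adapter rate and quantizing it the way a 16-bit ADC would:

```python
import numpy as np

fs = 44_100          # sampling frequency, Hz (standard for audio adapters)
f = 1_000            # tone frequency, Hz; well below fs / 2 (Kotelnikov theorem)
t = np.arange(0, 0.01, 1 / fs)       # 10 ms worth of sample instants
analog = np.sin(2 * np.pi * f * t)   # the "analog" signal, normalized to [-1, 1]

# 16-bit quantization: 2**16 = 65536 levels, from -32768 to 32767
digital = np.round(analog * 32767).astype(np.int16)

print(digital[:8])   # the stream of numbers an ADC would deliver
```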

Modern speech recognition systems and programs were created for ordinary computers equipped with ordinary sound adapters. Therefore, to experiment with speech recognition you will not need to buy a professional audio adapter. An adapter such as the Sound Blaster is quite suitable for digitizing speech for subsequent recognition.

Along with the useful signal, various noises usually reach the microphone: noise from the street, wind noise, other people's conversations, and so on. Noise has a negative impact on the quality of speech recognition systems, so it has to be dealt with. One way we have already mentioned: today's speech recognition systems are best used in a quiet room, staying one on one with the computer.

However, ideal conditions cannot always be created, so special methods are needed to get rid of noise. To reduce the noise level, special tricks are used in microphone design, along with special filters that remove from the spectrum of the analog signal the frequencies that carry no useful information. In addition, a technique such as dynamic range compression of the input signal is used.

Let us discuss all this in order.

A frequency filter is a device that transforms the frequency spectrum of an analog signal. In the course of the transformation, oscillations of certain frequencies are amplified or attenuated (absorbed).

You can imagine this device as a kind of black box with one input and one output. In our situation, a microphone is connected to the frequency filter's input, and an analog-to-digital converter to its output.

Frequency filters come in several kinds:

· low-pass filters;

· high-pass filters;

· band-pass filters;

· band-stop filters.

High-pass filters (high-pass filter) remove from the spectrum of the input signal all frequencies below a certain cutoff frequency, which depends on the filter setting.

Since sound signals lie in the range of 16-20,000 Hz, all frequencies below 16 Hz can be cut off without degrading the sound quality. For speech recognition the frequency range of 300-4000 Hz is what matters, so frequencies below 300 Hz can be cut as well. In this case all interference whose frequency spectrum lies below 300 Hz will be removed from the input signal and will not hinder the speech recognition process.

Similarly, low-pass filters (low-pass filter) cut out of the input spectrum all frequencies above a certain cutoff frequency.

A person does not hear sounds with frequencies of 20,000 Hz and above, so they can be cut from the spectrum without noticeable deterioration of sound quality. As for speech recognition, here all frequencies above 4000 Hz can be cut, which leads to a significant reduction in the level of high-frequency interference.

A band-pass filter (band-pass filter) can be imagined as a combination of a high-pass and a low-pass filter. Such a filter blocks all frequencies below the so-called lower cutoff frequency of the passband and above its upper cutoff frequency.

Thus, for a speech recognition system a band-pass filter is convenient that blocks all frequencies except those in the range of 300-4000 Hz.

As for band-stop filters (band-stop filter), they cut out of the input spectrum all frequencies lying within a specified range. Such a filter is convenient, for example, for suppressing noise that occupies a continuous part of the signal's spectrum.

In Fig. 2-6 we show how a band-pass filter is connected.

Fig. 2-6. Sound signal filtering before digitization

It must be said that the ordinary sound adapters installed in computers already include a band-pass filter through which the analog signal passes before digitization. The bandwidth of such a filter usually corresponds to the range of sound signals, namely 16-20,000 Hz (in different audio adapters the values of the upper and lower cutoff frequencies may vary within small limits).

But how can we obtain the narrower passband of 300-4000 Hz, corresponding to the most informative part of the spectrum of human speech?

Of course, if you have a bent for designing electronics, you can build your own filter from an operational amplifier chip, resistors, and capacitors. That is roughly what the first creators of speech recognition systems did.

But industrial speech recognition systems must work on standard computer hardware, so the path of building a special band-pass filter is not suitable here.

Instead, modern speech processing systems use so-called digital frequency filters, implemented in software. This became possible once the computer's CPU became powerful enough.

A digital frequency filter implemented in software converts an input digital signal into an output digital signal. In the course of the conversion, the program processes the stream of numbers (samples of the signal amplitude) coming from the analog-to-digital converter. The result of the conversion is also a stream of numbers, but this stream corresponds to the already filtered signal.
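As an illustration, here is a minimal sketch of such a software band-pass filter for the 300-4000 Hz speech band. It assumes the SciPy library; the filter order and the test signal are arbitrary choices for the example:

```python
import numpy as np
from scipy.signal import butter, lfilter

fs = 44_100                                  # sampling frequency, Hz
# 4th-order Butterworth band-pass, passing roughly 300-4000 Hz
b, a = butter(4, [300, 4000], btype="bandpass", fs=fs)

def filter_samples(samples: np.ndarray) -> np.ndarray:
    """Input: the stream of ADC samples; output: the filtered stream."""
    return lfilter(b, a, samples)

noise = np.random.randn(fs)                  # one second of white noise as a test
print(filter_samples(noise)[:5])
```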

Speaking of the analog-to-digital converter, we noted such an important characteristic as the number of quantization levels. If a 16-bit analog-to-digital converter is installed in the audio adapter, then after digitization the levels of the sound signal can be represented as 2^16 = 65,536 different values.

If there are too few quantization levels, so-called quantization noise appears. To reduce this noise, high-quality sound digitization systems should use analog-to-digital converters with the maximum available number of quantization levels.

However, there is another technique for reducing the effect of quantization noise on the quality of the audio signal, used in digital sound recording systems. With this technique, before digitization the signal is passed through a nonlinear amplifier that emphasizes signals of small amplitude. Such a device amplifies weak signals more strongly than strong ones.

This is illustrated by the graph of output amplitude versus input amplitude shown in Fig. 2-7.

Fig. 2-7. Nonlinear amplification before digitization

At the stage of converting the digitized sound back to analog (we consider this stage later in this chapter), before being sent to the speakers the analog signal is again passed through a nonlinear amplifier. This time a different amplifier is used, one that emphasizes signals of large amplitude and has a transfer characteristic (the dependence of output amplitude on input amplitude) inverse to the one used during digitization.

How can all this help the creators of speech recognition systems?

A person, as is well known, recognizes quite well speech uttered in a quiet whisper or in a rather loud voice. One can say that, for a human, the dynamic range of volume levels of successfully recognized speech is quite wide.

Today's computer speech recognition systems, unfortunately, cannot yet boast of this. However, in order to somewhat widen this dynamic range, before digitization you can pass the signal from the microphone through a nonlinear amplifier whose transfer characteristic is shown in Fig. 2-7. This reduces the level of quantization noise when digitizing weak signals.

Developers of speech recognition systems, again, are forced to orient themselves primarily toward mass-produced sound adapters, which do not provide the nonlinear signal conversion described above.

However, you can create a software equivalent of the nonlinear amplifier that converts the digitized signal before passing it to the speech recognition module. Although such a software amplifier cannot reduce quantization noise, it can emphasize those signal levels that carry the most speech information. For example, you can reduce the amplitude of weak signals, thereby cleaning the signal of noise.
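A minimal sketch of such a software nonlinear transformation is shown below. The mu-law-style curve is our own choice for illustration; any monotonic curve of the shape in Fig. 2-7 would do, and applying the inverse curve instead would attenuate weak samples, as in the noise-reduction example just mentioned:

```python
import numpy as np

def nonlinear_gain(x: np.ndarray, mu: float = 255.0) -> np.ndarray:
    """Boost low-amplitude samples; x is assumed normalized to [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

x = np.array([0.01, 0.1, 0.5, 1.0])
print(nonlinear_gain(x))   # weak samples are amplified far more than strong ones
```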


Dynamic range, or the photographic latitude of a photographic material, is the ratio between the maximum and minimum exposure values that can be correctly captured in a picture. Applied to digital photography, dynamic range is effectively equivalent to the ratio of the maximum and minimum possible values of the useful electrical signal generated by the photosensor during exposure.

Dynamic range is measured in exposure stops (EV). Each stop corresponds to a doubling of the amount of light. For example, if a certain camera has a dynamic range of 8 EV, this means that the maximum possible useful signal of its sensor relates to the minimum as 2^8:1, i.e. the camera can capture, within one frame, objects differing in brightness by no more than 256 times. More precisely, it can capture objects of any brightness, but objects whose brightness exceeds the maximum permissible value will come out dazzling white in the picture, and objects whose brightness falls below the minimum value will come out pitch black. Details and textures will be distinguishable only on objects whose brightness fits within the camera's dynamic range.

To describe the relationship between the brightness of the brightest and darkest of the objects being photographed, the not entirely correct term "dynamic range of the scene" is often used. It is more correct to speak of the brightness range or the level of contrast, since dynamic range is normally a characteristic of the measuring device (in this case, the digital camera's sensor).

Unfortunately, the brightness range of many beautiful scenes we encounter in real life can noticeably exceed the dynamic range of a digital camera. In such cases the photographer has to decide which objects should be rendered in full detail and which can be left outside the dynamic range without harming the creative intent. To use your camera's dynamic range most effectively, what you sometimes need is not so much a thorough understanding of how the photosensor works as a developed artistic sense.

Dynamic Range Factors

The lower boundary of the dynamic range is set by the sensor's own noise level. Even an unilluminated sensor generates a background electrical signal called dark noise. Interference also arises when the charge is transferred to the analog-to-digital converter, and the ADC itself introduces a certain error into the digitized signal, the so-called quantization noise.

If you take a picture in complete darkness or with the lens cap on, the camera will record only this meaningless noise. If you allow a minimal amount of light to reach the sensor, the photodiodes will begin to accumulate electric charge. The magnitude of the charge, and hence the intensity of the useful signal, will be proportional to the number of photons captured. For a picture to show at least some meaningful detail, the level of the useful signal must exceed the level of the background noise.

Thus, the lower boundary of the dynamic range or, in other words, the sensitivity threshold of the sensor can formally be defined as the output signal level at which the signal-to-noise ratio is greater than one.

The upper boundary of the dynamic range is determined by the capacity of an individual photodiode. If during the exposure some photodiode accumulates an electric charge at its limiting value, the image pixel corresponding to the overloaded photodiode will come out absolutely white, and further irradiation will not affect its brightness. This phenomenon is called clipping. The higher the capacity of the photodiode, the larger the signal it can deliver at the output before reaching saturation.

For greater clarity, let us turn to the characteristic curve, a graph of the output signal as a function of exposure. On the horizontal axis is plotted the binary logarithm of the irradiation received by the sensor, and on the vertical axis the binary logarithm of the electrical signal generated by the sensor in response to that irradiation. My drawing is largely schematic and serves purely illustrative purposes. The characteristic curve of a real photosensor has a somewhat more complex shape, and its noise level is rarely so high.

Two critical inflection points are clearly visible on the graph: at the first, the useful signal level crosses the noise threshold, and at the second the photodiodes reach saturation. The exposure values lying between these two points constitute the dynamic range. In this abstract example it equals, as is easy to see, 5 EV, i.e. the camera can digest five doublings of exposure, which is equivalent to a 32-fold (2^5 = 32) difference in brightness.

The exposure zones making up the dynamic range are not equal. The upper zones have a higher signal-to-noise ratio and therefore look cleaner and more detailed than the lower ones. As a result, the upper boundary of the dynamic range is very real and noticeable - clipping burns out the highlights at the slightest overexposure - while the lower boundary sinks imperceptibly into the noise, and the transition to black is not nearly so abrupt.

The linear dependence of signal on exposure, as well as the sharp leveling off at the plateau, are unique features of the digital photographic process. For comparison, take a look at the conditional characteristic curve of traditional photographic film.

The shape of the curve, and especially the angle of its slope, depend strongly on the type of film and on its development procedure, but the main difference between the film graph and the digital one remains unchanged: the dependence of the film's optical density on exposure is nonlinear.

The lower boundary of the photographic latitude of negative film is determined by the density of the base fog, and the upper boundary by the maximum achievable optical density of the emulsion; for reversal films it is the other way around. Both in the shadows and in the highlights there are smooth bends of the characteristic curve, indicating a drop in contrast as the boundaries of the dynamic range are approached, since the slope of the curve is proportional to the contrast of the image. Thus, the exposure zones lying on the middle section of the graph have maximum contrast, while in the highlights and shadows contrast is reduced. In practice, the difference between film and a digital sensor is especially noticeable in the highlights: where in the digital image the highlights are burned out by clipping, on film details are still distinguishable, albeit low in contrast, and the transition to pure white looks smooth and natural.

In sensitometry two independent terms are even used: photographic latitude proper, bounded by the relatively linear section of the characteristic curve, and useful photographic latitude, which in addition to the linear section also includes the toe and the shoulder of the curve.

It is noteworthy that when processing digital photographs, a more or less pronounced S-shaped curve is applied as a rule, increasing contrast in the midtones at the cost of reducing it in the shadows and highlights, which gives the digital image a more natural and pleasing look.

Nonlinearity of perception

Unlike the sensor of a digital camera, human vision is characterized by, so to speak, a logarithmic view of the world. Successive doublings of the amount of light are perceived by us as equal changes in brightness. Exposure stops can even be compared to musical octaves, since twofold changes in sound frequency are perceived by the ear as a single musical interval. Other senses work on the same principle. The nonlinearity of perception greatly expands the human range of sensitivity to stimuli of various intensities.

When a RAW file containing linear data is converted (whether by the camera itself or in a RAW converter), a so-called gamma curve is automatically applied to it. Its purpose is to increase the brightness of the digital image nonlinearly, bringing it into line with the peculiarities of human vision.

With a linear conversion, the image comes out too dark.

After gamma correction, the brightness returns to normal.

The gamma curve stretches the dark tones and compresses the light ones, as it were, making the distribution of gradations more uniform. As a result, the image acquires a natural look, but noise and quantization artifacts in the shadows inevitably become more noticeable, which is only aggravated by the small number of brightness levels in the lower zones.

Linear distribution of brightness gradations.
Uniform distribution after applying a gamma curve.
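A minimal sketch of such a gamma curve (the exponent 1/2.2 is a common convention, used here as an assumption):

```python
import numpy as np

linear = np.array([0.001, 0.01, 0.1, 0.5, 1.0])   # normalized linear sensor values
gamma_corrected = linear ** (1 / 2.2)             # the gamma curve

for lin, g in zip(linear, gamma_corrected):
    print(f"{lin:6.3f} -> {g:5.3f}")   # dark tones are stretched the most
```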

ISO and dynamic range

Although digital photography uses the same concept of photosensitivity of the photographic material as film photography, it should be understood that this is purely a tribute to tradition, since the approaches to changing photosensitivity in digital and film photography differ in principle.

Raising the ISO sensitivity in traditional photography means replacing one film with another, coarser-grained one, i.e. there is an objective change in the properties of the photographic material. In a digital camera, the sensitivity of the sensor is rigidly set by its physical characteristics and cannot literally be changed. When raising the ISO, the camera does not change the real sensitivity of the sensor; it merely amplifies the electrical signal generated by the sensor in response to irradiation and adjusts the digitization algorithm for this signal accordingly.

An important consequence of this is the reduction of the effective dynamic range in proportion to the increase in ISO, since the noise is amplified along with the useful signal. If at ISO 100 the entire range of signal values is digitized, from zero to the saturation point, then at ISO 200 only half the capacity of the photodiodes is taken as the maximum. With each doubling of ISO sensitivity, the top stop of the dynamic range is cut off, and the remaining stops are pulled up into its place. That is why using ultra-high ISO values makes no practical sense. You could just as well brighten the photo in a RAW converter and get a comparable noise level. The difference between raising the ISO and artificially brightening the picture is that with a higher ISO the signal is amplified before it enters the ADC, so the quantization noise is not amplified, unlike the sensor's own noise, while in a RAW converter the amplification applies to everything, including ADC errors. Besides, reducing the digitization range means more accurate sampling of the remaining input values.

By the way, the ISO settings below the base value available on some cameras (for example, ISO 50) do not expand the dynamic range; they simply attenuate the signal by half, which is equivalent to darkening the picture in a RAW converter. This function can even be regarded as harmful, since using a sub-base ISO value provokes the camera into increasing the exposure, which, with the sensor's saturation threshold unchanged, raises the risk of clipping in the highlights.

True dynamic range

A number of programs (DxO Analyzer, Imatest, RawDigger, etc.) allow you to measure the dynamic range of a digital camera at home. There is no great need for this, in principle, since the data for most cameras can be freely found on the internet, for example on the website dxomark.com.

Should you believe the results of such tests? Quite. With the sole reservation that all these tests determine the effective or, so to speak, technical dynamic range, i.e. the ratio between the saturation level and the noise level of the sensor. For the photographer, what matters first of all is the useful dynamic range, i.e. the number of exposure zones that really allow you to capture useful information.

As you remember, the threshold of the dynamic range is set by the noise level of the photosensor. The problem is that in practice the lower zones, which formally fall within the dynamic range, contain too much noise to be of much use. Here much depends on individual fastidiousness - everyone determines the acceptable noise level for themselves.

My subjective opinion is that details in the shadows begin to look more or less decent at a signal-to-noise ratio of at least eight. On this basis I define the useful dynamic range for myself as the technical dynamic range minus about three stops.

For example, if a DSLR, according to the results of reliable tests, has a dynamic range of 13 EV, which is very good by today's standards, then its useful dynamic range will be about 10 EV, which, generally speaking, is also quite decent. Of course, we are talking about shooting in RAW, at minimum ISO and maximum bit depth. When shooting in JPEG, the dynamic range depends strongly on the contrast settings, but on average two or three more stops should be discarded.

For comparison: color reversal (slide) films have a useful photographic latitude of 5-6 stops; black-and-white negative films give 9-10 stops with standard development and printing procedures, and with certain manipulations up to 16-18 stops.

Summarizing the above, let us try to formulate a few simple rules that will help you squeeze the maximum performance out of your camera's sensor:

  • The dynamic range of a digital camera is fully accessible only when shooting in RAW.
  • The dynamic range decreases as light sensitivity rises, so avoid high ISO values unless absolutely necessary.
  • Using a higher bit depth for RAW files does not increase the true dynamic range, but it improves tonal separation in the shadows thanks to a larger number of brightness levels.
  • Expose to the right. The upper exposure zones always contain the maximum useful information with the minimum of noise and should be used as effectively as possible. At the same time, do not forget the danger of clipping: pixels that have reached saturation are absolutely useless.

And the main thing: do not worry too much about your camera's dynamic range. Its dynamic range is fine. Far more important is your ability to see the light and to manage exposure competently. A good photographer will not complain about a lack of photographic latitude, but will try to wait for more comfortable lighting, or change the angle, or use flash - in a word, will act according to the circumstances. I will say more: some scenes only gain from the fact that they do not fit into the camera's dynamic range. Often an unnecessary abundance of detail is best hidden in a semi-abstract black silhouette, which makes the photo at once laconic and richer.

High contrast is not always bad - you just need to know how to work with it. Learn to exploit the shortcomings of your equipment as well as its advantages, and you will be surprised how much your creative possibilities expand.

Thank you for your attention!

Vasily A.

Post scriptum

If the article proved useful and informative for you, you are welcome to support the project by contributing to its development. If you did not like the article but have thoughts on how to make it better, your criticism will be accepted with no less gratitude.

Do not forget that this article is subject to copyright. Reprinting and quoting are permitted provided there is a valid link to the original source, and the text used must not be distorted or modified.

People enthusiastic about home audio demonstrate an interesting paradox. They are ready to soundproof the listening room and build loudspeakers with exotic drivers, yet they balk at canned music like a wolf at red flags. And really, why not slip past the flags and try to cook something more edible out of those canned goods?

Periodically complaints appear on the forums: "Recommend some well-recorded albums." It is understandable. Special audiophile editions may delight the ear for the first minute, but nobody listens to them to the end: the repertoire is painful. As for the rest of one's music library, the problem seems obvious. You can skimp on components, or you can splurge a pile of money on them; either way you still will not enjoy listening to your favorite music at high volume, and the amplifier's capabilities have nothing to do with it.

Today even in Hi-Res albums the peaks of the recording are cut off and the loudness is driven into clipping. It is believed that most people listen to music on any old junk, and therefore the sound has to be "pushed harder" as a kind of accommodation.


Of course, this is not done specifically to upset audiophiles. Few people remember them at all. Except that someone thought to release the master files from which the main run is copied - CDs, MP3s, and so on. Of course, that master has long since been flattened by the compressor; no one will consciously prepare special versions for HD Tracks. Except perhaps for a certain procedure for the vinyl release, which for that very reason sounds more humane. For the digital path, though, everything ends the same way - with a big fat compressor.

So at present all 100% of published recordings, minus classical music, are subjected to compression during mastering. Some perform this procedure more or less skillfully, others do it utterly stupidly. As a result we have pilgrimages on the forums with DR-plugin measurements, painful comparisons of releases, and flight to vinyl, where you also have to hunt for the right pressing.

The most frostbitten, at the sight of all these disgraces, have turned into veritable audio heretics. No joke - they read the holy scripture of the sound source backwards! Modern sound editing programs include tools for restoring a clipped waveform.

Initially this functionality was intended for studios. During mixing there are situations when clipping has crept into the recording and, for a number of reasons, the session can no longer be redone; here the audio editor's arsenal comes to the rescue - the declipper, the decompressor, and so on.

And now ordinary listeners, whose ears bleed after each new release, reach ever more boldly for such software. Some prefer iZotope, some Adobe Audition, some divide the operations between several programs. The point of restoring the former dynamics is to plausibly reconstruct the clipped signal peaks which, having run into 0 dB, resemble a sawn-off comb.

Yes, there is no question of a 100% revival of the original, since the interpolation runs on rather speculative algorithms. But still, some of the processing results struck me as interesting and worthy of study.

For example, take Lana Del Rey's album "Lust for Life", mastered consistently loud and "driving". In the original, the song "When the World Was at War We Kept Dancing" looked like this.


And after a series of declippers and decompressors it became like this. The DR coefficient changed from 5 to 9. Download and listen to the sample before and after processing.


I cannot say that the method is universal and suits every flattened album, but in this case I preferred to keep in my collection precisely this version, processed by a RuTracker activist, instead of the official 24-bit release.

Even if artificially pulling the peaks out of the sonic mincemeat does not bring back the true dynamics of the musical performance, your DAC will still say thank you. It is hard for it to work without errors at the limiting levels, where the probability of so-called inter-sample peaks (ISP) is high. Now only rare bursts of the signal will reach up to 0 dB. In addition, the processed recording, when compressed to FLAC or another lossless codec, will be smaller in size. More "air" in the signal saves hard drive space.

Try reviving your most hated albums killed in the "loudness war". To give the declipper some headroom, first lower the track level by 6 dB, and then start the declipper. Those who do not trust computers can simply connect a studio expander between the CD player and the amplifier. This device essentially does the same thing - as best it can, it restores and stretches the peaks of a dynamics-compressed audio signal. Similar devices from the 80s and 90s are, let us say, not very expensive, and experimenting with them is very interesting.


The DBX 3BX dynamic range controller processes the signal separately in three bands - low, mid, and high

Once upon a time equalizers were taken for granted as a component of an audio system, and no one was afraid of them. Today there is no need to compensate for the high-frequency losses of magnetic tape, but something has to be done about ugly dynamics, brothers.

Dynamic range compression (DRC) is a narrowing (or, in the case of an expander, a widening) of a recording's dynamic range - the difference between the quietest and loudest sounds. Sometimes the quietest sound in the recording is a little louder than the noise floor, and sometimes a little quieter than the loudest. Hardware devices and programs that perform dynamic compression are called compressors, and four main groups of them are distinguished: compressors proper, limiters, expanders, and gates.

Tube analog compressor DBX 566

Downward and upward compression

Downward compression reduces the volume of a sound once it begins to exceed a certain threshold, leaving quieter sounds unchanged. An extreme variant of downward compression is the limiter. Upward compression, on the contrary, increases the volume of a sound if it is below the threshold, without affecting louder sounds. Both types of compression narrow the dynamic range of the audio signal.

Downward compression

Upward compression

Expander and Gate

If the compressor reduces the dynamic range, the expander increases it. When the signal level rises above the threshold, the expander raises it still further, thus increasing the difference between loud and quiet sounds. Such devices are often used when recording a drum kit to separate the sounds of some drums from others.

A type of expander that is used not to boost loud sounds but to attenuate quiet ones that do not exceed the threshold (for example, background noise) is called a noise gate. In such a device, as soon as the sound level drops below the threshold, the signal stops passing. Typically a gate is used to suppress noise in pauses. On some models it can be set up so that the sound, on crossing the threshold level, does not cut off abruptly but fades out gradually. In this case the fade-out speed is set by the Decay control.

A gate, like other types of compressors, can be frequency-dependent (i.e. it can process certain frequency bands differently) and can operate in side-chain mode (see below).

The principle of operation of the compressor

The signal entering the compressor is split into two copies. One copy is sent to an amplifier whose gain is controlled by an external signal; the second copy forms that control signal. It enters a circuit called the side-chain, where the signal level is measured and, based on this data, an envelope is created describing the change in its volume.
This is how most modern compressors are arranged - the so-called feed-forward type. In older devices (the feedback type), the signal level is measured after the amplifier.

There are various analog variable-gain amplification technologies, each with its advantages and disadvantages: tube, optical (using a photoresistor), and transistor-based. When working with digital audio (in a sound editor or DAW), dedicated mathematical algorithms can be used, or the operation of the analog technologies can be emulated.

The main parameters of compressors

Threshold.

The compressor reduces the level of an audio signal if its amplitude exceeds a certain threshold value. It is usually specified in decibels, and a lower threshold (for example, -60 dB) means that more of the signal will be processed than with a higher threshold (for example, -5 dB).

Ratio.

The degree of level reduction is determined by the Ratio parameter: a ratio of 4:1 means that if the input level exceeds the threshold by 4 dB, the output level will exceed it by 1 dB.
For example:
Threshold = -10 dB
Input signal = -6 dB (4 dB above the threshold)
Output signal = -9 dB (1 dB above the threshold)

It is important to keep in mind that the suppression of the signal level continues for some time after it falls below the threshold, and this time is determined by the value of the release parameter.

Compression with the maximum ratio of ∞:1 is called limiting. It means that any signal above the threshold is suppressed down to the threshold level (with the exception of a short period after a sharp increase in input volume). For details see "Brick Wall Limiting" below.
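A sketch of this static input/output law (hard knee; the numbers reproduce the worked example above, and an infinite ratio turns the compressor into a limiter):

```python
def compress_db(level_db: float, threshold_db: float = -10.0,
                ratio: float = 4.0) -> float:
    """Return the output level in dB for a given input level in dB."""
    if level_db <= threshold_db:
        return level_db                    # below the threshold: unchanged
    over = level_db - threshold_db         # dB above the threshold
    return threshold_db + over / ratio     # excess reduced by the ratio

print(compress_db(-6.0))                       # -9.0: 4 dB over -> 1 dB over
print(compress_db(-6.0, ratio=float("inf")))   # -10.0: limiting
```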

Examples of various Ratio values

Attack and Release

The compressor provides a certain amount of control over how quickly it reacts to changes in signal dynamics. The Attack parameter defines the time over which the compressor reduces the gain to the level determined by the Ratio parameter. Release defines the time over which the compressor, on the contrary, restores the gain, returning it to normal when the input signal level drops below the threshold.

ATTACK and Release phases

These parameters specify the time (usually in milliseconds) required to change the gain by a certain number of decibels, usually 10 dB. In that case, for example, if Attack is set to 1 ms, reducing the gain by 10 dB will take 1 ms, and by 20 dB, 2 ms.

In many compressors the Attack and Release parameters are adjustable, but in some they are preset and not regulated. Sometimes they are designated as "automatic" or "program dependent", i.e. they vary depending on the input signal.
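A sketch of the attack/release behavior as an envelope follower of the kind a compressor's detector uses (the one-pole coefficient formula is a common textbook design, not taken from any particular device):

```python
import math

def envelope(samples, fs=44_100, attack_ms=1.0, release_ms=100.0):
    """Track the signal envelope with separate attack and release times."""
    a_att = math.exp(-1.0 / (fs * attack_ms / 1000.0))
    a_rel = math.exp(-1.0 / (fs * release_ms / 1000.0))
    env, out = 0.0, []
    for x in samples:
        x = abs(x)
        coeff = a_att if x > env else a_rel   # rising: attack; falling: release
        env = coeff * env + (1.0 - coeff) * x
        out.append(env)
    return out

print(envelope([0.0, 1.0, 1.0, 0.0, 0.0]))   # rises at the attack rate, falls at the much slower release rate
```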

Knee.

Another compressor parameter is hard/soft knee. It determines whether compression sets in abruptly (hard) or gradually (soft). A soft knee smooths the transition from the unprocessed signal to the compressed one, which is especially noticeable at high ratio values and abrupt volume jumps.

Hard Knee and Soft Knee Compression

PEAK and RMS.

The compressor can react to peak (short-term maximum) values or to the averaged input level. Using peak values can lead to sharp fluctuations in the degree of compression, and even to distortion. Therefore compressors apply an averaging function (usually RMS) to the input signal when comparing it with the threshold. This gives more comfortable compression, closer to the human perception of loudness.

RMS is a parameter reflecting the average loudness of a recording. From the mathematical point of view, RMS (root mean square) is the root-mean-square value of the amplitude over a certain number of samples: RMS = sqrt((x1^2 + x2^2 + ... + xN^2) / N).
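A sketch contrasting the two measures on a pure sine wave (for which the RMS value is known to be 1/√2 of the peak):

```python
import numpy as np

x = np.sin(2 * np.pi * np.arange(441) / 44.1)   # ten periods of a sine wave
peak = np.max(np.abs(x))
rms = np.sqrt(np.mean(x ** 2))
print(peak, rms)   # ~1.0 and ~0.707: RMS is closer to perceived loudness
```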

Stereo Linking.

A compressor in stereo linking mode applies the same gain to both stereo channels. This avoids shifts of the stereo image that could result from processing the left and right channels individually. Such a shift occurs if, for example, some loud element is panned off-center.

Makeup Gain.

Since the compressor reduces the overall signal level, a fixed output gain stage is usually added, allowing the optimal level to be restored.

LOOK-AHEAD.

The look-ahead function is designed to solve the problems inherent in both too-long and too-short attack and release values. A too-long attack fails to intercept transients effectively, while a too-short one may be uncomfortable for the listener. With look-ahead, the main signal is delayed relative to the control signal, which allows compression to begin in advance, even before the signal reaches the threshold.
The only disadvantage of this method is the time delay of the signal, which in some cases is undesirable.

Use of dynamic compression

Compression is used everywhere - not only in music recordings, but wherever the overall volume needs to be raised without raising the peak levels, wherever inexpensive sound-reproducing equipment is used or the transmission channel is limited (public address systems, amateur radio, etc.).

Compression is applied when playing background music (in shops, restaurants, etc.), where any noticeable changes in volume are undesirable.

But the most important field of application of dynamic compression is music production and broadcasting. Compression is used to give the sound "density" and "drive", to blend instruments better with one another, and especially when processing vocals.

Vocal parts in rock and pop music are usually compressed to make them stand out against the accompaniment and to add clarity. A special type of compressor tuned only to certain frequencies - the de-esser - is used to suppress sibilants.

In instrumental parts, compression is also used for effects not directly related to volume; for example, rapidly decaying drum sounds can be made more sustained.

In electronic dance music (EDM), side-chaining is often used (see below): for example, the bass line can be keyed by the kick drum or something similar, to prevent bass and drums from conflicting and to create a dynamic pulsation.

Compression is widely used in broadcasting (radio, television, internet streaming) to increase the perceived loudness while reducing the dynamic range of the source audio (usually a CD). Most countries have legal restrictions on the instantaneous maximum volume that may be broadcast. These restrictions are usually implemented by permanent hardware compressors in the broadcast chain. In addition, increasing the perceived loudness improves the "quality" of the sound in the opinion of most listeners.

see also Loudness War.

A consistent increase in the loudness of the same song remastered for CD between 1983 and 2000.

Side-chaining

Another frequently encountered compressor switch is "side chain". In this mode the audio is compressed not according to its own level but according to the level of the signal fed into a connector that is usually called just that - side chain.

Several applications can be found for this. For example, the vocalist lisps, and all the letters "S" stick out of the overall picture. You pass the voice through a compressor, and into the side-chain connector you feed the same sound, but passed through an equalizer. On the equalizer you remove all frequencies except those used by the vocalist when pronouncing the letter "S" - usually around 5 kHz, though it can be anywhere from 3 kHz to 8 kHz. If you then put the compressor into side-chain mode, the voice is compressed at exactly those moments when the letter "S" is pronounced. The result is the device known as the de-esser. This way of working is called "frequency dependent".

Another use of this function is called "ducking". For example, at a radio station the music goes through the compressor, and the DJ's words go through the side chain. When the DJ starts talking, the volume of the music automatically drops. This effect can also be used successfully in recording, for example to lower the volume of keyboard parts during singing.
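A sketch of such a ducker (the gain value and threshold are arbitrary illustration numbers; a real device would also smooth the gain with attack/release):

```python
def duck(music, voice, threshold=0.05, reduced_gain=0.3):
    """Lower the music gain whenever the side-chain (voice) is active."""
    out = []
    for m, v in zip(music, voice):
        gain = reduced_gain if abs(v) > threshold else 1.0
        out.append(m * gain)
    return out

music = [0.5, 0.5, 0.5, 0.5]
voice = [0.0, 0.2, 0.2, 0.0]   # the DJ speaks on samples 2 and 3
print(duck(music, voice))      # [0.5, 0.15, 0.15, 0.5]
```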

Brick Wall Limiting

The compressor and the limiter work in roughly the same way; one can say that a limiter is a compressor with a high ratio (from 10:1) and, usually, a short attack time.

There is also the concept of brick wall limiting - limiting with a very high ratio (from 20:1 and above) and a very fast attack. Ideally, it does not let the signal exceed the threshold level at all. The result will be unpleasant to the ear, but it prevents damage to sound-reproducing equipment or overloading of a limited-bandwidth channel. Many manufacturers build limiters into their devices for precisely this purpose.

Clipper vs. Limiter, Soft and Hard Clipping

This group of methods is based on subjecting the transmitted signals to nonlinear amplitude transformations, with mutually inverse nonlinearities in the transmitting and receiving parts. For example, if the transmitter uses the nonlinear function √u, the receiver uses u². The successive application of mutually inverse functions means that the overall transformation remains linear.

The idea of nonlinear data compression methods comes down to this: the transmitter can convey a larger range of variation of the transmitted parameter with the same amplitude of output signals (that is, a greater dynamic range). The dynamic range is the ratio, expressed in relative units or in decibels, of the largest permissible signal amplitude to the smallest:

D = U_max / U_min ; (2.17)

D [dB] = 20 lg (U_max / U_min) . (2.18)

For example, a ratio U_max / U_min = 1000 corresponds to a dynamic range of 60 dB.

The natural desire to increase the dynamic range by reducing U_min is limited by the sensitivity of the equipment and by the growing influence of interference and the equipment's own noise.

Most often, compression of the dynamic range is carried out with a pair of mutually inverse functions: logarithm and exponential. The first operation of changing the amplitude is called compression, the second expansion (stretching). These functions are chosen for their high compression capability.

At the same time, these methods have drawbacks. The first is that the logarithm of a small number is negative, and in the limit

lim (u → 0) log u = −∞,

that is, the sensitivity is highly nonlinear.

To reduce these drawbacks, both functions are modified by offset and approximation. For example, for telephone channels the approximated function has the form (the A-law; the sign of x is preserved):

y = A·|x| / (1 + ln A), for 0 ≤ |x| ≤ 1/A;
y = (1 + ln(A·|x|)) / (1 + ln A), for 1/A ≤ |x| ≤ 1,

with A = 87.6. The gain from compression is 24 dB.
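A minimal sketch of this A-law compression function (our own illustration of the formula above):

```python
import math

A = 87.6  # standard A-law parameter

def a_law_compress(x: float) -> float:
    """Compand a sample x normalized to [-1, 1]; the sign is preserved."""
    sign = 1.0 if x >= 0 else -1.0
    x = abs(x)
    if x < 1.0 / A:
        y = A * x / (1.0 + math.log(A))
    else:
        y = (1.0 + math.log(A * x)) / (1.0 + math.log(A))
    return sign * y

for x in (0.001, 0.01, 0.1, 1.0):
    print(f"{x:5.3f} -> {a_law_compress(x):5.3f}")   # weak samples gain the most
```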

Nonlinear compression implemented with analog means suffers from large errors. Digital means can significantly improve the accuracy or the speed of the transformation. At the same time, directly applying general-purpose computing (that is, directly calculating logarithms and exponentials) gives no better result, because of low speed and accumulating calculation error.

Because of these accuracy limitations, compression by companding is used in non-critical cases, for example for transmitting speech over telephone and radio channels.

Effective coding

Effective codes were proposed by Shannon, Fano, and Huffman. The essence of these codes is that they are non-uniform, that is, they have variable word length, and the length of a code word is inversely related to the probability of the symbol's occurrence. Another remarkable feature of effective codes is that they require no separators, i.e. special characters delimiting neighboring code combinations. This is achieved by observing a simple rule: shorter codes are never the beginning of longer ones. In this case a continuous stream of binary digits is decoded unambiguously, since the decoder detects the shorter code combinations first. Effective codes were purely academic for a long time, but have recently found use in building databases, as well as in data compression in modern modems and in software archivers.

Because the codes are non-uniform, the notion of average code length is introduced. The average length is the mathematical expectation of the code word length:

l_avg = Σ p_i · l_i ,

and l_avg tends to H(X) from above (that is, l_avg > H(X)).

Condition (2.23) is satisfied ever more closely as N increases.

There are two varieties of effective codes: Shannon-Fano and Huffman. Let us consider how they are obtained with an example. Suppose the probabilities of the symbols in the sequence have the values given in Table 2.1.

Table 2.1.

Probabilities of symbols

N     1     2     3     4     5     6     7     8     9
p_i   0.1   0.2   0.1   0.3   0.05  0.15  0.03  0.02  0.05

The symbols are ranked, that is, arranged in a row in descending order of probability. After that, by the Shannon-Fano method, the following procedure is repeated: the whole group of events is divided into two subgroups with the same (or approximately the same) total probabilities. The procedure continues until one element remains in the current subgroup, after which this element is set aside and the actions continue with the rest. This goes on until only one element remains in each of the last two subgroups. Let us continue with our example, which is summarized in Table 2.2.

Table 2.2.

The Shannon-Fano method

N     p_i    code
4     0.3    11
2     0.2    10
6     0.15   011
3     0.1    010
1     0.1    0011
9     0.05   0010
5     0.05   00001
7     0.03   000001
8     0.02   000000

As can be seen from Table 2.2, the first symbol, with probability p4 = 0.3, took part in two partitioning procedures and both times fell into group I. Accordingly, it is encoded with the two-bit code 11. The second element belonged to group I at the first stage of partitioning and to group II at the second; hence its code is 10. The codes of the remaining symbols need no additional comment.

Non-uniform codes are usually depicted as code trees. A code tree is a graph showing the permitted code combinations. The directions of the edges of this graph are fixed in advance, as shown in Fig. 2.11 (the choice of directions is arbitrary).

The graph is read as follows: construct the route to the chosen symbol; the number of digits in its code equals the number of edges in the route, and the value of each digit equals the direction of the corresponding edge. The route starts from the source point (marked with the letter A in the figure). For example, the route to vertex 5 consists of five edges, all of which except the last have direction 0; we obtain the code 00001.

Let us calculate the entropy and the average word length for this example.

H(X) = -(0.3 log 0.3 + 0.2 log 0.2 + 0.15 log 0.15 + 2·0.1 log 0.1 + 2·0.05 log 0.05 + 0.03 log 0.03 + 0.02 log 0.02) ≈ 2.76 bits;

l_avg = 0.3·2 + 0.2·2 + 0.15·3 + 0.1·3 + 0.1·4 + 0.05·4 + 0.05·5 + 0.03·6 + 0.02·6 = 2.9.

As can be seen, the average word length is close to the entropy.

Huffman codes are built by a different algorithm. The coding procedure consists of two stages. At the first stage, the alphabet is repeatedly compressed one step at a time. A one-step compression is the replacement of the last two symbols (those with the lowest probabilities) by a single one with their total probability. Compression continues until two symbols remain. Along the way a coding table is filled in, in which the resulting probabilities are entered, and the routes along which the new symbols move at the next stage are drawn.

At the second stage the actual coding takes place, starting from the last stage: the first of the two remaining symbols is assigned the code 1, the second the code 0. The procedure then moves back to the preceding stage. Symbols that did not take part in the gluing at that stage keep the codes assigned to them at the subsequent stage, while the two symbols glued together take the code of the glued symbol and extend it: 1 is appended for the upper symbol, 0 for the lower. If a symbol takes no further part in gluing, its code remains unchanged. The procedure continues in this way down to the first stage.

Table 2.3 shows coding by the Huffman algorithm. As can be seen from the table, the coding was carried out in 7 stages. On the left are the probabilities of the symbols, on the right the intermediate codes. From stage to stage the newly formed symbols move up the ranked list. At each stage the two last symbols differ only in the least significant bit, which corresponds to the coding technique. Let us calculate the average code length:

L_av = 0.3·2 + 0.2·2 + 0.15·3 + 2·0.1·3 + 0.05·4 + 0.05·5 + 0.03·6 + 0.02·6 = 2.8.

This is the same average length as for the Shannon-Fano code and again close to the entropy; in general, the Huffman procedure never produces a code less effective than the Shannon-Fano procedure, and often produces a better one. Fig. 2.12 shows the Huffman code tree.

Table 2.3.

Coding by the Huffman algorithm

N   p_i    code     I            II           III          IV           V           VI          VII
4   0.30   11       0.30 11      0.30 11      0.30 11      0.30 11      0.30 11     0.40 0      0.60 1
2   0.20   01       0.20 01      0.20 01      0.20 01      0.20 01      0.30 10     0.30 11     0.40 0
6   0.15   101      0.15 101     0.15 101     0.15 101     0.20 00      0.20 01     0.30 10
3   0.10   001      0.10 001     0.10 001     0.15 100     0.15 101     0.20 00
1   0.10   000      0.10 000     0.10 000     0.10 001     0.15 100
9   0.05   1000     0.05 1000    0.10 1001    0.10 000
5   0.05   10011    0.05 10011   0.05 1000
7   0.03   100101   0.05 10010
8   0.02   100100

(Each stage column lists, in descending order, the probabilities remaining after the next gluing, together with the codes assigned to them at the second pass.)

Both codes satisfy the requirement of unique decodability: as can be seen from the tables, a shorter combination is never the beginning of a longer one.
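The compression stage of the Huffman procedure is conveniently driven by a priority queue. The sketch below (names are our own; because of tie-breaking among equal probabilities the individual codes may differ from Table 2.3, though the average length does not) prepends a bit to every symbol of the two least probable groups at each gluing, and then checks the prefix property just mentioned:

    import heapq
    from itertools import count

    def huffman_codes(probs):
        # probs: dict symbol -> probability (or integer weight);
        # returns dict symbol -> binary code string
        tie = count()                              # tie-breaker so the heap never compares lists
        heap = [(p, next(tie), [s]) for s, p in probs.items()]
        heapq.heapify(heap)
        codes = {s: "" for s in probs}
        while len(heap) > 1:
            p_low, _, low = heapq.heappop(heap)    # least probable group: gets bit 0
            p_up, _, up = heapq.heappop(heap)      # next group: gets bit 1
            for s in low:
                codes[s] = "0" + codes[s]
            for s in up:
                codes[s] = "1" + codes[s]
            heapq.heappush(heap, (p_low + p_up, next(tie), low + up))
        return codes

    p = {1: 0.1, 2: 0.2, 3: 0.1, 4: 0.3, 5: 0.05, 6: 0.15, 7: 0.03, 8: 0.02, 9: 0.05}
    codes = huffman_codes(p)
    values = list(codes.values())
    assert not any(a != b and b.startswith(a) for a in values for b in values)  # prefix-free
    print(sum(p[s] * len(codes[s]) for s in p))    # average length: 2.8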

As the number of symbols encoded together grows, the effectiveness of the codes increases, so in some cases it pays to encode larger blocks (for example, in the case of texts one can encode the most frequent syllables, words, and even whole phrases).

The effect of introducing such codes is determined by comparison with a uniform code:

K = n / L_av,    (2.24)

where n is the number of bits of the uniform code that the effective code replaces. For the example above, a uniform code for nine symbols requires n = 4 bits, so the Huffman code gives K = 4 / 2.8 ≈ 1.43.

Modifications of Huffman codes

The classical Huffman algorithm is a two-pass one: it first requires a pass over the message to collect symbol statistics, after which the procedures described above are applied. In practice this is inconvenient, since it increases the message processing time and requires storage for the accumulated statistics. More convenient are single-pass methods, in which the accumulation of statistics and the coding are combined; such methods are called adaptive Huffman compression [46].

The essence of adaptive Huffman compression is the construction of an initial code tree and its successive modification after each next symbol is received. As before, the trees here are binary: at most two arcs leave any vertex of the tree graph. It is customary to call the original vertex the parent and the two vertices connected to it its children. Let us introduce the concept of the weight of a vertex: it is the number of occurrences of the symbol (or group of symbols) corresponding to the vertex in the part of the sequence processed so far. Obviously, the sum of the weights of the children equals the weight of their parent.

After the next symbol of the input sequence arrives, the code tree is revised: the vertex weights are recalculated and, if necessary, the vertices are rearranged. The rearrangement rule is as follows: vertices lower in the tree have the smallest weights, and among vertices on the same level the leftmost have the smallest weights.

The vertices are numbered at the same time. The numbering begins with the lower (hanging, i.e. childless) vertices, going from left to right, then passes to the next level up, and so on, until the last, source vertex is numbered. The result is that the smaller the weight of a vertex, the smaller its number.

Rearrangement is applied mainly to the hanging vertices. It follows the rule just formulated: vertices with greater weight must receive greater numbers.

After the sequence has been passed (it is also called the control, or test, sequence), code combinations are assigned to all the hanging vertices. The assignment rule is similar to the one given above: the number of bits in the code equals the number of edges on the route from the source vertex to the given hanging vertex, and the value of each particular bit corresponds to the direction of the step from parent to child (say, a step to the left child corresponds to 1, to the right child to 0).

The resulting code combinations are entered into the memory of the compression device together with the fragments they stand for, forming a dictionary. The algorithm is then applied as follows. The sequence of symbols to be compressed is divided into fragments in accordance with the existing dictionary, after which each fragment is replaced by its code from the dictionary. Fragments not found in the dictionary form new hanging vertices, acquire weight, and are entered into the dictionary in their turn. In this way an adaptive dictionary-replenishment algorithm is formed.

To raise the efficiency of the method it is desirable to enlarge the dictionary, since the compression ratio grows with it. In practice the dictionary occupies about 4 to 16 KB of memory.


Let us illustrate the algorithm with an example. Fig. 2.13 shows the initial tree (it is also called a Huffman tree). Each vertex is shown as a rectangle containing two numbers separated by a slash: the first is the number of the vertex, the second its weight. One can verify that the stated correspondence between the weights of the vertices and their numbers holds.

Suppose now that the symbol corresponding to vertex 1 occurs in the test sequence a second time. The vertex weights change as shown in Fig. 2.14, as a result of which the numbering rule is violated. At the next step we change the arrangement of the hanging vertices: we swap vertices 1 and 4 and renumber all the vertices of the tree. The resulting graph is shown in Fig. 2.15. The procedure then continues in the same manner.
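The update cycle can be modelled in a few lines of Python. The sketch below is a deliberate simplification with names of our own invention: after every input symbol it rebuilds the code tree from the current weights, whereas a genuine adaptive (FGK/Vitter) implementation obtains the same tree by the local vertex swaps and renumbering illustrated in Figs. 2.14 and 2.15. It reuses the huffman_codes function from the earlier sketch, which accepts integer weights as well as probabilities.

    from collections import Counter

    def adaptive_encode(text):
        weights = Counter()                            # vertex weights: symbol counts so far
        out = []
        for ch in text:
            if len(weights) > 1:
                codes = huffman_codes(dict(weights))   # rebuild the tree from current weights
                out.append(codes.get(ch, "?"))         # "?": symbol not yet in the tree
            else:
                out.append("?")                        # no usable tree yet
            weights[ch] += 1                           # the weight grows, the tree adapts
        return out

    print(adaptive_encode("abracadabra"))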

It should be remembered that each hanging vertex of a Huffman tree corresponds to a specific symbol or group of symbols. A parent differs from its children in that the group of symbols corresponding to it is one symbol shorter than its children's groups, while the children differ from each other in their last symbol. For example, if the parent corresponds to the symbols "car", its children may correspond to the sequences "cara" and "carp".

The algorithm described above is by no means academic: it is actively used in archiver programs, including for the compression of graphic data (these will be discussed below).

Lempel-Ziv algorithms

These are the most commonly used compression algorithms, employed in most archiver programs (for example, PKZIP, ARJ, LHA). The essence of the algorithms is that a recurring set of symbols is replaced during archiving by its position in a specially generated dictionary. For example, the phrase "in reply to your letter, outgoing number ...", common in business correspondence, might occupy position 121 in the dictionary; then, instead of transmitting or storing the phrase itself (30 bytes), one stores its number (1.5 bytes in binary-coded decimal form, or 1 byte in binary).

The algorithms are named after the authors who first proposed them in 1977; the first of these is LZ77. For archiving, a so-called sliding window is created, consisting of two parts. The first, larger part serves to form the dictionary and has a size of the order of several kilobytes. The second, smaller part (usually up to 100 bytes) receives the current symbols of the text being examined. The algorithm tries to find in the dictionary a set of symbols coinciding with the contents of the viewing window. If this succeeds, a code consisting of three parts is generated: the offset of the matching substring within the dictionary, the length of that substring, and the symbol following it. For example, suppose the matched substring consists of six symbols and the next symbol is "e"; then, if the substring begins at position 45 of the dictionary, the record takes the form (45, 6, e). After that, the contents of the window are shifted and the search continues. In this way the dictionary is formed.
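The window search can be sketched in Python as follows (the window and lookahead sizes, the names, and the triple format are illustrative only; practical implementations use far more efficient match-search structures than this naive scan):

    def lz77_compress(data, window=4096, lookahead=32):
        # Emits (offset, length, next_symbol) triples. The dictionary part of
        # the sliding window is the last `window` symbols already processed.
        i, out = 0, []
        while i < len(data):
            best_len, best_off = 0, 0
            for j in range(max(0, i - window), i):     # naive search for the longest match
                k = 0
                while (k < lookahead and i + k < len(data) - 1
                       and data[j + k] == data[i + k]):
                    k += 1
                if k > best_len:
                    best_len, best_off = k, i - j
            out.append((best_off, best_len, data[i + best_len]))
            i += best_len + 1                          # shift the window past the match
        return out

    print(lz77_compress("abracadabra abracadabra"))

Replaying the triples in order re-creates the text, and with it the dictionary, which is why decompression needs no pre-built dictionary.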

An advantage of the algorithm is the easily formalized procedure for building the dictionary. In addition, decompression is possible without the initial dictionary (it is desirable, though, to have a test sequence): the dictionary is re-created in the course of decompression.

The disadvantages of the algorithm appear as the dictionary grows: the search time increases. In addition, if a string of symbols is absent from the current window, every symbol has to be written out as a three-element code, so the result is not compression but expansion.

Better characteristics are shown by the LZSS algorithm, proposed in 1982. It differs in the way the sliding window is maintained and in the codes the compressor outputs. In addition to the window, the algorithm forms a binary tree, similar to a Huffman tree, to speed up the search for matches: each substring that leaves the current window is added to the tree as one of the children. Such an algorithm makes it possible to enlarge the current window further (it is desirable that its size be a power of two: 128, 256, etc. bytes). The sequence codes are also formed differently: a 1-bit prefix is additionally introduced to distinguish unencoded symbols from (offset, length) pairs.

Even greater compression is obtained with algorithms of the LZW type. The algorithms described so far have a fixed window size, which makes it impossible to enter into the dictionary phrases longer than the window. In the LZW algorithm (and its predecessor LZ78) the viewing window is effectively unlimited, and the dictionary accumulates whole phrases rather than fixed sets of symbols. The dictionary has unlimited length, and the coder (decoder) works in a phrase-accumulation mode: when an accumulated phrase coincides with a dictionary entry, its code in the dictionary is issued (in LZ78, together with the code of the symbol following it); if the accumulating symbols form a new phrase, that phrase is also entered into the dictionary as its newest entry. The result is a recursive procedure providing fast encoding and decoding.
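A minimal LZW coder in Python (variable names are our own; real implementations bound the dictionary size and pack the codes into a bit stream rather than keeping them as integers):

    def lzw_compress(data):
        dictionary = {chr(i): i for i in range(256)}        # start with all single characters
        phrase, out = "", []
        for ch in data:
            if phrase + ch in dictionary:                   # keep accumulating a known phrase
                phrase += ch
            else:
                out.append(dictionary[phrase])              # emit code of longest known phrase
                dictionary[phrase + ch] = len(dictionary)   # enter the new phrase
                phrase = ch
        if phrase:
            out.append(dictionary[phrase])
        return out

    print(lzw_compress("TOBEORNOTTOBEORTOBEORNOT"))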

An additional opportunity for compression is the compact encoding of repeated symbols. If certain symbols follow one another in a run (for example, "space" characters in a text, or consecutive zeros in a numerical sequence), it makes sense to replace the run with a pair "symbol, length" accompanied by a feature flag. In the first variant the code contains a feature indicating that a run is encoded (usually 1 bit), followed by the code of the repeated symbol and the length of the run. In the second variant (provided for the most frequently repeated symbols) the prefix carries simply the repetition feature followed by the length, the symbol itself being implied.
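A sketch of the first variant in Python (for readability the feature is shown as a whole marker byte rather than a single bit, and the run length as one byte; both choices are ours):

    def rle_encode(data, min_run=3):
        FLAG = "\x00"                                  # stands in for the 1-bit feature
        i, out = 0, []
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                                 # measure the run
            run = j - i
            if run >= min_run:
                out.append(FLAG + data[i] + chr(run))  # feature, symbol, length
            else:
                out.append(data[i:j])                  # short runs pass through unchanged
            i = j
        return "".join(out)

    print(repr(rle_encode("ab      cccccc0000000")))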