The Hartley Formula as a Special Case of the Shannon Formula. Lecture: Approaches to Determining the Amount of Information

The theory of information received its further development in the works of Claude Shannon, an American engineer and mathematician (1916–2001). Shannon is one of the creators of the mathematical theory of information. His main works are devoted to the theory of relay and switching circuits, the mathematical theory of communication, and cybernetics. Shannon studied the transmission of information in telegraphy, telephony, and broadcasting in the form of signals of electromagnetic oscillations. One of the tasks Shannon set himself was to determine a coding system that makes it possible to optimize the speed and reliability of information transmission. Since during the war years he served in a cryptography department, where he worked on the development of cryptographic systems, this later helped him discover coding methods with error correction. In his works of 1948–1949 Shannon defined the amount of information through entropy, a quantity known in thermodynamics and statistical physics as a measure of the disorder of a system, and took as the unit of information what was later called the bit.

For the further presentation it is necessary to use some concepts of probability theory: random event, experiment, probability of an event, random variable.

In the world around us various events occur, and based on experience we can intuitively assess some of them as more possible than others.

A random event is an event that may or may not occur as a result of some test, experience, or experiment. We will denote events by the capital letters A, B, C, etc.

The quantitative measure of the possibility of the occurrence of an event A is called its probability and is denoted p(A) (p from the English word probability). The more possible the occurrence of a random event, the greater its probability: if A is more possible than B, then p(A) > p(B).

The concept of a certain event is introduced: an event that will definitely occur. This event is denoted Ω, and it is assumed that its probability is p(Ω) = 1.

An impossible event is an event that will never happen. It is denoted ∅, and it is assumed that its probability is p(∅) = 0. For the probabilities of all other events A the inequality p(∅) < p(A) < p(Ω), or 0 < p(A) < 1, holds.

For events, the concepts of sum and product are introduced.

The sum of events A + B is the event that consists in the occurrence of event A or event B. The product of events A·B consists in the simultaneous occurrence of events A and B.

Events A and B are incompatible if they cannot occur together as a result of one trial. The probability of the sum of incompatible events is equal to the sum of their probabilities: if A and B are incompatible events, then p(A + B) = p(A) + p(B).



Events A1, A2, A3, ..., An form a complete group if, as a result of the experiment, at least one of them will necessarily occur.

If the events A1, A2, A3, ..., An are pairwise incompatible and form a complete group, then the sum of their probabilities is p1 + p2 + p3 + ... + pn = 1.

If they are also equally likely, then the probability of each is p = 1/n, where n is the number of events.

The probability of an event is defined as the ratio of the number of outcomes of the experiment favorable to the event to the total number of outcomes.

The frequency of an event is an empirical approximation of its probability. It is calculated from a series of experiments as the ratio of the number of experiments in which the event occurred to the total number of experiments. For a large number of experiments (trials), the frequency of an event tends to its probability.
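To make the "frequency tends to probability" statement concrete, here is a minimal Python sketch (illustrative only; the seed and the trial counts are arbitrary choices, not part of the lecture) that simulates tosses of a fair coin and prints the observed frequency of heads.

```python
import random

random.seed(1)  # arbitrary seed so the run is reproducible

def heads_frequency(trials: int) -> float:
    """Empirical frequency of heads in a series of fair coin tosses."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

for trials in (10, 100, 1_000, 10_000, 100_000):
    print(f"{trials:>7} tosses: frequency of heads = {heads_frequency(trials):.4f}")
# As the number of tosses grows, the printed frequency approaches the probability 0.5.
```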

Using R. Hartley's approach, C. Shannon drew attention to the fact that when verbal messages are transmitted, the frequencies (probabilities) of use of the different letters of the alphabet are not the same: some letters are used very often, others rarely.

Consider an alphabet A_m consisting of m symbols. Denote by p_i the probability (frequency) of the appearance of the i-th symbol in any position of a transmitted message consisting of n symbols.

One i-th symbol of the alphabet carries an amount of information equal to -log2(p_i). The minus sign stands before the logarithm because the amount of information is non-negative, while log2(x) < 0 for 0 < x < 1.

Any symbol of the alphabet A_m may appear in each position of the message; the amount of information per message symbol is equal to the average of the information over all symbols of the alphabet A_m:

I1 = -Σ p_i·log2(p_i), where the sum is taken over i = 1, ..., m.

The total amount of information contained in a message of n symbols is:

I = -n·Σ p_i·log2(p_i).    (3.2)

If all symbols of the alphabet A_m appear with equal probability, then all p_i = p. Since Σ p_i = 1, we have p = 1/m.

Formula (3.2), in the case when all symbols of the alphabet are equally likely, takes the form

I = n·log2(m).

Conclusion: Shannon's formula (3.2), in the case when all symbols of the alphabet are equally likely, turns into the Hartley formula.
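As an illustration of formula (3.2) and its equally likely special case, the following sketch (the function names are my own, chosen for readability) computes the average information per symbol and checks that for a uniform alphabet it coincides with the Hartley value log2(m).

```python
from math import log2

def shannon_info_per_symbol(probs):
    """Average information per symbol: -sum(p_i * log2(p_i)), ignoring zero probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def message_information(probs, n):
    """Total information in a message of n symbols from an alphabet with these probabilities."""
    return n * shannon_info_per_symbol(probs)

m, n = 32, 10                            # alphabet size and message length (arbitrary example values)
uniform = [1 / m] * m                    # all symbols equally likely
print(message_information(uniform, n))   # 50.0
print(n * log2(m))                       # 50.0 -- the Hartley formula gives the same value

skewed = [0.5, 0.25, 0.125, 0.125]       # unequal probabilities give less information per symbol
print(shannon_info_per_symbol(skewed))   # 1.75 < log2(4) = 2
```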

In the general case, the entropy H of an arbitrary system X (a random variable), which can be in m different states x_1, x_2, ..., x_m with probabilities p_1, p_2, ..., p_m, is calculated by the Shannon formula:

H(X) = -Σ p_i·log2(p_i), i = 1, ..., m.    (3.3)

Recall that p_1 + p_2 + ... + p_m = 1. If all the p_i are equal, then all states of the system X are equally likely; in this case p_i = 1/m, and formula (3.3) turns into the Hartley formula: H(X) = log2(m).

Remark. The entropy of the system (random variable) X does not depend on what the specific states x_1, x_2, ..., x_m are, but depends on the number m of these states and on the probabilities p_1, p_2, ..., p_m with which the system can be in these states. This means that two systems with the same number of states, whose probabilities p_1, p_2, ..., p_m are equal (up to the order in which they are listed), have equal entropies.

Theorem. The maximum of the entropy H(X) is achieved when all states of the system are equally likely. This means that

H_max(X) = log2(m).    (3.4)

If the system X can be in only one state (m = 1), then its entropy is zero.

Consider a system that can take only two states, x1 and x2, with probabilities p1 = 1/2 and p2 = 1/2:

The entropy of such a system is

H(X) = -(1/2·log2(1/2) + 1/2·log2(1/2)) = -log2(1/2) = log2(2) = 1.

This amount is taken as the unit of measurement of entropy (information) and is called 1 bit.

Consider the function

h(x) = -(x·log2(x) + (1-x)·log2(1-x)).    (3.5)

Its domain of definition is the interval (0; 1), and lim h(x) = 0 as x → 0 or x → 1.

The graph of this function is shown in the figure:

Fig. 4. Graph of the function h(x) = -(x·log2(x) + (1-x)·log2(1-x))

If we denote x by p1 and (1-x) by p2, then p1 + p2 = 1; p1, p2 ∈ (0; 1), and h(x) = H(p1, p2) = -(p1·log2(p1) + p2·log2(p2)) is the entropy of a system with two states; the maximum of H is achieved at p1 = p2 = 0.5.
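A quick numerical check of the function h(x) from (3.5) can be useful; the sketch below (illustrative, with names of my own choosing) evaluates h on a grid and confirms that the maximum value, 1 bit, is reached at x = 0.5.

```python
from math import log2

def h(x: float) -> float:
    """Entropy of a two-state system with probabilities x and 1 - x, formula (3.5)."""
    if x in (0.0, 1.0):
        return 0.0  # limiting value: h(x) -> 0 as x -> 0 or x -> 1
    return -(x * log2(x) + (1 - x) * log2(1 - x))

grid = [i / 100 for i in range(1, 100)]
best = max(grid, key=h)
print(best, h(best))  # 0.5 1.0 -- the maximum, 1 bit, is at p1 = p2 = 0.5
```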

The graph of h(x) can be used when solving the following tasks:

Task 1. There are three random variables X, Y, Z, each of which takes two values with probabilities:

1. p(X = x1) = 0.5; p(X = x2) = 0.5;

2. p(Y = y1) = 0.2; p(Y = y2) = 0.8;

3. p(Z = z1) = 0.3; p(Z = z2) = 0.7.

The notation p(X = x1) = 0.5 means that the random variable X takes the value x1 with probability 0.5. It is required to arrange the entropies of these systems in ascending order.

Solution.

The entropy H(X) is equal to 1 and will be the greatest;

the entropy H(Y) is equal to the value of the function h(x) (see (3.5)) at x = 0.2, i.e. H(Y) = h(0.2);

the entropy H(Z) = h(0.3). From the graph of h(x) it can be determined that h(0.2) < h(0.3). Consequently, H(Y) < H(Z) < H(X).
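The ordering read off the graph can also be verified directly from the Shannon formula (3.3); the following sketch is an illustration, not part of the original solution.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, formula (3.3)."""
    return -sum(p * log2(p) for p in probs if p > 0)

H_X = entropy([0.5, 0.5])   # 1.0 bit
H_Y = entropy([0.2, 0.8])   # about 0.722 bit
H_Z = entropy([0.3, 0.7])   # about 0.881 bit
print(H_Y < H_Z < H_X)      # True, so H(Y) < H(Z) < H(X)
```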

Note 1. The entropy of a system is greater, the less the probabilities of its states differ from one another.

Based on this, we can conclude that H(Y) < H(Z).

For example, if for systems X and Y with three states the probabilities are {0.4; 0.3; 0.3} for X and {0.1; 0.1; 0.8} for Y, it is obvious that the uncertainty of system X is greater than the uncertainty of system Y: in the latter, the state whose probability is 0.8 will most likely be realized.

The entropy H(X) characterizes the degree of uncertainty of the system. The greater the amount of information received about the system, the more is known about it, and the less uncertain its state will be for the recipient of the information.

If the entropy of the system becomes equal to zero after information is received, this means that the uncertainty has disappeared and all the entropy has "passed" into information. In this case it is said that complete information about the system X has been obtained. The amount of information acquired upon complete clarification of the state of a physical system is equal to the entropy of this system.

If, after receiving a certain message, the uncertainty of the system X became smaller but did not disappear completely, then the amount of information contained in the message is equal to the decrease in entropy:

I = H1(X) - H2(X),    (3.6)

where H1(X) and H2(X) are the entropies of the system before and after the message, respectively. If H2(X) = 0, then the measure of uncertainty of the system is zero and complete information about the system has been obtained.

Example. You want to guess the number of points that will come up on a die. You received a message that an even number of points came up. How much information does this message contain?

Solution. The entropy of the "die" system is H1 = log2 6, because the die can randomly take one of six equally possible states (1, 2, 3, 4, 5, 6). The received message reduces the number of possible states to three: {2, 4, 6}, i.e. the entropy of the system is now H2 = log2 3. The decrease in entropy is equal to the amount of information obtained: I = H1 - H2 = log2 6 - log2 3 = log2 2 = 1 bit.

Using this worked example, one of the common definitions of the unit of information, 1 bit, can be explained: 1 bit is the amount of information that reduces the uncertainty of the state of a system by a factor of two.

The uncertainty of a discrete system depends on the number N of its states.

The entropy before information is received is H1 = log2 N. If, after information is received, the uncertainty has been halved, this means that the number of states has become N/2, and the entropy is H2 = log2(N/2). The amount of information received is I = H1 - H2 = log2 N - log2(N/2) = log2 2 = 1 bit.
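The die example and the general halving argument are easy to reproduce numerically; the snippet below is a sketch under the same assumptions as the text (a fair six-sided die, then an arbitrary N).

```python
from math import log2

# The die example: the message "an even number of points came up"
H1 = log2(6)                 # entropy before the message: six equally likely states
H2 = log2(3)                 # entropy after: only {2, 4, 6} remain
print(H1 - H2)               # 1.0 bit of information

# The general halving argument for a system with N equally likely states
N = 64                       # arbitrary example value
print(log2(N) - log2(N / 2)) # 1.0 bit, independent of N
```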

Let us consider several tasks on the use of the Shannon and Hartley formulas.

Task 2. Can the entropy of a system that randomly takes one of 4 states be equal to: a) 3; b) 2.1; c) 1.9; d) 1; e) 0.3? Explain the answer.

Solution. The entropy of a system with 4 states reaches its maximum possible value when all states are equally likely. By the Hartley formula this value is log2 4 = 2 bits. In all other cases the entropy of a system with 4 states will be less than 2. Consequently, of the values listed above, the possible values of the entropy are 1.9, 1, and 0.3.

Task 3. The function h(x) = -x·log2(x) - (1-x)·log2(1-x) is given. Arrange the following values in ascending order: h(0.9), h(0.85), h(0.45), h(0.2), h(0.15).

Solution. We use the graph of the function (3.5). The largest value will be h(0.45), the smallest h(0.9); then, in ascending order, come the values h(0.15) and h(0.85) = h(0.15), and then h(0.2). Answer: h(0.9) < h(0.15) = h(0.85) < h(0.2) < h(0.45).

Task 4. Messages are transmitted over a communication link: a) "Start_B_10"; b) "Loancha_1_V0". Compare the amount of information in the first and second messages.

Solution. The first and second messages consist of the same characters: the second is obtained from the first by permuting these characters. According to Shannon's formula, these messages contain the same amount of information. At the same time, the first message carries meaningful information, while the second is simply a set of characters. However, in this case we can say that the second message is an "encrypted" version of the first, and therefore the amount of information in both messages is the same.

Task 5. Three different messages A, B, C are received:

A = "Arrival at ten o'clock"; B = "Arrival at ten hours zero minutes"; C = "Arrival at exactly ten o'clock". Using Shannon's entropy approach, compare the amount of information contained in these messages.

Solution. Denote the amounts of information in messages A, B, C by I(A), I(B), I(C), respectively. In terms of "content" these messages are exactly the same, but the same content is expressed with different numbers of characters. Here all the characters of message A are contained in messages B and C: message C = A + "exactly", and B = A + "zero minutes". In accordance with Shannon's approach we get: I(A) < I(C) < I(B).

Our world rests on three components: matter, energy, and information. How much matter, energy, and information is there in the world? Can they be measured, and how exactly? We know how to measure the amount of matter and energy. But what about information? Can it be measured?

It was noted earlier that there are several approaches to estimating the amount of information. Let us now dwell in detail on one of them.

Any message is informative if it adds to a person's knowledge, i.e. reduces the uncertainty of their knowledge.

Equally likely events

Example 1.

For example, when tossing a coin, we try to guess which side it will land on. One of two outcomes is possible: the coin ends up in the "heads" or "tails" position. Each of these two events is equally likely, i.e. neither has an advantage over the other. Before the coin is tossed, no one can know how it will land, i.e. there is uncertainty of knowledge. After the event occurs, on the contrary, there is complete certainty, since the thrower receives a visual message about the position of the coin, which in turn halves the uncertainty of their knowledge, since one of two equally likely events has occurred.

Example 2.

Another example is the situation with a six-sided die: before the throw, no one can know which face it will land on. In this case there is the possibility of obtaining one result out of six equally likely ones. Thus, before the throw the uncertainty of the thrower's knowledge is equal to 6; after the throw it decreases exactly 6 times, since any one of 6 equally likely events can occur.

Example 3.

Consider an example in which 40 tickets have been prepared for an exam. The number of possible events when a ticket is drawn is 40, and these events are equally likely. The uncertainty of the student's knowledge before the choice of a ticket is therefore equal to 40. Accordingly, after the student has taken a ticket, this uncertainty decreases 40 times. Let us ask whether this quantity depends on the number of the drawn ticket. No, since the events are equally likely.

Having analyzed all the examples discussed above, we can conclude that the greater the initial number of possible equally likely events, the greater the factor by which the uncertainty of knowledge decreases, and the more information the report on the result of the experiment will contain.

Unequally likely events

Consider spoken language as an example. Established research shows that in all spoken languages some letters occur much more often than others. The results confirm that, per 1,000 letters, different letters in different spoken languages account for different numbers of occurrences. As examples, the table presents some letters of Russian and English:

Figure 1.

In addition, the probability of the appearance of individual letters depends on which letters precede them. For example, in Russian a soft sign can never follow a vowel, and four vowels in a row do not occur in words, etc. Spoken languages have, as a rule, their own features and regularities. That is why the amount of information contained in messages in any spoken language cannot be estimated with the Hartley formula, which is used in the alphabetic approach to estimating information and is suited to examples with equally likely events (the examples with a coin and a die).

How can one determine how much information is contained, for example, in the text of the novel "War and Peace", in the frescoes and canvases of the great Italian artists, or in the human genetic code? Science does not yet know the answers to these and similar questions and, in all likelihood, will not know them soon. However, everyone is interested in whether it is possible to assess the amount of information objectively. Tasks of this kind include the following example.

How can we find out whether the messages "a woman will be the first to come out of the building" and "a man will be the first to come out of the building" are equally likely? There is no unambiguous answer to this question. Everything depends on which building we are talking about. If it is, for example, a gynecological clinic, then the probability that a woman comes out first is very high; if it is a military barracks, then the probability of a man coming out first is higher than that of a woman; but if it is a cinema, then the probabilities of a man or a woman coming out first are the same.

Estimating the amount of information. The Shannon formula

To solve problems of this kind, the general estimate of the amount of information proposed by the American scientist Claude Shannon in 1948 is used. The formula he created for determining the amount of information takes into account the possibly unequal probabilities of the messages contained in a set. When creating the formula, Shannon used the probabilistic measure of uncertainty (called entropy) employed in mathematics and thermodynamics in order to fully estimate the state of the system under study and obtain the maximum possible information about the processes occurring in it. This estimate of the amount of information is essentially a probabilistic measure and, as a measure of uncertainty, it reflects the ability of any source to exhibit ever new states and thereby provide information.

Definition 1.

Shannon defined entropy as the average logarithmic function of the set of probabilities of the possible states of a system (the possible outcomes of an experiment). To calculate the entropy, Shannon proposed the following equation:

$H = -(p_1\log_2 p_1 + p_2\log_2 p_2 + \dots + p_n\log_2 p_n)$,

where $p_i$ is the probability of the occurrence of the $i$-th event in a set of $n$ events.

Then the amount of information obtained as a result of the experiment is nothing other than the difference between the entropies of the system before ($H_0$) and after ($H_1$) the experiment:

$I = H_0 - H_1$;

moreover, if the uncertainty is completely removed as a result of the experiment, we have:

$I = H_0 = -\sum_{i=1}^{n} p_i\log_2 p_i$.

Let us consider an example confirming the use of this theory of Shannon in practice.

Example 4.

Gudgeons and perches live in a lake. The number of individuals in each population has been counted (gudgeons: $1500$, perches: $500$). It is necessary to determine how much information is contained in the messages that a fisherman caught a gudgeon, a perch, or, in general, a fish.

Solution. The events of catching a gudgeon or a perch are not equally likely, since there are far fewer perches in the lake than gudgeons.

The total number of gudgeons and perches living in the lake:

$1500 + 500 = 2000$.

Let us determine the probability of catching a gudgeon:

$p_1 = \frac{1500}{2000} = 0.75$,

and the probability of catching a perch:

$p_2 = \frac{500}{2000} = 0.25$.

$I_1 = \log_2(\frac{1}{p_1}), \quad I_2 = \log_2(\frac{1}{p_2})$,

where $I_1$ and $I_2$ are the amounts of information in the messages about catching a gudgeon and a perch, respectively.

The amount of information contained in the message about catching a gudgeon:

$I_1 = \log_2(\frac{1}{0.75}) \approx 0.42$ bit,

and the amount of information contained in the message about catching a perch:

$I_2 = \log_2(\frac{1}{0.25}) = 2$ bits.

The amount of information contained in the message about catching a fish (a gudgeon or a perch) is calculated by the Shannon formula:

$I = -p_1\log_2 p_1 - p_2\log_2 p_2$

$I = -0.75 \cdot \log_2 0.75 - 0.25 \cdot \log_2 0.25 = -0.75 \cdot \frac{\log 0.75}{\log 2} - 0.25 \cdot \frac{\log 0.25}{\log 2} \approx 0.81$ bit.

Answer: the message contains approximately $0.81$ bit of information.
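A short computation (an illustrative sketch; the variable names are mine) reproduces the numbers of Example 4.

```python
from math import log2

gudgeons, perches = 1500, 500
total = gudgeons + perches

p1 = gudgeons / total   # 0.75, probability of catching a gudgeon
p2 = perches / total    # 0.25, probability of catching a perch

I1 = log2(1 / p1)       # about 0.42 bit: information in "a gudgeon was caught"
I2 = log2(1 / p2)       # 2.0 bits: information in "a perch was caught"
I  = -(p1 * log2(p1) + p2 * log2(p2))  # Shannon formula for "a fish was caught"

print(round(I1, 2), round(I2, 2), round(I, 2))  # 0.42 2.0 0.81
```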


In 1928 the American engineer R. Hartley considered the process of obtaining information as the choice of one message from a finite, predetermined set of N equally likely messages, and the amount of information I contained in the chosen message he defined as the binary logarithm of N.

Hartley's formula:

I = log2 N

Suppose you need to guess a number from the set of numbers from one to one hundred. Using the Hartley formula, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, a message about the correctly guessed number contains an amount of information approximately equal to 6.644 units of information.
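The quoted value is easy to verify (a minimal, illustrative check):

```python
from math import log2

N = 100
I = log2(N)
print(round(I, 3))  # 6.644 -- bits of information needed to single out one number out of 100
```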

Here are some other examples of equally likely messages:

1. When tossing a coin: "tails came up", "heads came up";

2. On a page of a book: "the number of letters is even", "the number of letters is odd".

Let us now determine whether the messages "a woman will be the first to come out of the door of the building" and "a man will be the first to come out of the door of the building" are equally likely. This question cannot be answered unambiguously. It all depends on what kind of building we are talking about. If it is, for example, a metro station, then the probability of coming out of the doors first is the same for a man and for a woman, while if it is a military barracks, then for a man this probability is considerably higher than for a woman.

For tasks of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, one that takes into account the possibly unequal probabilities of the messages in the set.

Shannon's formula:

I = -(p1 log2 p1 + p2 log2 p2 + ... + pN log2 pN),

where pi is the probability that the i-th message is selected from the set of N messages.

It is easy to see that if the probabilities p1, ..., pN are equal, then each of them equals 1/N, and Shannon's formula turns into the Hartley formula.

Claude Shannon defined information as removed uncertainty. More precisely, obtaining information is a necessary condition for removing uncertainty. Uncertainty arises in a situation of choice. The task solved in removing uncertainty is to reduce the number of options under consideration (to reduce diversity) and ultimately to choose, from among the possible options, one that suits the situation. Removing uncertainty makes it possible to make informed decisions and to act. This is the controlling role of information.

Imagine that you went into a shop and asked for chewing gum. A saleswoman who has, say, 16 kinds of chewing gum is in a state of uncertainty. She cannot fulfil your request without additional information. If you specified, say, "Orbit", and of the 16 initial options the saleswoman now considers only 8, you have halved her uncertainty (running ahead, let us note that halving the uncertainty corresponds to obtaining 1 bit of information). If you, without further explanation, simply pointed a finger at the display case, "this one!", the uncertainty was removed completely. Again running ahead, let us note that with this gesture in this example you conveyed 4 bits of information to the saleswoman.

A situation of maximum uncertainty presupposes the presence of several equally likely alternatives (options), i.e. none of the options is preferable. The more equally likely options there are, the greater the uncertainty, the more difficult it is to make an unambiguous choice, and the more information is required in order to do so. For N options this situation is described by the following probability distribution: {1/N, 1/N, ..., 1/N}.

The minimum uncertainty is equal to 0; this is a situation of complete certainty, meaning that the choice has been made and all the necessary information has been obtained. The probability distribution for a situation of complete certainty looks like this: {1, 0, ..., 0}.

The quantity characterizing the amount of uncertainty in information theory is denoted by the symbol H and is called entropy, more precisely, information entropy.

Entropy (H) is a measure of uncertainty, expressed in bits. Entropy can also be viewed as a measure of the uniformity of the distribution of a random variable.

Fig. 3.4 The behavior of entropy for the case of two alternatives

Fig. 3.4 shows the behavior of the entropy for the case of two alternatives as the ratio of their probabilities (p, 1-p) changes.

The entropy reaches its maximum value in this case when both probabilities are equal to each other and equal to 1/2; the zero value of entropy corresponds to the cases (p0 = 0, p1 = 1) and (p0 = 1, p1 = 0).

The amount of information I and the entropy H characterize the same situation, but from qualitatively opposite sides. I is the amount of information required to remove the uncertainty H. In Léon Brillouin's definition, information is negative entropy (negentropy).

When the uncertainty is completely removed, the amount of information received I is equal to the initially existing uncertainty H.

In the case of partial removal of uncertainty, the amount of information obtained and the remaining unremoved uncertainty add up to the initial uncertainty: Ht + It = H (Fig. 3.5).

Fig. 3.5. The relation between entropy and the amount of information

For this reason, the formulas presented below for calculating the entropy H are also formulas for calculating the amount of information I: when complete removal of uncertainty is meant, H in them can be replaced by I.

In the general case, the entropy H and the amount of information I obtained as a result of removing the uncertainty depend on the initial number of options under consideration, N, and on the a priori probabilities of realization of each of them, P = {p0, p1, ..., pN-1}, i.e. H = F(N, P). The entropy in this case is calculated by the Shannon formula, proposed by him in 1948 in the article "A Mathematical Theory of Communication".

In the particular case when all the options are equally likely, the dependence remains only on the number of options under consideration, i.e. H = F(N). In this case the Shannon formula simplifies greatly and coincides with the Hartley formula, which was first proposed by the American engineer Ralph Hartley in 1928, i.e. 20 years earlier.

Shannon's formula has the following form:

H = -(p0·log2 p0 + p1·log2 p1 + … + pN-1·log2 pN-1).    (2.1)

The minus sign in formula (2.1) does not mean that the entropy is a negative quantity. It is explained by the fact that pi ≤ 1 by definition, and the logarithm of a number less than one is negative. By the property of the logarithm -log2 pi = log2(1/pi), so the formula can also be written in a second form, without the minus before the summation sign.

The expression Ii = log2(1/pi) is interpreted as the individual amount of information obtained in the case when the i-th option is realized. The entropy in Shannon's formula is an average characteristic: the mathematical expectation of the distribution of the random variable {I0, I1, ..., IN-1}.

Let us give an example of calculating the entropy by Shannon's formula. Suppose that in some institution the staff is composed as follows: 3/4 women, 1/4 men. Then the uncertainty about, for example, whom you will meet first when entering the institution is calculated by the sequence of actions shown in Table 3.1.

Table 3.1.

            pi      1/pi    Ii = log2(1/pi), bit    pi·log2(1/pi), bit
Women       3/4     4/3     log2(4/3) ≈ 0.42        3/4 · 0.42 ≈ 0.31
Men         1/4     4/1     log2(4) = 2             1/4 · 2 = 0.5
                                              Total: H ≈ 0.81 bit
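The calculation in Table 3.1 can be checked with a few lines of code (an illustrative sketch whose rounding matches the table):

```python
from math import log2

staff = {"women": 3 / 4, "men": 1 / 4}

H = 0.0
for group, p in staff.items():
    Ii = log2(1 / p)            # individual information of the outcome
    print(f"{group}: p = {p}, Ii = {Ii:.2f} bit, p*Ii = {p * Ii:.2f} bit")
    H += p * Ii

print(f"H = {H:.2f} bit")       # H = 0.81 bit, as in Table 3.1
```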

We have already mentioned that the Hartley formula is a special case of Shannon's formula for equally likely alternatives.

Substituting into formula (2.1) the value 1/N instead of pi (in the equally likely case this value does not depend on i), we get:

H = -Σ (1/N)·log2(1/N) = log2 N.

Thus, the Hartley formula looks very simple:

H = log2 N.    (2.2)

It clearly follows from it that the more alternatives there are (N), the greater the uncertainty (H). Taking the logarithm to base 2 reduces the number of options to units of measurement of information, bits. Figure 3.6 presents the dependence of the entropy on the number of equally likely choice options.

Fig. 3.6. Dependence of the entropy on the number of equally likely choice options (equally likely alternatives)

To solve inverse problems, when the uncertainty (H) or the amount of information (I) obtained as a result of its removal is known and it is necessary to determine how many equally likely alternatives correspond to the appearance of this uncertainty, the inverse Hartley formula is used, which looks even simpler:

N = 2^I.    (2.3)

For example, if it is known that 3 bits of information were obtained as a result of determining that Kolya Ivanov, who interests us, lives on the second floor, then the number of floors in the house can be determined by formula (2.3) as N = 2^3 = 8 floors.

If the question is: "In a house with 8 floors, how much information did we receive on learning that Kolya Ivanov, who interests us, lives on the second floor?", then formula (2.2) must be used: I = log2(8) = 3 bits.
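The two directions of the Hartley formula used in the floor example can be written as a pair of small helper functions (the names are mine, for illustration only):

```python
from math import log2

def bits_for_choices(n: int) -> float:
    """Direct Hartley formula (2.2): information needed to single out one of n equally likely options."""
    return log2(n)

def choices_for_bits(i: float) -> float:
    """Inverse Hartley formula (2.3): how many equally likely options correspond to i bits."""
    return 2 ** i

print(bits_for_choices(8))      # 3.0 -- learning the floor in an 8-storey house gives 3 bits
print(choices_for_bits(3))      # 8.0 -- 3 bits of information single out one of 8 floors
```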

So far we have given formulas for calculating the entropy (uncertainty) H, noting that H in them can be replaced by I, because the amount of information received upon complete removal of the uncertainty of some situation is quantitatively equal to the initial entropy of that situation.

But uncertainty may be removed only partially, so the amount of information I obtained from some message is calculated as the decrease in entropy that occurred as a result of receiving this message.

For the equally likely case, using the Hartley formula to calculate the entropy, we get:

I = log2(N0) - log2(N1) = log2(N0/N1),    (3.5)

where N0 is the number of options under consideration before the message is received and N1 is the number after. The second equality is derived from the properties of the logarithm. Thus, in the equally likely case I depends on how many times the number of choice options under consideration (the diversity under consideration) has changed.

Based on (3.5), the following conclusions can be drawn:

If N1 = 1, the uncertainty is removed completely; the amount of information received in the message is equal to the uncertainty that existed before the message was received: I = log2(N0).

If N1 = N0, the uncertainty has not changed and, consequently, no information was received: I = 0.

If N1 < N0, then I > 0;

if N1 > N0, then I < 0.

That is, the amount of information received is positive if, as a result of receiving the message, the number of alternatives under consideration decreased, and negative if it increased.

If the number of alternatives under consideration was halved as a result of receiving the message, i.e. N0/N1 = 2, then I = log2(2) = 1 bit. In other words, obtaining 1 bit of information excludes half of the equally likely options from consideration.

Consider as an example an experiment with a deck of 36 cards (Fig. 3.7).

Fig. 3.7. Illustration for the experiment with a deck of 36 cards

Suppose someone takes one card out of the deck. We are interested in which of the 36 cards was taken. The initial uncertainty, calculated by formula (3.2), is H = log2(36) ≈ 5.17 bits. The person who drew the card tells us part of the information. Using formula (3.5), let us determine how much information we receive from the following messages:

Variant A. "This is a card of a red suit."

I = log2(36/18) = log2(2) = 1 bit (red cards make up half the deck; the uncertainty decreased by a factor of 2).

Variant B. "This is a card of the spades suit."

I = log2(36/9) = log2(4) = 2 bits (spades make up a quarter of the deck; the uncertainty decreased by a factor of 4).

Variant C. "This is one of the high cards: jack, queen, king, or ace."

I = log2(36) - log2(16) = 5.17 - 4 = 1.17 bits (the uncertainty decreased by more than a factor of two, so the amount of information obtained is greater than one bit).

Variant D. "This is a card from the deck."

I = log2(36/36) = log2(1) = 0 bits (the uncertainty did not decrease; the message is not informative).

Variant E. "This is the queen of spades."

I = log2(36/1) = log2(36) = 5.17 bits (the uncertainty is completely removed).
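All five variants of the card message can be computed in one loop using the same formula I = log2(N0/N1); the sketch below is illustrative, and the list of messages simply restates the variants above.

```python
from math import log2

N0 = 36  # the number of cards in the deck before the message

# (message, number of cards still possible after the message)
messages = [
    ("A: a card of a red suit", 18),
    ("B: a card of the spades suit", 9),
    ("C: one of the high cards (jack, queen, king, ace)", 16),
    ("D: a card from the deck", 36),
    ("E: the queen of spades", 1),
]

for text, N1 in messages:
    I = log2(N0 / N1)
    print(f"{text}: I = {I:.2f} bits")
# A: 1.00, B: 2.00, C: 1.17, D: 0.00, E: 5.17 -- matching the values above.
```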

Task 1. What amount of information will a visual message about the color of a drawn ball contain, if an opaque bag holds 50 white, 25 red, and 25 blue balls?

Solution.

1) Total number of balls: 50 + 25 + 25 = 100.

2) Probabilities of the balls: 50/100 = 1/2, 25/100 = 1/4, 25/100 = 1/4.

3) I = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/4·log2(1/4)) = -(1/2·(-1) + 1/4·(-2) + 1/4·(-2)) = 1.5 bits.
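The arithmetic of step 3 can be confirmed directly (a minimal sketch):

```python
from math import log2

counts = [50, 25, 25]            # white, red, blue balls
total = sum(counts)
probs = [c / total for c in counts]

I = -sum(p * log2(p) for p in probs)
print(I)  # 1.5 bits
```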

Task 2. A basket contains 16 balls of different colors. How much information does the message that a white ball was drawn carry?

Solution. Since N = 16 balls, I = log2 N = log2 16 = 4 bits.

Task 3. A basket contains black and white balls, among them 18 black balls. The message that a white ball was drawn carries 2 bits of information. How many balls are in the basket?

1) 18   2) 24   3) 36   4) 48

Solution. We find the probability of drawing a white ball from the amount of information: log2 N = 2, so N = 4; therefore the probability of drawing a white ball is 1/4 (25%), and the probability of drawing a black ball is accordingly 3/4 (75%). If the black balls make up 75% of all the balls and there are 18 of them, then the white balls make up 25%, and their number is (18·25)/75 = 6.

It remains to find the total number of balls in the basket: 18 + 6 = 24.

Answer: 24 balls.

Task 4. In a certain country, a car license plate of 5 characters is made up of capital letters (30 letters are used) and decimal digits in any order. Each character is encoded with the same, minimum possible number of bits, and each plate with the same, minimum possible number of bytes. Determine the amount of memory required to store 50 license plates.

1) 100 bytes   2) 150 bytes   3) 200 bytes   4) 250 bytes

Solution. The number of characters used to encode a plate is 30 letters + 10 digits = 40 characters. The amount of information carried by one character is 6 bits (2^i = 40, but the number of bits per character must be a whole number, so we take the nearest power of two that is not less than the number of characters: 2^6 = 64).

We have found the amount of information contained in each character; the number of characters in a plate is 5, so a plate carries 5·6 = 30 bits of information. By the conditions of the task, however, each plate is encoded with the same, minimum possible number of bytes, so we need to know how many bytes are required for 30 bits. Dividing 30 by 8 gives a fractional number, and we need a whole number of bytes for each plate, so we find the nearest multiple of 8 that exceeds the number of bits: 32 = 8·4. Each plate is therefore encoded with 4 bytes.

To store 50 license plates, 4·50 = 200 bytes are required.
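The same byte-counting argument can be expressed in a few lines; expressing "the nearest power of two not less than the number of characters" via ceil(log2(...)) is an equivalent implementation choice, not part of the task statement.

```python
from math import ceil, log2

symbols = 30 + 10                          # letters plus digits
bits_per_char = ceil(log2(symbols))        # 6 bits: the smallest whole number of bits for 40 symbols
bits_per_plate = 5 * bits_per_char         # 30 bits per 5-character plate
bytes_per_plate = ceil(bits_per_plate / 8) # 4 bytes: the smallest whole number of bytes for 30 bits

print(bits_per_char, bytes_per_plate, 50 * bytes_per_plate)  # 6 4 200 -- 200 bytes for 50 plates
```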

Choosing the optimal strategy in the game "Guess the number". The choice of an optimal strategy in the game "Guess the number" is built on obtaining the maximum amount of information. In this game the first participant thinks of an integer (for example, 3) from a given interval (for example, from 1 to 16), and the second must "guess" the intended number. If we consider this game from the informational point of view, the initial uncertainty of knowledge for the second participant is 16 possible events (candidates for the intended number).

With the optimal strategy, the interval of numbers should always be divided in half; then the number of possible events (numbers) in each of the resulting intervals will be the same, and guessing either interval is equally likely. In this case, at each step the answer of the first player ("yes" or "no") carries the maximum amount of information (1 bit).

As can be seen from Table 1.1, the number 3 was guessed in four steps, at each of which the uncertainty of the second participant's knowledge was halved upon receiving a message from the first participant containing 1 bit of information. Thus, the amount of information needed to guess one of 16 numbers amounted to 4 bits.
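The optimal strategy described above is ordinary binary search; the sketch below (illustrative; the exact question wording is my own) guesses a number from 1 to 16 by repeatedly halving the interval and counts the yes/no questions asked.

```python
def guess_number(secret: int, low: int = 1, high: int = 16) -> int:
    """Guess `secret` in [low, high] by halving the interval; return the number of yes/no questions."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1            # ask: "Is the number greater than mid?"
        if secret > mid:          # each answer carries 1 bit of information
            low = mid + 1
        else:
            high = mid
    return questions

print(guess_number(3))                                  # 4 questions for the interval 1..16
print(all(guess_number(n) == 4 for n in range(1, 17)))  # True: every number needs exactly 4 bits
```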

Review questions and tasks

1. It is known a priori that a ball is in one of three urns: A, B, or C. Determine how many bits of information are contained in the message that it is in urn B.

Options: 1 bit, 1.58 bits, 2 bits, 2.25 bits.

2. The probability of the first event is 0.5, and of the second and third 0.25 each. What is the information entropy of such a distribution? Options: 0.5 bit, 1 bit, 1.5 bits, 2 bits, 2.5 bits, 3 bits.

3. Here is a list of employees of some organization:

Determine the amount of information that is missing in order to fulfill the following requests:

Please call Ivanov to the phone.

I am interested in one of your employees who was born in 1970.

4. Which of the messages carries more information:

· As a result of a coin toss (heads, tails), tails came up.

· The traffic light (red, yellow, green) is now showing the green light.

· As a result of a throw of a die (1, 2, 3, 4, 5, 6), 3 points came up.

The most widespread approach to determining the average amount of information contained in messages from sources of the most diverse nature is the approach of K. Shannon. Consider the following situation.
The source transmits elementary signals of k different types. Let us follow a fairly long segment of the message. Suppose it contains N1 signals of the first type, N2 signals of the second type, ..., Nk signals of the k-th type, where N1 + N2 + ... + Nk = N is the total number of signals in the observed segment, and f1, f2, ..., fk are the frequencies of the corresponding signals. As the length of the message segment increases, each of the frequencies tends to a fixed limit, i.e.

lim fi = pi, (i = 1, 2, ..., k),

where pi can be regarded as the probability of the signal. Suppose a signal of the i-th type, received with probability pi, carries -log2 pi units of information. In the segment under consideration the i-th signal occurs approximately N·pi times (we assume that N is sufficiently large), and the total information delivered by signals of this type is equal to the product N·pi·(-log2 pi). The same applies to signals of every other type, so the total amount of information delivered by a segment of N signals is approximately equal to -N·Σ pi·log2 pi.

To determine the average amount of information per signal, i.e. the specific informativeness of the source, this number must be divided by N. As N grows without bound, the approximate equality turns into an exact one. The result is the asymptotic relation known as the Shannon formula:

H = -Σ pi·log2 pi.

Recently it has become no less famous than Einstein's celebrated formula E = mc². It turned out that the formula proposed by Hartley is a special case of the more general Shannon formula. If in the Shannon formula we take

p1 = p2 = ... = pi = ... = pN = 1/N,

then it turns into the Hartley formula.

The minus sign in the Shannon formula does not mean that the amount of information in a message is a negative quantity. It is explained by the fact that the probability p is, by definition, less than one but greater than zero. Since the logarithm of a number less than one, i.e. log pi, is negative, the product of the probability and the logarithm of that number, taken with the minus sign, is positive.
In addition to this formula, Shannon proposed an abstract communication scheme consisting of five elements (information source, transmitter, communication line, receiver, and addressee), and formulated theorems on channel capacity, noise immunity, coding, and so on.
As information theory and its applications developed, Shannon's ideas quickly spread their influence to the most diverse areas of knowledge. It was noticed that Shannon's formula is very similar to the entropy formula used in physics, derived by Boltzmann. Entropy denotes the degree of disorder of the statistical motion of molecules. Entropy is maximal when the parameters of molecular motion (direction, speed, and spatial position) are distributed uniformly. The value of the entropy decreases if the motion of the molecules becomes ordered. As the ordering increases, the entropy tends to zero (for example, when only one value and direction of velocity are possible). When a message (text) is composed, entropy can be used to characterize the degree of disorder in the sequence (alternation) of characters. A text with maximum entropy is a text with a uniform distribution of all the letters of the alphabet, i.e. with a meaningless alternation of letters, for example: YkhzzzzchchkchchkchchchkchSBSM. If the actual probabilities of the letters are taken into account when composing a text, then the "phrases" obtained in this way will show a certain orderliness in the sequence of letters, governed by the frequency of their appearance: "the ote of the OKRS of the AKSH TSHI".

When the probabilities of four-letter combinations are taken into account, the text becomes so ordered that by some formal criteria it approaches a meaningful one: "it's not dry and nepo and Corco". The reason for such ordering is information about the statistical regularities of texts. In meaningful texts the ordering is naturally even higher. Thus, in the phrase "Spring has come..." we have even more information about the sequence (alternation) of letters. So, from text to text the ordering and the information we have about the text increase, while the entropy (the measure of disorder) decreases.
Using the difference between Shannon's formula for the amount of information and Boltzmann's formula for entropy (the opposite signs), L. Brillouin characterized information as negative entropy, or negentropy. Since entropy is a measure of disorder, information can be defined as a measure of the orderliness of material systems.

Because the two formulas look alike, it might be assumed that the concept of information adds nothing to the concept of entropy. However, this is not so. While the concept of entropy was previously used only for systems tending toward thermodynamic equilibrium, i.e. toward maximum disorder in the motion of their components, toward an increase in entropy, the concept of information drew attention also to those systems that do not increase their entropy but, on the contrary, being in a state with small values of entropy, tend to decrease it further.

It is difficult to overestimate the importance of the ideas of the theory of information in the development of a wide variety of scientific areas.
However, as K. Shannon himself noted, one cannot solve all unsolved problems with magic words such as "information", "entropy", and "redundancy".
Information theory is based on probabilistic, statistical regularities of phenomena. It provides a useful, but not universal, apparatus. Therefore many situations do not fit into Shannon's information model. It is not always possible to specify in advance the list of all possible states of a system and to calculate their probabilities. In addition, information theory considers only the formal side of a message, while its meaning remains aside. For example, a system of radar stations monitors airspace in order to detect enemy aircraft. The monitored system S may be in one of two states: x1, the enemy is present, or x2, there is no enemy. The importance of the first message cannot be assessed with a probabilistic approach. This approach, and the measure of the amount of information based on it, express above all the "structural-syntactic" side of information transfer, i.e. they express the relations between signals. However, the concepts of "probability" and "uncertainty", with which the concept of information is associated, presuppose a process of choice. This process can be realized only when there is a set of possibilities. Without this condition, it can be assumed, the transmission of information is impossible.