Using the Kalman filter to filter values obtained from sensors

On the Internet, including on Habr, you can find a lot of information about the Kalman filter. But it is hard to find an easily digestible derivation of the formulas themselves. Without a derivation, all this science is perceived as a kind of shamanism, the formulas look like a faceless set of symbols, and, most importantly, many simple statements lying on the surface of the theory remain beyond comprehension. The purpose of this article is to explain this filter in as accessible a language as possible.
The Kalman filter is a powerful data filtering tool. Its main principle is that filtering uses information about the physics of the phenomenon itself. For example, if you are filtering data from a car's speedometer, the inertia of the car gives you the right to treat overly fast jumps in speed as measurement errors. The Kalman filter is interesting because, in a certain sense, it is the best filter. We will discuss below what exactly the words "the best" mean. At the end of the article I will show that in many cases the formulas can be simplified to such an extent that almost nothing remains of them.

Educational program

Before getting acquainted with the Kalman filter, I propose to recall some simple definitions and facts from the theory of probability.

Random variable

When we say that a random variable is given, we mean that this quantity can take random values, different values with different probabilities. When you roll, say, a die, a discrete set of values drops out: $\{1, 2, 3, 4, 5, 6\}$. When it comes to, for example, the speed of a wandering particle, one obviously has to deal with a continuous set of values. We will denote the "dropped out" values of a random variable $\xi$ by $x$, but sometimes we will use the same letter that we use to denote the random variable itself: $\xi$.
In the case of a continuous set of values, the random variable is characterized by the probability density $\rho(x)$, which dictates that the probability that the random variable will "fall out" in a small neighborhood of the point $x$ of length $dx$ is equal to $\rho(x)\,dx$. As we can see from the picture, this probability is equal to the area of the shaded rectangle under the graph:

Quite often in life, random variables are Gaussian, when the probability density is equal to

$$\rho(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

We see that the function $\rho(x)$ has the shape of a bell centered at the point $\mu$ with a characteristic width of the order of $\sigma$.
Since we are talking about the Gaussian distribution, it would be a sin not to mention where it came from. Just as the numbers $e$ and $\pi$ are firmly established in mathematics and occur in the most unexpected places, so the Gaussian distribution has taken deep roots in the theory of probability. One remarkable statement that partially explains the Gaussian omnipresence is the following:
Let there be a random variable $\xi$ with an arbitrary distribution (in fact, there are some restrictions on this arbitrariness, but they are not at all strict). Let's carry out $N$ experiments and calculate the sum $\xi_1 + \dots + \xi_N$ of the "dropped out" values. Let's do a lot of such experiments. It is clear that each time we will receive a different value of the sum. In other words, this sum is itself a random variable with its own certain distribution law. It turns out that for sufficiently large $N$ the distribution law of this sum tends to the Gaussian distribution (by the way, the characteristic width of the "bell" grows like $\sqrt{N}$). Read more in Wikipedia: Central Limit Theorem. In life there are very often quantities that are made up of a large number of identically distributed independent random variables, and therefore they are distributed according to Gauss.
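As a quick numerical illustration (a sketch of my own, not part of the theory above; the uniform distribution here is an arbitrary choice), the effect is easy to observe in a few lines of Python:

import numpy as np

# Sum N uniformly distributed values; repeat the experiment many times.
N = 100
sums = np.random.uniform(0, 1, size=(10000, N)).sum(axis=1)

# The sums cluster around N * 0.5, and their histogram is close to a
# Gaussian bell; the spread grows like sqrt(N) (std ~ sqrt(N/12) here).
print(sums.mean(), sums.std())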

Mean

The average value of a random variable is what we get in the limit if we carry out a lot of experiments and calculate the arithmetic mean of the dropped-out values. The mean is denoted in different ways: mathematicians like to write $M\xi$ (expectation), foreign authors write $E\xi$ (expected value), and physicists write $\langle \xi \rangle$ or $\bar{\xi}$. We will denote it in the foreign way: $E\xi$.
For example, for the Gaussian distribution, the mean is $E\xi = \mu$.

Variance

In the case of the Gaussian distribution, we clearly see that the random variable prefers to fall out in some vicinity of its mean value $\mu$. As can be seen from the graph, the characteristic scatter of the values is of order $\sigma$. How can we estimate this spread of values for an arbitrary random variable, if we know its distribution? We could draw a graph of its probability density and estimate the characteristic width by eye. But we prefer to go the algebraic way. One can take the average length of the deviation (its modulus) from the mean: $E|\xi - E\xi|$. This value would be a good estimate of the characteristic scatter of values. But you and I know very well that using moduli in formulas is one big headache, so this formula is rarely used for estimating the characteristic spread.
An easier way (simpler in terms of calculations) is to find $E(\xi - E\xi)^2$. This value is called the variance and is often denoted $\sigma_\xi^2$. The root of the variance is called the standard deviation and is denoted $\sigma_\xi$. The standard deviation is a good estimate of the spread of a random variable.
For example, for the Gaussian distribution, we can calculate that the variance defined above is exactly equal to $\sigma^2$, which means that the standard deviation equals $\sigma$, which agrees very well with our geometric intuition.
In fact, a small fraud is hidden here. The fact is that in the definition of the Gaussian distribution, under the exponent stands the expression $-\frac{(x-\mu)^2}{2\sigma^2}$. The two in the denominator is there precisely so that the standard deviation equals the coefficient $\sigma$. That is, the Gaussian distribution formula itself is written in a form specially sharpened for the fact that we will be computing its standard deviation.
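For the record, the calculation just mentioned is the standard Gaussian integral (stated here without the intermediate steps):

$$E(\xi - E\xi)^2 = \int_{-\infty}^{+\infty} (x - \mu)^2\, \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\, dx = \sigma^2$$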

Independent random variables

Random variables can be dependent or independent. Imagine that you throw a needle onto a plane and write down the coordinates of both of its ends. These two coordinates are dependent: they are related by the condition that the distance between them always equals the length of the needle, although they are random values.
Random variables are independent if the result of the first of them is completely independent of the result of the second. If the random variables $\xi_1$ and $\xi_2$ are independent, then the average value of their product is equal to the product of their average values:

$$E(\xi_1 \xi_2) = E\xi_1 \cdot E\xi_2$$

Proof

For example, having blue eyes and graduating from school with a gold medal are independent random variables. If, say, a fraction $p$ of people are blue-eyed and a fraction $q$ are gold medalists, then the fraction of blue-eyed medalists is $p \cdot q$. This example tells us that if the random variables $\xi_1$ and $\xi_2$ are given by their probability densities $\rho_1(x)$ and $\rho_2(y)$, then the independence of these variables is expressed in the fact that the probability density $\rho(x, y)$ (the first variable dropped out as $x$, and the second as $y$) is found by the formula:

$$\rho(x, y) = \rho_1(x)\,\rho_2(y)$$

It immediately follows from this that:

$$E(\xi_1 \xi_2) = \iint xy\,\rho_1(x)\,\rho_2(y)\,dx\,dy = \int x\,\rho_1(x)\,dx \int y\,\rho_2(y)\,dy = E\xi_1 \cdot E\xi_2$$

As you can see, the proof is carried out for random variables that have a continuous spectrum of values and are given by their probability densities. In other cases, the idea of the proof is similar.

Kalman filter

Formulation of the problem

Let's denote by $x$ the value that we will measure and then filter. This can be a coordinate, a speed, an acceleration, humidity, an odor level, temperature, pressure, etc.
Let's start with a simple example that will lead us to the formulation of the general problem. Imagine that we have a radio-controlled car that can only go back and forth. Knowing the weight of the car, its shape, the road surface, etc., we have calculated how the controlling joystick affects the speed of motion $v_k$.

Then the coordinate of the car will change according to the law:

$$x_{k+1} = x_k + v_k\,dt$$

In real life, we cannot take into account in our calculations the small perturbations acting on the car (wind, bumps, pebbles on the road), so the real speed of the car will differ from the calculated one. A random variable $\xi_k$ is added to the right-hand side of the written equation:

$$x_{k+1} = x_k + v_k\,dt + \xi_k$$

We have a GPS sensor installed on the car that tries to measure the true coordinate $x_k$ and, of course, cannot measure it exactly; it measures with an error $\eta_k$, which is also a random variable. As a result, we receive erroneous data from the sensor:

$$z_k = x_k + \eta_k$$

The task is, knowing the erroneous sensor readings $z_k$, to find a good approximation for the true coordinate of the car.
In the formulation of the general problem, anything can play the role of the coordinate $x$ (temperature, humidity...), and the term responsible for controlling the system from the outside we denote by $u_k$ (in the example with the car, $u_k = v_k\,dt$). The equations for the coordinate and the sensor readings will look like this:

$$x_{k+1} = x_k + u_k + \xi_k; \qquad z_{k+1} = x_{k+1} + \eta_{k+1} \qquad (1)$$

Let's discuss in detail what we know:

  • $u_k$ is a known quantity that describes the external control of the system;
  • the model error $\xi_k$ and the sensor error $\eta_k$ are random variables; their values at different time steps are independent of each other, their means are zero, and their variances $\sigma_\xi^2$ and $\sigma_\eta^2$ are known (the distributions themselves may be arbitrary, not necessarily Gaussian);
  • the sensor readings $z_k$ are known;
  • the true coordinate $x_k$ is unknown; it is exactly what we want to estimate.

It is worth noting that the filtering task is not a smoothing task. We are not trying to smooth the data from the sensor; we are trying to obtain the value closest to the real coordinate.

Kalman's algorithm

We will argue by induction. Imagine that at the $k$-th step we have already found the filtered value $x^{opt}_k$ from the sensor, which approximates the true coordinate of the system well. Do not forget that we know the equation that controls the change of the unknown coordinate:

$$x_{k+1} = x_k + u_k + \xi_k$$

Therefore, before receiving the value from the sensor, we may state that at step $k+1$ the system evolves according to this law, and the sensor will show something close to $x^{opt}_k + u_k$. Unfortunately, so far we cannot say anything more precise. On the other hand, at step $k+1$ we will have the inexact sensor reading $z_{k+1}$ on our hands.
Kalman's idea is as follows. To get the best approximation to the true coordinate, we must choose a middle ground between the reading $z_{k+1}$ of the inaccurate sensor and our prediction $x^{opt}_k + u_k$ of what we expected it to show. We give the weight $K$ to the sensor reading, and the weight $(1-K)$ remains on the predicted value:

$$x^{opt}_{k+1} = K \cdot z_{k+1} + (1-K) \cdot (x^{opt}_k + u_k)$$

The coefficient $K$ is called the Kalman coefficient. It depends on the iteration step, so it would be more correct to write $K_{k+1}$, but for now, in order not to clutter up the formulas, we will omit its index.
We must choose the Kalman coefficient so that the resulting optimal coordinate value $x^{opt}_{k+1}$ is closest to the true one $x_{k+1}$. For example, if we know that our sensor is very accurate, then we will trust its reading more and give it more weight ($K$ close to one). If the sensor, on the contrary, is completely inaccurate, then we will rely more on the theoretically predicted value.
In general, to find the exact value of the Kalman coefficient, we just need to minimize the error:

$$e_{k+1} = x_{k+1} - x^{opt}_{k+1}$$

We use equations (1) to rewrite the expression for the error:

$$e_{k+1} = (1-K)(e_k + \xi_k) - K\,\eta_{k+1}$$

Proof

$$e_{k+1} = x_{k+1} - x^{opt}_{k+1} = x_{k+1} - K z_{k+1} - (1-K)(x^{opt}_k + u_k) =$$
$$= (1-K)(x_k + u_k + \xi_k - x^{opt}_k - u_k) - K\eta_{k+1} = (1-K)(e_k + \xi_k) - K\eta_{k+1}$$

(here we substituted $z_{k+1} = x_{k+1} + \eta_{k+1}$ and $x_{k+1} = x_k + u_k + \xi_k$).

Now is the time to discuss what the expression "minimize the error" means. After all, the error, as we can see, is itself a random variable and takes different values each time. In fact, there is no one-size-fits-all approach to defining what it means for the error to be minimal. Just as in the case of the variance of a random variable, when we tried to estimate the characteristic width of its spread, here we will choose the simplest criterion for calculations. We will minimize the mean of the squared error:

$$E(e_{k+1}^2) \rightarrow \min$$

Let's write out the last expression:

$$E(e_{k+1}^2) = (1-K)^2 \left( E(e_k^2) + \sigma_\xi^2 \right) + K^2 \sigma_\eta^2$$

Proof

From the fact that all the random variables entering the expression for $e_{k+1}$ are independent, it follows that all the "cross" terms are equal to zero:

$$E(e_k \xi_k) = E(e_k \eta_{k+1}) = E(\xi_k \eta_{k+1}) = 0$$

We also used the fact that $E\xi_k = E\eta_{k+1} = 0$; for random variables with zero mean the variance formula looks much simpler: $\sigma_\xi^2 = E(\xi_k^2)$ and $\sigma_\eta^2 = E(\eta_{k+1}^2)$.

This expression takes its minimum value when its derivative with respect to $K$ equals zero:

$$K_{k+1} = \frac{E(e_k^2) + \sigma_\xi^2}{E(e_k^2) + \sigma_\xi^2 + \sigma_\eta^2}$$

Here we already write the Kalman coefficient with its step index, emphasizing that it depends on the iteration step.
We substitute the obtained optimal value $K_{k+1}$ into the expression for $E(e_{k+1}^2)$, which we minimized, and obtain:

$$E(e_{k+1}^2) = \frac{\sigma_\eta^2 \left( E(e_k^2) + \sigma_\xi^2 \right)}{E(e_k^2) + \sigma_\xi^2 + \sigma_\eta^2}$$

Our task has been accomplished: we have obtained an iterative formula for calculating the Kalman coefficient.
Let's summarize our acquired knowledge in one frame:

The model of the system and the sensor:

$$x_{k+1} = x_k + u_k + \xi_k; \qquad z_{k+1} = x_{k+1} + \eta_{k+1}$$

The filter iteration:

$$x^{opt}_{k+1} = K_{k+1}\,z_{k+1} + (1 - K_{k+1})(x^{opt}_k + u_k)$$

where the mean squared error and the Kalman coefficient are updated as

$$E(e_{k+1}^2) = \frac{\sigma_\eta^2 \left( E(e_k^2) + \sigma_\xi^2 \right)}{E(e_k^2) + \sigma_\xi^2 + \sigma_\eta^2}, \qquad K_{k+1} = \frac{E(e_{k+1}^2)}{\sigma_\eta^2}$$

Example

Matlab code

clear all;
N = 100;          % number of samples
a = 0.1;          % acceleration
sigmaPsi = 1;     % model noise standard deviation
sigmaEta = 50;    % sensor noise standard deviation
k = 1:N;
x = k;            % preallocate the true trajectory
x(1) = 0;
z(1) = x(1) + normrnd(0, sigmaEta);
for t = 1:(N-1)
    x(t+1) = x(t) + a*t + normrnd(0, sigmaPsi);
    z(t+1) = x(t+1) + normrnd(0, sigmaEta);
end;
% Kalman filter
xOpt(1) = z(1);
eOpt(1) = sigmaEta;   % eOpt(t) is the error standard deviation, sqrt(E(e_t^2))
for t = 1:(N-1)
    eOpt(t+1) = sqrt((sigmaEta^2)*(eOpt(t)^2 + sigmaPsi^2) / (sigmaEta^2 + eOpt(t)^2 + sigmaPsi^2));
    K(t+1) = (eOpt(t+1))^2 / sigmaEta^2;
    xOpt(t+1) = (xOpt(t) + a*t)*(1 - K(t+1)) + K(t+1)*z(t+1);
end;
plot(k, xOpt, k, z, k, x);

Analysis

If we trace how the Kalman coefficient $K_k$ changes from iteration to iteration, we can show that it always stabilizes to a certain value $K_{stab}$. For example, when the rms errors of the sensor and the model relate to each other as ten to one, the plot of the Kalman coefficient versus the iteration step looks like this:
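For reference, this stabilized value can be computed in closed form: setting $E(e_{k+1}^2) = E(e_k^2) = E(e_{stab}^2)$ in the update formula and solving the resulting quadratic equation gives (a short calculation, assuming the variances do not change between steps):

$$E(e_{stab}^2) = \frac{\sigma_\xi^2}{2}\left( \sqrt{1 + \frac{4\sigma_\eta^2}{\sigma_\xi^2}} - 1 \right), \qquad K_{stab} = \frac{E(e_{stab}^2)}{\sigma_\eta^2}$$

For the ten-to-one ratio mentioned above ($\sigma_\eta = 10$, $\sigma_\xi = 1$) this gives $K_{stab} \approx 0.1$.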

In the next example, we will discuss how this can make our life much easier.

Second example

In practice, it often happens that we know nothing at all about the physical model of what we are filtering. For example, you want to filter the readings from your favorite accelerometer, but you do not know in advance by what law you are going to turn it. The most information you can grab is the variance of the sensor error $\sigma_\eta^2$. In such a difficult situation, all ignorance of the motion model can be driven into the random variable $\xi_k$:

$$x_{k+1} = x_k + \xi_k$$

But, frankly speaking, such a system no longer satisfies the conditions that we imposed on the random variable $\xi_k$, because all the physics of motion unknown to us is now hidden there, and therefore we cannot say that at different moments of time the model errors are independent of each other and that their means are zero. In this case, by and large, the Kalman filter theory is not applicable. But we will not pay attention to this fact and will bluntly apply all the colossal formulas, choosing the coefficients by eye so that the filtered data looks nice.
But you can take a different, much simpler path. As we saw above, the Kalman coefficient always stabilizes to the value $K_{stab}$ as $k$ increases. Therefore, instead of choosing the variances $\sigma_\xi^2$ and $\sigma_\eta^2$ and finding the Kalman coefficient by the complex formulas, we can consider this coefficient always constant and select only this one constant. This assumption spoils almost nothing. First, we are already using Kalman's theory illegally, and second, the Kalman coefficient quickly stabilizes to a constant anyway. As a result, everything is greatly simplified. We do not need any formulas from Kalman's theory at all; we just need to find an acceptable value $K_{stab}$ and insert it into the iterative formula:

$$x^{opt}_{k+1} = K_{stab}\,z_{k+1} + (1 - K_{stab})\,x^{opt}_k$$
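As a sketch (in Python; the constant 0.1 here is arbitrary and would be tuned by eye for real data), the whole "filter" collapses to a few lines:

def simple_kalman(measurements, K=0.1):
    # Constant-coefficient version: each new estimate is a weighted mean
    # of the new sensor reading and the previous estimate.
    filtered = [measurements[0]]
    for z in measurements[1:]:
        filtered.append(K * z + (1 - K) * filtered[-1])
    return filtered

Readers familiar with signal processing will recognize this as simple exponential smoothing (a first-order low-pass filter): that is all that remains of the Kalman machinery in this degenerate case.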

The following graph shows data from an imaginary sensor, filtered in two different ways (under the assumption that we know nothing about the physics of the phenomenon). The first way is the honest one, with all the formulas from Kalman's theory. The second is the simplified one, without the formulas.

As we can see, the methods are almost the same. A small difference is observed only at the beginning, when the Kalman coefficient has not yet stabilized.

Discussion

As we have seen, the main idea of the Kalman filter is to find a coefficient $K$ such that the filtered value

$$x^{opt}_{k+1} = K\,z_{k+1} + (1-K)(x^{opt}_k + u_k)$$

on average differs least from the real value of the coordinate $x_{k+1}$. We can see that the filtered value $x^{opt}_{k+1}$ is a linear function of the sensor reading $z_{k+1}$ and the previous filtered value $x^{opt}_k$. And the previous filtered value is, in turn, a linear function of the sensor reading $z_k$ and the filtered value before it. And so on, until the chain fully unfolds. That is, the filtered value depends linearly on all the previous sensor readings:

$$x^{opt}_{k+1} = \lambda_0 + \lambda_1 z_1 + \dots + \lambda_{k+1} z_{k+1}$$

(the constant term $\lambda_0$ absorbs the contribution of the known controls $u_k$).
Therefore, the Kalman filter is called a linear filter.
It can be proven that the Kalman filter is the best of all linear filters. Best in the sense that the mean square of the filter error is minimal.

Multidimensional case

The whole theory of the Kalman filter can be generalized to the multidimensional case. The formulas there look a little more scary, but the very idea of ​​their derivation is the same as in the one-dimensional case. You can see them in this excellent article: http://habrahabr.ru/post/140274/.
And in this wonderful video an example of how to use them is analyzed.

Wiener filters are best suited to processing processes, or segments of processes, as a whole (block processing). Sequential processing, however, requires a current estimate of the signal at each clock cycle, taking into account the information entering the filter input during the observation process.

With Wiener filtering, each new signal sample would require recalculating all the filter weights. Nowadays adaptive filters are widespread, in which newly arriving information is used to continuously correct the previously made estimate of the signal (target tracking in radar, automatic control systems, etc.). Of particular interest are adaptive filters of the recursive type, known as Kalman filters.

These filters are widely used in control loops of automatic regulation and control systems. It is from there that they came, as evidenced by such specific terminology, used to describe their operation, as the state space.

One of the main problems to be solved in the practice of neural computing is obtaining fast and reliable algorithms for training neural networks. In this regard, it may be useful to use linear filters in the feedback loop. Since training algorithms are iterative in nature, such a filter must be a sequential recursive estimator.

Parameter estimation problem

One of the problems of the theory of statistical decisions that is of great practical importance is the problem of estimating the state vectors and parameters of systems, which is formulated as follows. Suppose it is necessary to estimate the value of a vector parameter $X$ that is inaccessible to direct measurement. Instead, another parameter $Z$ is measured, which depends on $X$. The estimation problem is to answer the question: what can be said about $X$, knowing $Z$? In the general case, the procedure for the optimal estimation of the vector $X$ depends on the accepted criterion of estimation quality.

For example, the Bayesian approach to the parameter estimation problem requires complete a priori information about the probabilistic properties of the estimated parameter, which is often unavailable. In these cases, one resorts to the method of least squares (OLS), which requires much less a priori information.

Let us consider the application of OLS for the case when the observation vector $Z$ is related to the parameter vector $X$ by a linear model, and the observation contains noise $V$ uncorrelated with the estimated parameter:

$$Z = HX + V \qquad (1)$$

where $H$ is a transformation matrix describing the relationship between the observed quantities and the estimated parameters.

The estimate $X_{est}$ minimizing the mean square of the error is written as follows:

$$X_{est} = (H^T R_V^{-1} H)^{-1} H^T R_V^{-1} Z \qquad (2)$$

where $R_V$ is the covariance matrix of the noise $V$. Let the noise $V$ be uncorrelated; in this case the matrix $R_V$ is simply the identity matrix, and the estimation equation becomes simpler:

$$X_{est} = (H^T H)^{-1} H^T Z \qquad (3)$$

Writing in matrix form greatly saves paper, but it may be unfamiliar to some. The following example, taken from the monograph by Yu. M. Korshunov, "Mathematical Foundations of Cybernetics", illustrates all of this.
The following electrical circuit is available:

The observed values in this case are the readings of the instruments: $A_1 = 1$ A, $A_2 = 2$ A, $V = 20$ V.

In addition, the resistance $R = 5$ Ohm is known. It is required to estimate, in the best way from the point of view of the minimum mean squared error criterion, the values of the currents $I_1$ and $I_2$. The most important thing here is that there is some relationship between the observed values (the instrument readings) and the estimated parameters. And this information is brought in from the outside.

In this case, it is Kirchhoff's laws; in the case of filtering (which will be discussed further on), it is an autoregressive model of the time series, which assumes that the current value depends on the previous ones.

So, knowledge of Kirchhoff's laws, which has nothing to do with the theory of statistical decisions, makes it possible to establish a connection between the observed values and the estimated parameters (those who studied electrical engineering can check; the rest will have to take it on faith):

$$z_1 = A_1 = I_1 + \xi_1 = 1$$

$$z_2 = A_2 = I_1 + I_2 + \xi_2 = 2$$

$$z_3 = V/R = I_1 + 2 I_2 + \xi_3 = 4$$

In vector form:

$$\begin{vmatrix} z_1 \\ z_2 \\ z_3 \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{vmatrix} \begin{vmatrix} I_1 \\ I_2 \end{vmatrix} + \begin{vmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{vmatrix}$$

Or $Z = HX + V$, where

$$Z = \begin{vmatrix} z_1 \\ z_2 \\ z_3 \end{vmatrix} = \begin{vmatrix} 1 \\ 2 \\ 4 \end{vmatrix}; \quad H = \begin{vmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{vmatrix}; \quad X = \begin{vmatrix} I_1 \\ I_2 \end{vmatrix}; \quad V = \begin{vmatrix} \xi_1 \\ \xi_2 \\ \xi_3 \end{vmatrix}$$

Considering the interference values uncorrelated with each other, we find the estimates of $I_1$ and $I_2$ by the method of least squares in accordance with formula (3):

$$H^T H = \begin{vmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{vmatrix} \begin{vmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{vmatrix} = \begin{vmatrix} 3 & 3 \\ 3 & 5 \end{vmatrix}; \quad (H^T H)^{-1} = \frac{1}{6} \begin{vmatrix} 5 & -3 \\ -3 & 3 \end{vmatrix}$$

$$H^T Z = \begin{vmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \end{vmatrix} \begin{vmatrix} 1 \\ 2 \\ 4 \end{vmatrix} = \begin{vmatrix} 7 \\ 10 \end{vmatrix}; \quad X_{est} = \frac{1}{6} \begin{vmatrix} 5 & -3 \\ -3 & 3 \end{vmatrix} \begin{vmatrix} 7 \\ 10 \end{vmatrix} = \frac{1}{6} \begin{vmatrix} 5 \\ 9 \end{vmatrix}$$

So $I_1 = 5/6 \approx 0.833$ A; $I_2 = 9/6 = 1.5$ A.
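The same arithmetic is easy to reproduce numerically; here is a minimal Python check of formula (3) with the numbers from this example (the code is illustrative, not from the monograph):

import numpy as np

H = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
Z = np.array([1.0, 2.0, 4.0])   # the A1, A2 and V/R readings

# Formula (3): X_est = (H^T H)^{-1} H^T Z
X_est = np.linalg.solve(H.T @ H, H.T @ Z)
print(X_est)   # [0.8333... 1.5]  ->  I1 = 5/6 A, I2 = 1.5 A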

The filtering task

In contrast to the problem of estimating parameters, which have fixed values, in the filtering problem we need to estimate processes, that is, to find current estimates of a time-varying signal that is distorted by interference and therefore inaccessible to direct measurement. In general, the type of filtering algorithm depends on the statistical properties of the signal and the noise.

We will assume that the useful signal is a slowly varying function of time and the interference is uncorrelated noise. We will use the method of least squares, again because of the lack of a priori information about the probabilistic characteristics of the signal and the interference.

First, let us obtain an estimate of the current value $x_n$ based on the $k$ latest values of the time series $z_n, z_{n-1}, z_{n-2}, \dots, z_{n-(k-1)}$. The observation model is the same as in the parameter estimation problem:

$$Z = HX + V$$

It is clear that $Z$ is a column vector consisting of the observed values of the time series $z_n, z_{n-1}, z_{n-2}, \dots, z_{n-(k-1)}$, and $V$ is a column vector of the interference $\xi_n, \xi_{n-1}, \xi_{n-2}, \dots, \xi_{n-(k-1)}$ distorting the true signal. But what do the symbols $H$ and $X$ mean here? What column vector $X$, for example, can we talk about, if all that is needed is an estimate of the current value of the time series? And what is meant by the transformation matrix $H$ is not clear at all.

All these questions can be answered only if the concept of a signal generation model is introduced. That is, some model of the original signal is needed. This is understandable: in the absence of a priori information about the probabilistic characteristics of the signal and the interference, all that remains is to make assumptions. You can call it fortune-telling on coffee grounds, but experts prefer a different terminology; in their jargon it is called a parametric model.

In this case, the parameters of this particular model are estimated. When choosing a suitable signal generation model, remember that any analytic function can be expanded in a Taylor series. An amazing property of the Taylor series is that the form of the function at any finite distance $t$ from some point $x = a$ is uniquely determined by the behavior of the function in an infinitesimally small neighborhood of the point $x = a$ (we are talking about its derivatives of the first and higher orders).

Thus, the existence of Taylor series means that an analytic function has an internal structure with a very strong coupling. If, for example, we restrict ourselves to three terms of the Taylor series, the signal generation model will look like this:

$$X_{n-i} = F_{-i} X_n \qquad (4)$$

$$X_n = \begin{vmatrix} x_n \\ x'_n \\ x''_n \end{vmatrix}; \quad F_{-i} = \begin{vmatrix} 1 & -i & i^2/2 \\ 0 & 1 & -i \\ 0 & 0 & 1 \end{vmatrix}$$

That is, formula (4), for a given order of the polynomial (in our example it is 2), establishes a connection between the $n$-th value of the signal in the time sequence and the $(n-i)$-th. Thus, the estimated state vector in this case includes, in addition to the estimated value itself, the first and second derivatives of the signal.

In automatic control theory, such a filter would be called a filter with second-order astatism. The transformation matrix $H$ for this case (the estimation is carried out on the basis of the current and the $k-1$ previous samples) looks like this:

$$H = \begin{vmatrix} 1 & -(k-1) & (k-1)^2/2 \\ \dots & \dots & \dots \\ 1 & -2 & 2 \\ 1 & -1 & 0.5 \\ 1 & 0 & 0 \end{vmatrix}$$

All these numbers are obtained from the Taylor series under the assumption that the time interval between adjacent observed values is constant and equal to 1.

So, under our assumptions, the filtering task has been reduced to the task of estimating parameters; in this case, the parameters of the adopted signal generation model are estimated. And the estimate of the state vector $X$ is carried out by the same formula (3):

$$X_{est} = (H^T H)^{-1} H^T Z$$

Essentially, we have implemented a parametric estimation process based on an autoregressive model of the signal generation process.

Formula (3) is easy to implement in software: for this, you need to fill in the matrix $H$ and the column vector of observations $Z$. Such filters are called finite-memory filters, since they use the last $k$ observations to obtain the current estimate $X_{n\,\text{est}}$. At each new observation cycle, a new observation is added to the current set of observations and the oldest one is discarded. This process of obtaining estimates is called a sliding window.
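As an illustration, here is a minimal Python sketch of such a finite-memory filter (my own, under the assumptions above: unit sampling interval, second-order model; the synthetic signal is arbitrary):

import numpy as np

def sliding_window_estimate(z_window):
    # Estimate [x_n, x'_n, x''_n] from the last k observations,
    # ordered oldest first, by least squares (formula (3)).
    k = len(z_window)
    lags = np.arange(k - 1, -1, -1)   # lag of each observation: k-1 ... 0
    H = np.column_stack([np.ones(k), -lags, lags**2 / 2.0])
    X_est, *_ = np.linalg.lstsq(H, np.asarray(z_window, float), rcond=None)
    return X_est

rng = np.random.default_rng(0)
t = np.arange(50)
z = 0.05 * t**2 + rng.normal(0.0, 1.0, t.size)   # noisy parabola
print(sliding_window_estimate(z[-10:]))   # ~ [x_49, x'_49, x''_49]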

Growing memory filters

Filters with finite memory have one main disadvantage: after each new observation, a complete recalculation over all the data stored in memory must be performed anew. In addition, the calculation of estimates can be started only after the results of the first $k$ observations have been accumulated. That is, these filters have a long transient response.

To combat this drawback, one has to switch from a filter with finite memory to a filter with growing memory. In such a filter, the number of observed values used for the estimate must coincide with the number $n$ of the current observation. This makes it possible to obtain estimates starting from a number of observations equal to the number of components of the estimated vector $X$, which is determined by the order of the adopted model, that is, by how many terms of the Taylor series are used in the model.

At the same time, with increasing n, the smoothing properties of the filter improve, that is, the accuracy of the estimates increases. However, the direct implementation of this approach is associated with an increase in computational costs. Therefore, growing memory filters are implemented as recurrent.

The point is that by the time $n$ we already have the estimate $X_{(n-1)\,\text{est}}$, which contains information about all the previous observations $z_{n-1}, z_{n-2}, \dots, z_1$. The estimate $X_{n\,\text{est}}$ is obtained from the next observation $z_n$ using the information stored in the estimate $X_{(n-1)\,\text{est}}$. This procedure is called recurrent filtering and consists of the following:

  • from the estimate $X_{(n-1)\,\text{est}}$, predict the estimate $X_n$ by formula (4) with $i = 1$: $X_{n\,\text{apriori}} = F_1 X_{(n-1)\,\text{est}}$. This is the a priori estimate;
  • according to the results of the current observation $z_n$, this a priori estimate is converted into a true, that is, a posteriori one;
  • this procedure is repeated at every step, starting from $r + 1$, where $r$ is the filter order.

The final formula of recurrent filtering looks like this:

$$X_{n\,\text{est}} = X_{n\,\text{apriori}} + (H_n^T H_n)^{-1} h_0^T \left( z_n - h_0 X_{n\,\text{apriori}} \right) \qquad (6)$$

where for our second-order filter

$$h_0 = \begin{vmatrix} 1 & 0 & 0 \end{vmatrix}$$

is the row of the matrix $H_n$ corresponding to the current observation, and $H_n$ is the observation matrix grown to $n$ rows.

A growing memory filter based on Formula 6 is a special case of a filtering algorithm known as the Kalman filter.

In the practical implementation of this formula, it must be remembered that the a priori estimate entering it is determined by formula (4), and the value $h_0 X_{n\,\text{apriori}}$ is the first component of the vector $X_{n\,\text{apriori}}$.

The growing-memory filter has one important feature. If you look at formula (6), the final estimate is the sum of the predicted estimate vector and a correcting term. This correction is large for small $n$ and decreases as $n$ grows, tending to zero as $n \rightarrow \infty$. That is, as $n$ grows, the smoothing properties of the filter grow and the model embedded in it begins to dominate. But a real signal can follow the model only over limited stretches, so the accuracy of the forecast deteriorates.

To combat this, starting from some $n$, a prohibition is imposed on further decreasing the correction term. This is equivalent to changing the filter band: for small $n$ the filter is wider (less inertial), for large $n$ it becomes more inertial.
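The mechanics are easiest to see in the zeroth-order case, where the model is "the signal is a constant" and the a priori estimate is simply the previous one. A small Python sketch (the floor value 0.05 is an arbitrary illustration, not from the text):

def growing_memory_filter(measurements, gain_floor=0.05):
    # Zeroth-order growing-memory filter: without the floor, the gain 1/n
    # makes this an ordinary running mean; the floor keeps the filter
    # from becoming infinitely inertial.
    estimates = [measurements[0]]
    for n, z in enumerate(measurements[1:], start=2):
        gain = max(1.0 / n, gain_floor)
        estimates.append(estimates[-1] + gain * (z - estimates[-1]))
    return estimates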

Compare Figure 1 and Figure 2. In the first figure, the filter has a large memory: it smooths well, but because of the narrow band the estimated trajectory lags behind the real one. In the second figure, the filter memory is smaller: it smooths worse, but follows the real trajectory better.

Literature

  1. Yu. M. Korshunov, Mathematical Foundations of Cybernetics.
  2. A. V. Balakrishnan, Kalman Filtering Theory.
  3. V. N. Fomin, Recurrent Estimation and Adaptive Filtering.
  4. C. F. N. Cowan, P. M. Grant, Adaptive Filters.

Random Forest is one of my favorite data mining algorithms. Firstly, it is incredibly versatile: it can be used for both regression and classification problems, as well as for anomaly detection and predictor selection. Secondly, it is an algorithm that is really difficult to apply incorrectly, simply because, unlike other algorithms, it has few tunable parameters. It is also surprisingly simple at its core. And at the same time, it is remarkably accurate.

What is the idea behind such a wonderful algorithm? The idea is simple: suppose we have some very weak learning algorithm, say, a decision tree. If we build many different models using this weak algorithm and average the results of their predictions, the final result will be much better. This is so-called ensemble learning in action. This is why the algorithm is called "Random Forest": from the data it receives, it creates many decision trees and then averages the results of their predictions. An important point here is the element of randomness in the creation of each tree. After all, it is clear that if we create many identical trees, the result of averaging them will have the accuracy of a single tree.

How does it work? Suppose we have some input data. Each column corresponds to some parameter, each row corresponds to some data element.

We can randomly select a certain number of columns and rows from the entire dataset and build a decision tree based on them.
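A hedged sketch of the idea using scikit-learn's implementation (the dataset here is synthetic and purely illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each tree is grown on a bootstrap sample of the rows and considers a
# random subset of the features at every split - the element of
# randomness described above.
model = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                               random_state=0)
model.fit(X, y)
print(model.score(X, y))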



Thursday, January 12, 2012


That's all. The 17-hour flight is over, and Russia is left across the ocean. And through the window of a cozy two-bedroom apartment, San Francisco, the famous Silicon Valley, California, USA, is looking at us. Yes, this is the very reason why I have hardly written anything lately. We moved.

It all started back in April 2011, when I had a phone interview with Zynga. Then it all seemed like some kind of game that had nothing to do with reality, and I could not even imagine what it would lead to. In June 2011, Zynga came to Moscow and conducted a series of interviews; about 60 candidates who had passed the phone interviews were considered, and about 15 of them were selected (I don't know the exact number: someone later changed their mind, someone immediately refused). The interviews turned out to be surprisingly simple. No programming tasks, no tricky questions about the shape of manhole covers; mostly the ability to talk was tested. Knowledge, in my opinion, was assessed only superficially.

And then the red tape began. First we waited for the results, then for the offer, then for the LCA approval, then for the approval of the visa petition, then for the documents from the USA, then in the queue at the embassy, then for an additional check, then for the visa. At times it seemed to me that I was ready to drop the whole thing. At times I doubted whether we needed this America at all; after all, it is not bad in Russia either. The whole process took about six months; as a result, in mid-December we received the visas and began to prepare for departure.

Monday was my first day at work. The office has all the conditions not only for working, but also for living: breakfasts, lunches and dinners from the company's own chefs, piles of assorted food everywhere, a gym, massage, and even a hairdresser. All of this is completely free for employees. Many people get to work by bike, and there are several rooms for storing vehicles. In general, I have never come across anything like this in Russia. Everything, however, has its price: we were warned right away that we would have to work a lot. What "a lot" means by their standards is not very clear to me.

Hopefully, despite the amount of work, I will be able to resume blogging in the foreseeable future and maybe tell you something about American life and the work of a programmer in America. Wait and see. In the meantime, I congratulate everyone on the coming New Year and Christmas, and see you soon!


As an example of its use, let's compute the dividend yield of Russian companies. As the base price, we take the closing price of a share on the day the shareholder register is closed. For some reason, this information is not on the Troika site, although it is much more interesting than the absolute values of the dividends.
Attention! The code takes a long time to execute, because for each stock a request has to be made to the Finam servers to get its price.

result <- NULL
for (i in (1:length(divs[, 1]))) {
  d <- divs[i, ]
  if (d$Divs > 0) {
    try({
      quotes <- getSymbols(d$Symbol, src = "Finam", from = "2010-01-01", auto.assign = FALSE)
      if (!is.nan(quotes)) {
        price <- Cl(quotes[d$RegistryDate])   # closing price on the registry date
        if (length(price) > 0) {
          dd <- d$Divs
          result <- rbind(result, data.frame(d$Symbol, d$Name, d$RegistryDate,
                                             as.numeric(dd) / as.numeric(price),
                                             stringsAsFactors = FALSE))
        }
      }
    }, silent = TRUE)
  }
}
colnames(result) <- c("Symbol", "Name", "RegistryDate", "Divs")
result


Similarly, you can build statistics for past years.

The Kalman filter is probably the most popular filtering algorithm used in many fields of science and technology. Due to its simplicity and efficiency, it can be found in GPS receivers, processors of sensor readings, in the implementation of control systems, etc.

There are a lot of articles and books about the Kalman filter on the Internet (mostly in English), but these articles have a fairly high entry threshold and many vague places, although in fact it is a very clear and transparent algorithm. I will try to tell you about it in simple language, with a gradual increase in complexity.

What is it for?

Any measuring device has some error. It can be influenced by a large number of external and internal factors, which leads to noisy readings. The noisier the data, the harder it is to process such information.

A filter is a data processing algorithm that removes noise and unnecessary information. The Kalman filter makes it possible to supply a priori information about the nature of the system and the relationships between the variables and, on the basis of this, to build a more accurate estimate, but even in the simplest case (without a priori information) it gives excellent results.

Let's consider the simplest example. Suppose we need to monitor the fuel level in a tank. To do this, a capacitive sensor is installed in the tank; it is very easy to maintain, but it has some drawbacks, for example, dependence on the fuel being loaded (the dielectric constant of the fuel depends on many factors, such as temperature) and a large influence of the sloshing in the tank. As a result, the information from it represents a typical "saw" with a decent amplitude. Sensors of this type are often installed on heavy mining equipment (do not be confused by the volume of the tank):

Kalman filter

Let's digress a little and get acquainted with the algorithm itself. The Kalman filter uses a dynamic model of the system (for example, the physical law of motion), known control actions, and a set of successive measurements to form an optimal state estimate. The algorithm consists of two repeating phases: prediction and correction. At the first, the prediction of the state at the next moment in time is calculated (taking into account the inaccuracy of the model). At the second, the new information from the sensor corrects the predicted value (also taking into account the inaccuracy and noisiness of this information):

The equations are presented in matrix form. If you do not know linear algebra, do not worry: there will be a simplified version without matrices for the case of one variable. In the case of one variable, the matrices degenerate into scalar values.

Let us first understand the notation. The subscript denotes the moment in time: $k$ is the current one, $(k-1)$ is the previous one; the minus sign in the superscript means that it is a predicted, intermediate value.

Descriptions of the variables are given below, where the algorithm is applied to a real example.

One could describe at length and tediously what all these mysterious transition matrices mean, but it is better, in my opinion, to try to apply the algorithm to a real example, so that the abstract values acquire real meaning.

Let's try it in action

Let's return to the example with the fuel level sensor. Since the state of the system is represented by one variable (the volume of fuel in the tank), the matrices degenerate into ordinary equations:
Defining the process model
To apply the filter, it is necessary to determine the matrices/values of the variables F, B and H that determine the dynamics of the system and the measurements:

F is a variable describing the dynamics of the system. In the case of fuel, this could be a coefficient that determines the fuel consumption at idle during the sampling time (the time between algorithm steps). However, besides fuel consumption, there are also refuelings... so for simplicity we take this variable equal to 1 (that is, we indicate that the predicted value will equal the previous state).

B is a variable determining the application of the control action. If we had additional information about the engine speed or the degree of depression of the accelerator pedal, this parameter would determine how the fuel consumption changes during the sampling time. Since there are no control actions in our model (there is no information about them), we take B = 0.

H is a matrix that determines the relationship between the measurements and the state of the system. For now, without explanation, we take this variable also equal to 1.

Defining the smoothing properties
R, the measurement noise, can be determined by testing the measuring instruments and determining the error of their measurements.

Q, the process noise, is more difficult to determine, since one needs to know the variance of the process, which is not always possible. In any case, this parameter can be tuned to provide the required level of filtering.

Implementing in code
To dispel the remaining vagueness, let's implement a simplified algorithm in C# (without matrices and control actions):

class KalmanFilterSimple1D
{
    public double X0 { get; private set; } // predicted state
    public double P0 { get; private set; } // predicted covariance

    public double F { get; private set; } // factor relating the real value to the previous real value
    public double Q { get; private set; } // process noise covariance
    public double H { get; private set; } // factor relating the measured value to the real value
    public double R { get; private set; } // measurement noise covariance

    public double State { get; private set; }
    public double Covariance { get; private set; }

    public KalmanFilterSimple1D(double q, double r, double f = 1, double h = 1)
    {
        Q = q;
        R = r;
        F = f;
        H = h;
    }

    public void SetState(double state, double covariance)
    {
        State = state;
        Covariance = covariance;
    }

    public void Correct(double data)
    {
        // time update - prediction
        X0 = F * State;
        P0 = F * Covariance * F + Q;

        // measurement update - correction
        var K = H * P0 / (H * P0 * H + R);
        State = X0 + K * (data - H * X0);
        Covariance = (1 - K * H) * P0;
    }
}

// Application...

var fuelData = GetData();
var filtered = new List<double>();

var kalman = new KalmanFilterSimple1D(f: 1, h: 1, q: 2, r: 15); // set F, H, Q and R
kalman.SetState(fuelData[0], 0.1); // set the initial values for State and Covariance
foreach (var d in fuelData)
{
    kalman.Correct(d); // apply the algorithm

    filtered.Add(kalman.State); // save the current state
}

The result of filtering with these parameters is shown in the figure (to adjust the degree of smoothing, you can change the Q and R parameters):

The most interesting part is left outside the scope of the article: applying the Kalman filter to several variables, setting the relationships between them, and automatically deriving values for unobservable variables. I will try to continue the topic as soon as there is time.

I hope the description turned out to be not too tedious and complicated; if you have any questions or clarifications, welcome to the comments)

In the automation of technological processes, when controlling mechanisms and units, one has to deal with measurements of various physical quantities. These can be the pressure or flow rate of a liquid or gas, speed, temperature, and much more. The measurement of physical quantities is carried out with analog sensors. An analog signal is a data signal in which each of the representing parameters is described by a function of time and a continuous set of possible values. It follows from the continuity of the value space that any interference introduced into the signal is indistinguishable from the useful signal. Therefore, an incorrect value of the required physical quantity may arrive at the analog input of the control device. Hence, the signal coming from the sensor must be filtered.

One of the efficient filtering algorithms is the Kalman filter. Kalman filter is a recursive filter that estimates the state vector of a dynamical system using a series of incomplete and noisy measurements. The Kalman filter uses a dynamic model of the system (for example, the physical law of motion), control actions, and many sequential measurements to form an optimal state estimate. The algorithm consists of two repeating phases: prediction and correction. At the first stage, the prediction of the state at the next moment in time is calculated (taking into account the inaccuracy of their measurement). On the second, the new information from the sensor corrects the predicted value (also taking into account the inaccuracy and noise of this information).

At the prediction stage, the following happens:

  1. Prediction of the system state:

$$\hat{x}_k^- = F \hat{x}_{k-1} + B u_{k-1}$$

where $\hat{x}_k^-$ is the prediction of the system state at the current moment in time; $F$ is the state transition matrix (the dynamic model of the system); $\hat{x}_{k-1}$ is the state estimate at the previous moment in time; $B$ is the control application matrix; $u_{k-1}$ is the control action at the previous moment in time.

  2. Prediction of the error covariance:

$$P_k^- = F P_{k-1} F^T + Q$$

where $P_k^-$ is the prediction of the error covariance; $P_{k-1}$ is the error covariance at the previous moment in time; $Q$ is the process noise covariance.

At the correction stage, the following occurs:

  1. Calculation of the Kalman gain:

$$K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}$$

where $K_k$ is the Kalman gain; $H$ is the measurement matrix showing the relationship between the measurements and the states; $R$ is the measurement noise covariance.

  2. Update of the state estimate using the current measurement:

$$\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-)$$

where $z_k$ is the measurement at the current moment in time.

  3. Update of the error covariance:

$$P_k = (I - K_k H) P_k^-$$

where $I$ is the identity matrix.

If the state of the system is described by a single variable, then $H = 1$, and the matrices degenerate into ordinary equations.
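In that case (and with the values used below: $F = 1$, $B = 0$, $H = 1$) the five equations reduce to the scalar form:

$$\hat{x}_k^- = \hat{x}_{k-1}; \qquad P_k^- = P_{k-1} + Q$$

$$K_k = \frac{P_k^-}{P_k^- + R}; \qquad \hat{x}_k = \hat{x}_k^- + K_k (z_k - \hat{x}_k^-); \qquad P_k = (1 - K_k) P_k^-$$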

To clearly demonstrate the effectiveness of the Kalman filter, an experiment was carried out with a KY-037 sound sensor connected to an Arduino Uno board. Figure 1 shows a graph of the sensor readings without a filter (line 1). The chaotic fluctuations in the sensor output indicate the presence of noise.

Figure 1. Graph of sensor readings without a filter

To apply the filter, it is necessary to define the values of the variables $F$, $B$ and $H$, which determine the dynamics of the system and the measurements. We take $F$ and $H$ equal to 1, and $B$ equal to 0, since there are no control actions in the system. To determine the smoothing properties of the filter, it is necessary to calculate the value of the variable $R$ and to select the value of the parameter $Q$.

The variable $R$ was calculated in Microsoft Excel 2010. To do this, the standard deviation of a sample of the sensor readings was computed: $R = 0.62$. $Q$ is selected depending on the required filtering level; we take $Q = 0.001$. In Figure 2, the second line shows the graph of the sensor readings with the filter applied.

Figure 2. Graph of sensor readings using Kalman filter

From the graph, we can conclude that the filter coped with the task of suppressing interference: in the steady state, the fluctuations of the filtered sensor readings are insignificant.

However, the Kalman filter has a significant drawback. If the measured value can change rapidly, the filtered sensor reading will not change as quickly as the measured value. Figure 3 shows the response of the Kalman filter to a jump in the measured value.

Figure 3. Reaction of the Kalman filter to a jump in the measured value

The response of the filter to a jump in the measured value turned out to be negligible. If the measured value changes significantly and then does not return to its previous value, the filtered sensor readings will correspond to the real value of the measured quantity only after a significant period of time, which is unacceptable for automatic control systems that require high speed.

From the experiment carried out, it can be concluded that it is advisable to use the Kalman filter for filtering sensor readings in systems that do not require high speed.

Bibliography:

  1. GOST 17657-79. Data Transmission. Terms and Definitions. Moscow: Publishing House of Standards, 2005. 2 p.
  2. Kalman filter // Wikipedia. Updated: 26.04.2017. URL: http://ru.wikipedia.org/?oldid=85061599 (accessed: 21.05.2017).