Basic algorithms for image processing. Image preprocessing algorithms

Digital noise is an image defect consisting of randomly located areas, close in size to a pixel, that differ from the original image in brightness or color. Noise reduction plays an important role in the transmission, processing and compression of video sequences and images.

Video noise can occur for several reasons:

1. Imperfect video capture equipment.

2. Poor shooting conditions - for example, night photo / video shooting, shooting in inclement weather.

3. Interference during transmission over analog channels - interference from sources of electromagnetic fields, intrinsic noise of active components (amplifiers) of the transmission line. An example would be a television signal.

4. Inaccuracies in filtering when extracting luminance and color-difference signals from an analog composite signal, etc.

The amount of noise in an image can range from an almost imperceptible speckle in a digital photograph taken in good light to astronomical images in which noise obscures much of the useful information, which can then be recovered only through laborious processing.

There are different types of noise, depending on the nature of the random distribution of the noise in the image. In practice, the following types are most common:

White Gaussian Noise

One of the most common types of noise is additive Gaussian noise, which consists in adding to each pixel of the image a value drawn from a normal distribution with zero mean. The term "additive" means that this type of noise is added to the useful signal. It occurs when signal reception is poor.

Digital noise

The cause of digital noise is most often related to the characteristics of the equipment used for shooting, usually insufficient light sensitivity of the sensor. This type of noise is characterized by the replacement of some of the pixels in the image by values of fixed or random intensity. If the brightness of the affected points is approximately equal, such noise is called "impulse" noise. If their intensity can vary from black to white, it is called salt-and-pepper noise.

Typically, this type of noise affects only a small number of pixels in an image.

Combined noise

Cases when an image is corrupted in equal measure by Gaussian noise and by random impulses are much less common. Such a mixture is called combined noise.

Image scan defects

Extraneous artifacts such as cracks, scratches and blotches may also appear on the image. These artifacts do not have a homogeneous structure, and determining their shape and location is largely beyond mathematical analysis. Defects of this kind can only be dealt with by manual image processing, so they are not considered in this work.

Noise Removal Algorithms

There are a large number of algorithms for removing noise from images, and they can be used not only by special processing programs, but also by some photo and video cameras. Despite this, there is still no universal filtering algorithm, since image processing invariably requires a trade-off between the degree of suppression of unwanted effects and the preservation of small details that have characteristics similar to noise. In addition, an algorithm that easily copes with one type of noise may only spoil an image affected by another type of noise.

Let's consider several of the most well-known image noise suppression algorithms.

Linear pixel averaging

The simplest idea for noise removal is to average the pixel values ​​in a spatial neighborhood. Since the noise changes independently from pixel to pixel, the noise of adjacent pixels will cancel each other out when added. A rectangular window is specified, which is superimposed on each pixel of the image in turn. The value of the central pixel is calculated based on the analysis of all neighboring pixels that fall within the window area. Accordingly, the larger the window taken, the more averaged value will be obtained in the end, which leads to a strong blurring effect.

In the simplest version, the analysis of neighboring pixels is to find their arithmetic mean. To reduce the influence of pixels that do not belong to the same area as the one under consideration (for example, a dark outline on a light background), you can enter a certain numerical threshold and take into account when calculating only those neighbors whose difference from the central pixel does not exceed this threshold. The higher the threshold value, the stronger the averaging will take place. The considered option can be complicated by introducing weight coefficients for each neighboring pixel depending on their distance to the center of the considered area.
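As an illustration, the sketch below implements such thresholded averaging in plain C++ for an 8-bit grayscale image stored row-major; the window radius r and the threshold are illustrative parameters rather than values fixed in this work.

#include <cstdlib>
#include <vector>

// Thresholded averaging: each pixel is replaced by the mean of those
// neighbours in a (2r+1)x(2r+1) window whose difference from the centre
// pixel does not exceed `threshold`.
std::vector<unsigned char> averageWithThreshold(
    const std::vector<unsigned char>& src, int width, int height,
    int r, int threshold)
{
    std::vector<unsigned char> dst(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int centre = src[y * width + x];
            int sum = 0, count = 0;
            for (int dy = -r; dy <= r; ++dy) {
                for (int dx = -r; dx <= r; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= width || ny < 0 || ny >= height)
                        continue;                    // ignore pixels outside the image
                    const int v = src[ny * width + nx];
                    if (std::abs(v - centre) <= threshold) {   // skip outliers
                        sum += v;
                        ++count;
                    }
                }
            }
            dst[y * width + x] = static_cast<unsigned char>(sum / count);
        }
    }
    return dst;
}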

This method can also be applied in the time domain, averaging each pixel over adjacent frames of the video stream (each pixel will be averaged over pixels located at the same position in adjacent frames).

This algorithm is very simple, but it does not give a good result and at the same time leads to strong blurring of image details.

Gaussian filter

It has a principle of operation similar to the previous method and also belongs to the number of smoothing filters. However, noise reduction using a linear averaging filter has a significant drawback: all neighbors of the processed pixel have the same effect on the result, regardless of their distance to it. The Gaussian filter also averages the central pixel and its neighbors in a certain area, only this happens according to a certain law, which is set by the Gaussian function.

G(x, y) = A·exp(−(x² + y²) / (2σ²)), where the parameter σ sets the amount of blur and the parameter A provides normalization. As a result, the central pixel of the considered area will have the highest weight, corresponding to the peak of the Gaussian distribution, and the weights of the remaining elements decrease as the distance from the center increases.

The matrix filter calculated according to this formula is called a Gaussian kernel; the larger its size, the stronger the blur (for a fixed σ). Since this filter is separable, it can be represented as G(x, y) = A·exp(−x² / (2σ²)) · exp(−y² / (2σ²)), that is, as the product of two one-dimensional Gaussian functions.

Hence it follows that convolution can be performed sequentially by rows and columns, which leads to a significant acceleration of the method for large filter sizes.
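A minimal usage sketch with the OpenCV library, whose functions are used later in this work; the file names, the 7 × 7 kernel size and σ = 1.5 are illustrative choices, not prescribed values.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");   // hypothetical input file
    cv::Mat dst;
    // 7x7 Gaussian kernel with sigma = 1.5 in both directions; OpenCV
    // applies it separably, first along the rows and then along the columns.
    cv::GaussianBlur(src, dst, cv::Size(7, 7), 1.5, 1.5);
    cv::imwrite("blurred.png", dst);
    return 0;
}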

2Dcleaner algorithm

Replaces each pixel in the image with the average of the neighboring pixels taken in an area bounded by a certain radius. In this case, not all points within the radius are considered, but only those whose value differs from the central pixel by no more than some predetermined value (threshold). This causes uniformly colored areas to be blurred more than sharp edges of objects. This reduces low-level noise in the image while keeping fine details intact.

Median filtering

Linear algorithms turn out to be very effective in suppressing Gaussian noise, when neighboring pixels, although they have a certain random spread of values, still remain within a certain average value characteristic of the region to which they belong. However, sometimes you have to deal with images distorted by other types of interference. An example of such interference is impulse noise, which manifests itself in the presence of randomly scattered points of random brightness in the image. Averaging in this case “smears” each such point into neighboring pixels, leading to a deterioration in the image quality.

The standard way to suppress impulse noise is median filtering. This non-linear image processing technique eliminates outliers, but, unlike linear averaging algorithms, leaves monotonic pixel sequences unchanged. Due to this, median filters are able to maintain without distortion the contours of objects and the differences between areas of different brightness, while effectively suppressing uncorrelated noise and small-sized details.

Filtering principle: a window of odd size is set, which is sequentially superimposed on each pixel of the image. Among all the pixels within the considered area, including the central one, the median value is found and assigned to the central pixel of the area. Here the median means the middle element of the sorted array of pixel values belonging to the region. The odd window size is chosen precisely to guarantee the existence of this middle pixel.

It is possible to use a median filter to suppress white Gaussian noise in the image. However, the study of noise suppression using median filtering shows that its effectiveness in solving this problem is lower than that of linear filtering.

Median filtering is not free of a disadvantage inherent in most noise-suppressing filters: increasing the size of the mask to improve the degree of noise suppression reduces the clarity of the image and blurs its contours. However, the negative effects can be minimized by applying median filtering with a dynamic mask size (adaptive median filtering). Its principle remains the same; only the size of the sliding filtering window changes depending on the brightness of neighboring pixels.
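A minimal sketch of median filtering through OpenCV's cv::medianBlur; the file names and the 5 × 5 window are illustrative.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat noisy = cv::imread("noisy.png");
    cv::Mat denoised;
    cv::medianBlur(noisy, denoised, 5);   // 5x5 window; the size must be odd so the median exists
    cv::imwrite("denoised.png", denoised);
    return 0;
}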

Sharpening an image

Almost all noise-suppression algorithms blur the image, as a result of which small details are lost and perception of the image becomes harder. A sharpening filter can partially compensate for this negative effect and restore the lost contour contrast and color transitions. Sharpness can also depend on many other factors: the quality of the lens, the aperture used, and the thickness of the anti-moiré filter found on the sensor of most digital cameras, which blurs the image to varying degrees. The sharpness of images also often needs to be increased after downscaling, since reduction inevitably discards some of the information and with it the clarity of contours.

Unsharp masking is a technique that improves the visual perception of an image by enhancing the contrast of tonal transitions, creating the illusion of increased sharpness. In fact, the sharpness remains at the same level, because lost image detail cannot, in principle, be restored; but the increased contrast between areas of different brightness makes the image look clearer.

Figure 5.1 - Illustration of the concept of "contour sharpness"

The sharpness of the image depends on the magnitude of the difference in brightness between the areas (W) that form its contours, and on the sharpness of the change in this difference (H).

The unsharp masking technique was first used for processing film photographs. The method adapted to digital image processing differs little from the original one: the so-called “unsharp mask” is subtracted from the image - its blurred and inverted copy. The result is a new image containing only the light outlines of the original. Dark outlines can be obtained by simply inverting the result.

If the dark outlines are then subtracted from the original image and the light ones added, a significant increase in contrast is obtained at each brightness transition.

You can use any of the noise canceling filters, such as the Gaussian filter, to blur the original in order to obtain a “unsharp mask”.

Figure 5.2 - Result of applying unsharp masking
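A minimal sketch of unsharp masking built on a Gaussian blur (OpenCV); the blur σ and the sharpening amount are illustrative parameters.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");
    cv::Mat blurred, sharpened;
    cv::GaussianBlur(src, blurred, cv::Size(0, 0), 2.0);   // the blurred copy ("unsharp mask")
    // sharpened = src + amount * (src - blurred); here amount = 0.7
    cv::addWeighted(src, 1.0 + 0.7, blurred, -0.7, 0, sharpened);
    cv::imwrite("sharpened.png", sharpened);
    return 0;
}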

The convolution operation is quite often used in image processing. In addition to sharpening, it is used for blurring, increasing brightness, lightening, etc.

Image convolution is the operation of calculating a new value of a given pixel, which takes into account the values ​​of the surrounding neighboring pixels. In a general sense, this term means some action that is performed on each part of the image.

The main element of the convolution is the convolution mask - it is a matrix (of arbitrary size and aspect ratio). This mask is often referred to as a filter, kernel, pattern, or window. The values ​​of the matrix elements are usually called coefficients.

Most often, a square matrix is ​​used as a convolution kernel.

The processing of the image by the convolution operation is as follows: The central element of the matrix, called the “anchor”, is sequentially superimposed on each pixel of the image. The new value of the considered pixel is calculated as the sum of the values ​​of neighboring pixels, multiplied by the corresponding coefficients of the convolution mask.

The resulting effect depends on the selected convolution kernel.

The kernel of a contrast-enhancing filter has a value greater than 1 at the point (0, 0), with the total sum of all values equal to 1. For example, contrast enhancement is performed by filters with kernels in which the central coefficient dominates its neighbors (one such kernel is shown in the sketch below).

The contrast-enhancement effect is achieved because the filter emphasizes the difference between the intensities of adjacent pixels, pushing these intensities away from each other. The effect is stronger the larger the value of the central term of the kernel.

Linear contrast-enhancing filtering based on convolution can cause visible color halos around the edges of objects in the image.
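A sketch of contrast-enhancing convolution, assuming the classic 3 × 3 kernel with a central coefficient of 5 and a total sum of 1; cv::filter2D performs the convolution.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");
    cv::Mat dst;
    // Central coefficient greater than 1, sum of all coefficients equal to 1.
    cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
         0, -1,  0,
        -1,  5, -1,
         0, -1,  0);
    cv::filter2D(src, dst, -1, kernel);   // -1 keeps the depth of the source image
    cv::imwrite("contrast.png", dst);
    return 0;
}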

Illumination Difference Compensation

Image illumination problems most often occur when windows, the sun or other unregulated light sources enter the frame.

This situation is called "excess light": because of the overly bright backlighting, the details and colors of objects located against very bright objects are lost and become difficult to distinguish.

The situation of lack of light is also common. It can be caused by shooting in dark rooms with poor lighting, as well as the limited range of sensitivity of video equipment.

Single Scale Retinex Algorithm

When you try to lighten an image by increasing the brightness of each pixel by a certain fixed value, the initially bright areas may be completely blown out.

In such cases, it is required to apply “smart” color correction, which would be able to equalize the lighting in the image, processing light areas to a lesser extent than dark ones.

These requirements are met by the Single Scale Retinex algorithm, based on the principles of retinal receptor design. The main goal of the algorithm is to split the image into components that are responsible for lighting and detail separately. Since the problems in the image are related to the lighting of the scene, then, having received a component responsible for lighting, it becomes possible to transform it separately from the image, thereby significantly increasing its quality.

Any image can be represented as a product of a high-frequency signal (reflection - R) and a low-frequency signal (illumination - I).

S (x, y) = I (x, y) * R (x, y) (5.6)


Figure 5.3 - Representation of the image in the Retinex algorithm.

An approximate image of the lighting can be obtained using low-pass filtering - in other words, simply blurring the original image, for example, with a Gaussian filter.

I(x, y) ≈ (G ⊗ S)(x, y), where G is a Gaussian filter and ⊗ denotes convolution.

Since taking the logarithm of a signal does not change its frequency content, and thanks to the property of the logarithm (the logarithm of a product is equal to the sum of the logarithms of the factors), the problem of separating a product of signals reduces to separating a sum of signals: log S(x, y) = log I(x, y) + log R(x, y), so that log R(x, y) = log S(x, y) − log I(x, y).

After that, it remains only to take an exponent from the received signal in order to return it to the original amplitude scale. The resulting high-frequency component can be added to the blurred and lightened original image, which acts as a new light model.

The effect obtained from equalizing the illumination may be too strong (the dark areas will become the same in brightness as the light ones). To reduce the effect, you can simply blend the processed image with the original in a certain proportion.
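A minimal sketch of Single Scale Retinex for a single-channel image (OpenCV); the σ of the illumination estimate and the final min-max normalization are illustrative, a color image would be processed per channel, and blending with the original (as mentioned above) is left out.

#include <opencv2/opencv.hpp>

// Single Scale Retinex sketch: R = log(S) - log(G * S), followed by a
// linear stretch back to 8 bits. `sigma` controls the illumination estimate.
cv::Mat singleScaleRetinex(const cv::Mat& src8u, double sigma)
{
    cv::Mat s, illumination, logS, logI, r, out;
    src8u.convertTo(s, CV_32F, 1.0, 1.0);                      // +1 avoids log(0)
    cv::GaussianBlur(s, illumination, cv::Size(0, 0), sigma);  // low-pass estimate of I
    cv::log(s, logS);
    cv::log(illumination, logI);
    r = logS - logI;                                           // reflectance in log space
    cv::normalize(r, r, 0, 255, cv::NORM_MINMAX);              // back to a displayable range
    r.convertTo(out, CV_8U);
    return out;
}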

Gamma correction

The original purpose of gamma correction is to compensate for differences in displayed colors on different output devices so that the image looks the same when viewed on different monitors. Due to the non-linear appearance of the applied power function, gamma correction also allows you to increase the contrast of dark areas of the image, without overexposing bright details and without losing the distinguishability of the edges of objects in the image.

Information about brightness, in analog form in television as well as in digital form in most common graphic formats, is stored on a non-linear scale. The brightness of a pixel on the monitor screen can be considered proportional to a power of the stored value:

I = V^γ,

where I is the brightness of the pixel on the display screen (or the brightness of the color components, red, green and blue separately),

V is the numerical value of the color from 0 to 1, and

γ is the gamma correction exponent.

If γ is less than 1, the level transfer characteristic will be convex and the resulting image will be lighter than the original. If γ is greater than 1, the level transfer characteristic will be concave and the resulting image will be darker than the original.

By default, the parameter γ is equal to 1, which corresponds to a linear level transfer characteristic and the absence of gamma correction.
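A minimal gamma-correction sketch via a 256-entry lookup table (OpenCV cv::LUT); the γ value passed by the caller is illustrative.

#include <cmath>
#include <opencv2/opencv.hpp>

// out = 255 * (in / 255)^gamma; gamma < 1 brightens the image, gamma > 1 darkens it.
cv::Mat gammaCorrect(const cv::Mat& src8u, double gamma)
{
    cv::Mat lut(1, 256, CV_8U);
    for (int i = 0; i < 256; ++i)
        lut.at<unsigned char>(i) =
            cv::saturate_cast<unsigned char>(255.0 * std::pow(i / 255.0, gamma));
    cv::Mat dst;
    cv::LUT(src8u, lut, dst);
    return dst;
}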

Selecting the contours of an image

Contour analysis can be used to describe, recognize, compare and search for graphical objects represented as outlines. Since the use of contours excludes the internal points of the object from consideration, this can significantly reduce the computational and algorithmic complexity of these operations.

Figure 5.4 - Change in the form of the power function depending on the parameter γ

An object outline is a list of points that represent a certain curve in the image that separates the object from the background. Most often, a jump in brightness or color is observed along the contour.

To simplify the search for contours in the image, you can pre-binarize it.

The Sobel filter selects the boundaries of objects based on their brightness. Since the color component is not taken into account, the images must first be converted to grayscale.

The Sobel filter is applied sequentially to each pixel, calculating the approximate value of its brightness gradient. The gradient for each point of the image (brightness function) is a two-dimensional vector whose components are the horizontal and vertical derivatives of the image brightness.

At each point of the image, the gradient vector is oriented in the direction of the greatest increase in brightness, and its length corresponds to the magnitude of the change in brightness. These data allow us to make an assumption about the probability of finding the point under consideration on the boundary of a certain object, as well as about the orientation of this boundary.

Thus, the result of the Sobel operator at a point in a region of constant brightness is a zero vector, while at a point lying on the boundary between regions of different brightness it is a vector pointing across the boundary in the direction of increasing brightness.

To calculate the approximate values ​​of the derivatives at each point in the image, the Sobel filter uses convolution with a 3 × 3 matrix.

Sobel matrix coefficients:

Gx = | -1  0  +1 |        Gy = | -1  -2  -1 |
     | -2  0  +2 |             |  0   0   0 |
     | -1  0  +1 |             | +1  +2  +1 |

The final value of the gradient is calculated by approximation by the formula:

|G| = |Gx| + |Gy|
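A minimal usage sketch of the Sobel operator in OpenCV; the |Gx| + |Gy| approximation above is formed with convertScaleAbs and addWeighted, and the 0.5 weights only keep the result inside the 8-bit range.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("input.png", 0);   // 0 = load as grayscale
    cv::Mat gx, gy, ax, ay, magnitude;
    cv::Sobel(gray, gx, CV_16S, 1, 0, 3);        // horizontal derivative, 3x3 kernel
    cv::Sobel(gray, gy, CV_16S, 0, 1, 3);        // vertical derivative
    cv::convertScaleAbs(gx, ax);                 // |Gx|
    cv::convertScaleAbs(gy, ay);                 // |Gy|
    cv::addWeighted(ax, 0.5, ay, 0.5, 0, magnitude);  // (|Gx| + |Gy|) / 2
    cv::imwrite("edges.png", magnitude);
    return 0;
}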

Canny Edge Detector

Although Canny's work dates back to the early days of computer vision (1986), the Canny edge detector remains one of the best boundary detectors. Canny's method is a multi-stage algorithm that includes the following steps:

1. Cleaning the image from noise and unnecessary details.

2. Search for image gradients, for example, with the Sobel operator.

3. Suppression of non-maxima: only local maxima are marked as boundaries.

4. Double threshold filtering: potential boundaries are determined by thresholds.

5. Edge tracking: linking edges into contours (paths).

Since the slightest noise in an image can violate the integrity of its contours, it is recommended to filter the image with some noise-reduction method before starting the search. Due to its high speed and ease of implementation, the Gaussian filter is most often used. The edges of an image can lie in different directions, so Canny's algorithm uses four filters to detect horizontal, vertical and diagonal edges. Using a boundary detection operator (for example, the Sobel operator), values of the first derivative in the horizontal direction (Gx) and the vertical direction (Gy) are obtained. From this gradient the boundary direction angle can be obtained: θ = arctan(Gy / Gx).

The boundary direction angle is rounded to one of four angles representing the vertical, the horizontal and the two diagonals (for example, 0, 45, 90 and 135 degrees), so the direction value is a multiple of 45°. Only those pixels at which a local maximum of the gradient in the direction of the gradient vector is reached are declared boundaries. After non-maximum suppression, the edges become more precise and thinner.

At the next step, threshold filtering determines for each considered pixel whether it belongs to the image boundaries. The higher the threshold, the more uniform the found contours will be, but weak edges may be ignored. On the other hand, decreasing the threshold increases the algorithm's susceptibility to noise. Canny edge detection uses two filtering thresholds: if a pixel value is above the upper threshold, it takes the maximum value (the boundary is considered valid); if it is below the lower threshold, the pixel is suppressed; points with values between the two thresholds take a fixed intermediate value (they will be refined at the next step).

The last step of the processing is to link the individual edges into continuous contours. The pixels that received the intermediate value at the previous step are either suppressed (if they do not touch any already detected edge) or attached to the corresponding contour.
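A minimal sketch of the whole chain with OpenCV's cv::Canny; the smoothing kernel and the two thresholds (here 50 and 150) are illustrative, the usual advice being an upper-to-lower ratio of about 2:1 to 3:1.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("input.png", 0);             // 0 = load as grayscale
    cv::Mat blurred, edges;
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 1.4);  // step 1: noise removal
    cv::Canny(blurred, edges, 50, 150);                    // double thresholds: lower, upper
    cv::imwrite("contours.png", edges);
    return 0;
}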

Segmentation

Most of the images obtained from photographic and video equipment are raster, that is, they consist of colored dots arranged in a rectangular grid. However, people perceive the world around them as a collection of whole objects, and not a matrix of points. The human brain is able to combine disparate details of an image into homogeneous areas, subconsciously dividing it into objects. This process is called segmentation, and can be implemented programmatically when solving the problem of computer image analysis and pattern recognition. Segmentation is performed in the early stages of the analysis, and the quality of its execution can have a strong impact on its speed and accuracy.

Segmentation methods can be divided into two classes: automatic - requiring no user interaction and interactive - using user input directly in the process.

In the first case, no a priori information about the properties of the regions is used, but some conditions are imposed on the partitioning of the image itself (for example, all regions must be uniform in color and texture). Since this formulation of the segmentation problem does not use a priori information about the depicted objects, the methods of this group are universal and applicable to any images.

For a rough assessment of the quality of a method in a specific problem, several properties are usually recorded that a good segmentation should have:

§ uniformity of regions (uniformity of color or texture);

§ dissimilarity of neighboring regions;

§ smoothness of the region border;

§ a small number of small "holes" within the regions;

Threshold segmentation

Thresholding is the simplest method focused on image processing, individual homogeneous areas of which differ in average brightness. However, if the image is unevenly illuminated, some of the objects may coincide in intensity with the background, which will make threshold segmentation ineffective.

The simplest and at the same time often used type of threshold segmentation is binary segmentation, when only two types of homogeneous areas are distinguished in the image.

In this case, the transformation of each point of the source image into the output is performed according to the rule:

y = y0, if x < x0;   y = y1, if x ≥ x0,

where x is the brightness of the source image point, y is the value of the corresponding output point, and x0 is the only processing parameter, called the threshold. The output brightness levels y0 and y1 can be arbitrary; they only serve as labels with which the resulting map is marked, assigning its points to the classes K1 or K2, respectively. If the result is being prepared for visual perception, their values often correspond to the levels of black and white. If there are more than two classes, then thresholding requires a family of thresholds separating the brightnesses of the different classes from each other.

Threshold segmentation is well suited for selecting a small number of non-intersecting objects in an image that have a homogeneous structure and stand out sharply from the background. With an increase in the degree of heterogeneity of the image, and hence the number of segments and their complexity, this type of segmentation becomes ineffective.
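A minimal sketch of binary threshold segmentation with OpenCV's cv::threshold; the threshold x0 = 128 and the output labels 0 and 255 are illustrative.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat gray = cv::imread("input.png", 0);   // 0 = load as grayscale
    cv::Mat binary;
    // Pixels brighter than the threshold 128 receive the label 255, the rest receive 0.
    cv::threshold(gray, binary, 128, 255, cv::THRESH_BINARY);
    cv::imwrite("binary.png", binary);
    return 0;
}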

Segmentation based on graph splitting

Graph theory methods are one of the most actively developing areas in image segmentation.

The general idea of ​​the methods of this group is as follows. The image is presented in the form of a weighted graph, with vertices at the points of the image. The weight of an edge of a graph reflects the similarity of points in a certain sense (the distance between points in a certain metric). Image splitting is modeled by graph cuts.

Usually, in the methods of graph theory, the functional of the "cost" of the cut is introduced, which reflects the quality of the obtained segmentation. So the problem of dividing the image into homogeneous areas is reduced to the optimization problem of finding a cut of the minimum cost on the graph. This approach allows, in addition to the uniformity of the color and texture of the segments, to control the shape of the segments, their size, the complexity of the boundaries, etc.

To find the cut of minimum cost, various methods are used: greedy algorithms (at each step an edge is selected so that the total cost of the cut is minimal), dynamic programming methods (it is guaranteed that by choosing the optimal edge at each step we obtain the optimal path at the end), Dijkstra's algorithm, etc.

Interpolation

In computer graphics, the interpolation method is often used in the process of changing the scale of images. By changing the number of pixels in the image, interpolation helps to avoid excessive pixelation of the picture when it is enlarged or the loss of important details when it is reduced.

During the interpolation process, additional points are inserted between the pixels of the image, the estimated tone and color of which are calculated using a special algorithm based on the analysis of the available data on the neighboring areas. Unfortunately, since any interpolation is just an approximation, the image will invariably lose quality whenever it is interpolated.

Nearest Neighbor Interpolation

This algorithm is the simplest kind of interpolation: it simply replicates each pixel of the image at the required scale. It requires the least processing time but produces the worst results.

Bilinear interpolation

This type of interpolation is performed for each coordinate of the two-dimensional grid. In this case, the image is considered as a surface, the color is the third dimension. If the image is in color, then interpolation is carried out separately for three colors. For each unknown point in the new image, bilinear interpolation considers the square of four surrounding known pixels. The weighted average of these four pixels is used as the interpolated value. As a result, the images look significantly smoother than the result of the nearest neighbor method.

Bilinear interpolation works well at large integer scaling factors, but it blurs the sharp edges of the image quite a lot.

Bicubic interpolation goes one step further than bilinear interpolation, considering an array of 4x4 surrounding pixels - only 16. Since they are at different distances from the unknown pixel, the nearest pixels are weighted more in the calculation. Bicubic interpolation produces significantly sharper images than the previous two methods, and is arguably optimal in terms of processing time and output quality. For this reason, it has become standard for many image editing programs (including Adobe Photoshop), printer drivers, and built-in camera interpolation.
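A minimal sketch comparing the three interpolation modes through OpenCV's cv::resize; the 2× scale factor and file names are illustrative.

#include <opencv2/opencv.hpp>

int main()
{
    cv::Mat src = cv::imread("input.png");
    cv::Mat nearest, bilinear, bicubic;
    cv::Size doubled(src.cols * 2, src.rows * 2);
    cv::resize(src, nearest,  doubled, 0, 0, cv::INTER_NEAREST);  // fastest, blockiest
    cv::resize(src, bilinear, doubled, 0, 0, cv::INTER_LINEAR);   // 2x2 neighbourhood
    cv::resize(src, bicubic,  doubled, 0, 0, cv::INTER_CUBIC);    // 4x4 neighbourhood
    cv::imwrite("bicubic.png", bicubic);
    return 0;
}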

The scaled image may become much less sharp. Interpolation algorithms that retain sharpness better are also more susceptible to moiré, while those that eliminate moiré tend to produce a softer result. Unfortunately, this scaling tradeoff cannot be avoided.

One of the best ways to combat this is to apply an unsharp mask immediately after scaling, even if the original has already been sharpened.

5.2 Rationale for the choice of algorithms used in the subsystem

The main requirement for the developed software package was to minimize the playback delay of a video stream during its preprocessing on a computing cluster. In addition, shooting can take place in any conditions, so it was necessary in a short time to study a large number of negative factors that appear in video and to implement simple filters to neutralize them. Algorithms that satisfy these requirements should be readily available, well optimized, highly reliable and, at the same time, easy to implement. The functions of the OpenCV library have such properties; therefore, when choosing specific methods for implementing filters for processing a video stream, priority was given to algorithms contained in this library in one form or another.

All the algorithms considered in the theoretical part of the final qualifying work were implemented in a test form in order to compare their characteristics in practice. In particular, preference was given to a compromise between the processing speed of a video stream frame and the quality of the result obtained.

As a result, the following algorithms were chosen to implement filters for processing a video stream on a computing cluster:

1. To remove “additive white” noise, the Gaussian blur algorithm was chosen. As the most common noise reduction method, it is very well optimized and therefore fast.

2. To remove “impulse” noise, median filtering was chosen. This method is also well optimized and was specifically designed to eliminate impulse and salt-and-pepper noise.

3. Convolution was chosen to increase the sharpness of the image, since it works much faster than unsharp masking while giving acceptable results.

4. The OpenCV library does not contain color correction algorithms, so it was decided to implement the most widespread and well-documented Single Scale Retinex algorithm. This method is very effective, but requires optimization to increase its speed.

5. Canny's algorithm was chosen as the method for extracting contours, since it gives better results than the Sobel filter.

6. The pyramidal segmentation algorithm included in the OpenCV library works extremely slowly, so it was decided to use the graph-based segmentation algorithm discussed earlier.

7. For interpolation, the bicubic method was chosen as the most reasonable compromise between speed and quality of the result.

Installation and configuration of the software used.

The compute cluster used was running GNU/Linux (Ubuntu).

After installing the operating system, you need to install several libraries that support reading and writing image files, drawing on the screen, working with video, etc.

Installing CMake

The project is built using CMake (version 2.6 or higher is required). You can install it with the command:

apt-get install cmake

You may also need the following libraries:

build-essential libjpeg62-dev libtiff4-dev libjasper-dev libopenexr-dev libtbb-dev libeigen2-dev libfaac-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev

Installing ffmpeg

For opencv to process video files correctly, the ffmpeg library must be installed. This is done with the following commands:

1) Downloading the source codes of the library

wget http://ffmpeg.org/releases/ffmpeg-0.7-rc1.tar.gz

2) Unpacking the archive with source codes

tar -xvzf ffmpeg-0.7-rc1.tar.gz

3) Library configuration

./configure --enable-gpl --enable-version3 --enable-nonfree --enable-postproc --enable-libfaac --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libtheora --enable-libvorbis --enable-libxvid --enable-x11grab --enable-swscale --enable-shared

4) Building and installing the library

installing GTK

Displaying OpenCV windows requires GTK + 2.x or higher library installed, including header files (libgtk2.0-dev)

apt-get install libgtk2.0-dev

Installing Opencv

After installing all related libraries, the opencv2.2 installation is performed with the following commands:

1) Downloading the source codes of the OpenCV library

http://downloads.sourceforge.net/project/opencvlibrary/opencv-unix/2.2/OpenCV-2.2.0.tar.bz2

2) Unpacking the archive with source codes

tar -xvf OpenCV-2.2.0.tar.bz2

3) generating Makefile using CMake.

4) building and installing the OpenCV library

5) You may also need to register the path to the libraries

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Installation and compilation of the developed software package

Copy the source codes of the programs from the disc attached to this explanatory note, copy the build_all.sh batch file into the same folder, and then run it. If the gcc compiler is installed on the system, the build will proceed automatically.

DIGITAL SIGNAL PROCESSING

Topic 17. IMAGE PROCESSING

There is nothing that the imagination of man would not dare to conceive.

Titus Lucretius, Roman philosopher and poet, 1st century BC.

Imagination is a good thing. But a good graphics program will do a better job of taking a bum out of a basement, cleaning him up, turning him into Apollo, packing him into a matchbox, and e-mailing him to a friend.

Anatoly Pyshmintsev, Novosibirsk geophysicist of the Ural School. XX century

Introduction.

1. Basic concepts. Graphic representation of images. Representation of color in computer graphics. RGB color model. CIE XYZ color system.

2. Geometric transformations of raster images. Areas and stages of transformation. Sampling. Interpolation series of reconstruction of a two-dimensional signal. Frequency distortion of images and their elimination. Resampling images.

3. Filtering images. Linear filters. Smoothing filters. Contrast-enhancing filters. Difference filters. Two-dimensional cyclic convolution. Non-linear filters. Threshold filtering. Median filtering. Extremum filters.

4. Compression of images. Run-length encoding (RLE) algorithms. Dictionary algorithms. Statistical coding algorithms. Lossy image compression. Image loss estimation. Fourier transform. Wavelet transform.

INTRODUCTION

The scope of research in the field of digital imaging is growing rapidly. This is because image processing is multidimensional signal processing, and most signals in the real world are multidimensional.


An image in mathematical representation is a two-dimensional signal that carries a huge amount of information. A color image of 500 × 500 elements is an array of several hundred thousand bytes. Such information can be processed only by a rational organization of calculations. For specific image processing tasks, efficient processing techniques can be applied, taking into account the characteristics and limitations of that particular task. But if we talk about image processing for solving a wide class of problems, then it is necessary to single out a set of standard operations, from which it is possible to build algorithms for solving arbitrary problems. These include linear transformations, 2D convolution, and 2D discrete Fourier transforms.

However, nonlinear transformations are also widely used in image processing. The peculiarity of the images is that the individual elements of the image are in a certain connection with the neighboring elements. Therefore, most image conversion algorithms are local in nature, that is, they process images by groups of elements located in a neighborhood around a given one. Linear transformations satisfy the locality property and allow the construction of algorithms, the computational complexity of which depends little on the size of the surrounding neighborhood. The same properties are required from nonlinear transformations of images. The class of such transformations includes algorithms that are called rank filtering algorithms based on the calculation of local rank statistics of images. When calculating rank statistics and their derivatives, simplifications associated with information redundancy of images are possible. The most famous algorithm of this class is the median filtering algorithm. Other examples of rank algorithms are extreme filtering algorithms that replace the analyzed image element with a maximum or minimum in a neighborhood. Another property of rank algorithms is local adaptation to the characteristics of the processed image and the potential for their use not only for smoothing and cleaning from noise, but also for feature extraction during automatic image recognition.

In image processing, methods for processing one-dimensional signals are widely used, if it is possible to generalize them to multidimensional signals. At the same time, one has to take into account that mathematical methods for describing multidimensional systems are not complete. Multidimensional systems have a large number of degrees of freedom, and their design acquires flexibility that is not inherent in one-dimensional systems. At the same time, multidimensional polynomials do not decompose into prime factors, which complicates the analysis and synthesis of multidimensional systems.

17.1. Basic concepts

Graphic representation of images. To represent graphic information on a two-dimensional plane (monitor screen), two approaches are used: raster and vector.

In the vector approach, graphic information is described as a collection of abstract geometric objects - lines, segments, curves, rectangles, etc. Vector description assumes a priori knowledge of the structure of the image.

Raster graphics operate with arbitrary images in the form of rasters. A raster is a description of an image on a plane by dividing (discretizing) it into identical elements on a regular grid and assigning its own color and any other attributes to each element. The simplest raster is rectangular, the most economical in terms of the number of samples for image transmission is hexagonal. From a mathematical point of view, a raster is a piecewise constant approximation on the plane of a continuous image function.

A raster element is called a pixel. Pixel standard identification:


f(i, j) = (A(i, j), C(i, j)), (17.1.1)

where A(i, j) ⊂ R² is the pixel area and C(i, j) ∈ C is the pixel attribute (usually color). Two types of attributes are most commonly used:

C(i, j) = I(i, j) - pixel intensity (brightness);

C(i, j) = (R(i, j), G(i, j), B(i, j)) - color attributes in the RGB color model.

In matrix form:

Mij = (Aij, Cij).

When discretizing continuous images, the values ​​of Aij can be defined in two ways, either as the values ​​of the points Aij = (i, j) for which the attributes Cij are defined, or as the values ​​of the squares Aij = (i, i + 1) × (j, j + 1) or any other form, with the determination of Cij by the mean values ​​within this form (Fig. 17.1.1).

In practice, as a rule, X and Y are bounded sets of non-negative integers forming a square or rectangular raster with a width-to-height aspect ratio written, for example, as "4:3".

Representation of color in computer graphics. The concept of color is based on the perception by human eyes of electromagnetic waves in a certain frequency range. The daylight we perceive has wavelengths λ from 400 nm (violet) to 700 nm (red). The description of the luminous flux can be its spectral function I (λ). Light is called monochromatic if its spectrum has only one specific wavelength.

There are two types of receptors on the retina: rods and cones. The spectral sensitivity of rods (Fig. 17.1.2) is directly proportional to the brightness of the incident light. Cones are divided into three types, each of which has a certain sensitivity in limited ranges with maxima for red, green and blue colors, and sharply lose their sensitivity in the dark. The eye's susceptibility to blue is significantly lower than to the other two. An important property of human perception of light is linearity when colors with different wavelengths are combined.

RGB color model (Red, Green, Blue - red, green, blue) in computer graphics is currently the most common. In this model, the spectral function is represented as the sum of the sensitivity curves for each type of cones with non-negative weight coefficients (normalized from 0 to 1), which are denoted as R, G, and B. The model is characterized by the additivity property for obtaining new colors. For example, the encoding of spectral functions:

Black: fblack = 0, (R, G, B) = (0,0,0);

Purple fviolet = fred + fblue, (R, G, B) = (1,0,1);

White fwhite = fred + fgreen + fblue, (R, G, B) = (1,1,1).

The three-dimensional color space of the RGB model is shown in Fig. 17.1.3. Due to the peculiarities of light perception by receptors, not all colors visible to humans are representable in this model. However, the share of reproducible colors is much higher than the share that is not represented in this model.

CIE XYZ color system. The international color representation standard CIE (Commission Internationale de l'Eclairage) was adopted in 1931 by the International Commission on Illumination. It defines three basis functions ρX(λ), ρY(λ), ρZ(λ), depending on wavelength, whose linear combinations with non-negative coefficients (X, Y and Z) produce all colors visible to humans. These functions take into account the relative perception of light intensity by the receptors of the eye. In three-dimensional space, the CIE color system forms a cone in the first quadrant and is used for high-quality display of color images.

17.2. Geometric transformations of bitmaps

Areas and stages of transformation. Images can be divided into texture and detail. In texture images, all samples (elements) carry information (the image on the TV screen). A detailed image is an image on which you can highlight interfering objects, the background, and useful objects.

There are three main groups of algorithms for image processing on computers:

1. Primary (preliminary) image processing for the purpose of restoration, cleaning from random noise, improving quality, correcting geometric distortions of optical systems (defocusing, aberrations, etc.).

2. Description of images, pattern recognition. It is performed to determine the parameters of image details and includes: finding areas of an image that are uniform in terms of illumination level and color, highlighting image shape features, determining the coordinates of special points of objects, etc.

3. Efficient coding to reduce volume in transmission and storage.

Most of the preprocessing methods are based on the use of linear space invariant (LPI) filters. Linear algorithms are performed using two-dimensional analogs of one-dimensional FIR and IIR filters. They can be used, for example, when implementing filters to reduce noise in images.

FIR filters are implemented using the convolution method. The advantage of 2D FIR filters is clarity, simplicity, and absolute stability. IIR filters are implemented using difference equations and z-transforms. They are faster than FIR filters, but can be unstable. The synthesis of two-dimensional IIR filters differs from the synthesis of one-dimensional filters, since for a two-dimensional function it is not possible to select the poles in an explicit form.

Non-linear methods may also be required to restore images and improve their quality. So, for example, in order to suppress noise and at the same time preserve the contour part of the images, it is necessary to use nonlinear or linear spatially noninvariant (LPNI) filters, which are implemented by rank algorithms. All rank nonlinear filters are based on fast algorithms for calculating local histograms.

One of these methods is median filtering. The use of median filters is effective for suppressing some types of noise and periodic interference without simultaneously distorting the signal, for example, to suppress bursts of noise spikes, including line dropouts. The method can also be used to solve problems related to recognition, for example, to highlight thin lines and small isolated objects.

Algorithms for describing images and pattern recognition are usually non-linear and heuristic in nature. The features of objects are usually the area of ​​the object image, the perimeter of the image contour, the ratio of the area to the square of the image perimeter. The shape of the object can be characterized by the radius of a circle inscribed in the image or described around the image of the object, the length of the minimum and maximum radius-vector from the “center of mass” of the image.

Sampling. Transformations of images in a computer and storage of processed data are performed in a discrete form. Sampling is used to obtain a discrete representation from continuous analog images of the real world. In practice, it is carried out by input devices (digital camera, scanner, or others). For visual perception of processed images on output devices (display, plotter, etc.), an analog image is reconstructed according to its discretized representation.

In the simplest case of black and white images, we have a two-dimensional array sa (x, y). For color images in the RGB model, taking into account the additivity property when adding colors, each layer of R, G and B can also be considered and processed as a two-dimensional array, with the subsequent summation of the results.

Of the methods of generalizing one-dimensional periodic sampling to a two-dimensional case, the simplest is periodic sampling in rectangular coordinates:

s(n, m) = sa(nΔx, mΔy),

where Δx and Δy are the horizontal and vertical sampling intervals of the two-dimensional continuous signal sa(x, y) with continuous coordinates x and y. Below, the values of Δx and Δy, as in the one-dimensional case, are taken equal to 1.

Sampling of a two-dimensional signal also leads to periodization of its spectrum and vice versa. The condition of information equivalence of the coordinate and frequency representations of a discrete signal is also preserved with an equal number of sampling points in the main signal ranges. For rectangular sampling, the direct and inverse Fourier transforms are determined by the expressions:

S(k, l) = Σn Σm s(n, m)·exp(-j2πnk/N - j2πml/M), (17.2.1)

S(k, l) = Σn exp(-j2πnk/N) Σm s(n, m)·exp(-j2πml/M), (17.2.1')

s(n, m) = (1/(NM)) Σk Σl S(k, l)·exp(j2πnk/N + j2πml/M), (17.2.2)

s(n, m) = (1/(NM)) Σk exp(j2πnk/N) Σl S(k, l)·exp(j2πml/M). (17.2.2')

Fig. 17.2.1. Spectrum periodization.

These expressions show that a two-dimensional DFT over a rectangular sampling raster can be computed using one-dimensional sequential DFTs. The second sums in expressions (17.2.1') and (17.2.2') are one-dimensional DFTs of the sections of the functions s(n, m) and S(k, l) along the lines n and k, respectively, and the first are one-dimensional DFTs of the resulting functions in the sections over m and l. In other words, the initial matrices of values s(n, m) and S(k, l) are first recalculated into intermediate matrices with a DFT over the rows (or columns), and the intermediate matrices into the final ones with a DFT over the columns (or rows, respectively).

In order for the periodic repetition of the spectrum (Fig. 17.2.1) caused by sampling the analog signal with the frequencies Fx = 1/Δx and Fy = 1/Δy not to change the spectrum in the main frequency range (with respect to the spectrum of the original analog signal), it is necessary and sufficient that the maximum frequency components fmax in the spectrum of the analog signal, both along rows and along columns, do not exceed the Nyquist frequency (fmax.x ≤ fN = Fx/2, fmax.y ≤ fM = Fy/2). This means that the sampling frequency of the signal must be at least twice the maximum frequency component in the signal spectrum:

Fx ≥ 2fmax.x,  Fy ≥ 2fmax.y, (17.2.3)

which ensures that the spectral functions reach zero values ​​at the ends of the main spectrum range.

Interpolation series of reconstruction of a two-dimensional signal. If the continuous signal sa (x, y) is a spectrum-limited signal, and the sampling periods are chosen small enough and the spectra of adjacent periods do not overlap:

Sa(Wx, Wy) = 0 for |Wx| ≥ π/Δx, |Wy| ≥ π/Δy,

then, as in the one-dimensional case, the signal sa (x, y) can be reconstructed from a discrete signal using a two-dimensional analogue of the Kotelnikov-Shannon series:

sa(x, y) = Σn Σm s(n, m)·sinc(π(x - nΔx)/Δx)·sinc(π(y - mΔy)/Δy). (17.2.4)

Frequency distortion of images and their elimination. An unlimited spectrum signal can also be sampled, but in this case there is aliasing in adjacent periods, while high frequencies, higher than the Nyquist frequencies, will be "masked", as in the one-dimensional case, under the low frequencies of the main period. The effect of "reflection" from the boundaries of the period gives an even more complex picture due to the interference of frequencies reflected in different coordinates. A similar effect, known as aliasing, will also occur when images are undersampled. This effect can be especially clearly observed on sharp contrasting changes in brightness.

To combat such phenomena, prefiltration (antialiasing) is used - a preliminary convolution of an analog image with a filter weighting function that cuts off high-frequency components that can lead to aliasing. In the two-dimensional case, filtering is described as follows:

z (x, y) = h (x ", y") ③③ s (x-x ", y-y"). (17.2.5)

It should be noted that analog images exist only in the optical range, for example, as a light image on a screen, on photographic paper or on photographic film, but cannot exist in computer memory. Therefore, the physical execution of pre-filtering is possible only when registering an image by defocusing it, which, as a rule, is not done. Primary information should always be recorded with maximum completeness and accuracy, and cleaning the primary information from unnecessary details and redundancy is a matter of subsequent data processing. Therefore, with respect to equation 17.2.5, two-dimensional pre-filtering in its practical implementation can only be filtering of images sampled with a large margin over the main frequency range (with excess resolution), and is used, as a rule, when resampling to a larger step, for example, when compressing images. Pre-filtering can also be built into image construction algorithms.

The figure and Table 17.2.1 below show examples of the most common one-dimensional anti-aliasing filters. They can be implemented as analog filters and used, for example, when transmitting television image lines in analog form over radio channels (horizontal anti-aliasing). In principle, a similar operation can be performed over the columns (the second dimension of the image), and after summation a full anti-aliasing operation is obtained, but this method belongs more to the field of special scientific research.

Table 17.2.1. Basic weight functions

Time window            Weight function                           Fourier transform
Natural (П)            П(t) = 1, |t| ≤ τ; П(t) = 0, |t| > τ      П(ω) = 2τ·sinc(ωτ)
Bartlett (Δ)           b(t) = 1 − |t|/τ, |t| ≤ τ                 B(ω) = τ·sinc²(ωτ/2)
Hanning (Hann)         p(t) = 0.5 + 0.5·cos(πt/τ)                0.5П(ω) + 0.25П(ω + π/τ) + 0.25П(ω − π/τ)
Hamming                p(t) = 0.54 + 0.46·cos(πt/τ)              0.54П(ω) + 0.23П(ω + π/τ) + 0.23П(ω − π/τ)
Carré (2nd window)     p(t) = b(t)·sinc(πt/τ)                    τ·B(ω) * П(ω), П(ω) = 1 for |ω| ≤ π/τ
Laplace-Gauss          p(t) = exp[−β²(t/τ)²/2]                   [(τ/β)·exp(−τ²ω²/(2β²))] ③ П(ω)

Two-dimensional analogues of one-dimensional filters f1(x) are constructed in two variants of symmetry: either as a function of the radius:

f2(x, y) = f1(√(x² + y²)),

or as a product:

f2(x, y) = f1(x) × f1(y).

The first option is more correct, but the second has the separability property, i.e., two-dimensional convolution can be performed by two one-dimensional convolutions sequentially along the rows with f1 (x) and along the columns with f1 (y).

Image resampling or resampling is a change in the sampling rate of a digital signal. For digital images, this means resizing the image.

There are various algorithms for resampling images. For example, to enlarge the image by 2 times using the bilinear interpolation method, intermediate columns and rows are obtained by linear interpolation of the values ​​of adjacent columns and rows. Each point of the new image can be obtained as a weighted sum of a larger number of points in the original image (bicubic and other types of interpolation). The highest quality resampling is obtained when using algorithms that take into account not only the time, but also the frequency domain of the signal.

Consider a resampling algorithm with maximum preservation of the frequency information of the image. We will consider the operation of the algorithm on one-dimensional signals, since a two-dimensional image can first be stretched or compressed horizontally (by rows) and then vertically (by columns), and the resampling of a two-dimensional image can be reduced to resampling of one-dimensional signals.

Suppose we have a one-dimensional signal (Fig. 17.2.4) specified on the interval 0-T and sampled with a step Δt = 1 (N intervals). It is necessary to "stretch" the signal m times. The signal spectrum shown in the figure is calculated by the fast Fourier transform (FFT, the number of spectrum samples is equal to the number of signal samples) and is given in the main FFT range (0-2π; the Nyquist frequency is ωN = π/Δt = π, or 0.5N in terms of spectrum sample numbering, with a step along the spectrum of Δf = 1/T or Δω = 2π/T). Stretching is performed in two steps.

The first step is interpolation with zeros, which increases the signal length by a factor of m (Fig. 17.2.5). All samples of the original signal are multiplied by m, and after each sample m-1 zero values are inserted. The 0-T interval, whose physical length remains unchanged, now contains m times more sampling intervals (mN), and the new sampling step is Δx = Δt/m. Accordingly, the new Nyquist frequency for this signal is mπ/Δt = mπ. However, the physical value of the spectrum step in frequency units is the reciprocal of the physical length of the signal interval (Δf = 1/T), and therefore the FFT of mN signal points will compute mN spectrum points in the main FFT range 0-2πm with the spectrum step of the original signal, in which m periods of the spectrum of the original signal will be present (one main and m-1 side periods).

The second step is to filter out the sidebands of the spectrum using a low-pass filter in either the time domain or the spectral domain. In fig. 17.2.6, the spectrum was cleared and the inverse Fourier transform was performed, as a result of which a signal was obtained m times longer than the original one with full preservation of all frequency information.

A signal compression (decimation) algorithm by a factor of n can be constructed on a similar principle, with the order of the steps reversed. When a signal is compressed, the sampling step increases and the Nyquist frequency correspondingly decreases, so the high frequencies that are cut off (noise and insignificant high-frequency parts of the signal spectrum) would be reflected from the boundary of the main range and added to the main information, creating distortions. To eliminate this phenomenon, the signal is first low-pass filtered with a cutoff frequency equal to the new Nyquist frequency (anti-aliasing), and only then is it downsampled by decimation.

When resampling only in the time domain, the stretching and compression algorithms are usually combined into a single sequential process, with the change in the sampling step specified as a ratio m/n, which makes it possible to use integer m and n for fractional changes in the sampling step. This greatly simplifies the algorithms and increases the efficiency and quality of their work. For example, when the signal is stretched by 1.5 times with m/n = 3/2, the signal is first stretched 3 times (simple and uniform insertion of zeros after all samples), then low-pass filtering is performed, after which the signal is decimated by a factor of two. No anti-aliasing filter is required, since its cutoff frequency is covered by the frequency of the first low-pass filter. In the reverse compression operation (for example, m/n = 2/3), only the anti-aliasing filter is similarly used.
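A plain C++ sketch of the stretching branch described above: zero insertion followed by low-pass filtering. The short triangular kernel used here (normalized to unit DC gain) only reproduces linear interpolation; a windowed-sinc low-pass filter would preserve the spectrum as discussed in the text.

#include <cstdlib>
#include <vector>

// 1D upsampling sketch: stretch a signal by an integer factor m by
// multiplying the samples by m and inserting m-1 zeros after each one,
// then suppress the spectral images with a low-pass filter.
std::vector<double> upsample(const std::vector<double>& s, int m)
{
    std::vector<double> z(s.size() * m, 0.0);
    for (std::size_t n = 0; n < s.size(); ++n)
        z[n * m] = m * s[n];                          // step 1: zero insertion

    std::vector<double> taps;                         // step 2: low-pass filtering
    for (int k = -(m - 1); k <= m - 1; ++k)           // triangular kernel, unit DC gain
        taps.push_back((1.0 - std::abs(k) / static_cast<double>(m)) / m);

    std::vector<double> y(z.size(), 0.0);
    const int r = m - 1;
    for (std::size_t n = 0; n < z.size(); ++n)
        for (int k = -r; k <= r; ++k) {
            const int idx = static_cast<int>(n) + k;
            if (idx >= 0 && idx < static_cast<int>(z.size()))
                y[n] += taps[k + r] * z[idx];
        }
    return y;
}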

17.3. IMAGE FILTERING

Image filtering is an operation whose result is an image of the same size, obtained from the original one according to certain rules. Usually the intensity (color) of each pixel of the resulting image is determined by the intensities (colors) of the pixels located in some neighborhood of it in the original image.

Filtering rules can be very different. Image filtering is one of the most fundamental operations of computer vision, pattern recognition, and image processing. The work of the overwhelming majority of image processing methods begins with one or another filtering of the source images.

Linear filters have a very simple mathematical description. We will assume that a grayscale source image A is given, and denote the intensities of its pixels by A(x, y). A linear filter is defined by a real-valued function h (the filter kernel) defined on the raster. The filtering itself is performed using the operation of discrete convolution (weighted summation):

B(x, y) = h(i, j) ⊛⊛ A(x, y) = Σ(i, j) h(i, j) · A(x − i, y − j). (17.3.1)

The result is image B. Usually, the filter kernel is nonzero only in some neighborhood N of the point (0, 0). Outside this neighborhood, h(i, j) is zero or very close to it and can be neglected. The summation is performed over (i, j) ∈ N, and the value of each pixel B(x, y) is determined by the pixels of image A that lie in the window N centered at the point (x, y) (denoted as the set N(x, y)). A filter kernel defined on a rectangular neighborhood N can be viewed as an m × n matrix, where the side lengths are odd numbers. When specifying the kernel as a matrix, it should be centered. If the pixel (x, y) lies near the edges of the image, then the coordinates A(x − i, y − j) for certain (i, j) may correspond to non-existent pixels of A outside the image. This problem can be solved in several ways.

Do not filter these pixels, cropping image B at the edges or assigning them the original values of image A.

Do not include the missing pixel in the summation by distributing its weight h (i, j) evenly among other pixels in the neighborhood N (x, y).

Redefine pixel values ​​outside the image boundaries using extrapolation.

Redefine pixel values ​​outside the image boundaries using a mirror image continuation.

The choice of the method is made taking into account the specific filter and image features.
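As an illustration, a minimal Python sketch (assuming SciPy; the image and kernel are synthetic) of linear filtering with different edge-handling modes; the mode argument selects how pixels outside the image are defined.

import numpy as np
from scipy import ndimage

image = np.random.rand(64, 64)
kernel = np.full((3, 3), 1.0 / 9.0)          # 3x3 averaging kernel

b_reflect = ndimage.convolve(image, kernel, mode='reflect')   # mirror continuation
b_nearest = ndimage.convolve(image, kernel, mode='nearest')   # replicate edge pixels
b_zeros   = ndimage.convolve(image, kernel, mode='constant', cval=0.0)  # pad with zeros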

Smoothing filters. The simplest rectangular smoothing filter of radius r is specified by a matrix of size (2r + 1) × (2r + 1), all values of which are equal to 1/(2r + 1)², with the sum of the values equal to one. It is a two-dimensional analogue of the one-dimensional low-pass moving-average (rectangular) filter. When filtering with such a kernel, the pixel value is replaced by the average of the pixel values in a square with side 2r + 1 around it. An example of a 3 × 3 filter mask:

        | 1  1  1 |
(1/9) · | 1  1  1 |
        | 1  1  1 |

One of the applications of filters is noise reduction. The noise changes independently from pixel to pixel and, provided that the mathematical expectation of the noise value is equal to zero, the noise of neighboring pixels during the summation will compensate for each other. The larger the filtering window, the less will be the average intensity of the noise, but this will also result in a corresponding blurring of significant image details. The image of a white point on a black background during filtration (reaction to a single impulse) will be a uniformly gray square.

Noise reduction using a rectangular filter has a significant drawback: all pixels in the filter mask at any distance from the processed one have the same effect on the result. A slightly better result is obtained when modifying the filter with an increase in the weight of the center point:

         | 1  1  1 |
(1/10) · | 1  2  1 |
         | 1  1  1 |

More effective noise reduction can be achieved if the influence of pixels on the result decreases with increasing distance from the processed one. This property is possessed by the Gaussian filter with the kernel h(i, j) = (1/(2πσ²)) · exp(−(i² + j²)/(2σ²)). The Gaussian filter has a nonzero kernel of infinite size; however, the kernel values decrease very quickly with distance from the point (0, 0), and therefore in practice one can confine oneself to convolution with a small window around (0, 0), for example, by taking the window radius equal to 3σ.

Gaussian filtering is also smoothing. However, unlike the rectangular filter, the image of a point under Gaussian filtering will be a symmetric blurred spot, with brightness decreasing from the middle to the edges. The degree of blurring is determined by the parameter σ.
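A minimal smoothing sketch in Python (assuming SciPy; the noisy image is synthetic): a rectangular moving-average filter versus a Gaussian filter, with the Gaussian kernel truncated at a radius of about 3σ as discussed above.

import numpy as np
from scipy import ndimage

noisy = np.random.rand(128, 128)

box      = ndimage.uniform_filter(noisy, size=5)                   # (2r + 1) = 5
gaussian = ndimage.gaussian_filter(noisy, sigma=1.5, truncate=3.0) # kernel radius ~ 3*sigma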

Contrast-enhancing filters. While smoothing filters reduce the local contrast of an image by blurring it, contrast-enhancing filters produce the opposite effect and are essentially filters of high spatial frequencies. The kernel of a contrast-enhancing filter has a value greater than 1 at the point (0, 0), with the total sum of the values equal to 1. Examples of contrast-enhancing filters are filters with kernels specified by the matrices:

|  0  -1   0 |        | -1  -1  -1 |
| -1   5  -1 |  and   | -1   9  -1 |
|  0  -1   0 |        | -1  -1  -1 |

An example of applying the filter is shown in Fig. 17.3.1. The contrast-enhancement effect is achieved because the filter emphasizes the difference between the intensities of adjacent pixels, pulling these intensities apart. The effect is the stronger, the larger the value of the central term of the kernel. A characteristic artifact of linear contrast-enhancing filtering is noticeable light and less noticeable dark halos around the boundaries.
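A minimal sharpening sketch in Python (assuming SciPy; the kernel is one of the standard variants above, with a center value greater than 1 and a coefficient sum equal to 1).

import numpy as np
from scipy import ndimage

sharpen_kernel = np.array([[ 0, -1,  0],
                           [-1,  5, -1],
                           [ 0, -1,  0]], dtype=float)   # sum of coefficients is 1

image = np.random.rand(128, 128)
sharpened = ndimage.convolve(image, sharpen_kernel, mode='reflect')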

Difference filters are linear filters defined by discrete approximations of differential operators (by the method of finite differences). These filters play an essential role in many applications, for example, in the task of finding boundaries in an image.

The simplest differential operator is taking the derivative with respect to the x-coordinate d / dx, which is defined for continuous functions. Common variants of similar operators for discrete images are the Prewitt and Sobel filters:

Prewitt:                 Sobel:
| -1   0   1 |           | -1   0   1 |
| -1   0   1 |           | -2   0   2 |
| -1   0   1 |           | -1   0   1 |

Filters that approximate the operator of the derivative with respect to the y-coordinate d / dy are obtained by transposing matrices.

The simplest algorithm computes the gradient norm from three adjacent points:

G(x, y) = √[(A(x, y) − A(x+1, y))² + (A(x, y) − A(x, y+1))²].

A simplified calculation formula is also applied:

G(x, y) ≈ |A(x, y) − A(x+1, y)| + |A(x, y) − A(x, y+1)|.

Calculation of the gradient norm from four adjacent points (Roberts operator):

G(x, y) ≈ |A(x, y) − A(x+1, y+1)| + |A(x+1, y) − A(x, y+1)|.

Sobel's algorithm uses eight brightness samples in the neighborhood of the central point:

G(x, y) = √(Gx² + Gy²),   G(x, y) ≈ |Gx| + |Gy|,

Gx(x, y) = [A(x+1, y−1) + 2A(x+1, y) + A(x+1, y+1)] − [A(x−1, y−1) + 2A(x−1, y) + A(x−1, y+1)],

Gy(x, y) = [A(x−1, y+1) + 2A(x, y+1) + A(x+1, y+1)] − [A(x−1, y−1) + 2A(x, y−1) + A(x+1, y−1)].

Along with a more accurate determination of the gradient norm, Sobel's algorithm also makes it possible to determine the direction of the gradient vector in the image plane as the angle φ between the gradient vector and the direction of the matrix rows:

φ(x, y) = arctan(Gy(x, y) / Gx(x, y)).

In contrast to smoothing and contrast-enhancing filters, which do not change the average intensity of the image, applying difference operators generally yields an image with an average pixel value close to zero. Vertical edges (boundaries) of the original image correspond to pixels with large absolute values in the resulting image. Difference filters are therefore also called boundary (edge) detection filters.
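A minimal Sobel gradient sketch in Python (assuming SciPy; the image is synthetic): gradient components along x and y, the gradient norm and the gradient direction.

import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)

gx = ndimage.sobel(image, axis=1)      # derivative along columns (x)
gy = ndimage.sobel(image, axis=0)      # derivative along rows (y)

norm  = np.hypot(gx, gy)               # G = sqrt(Gx^2 + Gy^2)
angle = np.arctan2(gy, gx)             # direction of the gradient vector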

Similar to the above filters, using the finite difference method, filters can be composed for other differential operators. In particular, the Laplace differential operator (Laplacian) Δ = ∂²/∂x² + ∂²/∂y², which is important for many applications, can be approximated for discrete images by a filter with the matrix (one of the options):

| 0   1   0 |
| 1  -4   1 |
| 0   1   0 |

As seen in Fig. 17.3.2, as a result of applying the discrete Laplacian, large absolute values correspond to both vertical and horizontal brightness differences. The filter is thus able to find boundaries of any orientation. Boundaries in the image can be found by applying this filter and taking all pixels whose absolute value exceeds a certain threshold.

However, this algorithm has significant drawbacks. The main one is the uncertainty in the choice of the threshold value. For different parts of the image, an acceptable result is usually obtained at significantly different threshold values. In addition, difference filters are very sensitive to image noise.
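A minimal sketch of boundary detection with a discrete Laplacian and a threshold in Python (assuming SciPy; the threshold value is an illustrative choice, which is exactly the uncertainty discussed above).

import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)

lap = ndimage.laplace(image)            # discrete Laplacian of the image
edges = np.abs(lap) > 0.2               # boolean edge map (threshold is assumed)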

Two-dimensional cyclic convolution. As with one-dimensional signals, two-dimensional convolution can be performed in the domain of spatial frequencies using fast Fourier transform algorithms and multiplying the two-dimensional spectra of the image and of the filter kernel. It is also cyclic, and is usually performed in a sliding version. Taking the cyclicity into account, to compute the constant template of the kernel spectrum, the dimensions of the filter kernel mask are doubled along the axes and padded with zeros, and the same mask size is used to select the window sliding over the image, within which the FFT is performed. Implementing a FIR filter with an FFT is especially efficient if the filter has a large support area.
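A minimal sketch of convolution via the frequency domain in Python (assuming SciPy; the image and kernel are synthetic): fftconvolve zero-pads the kernel and the image, multiplies their two-dimensional spectra and transforms back, which for a large kernel is much faster than direct convolution.

import numpy as np
from scipy.signal import fftconvolve

image  = np.random.rand(512, 512)
kernel = np.random.rand(31, 31)
kernel /= kernel.sum()                          # normalize the kernel

filtered = fftconvolve(image, kernel, mode='same')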

Non-linear filters. In digital image processing, non-linear algorithms based on rank statistics are widely used to restore images damaged by various noise models. They make it possible to avoid additional image distortion when removing noise and to significantly improve filtering results on images with a high degree of noise.

Let us introduce the concept of an M-neighborhood of an image element A (x, y), which is central for this neighborhood. In the simplest case, the M-neighborhood contains N-pixels - points falling into the filter mask, including (or not including) the central one. The values ​​of these N-elements can be arranged in a variation series V (r), ranked in ascending (or descending) order, and certain moments of this series can be calculated, for example, the average value of the brightness mN and the variance dN. The calculation of the output value of the filter, which replaces the center sample, is performed using the formula:

B(x, y) = a·A(x, y) + (1 − a)·mN. (17.3.2)

The value of the coefficient a is related by a certain dependence to the statistics of the samples in the filter window, for example:

a = dN / (dN + k dS), (17.3.3)

where dS is the variance of the noise over the image as a whole or over an S-neighborhood with S > M and M ∈ S, and k is a confidence constant for the variance of the S-neighborhood. As follows from this formula, for k = 1 and dN ≈ dS we have a ≈ 0.5, and B(x, y) = (A(x, y) + mN)/2, that is, the output is formed equally from the value of the central sample and the mean value of the pixels of its M-neighborhood. As dN increases, the contribution of the central sample value to the result increases; as it decreases, the contribution of mN increases. The weight of the contribution of the mean over the M-neighborhood can be changed by means of the coefficient k.

The choice of the statistical function and the nature of the dependence of the coefficient a on it can be quite diverse (for example, according to the variances of the differences in the samples in the M-neighborhood with the central sample), and depends both on the size of the filter aperture and on the nature of the images and noise. In essence, the value of the coefficient a should specify the degree of damage to the central reference and, accordingly, the function of borrowing the samples from the M-neighborhood to correct it.
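A minimal sketch of the adaptive filter (17.3.2)-(17.3.3) in Python (assuming SciPy, a square M-neighborhood and a global noise-variance estimate dS; all parameter values are illustrative).

import numpy as np
from scipy import ndimage

def adaptive_filter(a_img, size=3, k=1.0, noise_var=None):
    a_img = a_img.astype(float)
    m_n = ndimage.uniform_filter(a_img, size=size)                    # local mean mN
    d_n = ndimage.uniform_filter(a_img ** 2, size=size) - m_n ** 2    # local variance dN
    d_n = np.maximum(d_n, 0.0)
    d_s = noise_var if noise_var is not None else d_n.mean()          # noise variance estimate dS
    a = d_n / (d_n + k * d_s + 1e-12)                                 # coefficient a, formula (17.3.3)
    return a * a_img + (1.0 - a) * m_n                                # formula (17.3.2)

result = adaptive_filter(np.random.rand(128, 128), size=3, k=1.0)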

The most simple and common types of nonlinear filters for image processing are threshold and median filters.

Threshold filtering is set, for example, as follows:

B(x, y) = A(x, y), if |A(x, y) − mN| ≤ p;   B(x, y) = mN, if |A(x, y) − mN| > p.

The magnitude p is the filtering threshold. If the value of the center point of the filter exceeds the average value of the samples mN in its M-neighborhood by the value of the threshold, then it is replaced by the average value. The threshold value can be either constant or functionally dependent on the value of the center point.

Median filtering is defined as follows:

B (x, y) = med (M (x, y)),

that is, the filtering result is the median value of the pixels of the neighborhood, the shape of which is determined by the filter mask. Median filtering can effectively remove noise that affects individual pixels independently: for example, "dead" pixels in digital photography, or "snow" noise, when some of the pixels are replaced by pixels of maximum intensity. The advantage of median filtering is that a "hot" pixel on a dark background will be replaced by a dark one, rather than "smeared" over the surroundings.

Median filtering has a pronounced selectivity with respect to array elements, which are a non-monotonic component of a sequence of numbers within the filter aperture. At the same time, the median filter leaves the monotonic component of the sequence unchanged. Due to this feature, median filters with an optimally selected aperture preserve sharp edges of objects without distortion, suppressing uncorrelated or weakly correlated noise and small-sized details.

Extremum filters are determined by the rules:

Bmin (x, y) = min (M (x, y)),

Bmax (x, y) = max (M (x, y)),

that is, the filter results are the minimum and maximum pixel values in the filter mask, respectively. Such filters are used, as a rule, for binary images.
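A minimal sketch of the rank filters above in Python (assuming SciPy; the image is synthetic): median, minimum and maximum over a 3×3 mask.

import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)

med  = ndimage.median_filter(image, size=3)    # med(M(x, y))
bmin = ndimage.minimum_filter(image, size=3)   # min(M(x, y))
bmax = ndimage.maximum_filter(image, size=3)   # max(M(x, y))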

17.4. IMAGE COMPRESSION

A typical image with a resolution of the order of 3000 × 2000 at 24 bits per pixel for color is 17 megabytes. For professional devices, the size of the resulting image raster can be much larger, the color depth is up to 48 bits per pixel, and the size of one image can be more than 200 megabytes. Therefore, image compression algorithms are highly relevant to reduce the amount of data representing an image.

There are two main classes of algorithms:

1. Lossless compression, when there is an inverse algorithm A⁻¹ such that for any image h, if A[h] = h1, then A⁻¹[h1] = h. Lossless compression is used in graphic formats such as GIF, PCX, PNG, TGA, TIFF, and in the processing of especially valuable primary information (medical images, aerial and space images, etc.), when even the slightest distortion is undesirable.

2. Compression with loss (lossy compression), if it does not provide the ability to accurately restore the original image. An algorithm for approximate image restoration paired to A will be denoted as A *. The pair (A, A *) is selected to provide high compression ratios while maintaining visual quality. Lossy compression is used in image formats: JPEG, JPEG2000, etc.

All algorithms and statements refer to both images and arbitrary sequences, the elements of which can take a finite number of values. It should be borne in mind that there are no ideal algorithms that compress any data set without loss.

Run-length encoding (RLE) algorithms are based on a simple principle: replacing repeating groups of elements of the original sequence with a pair (count, element), or with the count only.

Bit level. Consider the original data as a sequence of bits, for example, representing a black-and-white image. Several 0s or 1s usually occur in a row, so the number of consecutive identical digits can be encoded instead. The run counts must themselves be encoded in bits; assume each count ranges from 0 to 7 (a 3-bit code), with counts of ones and zeros alternating. For example, a run of 11 consecutive ones corresponds to the counts 7 0 4: 7 ones, 0 zeros, 4 ones. The longer the runs of identical bits, the greater the effect. Thus, a sequence of 21 ones, 21 zeros, 3 ones and 7 zeros is encoded by 12 three-bit counts, that is, the original 52-bit sequence is replaced by a 36-bit one.

Byte level. Suppose that a grayscale image is fed to the input, where 1 byte is allocated to the value of the pixel intensity, while the expectation of a long string of identical bits is significantly reduced.

We will split the input stream into bytes (code from 0 to 255) and encode repeated bytes in pairs (number, letter). A single byte can be left unchanged. So, the bytes AABBBCDAA are encoded (2A) (3B) (C) (D) (2A).
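A minimal byte-level RLE sketch in Python following the (count, element) scheme above; for simplicity, single bytes are stored here as (1, element) rather than left unchanged.

def rle_encode(data: bytes):
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i] and j - i < 255:
            j += 1
        out.append((j - i, data[i]))       # pair (count, byte value)
        i = j
    return out

def rle_decode(pairs):
    return bytes(b for count, b in pairs for _ in range(count))

encoded = rle_encode(b"AABBBCDAA")
print(encoded)                             # [(2, 65), (3, 66), (1, 67), (1, 68), (2, 65)]
assert rle_decode(encoded) == b"AABBBCDAA"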

However, modifications of this algorithm are rarely used by themselves (for example, in the PCX format), since the subclass of sequences on which the algorithm is effective is relatively narrow. Most often they are used as one of the stages of the compression pipeline.

Dictionary algorithms encode chains of elements of the input sequence rather than single elements. A dictionary of strings (built from the input sequence) is used to encode new ones.

The LZ77 algorithm was one of the first to use a dictionary. The last N already encoded elements of the sequence are used as a dictionary. During the compression process, the subsequence dictionary "slides" over the incoming sequence. The chain of elements at the output is encoded as follows: the position of the matching part of the processed chain of elements in the dictionary - offset (relative to the current position), length, the first element following the matched part of the chain. The length of the match chain is bounded from above by the number n. Accordingly, the task is to find the largest string from the dictionary that matches the sequence being processed. If there are no matches, then the zero offset, one length and the first element of the uncoded sequence are recorded.

The coding scheme described above leads to the concept of a sliding window, which consists of two parts:

A subsequence of already encoded elements of length N (the dictionary), called the search buffer;

The subsequence of length n from the chain of elements for which an attempt will be made to find a match is the look-ahead buffer.

Decoding a compressed sequence is a decryption of the recorded codes: each entry is matched with a string from a dictionary and an explicitly written element, after which the dictionary is shifted. The dictionary is recreated as the decoding algorithm runs.

This algorithm is the ancestor of a whole family of algorithms. Its advantages include a decent compression ratio on fairly large sequences and fast decompression. The disadvantages include a slow compression speed and a lower compression ratio than alternative algorithms.

LZW algorithm. The dictionary in this algorithm is a table that is filled with strings of elements as the algorithm runs. The compression process searches for the longest chain already written to the dictionary. Each time a new string of elements is not found in the dictionary, it is added to the dictionary, and the string code is written. In theory, there is no limitation on the size of the table, but the limitation on the size allows you to improve the compression ratio, since unnecessary (not found) chains accumulate. The more entries a table has, the more information needs to be allocated to store the codes.

Decoding consists in decrypting the codes directly, that is, in building a dictionary and outputting the corresponding strings. The dictionary is initialized in the same way as in the encoder. The advantages of the algorithm include a high compression ratio and a fairly high speed of both compression and decoding.
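A minimal LZW compression sketch in Python over byte strings; the table is initialized with all single-byte strings, and the table size is not limited here.

def lzw_compress(data: bytes):
    table = {bytes([i]): i for i in range(256)}    # initial dictionary: all single bytes
    next_code, current, out = 256, b"", []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate                    # keep growing the matched chain
        else:
            out.append(table[current])             # emit code of the longest known chain
            table[candidate] = next_code           # add the new chain to the dictionary
            next_code += 1
            current = bytes([byte])
    if current:
        out.append(table[current])
    return out

codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
print(codes)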

Entropy coding algorithms assign to each element of the sequence a code whose length corresponds to the probability of occurrence of the element. Compression occurs by replacing the elements of the original sequence, which all have the same length (each element takes the same number of bits), with elements of different lengths proportional to the negative logarithm of their probability; that is, elements that occur more often than the rest receive shorter codes.

The Huffman algorithm uses a variable-length prefix code with a special property: no shorter code coincides with the prefix (initial part) of a longer one. Such a code allows unambiguous decoding. The compression process consists in replacing each element of the input sequence with its code. The construction of the set of codes is usually carried out using so-called code trees.

Huffman's algorithm is two-pass. The first pass over the image creates a table of element weights, and during the second, coding occurs. There are implementations of the fixed table algorithm. It often happens that the prior probability distribution of the elements of the alphabet is unknown, since the entire sequence is not available at once, and adaptive modifications of the Huffman algorithm are used.
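A minimal two-pass Huffman sketch in Python (standard library only; the input string is illustrative): the first pass builds the weight table, the second replaces each element with its prefix code.

import heapq
from collections import Counter

def huffman_codes(data: bytes):
    # first pass: table of element weights
    heap = [[weight, idx, [sym, ""]] for idx, (sym, weight) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    idx = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]        # extend codes of the lighter subtree with 0
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]        # extend codes of the heavier subtree with 1
        heapq.heappush(heap, [lo[0] + hi[0], idx] + lo[2:] + hi[2:])
        idx += 1
    return {sym: code for sym, code in heap[0][2:]}

codes = huffman_codes(b"ABRACADABRA")
# second pass: encoding
encoded = "".join(codes[b] for b in b"ABRACADABRA")
print(codes, len(encoded), "bits")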

Lossy image compression. The amount of information required to store images is usually large. Classical algorithms, being general-purpose algorithms, do not take into account that the information being compressed is an image - a two-dimensional object, and do not provide a sufficient compression ratio.

Lossy compression is based on the characteristics of human perception of an image: the highest sensitivity in a certain color wavelength range, the ability to perceive the image as a whole, without noticing small distortions. The main class of images to which lossy compression algorithms are focused are photographs, images with smooth color transitions.

Image loss estimation. There are many measures for assessing the loss in images after their recovery (decoding) from compressed ones, but for all of them two images can be selected such that their measure of difference will be large enough, but the differences will be almost imperceptible to the eye. And vice versa - you can pick up images that are very different by eye, but have a small measure of difference.

The standard numerical measure of loss is usually the standard deviation (RMS) of the pixel values ​​of the reconstructed image from the original one. However, the most important “measure” of loss estimation is the opinion of the observer. The fewer differences (or better, their absence) the observer detects, the higher the quality of the compression algorithm. Lossy compression algorithms often give the user the ability to choose the amount of "lost" data, that is, the right to choose between the quality and size of the compressed image. Naturally, the better the visual quality with a higher compression ratio, the better the algorithm.

Fourier transform. In general, an image can be considered as a function of two variables defined at the points of a finite raster. The set of such functions on the points of a fixed finite raster forms a finite-dimensional Euclidean space, and the discrete Fourier transform, i.e., the spectral representation of the image, can be applied to them. It provides:

- Uncorrelatedness and independence of the spectrum coefficients, that is, the accuracy of representation of one coefficient does not depend on any other.

- Energy compaction. The transformation keeps the basic information in a small number of coefficients. This property is most pronounced in photorealistic images.

Spectral representation coefficients are the amplitudes of the spatial frequencies of the image. In the case of images with smooth transitions, most of the information is contained in the low-frequency spectrum.

The compression algorithm used in the JPEG format is based on the use of the discrete cosine Fourier transform. The compression scheme in the algorithm is a pipeline, where this transformation is only one of the stages, but one of the main ones. The algorithm contains the following basic operations:

1. Translation into YCbCr color space. Here Y is the luminance component, Cb and Cr are the chromaticity components. The human eye is more sensitive to brightness than color. Therefore, it is more important to maintain greater accuracy in Y transmission than in Cb and Cr transmission.

2. Discrete cosine transform (DCT). The image is divided into 8 × 8 blocks. A discrete cosine transform is applied to each block (separately for the Y, Cb and Cr components).

3. Reduction of high-frequency components in DCT matrices. The human eye practically does not notice changes in high-frequency components, therefore, the coefficients responsible for high frequencies can be stored with less accuracy.

4. Zigzag ordering of the matrices. This is a special traversal of the matrix that yields a one-dimensional sequence: first comes element T00, then T01, T10, T20, T11, T02, and so on. For typical photorealistic images, the nonzero coefficients corresponding to low-frequency components come first, followed by many zeros (the high-frequency components).

5. Compression first by the RLE method, and then by the Huffman method.

The image restoration algorithm works in the reverse order. The compression ratio ranges from 5 to 100 or more. At the same time, the visual quality of most photorealistic images remains good at compression ratios of up to 15. The algorithm and format are the most common ones for transferring and storing full-color images.
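A minimal sketch of steps 2-4 of such a pipeline for one 8×8 block in Python (assuming SciPy; the block is synthetic and the quantization matrix is an illustrative ramp, not the standard JPEG table).

import numpy as np
from scipy.fft import dctn

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128.0

coeffs = dctn(block, norm='ortho')                         # step 2: 2-D DCT of the block
q = 1.0 + np.arange(8)[:, None] + np.arange(8)[None, :]    # coarser steps for higher frequencies
quantized = np.round(coeffs / q)                           # step 3: reduce high-frequency accuracy

# step 4: zigzag ordering T00, T01, T10, T20, T11, T02, ...
order = sorted(((i, j) for i in range(8) for j in range(8)),
               key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))
zigzag = [quantized[i, j] for i, j in order]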

The wavelet transform of signals is a generalization of the classical Fourier transform. The term "wavelet" means "small (short) wave". Wavelets are a generic name for families of mathematical functions of a certain form that are local in time and frequency, and in which all functions are obtained from one basic function by means of its shifts and stretches along the time axis.

In lossy compression algorithms, as a rule, all operations of the compression pipeline are preserved, replacing the discrete Fourier transform with the discrete wavelet transform. Wavelet transforms have very good frequency-spatial localization and are superior to traditional Fourier transforms in this indicator. This makes it possible to apply stronger quantization, improving the properties of the sequence for subsequent compression. Image compression algorithms based on this transformation, at the same compression ratio, show better results in maintaining image quality.



Copyright © 2008 Davydov A.V.

Laboratory work No. 1

Image processing algorithms

Convolution operation

Convolution is a very widespread algorithm that can be used both for image preprocessing and for object recognition and identification. Let the image be specified by a two-dimensional brightness matrix F′, and the impulse response by a matrix H. The convolution of the matrix F with the kernel H can be determined by a formula of the following form:

R(i, j) = Σ (m = 1 … M2) Σ (n = 1 … N2) H(m, n) · F(i + m − 1, j + n − 1),

where M2×N2 is the size of the convolution kernel matrix. The size of matrix F is (M1 + M2 − 1) × (N1 + N2 − 1), where M1×N1 is the size of the original matrix F′. Matrix F is obtained from the original one by adding elements at the edges according to some rule in order to bring it to the required size. Usually the original matrix is padded at the edges with zeros: by half the width of matrix H on the left and on the right and, respectively, by half its height at the top and the same at the bottom. Then the size of the resulting matrix R will be the same as that of matrix F′.

Convolution can be calculated directly by "running" one matrix over the other, as shown above. Fig. 1 shows the scheme for calculating the convolution (the size of the mask matrix is taken to be 3×3). The convolution operator can be viewed as a matrix of coefficients (a mask) that is element-wise multiplied with the selected image fragment, after which the products are summed to obtain the new value of the element of the filtered image. This matrix can be of arbitrary size, not necessarily square.

Fig. 1. Implementation of the convolution operation.
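A minimal Python sketch of the described convolution (assuming NumPy; the image and mask are synthetic): the original image is zero-padded by half the mask size and the mask is "run" over it. As in the description above, the mask is applied without flipping.

import numpy as np

def convolve2d(image, mask):
    m2, n2 = mask.shape
    pad_y, pad_x = m2 // 2, n2 // 2
    padded = np.pad(image.astype(float), ((pad_y, pad_y), (pad_x, pad_x)))  # zero padding
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            region = padded[i:i + m2, j:j + n2]
            out[i, j] = np.sum(region * mask)       # element-wise product and summation
    return out

smoothed = convolve2d(np.random.rand(64, 64), np.full((3, 3), 1.0 / 9.0))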

Exercise

1. Implement an algorithm that performs the convolution of the original image with a mask matrix.

2. The size and type of the mask matrix are set by the user.

3. Use the following mask matrices to implement various image processing algorithms:

    • to smooth and suppress noise in the image, use a 3×3 mask of the form

        | 1  1  1 |
(1/9) · | 1  1  1 |
        | 1  1  1 |

    • to emphasize contours, contrast-enhancing masks are used (an increased central coefficient, with the sum of the coefficients equal to 1, see Section 17.3);

    • to extract contours, a Laplacian-type mask is used (the sum of the coefficients equal to 0).

4. Implement a median filter, which is used to suppress point and impulse noise. The pixel of the image and its neighbors in the considered area are arranged in a variation series (in ascending or descending order of pixel values), and the central value of this series is taken as the new pixel value. The result of median filtering is that random point noise contained in the image is effectively eliminated: any random abrupt change of pixel intensity within the area of interest ends up, after sorting, either at the top or at the bottom of the sorted values and is not taken into account, since the central value is always taken as the new value of the element.

5. Implement an embossing algorithm. Embossing is performed in the same way as the averaging or contour-emphasis algorithms: each pixel of the image is processed by a 3×3 embossing mask. For example, a mask containing a single 1 and a single −1 on opposite sides of the central element can be taken as the embossing kernel (one such variant is shown in the sketch below).

After the pixel value is processed by the embossing kernel, 128 is added to it. This makes the background pixel value a medium gray (red = 128, green = 128, blue = 128). Sums greater than 255 are clipped to 255.

In the embossed version of the image, the outlines appear to be extruded above the surface. The direction of the image illumination can be changed by changing the positions 1 and -1 in the kernel. If, for example, the values ​​1 and -1 are swapped, then the direction of illumination is reversed.
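A minimal embossing sketch in Python (assuming SciPy; the kernel shown is an assumed illustrative variant with a single 1 and a single −1): convolution with the embossing kernel, adding 128 and clipping to the 0-255 range.

import numpy as np
from scipy import ndimage

emboss_kernel = np.array([[0,  1, 0],
                          [0,  0, 0],
                          [0, -1, 0]], dtype=float)   # swap 1 and -1 to reverse the illumination

image = np.random.randint(0, 256, (128, 128)).astype(float)
embossed = np.clip(ndimage.convolve(image, emboss_kernel) + 128.0, 0, 255).astype(np.uint8)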

6. Water colorization of the image. The watercolor filter transforms the image, and after processing it looks like it was painted in watercolor:

    The first step in applying the watercolor filter is to smooth the colors in the image. One of the smoothing methods is median color averaging at each point: the color values of each pixel and of its 24 neighbors (a 5×5 mask) are arranged in a variation series in descending or ascending order, and the median (thirteenth) color value of the series is assigned to the central pixel.

    After smoothing the colors, a contour-emphasizing filter must be applied to highlight the boundaries of the color transitions.

Representation of images

There are two main types of image representations - vector and raster.

In the vector representation, the image is described by a set of lines (vectors) that contain the coordinates of the start and end points, the curvature of the lines and other geometric characteristics; the rules for constructing various areas and the color characteristics are also described. In other words, a vector representation requires forming a certain mathematical model. Therefore, the vector representation is used mainly for solving problems of image synthesis, although some image recognition algorithms require exactly the vector representation, which must be obtained from the original image.

A raster image is one or more matrices that describe the spatial distribution of image characteristics on a certain Cartesian coordinate grid. In this case, the image is built from a multitude of points and has a raster structure. The main element of a raster image is the pixel (from "picture element"), which has coordinates in the raster coordinate system and some attributes (color, brightness, transparency, etc.). The number of pixels along the X and Y coordinates (horizontally and vertically) sets the resolution (dimension) of the image representation. The pixel color is specified by its depth: the number of bits required to specify any color.

Raster images, depending on the methods for setting the color of the pixel and the properties of the original image, are divided into:

Binary

Halftone

Palette

Full color

In the binary representation, the color of a pixel can be either white or black and is encoded by one bit. The image is a matrix, each element I(i, j) of which has the value 0 or 1, where i is the row number and j is the column number of the element corresponding to the given pixel (Fig. 1).

In grayscale (halftone) images, pixels represent brightness values. The indices of the matrix describing a grayscale image specify the position of the pixel on the raster, and the value of the matrix element sets its brightness I(i, j) (Fig. 2).

Palette images are described by two matrices (Fig. 3). One stores the values ​​of the indices, which specify the reference to the row of the palette matrix. The palette matrix is ​​a color map. It contains 3 groups of columns - corresponding to the red "R", green "G" and blue "B" colors. They also set the color of the corresponding pixel.

The palette is an Nc × 3 matrix, where Nc is the number of colors.


Full-color images are built in RGB format and represent three matrices R (i, j), G (i, j), B (i, j). The corresponding elements of each matrix contain the values ​​of the intensities of red, green and blue colors for the pixel specified by the matrix indices. Thus, a full-color image does not have a color map and the color of each pixel is represented by three numbers taken from the corresponding matrices (Fig. 4).

The format of numbers in matrices can be either integer or floating point. The first case refers to the so-called digitized images obtained using various devices - scanners, digital cameras, television cameras, etc. It is in this format that information about images is stored in standard graphic files.

The second option is used for the internal representation of images during their processing. In this case, it is convenient to normalize the intensity data to a single range, for example [0, 1], perform the various calculations in floating point, and then convert the result back to the original integer form. This method reduces calculation errors and improves the accuracy of the processing result.

For color images, one of the parameters is the maximum number of colors that can be represented in the given format. The most commonly used images have 16, 256, 65536 (High Color) and 16.7 million (True Color) colors.

Fig. 1. Binary image: a matrix of 0 and 1 values.

Fig. 2. Grayscale image: a matrix of brightness values.

Fig. 3. Palette image: an index matrix and a palette matrix (color map).

A full-color image can be represented not only in RGB format, but also using other color systems.

In the HSB system, color is represented by three characteristics: Hue (color tone), Saturation, and Brightness.

It is believed that this color system corresponds to the peculiarities of human perception of color.

In the LAB system, color is considered as a combination of lightness and two independent chromaticity values that determine the true color of a pixel. Chromaticity A selects a color in the range from magenta to green; chromaticity B, the second color component, is selected from the range from yellow to cyan.

There are other systems for representing color. Naturally, they are all related, and one representation can be obtained from another. The variety of color systems is due to the tasks solved with their help: for example, it is more convenient to perform color correction in the LAB system, to reproduce an image on a monitor screen in the RGB system, and to print using the CMYK representation. However, in any case, image processing and recognition work with a raster representation of images containing one or more matrices.

Classification of preprocessing algorithms

Image preprocessing algorithms can be subdivided into different groups depending on the classifying feature. All preprocessing algorithms must either improve the quality of images in some sense or transform them into a form most convenient for subsequent processing.

Algorithms aimed at improving the color reproduction of an image are called color correction algorithms. This group also includes algorithms that work with halftone images that change their brightness and contrast characteristics.

Algorithms aimed at processing the spatial characteristics of images are called spatial filtering algorithms. This group includes algorithms for interference (noise) suppression, spatial smoothing and spatial sharpening algorithms, and algorithms for suppressing and amplifying spatial frequencies.

Algorithms that perform geometric operations on an image are called geometric processing algorithms. These include:


Cropping an image - selection from the original image of a certain part of a rectangular shape;

Resize the image. These algorithms use various interpolation methods that allow either to correctly fill in the missing pixels in the enlarged image, or to recalculate the pixel values ​​when the image is reduced.

Rotate the image. These algorithms rotate the original image by a given angle, correctly recalculating the pixel values ​​using various interpolation methods.

Algorithms that perform transformations from one color system to another are called color conversion algorithms. They also include algorithms for converting color images to grayscale and binarization algorithms that convert the original image into a binary one.

Algorithms that highlight some areas in the original image according to various, often informal conditions are called segmentation algorithms. An example of such an algorithm can be, for example, an algorithm that should select areas of text and graphic information on a document image, or an algorithm that selects areas in a text image that refer to individual words.


Spatial filtering algorithms

Spatial filtering of an image in mathematical form is a discrete convolution of the discrete image with the impulse response of the spatial filter:

If(i, j) = Σ (m = N11 … N21) Σ (n = N12 … N22) Im(i − m, j − n) · h(m, n), where:

Im, If are the matrices of the original and filtered images, h is the matrix of the impulse response of the filter,

N11, N21 are the lower and upper boundaries of the impulse-response columns, N12, N22 are the left and right boundaries of its rows.

The impulse-response matrix can be obtained by calculating the spatial filter from the specified parameters. Methods for calculating spatial filters are covered in the extensive literature on digital filtering. For practical calculations, standard mathematical packages can be used; for example, the "MATLAB" system includes the "Image Filter Design" filter calculation tool.

Note that filtering can also be performed in the frequency domain. In that case, the filtering order is as follows:

Convert an image from spatial domain to frequency domain using 2D discrete Fourier transform

Perform element-wise multiplication of the frequency matrix of the image by the frequency matrix of the filter

Transform the obtained result into a spatial domain using the inverse two-dimensional discrete Fourier transform.

Im(x, y) → FFT → Im(fx, fy),

If(fx, fy) = Im(fx, fy) · H(fx, fy),

If(fx, fy) → IFFT → If(x, y).

Filtering images in the frequency domain is rarely used because of the large amount of computation. However, this method of filtering is widely used in theoretical analysis of image processing options: it allows one to see clearly what kind of filtering is needed. For example, if sharp changes in brightness must be highlighted in the image, high-pass filters are obviously required; on the contrary, to get rid of high-frequency noise (ripple, individual spikes, etc.), low-pass filters are needed. Specific filter parameters are selected based on frequency analysis of the interference and the properties of the original image. A minimal sketch of this procedure is given below.
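A minimal sketch of frequency-domain filtering in Python (assuming NumPy; the Gaussian low-pass frequency response H and its width are illustrative choices): two-dimensional FFT of the image, element-wise multiplication by the filter's frequency response, inverse FFT.

import numpy as np

image = np.random.rand(256, 256)

spectrum = np.fft.fft2(image)                             # Im(fx, fy)
fy = np.fft.fftfreq(image.shape[0])[:, None]
fx = np.fft.fftfreq(image.shape[1])[None, :]
H = np.exp(-(fx**2 + fy**2) / (2 * 0.05**2))              # low-pass frequency response H(fx, fy)

filtered = np.real(np.fft.ifft2(spectrum * H))            # If(x, y)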