Analytical data processing methods for decision support. Online analytical data processing (OLAP)

3.4 Methods of analytical data processing

For existing data warehouses to support management decision-making, the information must be presented to the analyst in the required form; in other words, the analyst must have well-developed tools for accessing and processing the data in the warehouse.

Very often, information and analytical systems designed for direct use by decision-makers are extremely easy to use but severely limited in functionality. Such static systems are called Executive Information Systems (EIS). They contain predefined sets of queries and, while sufficient for everyday review, cannot answer all the questions that may arise when decisions are being made. The output of such a system is, as a rule, a multi-page report, and after studying it carefully the analyst is left with a new series of questions. However, each new query not foreseen in the design of such a system must first be formally described, coded by a programmer, and only then executed. The waiting time can amount to hours or days, which is not always acceptable.

Online analytical processing (On-Line Analytical Processing, OLAP) is a key component of data warehouse organization. The OLAP concept was described in 1993 by Edgar Codd and imposes the following requirements on applications for multidimensional analysis:

- multidimensional conceptual representation of data, including full support for hierarchies and multiple hierarchies (a key requirement of OLAP);

- providing the user with the analysis results in a reasonable time (usually no more than 5 s), even at the cost of a less detailed analysis;

- the ability to carry out any logical and statistical analysis, typical for this application, and save it in a form accessible to the end user;

- multi-user access to data with support of appropriate locking mechanisms and authorized access means;

- the ability to access any necessary information, regardless of its volume and storage location.

An OLAP system consists of many components. At the highest level of presentation, the system includes a data source, a multidimensional database (MDB) that provides the ability to implement a reporting engine based on OLAP technology, an OLAP server, and a client. The system is built on the client-server principle and provides remote and multi-user access to the MDB server.

Consider the components of an OLAP system.

Sources. The source in an OLAP system is the server that supplies the data for analysis. Depending on the area in which the OLAP product is used, the source can be a data warehouse, a legacy database containing common data, a set of tables combining financial data, or any combination of the above.

Data warehouse. Raw data is collected and stored in a repository designed in accordance with the principles of building data warehouses. The data warehouse is a relational database (RDB). Its main table (the fact table) contains the numerical values of the indicators for which statistical information is collected.

Multidimensional database. The data warehouse serves as the supplier of information for a multidimensional database, which is a collection of objects. The main classes of these objects are dimensions and measures. Dimensions are sets of values (parameters) by which the data is indexed, for example time, regions, type of institution, etc. Each dimension is populated with values from the corresponding dimension tables of the data warehouse. The set of dimensions defines the space of the process under study. Measures (indicators) are the multidimensional data cubes (hypercubes). A hypercube contains the data itself as well as the aggregate totals for the dimensions that make up the indicator. Indicators constitute the main content of the MDB and are populated from the fact table. Along each axis of the hypercube, the data can be organized into a hierarchy representing different levels of detail. This allows hierarchical dimensions to be created, which are later used to aggregate or drill down into the data presentation during analysis. A typical example of a hierarchical dimension is a list of territorial objects grouped by districts and regions.
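A rough illustration of these notions is the minimal sketch below; it is not taken from the text, it assumes pandas is installed, and the dimension and measure names (region, city, year, sales) are invented for the example. The dimensions index the data, the measure is aggregated along them, and the region-city hierarchy allows roll-up.

```python
# A minimal sketch, assuming pandas; dimension and measure names are hypothetical.
import pandas as pd

facts = pd.DataFrame({
    "region": ["North", "North", "South", "South"],   # dimension (hierarchy level 1)
    "city":   ["Oslo",  "Bergen", "Rome",  "Naples"], # dimension (hierarchy level 2)
    "year":   [2023,    2023,     2023,    2024],     # dimension
    "sales":  [120,     80,       200,     150],      # measure (indicator)
})

# A two-dimensional slice of the hypercube: the measure aggregated over dimensions.
cube = facts.pivot_table(values="sales", index=["region", "city"],
                         columns="year", aggfunc="sum", fill_value=0)
print(cube)

# Rolling up the hierarchical dimension (dropping the city level) gives a coarser view.
print(facts.groupby("region")["sales"].sum())
```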

Server. The OLAP server is the application part of the OLAP system. This component performs all the work (depending on the system model) and stores all the information to which active access is provided. The server architecture is governed by various concepts; in particular, the main functional characteristic of OLAP products is whether an MDB or an RDB is used for data storage.

Client application. Data structured in this way and stored in the MDB is available for analysis through the client application. The user gains the ability to access the data remotely, formulate complex queries, generate reports, and obtain arbitrary subsets of the data. Obtaining a report comes down to selecting specific dimension values and building a section (slice) of the hypercube. The cross-section is determined by the selected dimension values; the data for the remaining dimensions are summarized.

OLAP on the client and on the server. Multidimensional data analysis can be carried out with various tools, which can be conditionally divided into client-side and server-side OLAP tools.

OLAP client tools (for example, Pivot Tables in Excel 2000 from Microsoft or ProClarity from Knosys) are applications that calculate and display aggregate data. In this case, the aggregate data themselves are contained in the cache inside the address space of such an OLAP tool.

If the source data is contained in the desktop DBMS, the calculation of the aggregate data is performed by the OLAP tool itself. If the source of the original data is a server DBMS, many of the client OLAP tools send SQL queries to the server and as a result receive the aggregate data calculated on the server.

Typically, OLAP functionality is implemented in statistical data processing tools and in some spreadsheets.

Many development tools contain libraries of classes or components that allow you to create applications implementing the simplest OLAP functionality (such as the Decision Cube components in Borland Delphi and Borland C++ Builder). In addition, many companies offer ActiveX controls and other libraries that provide similar functionality.

Client OLAP tools are used, as a rule, with a small number of dimensions (usually no more than six) and a small variety of values for these dimensions, since the resulting aggregate data must fit into the address space of such a tool, and their number grows exponentially as the number of dimensions increases.

Many client OLAP tools allow the contents of the cache with aggregate data to be saved as a file so that they do not have to be recalculated. This capability is also often used to hand aggregate data over to other organizations or for publication.

The idea of storing a cache with aggregate data in a file was further developed in server OLAP tools (for example, Oracle Express Server or Microsoft OLAP Services), in which the saving and changing of aggregate data, as well as the maintenance of the storage containing them, are carried out by a separate application or process called the OLAP server. Client applications can query such multidimensional storage and receive particular data in response. Some client applications can also create such repositories or update them to reflect changed source data.

The advantages of server-based OLAP tools over client OLAP tools are similar to the advantages of server DBMSs over desktop ones: when server-based tools are used, aggregate data is calculated and stored on the server, and the client application receives only the results of queries against it, which generally reduces network traffic, query execution time, and the resource requirements of the client application.

3.5 Technical aspects of multidimensional data storage

Multidimensionality in OLAP applications can be divided into three levels:

    1. Multidimensional data representation - end-user tools that provide multidimensional visualization and manipulation of data; this layer abstracts from the physical data structure and treats the data as multidimensional.

    2. Multidimensional processing - a means (language) for formulating multidimensional queries (the traditional relational SQL language is unsuitable here) and a processor that can parse and execute such queries.

    3. Multidimensional storage - means of physically organizing the data so that multidimensional queries are executed efficiently.

The first two levels are mandatory in all OLAP tools. The third level, although widespread, is not required, since the data for the multidimensional view can also be retrieved from ordinary relational structures. The multidimensional query processor, in this case, translates the multidimensional queries into SQL queries that are executed by the relational DBMS.
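To make this translation step concrete, the sketch below (an illustration under assumptions, not the text's own example: the table, column and function names are invented) turns a multidimensional "slice" request into an ordinary SQL query executed by a relational DBMS, here an in-memory SQLite database.

```python
# A minimal ROLAP-style sketch: a multidimensional request is translated into SQL.
# Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("North", "chips", 2023, 10.0),
    ("North", "cola",  2023, 20.0),
    ("South", "chips", 2023, 15.0),
    ("South", "cola",  2024, 25.0),
])

def slice_query(fixed, group_by, measure="SUM(amount)"):
    """Fix some dimensions, group by the others - i.e. build SQL for a cube slice."""
    where = " AND ".join(f"{dim} = ?" for dim in fixed)
    sql = (f"SELECT {', '.join(group_by)}, {measure} FROM sales "
           f"WHERE {where} GROUP BY {', '.join(group_by)}")
    return sql, tuple(fixed.values())

sql, params = slice_query({"year": 2023}, ["region"])
print(sql)                                   # the generated relational query
print(conn.execute(sql, params).fetchall())  # [('North', 30.0), ('South', 15.0)]
```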

In any data warehouse - both conventional and multidimensional - along with detailed data retrieved from operational systems, aggregated indicators (total indicators) are also stored, such as the sums of sales volumes by month, by product category, etc. Aggregates are stored explicitly for the sole purpose of speeding up the execution of queries. Indeed, on the one hand, as a rule, a very large amount of data is accumulated in the warehouse, and on the other, analysts in most cases are interested not in detailed, but generalized indicators. And if millions of individual sales had to be summed up each time to calculate annual sales, the speed would most likely be unacceptable. Therefore, when loading data into a multidimensional database, all summary indicators or part of them are calculated and saved.

However, the use of aggregated data is fraught with disadvantages. The main disadvantages are the increase in the amount of stored information (when new dimensions are added, the amount of data that makes up the cube grows exponentially) and the time it takes to load them. Moreover, the amount of information can increase tens and even hundreds of times. For example, in one of the published standard tests, a full aggregate count for 10 MB of raw data required 2.4 GB, i.e. the data grew 240 times!

The degree to which the data volume increases when aggregates are calculated depends on the number of dimensions in the cube and on the structure of those dimensions, that is, on the ratio of the number of "parents" and "descendants" at the different levels of a dimension. To solve the problem of storing aggregates, sophisticated schemes are used that make it possible to achieve a significant increase in query performance while calculating far from all of the possible aggregates.

Both the initial and the aggregate data can be stored either in relational or in multidimensional structures. Accordingly, three ways of storing multidimensional data are currently used:

MOLAP (Multidimensional OLAP) - Source and aggregate data is stored in a multidimensional database. Storing data in multidimensional structures allows you to manipulate data as a multidimensional array, so that the speed of calculating aggregate values ​​is the same for any of the dimensions. However, in this case, the multidimensional database turns out to be redundant, since the multidimensional data completely contains the original relational data.

These systems provide a full cycle of OLAP processing. They either include, in addition to the server component, their own integrated client interface, or use external spreadsheet programs to communicate with the user.

ROLAP (Relational OLAP) - the original data remains in the same relational database where it was originally located. Aggregate data is placed in service tables specially created for their storage in the same database.

HOLAP (Hybrid OLAP) - the original data remains in the same relational database where it was originally located, and the aggregate data is stored in the multidimensional database.

Some OLAP tools support storing data only in relational structures, some only in multidimensional structures. However, most modern OLAP server-based tools support all three methods of storing data. The choice of storage method depends on the size and structure of the source data, the requirements for the speed of execution of queries and the frequency of updating the OLAP cubes.

3.6 Data mining (Data Mining)

The term Data Mining denotes the process of finding correlations, trends and relationships by means of various mathematical and statistical algorithms (clustering, regression and correlation analysis, etc.) for decision support systems. In the process, the accumulated information is automatically generalized into information that can be characterized as knowledge.

The modern Data Mining technology is based on the concept of templates reflecting the patterns inherent in data subsamples and constituting the so-called hidden knowledge.

The search for patterns is performed by methods that do not use any a priori assumptions about these subsamples. An important feature of Data Mining is that the sought patterns are non-standard and non-obvious. In other words, Data Mining tools differ from statistical data processing tools and OLAP tools in that, instead of checking relationships between data that users have assumed in advance, they are able, on the basis of the available data, to find such relationships independently and to build hypotheses about their nature.

In general, the data mining process consists of three stages:

    1) identifying patterns (free search);

    2) using the revealed patterns to predict unknown values (predictive modeling);

    3) analysis of exceptions, designed to identify and interpret anomalies in the found patterns.

Sometimes an intermediate stage of checking the reliability of the found patterns between their finding and using (the stage of validation) is explicitly distinguished.

There are five standard types of patterns identified by Data Mining methods:

1. Association allows stable groups of objects to be identified between which implicitly specified links exist. The frequency of occurrence of an individual item or group of items, expressed as a percentage, is called prevalence (support). A low prevalence (less than one thousandth of a percent) suggests that such an association is not significant. Associations are written as rules A => B, where A is the premise and B is the consequence. To determine the importance of each obtained association rule, a value called the confidence of A with respect to B (the strength of the link between A and B) is calculated. Confidence shows how often B appears when A appears. For example, if confidence(A => B) = 20%, this means that when product A is purchased, product B is also purchased in every fifth case.

A typical example of the use of the association is the analysis of the structure of purchases. For example, when conducting a study in a supermarket, it can be established that 65% of those who bought potato chips also take Coca-Cola, and if there is a discount for such a set, Cola is purchased in 85% of cases. Results like these are valuable in shaping marketing strategies.
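A minimal sketch of how prevalence (support) and confidence could be computed for such rules follows; the transactions and product names are invented for illustration and are not taken from the text.

```python
# Illustrative computation of support (prevalence) and confidence for A => B.
transactions = [
    {"chips", "cola"},
    {"chips", "cola", "beer"},
    {"chips", "bread"},
    {"cola", "bread"},
    {"chips", "cola"},
]

def support(itemset, transactions):
    """Share of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """How often the consequent appears when the antecedent appears."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"chips"}, transactions))               # prevalence of chips: 0.8
print(confidence({"chips"}, {"cola"}, transactions))  # confidence of chips => cola: 0.75
```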

2. Sequence is a method of identifying associations over time. In this case, rules are defined that describe the sequential occurrence of certain groups of events. Such rules are essential for building scenarios. In addition, they can be used, for example, to form a typical set of prior sales that may entail subsequent sales of a particular product.

3. Classification is a generalization tool. It allows one to move from considering single objects to generalized concepts that characterize certain sets of objects and are sufficient for recognizing objects belonging to these sets (classes). The essence of the concept-formation process is finding the patterns inherent in the classes. Many different features (attributes) are used to describe the objects. The problem of concept formation from feature descriptions was formulated by M.M. Bongart. Its solution is based on two basic procedures: training and testing. In the training procedure, a classifying rule is constructed from the processing of a training set of objects. The testing (examination) procedure consists in using the obtained classifying rule to recognize objects from a new (examination) sample. If the test results are satisfactory, the learning process ends; otherwise the classifying rule is refined in a re-learning process.

4. Clustering is the distribution of information (records) from the database into groups (clusters) or segments with the simultaneous determination of these groups. In contrast to classification, no preliminary assignment of classes is required for the analysis here.

5. Time series forecasting is a tool for determining trends in the changes of the attributes of the objects under consideration over time. Analysis of the behavior of time series makes it possible to predict the values of the characteristics being studied (a minimal forecasting sketch is given below).
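As a simple illustration of the forecasting idea (the series values and the moving-average rule below are assumed for the example, not the text's method), the next value of a time series can be estimated from recent observations:

```python
# Illustrative only: forecast the next point of a time series as a moving average.
series = [120, 132, 128, 141, 150, 147, 158, 163]

def moving_average_forecast(values, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = values[-window:]
    return sum(recent) / len(recent)

print(moving_average_forecast(series))   # (147 + 158 + 163) / 3 = 156.0
```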

To solve such problems, various Data Mining methods and algorithms are used. Due to the fact that Data Mining has developed and develops at the intersection of disciplines such as statistics, information theory, machine learning, database theory, it is quite natural that most Data Mining algorithms and methods were developed based on various methods from these disciplines.

From the variety of existing data mining methods, the following can be distinguished:

    regression, variance and correlation analysis (implemented in most modern statistical packages, in particular in the products of SAS Institute, StatSoft, etc.);

    methods of analysis in a specific subject area, based on empirical models (often used, for example, in inexpensive financial analysis tools);

    neural network algorithms - a method of simulating processes and phenomena that allows complex dependencies to be reproduced. The method is based on the use of a simplified model of a biological brain and consists in the fact that the initial parameters are considered as signals that are transformed in accordance with the existing connections between "neurons", and the response of the entire network to the initial data is considered as the response resulting from the analysis. In this case, connections are created by so-called network training on a large sample containing both the initial data and the correct answers. Neural networks are widely used to solve classification problems;

    fuzzy logic is used to process data with fuzzy truth values that can be represented by a variety of linguistic variables. Fuzzy knowledge representation is widely used to solve classification and forecasting problems, for example in the XpertRule Miner system (Attar Software Ltd., Great Britain), as well as in AIS, NeuFuz, etc.;

    inductive inference allows generalizations of the facts stored in the database to be obtained. In the process of inductive learning, a specialist supplying hypotheses may be involved; this is called supervised learning. The search for generalization rules can also be carried out without a teacher, by automatically generating hypotheses. In modern software, as a rule, both methods are combined, and statistical methods are used to test the hypotheses. An example of a system using inductive inference is XpertRule Miner, developed by Attar Software Ltd. (Great Britain);

    reasoning based on similar cases (the "nearest neighbor" method, Case-Based Reasoning - CBR) is based on searching the database for situations whose descriptions are similar in a number of features to the given situation. The principle of analogy allows us to assume that the results of similar situations will also be close to each other. The disadvantage of this approach is that it does not create any models or rules generalizing previous experience. In addition, the reliability of the inferred results depends on the completeness of the description of the situations, as in the processes of inductive inference. Examples of systems using CBR are KATE Tools (Acknosoft, France) and Pattern Recognition Workbench (Unica, USA);

    decision trees - a method of structuring the problem in the form of a tree graph whose vertices correspond to production rules that allow the data to be classified or the consequences of decisions to be analyzed (see the sketch after this list). This method gives a visual representation of the system of classifying rules, provided there are not too many of them. Simple problems are solved by this method much faster than with neural networks. For complex problems and for some data types, decision trees may not be appropriate. In addition, the method has a significance problem: one of the consequences of hierarchical data clustering is the absence of a large number of training examples for many special cases, so the classification cannot be considered reliable. Decision tree methods are implemented in many software tools, namely C5.0 (RuleQuest, Australia), Clementine (Integral Solutions, UK), SIPINA (University of Lyon, France), IDIS (Information Discovery, USA);

    evolutionary programming - search for and generation of an algorithm expressing the interdependence of data, based on an initially specified algorithm that is modified during the search; sometimes the search for interdependencies is carried out among specific types of functions (for example, polynomials);

    limited search algorithms that compute combinations of simple logical events in subgroups of data.
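As mentioned in the decision-tree item above, here is a minimal sketch assuming scikit-learn is available; the features, class labels and data values are hypothetical and only illustrate training and applying a classifying rule.

```python
# A minimal decision-tree sketch; features and classes are invented for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Training set: [age, income] -> response class ("yes"/"no").
X_train = [[25, 30], [40, 80], [35, 60], [50, 120], [23, 25], [45, 90]]
y_train = ["no", "yes", "yes", "yes", "no", "yes"]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)

# The learned classifying rule can be inspected as a readable tree of conditions.
print(export_text(tree, feature_names=["age", "income"]))

# Examination (test) sample: apply the rule to new objects.
print(tree.predict([[30, 40], [48, 100]]))
```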

3.7 Integration of OLAP and Data Mining

Online analytical processing (OLAP) and data mining (Data Mining) are two parts of the decision support process. However, today most OLAP systems focus only on providing access to multidimensional data, and most data mining tools working in the field of patterns deal with one-dimensional data perspectives. To increase the efficiency of data processing for decision support systems, these two types of analysis should be combined.

Currently, the composite term "OLAP Data Mining" (multidimensional mining) has appeared to denote such a combination.

There are three main ways to form "OLAP Data Mining":

    "Cubing then mining". The ability to perform mining analysis should be provided on any result of a query to a multidimensional conceptual representation, that is, over any fragment of any projection of a hypercube of indicators.

    "Mining then cubing". Like the data extracted from the warehouse, mining results must be presented in hypercube form for subsequent multidimensional analysis.

    "Cubing while mining". This flexible method of integration allows you to automatically activate the same type of intelligent processing mechanisms over the result of each step of multivariate analysis (transition) between the levels of generalization, extraction of a new fragment of the hypercube, etc.).


Topic 6

CORPORATE INFORMATION SYSTEMS FOR PROCESSING ECONOMIC INFORMATION

Corporate information technology concept

The essence and significance of corporate information technology

Among the variety of business software, the term "information technology in corporate governance" traditionally refers to "integrated management automation systems". They are also known under other names: enterprise-scale systems, corporate information systems (CIS), corporate (or integrated) management systems (KSU), and automated control systems (ACS).

As a rule, integrated management automation systems are "basic" universal solutions suitable for various types of enterprises, covering first of all financial management, inventory management, and purchase and sales management. But these same systems often also have industry-specific solutions that reflect particular specifics and contain an appropriate regulatory and reference base.

For example, the SAP R/3 solution for the aviation industry supports accounting and control of the serial numbers of all aircraft parts, their service life, and their scheduled replacement or repair, which ensures not only production reliability but also passenger safety.

Since integrated management systems are focused primarily on large enterprises with multidisciplinary structures, they not only offer a developed set of functions but also provide reliable storage and processing of large volumes of information, using powerful platforms and system-level tools for multi-user work.

Modern information technologies, communications and the Internet allow solving problems of remote access to a single database, which is also important for corporate governance.

The concept of system construction

Although most developers call their software products management systems (of the enterprise, warehouse, finances, etc.), in essence almost all software used in corporate governance consists of systems that record the facts and documents of financial and economic activity - accounting systems with the ability to build reports and references in the sections permitted by their analytical features. That is, structured information is entered into the database, and its structure is defined, to one degree or another, by interrelated reference books, classifiers, parameters and standard document forms. From the information available in the database, the so-called "slices" are built, pulled out and assembled by the system's tools. Having received reports and references based on such data (often called analytical reports), management can make decisions. This is the typical concept and the typical technology of working with systems of the class in question.



It is no coincidence that software products as different in functional content, system architecture, purpose and use as "Galaxy", "BEST" and "1C: Enterprise" are similar in the principles of information organization, in the technology of its formation and processing, and in the methods of interaction with the systems.

Nevertheless, enterprises such as OJSC Uralelectromed put forward such demanding and varied requirements for corporate governance tools that it becomes necessary to build the tools on a multi-level basis. The core of the system, which contains only program code, usually lies at the center. The next conceptually important element is the system's built-in toolkit, which allows the system to be configured at workstations without changing the program code, specific operations to be performed, new forms of primary and reporting documents to be entered and existing ones to be changed, and other means of parametric adjustment to be used. More advanced systems have built-in tools for creating various models of the enterprise: informational, organizational, functional, etc. And, finally, there is the database itself.

Analytical information processing

Planning the activities of an enterprise, obtaining operational information and making the correct decision on the basis of its analysis involve processing large amounts of data. Reports generated in corporate accounting systems usually lack flexibility: they cannot be "rotated", "expanded" or "collapsed" to obtain the desired representation of the data, including a graphical one. The more "slices" and "cuts" of the data can be made, the more realistically the picture of the enterprise can be pictured and the better the decision on managing business processes. Tasks of this kind require mathematical and economic modeling as well as high performance. An analytical module is available in the "RepKo" system; better known is the "Triumph-Analytica" system ("PARUS" Corporation - "Torah Center"). Accounting systems build references in various "sections" from the information stored in the database and simply present what exists, whereas analytical systems build new information according to specified parameters or criteria, optimizing it for specific purposes. Therefore, a special tool for viewing and visualizing information is more often needed, namely online analytical processing (OLAP). It provides a set of convenient and high-speed means for accessing, viewing and multidimensionally analyzing the information accumulated in the warehouse.

OLAP technologies are used to model a situation according to the “what if…” scheme, and to compile a variety of analytical reports. There are specialized Western software products.

Typically, information from corporate management systems is transferred to specialized programs for analytical data processing. Many domestic developers try to solve these problems on their own, for example Nikos-Soft (the NS-2000 system), Cepheus (the Etalon corporate management system), KOMSOFT (KOMSOFT-STANDARD 2.0), etc.

6.4. Prospects for the development and use of corporate information technologies

In addition to the development and use of modern tools and platforms, as well as system tools, the development of domestic corporate systems presupposes their functional saturation, especially in terms of production.

Despite the widespread passion for the implementation of management standards, the leading players in the domestic software market are developing industry solutions for various types of industries.

Firms' fears of revealing the "confidentiality" of their developments are diminishing, which helps them consolidate their efforts to integrate their products rather than develop everything from "a" to "z" on their own. Today no one has enough resources. It takes years to comprehend a new concept and to develop a project and a system - a system whose quality depends on what is put into it. In addition, the requirement to integrate software products is also put forward by enterprises that wish to keep their existing (as a rule, specialized) working systems and to combine them informationally with newly acquired ones.

Integration is also required for products from different manufacturers, in order to combine complex solutions with specialized ones: budgeting, financial and economic analysis, customer service, analytical data processing, etc.

It should be noted that it is not so much the management systems themselves that are promising as a simple and universal tool for creating them, intended for qualified intermediaries between the developer and the end user. At present these functions are performed, with varying success, by system administrators and analysts.

If such a tool is available, "ready-made" standard solutions for all enterprises in all industries will be in demand.

The Internet as an additional tool for business development can be effectively used only in the presence of an integrated management system.

Although modern information and communication technologies, including the Internet, make it possible to organize the rental of software, it is premature to talk about the near-term prospect of using such opportunities, especially in our country. And not so much for reasons of confidentiality as for lack of order and reliable means of communication.

Attempts to implement information technologies at domestic enterprises, and experience of using them even in partial form, have shown in practice that "chaos cannot be automated." A preliminary reorganization of the business and of the enterprise itself is necessary, as well as the construction of management regulations (instructions). It is difficult for the employees of an enterprise to cope with such work on their own, especially considering the time factor in market conditions. Therefore, the practice of working with consulting companies is developing everywhere: they help enterprises and teach their employees to "eliminate bottlenecks", set up the main business processes, develop technology, build information flows, and so on. Automating a streamlined process is simpler, easier, cheaper and faster.

Everyone must do their own job. An accountant, storekeeper, sales manager and other "subject-matter" specialists should not have to improve document forms, widen columns or rearrange them because of changes in legislation or business schemes. Therefore, the software market is gradually transforming from a "product" market into a "service" market. Outsourcing is beginning to develop - the transfer of certain functions of the enterprise to the specialists of outside companies, who handle the maintenance of equipment and system software, modification of the applied (functional) part of the systems, and so on.

Information technology and methodological services for their users and consumers are becoming the most important and topical in the use of corporate management systems.

8.3.1. On-Line Analytical Processing (OLAP) Tools

On-Line Analytical Processing is a means of operational (real-time) analytical processing of information aimed at supporting decision-making and helping analysts answer the question "Why are objects, environments and the results of their interaction the way they are and not otherwise?" In doing so, the analyst himself forms versions of the relationships between sets of information and checks them against the data available in the corresponding databases of structured information.

ERP systems are characterized by the presence of analytical components as part of functional subsystems. They provide the formation of analytical information in real time. This information is the basis for most management decisions.

OLAP technologies use hypercubes - specially structured data (otherwise called OLAP cubes). In the data structure of the hypercube, the following are distinguished:

Measures - quantitative indicators (base attributes) used to form summary statistical results;

Dimensions - descriptive categories (characteristic attributes) in the context of which the measures are analyzed.

The dimension of a hypercube is determined by the number of dimensions for one measure. For example, the SALES hypercube contains data:

Dimensions: consumers, dates of operations, groups of goods, nomenclature, modifications, packaging, warehouses, types of payment, types of shipment, rates, currency, organizations, departments, responsible, distribution channels, regions, cities;

Measures: planned quantity, actual quantity, planned amount, actual amount, planned payments, actual payments, planned balance, actual balance, sales price, order execution time, refund amount.

Such a hypercube is intended for analytical reports:

Classification of consumers according to the volume of purchases;

Classification of goods sold by the ABC method;

Analysis of the terms of execution of orders of various consumers;

Analysis of sales volumes by periods, goods and groups of goods, regions and consumers, internal departments, managers and sales channels;

Forecast of mutual settlements with consumers;

Analysis of the return of goods from consumers; etc.

Analytical reports can have an arbitrary combination of dimensions and measures; they are used in the analysis of management decisions. Analytical processing is provided by instrumental and language tools. The widely available MS Excel spreadsheet offers the "Pivot Tables" information technology; the initial data for creating pivot tables can be:

an MS Excel list (database) - a relational table;

another MS Excel pivot table;

a consolidated range of MS Excel cells located in the same or in different workbooks;

an external relational database or OLAP cube - a data source (files in .dsn, .ode format).

To build pivot tables from external databases, ODBC drivers and the MS Query program are used. A pivot table for an initial MS Excel database has the following structure (Fig. 8.3).

The layout of the pivot table has the following data structure (Fig. 8.4): dimensions - department code and position; measures - work experience, salary and bonus. Below is pivot table 8.2, which makes it possible to analyze the relationships between average work experience and salary, average work experience and bonus, and salary and bonus.

Table 8.2. Pivot table for relationship analysis
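As an illustration of this kind of relationship analysis, here is a minimal pandas sketch; the department, position, experience, salary and bonus values are invented and do not reproduce Table 8.2.

```python
# Illustrative pivot-style analysis: dimensions are department and position,
# measures are experience, salary and bonus (all values assumed).
import pandas as pd

staff = pd.DataFrame({
    "department": ["D1", "D1", "D2", "D2", "D2"],
    "position":   ["engineer", "manager", "engineer", "engineer", "manager"],
    "experience": [5, 12, 3, 8, 15],
    "salary":     [900, 1500, 800, 1100, 1700],
    "bonus":      [100, 300, 80, 150, 400],
})

# Averages of the measures in the context of the dimensions.
print(staff.groupby(["department", "position"])[["experience", "salary", "bonus"]].mean())

# Pairwise relationships between experience, salary and bonus.
print(staff[["experience", "salary", "bonus"]].corr())
```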

To continue the analysis using the pivot table, you can:

Add new totals (for example, average salary, average bonus, etc.);

Use filtering of records and totals of the pivot table (for example, by the "Gender" attribute, which is placed in the "Page" area of the layout);

Calculate structural indicators (for example, the distribution of the wage fund and of the bonus fund across divisions - using additional processing of pivot tables, such as shares of the column total); etc.

The MS Office suite allows spreadsheet data, including pivot tables and charts, to be published in HTML format.

Microsoft Office Web Components supports working with published data in Internet Explorer, allowing further analysis (changes in the data structure of the pivot table, calculation of new summary totals).

8.3.2. Data Mining Tools (DM)

DM tools involve the extraction ("excavation", "mining") of data and are aimed at identifying relationships between the information stored in the enterprise's digital databases, which the analyst can use to build models that quantify the degree of influence of the factors of interest. In addition, such tools can be useful for building hypotheses about the possible nature of the information relationships in the enterprise's digital databases.

Text Mining (TM) technology is a set of tools that allows you to analyze large sets of information in search of trends, patterns and relationships that can help you make strategic decisions.

Image Mining (IM) technology contains tools for the recognition and classification of various visual images stored in the company's databases or obtained as a result of an online search from external information sources.

To solve the problems of processing and storing all data, the following approaches are used:

1) the creation of several backup systems or one distributed document management system that allow you to save data, but have slow access to the stored information at the request of the user;

2) construction of Internet systems that are highly flexible, but not adapted for the implementation of the search and storage of text documents;

3) the introduction of Internet portals that are well-targeted to user requests, but do not have descriptive information regarding the text data loaded into them.

Text processing systems free from the problems listed above can be divided into two categories: linguistic analysis systems and text data analysis systems.

The main elements of Text Mining technology are:

Summarization;

Feature extraction;

Clustering;

Classification;

Question answering;

Thematic indexing;

Keyword searching (see the sketch after this list);

Creation and maintenance of taxonomies and thesauri.
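As flagged in the keyword-searching item above, here is a minimal sketch of frequency-based keyword extraction; the sample documents and stop-word list are assumed purely for illustration.

```python
# Illustrative keyword extraction: count word frequencies across documents.
from collections import Counter
import re

documents = [
    "OLAP servers store aggregate data for fast analytical queries.",
    "Data mining discovers hidden patterns in large databases.",
    "Analytical queries against the data warehouse support decision making.",
]

stop_words = {"the", "for", "in", "of", "a", "an", "and", "against"}

words = Counter()
for text in documents:
    tokens = re.findall(r"[a-z]+", text.lower())
    words.update(t for t in tokens if t not in stop_words)

print(words.most_common(5))   # the most frequent candidate keywords
```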

Software products that implement Text Mining technology include:

IBM Intelligent Miner for Text - a set of individual command-line utilities (scripts), independent of one another (the main emphasis is on data mining mechanisms - information retrieval);

Oracle InterMedia Text - a set integrated into a DBMS that allows you to work most effectively with user requests (allows you to work with modern relational DBMS in the context of complex multipurpose search and analysis of text data);

Megaputer Text Analyst is a set of COM objects built into the program for solving Text Mining tasks.

8.3.3. Intelligent information technology

Today, in the field of control automation, information analysis dominates at the preliminary stage of decision preparation - processing primary information and decomposing the problem situation - which allows one to grasp only fragments and details of the processes rather than the situation as a whole. To overcome this drawback, one must learn to build knowledge bases using the experience of the best specialists, as well as to generate the missing knowledge.

The use of information technologies in various spheres of human activity, the exponential growth of information volumes and the need to respond quickly in any situations required the search for adequate ways to solve emerging problems. The most effective of them is the way of intellectualization of information technologies.

Intelligent information technology (IIT) is usually understood as information technology that provides the following capabilities:

The presence of knowledge bases reflecting the experience of specific people, groups, societies, humanity as a whole, in solving creative problems in certain areas of activity, traditionally considered the prerogative of human intelligence (for example, such poorly formalized tasks as decision-making, design, meaning extraction, explanation, training, etc.);

The presence of thinking models based on knowledge bases: rules and logical conclusions, argumentation and reasoning, recognition and classification of situations, generalization and understanding, etc.;

Ability to form quite clear decisions based on fuzzy, loose, incomplete, underdetermined data;

The ability to explain conclusions and decisions, i.e. the presence of an explanation mechanism;

Ability to learn, retrain and therefore develop.

Knowledge Discovery (KD) technologies for the informal search for hidden patterns in data and information are based on the latest techniques for forming and structuring information images of objects, which is closest to the principles of information processing by intelligent systems.

Decision Support (DS) information technology comprises expert system shells or specialized expert systems that enable analysts to determine the relationships and dependencies between information structures in the enterprise's bases of structured information, as well as to predict the possible outcomes of decision-making.

IIT development trends. Communications and communication systems. Global information networks and IIT can radically change our understanding of companies and of mental work itself. The presence of employees at the workplace will become almost unnecessary: people can work from home and interact with each other as needed through networks. A well-known example is the successful creation of a new modification of the Boeing-747 aircraft by a distributed team of specialists interacting via the Internet. The location of the participants in a development will play an ever smaller role, while the importance of the participants' qualification level will increase. Another reason for the rapid development of IIT is the growing complexity of communication systems and of the tasks solved on their basis. A qualitatively new level of "intellectualization" is required of such software products as systems for analyzing heterogeneous and imprecise data, ensuring information security, making decisions in distributed systems, etc.

Education. Already today, distance learning is beginning to play an important role in education, and the introduction of IIT will significantly individualize this process in accordance with the needs and abilities of each student.

Everyday life. The informatization of everyday life has already begun, but with the development of IIT fundamentally new opportunities will appear. Gradually, more and more new functions will be transferred to the computer: monitoring the user's health, controlling household appliances such as humidifiers, air fresheners, heaters, ionizers and music centers, medical diagnostics, etc. In other words, systems will also become diagnosticians of the state of a person and his home. A comfortable information space will be provided in the premises, where the information environment will become part of the human environment.

Prospects for the development of IIT. It seems that IIT has now approached a fundamentally new stage in its development. Over the past 10 years, the capabilities of IIT have expanded significantly owing to the development of new types of logical models and the emergence of new theories and concepts. The key points in the development of IIT are:

Transition from logical inference to models of argumentation and reasoning;

Search for relevant knowledge and generate explanations;

Understanding and synthesis of texts;

Cognitive graphics, i.e. graphic and figurative presentation of knowledge;

Multi-agent systems;

Intelligent network models;

Calculations based on fuzzy logic, neural networks, genetic algorithms, probabilistic calculations (implemented in various combinations with each other and with expert systems);

The meta-knowledge problem.

Multi-agent systems have become a new paradigm for creating promising IIT. Here an agent is assumed to be an independent intellectual system that has its own system of goal-setting and motivation and its own area of action and responsibility. Interaction between agents is provided by a higher-level system - metaintelligence. Multi-agent systems model a virtual community of intelligent agents - objects that are autonomous and active and that enter into various social relations: cooperation (friendship), competition, enmity, etc. The social aspect of solving modern problems is the fundamental feature of the conceptual novelty of advanced intellectual technologies - virtual organizations and the virtual society.

Control questions and tasks

1. Give a description of the enterprise as an object of informatization. What are the main indicators characterizing the development of the enterprise management system?

2. List the leading information technologies for the management of industrial enterprises.

3. What are the main information technologies of organizational and strategic development of enterprises (corporations).

4. What are the foundations of the standards for strategic management aimed at improving business processes? What is the ratio of information technology BPM and BPI?

5. Define the philosophy of total quality management (TQM). How are the phases of development of quality and information technology related?

6. Name the main provisions of the organizational development of the enterprise, describe the stages of strategic management. What are the group strategies?

7. How is the business model of the enterprise created? What are the main approaches to assessing the effectiveness of a business model?

8. What is a balanced scorecard? What are the main components of the BSC? What are the interrelationships of the groups of BSC indicators?

9. List the methodological foundations for creating information systems. What is a systems approach?

10. What is an informational approach to the formation of information systems and technologies?

11. What is a strategic approach to the formation of information systems and technologies?

12. What is the content of the object-oriented approach to describing the behavior of agents in the market? Give the definition of the object, indicate the analogs of agent systems.

13. What are the methodological principles of improving enterprise management based on information and communication technologies? What is the purpose of ICT?

14. Give the definitions of a document, document flow, document flow, document management system.

15. How is the layout of the document form designed? Name the zones of the document, the composition of their details.

16. What are the basic information technologies of the document management system.

17. What is a unified documentation system? What are the general principles of unification?

18. Describe organizational and administrative documentation, provide examples of documents.

19. What are the requirements for an electronic document management system?

20. What is a corporate information system? What are the main control loops, the composition of functional modules.

21. Name the software products known to you for CIS. Give their comparative characteristics.


The modern level of development of hardware and software has for some time made possible the widespread maintenance of databases of operational information at various levels of management. In the course of their activities, industrial enterprises, corporations, departmental structures, and government and administrative bodies have accumulated large volumes of data. These hold great potential for extracting useful analytical information, on the basis of which hidden trends can be identified, a development strategy built, and new solutions found.

In recent years, a number of new concepts for storing and analyzing corporate data have taken shape in the world:

1) Data Warehouses

2) On-Line Analytical Processing (OLAP)

3) intelligent data analysis (Data Mining)

OLAP analytical data processing systems are decision support systems focused on executing more complex queries that require statistical processing of historical data accumulated over a certain period of time. They serve to prepare business reports on sales and marketing for management purposes. Closely related is the so-called Data Mining - intelligent data analysis, i.e. a way of analyzing information in a database in order to find anomalies and trends without elucidating the semantic meaning of the records.

Analytical systems built on the basis of OLAP include information processing tools based on artificial intelligence methods and graphical data presentation tools. These systems work with large volumes of historical data, allowing meaningful information to be extracted from it, i.e. knowledge to be obtained from data.

Efficiency of processing is achieved through the use of powerful multiprocessor technology, sophisticated analysis methods, and specialized data storages.

Relational databases store entities in separate tables, which are usually well normalized. This structure is convenient for operational databases (OLTP systems), but complex multi-table queries are executed relatively slowly in it. A better model for querying (rather than for modification) is a multidimensional database.

An OLAP system takes a snapshot of a relational database and structures it into a multidimensional model for queries. The claimed processing time for queries in OLAP is about 0.1% of that for similar queries in a relational database.

An OLAP structure created from operational data is called an OLAP cube. A cube is created by joining tables using a star schema. At the center of the "star" is the fact table, which contains the key facts on which queries are made. Multiple dimension tables are joined to the fact table; they show how the aggregated relational data can be analyzed. The number of possible aggregations is determined by the number of ways in which the original data can be displayed hierarchically.
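A minimal sketch of such a star schema (assuming SQLite; the table and column names are invented) shows a fact table joined to dimension tables and aggregated in the context of two dimensions:

```python
# Illustrative star schema: one fact table plus dimension tables, queried with a join.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time    (time_id INTEGER PRIMARY KEY, year INT, month INT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT);
CREATE TABLE fact_sales  (time_id INT, product_id INT, amount REAL);

INSERT INTO dim_time    VALUES (1, 2023, 1), (2, 2023, 2);
INSERT INTO dim_product VALUES (1, 'drinks', 'cola'), (2, 'snacks', 'chips');
INSERT INTO fact_sales  VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5), (2, 2, 12.0);
""")

# Aggregate the facts in the context of two dimensions (month x category).
rows = conn.execute("""
    SELECT t.month, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time    t ON t.time_id    = f.time_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY t.month, p.category
""").fetchall()
print(rows)
```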

Both classes of systems (OLAP and OLTP) are based on the use of a DBMS, but the types of queries differ greatly. The OLAP mechanism is one of the most popular data analysis methods today. There are two main approaches to implementing it. The first is called Multidimensional OLAP (MOLAP) - implementation of the mechanism using a multidimensional database on the server side; the second is Relational OLAP (ROLAP) - building cubes "on the fly" from SQL queries to a relational DBMS. Each of these approaches has its advantages and disadvantages. The general scheme of a desktop OLAP system is shown in the figure.

The work algorithm is as follows:

1) obtaining data in the form of a flat table or the result of executing an SQL query;

2) caching data and converting it to a multidimensional cube;

3) displaying the constructed cube using a cross-table or diagram, etc.

In general, an arbitrary number of displays can be connected to one cube. The displays used in OLAP systems are most often of two types: crosstabs and charts.

Star schema. Its idea is that there are tables for each dimension, and all the facts are placed in one table indexed by a multiple key composed of the keys of the individual dimensions. Each ray of the star schema defines, in Codd's terminology, the direction of data consolidation along the corresponding dimension.

For complex problems with multilevel dimensions, it makes sense to turn to the star schema extensions - the fact constellation schema and the snowflake schema. In these cases, separate fact tables are created for possible combinations of summary levels of different dimensions. This allows for better performance, but often leads to data redundancy and significant complications in the structure of the database, which contains a huge number of fact tables.

(Figure: fact constellation schema.)

Analytical data processing is data analysis that requires appropriate methodological support and a certain level of training on the part of specialists.

Modern information technologies make it possible to automate the analysis of the accumulated primary information, build analytical models, obtain ready-made solutions and apply them in practice. The main requirements imposed on the analysis methods are efficiency, simplicity and automation. This concept underlies two modern technologies: Data Mining and Knowledge Discovery in Databases (KDD).

Data Mining is the process of discovering in raw data previously unknown, non-trivial, practically useful and interpretable knowledge needed for making decisions in various spheres of human activity (the definition given by G. Piatetsky-Shapiro, one of the founders of this field).

Data Mining technology is aimed at finding non-obvious patterns. The stages of data analysis are:

  • 1) classification - detection of features that characterize groups of objects in the studied dataset (classes). Methods used for the classification problem include the nearest neighbor and k-nearest neighbor methods, Bayesian networks, induction of decision trees, and neural networks;
  • 2) clustering - splitting objects into groups when the classes of objects are not defined in advance (a minimal clustering sketch is given after this list). An example of a method for solving the clustering problem is self-organizing Kohonen maps - a neural network with unsupervised learning. An important feature of these maps is their ability to map multidimensional feature spaces onto a plane, presenting the data as a two-dimensional map;
  • 3) association - identifying patterns between related events in the dataset. These patterns are revealed not from the properties of the analyzed object but between several events occurring simultaneously; an example is the Apriori algorithm;
  • 4) sequence, or sequential association - searching for temporal patterns between transactions, i.e. patterns are established not between simultaneously occurring events but between events connected in time. Association can be viewed as a sequence with a time lag of zero. Sequencing rule: after event X, event Y will occur after a certain time;
  • 5) forecasting - built on the features of historical data, i.e. estimating missing or future values of target numerical indicators. Methods of mathematical statistics, neural networks, etc., are used to solve forecasting problems;
  • 6) deviation detection (analysis of deviations or outliers) - detecting and analyzing the data that differ most from the general data set;
  • 7) estimation - predicting continuous values of a feature;
  • 8) link analysis - the task of finding dependencies in the dataset;
  • 9) visualization (graph mining) - creating a graphic image of the analyzed data. Graphical methods are used to show the presence of patterns in the data, for example presenting the data in 2-D and 3-D form;
  • 10) summarization - describing specific groups of objects from the analyzed dataset.
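As noted in the clustering item above, here is a minimal sketch assuming scikit-learn is available; the two-dimensional points are invented purely to illustrate grouping objects without predefined classes (k-means is used here instead of a Kohonen map for brevity).

```python
# Illustrative clustering without predefined classes, using k-means.
from sklearn.cluster import KMeans

points = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8],    # one dense group
          [8.0, 8.2], [7.8, 8.1], [8.3, 7.9]]    # another dense group

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(model.labels_)           # cluster assigned to each point
print(model.cluster_centers_)  # centres of the discovered groups
```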

KDD is the process of obtaining useful knowledge from a collection of data. This technology covers the following issues: data preparation, selection of informative features, data cleansing, application of Data Mining (DM) methods, post-processing of the data, and interpretation of the results.

The Knowledge Discovery in Databases process consists of the following steps:

  • 1) problem statement - analysis of user tasks and features of the application area, selection of a set of input and output parameters;
  • 2) preparation of the initial dataset - creating a data warehouse and organizing a scheme for collecting and updating data;
  • 3) data preprocessing - cleansing the data so that it is of high quality and correct from the point of view of the Data Mining methods to be applied;
  • 4) transformation and normalization of data - bringing the information to a form suitable for subsequent analysis (a minimal normalization sketch follows this list);
  • 5) Data Mining - automatic data analysis based on the use of various algorithms for finding knowledge (neural networks, decision trees, clustering algorithms, establishing associations, etc.);
  • 6) post-processing of data - interpretation of the results and application of the knowledge gained in business applications.
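As mentioned in the transformation step above, a minimal sketch of min-max normalization (with assumed values) could look like this:

```python
# Illustrative min-max normalization: bring a numeric attribute to the [0, 1] range.
values = [120, 80, 200, 150, 95]

lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]
print(normalized)   # 80 -> 0.0, 200 -> 1.0, the rest proportionally in between
```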