Large volumes of big data. Big Data: analytics and solutions

Do you know this famous joke? Big Data is like sex before 18:

  • everyone thinks about it;
  • everyone talks about it;
  • everyone thinks their friends are doing it;
  • almost nobody does it;
  • whoever does it does it badly;
  • everyone thinks it will be better next time;
  • no one takes security measures;
  • everyone is ashamed to admit that they don't know something;
  • if someone actually succeeds, it always makes a lot of noise.

But let's be honest: any hype always comes with plain curiosity. What is all the fuss about, and is there really something important behind it? In short, yes, there is, and the details are below. We have selected for you the most striking and interesting applications of Big Data technologies. This small survey of the market, built on clear examples, confronts you with a simple fact: the future is not "coming"; there is no need to wait another n years for the magic to become reality. It has already arrived, but it is still invisible to the eye, which is why the singularity does not yet burn quite so hot in the labor market. Let's go.

1 How Big Data technologies are applied where they originated

Large IT companies are where data science was born, so their inner workings in this area are the most interesting. Google, the birthplace of the MapReduce paradigm, runs internal training whose sole purpose is to teach its programmers machine learning techniques. And therein lies its competitive advantage: after gaining new knowledge, employees apply the new methods in the Google projects they work on every day. Imagine how huge the list of areas is in which the company could make a revolution. One example is the use of neural networks across its services.

Apple, too, is implementing machine learning in all of its products. Its advantage is a large ecosystem that covers all the digital devices used in everyday life. This lets the company reach a level inaccessible to others: it has more user data than almost anyone else. At the same time, its privacy policy is very strict: the corporation has always boasted that it does not use customer data for advertising purposes. Accordingly, user information is encrypted so that neither Apple's lawyers nor even the FBI with a warrant can read it.

2 Big Data on 4 wheels

A modern car is an information store: it accumulates all the data about the driver, the environment, connected devices and about itself. Soon a single connected vehicle will generate up to 25 GB of data per hour.

Vehicle telematics has been used by automakers for many years, but now more sophisticated data collection methods that make full use of Big Data are gaining ground. This means that technology can now alert the driver to bad road conditions and automatically activate the anti-lock braking and traction control systems.

Other automakers, including BMW, use Big Data technology combined with insights from test prototypes, built-in "error memory" systems and customer complaints to identify weaknesses in a model early in production. Now, instead of a manual evaluation of the data that takes months, a state-of-the-art algorithm is applied. Errors and troubleshooting costs are reduced, which speeds up data analysis workflows at BMW.

According to expert estimates, by 2019 the connected-car market will reach a turnover of $130 billion. This is not surprising given the pace at which automakers are integrating technologies that are becoming an integral part of the vehicle.

The use of Big Data helps make cars safer and more functional. Toyota, for example, embeds Data Communication Modules (DCM) in its vehicles; a Big Data tool processes and analyzes the data collected by the DCMs in order to extract further value from it.

3 Application of big data in medicine


The implementation of Big Data technologies in the medical field allows doctors to more thoroughly study the disease and choose an effective course of treatment for a particular case. Thanks to the analysis of information, it becomes easier for health workers to predict relapses and take preventive measures. The result is a more accurate diagnosis and improved treatments.

The new approach made it possible to look at patients' problems from a different angle, which led to the discovery of previously unknown sources of a problem. For example, some ethnic groups are genetically more predisposed to heart disease than others. Now, when a patient complains of a certain disease, doctors take into account data on members of his ethnic group who have complained of the same problem. The collection and analysis of data makes it possible to learn much more about patients: from food preferences and lifestyle to the genetic structure of DNA and the metabolites of cells, tissues and organs. For example, the Center for Pediatric Genomic Medicine in Kansas City works with patients and analyzes mutations in the genetic code that cause cancer. An individual approach to each patient that takes his DNA into account will raise the effectiveness of treatment to a qualitatively new level.

Understanding how Big Data is used leads to the first and very important change in the medical field. When a patient undergoes treatment, a hospital or other healthcare facility can obtain a lot of valuable information about the person. The collected information is used to predict the recurrence of diseases with a certain degree of accuracy. For example, if a patient has had a stroke, doctors study information about the time of the cerebrovascular accident and analyze the intervals between previous incidents (if any), paying special attention to stressful situations and heavy physical exertion in the patient's life. Based on this data, hospitals give the patient a clear plan of action to prevent the possibility of a stroke in the future.

Wearable devices also play a role, helping to identify health problems, even if a person does not have obvious symptoms of a particular disease. Instead of assessing the patient's condition through a long course of examinations, the doctor can draw conclusions based on the information collected by a fitness tracker or smart watch.

One recent example: while a patient was being examined for a new seizure caused by a missed medication, doctors discovered that the man had a much more serious health problem, atrial fibrillation. The diagnosis was made because the department staff got access to the patient's phone, namely to the application associated with his fitness tracker. The data from the application turned out to be the key factor in making the diagnosis, because at the time of the examination no cardiac abnormalities were found in the man.

This is just one of many cases showing why the use of big data plays such a significant role in medicine today.

4 Data analytics is already at the core of retail

Understanding user queries and targeting is one of the largest and most widely publicized areas of application of Big Data tools. Big Data helps analyze customer habits in order to better understand consumer needs in the future. Companies are looking to expand the traditional data set with social media information and browser search history in order to form the most complete customer picture possible. Sometimes large organizations even set the creation of their own predictive model as a global goal.

For example, the Target retail chain, using deep data analysis and its own forecasting system, can determine with high accuracy whether a customer is expecting a child. Each customer is assigned an ID, which in turn is tied to a credit card, name or email address. The identifier serves as a kind of shopping basket in which information about everything the person has ever purchased is stored. The chain's specialists found that expectant mothers actively buy unscented products before the second trimester of pregnancy, and during the first 20 weeks stock up on calcium, zinc and magnesium supplements. Based on this data, Target sends customers coupons for children's products. The discounts on children's goods themselves are "diluted" with coupons for other products so that offers to buy a crib or diapers do not look too intrusive.

Even government departments have found a way to use Big Data technologies to optimize election campaigns. Some believe that B. Obama's victory in the US presidential election in 2012 is due to the excellent work of his team of analysts, who processed huge amounts of data in the right way.

5 Big Data on guard of law and order


Over the past few years, law enforcement agencies have figured out how and when to use Big Data. It is a well-known fact that the National Security Agency uses Big Data technologies to prevent terrorist attacks. Other departments are using progressive methodology to prevent smaller crimes.

The Los Angeles Police Department uses predictive-policing software to do what is commonly referred to as proactive law enforcement. Using crime reports for a certain period of time, the algorithm determines the areas where the probability of offenses is highest. The system marks such areas on the city map with small red squares, and this data is immediately transmitted to patrol cars.

Chicago police use big data technologies in a slightly different way. Law enforcement officers in the Windy City also have a predictive algorithm, but theirs is aimed at outlining a "circle of risk": people who may become a victim of, or a participant in, an armed attack. According to The New York Times, the algorithm assigns a vulnerability score to a person based on their criminal history (arrests, participation in shootings, membership in criminal gangs). The developer of the system says that while it studies an individual's criminal past, it does not take into account secondary factors such as race, gender, ethnicity and location.

6 How Big Data technologies help cities develop


Veniam CEO João Barros demonstrates a tracking map of Wi-Fi routers on buses in the city of Porto

Data analysis is also used to improve a number of aspects of life in cities and countries. For example, knowing exactly how and when to apply Big Data technologies makes it possible to optimize traffic flows. To do this, the movement of cars is tracked online, and social media and meteorological data are analyzed. Today a number of cities have taken the lead in using data analytics to combine transport infrastructure and other utilities into a single whole. This is the concept of a smart city, where buses wait for a late train and traffic lights are able to predict congestion to minimize traffic jams.

In Long Beach, "smart" water meters based on Big Data technologies are used to curb illegal watering. Previously, they helped reduce water consumption by private households (with a maximum reduction of 80%). Saving fresh water is always a topical issue, especially when the state is experiencing its worst drought on record.

The Los Angeles Department of Transportation has also joined the list of Big Data users. Based on data received from traffic camera sensors, the authorities control the operation of traffic lights, which in turn allows traffic to be regulated. The computerized system controls about 4,500 traffic lights throughout the city. According to official data, the new algorithm helped reduce congestion by 16%.

7 Engine of progress in marketing and sales


In marketing, Big Data tools make it possible to identify which ideas are most effective to promote at a particular stage of the sales cycle. Data analysis determines how investments can improve customer relationship management, which strategy should be chosen to raise conversion rates, and how to optimize the customer life cycle. In the cloud business, Big Data algorithms are used to figure out how to minimize the cost of customer acquisition and extend the customer life cycle.

Differentiating pricing strategies according to a customer's intra-system level is perhaps the main thing Big Data is used for in marketing. McKinsey found that about 75% of an average firm's revenue comes from basic products, 30% of which are incorrectly priced. A 1% price increase results in an 8.7% increase in operating profit.
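To see where a figure like 8.7% can come from, here is a minimal back-of-the-envelope sketch in Python. The 11.5% operating margin is an illustrative assumption rather than a number from the report, and volume and costs are assumed to stay constant.

```python
# Illustrative only: assumes an 11.5% operating margin and that a 1% price
# increase flows straight to profit (volume and costs unchanged).
revenue = 100.0
operating_margin = 0.115          # assumed, not taken from the McKinsey study
profit = revenue * operating_margin

new_revenue = revenue * 1.01      # 1% price increase at the same volume
new_profit = profit + (new_revenue - revenue)  # the extra revenue is pure margin

uplift = (new_profit - profit) / profit
print(f"Operating profit uplift: {uplift:.1%}")   # roughly 8.7%
```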

Forrester's researchers found that data analytics lets marketers focus on making customer relationships more successful. By studying how a customer develops, specialists can assess their level of loyalty and extend their life cycle in the context of a particular company.

The optimization of sales strategies and of entry into new markets using geoanalytics is also visible in the biopharmaceutical industry. According to McKinsey, drug companies spend an average of 20 to 30% of their profits on administration and sales. If enterprises make more active use of big data to identify the most cost-effective and fastest-growing markets, costs will be cut immediately.

Data analytics is a means for companies to get a complete picture of key aspects of their business. Increasing revenues, reducing costs and reducing working capital are the three tasks that modern business tries to solve with the help of analytical tools.

Finally, 58% of CMOs say that the implementation of Big Data technologies can be seen in search engine optimization (SEO), email and mobile marketing, where data analysis plays the most significant role in shaping marketing programs. And only 4% fewer respondents are confident that Big Data will play a significant role in all marketing strategies for many years to come.

8 Global data analysis

No less curious is the application of data analysis to the climate. It is possible that machine learning will ultimately be the only force capable of maintaining that delicate balance. The topic of human influence on global warming still causes a lot of controversy, so only reliable predictive models based on the analysis of large amounts of data can give an accurate answer. Ultimately, reducing emissions will benefit us all: we will spend less on energy.

Now Big Data is not an abstract concept, which, perhaps, will find its application in a couple of years. This is a fully working set of technologies that can be useful in almost all areas of human activity: from medicine and public order to marketing and sales. The stage of active integration of Big Data into our daily lives has just begun, and who knows what the role of Big Data will be in a few years?

Big data is a broad term for the innovative strategies and technologies required to collect, organize and process information from large datasets. Although the problem of dealing with data that exceeds the computing power or storage capacity of a single computer is not new, the scale and value of this type of computing has expanded significantly in recent years.

In this article, you will find the main concepts that you may come across when exploring big data. It also discusses some of the processes and technologies currently in use in this area.

What is big data?

A precise definition of "big data" is hard to pin down because projects, vendors, practitioners, and business professionals use the term in very different ways. With this in mind, big data can be defined as:

  • Large datasets.
  • A category of computational strategies and technologies that are used to process large datasets.

In this context, "large data set" means a data set that is too large to be processed or stored using traditional tools or on a single computer. This means that the overall scale of large datasets is constantly changing and can vary significantly from case to case.

Big data systems

The basic requirements for working with big data are the same as for any other dataset. However, the massive scale, the speed of processing, and the data characteristics encountered at every step of the process present serious new challenges for tool design. The goal of most big data systems is to extract insights and connections from large volumes of heterogeneous data in a way that would not be possible with conventional methods.

In 2001, Gartner's Doug Laney introduced the "three Vs of big data" to describe some of the characteristics that make big data processing different from other types of data processing:

  1. Volume (data volume).
  2. Velocity (speed of data accumulation and processing).
  3. Variety (variety of types of processed data).

Data volume

The exceptional scale of the information being processed helps define big data systems. These datasets can be orders of magnitude larger than traditional datasets, requiring more attention at every stage of processing and storage.

Because the requirements exceed the capacity of a single computer, the problem often arises of pooling, distributing, and coordinating resources from groups of computers. Cluster management and algorithms capable of breaking down tasks into smaller parts are becoming increasingly important in this area.

Accumulation and processing speed

The second characteristic that significantly distinguishes big data from other data systems is the speed at which information moves through the system. Data often enters the system from multiple sources and must be processed in real time to update the current state of the system.

This emphasis on instantaneous feedback forced many practitioners to abandon the batch-oriented approach and prefer a real-time streaming system. Data is constantly being added, processed and analyzed to keep up with the influx of new information and get valuable data at an early stage when it is most relevant. This requires robust systems with highly available components to protect against failures along the data pipeline.

Variety of types of processed data

Big data has many unique challenges related to the wide range of sources processed and their relative quality.

Data can come from internal systems such as application and server logs, from social media channels and other external APIs, from sensors on physical devices and from other sources. The goal of big data systems is to process potentially useful data, regardless of origin, by combining all information into a single system.

Media formats and types can also vary considerably. Media files (images, video, and audio) are combined with text files, structured logs, and so on. More traditional data processing systems expect data to enter the pipeline already labeled, formatted, and organized, but big data systems typically receive and store data closer to its original raw state. Ideally, any transformations or modifications of the raw data happen in memory at processing time.

Other characteristics

Over time, individuals and organizations have proposed expanding the original "three Vs", although these innovations tend to describe problems rather than characteristics of big data.

  • Veracity: The variety of sources and the complexity of processing can lead to problems in assessing the quality of the data (and therefore the quality of the resulting analysis).
  • Variability: variation in the data leads to wide variation in its quality. Identifying, processing, or filtering low-quality data may require additional resources.
  • Value: the end goal of big data is value. Sometimes systems and processes are so complex that using the data and extracting actual value becomes difficult.

Big data life cycle

So how is big data actually handled? There are several different approaches to implementation, but there are commonalities across strategies and software. In general, the life cycle consists of the following stages:

  • Entering data into the system
  • Saving data to storage
  • Data calculation and analysis
  • Visualization of results

Before looking at these four categories of workflows in detail, let's talk about cluster computing, an important strategy used by many big data processing tools. Setting up a compute cluster is the backbone of the technology used at every stage of the life cycle.

Cluster Computing

Because of the characteristics of big data, individual computers are not suitable for processing it. Clusters are better suited to the task, since they can handle the storage and computing needs of big data.

Big data clustering software pools the resources of many small machines, aiming to provide a number of benefits:

  • Pooling Resources: Processing large data sets requires a large amount of CPU and memory resources, as well as a lot of available storage space.
  • High availability: Clusters can provide varying levels of resiliency and availability so that data access and processing is not impacted by hardware or software failures. This is especially important for real-time analytics.
  • Scalability: Clusters support fast horizontal scaling (adding new machines to the cluster).

To work in a cluster, you need tools to manage cluster membership, coordinate resource allocation, and plan work with individual nodes. Cluster membership and resource allocation can be handled with programs like Hadoop YARN (Yet Another Resource Negotiator) or Apache Mesos.

An assembled compute cluster often acts as the foundation that other software interacts with to process the data. The machines participating in a compute cluster also typically manage a distributed storage system.

Getting data

Data ingestion is the process of adding raw data to the system. The complexity of this operation largely depends on the format and quality of the data sources and on how the data meets the requirements for processing.

You can add big data to the system using special tools. Technologies such as Apache Sqoop can take existing data from relational databases and add it to a big data system. You can also use Apache Flume and Apache Chukwa - projects designed to aggregate and import application and server logs. Message brokers such as Apache Kafka can be used as an interface between various data generators and a big data system. Frameworks like Gobblin can combine and optimize the output of all tools at the end of the pipeline.
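As a small, hedged illustration of the ingestion step, the sketch below pushes application events into Kafka using the kafka-python client. The broker address, topic name, and event fields are invented for the example; Flume, Sqoop, or another tool could fill the same role depending on the source.

```python
import json
from kafka import KafkaProducer  # pip install kafka-python

# Hypothetical broker and topic; adjust to your own cluster.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

event = {"service": "web-frontend", "level": "ERROR", "message": "timeout"}

# A minimal validation and labeling step before the event enters the pipeline.
if event.get("level") in {"ERROR", "WARN", "INFO"}:
    event["source"] = "app-logs"          # label where the record came from
    producer.send("raw-events", event)    # downstream consumers pick it up

producer.flush()
```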

During ingestion, some analysis, sorting and labeling usually takes place. This process is sometimes referred to as ETL (extract, transform, load). Although the term usually refers to legacy data warehousing processes, it is sometimes applied to big data systems as well. Typical operations include modifying the incoming data for formatting, categorizing and labeling it, and filtering or validating it.

Ideally, incoming data undergoes minimal formatting.

Data storage

Once received, the data passes to the components that manage the storage.

Typically, distributed file systems are used to store raw data. Solutions such as Apache Hadoop's HDFS allow you to write large amounts of data to multiple nodes in a cluster. This system provides compute resources with access to data, can load data into cluster RAM for memory operations, and handle component failures. Other distributed file systems can be used instead of HDFS, including Ceph and GlusterFS.
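For a feel of how raw data lands in a distributed file system, here is a minimal sketch using the third-party `hdfs` Python package (a WebHDFS client). The NameNode URL, user, and path are assumptions made for the example; other clients, such as pyarrow, would work just as well.

```python
from hdfs import InsecureClient  # pip install hdfs (WebHDFS client)

# Hypothetical NameNode address and HDFS path.
client = InsecureClient("http://namenode:9870", user="hadoop")

# Write a small batch of raw events to HDFS; real pipelines write much
# larger files so that HDFS blocks are used efficiently.
with client.write("/data/raw/events/2017-01-01.jsonl", encoding="utf-8",
                  overwrite=True) as writer:
    writer.write('{"user_id": 1, "action": "click"}\n')

# List what is stored under the raw data directory.
print(client.list("/data/raw/events"))
```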

Data can also be imported into other distributed systems for more structured access. Distributed databases, especially NoSQL databases, are well suited for this role because they can handle heterogeneous data. There are many different types of distributed databases, depending on how you want to organize and present data.

Data calculation and analysis

Once the data is available, the system can begin processing. The computational layer is perhaps the least constrained part of the system, since the requirements and approaches can differ significantly depending on the type of information. Data is often processed repeatedly, either iteratively by a single tool or by a number of tools used to surface different types of insights.

Batch processing is one way to compute over a large dataset. The process involves breaking the data into smaller pieces, scheduling each piece for processing on a separate machine, reshuffling the data based on the intermediate results, and then computing and assembling the final result. This strategy is used by MapReduce in Apache Hadoop. Batch processing is most useful when working with very large datasets that require a significant amount of computation.
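To make the map / shuffle / reduce pattern concrete, here is a toy, single-machine word count in plain Python. A real deployment would distribute the map and reduce calls across cluster nodes (for instance via Hadoop Streaming), but the data flow is the same.

```python
from collections import defaultdict

documents = ["big data systems process big datasets",
             "batch processing splits big jobs"]

# Map: emit (word, 1) pairs from every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group intermediate pairs by key so each reducer sees one word.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: collapse each group into a single value per key.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts)   # e.g. {'big': 3, 'data': 1, ...}
```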

Other workloads require real-time processing. At the same time, information must be processed and prepared immediately, and the system must respond in a timely manner as new information becomes available. One way to implement real-time processing is to process a continuous stream of data consisting of individual elements. Another common characteristic of real-time processors is that they compute data in the cluster's memory, which avoids the need to write to disk.

Apache Storm, Apache Flink and Apache Spark offer different ways to implement real-time processing. These flexible technologies allow you to choose the best approach for each specific problem. In general, real-time processing is best suited to analyzing small chunks of data that change or are added to the system quickly.
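As one hedged illustration of the streaming approach, the sketch below uses Spark Structured Streaming (PySpark) to keep a running word count over lines arriving on a local socket. The host, port and application name are placeholders, and Storm or Flink would express the same pipeline differently.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("streaming-word-count").getOrCreate()

# Read an unbounded stream of text lines from a (placeholder) socket source.
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())

# Split lines into words and keep a running count per word.
words = lines.select(explode(split(lines.value, " ")).alias("word"))
counts = words.groupBy("word").count()

# Continuously print updated counts to the console as new data arrives.
query = (counts.writeStream.outputMode("complete")
         .format("console").start())
query.awaitTermination()
```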

All of these programs are frameworks, but there are many other ways to compute or analyze data in a big data system. These tools often plug into the frameworks above and provide additional interfaces for interacting with the underlying layers. For example, Apache Hive provides a data warehouse interface for Hadoop, Apache Pig provides a high-level query interface, and SQL-like interaction with the data is provided by Apache Drill, Apache Impala, Apache Spark SQL and Presto. For machine learning there are Apache SystemML, Apache Mahout and Apache Spark's MLlib. For direct analytical programming, R and Python are widely used and well supported by the data ecosystem.
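To show what such a higher-level interface looks like in practice, here is a small sketch of querying data with Spark SQL. The file path, table name and columns are invented for the example; Hive, Impala, Drill or Presto would accept very similar SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-interface-demo").getOrCreate()

# Load raw JSON events (hypothetical path) and expose them as a SQL table.
events = spark.read.json("hdfs:///data/raw/events")
events.createOrReplaceTempView("events")

# Answer an analytical question in plain SQL instead of low-level code.
top_users = spark.sql("""
    SELECT user_id, COUNT(*) AS actions
    FROM events
    GROUP BY user_id
    ORDER BY actions DESC
    LIMIT 10
""")
top_users.show()
```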

Visualization of results

Often, recognizing trends or changes in data over time is more important than the values themselves. Visualizing data is one of the most useful ways to spot trends and make sense of a large number of data points.

Real-time processing is used to visualize application and server metrics. Data changes frequently, and large variances in metrics usually indicate a significant impact on the health of systems or organizations. Projects like Prometheus can be used to process data streams and time series and visualize this information.

One popular way to visualize data is the Elastic stack, formerly known as the ELK stack. Logstash is used for data collection, Elasticsearch for indexing, and Kibana for visualization. The Elastic stack can work with big data, visualize the results of computations, or interact with raw metrics. A similar stack can be built by combining Apache Solr for indexing with a fork of Kibana called Banana for visualization; that stack is known as Silk.
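As a hedged sketch of the indexing half of such a stack, the snippet below pushes a metric document into Elasticsearch with the official Python client (the 8.x API is assumed). The cluster URL, index name and fields are placeholders; in the full stack, Logstash would normally do this step.

```python
from datetime import datetime, timezone
from elasticsearch import Elasticsearch  # pip install elasticsearch (8.x assumed)

# Hypothetical single-node cluster; Kibana would read from the same index.
es = Elasticsearch("http://localhost:9200")

doc = {
    "service": "checkout",
    "latency_ms": 182,
    "@timestamp": datetime.now(timezone.utc).isoformat(),
}
es.index(index="service-metrics", document=doc)   # index one metric document

# A simple query of the kind a dashboard would issue under the hood.
hits = es.search(index="service-metrics",
                 query={"match": {"service": "checkout"}})
print(hits["hits"]["total"])
```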

Interactive notebooks are another technology for visualizing and working with data interactively. These projects make it possible to explore and visualize data in a format that is easy to share and present. Popular examples of this type of interface are Jupyter Notebook and Apache Zeppelin.

Glossary of big data

  • Big data is a broad term for datasets that cannot be correctly processed by conventional computers or tools because of their volume, the speed at which they arrive, and their variety. The term is also commonly applied to the technologies and strategies for working with such data.
  • Batch processing is a computational strategy that involves processing data in large sets. This method is usually ideal for dealing with non-urgent data.
  • Cluster computing is the practice of pooling the resources of multiple machines and managing their combined capabilities to complete tasks. It requires a cluster management layer that handles communication between individual nodes.
  • A data lake is a large repository of collected data in a relatively raw state. The term is often used to refer to unstructured and frequently changing big data.
  • Data mining is a broad term for the various practices of finding patterns in large datasets. It is an attempt to organize a mass of data into a more understandable and coherent set of information.
  • A data warehouse is a large, organized repository for analysis and reporting. Unlike a data lake, a warehouse consists of formatted and well-organized data that is integrated with other sources. Data warehouses are often referred to in relation to big data, but they are often components of conventional data processing systems.
  • ETL (extract, transform, and load) - extracting, transforming, and loading data. This is how the process of obtaining and preparing raw data for use looks like. It is associated with data warehouses, but the characteristics of this process are also found in the pipelines of big data systems.
  • Hadoop is an open source Apache project for big data. It consists of a distributed file system called HDFS and a cluster and resource scheduler called YARN. Batch processing capabilities are provided by the MapReduce computation engine. Alongside MapReduce, modern Hadoop deployments can also run other compute and analytics systems.
  • In-memory compute is a strategy that involves moving the working datasets entirely into the cluster's memory. Intermediate calculations are not written to disk, instead they are stored in memory. This gives systems a huge speed advantage over I/O-bound systems.
  • Machine learning is the study and practice of designing systems that can learn, tune, and improve based on the data they are fed. Usually, this means the implementation of predictive and statistical algorithms.
  • Map reduce (not to be confused with Hadoop's MapReduce) is an algorithm for scheduling work on a computing cluster. The process involves splitting a task between nodes, computing intermediate results, shuffling them so that like results are grouped together, and then outputting a single value for each group.
  • NoSQL is a broad term for databases designed outside of the traditional relational model. NoSQL databases are well suited for big data due to their flexibility and distributed architecture.
  • Streaming is the practice of calculating individual items of data as they move through the system. This allows real-time data analysis and is suitable for processing time-critical transactions using high-speed metrics.

It was predicted that the total global volume of data created and replicated in 2011 could be about 1.8 zettabytes (1.8 trillion gigabytes) - about 9 times more than what was created in 2006.

More complex definition

Nevertheless, big data involves more than just analyzing vast amounts of information. The problem is not that organizations create huge amounts of data, but that most of it is in formats that fit poorly with the traditional structured database format: web logs, videos, text documents, machine code or, for example, geospatial data. All of this is stored in many different repositories, sometimes even outside the organization. As a result, corporations may have access to a huge amount of their data yet lack the tools needed to establish relationships between these data and draw meaningful conclusions from them. Add to this the fact that data is now updated more and more often, and you get a situation in which traditional methods of information analysis cannot keep up with huge volumes of constantly updated data, which ultimately paves the way for big data technologies.

Best Definition

In essence, the concept of big data involves working with information of huge volume and diverse composition, very frequently updated and located in different sources, in order to increase operational efficiency, create new products and increase competitiveness. The consulting firm Forrester puts it succinctly: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business intelligence and big data?

Craig Bathy, Chief Marketing Officer and Chief Technology Officer of Fujitsu Australia, pointed out that business analysis is a descriptive process of analyzing the results a business has achieved over a given period of time, while the processing speed of big data makes the analysis predictive, capable of offering the business recommendations for the future. Big data technologies also make it possible to analyze more types of data than business intelligence tools, allowing the focus to go beyond structured data stores.

Matt Slocum from O'Reilly Radar believes that although big data and business intelligence have the same goal (finding answers to a question), they differ from each other in three respects.

  • Big data is designed to handle larger volumes of information than business intelligence, and this, of course, fits the traditional definition of big data.
  • Big data is designed to handle information that arrives and changes faster, which means deep exploration and interactivity. In some cases, results are generated faster than a web page loads.
  • Big data is designed to handle unstructured data whose uses we are only beginning to explore once we have learned to collect and store it, and we need algorithms and the ability to interact with the data to make it easier to find the trends contained within these arrays.

According to the Oracle Information Architecture: An Architect's Guide to Big Data white paper published by Oracle, we approach information differently when working with big data than when doing business analysis.

Working with big data is not like the usual business intelligence process, where simply adding up known values produces a result: for example, adding up paid invoices yields sales for the year. When working with big data, the result is obtained in the process of refining it through sequential modeling: first a hypothesis is put forward; a statistical, visual or semantic model is built; on its basis the hypothesis is checked; and then the next one is put forward. This process requires the researcher either to interpret visual meanings, to make interactive knowledge-based queries, or to develop adaptive machine learning algorithms capable of producing the desired result. Moreover, the lifetime of such an algorithm can be quite short.

Big Data Analysis Techniques

There are many different methods for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list below does not claim to be complete, but it reflects the most popular approaches in various industries. It should be understood that researchers continue to work on creating new methods and improving existing ones. In addition, some of the techniques listed are not necessarily applicable exclusively to big data and can be successfully used on smaller arrays (for example, A/B testing or regression analysis). Of course, the larger and more diverse the array analyzed, the more accurate and relevant the results obtained.

A/B testing. A technique in which a control sample is compared in turn with other samples. In this way it is possible to identify the optimal combination of indicators to achieve, for example, the best consumer response to a marketing offer. Big data makes it possible to run a huge number of iterations and thus obtain a statistically significant result.
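As a small sketch of how such a comparison is usually evaluated, here is a two-proportion z-test in Python. The visitor counts and conversion numbers are invented for the example, and real experiments would also account for multiple testing and sequential peeking.

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical results: control vs. variant (visitors, conversions).
n_a, conv_a = 10_000, 520
n_b, conv_b = 10_000, 585

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)          # pooled conversion rate
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))                      # two-sided test

print(f"lift: {(p_b - p_a) / p_a:.1%}, z = {z:.2f}, p = {p_value:.4f}")
```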

association rule learning. A set of techniques for identifying relationships, i.e. association rules between variables in large data arrays. Used in data mining.

classification. A set of techniques that allows you to predict consumer behavior in a particular market segment (purchase decisions, churn, consumption volume, etc.). Used in data mining.

cluster analysis. A statistical method for classifying objects into groups by identifying common features that are not known in advance. Used in data mining.
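As a hedged illustration of cluster analysis, here is a minimal k-means sketch with scikit-learn. The two-dimensional points are synthetic; in practice, the features would come from customer or product attributes.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D points forming two loose groups (illustrative only).
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
                    rng.normal(loc=3.0, scale=0.5, size=(50, 2))])

# Ask k-means to find two groups without telling it which point is which.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)

print(model.cluster_centers_)   # coordinates of the discovered group centers
print(model.labels_[:10])       # cluster assignment of the first 10 points
```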

Crowdsourcing. A technique for collecting data from a large number of sources.

Data fusion and data integration. A set of techniques that allows you to analyze the comments of social network users and compare them with real-time sales results.

data mining. A set of techniques for identifying the categories of consumers most receptive to a promoted product or service, identifying the characteristics of the most successful employees, and predicting consumer behavior.

Ensemble learning. This method uses a lot of predictive models, which improves the quality of the predictions made.

Genetic algorithms. In this technique, possible solutions are represented as `chromosomes` that can combine and mutate. As in the process of natural evolution, the fittest individual survives.

machine learning. A field of computer science (historically associated with the name "artificial intelligence") that aims to create self-learning algorithms based on the analysis of empirical data.

natural language processing (NLP). A set of natural language recognition techniques borrowed from computer science and linguistics.

network analysis. A set of techniques for analyzing links between nodes in networks. With regard to social networks, it allows you to analyze the relationship between individual users, companies, communities, etc.

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more indicators. Helps in making strategic decisions, for example, the composition of the product line introduced to the market, conducting investment analysis, etc.

pattern recognition. A set of techniques with elements of self-learning for predicting the behavioral model of consumers.

predictive modeling. A set of techniques for creating a mathematical model of a predetermined, probable scenario. For example, analyzing a CRM system's database for conditions that might push subscribers to change providers.

regression. A set of statistical methods for identifying patterns between changes in a dependent variable and one or more independent variables. Often used for forecasting and predictions. Used in data mining.
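A minimal sketch of regression with scikit-learn is shown below. The advertising-spend and sales numbers are invented purely to illustrate fitting a dependent variable against one independent variable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: monthly ad spend (thousands) vs. sales (thousands).
ad_spend = np.array([[10], [15], [20], [25], [30], [35]])
sales = np.array([120, 150, 185, 210, 248, 275])

model = LinearRegression().fit(ad_spend, sales)

print(f"slope: {model.coef_[0]:.1f}, intercept: {model.intercept_:.1f}")
print("forecast for spend of 40:", model.predict([[40]])[0])
```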

sentiment analysis. Techniques for assessing consumer sentiment based on natural language processing. They make it possible to isolate, from the general information flow, messages related to a subject of interest (for example, a consumer product), and then to assess the polarity of the judgment (positive or negative), the degree of emotionality, and so on.

signal processing. A set of techniques borrowed from radio engineering, which aims to recognize a signal against a background of noise and its further analysis.

Spatial analysis. A set of techniques, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographical coordinates, object geometry. Geographic information systems (GIS) often serve as the source of big data in this case.

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest on this list is Apache Hadoop, open source software that over the past five years has been tried and tested as a data analyzer by most stock trackers. As soon as Yahoo opened the Hadoop code to the open source community, a whole new line of Hadoop-based products quickly emerged in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop, and they are developed both by startups and by well-known global companies.

Markets for Big Data Management Solutions

Big Data Platforms (BDP, Big Data Platform) as a means of combating digital hording

The ability to analyze big data, colloquially called Big Data, is perceived as an unambiguous boon. But is it really so? What can the unbridled accumulation of data lead to? Most likely to what psychologists, when speaking of a person, call pathological hoarding, syllogomania, or, figuratively, "Plyushkin's syndrome." In English, the vicious passion to collect everything is called hoarding (from the English hoard, "to stockpile"). According to the classification of mental illnesses, hoarding is a mental disorder. In the digital age, digital hoarding is added to traditional material hoarding, and both individuals and entire enterprises and organizations can suffer from it.

World and Russian market

Big data landscape - Main providers

Almost all the leading IT companies have shown interest in tools for collecting, processing, managing and analyzing big data, which is quite natural. Firstly, they directly encounter this phenomenon in their own business; secondly, big data opens up excellent opportunities for developing new market niches and attracting new customers.

A lot of startups have appeared on the market that do business on processing huge amounts of data. Some of them use ready-made cloud infrastructure provided by large players like Amazon.

Theory and practice of Big Data in industries

History of development

2017

TmaxSoft forecast: the next "wave" of Big Data will require DBMS modernization

Businesses know that the vast amounts of data they accumulate contain important information about their business and their clients. If a company can successfully apply this information, it will have a significant advantage over its competitors and will be able to offer better products and services. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the necessary storage capacity, data exchange processes, utilities and applications needed to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft noted.

In addition, increasing the processing power needed to analyze ever-increasing volumes of data can require significant investment in an organization's legacy IT infrastructure, as well as additional maintenance resources that could be used to develop new applications and services.

On February 5, 2015, the White House released a report discussing how companies use "big data" to set different prices for different buyers, a practice known as "price discrimination" or "differential pricing" (personalized pricing). The report describes the benefits of "big data" for both sellers and buyers, and concludes that many of the issues raised by the advent of big data and differential pricing can be addressed within existing anti-discrimination and consumer-protection laws and regulations.

The report notes that, at this time, there is only anecdotal evidence of how companies use big data in the context of individualized marketing and differentiated pricing. This evidence suggests that sellers use pricing methods that can be divided into three categories:

  • studying the demand curve;
  • steering and differentiated pricing based on demographic data;
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: In order to understand demand and study consumer behavior, marketers often conduct experiments in this area, during which customers are randomly assigned one of two possible price categories. “Technically, these experiments are a form of differential pricing because they result in different prices for customers, even if they are “non-discriminatory” in the sense that all customers have the same chance of “hitting” the higher price.”

Steering: This is the practice of presenting products to consumers based on their belonging to a certain demographic group. For example, a computer company's website may offer the same laptop to different types of buyers at different prices, based on the information they have provided about themselves (for example, depending on whether the user represents a government agency, a scientific or commercial institution, or is a private individual) or on their geographical location (for example, determined by the computer's IP address).

Targeted Behavioral Marketing and Customized Pricing: In these cases, buyers' personal data is used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data on users' Internet activity, collected by advertising networks and through third-party cookies, to target their advertising materials. This approach, on the one hand, lets consumers receive advertisements for goods and services of interest to them, but it may worry those consumers who do not want certain types of their personal data (such as information about visits to websites related to medical and financial matters) collected and used without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The report speculates that this may be because the methods are still being developed, or because companies are reluctant to adopt individual pricing (or prefer to keep quiet about it), possibly fearing a backlash from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential returns and risks." While acknowledging that there are issues of transparency and discrimination when using big data, the report argues that existing anti-discrimination and consumer protection laws are sufficient to address them. However, the report also highlights the need for “continuous monitoring” where companies use confidential information in a non-transparent way, or in ways that are not covered by the existing regulatory framework.

This report continues the White House's efforts to study the use of "big data" and discriminatory pricing on the Internet and the resulting consequences for American consumers. It was previously reported that the White House working group on Big Data released its report on the subject in May 2014. The Federal Trade Commission (FTC) also considered these issues during its September 2014 workshop on discrimination related to the use of big data.

2014

Gartner demystifies Big Data

A fall 2014 policy brief from Gartner lists and debunks a number of common Big Data myths among CIOs.

  • Everyone implements Big Data processing systems faster than us

Interest in Big Data technologies is at an all-time high: 73% of the organizations polled by Gartner analysts this year are already investing in Big Data or planning to do so. But most of these initiatives are still at a very early stage, and only 13% of those surveyed have already implemented such solutions. The hardest part is figuring out how to monetize Big Data and deciding where to start. Many organizations get stuck in the pilot phase because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small errors in it.

Some CIOs believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each error separately really affects the result less, analysts say, but the errors themselves become larger. In addition, most of the analyzed data is external, of unknown structure or origin, so the probability of errors increases. Thus, in the world of Big Data, quality is actually much more important.

  • Big Data technologies will eliminate the need for data integration

Big Data promises the ability to process data in its original format with automatic schema generation as it is read. It is believed that this will allow the analysis of information from the same sources using multiple data models. Many believe that this will also enable end users to interpret any set of data in their own way. In reality, most users often want the traditional out-of-the-box schema where the data is formatted appropriately and there is agreement on the level of information integrity and how it should relate to the use case.

  • Data warehouses do not make sense to use for complex analytics

Many information management system administrators feel that it makes no sense to spend time creating a data warehouse, given that complex analytical systems use new types of data. In fact, many sophisticated analytics systems use information from a data warehouse. In other cases, new data types need to be additionally prepared for analysis in Big Data processing systems; decisions have to be made about the suitability of the data, the principles of aggregation, and the required level of quality - such preparation can take place outside the warehouse.

  • Data warehouses will be replaced by data lakes

In reality, vendors mislead customers by positioning data lakes as a replacement for storage or as critical elements of an analytical infrastructure. The underlying technologies of data lakes lack the maturity and breadth of functionality found in data warehouses. Therefore, leaders responsible for managing data should wait until the lakes reach the same level of development, according to Gartner.

Accenture: 92% of those who implemented big data systems are satisfied with the result

Among the main advantages of big data, respondents named:

  • "search for new sources of income" (56%),
  • "improving customer experience" (51%),
  • "new products and services" (50%) and
  • "an influx of new customers and maintaining the loyalty of old ones" (47%).

When introducing new technologies, many companies have faced traditional problems. For 51% the stumbling block was security, for 47% the budget, for 41% the lack of necessary personnel, and for 35% difficulties in integrating with existing systems. Almost all the surveyed companies (about 91%) plan to solve the staffing shortage soon and hire big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the internet. 79% of respondents noted that companies that do not deal with big data will lose their competitive advantage.

However, the respondents disagreed on what exactly should be considered big data. 65% of respondents believe that these are “large data files”, 60% are sure that this is “advanced analytics and analysis”, and 50% that this is “data visualization tools”.

Madrid spends 14.7 million euros on big data management

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The cost of the project is 14.7 million euros, and the solutions to be implemented are based on technologies for analyzing and managing big data. With their help, the city administration will manage the work with each service provider and pay them according to the level of service.

The contractors in question monitor the condition of streets, lighting, irrigation and green spaces, clean up the territory, and remove and process garbage. In the course of the project, 300 key performance indicators for city services have been developed for specially assigned inspectors, on the basis of which 1.5 thousand various checks and measurements will be carried out daily. In addition, the city will start using an innovative technology platform called Madrid iNTeligente (MiNT) - Smarter Madrid.

2013

Experts: The peak of fashion for Big Data

Without exception, all vendors in the data management market are currently developing technologies for Big Data management. This new technological trend is also actively discussed by the professional community, both developers and industry analysts and potential consumers of such solutions.

As Datashift found out, as of January 2013 the wave of discussion around "big data" had exceeded all conceivable dimensions. After analyzing the number of mentions of Big Data on social networks, Datashift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. This is equivalent to 260 posts per hour, with a peak of 3,070 mentions per hour.

Gartner: Every second CIO is ready to spend money on Big data

After several years of experimentation with Big Data technologies and the first implementations in 2013, the adoption of such solutions will increase significantly, Gartner predicts. The researchers surveyed IT leaders around the world and found that 42% of them have already invested in Big Data technologies or plan to make such investments within the next year (data as of March 2013).

Companies are forced to spend money on big data processing technologies because the information landscape is changing rapidly and requires new approaches to information processing. Many companies have already realized that big data is critical, and working with it brings benefits unavailable with traditional information sources and processing methods. In addition, the constant airing of the "big data" topic in the media fuels interest in the relevant technologies.

Frank Buytendijk, vice president of Gartner, even urged companies to moderate their enthusiasm as some are worried that they are lagging behind competitors in the development of big data.

“There is no need to worry, the possibilities for realizing ideas based on big data technologies are virtually limitless,” he said.

Gartner predicts that by 2015, 20% of the Global 1000 companies will have a strategic focus on "information infrastructure."

In anticipation of the new opportunities that big data processing technologies will bring, many organizations are already organizing the process of collecting and storing various kinds of information.

For educational and government organizations, as well as for industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data, which includes email messages, multimedia and other similar content. According to Gartner, those who learn to deal with a wide variety of information sources will win the data race.

Poll Cisco: Big Data will help increase IT budgets

The Cisco Connected World Technology Report (Spring 2013), conducted in 18 countries by the independent analyst firm InsightExpress, surveyed 1,800 college students and an equal number of young professionals aged 18 to 30. The survey was conducted to find out how ready IT departments are to implement big data projects and to gain an understanding of the associated challenges, technological shortcomings, and the strategic value of such projects.

Most companies collect, record and analyze data. However, according to the report, many companies face a range of complex business and information technology challenges in connection with Big Data. For example, 60 percent of those surveyed acknowledge that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent said that they are already getting real strategic benefits from the accumulated information.

More than half of the CIOs surveyed believe that Big Data projects will help increase IT budgets in their organizations, as there will be increased demands on technology, staff and professional skills. At the same time, more than half of the respondents expect such projects to increase IT budgets in their companies as early as 2012, and 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. The spread of cloud technologies may therefore affect the speed of adoption of Big Data solutions and their value for business.

Companies collect and use data of the most diverse types, both structured and unstructured. Here are the sources from which survey participants receive data (Cisco Connected World Technology Report):

Nearly half (48 percent) of CIOs predict that the load on their networks will double over the next two years. (This is especially true in China, where 68 percent of those surveyed hold this point of view, and in Germany, 60 percent.) 23 percent of respondents expect network traffic to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosive growth in network traffic.

27 percent of those surveyed admitted that they need better IT policies and information security measures.

21 percent need more bandwidth.

Big Data opens up new opportunities for IT departments to create value and build close relationships with business units to increase revenue and strengthen a company's bottom line. Big Data projects make IT departments a strategic partner of business departments.

According to 73 percent of respondents, it is the IT department that will become the main engine for implementing the Big Data strategy. At the same time, respondents believe that other departments will also be involved in the implementation of this strategy. First of all, this concerns the departments of finance (named by 24 percent of respondents), research and development (20 percent), operations (20 percent), engineering (19 percent), as well as marketing (15 percent) and sales (14 percent).

Gartner: Millions of new jobs needed to manage big data

Global IT spending will reach $3.7 trillion in 2013, up 3.8% from IT spending in 2012 (the year-end forecast for 2012 is $3.6 trillion). The big data segment will grow at a much faster pace, according to a Gartner report.

By 2015, 4.4 million jobs in information technology will be created to serve big data, of which 1.9 million will be in the United States. What is more, each such job will generate three additional jobs outside IT, so that in the US alone about 6 million people will be working to support the information economy over the next four years.

According to Gartner experts, the main problem is that the industry lacks the talent for this: both the private and public educational systems, in the United States for example, are unable to supply the industry with enough qualified personnel. As a result, only one in three of the new IT jobs mentioned will actually be filled.

Analysts believe that the role of cultivating qualified IT personnel should be taken on directly by the companies that are in dire need of them, since such employees will be their ticket into the new information economy of the future.

2012

First skepticism about Big Data

Analysts from Ovum and Gartner suggest that for big data, the trendy topic of 2012, it may be time to let go of illusions.

The term "Big Data" at this time usually refers to the ever-growing volume of information coming online from social media, sensor networks and other sources, as well as the growing range of tools used to process data and identify important business from it. -trends.

“Because of (or in spite of) the hype surrounding the idea of big data, vendors in 2012 looked at this trend with great hope,” said Tony Bayer, an analyst at Ovum.

Bayer said that DataSift conducted a retrospective analysis of big data references in

Big Data is a set of methods for working with huge volumes of structured or unstructured information. Big data specialists process and analyze it to obtain visual, human-readable results. Look At Me talked to professionals and found out what the situation with big data processing is in Russia, and where and what it is best to study for those who want to work in this area.

Alexey Ryvkin on the main directions in the field of big data, communication with customers and the world of numbers

I studied at the Moscow Institute of Electronic Engineering. The main thing I took away from there was fundamental knowledge of physics and mathematics. Alongside my studies, I worked at an R&D center, where I developed and implemented error-correcting coding algorithms for secure data transmission. After finishing my bachelor's degree, I entered the master's program in business informatics at the Higher School of Economics, and after that I wanted to work at IBS. I was lucky: at that time, because of the large number of projects, they were recruiting additional interns, and after several interviews I started working at IBS, one of the largest Russian companies in this area. In three years I went from intern to enterprise solutions architect. Now I am developing Big Data expertise for customer companies from the financial and telecommunications sectors.

There are two main specializations for people who want to work with big data: analysts, and IT consultants who create the technologies for working with big data. Beyond that, one can also talk about the profession of Big Data Analyst, i.e. people who work directly with the data on the customer's IT platform. Previously, these were ordinary mathematical analysts who knew statistics and mathematics and solved data analysis problems with statistical software. Today, in addition to statistics and mathematics, an understanding of the technology and of the data life cycle is also required. That, in my view, is what distinguishes a modern Data Analyst from the analysts who came before.

My specialization is IT consulting, that is, I come up with and offer customers ways to solve business problems using IT. People with different backgrounds come to consulting, but the most important qualities for this profession are the ability to understand the client's needs, the desire to help people and organizations, good communication and teamwork skills (since you always work with the client and in a team), and good analytical skills. Internal motivation is very important: we work in a competitive environment, and the customer expects unusual solutions and genuine interest in the work.

Most of my time is spent communicating with customers, formalizing their business needs and helping to develop the most appropriate technology architecture. The selection criteria here have their own peculiarity: in addition to functionality and TCO (total cost of ownership), the system's non-functional requirements are very important, most often response time and information processing time. To convince the customer, we often use the proof-of-concept approach: we offer to "test" the technology for free on some task, on a narrow data set, to make sure that the technology works. The solution should either create a competitive advantage for the customer by bringing additional benefits (for example, x-sell, cross-selling) or solve some business problem, say, reduce a high level of loan fraud.
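
Purely as an illustration of this proof-of-concept idea (not a description of any real engagement), here is a minimal Python sketch of how processing time might be measured on a narrow data sample; the file name, the "amount" column and the process_batch function are all invented for the example:

```python
import csv
import statistics
import time

def process_batch(rows):
    """Hypothetical stand-in for the technology being evaluated,
    e.g. one scoring or aggregation pass over a batch of records."""
    return sum(float(row["amount"]) for row in rows)  # "amount" is an invented column

def measure_poc(sample_path, runs=5):
    """Run the candidate processing step several times over a narrow
    data sample and report response-time statistics for the PoC report."""
    with open(sample_path, newline="") as f:
        rows = list(csv.DictReader(f))

    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        process_batch(rows)
        timings.append(time.perf_counter() - start)

    return {
        "rows": len(rows),
        "median_seconds": statistics.median(timings),
        "worst_seconds": max(timings),
    }

if __name__ == "__main__":
    # "sample_transactions.csv" is a hypothetical sample file provided by the customer
    print(measure_poc("sample_transactions.csv"))
```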

It would be much easier if clients came with a ready-made task, but so far they do not realize that a revolutionary technology has appeared that can change the market in a couple of years

What problems do you have to face? The market is not yet ready to use big data technologies. It would be much easier if clients came with a ready-made task, but so far they do not understand that a revolutionary technology has appeared that can change the market in a couple of years. That is why we effectively work in startup mode: we do not just sell technologies, but each time convince customers that they need to invest in these solutions. This is the position of visionaries - we show customers how they can change their business by bringing in data and IT. We are creating this new market: the market of commercial IT consulting in the field of Big Data.

If a person wants to do data analysis or IT consulting in the field of Big Data, the first important thing is a mathematical or technical education with a strong mathematical background. It is also useful to master specific technologies, for example SAS, Hadoop, the R language or IBM solutions. In addition, you need to take an active interest in applied tasks for Big Data - for example, how it can be used for improved credit scoring in a bank or for customer lifecycle management. This and other knowledge can be obtained from available sources: for example, Coursera and Big Data University. There is also the Customer Analytics Initiative at the Wharton School of the University of Pennsylvania, which has published a lot of interesting material.
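
To get a feel for one such applied task, a toy version of credit scoring can be sketched with scikit-learn in Python. This is only a hedged illustration: the features, the data and the split are invented, and a real bank model would rely on far richer data, careful feature engineering and validation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Invented example data: [income (thousands), years of credit history, open loans]
X = np.array([
    [45, 1, 3], [120, 7, 1], [30, 0, 4], [80, 5, 2],
    [60, 3, 2], [25, 1, 5], [95, 10, 0], [40, 2, 3],
    [150, 12, 1], [35, 1, 4], [70, 6, 1], [55, 2, 3],
])
# 1 = loan repaid, 0 = default (also invented)
y = np.array([0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Logistic regression is the classic baseline for credit scoring
model = LogisticRegression()
model.fit(X_train, y_train)

# Probability of repayment for each applicant in the test set
probabilities = model.predict_proba(X_test)[:, 1]
print("ROC AUC on the toy test set:", roc_auc_score(y_test, probabilities))
```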

A serious problem for those who want to work in our field is a clear lack of information about Big Data. You cannot go to a bookstore or some website and get, for example, an exhaustive collection of cases on all applications of Big Data technologies in banks. There are no such guides. Some of the information is found in books, another part is collected at conferences, and some you have to figure out on your own.

Another problem is that analysts are comfortable in the world of numbers, but not always in business. These people are often introverted and have difficulty communicating, and therefore find it hard to present research results to clients convincingly. To develop these skills, I would recommend books such as The Pyramid Principle and Speak the Language of Diagrams. They help you develop presentation skills and express your thoughts concisely and clearly.

It helped me a lot to take part in various case championships while studying at the Higher School of Economics. Case championships are intellectual competitions for students in which you have to study business problems and propose solutions to them. They come in two forms: championships run by consulting firms such as McKinsey, BCG and Accenture, and independent ones such as Changellenge. While participating in them, I learned to see and solve challenging problems - from identifying and structuring the problem to defending recommendations for solving it.

Oleg Mikhalsky on the Russian market and the specifics of creating a new product in the field of big data

Before joining Acronis, I had already been involved in launching new products to market at other companies. It is always interesting and difficult at the same time, so I was immediately drawn to the opportunity to work on cloud services and storage solutions. All my previous experience in the IT industry came in handy here, including my own startup project I-accelerator. Having a business education (MBA) in addition to a basic engineering one also helped.

In Russia, large companies - banks, mobile operators and so on - have a need for big data analysis, so there are prospects in our country for those who want to work in this area. True, many projects today are integration projects, that is, built on foreign developments or open-source technologies. In such projects, fundamentally new approaches and technologies are not created; rather, existing developments are adapted. At Acronis we took a different path: after analyzing the available alternatives, we decided to invest in our own development and as a result created a secure storage system for big data that is competitive in cost with, for example, Amazon S3, but works reliably and efficiently even at a much smaller scale. Large Internet companies also have their own big data developments, but they are focused more on internal needs than on meeting the needs of external customers.

It is important to understand the trends and economic forces that are affecting the field of big data processing. To do this, you need to read a lot, listen to speeches by reputable experts in the IT industry, attend thematic conferences. Now almost every conference has a section about Big Data, but they all talk about it from a different angle: from the point of view of technology, business or marketing. You can go for a project job or an internship in a company that already has projects on this topic. If you are confident in your abilities, then it is not too late to organize a startup in the field of Big Data.

Without constant contact with the market, a new development risks going unclaimed

True, when you are responsible for a new product, a lot of time is spent on market analytics and on communicating with potential customers, partners and professional analysts who know a lot about customers and their needs. Without constant contact with the market, a new development risks going unclaimed. There are always many uncertainties: you have to understand who the first users (early adopters) will be, what value you offer them, and how to then attract a mass audience. The second most important task is to form and convey to the developers a clear and holistic vision of the final product, to motivate them to work in conditions where some requirements may still change and priorities depend on feedback from the first customers. So an important task is to manage the expectations of customers on the one hand and developers on the other, so that neither side loses interest and the project is carried through to completion. After the first successful project it becomes easier, and the main task becomes finding the right growth model for the new business.

Big Data. The term appeared as an alternative to the traditional DBMS and became one of the main IT infrastructure trends when most of the industry giants - IBM, Microsoft, HP, Oracle and others - began to use the concept in their strategies. Big Data refers to a huge (hundreds of terabytes) array of data that cannot be processed in traditional ways, and sometimes also to the tools and methods for processing such data.

Examples of Big Data sources: RFID events, messages on social networks, meteorological statistics, information about the location of mobile network subscribers, and data from audio and video recording devices. Accordingly, "big data" is widely used in manufacturing, healthcare, government and Internet business - in particular, in analyzing the target audience.

Characteristics

The defining features of big data are the "three Vs": Volume (the amount of data is really large); Variety (the data is heterogeneous and of many types); Velocity (the data requires very fast processing).

Big data is most often unstructured, and special algorithms are needed to process it. Big data analysis methods include:

  • ("data mining") - a set of approaches for discovering hidden useful knowledge that cannot be obtained by standard methods;
  • Crowdsourcing (from crowd and sourcing, i.e. using the crowd as a source) - solving significant tasks through the joint efforts of volunteers who are not bound by an employment contract or relationship, coordinating their activities with IT tools;
  • Data Fusion & Integration ("mixing and incorporation of data") - a set of methods for connecting multiple sources as part of deep analysis;
  • Machine Learning - a subfield of artificial intelligence research that studies methods for building models on the basis of statistical analysis and using them to make predictions (see the short sketch after this list);
  • pattern recognition (for example, face recognition in the viewfinder of a camera or video camera);
  • spatial analysis - the use of topology, geometry and geography in data analysis;
  • data visualization - presenting analytical information as illustrations and diagrams, with interactive tools and animation, to track results and provide a basis for further monitoring.
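
To make the data mining and machine learning items above a little more concrete, here is a minimal Python sketch using scikit-learn that groups invented customer records into segments with k-means; the numbers are made up and serve only to show the shape of such an analysis:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented customer records: [average monthly spend, visits per month]
customers = np.array([
    [120, 2], [450, 8], [90, 1], [500, 10],
    [110, 3], [480, 9], [100, 2], [30, 1],
])

# The features are on different scales, so standardize them first
scaled = StandardScaler().fit_transform(customers)

# Look for two hidden groups ("segments") in the data
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

for customer, label in zip(customers, labels):
    print(f"spend={customer[0]:>4}, visits={customer[1]:>2} -> segment {label}")
```

On real big data the same idea is applied to millions of records and far more features, usually on a distributed platform rather than a single machine.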

Information is stored and analyzed on large numbers of high-performance servers. The key technology is Hadoop, which is open source.
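
The programming model at the heart of Hadoop is MapReduce. The following is a tiny local simulation in Python of the canonical word-count example; it shows only the idea of the map, shuffle and reduce steps, not a production Hadoop job:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle_phase(pairs):
    """Shuffle: group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped.items()

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    for word, counts in grouped:
        yield word, sum(counts)

if __name__ == "__main__":
    docs = [
        "big data is not just big",
        "data needs processing",
    ]
    for word, count in sorted(reduce_phase(shuffle_phase(map_phase(docs)))):
        print(word, count)
```

On a real cluster the framework distributes the same map and reduce logic across many servers and handles the shuffle itself; with Hadoop Streaming, the map and reduce steps can even be plain scripts reading from standard input.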

Since the amount of information will only grow over time, the difficulty lies not in obtaining data but in processing it with maximum benefit. In general, working with Big Data includes: collecting information, structuring it, creating insights and contexts, and developing recommendations for action. Even before the first stage, it is important to clearly define the purpose of the work: what exactly the data is needed for - for example, to determine the target audience of a product. Otherwise, there is a risk of obtaining a lot of information without understanding how exactly it can be used.
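
To make these stages concrete, here is a schematic Python sketch of such a workflow for the target-audience example mentioned above; all names, fields and thresholds are invented for illustration:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Event:
    user_id: str
    age: int
    purchases: int

def collect() -> List[Event]:
    """Stage 1: collect raw information (here, a hard-coded stand-in for real sources)."""
    return [
        Event("u1", 24, 5), Event("u2", 41, 1),
        Event("u3", 29, 7), Event("u4", 55, 0),
    ]

def structure(events: List[Event]) -> List[Event]:
    """Stage 2: structure and clean the data (drop obviously invalid records)."""
    return [e for e in events if 0 < e.age < 120]

def insights(events: List[Event]) -> dict:
    """Stage 3: derive insights tied to the stated goal - the product's target audience."""
    active = [e for e in events if e.purchases >= 3]
    return {
        "active_share": len(active) / len(events),
        "avg_active_age": sum(e.age for e in active) / len(active),
    }

def recommend(summary: dict) -> str:
    """Stage 4: turn the insight into a recommendation for action."""
    return (f"Focus the campaign on buyers around {summary['avg_active_age']:.0f} years old; "
            f"they make up {summary['active_share']:.0%} of the audience.")

if __name__ == "__main__":
    print(recommend(insights(structure(collect()))))
```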