Elements of statistical data processing. Statistical data processing and its features. Data for statistical data processing

Laboratory work No. 3. Statistical data processing in MATLAB

General problem statement

The main purpose of this laboratory work is to introduce the basics of statistical data processing in the MATLAB environment.

Theoretical part

Primary statistical data processing

Statistical data processing relies on primary and secondary quantitative methods. Primary processing structures the collected information, typically by grouping the data into summary tables by various parameters. The primary data should be presented in a form that lets a person roughly assess the data set and see how the sample is distributed, for example how homogeneous or compact it is. After the primary analysis, methods of secondary statistical processing are applied to identify statistical patterns in the data set.

Primary statistical analysis of a data array answers the following questions:

What value is most representative of the sample? Measures of central tendency answer this question.

How large is the scatter of the data around this characteristic value, i.e. how "blurred" is the data? Measures of variability answer this question.

Note that measures of central tendency and variability are determined only for quantitative data.

Measures of central tendency are values around which the rest of the data are grouped. They summarize the data set, which makes it possible both to draw conclusions about the sample as a whole and to compare different samples with each other.

Suppose there is a data sample $X_1, X_2, \ldots, X_n$. The measures of central tendency are then estimated by the following indicators:

1. Sample mean: the result of dividing the sum of all sample values by their number. It is determined by formula (3.1).

$$S = \frac{1}{n}\sum_{i=1}^{n} X_i \qquad (3.1)$$

where $X_i$ is the i-th element of the sample;

$n$ is the number of elements in the sample.

The sample mean uses every value in the sample and gives the most precise estimate of the central tendency.

Suppose you have a sample of 20 people, where each sample element is a person's average monthly income. Assume that 19 people each have an average monthly income of 20 thousand rubles and one person has an income of 300 thousand rubles. The total monthly income of the whole sample is 680 thousand rubles, so the sample mean is S = 34 thousand rubles.


2. Median: the value above and below which the same number of sample values lie, that is, the central value of the ordered data series. Depending on whether the number of elements in the sample is odd or even, it is determined by formula (3.2) or (3.3). The algorithm for estimating the median of a data sample:

First, the data are ranked (sorted) in ascending or descending order.

If the ordered sample has an odd number of elements, the median coincides with the central value.

$$Me = X_{(n+1)/2} \qquad (3.2)$$

where $n$ is the number of elements in the sample.

If the number of elements is even, the median is defined as the arithmetic mean of the two central values.

$$Me = \frac{X_{n/2} + X_{n/2+1}}{2} \qquad (3.3)$$

where $X_{n/2}$ is the middle element of the ordered sample;

$X_{n/2+1}$ is the element of the ordered sample that follows it;

$n$ is the number of elements in the sample.

If all the elements of the sample are different, the same number of elements lie above the median as below it. For example, for the sample (1, 5, 9, 15, 16), the median is the element 9.
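The median algorithm described above (sort, then pick the middle element or average the two middle elements) can be sketched in a few lines. This is a minimal illustration in Python rather than MATLAB; the function name `median_of` is hypothetical.

```python
def median_of(sample):
    """Median by the sort-then-pick-the-middle algorithm."""
    ordered = sorted(sample)            # step 1: rank the data
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # odd n: the central element, as in formula (3.2)
        return ordered[mid]
    # even n: mean of the two central elements, as in formula (3.3)
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 5, 9, 15, 16]))     # odd-sized sample from the text
print(median_of([16, 1, 15, 5]))        # even-sized, unsorted input
```

Sorting a copy with `sorted` leaves the caller's data untouched, which matters when the same sample is later used for other statistics.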

In statistical analysis, comparing the median with the sample mean helps to identify sample elements that strongly influence the value of the mean.

Consider again the sample of 20 people, in which 19 people have an average monthly income of 20 thousand rubles and one person has an income of 300 thousand rubles. After ordering the sample, the median is the arithmetic mean of the tenth and eleventh elements and equals Me = 20 thousand rubles. This result is interpreted as follows: the median divides the sample into two groups, so that in the first group each person has an average monthly income of no more than 20 thousand rubles, and in the second group at least 20 thousand rubles. In this example the median shows how much the "typical" person earns. The sample mean, S = 34 thousand rubles, is considerably higher, which makes it unsuitable for assessing typical earnings here.
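The income example can be reproduced numerically. The sketch below uses Python's standard `statistics` module instead of MATLAB; the variable names are illustrative only, and incomes are in thousands of rubles.

```python
from statistics import mean, median

# 19 people earning 20 thousand rubles and one person earning 300 thousand.
incomes = [20] * 19 + [300]

sample_mean = mean(incomes)        # 680 / 20
sample_median = median(incomes)    # mean of the 10th and 11th ordered values

print("mean:", sample_mean, "median:", sample_median)

# Dropping the single extreme value brings the mean back to the median.
trimmed_mean = mean(incomes[:-1])
print("mean without the extreme value:", trimmed_mean)
```

The gap between the two numbers comes entirely from one element, which is the situation the text describes when it discusses removing such values.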

Thus, a large difference between the median and the sample mean points to a large scatter in the sample data (in this example, the person earning 300 thousand rubles clearly differs from the typical people in the sample and has a significant impact on the estimate of the average income). What to do with such elements is decided case by case. In general, however, they are removed to keep the sample reliable, since they strongly influence the estimates of the statistical indicators.

3. Mode (Mo): the value that occurs most frequently in the sample, i.e. the value with the highest frequency. Mode estimation algorithm:

If all elements of the sample occur equally often, the sample is said to have no mode.

If two adjacent elements of the sample have the same frequency, which is higher than the frequency of the remaining elements, the mode is defined as the average of these two values.

If two elements of the sample have the same frequency, which is higher than the frequency of the remaining elements, and these elements are not adjacent, the sample is said to have two modes.

In statistical analysis the mode is used when a quick assessment of the central tendency is needed and high accuracy is not required. For example, the mode (in terms of size or brand) is convenient for determining which clothes and shoes are most in demand among customers.

Measures of scatter (variability) are a group of statistical indicators that characterize the differences between the individual values of the sample. They make it possible to estimate how homogeneous and compact the sample elements are. Measures of scatter include the following indicators:

1. Range: the interval between the maximum and minimum values of the observation results (sample elements). The range indicates the spread of values in the data set. If the range is large, the values are widely scattered; if it is small, the values lie close to each other. The range is determined by formula (3.4).

$$R = X_{\max} - X_{\min} \qquad (3.4)$$

where $X_{\max}$ is the maximum sample element;

$X_{\min}$ is the minimum sample element.

2. Average deviation: the arithmetic mean of the absolute differences between each value in the sample and the sample mean. The average deviation is determined by formula (3.5).

$$d = \frac{1}{n}\sum_{i=1}^{n} \left| X_i - S \right| \qquad (3.5)$$

where $X_i$ is the i-th element of the sample;

$S$ is the sample mean, calculated by formula (3.1);

$n$ is the number of elements in the sample.

The absolute value is needed because the deviation of a particular element from the mean can be either positive or negative. Without the absolute value the deviations cancel out, their sum is zero, and it is impossible to judge the variability of the data (how closely the data cluster around the sample mean). In statistical analysis, the mode or the median can be used in place of the sample mean.
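Why the signed deviations cancel can be checked directly from the definition of the sample mean in formula (3.1). The short derivation below assumes the notation $X_i$ for the sample elements, $S$ for the sample mean and $n$ for the sample size:

```latex
\sum_{i=1}^{n} (X_i - S)
  = \sum_{i=1}^{n} X_i - nS
  = nS - nS
  = 0,
\qquad \text{since } S = \frac{1}{n}\sum_{i=1}^{n} X_i .
```

The sum of signed deviations is therefore exactly zero for any sample, which is why formula (3.5) averages absolute values instead.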

3. Variance: a measure of scatter that describes how far the data values deviate from the mean. It is calculated from the sum of the squared deviations of each sample element from the mean. Depending on the sample size, the variance is estimated in different ways:

For large samples (n > 30), by formula (3.6)

$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - S)^2 \qquad (3.6)$$

For small samples (n < 30), by formula (3.7)

$$\sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - S)^2 \qquad (3.7)$$

where $X_i$ is the i-th element of the sample;

$S$ is the sample mean;

$n$ is the number of elements in the sample;

$(X_i - S)$ is the deviation from the mean for each value in the data set.

4. Standard deviation: a measure of how widely the data points are scattered around their mean.

Squaring the individual deviations when calculating the variance magnifies large deviations and expresses the result in squared units, so the variance is not directly comparable with the original deviations. To bring the estimate of the spread back to the scale of the data, comparable to the average deviation, the square root of the variance is taken. This root is the measure of variability called the root-mean-square, or standard, deviation (3.8).

$$\sigma = \sqrt{\sigma^2} \qquad (3.8)$$

where $\sigma^2$ is the variance.
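The four measures of scatter, formulas (3.4) to (3.8), can be computed together. The following is a minimal Python sketch, not a MATLAB listing. The function name `scatter_measures` and the sample values are hypothetical, and the divisor rule for the variance (n for samples larger than 30, n - 1 otherwise, including n = 30) is an assumption about how formulas (3.6) and (3.7) are meant to be applied.

```python
import math


def scatter_measures(sample):
    """Range, average deviation, variance and standard deviation of a sample."""
    n = len(sample)
    if n < 2:
        raise ValueError("at least two values are needed")
    s = sum(sample) / n                                  # sample mean, (3.1)
    value_range = max(sample) - min(sample)              # range, (3.4)
    avg_dev = sum(abs(x - s) for x in sample) / n        # average deviation, (3.5)
    squares = sum((x - s) ** 2 for x in sample)
    # Assumption: divide by n for large samples, by n - 1 for small ones.
    divisor = n if n > 30 else n - 1
    variance = squares / divisor                         # (3.6) or (3.7)
    std_dev = math.sqrt(variance)                        # (3.8)
    return {
        "mean": s,
        "range": value_range,
        "average_deviation": avg_dev,
        "variance": variance,
        "standard_deviation": std_dev,
    }


# Hypothetical sample of eight values, so the small-sample divisor applies.
result = scatter_measures([2, 4, 4, 4, 5, 5, 7, 9])
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The standard deviation is reported in the same units as the data, so it can be read side by side with the average deviation, while the variance cannot.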

Suppose you are the manager of a software development project with five programmers reporting to you. In managing the project, you distribute tasks among the programmers. To keep the example simple, assume the tasks are equal in complexity and execution time. You decide to analyze each programmer's work (the number of tasks completed per week) over the last 10 weeks and obtain the following samples:

Week name

After estimating the average number of completed tasks, you got the following result:

Programmer    S
1             22.3
2             22.4
3             22.2
4             22.1
5             22.5

Judging by S, all programmers work with about the same efficiency on average (about 22 tasks per week). However, the measure of variability (the range) is very large, varying from 5 tasks for the fourth programmer to 24 tasks for the fifth.

Programmer    S       R
1             22.3
2             22.4
3             22.2
4             22.1
5             22.5

Now estimate the standard deviation, which shows how the values in each sample are distributed around the mean; in our case, how much the number of completed tasks varies from week to week.

Programmer    S       R       SD
1             22.3            1.56
2             22.4            1.8
3             22.2            2.84
4             22.1            1.3
5             22.5            5.3

The resulting standard deviations can be read as follows (consider the two extreme cases, programmers 4 and 5):

Each value in programmer 4's sample deviates from the mean by 1.3 tasks on average.

Each value in programmer 5's sample deviates from the mean by 5.3 tasks on average.

The closer the standard deviation is to 0, the more reliable the mean, since a small standard deviation means that each sample value is nearly equal to the mean (about 22 tasks in our example). Consequently, the 4th programmer is the most consistent, in contrast to the 5th. The number of tasks completed by the 5th programmer varies from week to week by 5.3 tasks on average, which is a significant scatter. For the 5th programmer the mean cannot be trusted, so it is difficult to predict the number of tasks completed next week, which in turn makes it difficult to plan and keep to work schedules. Which management decision you make in this case is not important here. What matters is that you obtain an estimate on which appropriate management decisions can be based.

Thus, the general conclusion is that the mean alone does not always describe the data correctly. How well the mean represents the data can be judged from the standard deviation.


1. Tools for statistical data processing in Excel

2. Using special functions

3. Using the ANALYSIS PACKAGE tool

Literature:

Main:

1. Burke K., Carey P. Data analysis using Microsoft Excel: transl. from English. M.: Williams, 2005. pp. 216-256.

2. Mishin A.V. Information technologies in legal activity: workshop. M.: RAP, 2013. pp. 2-11.

Additional:

3. Informatics for lawyers and economists: textbook for universities / Ed. S.V. Simonovich. SPb.: Peter, 2004. pp. 498-516.

Practical lesson number 30

Topic No. 11.1. Database maintenance in Access DBMS

The lesson is conducted using the project method.

Project goal: to develop a database on the work of the court.

Technical task:

1. Create a database "Court" from two tables "Judges" and "Claims" with the following structure, respectively:

Table "Judges"

Field name: Judge code; Full name; Days of reception; Business hours; Work experience
Data type: Number; Text; Text; Text; Number
Field size: Long Integer; Long Integer
Field format: General Number; General Number
Decimal places:
Default value: "Wed"; "15:00-17:00"
Validation rule: >36200 And <36299; Mon Or Tue Or Wed Or Thu Or Fri; >0 And <40
Validation text: Valid values are Mon, Tue, Wed, Thu, or Fri. Please re-enter!; Valid values are from 1 to 39. Re-enter!
Required: Yes; Yes; No; No; No
Indexed: No; No; No; No

Note. Set the field "Judge code" as the key field.

Table "Claims"

Field name: Case number; Plaintiff; Defendant; Judge code; Meeting date
Data type: Number; Text; Text; Number; Date/Time
Field properties: General tab
Field size: Long Integer; Long Integer; Full date format
Field format: General Number
Decimal places:
Default value:
Validation rule: >0 And <99999; >36200 And <36299
Validation text: Wrong entry, repeat!; Valid values are from 36201 to 36298. Please re-enter!
Required: Yes; No; No; No; No
Indexed: Yes (No Duplicates); No; No; Yes (Duplicates OK); No

2. Enter the following data records into the Judges table:

Enter the following data records into the Claims table:

3. Using the field "Judge code", establish a one-to-many relationship between the tables "Judges" and "Claims". When doing so, enable "Enforce Referential Integrity" and "Cascade Update Related Fields".

Literature:

Main:

1. Mishin A.V. Information technologies in professional activity: textbook / A.V. Mishin, L.E. Mistrov, D.V. Kartavtsev. M.: RAP, 2011. pp. 259-264.

Additional:

Practical lesson number 31

Topic No. 11.2. Principles of creating forms and queries in Access DBMS

1. Development of input forms for data entry.

2. Methodology for calculating and analyzing the entered data.

Literature:

Main:

1. Mishin A.V. Information technologies in professional activity: textbook / A.V. Mishin, L.E. Mistrov, D.V. Kartavtsev. M.: RAP, 2011. pp. 265-271.

Additional:

2. Informatics and information technology: a textbook for university students / I.G. Lesnichaya, I.V. Missing, Yu.D. Romanov, V.I. Shestakov. 2nd ed. M.: Eksmo, 2006. 544 p.

3. Mikheeva E.V. Information technologies in professional activity: a textbook for students of secondary vocational schools. 2nd ed., stereotyped. M.: Academy, 2005. 384 p.

Posted on http://www.allbest.ru/

Processing of statistical data

Introduction

Keywords: statistics, variance, sample, correlation

Methods of statistical processing of experimental results are mathematical techniques, formulas and methods of quantitative calculation with which the indicators obtained in an experiment can be generalized and systematized, revealing the patterns hidden in them. These are statistical regularities that exist between the variables studied in the experiment.

Some methods of mathematical and statistical analysis make it possible to calculate the so-called elementary mathematical statistics that characterize the sample distribution of the data, for example the sample mean, sample variance, mode, median and a number of others. Other methods, for example analysis of variance and regression analysis, make it possible to judge the dynamics of changes in individual sample statistics. A third group of methods, such as correlation analysis, factor analysis and methods for comparing sample data, makes it possible to judge reliably the statistical relationships between the variables investigated in the experiment.

1. Methods of primary statistical processing of experimental results

All methods of mathematical and statistical analysis are conventionally divided into primary and secondary. Primary methods are those that yield indicators directly reflecting the results of the measurements made in an experiment. Accordingly, primary statistical indicators are those used in the psychodiagnostic methods themselves and obtained from the initial statistical processing of the psychodiagnostic results. Secondary methods of statistical processing are those that reveal, on the basis of the primary data, the statistical patterns hidden in them.

The primary methods of statistical processing include, for example, the determination of the sample mean, sample variance, sample mode and sample median. Secondary methods usually include correlation analysis, regression analysis, methods for comparing primary statistics in two or more samples.

Consider methods for calculating elementary mathematical statistics.

1.1 Mode

The mode is a numerical characteristic of a sample that, as a rule, requires no calculation. The mode is the value of the trait under study that occurs most often in the sample. For symmetric unimodal distributions, including the normal distribution, the mode coincides with the mean and the median. For other, asymmetric distributions this is not the case. For example, in the sequence of values 1, 2, 5, 2, 4, 2, 6, 7, 2, the mode is the value 2, since it occurs more often than the other values: four times.

The mode is found according to the following rules:

1) When all values in the sample occur equally often, the sample series is considered to have no mode. For example, the sample 5, 5, 6, 6, 7, 7 has no mode.

2) When two adjacent values have the same frequency and it is greater than the frequency of any other value, the mode is calculated as the arithmetic mean of these two values. For example, in the sample 1, 2, 2, 2, 5, 5, 5, 6, the frequencies of the adjacent values 2 and 5 coincide and equal 3. This frequency is greater than that of the other values, 1 and 6 (for which it equals 1). Therefore, the mode of this series is (2 + 5) / 2 = 3.5.

3) If two non-adjacent values in the sample have equal frequencies that are higher than the frequency of any other value, two modes are distinguished. For example, in the series 10, 11, 11, 11, 12, 13, 14, 14, 14, 17, the modes are 11 and 14. In this case the sample is said to be bimodal.

There can also be so-called multimodal distributions with more than two peaks (modes).

4) If the mode is estimated from grouped data, it is necessary to find the group with the highest frequency of the trait. This group is called the modal group.
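Rules 1 to 3 above can be expressed as a short function. This is a minimal Python sketch; the name `mode_by_rules` is hypothetical, "adjacent" is interpreted as neighbouring in the ordered list of distinct values (which is how the example with 2 and 5 reads), and grouped data (rule 4) is not covered.

```python
from collections import Counter


def mode_by_rules(sample):
    """Return None (no mode), one number, or a list of several modes."""
    if not sample:
        raise ValueError("the mode of an empty sample is undefined")
    counts = Counter(sample)
    top = max(counts.values())
    distinct = sorted(counts)                         # ordered distinct values
    modes = [v for v in distinct if counts[v] == top]

    # Rule 1: every value occurs equally often, so there is no mode.
    if len(distinct) > 1 and len(modes) == len(distinct):
        return None
    if len(modes) == 1:
        return modes[0]
    # Rule 2: two most frequent values that are neighbours are averaged.
    if len(modes) == 2:
        position = distinct.index(modes[0])
        if distinct[position + 1] == modes[1]:
            return (modes[0] + modes[1]) / 2
    # Rule 3: non-adjacent most frequent values give several modes.
    return modes


print(mode_by_rules([1, 2, 5, 2, 4, 2, 6, 7, 2]))
print(mode_by_rules([5, 5, 6, 6, 7, 7]))
print(mode_by_rules([1, 2, 2, 2, 5, 5, 5, 6]))
print(mode_by_rules([10, 11, 11, 11, 12, 13, 14, 14, 14, 17]))
```

Returning `None`, a number, or a list keeps the three outcomes of the rules distinguishable; a caller that needs a single type could wrap the result in a list instead.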

1.2 Median

The median is the value of the trait under study that divides the sample, ordered by the value of this trait, in half. The same number of values lies to the right and to the left of the median in the ordered series. For example, for the sample 2, 3, 4, 4, 5, 6, 7, 8, 9, the median is 5, since there are four values to the left and four to the right of it. If the series contains an even number of values, the median is the half-sum of the two central values of the series. For the series 0, 1, 1, 2, 3, 4, 5, 5, 6, 7, the median is 3.5.

Knowing the median is useful in determining whether the distribution of the particular values ​​of the studied trait is symmetrical and close to the so-called normal distribution. The mean and median for the normal distribution usually coincide or differ very little from each other. If the sample distribution of characteristics is normal, then methods of secondary statistical calculations based on the normal distribution of data can be applied to it. Otherwise, this cannot be done, since serious errors can creep into the calculations.

1.3 Sample mean

The sample mean (arithmetic mean) as a statistical indicator is the average estimate of the psychological quality studied in the experiment. This estimate characterizes the degree of development of that quality, as a whole, in the group of subjects who underwent the psychodiagnostic examination. By directly comparing the mean values of two or more samples, we can judge the relative degree of development of the assessed quality in the people making up these samples.

1.4 Sample spread

The spread of the sample (sometimes called the range) is denoted by the letter R. This is the simplest indicator that can be obtained for a sample: the difference between the maximum and minimum values of a given variation series, i.e.

$$R = x_{\max} - x_{\min}$$

Clearly, the more the measured characteristic varies, the greater the value of R, and vice versa. However, two sample series may have the same mean and the same range and still vary in different ways. For example, two samples are given:

X = 10, 15, 20, 25, 30, 35, 40, 45, 50; mean of X = 30; R = 40

Y = 10, 28, 28, 30, 30, 30, 32, 32, 50; mean of Y = 30; R = 40

Although the means and ranges of these two series are equal, the nature of their variation is different. To understand the variation of the samples more clearly, one should look at their distributions.
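The difference between the two series can be made visible numerically. The sketch below, in Python with the standard `statistics` module, recomputes the mean and range of X and Y and adds the standard deviation in its population form (`pstdev`, which divides by the number of values); choosing that measure here is an illustration, not something the example itself prescribes.

```python
from statistics import mean, pstdev

x = [10, 15, 20, 25, 30, 35, 40, 45, 50]
y = [10, 28, 28, 30, 30, 30, 32, 32, 50]

for name, sample in (("X", x), ("Y", y)):
    sample_range = max(sample) - min(sample)
    print(
        name,
        "mean:", mean(sample),
        "range:", sample_range,
        "standard deviation:", round(pstdev(sample), 2),
    )
```

The mean and range agree for both series, while the standard deviation of Y comes out smaller because most of its values sit close to 30 and only the two end points are far away.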

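1.5 Variance

Variance is the arithmetic mean of the squared deviations of the values of a variable from its mean:

$$D = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

where $x_i$ are the sample values, $\bar{x}$ is the sample mean and $n$ is the number of values in the series.

Variance as a statistic characterizes how much the individual values deviate from the mean in a given sample. The greater the variance, the greater the deviation, or scatter, in the data.

The standard deviation is the square root of the variance, that is, of the sum of squared deviations divided by the number of terms in the series:

$$\sigma = \sqrt{D} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$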

Sometimes the individual primary data to be processed are very numerous and require a huge number of elementary arithmetic operations. To reduce that number while keeping the required accuracy, the original sample of individual empirical data is sometimes replaced with intervals. An interval is a group of trait values, sorted by magnitude, that is replaced by its average value in the calculations.

2. Methods of secondary statistical processing of experimental results

Secondary methods of statistical processing of experimental data are used to directly test, confirm or refute the hypotheses associated with the experiment. These methods are, as a rule, more complex than the methods of primary statistical processing and require good training of the researcher in elementary mathematics and statistics. (7)

The discussed group of methods can be divided into several subgroups:

1. Regression calculus.

2. Methods for comparing two or more elementary statistics (means, variances, etc.) related to different samples.

3. Methods for establishing statistical relationships between variables, for example, their correlation with each other.

4. Methods for identifying the internal statistical structure of empirical data (for example, factor analysis).

Let us consider each of these subgroups of methods of secondary statistical processing using examples.

2.1 Regression calculus

Regression calculus is a method of mathematical statistics that reduces individual, scattered data to a line graph approximately reflecting their internal relationship, and makes it possible to estimate the probable value of one variable from the value of another (7).

The graphical expression of a regression equation is called a regression line. The regression line expresses the best predictions of the dependent variable (Y) for the independent variables (X).

Regression is expressed by two regression equations, which in the simplest case are equations of straight lines:

$$Y = a_0 + a_1 X \qquad (1)$$

$$X = b_0 + b_1 Y \qquad (2)$$

In equation (1), Y is the dependent variable, X is the independent variable, $a_0$ is the intercept, and $a_1$ is the regression coefficient, or slope, which determines the inclination of the regression line with respect to the coordinate axes.

In equation (2), X is the dependent variable, Y is the independent variable, $b_0$ is the intercept, and $b_1$ is the regression coefficient, or slope, which determines the inclination of the regression line with respect to the coordinate axes.

Quantifying the relationship between X and Y (and between Y and X) is called regression analysis. The main task of regression analysis is to find the coefficients $a_0$, $b_0$, $a_1$ and $b_1$ and to determine the significance level of the resulting analytical expressions connecting the variables X and Y.
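One common way to find the coefficients of the two regression lines is ordinary least squares. The text does not say which estimation method it has in mind, so the Python sketch below is an assumption; the function name `fit_line` and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for ys = intercept + slope * xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the independent variable must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data lying exactly on the line Y = 1 + 2X.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a0, a1 = fit_line(xs, ys)   # regression of Y on X, as in equation (1)
b0, b1 = fit_line(ys, xs)   # regression of X on Y, as in equation (2)
print("Y = a0 + a1*X:", a0, a1)
print("X = b0 + b1*Y:", b0, b1)
```

Swapping the arguments is all it takes to obtain the second equation, because the roles of dependent and independent variable are simply exchanged.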

To apply the linear regression analysis method, the following conditions must be met:

1. Compared variables X and Y should be measured on a scale of intervals or ratios.

2. It is assumed that the variables X and Y have a normal distribution.

3. The number of varying features in the compared variables should be the same. (5).

2.2 Correlation

The next method of secondary statistical processing, which clarifies the connection or direct dependence between two series of experimental data, is called the method of correlations. It shows how one phenomenon is related to another or changes together with it. This kind of dependence exists, for example, between quantities that are causally related to each other. If two phenomena correlate with each other in a statistically reliable way, and if at the same time there is confidence that one of them can act as the cause of the other, then the conclusion that there is a causal relationship between them is justified; the correlation by itself does not establish causation. (7)

When an increase in the level of one variable is accompanied by an increase in the level of another, we speak of a positive correlation. If one variable grows while the level of the other decreases, we speak of a negative correlation. If there is no connection between the variables, we are dealing with zero correlation. (1)
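The three cases (positive, negative and zero correlation) can be illustrated with the Pearson linear correlation coefficient. The Python sketch below uses made-up data and a hypothetical helper `pearson_r`.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]     # grows together with x
falling = [10, 8, 6, 4, 2]    # decreases as x grows
v_shape = [2, 1, 0, 1, 2]     # depends on x, but not linearly

print("positive:", pearson_r(x, rising))
print("negative:", pearson_r(x, falling))
print("zero:", pearson_r(x, v_shape))
```

The last series is a reminder that a coefficient near zero rules out only a linear connection: the V-shaped values clearly depend on x.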

There are several variations of this method: linear, rank, pairwise and multiple. Linear correlation analysis establishes direct relationships between variables by their absolute values. These relationships are expressed graphically as a straight line, hence the name "linear". Rank correlation determines the dependence not between the absolute values of the variables but between their ordinal positions, or ranks, in a series ordered by magnitude. Pairwise correlation analysis studies correlations only between pairs of variables, while multiple, or multivariate, analysis studies them between many variables at once. Factor analysis is a widespread form of multivariate correlation analysis in applied statistics. (5)

The rank correlation coefficient is used in psychological and pedagogical research when the traits between which a dependence is being established are qualitatively different and cannot be assessed accurately enough with a so-called interval measuring scale. An interval scale is a scale that makes it possible to assess the distance between its values and to judge which of them is greater and by how much. For example, a ruler used to measure and compare the lengths of objects is an interval scale, since with it we can state that the distance between two and six centimeters is twice as large as the distance between six and eight centimeters. If a measuring instrument only lets us state that some indicators are greater than others, without saying by how much, then the instrument is called ordinal rather than interval.

Most of the indicators that are obtained in psychological and pedagogical research refer to ordinal rather than interval scales (for example, assessments like "yes", "no", "rather no than yes" and others that can be converted into points), therefore, the linear correlation coefficient is not applicable to them.
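For ordinal data of this kind, a rank correlation coefficient works on the positions of the values rather than on the values themselves. The text does not name a specific coefficient; the Python sketch below assumes Spearman's, computed as the Pearson coefficient of the ranks, with tied values sharing their average rank. The helper names and the coded answers are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson coefficient of the two rank series."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical answers coded as points: 1 = "no", 2 = "rather no than yes", 3 = "yes".
question_a = [1, 2, 2, 3, 3, 1]
question_b = [1, 1, 2, 3, 2, 1]
print(average_ranks(question_a))
print(spearman_rho(question_a, question_b))
```

Because only the ordering enters the calculation, any monotone recoding of the answers (for example 10, 20, 30 instead of 1, 2, 3) leaves the coefficient unchanged.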

The method of multiple correlations, in contrast to the method of pairwise correlations, makes it possible to reveal the general structure of correlation dependences existing within a multidimensional experimental material, including more than two variables, and to present these correlation dependences in the form of a certain system.

To apply a particular correlation coefficient, the following conditions must be met:

1. Compared variables should be measured on a scale of intervals or ratios.

2. It is assumed that all variables have a normal distribution.

3. The number of varying features in the compared variables should be the same.

4. To assess the level of reliability of the Pearson correlation coefficient, use formula (11.9) and the table of critical values for Student's t-test at k = n - 2. (5)

2.3 Factor analysis

Factor analysis is a statistical method that is used when processing large arrays of experimental data. The tasks of factor analysis are: reducing the number of variables (data reduction) and determining the structure of relationships between variables, i.e. classification of variables, therefore factor analysis is used as a method of data reduction or as a method of structural classification.

An important difference between factor analysis and all the methods described above is that it cannot be used to process primary, or, as they say, "raw" experimental data, i.e. obtained directly from the examination of the subjects. The material for factor analysis is correlation links, or rather, Pearson's correlation coefficients, which are calculated between the variables (i.e., psychological characteristics) included in the survey. In other words, correlation matrices, or, as they are otherwise called, intercorrelation matrices, are subjected to factor analysis. The names of the columns and rows in these matrices are the same, since they represent a list of variables included in the analysis. For this reason, intercorrelation matrices are always square, i.e. the number of rows in them is equal to the number of columns, and symmetric, i.e. at symmetrical places with respect to the main diagonal, there are the same correlation coefficients.
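The properties of the intercorrelation matrix described above (square, symmetric, identical row and column names) can be seen on a small example. The Python sketch below builds such a matrix from three made-up variables; the variable names, the scores and the helper functions are all hypothetical, and no factor extraction is attempted.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(variables):
    """Return the variable names and the matrix of pairwise correlations."""
    names = list(variables)
    matrix = [
        [pearson_r(variables[a], variables[b]) for b in names]
        for a in names
    ]
    return names, matrix


# Hypothetical scores of six subjects on three psychological scales.
scores = {
    "memory": [12, 15, 11, 18, 14, 16],
    "attention": [10, 14, 9, 17, 13, 15],
    "anxiety": [7, 5, 8, 3, 6, 4],
}

names, matrix = correlation_matrix(scores)
print("      " + "  ".join(f"{name:>9}" for name in names))
for name, row in zip(names, matrix):
    print(f"{name:>9} " + "  ".join(f"{value:9.2f}" for value in row))
```

The diagonal holds the correlation of each variable with itself, and the entry in row i, column j equals the entry in row j, column i, which is the symmetry the text refers to.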

The main concept of factor analysis is the factor. This is an artificial statistical indicator that arises from special transformations of the table of correlation coefficients between the studied psychological characteristics, that is, of the intercorrelation matrix. The procedure for extracting factors from the intercorrelation matrix is called matrix factorization. Factorization can extract a varying number of factors from the correlation matrix, up to a number equal to the number of original variables. However, the factors identified in this way are, as a rule, unequal in importance. (5)

The identified factors explain the interdependence of psychological phenomena. (7)

Most often, as a result of factor analysis, not one, but several factors are determined, which in different ways explain the intercorrelation matrix of variables. In this case, factors are divided into general, general and individual factors. General factors are called factors, all factorial loads of which are significantly different from zero (zero load indicates that this variable is in no way connected with the rest and does not have any effect on them in life). Common factors are factors for which some of the factor loadings are nonzero. Single factors are factors in which only one of the loads significantly differs from zero. (7)

Factor analysis may be appropriate if the following criteria are met.

1. Qualitative data measured on a nominal scale, such as hair color (black / brown / red), cannot be factorized.

2. All variables should be independent, and their distribution should be close to normal.

3. Relationships between variables should be approximately linear, or at least not have a clearly curvilinear character.

4. The original correlation matrix must have several correlations in absolute value above 0.3. Otherwise, it is rather difficult to extract any factors from the matrix.

5. The sample of subjects should be large enough. Expert advice varies. The most rigid point of view recommends not using factor analysis if the number of subjects is less than 100, since the standard errors of the correlation in this case will be too large.

However, if the factors are well defined (for example, with loadings of 0.7 rather than 0.3), the experimenter needs a smaller sample to isolate them. In addition, if the data obtained are known to be highly reliable (for example, valid tests are used), then the data can be analyzed for a smaller number of subjects. (5)
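
Criterion 4 can be checked mechanically before any factorization is attempted. The sketch below is only an illustration: the matrix `r` is hypothetical, and the function name `count_strong_correlations` is an assumption introduced here, not part of any library.

```python
def count_strong_correlations(matrix, threshold=0.3):
    """Count off-diagonal pairs whose absolute correlation exceeds the threshold."""
    size = len(matrix)
    return sum(
        1
        for i in range(size)
        for j in range(i + 1, size)  # upper triangle only: the matrix is symmetric
        if abs(matrix[i][j]) > threshold
    )

# Hypothetical intercorrelation matrix for four variables (illustrative values).
r = [
    [1.00, 0.45, -0.38, 0.10],
    [0.45, 1.00, 0.05, 0.32],
    [-0.38, 0.05, 1.00, -0.20],
    [0.10, 0.32, -0.20, 1.00],
]
strong = count_strong_correlations(r)
print(strong)
```

Only the upper triangle is scanned because the matrix is symmetric, so each pair of variables is counted once.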

2.4 Using factor analysis

Factor analysis is widely used in psychology in various directions related to the solution of both theoretical and practical problems.

In theoretical terms, the use of factor analysis is associated with the development of the so-called factor-analytical approach to the study of personality structure, temperament and abilities. The use of factor analysis in these areas is based on the widely accepted assumption that observable and directly measurable indicators are only indirect and / or partial external manifestations of more general characteristics. These characteristics, in contrast to the first ones, are hidden, so-called latent variables, since they are concepts or constructs that are not available for direct measurement. However, they can be established by factoring the correlations between the observed features and identifying factors that (provided that the structure is good) can be interpreted as a statistical expression of the desired latent variable.

Although the factors are purely mathematical in nature, they are assumed to represent hidden variables (theoretically postulated constructs or concepts), therefore, the names of the factors often reflect the essence of the hypothetical construct under study.

Currently, factor analysis is widely used in differential psychology and psychodiagnostics. With its help, you can design tests, establish the structure of connections between individual psychological characteristics measured by a set of tests or test items.

Factor analysis is also used to standardize test methods, which is carried out on a representative sample of subjects.

Conclusion

If the data obtained in the experiment are qualitative, then the correctness of the conclusions drawn from them depends entirely on the intuition, erudition and professionalism of the researcher, as well as on the logic of his reasoning. If the data are quantitative, they first undergo primary and then secondary statistical processing. Primary statistical processing consists in determining the required elementary mathematical statistics. Such processing almost always involves at least the determination of a sample mean. When the spread of the data relative to the mean is informative for the experimental verification of the proposed hypotheses, the variance or standard deviation is calculated. It is recommended to calculate the median when methods of secondary statistical processing designed for a normal distribution are to be used. For this kind of distribution of sample data, the median and the mode coincide with the mean or are close to it. This criterion can be used to roughly judge the nature of the resulting distribution of primary data.

Secondary statistical processing (comparison of means, variances, data distributions, regression analysis, correlation analysis, factor analysis, etc.) is carried out if, in order to solve problems or prove the proposed hypotheses, it is necessary to determine the statistical patterns hidden in the primary experimental data. When starting secondary statistical processing, the researcher must first decide which of the various secondary statistics he should use to process the primary experimental data. The decision is made on the basis of taking into account the nature of the hypothesis being tested and the nature of the primary material obtained as a result of the experiment. Here are some recommendations in this regard.

Recommendation 1. If the experimental hypothesis contains the assumption that, as a result of the psychological and pedagogical research, the indicators of some quality will increase (or decrease), then it is recommended to use Student's criterion or the χ² (chi-square) criterion to compare the pre- and post-experimental data. The latter is used if the primary experimental data are relative and expressed, for example, as percentages.

Recommendation 2. If an experimentally tested hypothesis includes a statement about a causal relationship between some variables, then it is advisable to test it by referring to the coefficients of linear or rank correlation. Linear correlation is used when the independent and dependent variables are measured using an interval scale, and the changes in these variables before and after the experiment are small. Rank correlation is referred to when it is sufficient to evaluate changes in the order of succession in terms of the magnitude of independent and dependent variables, or when their changes are large enough, or when the measuring instrument was ordinal rather than interval.

Recommendation 3. Sometimes a hypothesis includes the assumption that as a result of the experiment the individual differences between the subjects will increase or decrease. This assumption is readily verified using the Fisher test, which allows one to compare variances before and after the experiment. Note that Fisher's criterion works only with absolute values of the indicators, not with their ranks.
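
As a rough illustration of Recommendation 3, the statistic behind the Fisher test is the ratio of the larger sample variance to the smaller one. The sketch below computes only that ratio. The scores are hypothetical, and deciding significance would still require comparing the result with a tabulated critical value for the corresponding degrees of freedom, which is not shown here.

```python
from statistics import variance

def fisher_f_ratio(before, after):
    """Ratio of the larger sample variance to the smaller one."""
    var_before = variance(before)  # unbiased estimate (divisor n - 1)
    var_after = variance(after)
    larger = max(var_before, var_after)
    smaller = min(var_before, var_after)
    return larger / smaller

# Hypothetical scores of one group before and after an experiment.
before = [10, 12, 11, 13, 12, 11]
after = [8, 15, 10, 16, 9, 14]
f_value = fisher_f_ratio(before, after)
print(round(f_value, 2))
```

For samples of equal size the ratio is the same whether the variances are computed with divisor n or n - 1, because the divisor cancels.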

Methods of statistical processing of the results of an experiment are mathematical techniques, formulas, methods of quantitative calculations, with the help of which the indicators obtained during the experiment can be generalized, brought into a system, revealing the laws hidden in them.

We are talking about such regularities of a statistical nature that exist between the variables studied in the experiment.

Data are the main elements to be classified or categorized for processing 26.

Some of the methods of mathematical and statistical analysis allow calculating the so-called elementary mathematical statistics characterizing the sample distribution of data, for example:

Sample mean,

Sample variance,

Median and a number of others.

Other methods of mathematical statistics make it possible to judge the dynamics of changes in individual statistics of the sample, for example:

Analysis of variance,

Regression analysis.

A third group of methods makes it possible to judge reliably, from sample data, the statistical relationships that exist between the variables investigated in the experiment:

Correlation analysis;

Factor analysis;

Comparison methods.

All methods of mathematical and statistical analysis are conventionally divided into primary and secondary 27.

Methods that can be used to obtain indicators that directly reflect the results of measurements made in an experiment are called primary methods.

Secondary methods of statistical processing are those with the help of which the statistical patterns hidden in the primary data are revealed.

The primary methods of statistical processing include, for example:

Determination of the sample mean;

Sample variance;

Sample mode;

Sample median.

Secondary methods typically include:

Correlation analysis;

Regression analysis;

Methods for comparing primary statistics for two or more samples.

Let's consider methods for calculating elementary mathematical statistics, starting with a sample mean.

The arithmetic mean is the ratio of the sum of all data values to the number of terms 28.

The mean as a statistical indicator is the average assessment of the psychological quality studied in the experiment.

This assessment characterizes the degree of development of that quality as a whole in the group of subjects that underwent psychodiagnostic examination. By directly comparing the mean values of two or more samples, we can judge the relative degree of development of the assessed quality in the people making up these samples.

The sample mean is determined using the following formula 29:

$$\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

where $\bar{x}$ is the sample mean, or the arithmetic mean of the sample;

n is the number of subjects in the sample, or of individual psychodiagnostic indicators, on the basis of which the mean is calculated;

$x_k$ are the individual values of the indicator for the individual subjects. There are n such values in total, so the index k takes values from 1 to n;

∑ is the summation sign accepted in mathematics, applied to the values of the variables to its right.

Variance is a measure of the dispersion of the data around the mean 30.

The greater the variance, the greater the deviation or scatter in the data. It is determined in order to distinguish from each other samples that have the same mean but different scatter.

The variance is determined by the following formula:

$$S^2 = \frac{1}{n}\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2$$

where $S^2$ is the sample variance, or simply variance;

$\sum_{k=1}^{n}\left(x_k - \bar{x}\right)^2$ is an expression meaning that for all $x_k$ from the first to the last in the given sample one must calculate the differences between the individual values and the mean, square these differences and sum them;

n is the number of subjects in the sample, or of primary values, for which the variance is calculated.
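
The two statistics above can be sketched in a few lines. This is a minimal illustration that assumes the variance is the mean squared deviation with divisor n; some texts divide by n - 1 instead. The function names and the two small samples are hypothetical.

```python
def sample_mean(values):
    """Sum of the values divided by their number."""
    return sum(values) / len(values)

def sample_variance(values):
    """Mean squared deviation from the sample mean (divisor n, an assumption)."""
    mean = sample_mean(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

# Two hypothetical samples with the same mean but different scatter.
narrow = [4, 5, 6]
wide = [1, 5, 9]

print(sample_mean(narrow), sample_mean(wide))
print(sample_variance(narrow), sample_variance(wide))
```

Both samples have a mean of 5, yet the second has a much larger variance, which is exactly the situation the variance is meant to distinguish.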

The median is the value of the trait being studied that divides the sample, ordered by the value of this trait, in half.

Knowing the median is useful in determining whether the distribution of the individual values of the studied trait is symmetrical and close to the so-called normal distribution. For a normal distribution the mean and median usually coincide or differ very little from each other.

If the sample distribution of the trait is normal, then methods of secondary statistical calculations based on the normal distribution of data can be applied to it. Otherwise this cannot be done, since serious errors can creep into the calculations.

The mode is one more elementary mathematical statistic and characteristic of the distribution of experimental data. The mode is the quantitative value of the trait under study that occurs most often in the sample.

For symmetric distributions of a trait with a single peak, including the normal distribution, the mode coincides with the mean and the median. For other, asymmetric types of distributions this is not typical.
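
A quick way to see this rule of thumb at work is to compare the three measures on a symmetric and on a skewed sample. The sketch below uses Python's standard `statistics` module; both samples are hypothetical, and agreement of the three values is only a rough indication of symmetry, not a test of normality.

```python
from statistics import mean, median, multimode

def central_tendency(values):
    """Return the mean, the median and the list of modes of a sample."""
    return mean(values), median(values), multimode(values)

# Hypothetical samples: one symmetric around 3, one skewed to the right.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
skewed = [1, 1, 1, 2, 2, 3, 5, 9, 12]

print(central_tendency(symmetric))
print(central_tendency(skewed))
```

For the symmetric sample all three measures equal 3, while for the skewed sample the mean (4), median (2) and mode (1) drift apart.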

The method of secondary statistical processing that clarifies the connection or direct relationship between two series of experimental data is called the method of correlation analysis. It shows how one phenomenon influences another or is related to it in its dynamics. This kind of relationship exists, for example, between quantities that are in causal relationships with each other. If two phenomena correlate with each other to a statistically reliable degree, and if at the same time there are grounds to believe that one of them can act as the cause of the other, this supports the conclusion that there is a causal relationship between them; the correlation alone does not prove it.

There are several variations of this method:

Linear correlation analysis allows you to establish direct relationships between variables by their absolute values. These connections are graphically expressed as a straight line, hence the name "linear".

The linear correlation coefficient is determined using the following formula 31:

$$r_{xy} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sqrt{S_x^2 \, S_y^2}}$$

where $r_{xy}$ is the linear correlation coefficient;

$\bar{x}$, $\bar{y}$ are the sample means of the compared values;

$x_i$, $y_i$ are the individual sample values of the compared values;

n is the total number of values in the compared series of indicators;

$S_x^2$, $S_y^2$ are the variances, which measure the deviations of the compared values from their means.
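
A minimal sketch of the linear correlation coefficient follows. It assumes the usual Pearson form, the mean product of deviations divided by the square root of the product of the two variances (all with divisor n). The function name and the data are hypothetical.

```python
from math import sqrt

def linear_correlation(x, y):
    """Pearson coefficient: mean product of deviations over sqrt(S_x^2 * S_y^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return covariance / sqrt(var_x * var_y)

# Hypothetical paired measurements.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]  # exactly proportional to x
z = [5, 4, 3, 2, 1]   # exactly reversed order

r_xy = linear_correlation(x, y)
r_xz = linear_correlation(x, z)
print(r_xy, r_xz)
```

A perfectly increasing straight-line relationship gives a coefficient of 1 and a perfectly decreasing one gives -1; real data fall somewhere in between.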

Rank correlation determines the dependence not between the absolute values of the variables but between the ordinal places, or ranks, that they occupy in a series ordered by magnitude. The formula for the rank correlation coefficient is 32:

$$R_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n\left(n^2 - 1\right)}$$

where $R_s$ is the Spearman rank correlation coefficient;

$d_i$ is the difference between the ranks of the indicators of the same subject in the two ordered series;

n is the number of subjects or of digital data (ranks) in the correlated series.
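
The rank correlation can be sketched as follows, assuming the standard Spearman form based on squared rank differences. The helper `ranks` and the test scores are hypothetical; tied values receive the average of their ranks, and the simple formula is exact only when there are no ties.

```python
def ranks(values):
    """Rank values from 1 (smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

def spearman(x, y):
    """R_s = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Hypothetical scores of five subjects on two tests.
test_a = [35, 23, 47, 17, 10]
test_b = [30, 33, 45, 23, 8]
r_s = spearman(test_a, test_b)
print(round(r_s, 2))
```

Here only two subjects swap neighbouring places between the two tests, so the sum of squared rank differences is 2 and the coefficient stays close to 1.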

Atyusheva Anna

In the work, using the example of processing data on the progress of 7th grade students, the main statistical characteristics are considered, the collection and grouping of statistical data is carried out, statistical information is clearly presented, and the analysis of the data obtained is carried out.

The work contains an accompanying presentation.

Municipal autonomous educational institution "Gymnasium No. 24"

XXII scientific conference MAGNI

Statistical data processing

MAOU "Gymnasium No. 24" Atyusheva Anna

Consultant: math teacher

Shchetinina Natalia Sergeevna

Magadan, 2016

Introduction

  1. Basic concepts used in statistical data processing
  2. Research part

2.1. Statistical processing of data on the progress of students in grade 7 "B"

2.2. Visual presentation of data using histograms

2.3. Comparative characteristics of the educational activity of students according to the results of the 1st and 2nd quarters

2.4. Analysis of the questionnaire survey of students in grade 7 "B" for parental control over the progress of children

Conclusion

Literature

Introduction

Any of us, opening a book or newspaper, turning on the TV or arriving at the train station, constantly encounters information presented in tabular form: the lesson timetable, the train timetable, the multiplication table and much more. Information is also presented in the form of diagrams and graphs.

You need to be able to process and analyze such information. Without data processing, comparison of events, it is impossible to trace the development of a particular problem.

In the course of algebra, we studied statistical characteristics that are widely used in various studies. I was interested in the practical application of the studied characteristics, and the opportunity to process the data so that the information presented would clearly determine the course of development of a particular problem and, as a consequence, the result of its solution. As such a problem, I decided to consider the performance of my class in the quarters of the first half of the year.

Research area: algebra

Object of study: statistical characteristics

Subject of study: academic performance of grade 7 "B" students in the quarters of the first half of the year

Hypothesis: We believe that using the example of processing data on the performance of students in grade 7B, we will not only get acquainted with the main statistical characteristics, but also learn on our own:

  • collect and group statistical data;
  • visually present statistical information;
  • analyze the data obtained.

Goal: learn to process, analyze, and visualize the available information.

Tasks:

  • study statistical characteristics;
  • collect information on the performance of grade 7 students in the quarters of the first half of the year;

  • process information;
  • carry out a visual presentation of information using histograms;
  • analyze the data obtained, draw appropriate conclusions.

Basic concepts used in statistical data processing

Statistics is a science that deals with obtaining, processing and analyzing quantitative data on various mass phenomena occurring in nature and society. The word "statistics" comes from the Latin word "status", which means "state, state of affairs."

The simplest statistical characteristics are arithmetic mean, median, range, mode.

  • The arithmetic mean of a series of numbers is the quotient of the sum of these numbers divided by the number of terms. Usually the arithmetic mean is found when one wants to determine the average value for a certain series of data: the average yield of wheat from 1 hectare in the region, the average output of one working team per shift, the average score of the certificate, the average air temperature at noon in a given ten-day period, etc.
  • The median of an ordered series of numbers with an odd number of members is the number written in the middle, and the median of an ordered series with an even number of members is the arithmetic mean of the two numbers written in the middle. Note that it is more convenient and faster to work with a series if it is ordered, i.e. a series in which each subsequent number is not less (or not more) than the previous one.
  • The mode of a series of numbers is the number that occurs most often in the series. A series of numbers may have more than one mode or no mode at all. The mode of a data series is usually found when one wants to identify some typical indicator. Note that the arithmetic mean of a series of numbers may not coincide with any of these numbers, whereas the mode, if it exists, necessarily coincides with two or more numbers in the series. In addition, unlike the arithmetic mean, the concept of "mode" applies not only to numerical data.
  • The range of a series of numbers is the difference between the largest and the smallest of these numbers. The range of a series is found when one wants to determine how large the spread of the data in the series is.

Let's illustrate each of these characteristics using the series of numbers 47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52.

The arithmetic mean is approximately 48.9.

It is found like this: we determine the sum of the numbers and divide it by their count.

(47+46+52+47+52+47+52+49+45+43+53+53+47+52):14 = 685:14 ≈ 48.9.

The median of this series is 48.

It is found like this: we order the series and choose the number in the middle. If the count of numbers is even, we take the arithmetic mean of the two numbers in the middle of the ordered series.

43, 45, 46, 47, 47, 47, 47, 49, 52, 52, 52, 52, 53, 53

(47+49):2 = 48

The modes of this series are 47 and 52. These numbers occur most often (four times each).

47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52

The range of this series is 10.

It is found like this: we choose the largest and the smallest number in the series and find the difference between them.

47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52

53-43 = 10
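
The same four characteristics can be checked with Python's standard `statistics` module. This is only a sketch for verifying the hand calculation on the series above; the variable names are arbitrary.

```python
from statistics import mean, median, multimode

series = [47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52]

average = mean(series)              # sum of the numbers divided by their count
middle = median(series)             # mean of the two central values (even count)
modes = multimode(series)           # every value that occurs most often
spread = max(series) - min(series)  # range: largest minus smallest

print(round(average, 1), middle, modes, spread)
# prints: 48.9 48.0 [47, 52] 10
```

`multimode` is used rather than `mode` because this series has two equally frequent values.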

Research part

Statistical processing of data on the performance of students in grade 7 "B"

Let's move on to processing the information. For each subject we compose a table of three rows. The first row contains the series of data (the variants). Each variant of the series was observed in the sample a certain number of times; this number is called the multiplicity of the variant. The second row therefore holds the multiplicity of the corresponding variant. The result is the distribution table of the sample.

If we add up all the multiplicities, we get the number of all measurements made during sampling, that is, the sample size (in our case this number is 24, which corresponds to the number of students in the class).

The third row holds the ratio of the multiplicity to the sample size, expressed as a percentage; it is called the frequency of the variant.

Frequency of the variant = (multiplicity of the variant : sample size) · 100%

In general, if a table of relative frequencies is compiled based on the results of the study, then the sum of the relative frequencies is equal to 100%.
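
The construction of such a distribution table can be sketched as follows. The sample is an assumption chosen for illustration: 24 marks made up of fourteen 3s, nine 4s and one 5. The function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(marks):
    """Return (mark, multiplicity, frequency in %) rows, ordered by mark."""
    counts = Counter(marks)
    size = len(marks)  # sample size = sum of all multiplicities
    return [(m, counts[m], round(100 * counts[m] / size, 1)) for m in sorted(counts)]

# Assumed sample: 24 marks (fourteen 3s, nine 4s, one 5).
marks = [3] * 14 + [4] * 9 + [5]
table = frequency_table(marks)

for mark, multiplicity, frequency in table:
    print(mark, multiplicity, f"{frequency}%")
```

Because each frequency is rounded to one decimal place, the printed percentages may add up to 100% only approximately.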

I quarter

Russian language.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5.

Average mark in the subject: (average).

Frequency distribution table

Mark           2      3      4      5
Multiplicity   none   14     9      1
Frequency, %   none   58.3   37.5   4.2

Literature.

Let's order the data of the sample (marks): 3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5.

Average mark in the subject: (average).

Mark           2      3      4      5
Multiplicity   none   9      10     5
Frequency, %   none   37.5   41.7   20.8

Algebra.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,5.

Average mark in the subject: (average).

The marks held by the largest number of students in the subject are 3 and 4 (two modes).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   11     11     2
Frequency, %   none   45.8   45.8   8.3

History.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   11     12     1
Frequency, %   none   45.8   50.0   4.2

Social science.

Let's order the data of the sample (marks): 3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   9      10     5
Frequency, %   none   37.5   41.7   20.8

Geography.

Let's sort the data of the sample (marks): 3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   5      10     9
Frequency, %   none   20.8   41.7   37.5

Physics.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   9      14     1
Frequency, %   none   37.5   58.3   4.2

Biology.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   6      11     7
Frequency, %   none   25.0   45.8   29.2

LIFE SAFETY FUNDAMENTALS.

Let's sort the data of the sample (marks): 4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average).

Mark           2      3      4      5
Multiplicity   none   none   7      17
Frequency, %   none   none   29.2   70.8

History and society of the native land.

Let's sort the data of the sample (marks): 3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 5 (mode).

The middle value of the ordered series of marks is 5 (median).

Mark           2      3      4      5
Multiplicity   none   1      9      14
Frequency, %   none   4.2    37.5   58.3

English language.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   9      10     5
Frequency, %   none   37.5   41.7   20.8

Informatics.

Let's sort the data of the sample (marks): 3,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   1      13     10
Frequency, %   none   4.2    54.2   41.7

Technology.

Let's sort the data of the sample (marks): 3,3,3,3,3,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average).

The mark held by the largest number of students in the subject is 5 (mode).

The middle value of the ordered series of marks is 5 (median).

Mark           2      3      4      5
Multiplicity   none   5      6      13
Frequency, %   none   20.8   25.0   54.2

Now let's collect similar information on the results of the second quarter.

Russian language.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   10     14     none
Frequency, %   none   41.7   58.3   none

Literature.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 3 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   10     8      6
Frequency, %   none   41.7   33.3   25.0

Algebra.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 3 (mode).

The middle value of the ordered series of marks is 3.5 (median).

Mark           2      3      4      5
Multiplicity   none   12     9      3
Frequency, %   none   50.0   37.5   12.5

History.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   9      14     1
Frequency, %   none   37.5   58.3   4.2

Society.

Let's sort the data of the sample (marks): 3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   4      17     3
Frequency, %   none   16.7   70.8   12.5

Geography.

Let's sort the data of the sample (marks): 3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   3      14     7
Frequency, %   none   12.5   58.3   29.2

Physics.

Let's sort the data of the sample (marks): 3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Evaluation options

Multiplicity

No

Frequency%

33.3%

16.7%

12.5%

Biology.

Let's sort the data of the sample (marks): 3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   3      15     6
Frequency, %   none   12.5   62.5   25.0

LIFE SAFETY FUNDAMENTALS.

Let's sort the data of the sample (marks): 3,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 5 (mode).

The middle value of the ordered series of marks is 5 (median).

Mark           2      3      4      5
Multiplicity   none   1      2      21
Frequency, %   none   4.2    8.3    87.5

History and society of the native land.

Let's sort the data of the sample (marks): 3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   3      11     10
Frequency, %   none   12.5   45.8   41.7

English language.

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   5      12     7
Frequency, %   none   20.8   50.0   29.2

Informatics.

Let's sort the data of the sample (marks): 3,3,3,3,3,4,4,4,4,4,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5

Average mark in the subject: (average)

The mark held by the largest number of students in the subject is 4 (mode).

The middle value of the ordered series of marks is 4 (median).

Mark           2      3      4      5
Multiplicity   none   5      12     7
Frequency, %   none   20.8   50.0   29.2

Technology.

Let's sort the data of the sample (marks): 3,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5 , 5.5

Average mark in the subject: (arithmetic mean)

The largest number of students have a "5" in the subject (mode)

About half of the students in the subject have a mark of "5" (median)

Marks and their relative frequency, %:

"2": none

"3": 4.2%

"4": 29.2%

"5": 66.7%

Data visualization with histograms

Various methods are widely used to present the data obtained in a statistical study visually.

We will use histograms to make the data clear. A histogram is a stepped figure made up of adjoining rectangles. The base of each rectangle is equal to the length of the interval, and its height is equal to the frequency or the relative frequency of the value. Thus, unlike in a conventional bar chart, the bases of the rectangles in a histogram are not chosen arbitrarily but are strictly determined by the length of the interval.
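The construction described above can be sketched in a few lines of Python. This is only an illustration: the counts per mark are an assumption, recovered from the Technology percentages (4.2%, 29.2%, 66.7%) for a class of 24 students, and the function names are hypothetical. Each bar is drawn as text, with its length equal to the frequency of the mark.

```python
# Assumed counts per mark: 1/24, 7/24 and 16/24 give 4.2%, 29.2% and 66.7%.
counts = {3: 1, 4: 7, 5: 16}


def relative_frequencies(counts):
    """Return the share of each value in percent, rounded to one decimal."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


def text_histogram(counts):
    """Return one text bar per value; the bar length equals the frequency."""
    shares = relative_frequencies(counts)
    return [
        f"{value} | {'#' * counts[value]} {counts[value]} ({shares[value]}%)"
        for value in sorted(counts)
    ]


print("\n".join(text_histogram(counts)))
```

Since every mark occupies an interval of the same length, the bars differ only in height, which is why the heights can be compared directly.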

Comparative characteristics of student performance in the first quarter subjects

Comparative characteristics of student performance in the subjects of the second quarter

Conclusions

The first-quarter results clearly show that students find Russian language and algebra the most difficult: in these subjects "3" is the most frequent mark. This means that the quality of knowledge in these subjects is lower than in the others.

The share of threes is also high in literature, history, social studies, physics, and English. It is also disappointing to see threes in subjects such as technology, biology, and geography.

In the second quarter the number of threes and fives decreased significantly; that is, students spread their effort across all subjects rather than concentrating on a few preferred ones.

Histogram of the distribution of the average score in the subjects of the first quarter

Histogram of the distribution of the average score in the subjects of the second quarter

Conclusion

These diagrams are based on the arithmetic mean. They clearly show that in the second quarter the average mark fell in Russian language, history and society of the native land, and computer science, and rose in history, social studies, physics, biology, life safety, and English. At the same time, the diagrams show that substantial improvement occurred only in physics and English.

Comparative characteristics of the educational activity of students according to the results of the first and second quarters

Histogram of the quality of knowledge in the subjects of the first quarter

Histogram of the quality of knowledge in the subjects of the second quarter

Combining both histograms into one makes it much easier to compare class performance across the quarters, while the separate histograms make it easier to see in which subjects the quality is higher. For example, in the first quarter the quality is below 60% in algebra, Russian, and history; in the second, in Russian, literature, algebra, and physics. It is already clear that Russian language and algebra are the most difficult subjects for the students. The overall quality across all subjects barely differs: 66% in the first quarter and 68% in the second. In other words, the jumps in quality from subject to subject, clearly visible on the comparison diagram, suggest that students are not making much effort to improve their knowledge and do not hold their positions in any particular subject area.

Chart comparing all items by quality for the 1st and 2nd quarters

During the second quarter, the number of students with good and excellent marks increased significantly in Russian language, social studies, biology, English, and technology. It decreased slightly in literature, algebra, life safety, history and society of the native land (IORK), and computer science. There is also a sharp drop in quality in physics, which is associated with the students coming to lessons unprepared.

Again we come to the conclusion that the children learn "in leaps and bounds" and show no particular preference for any group of subjects (the humanities, physics and mathematics, or the natural sciences).

Analysis of the questionnaire survey of grade 7 "B" students on parental control over their children's progress

Based on the results of the study above, we decided to survey the students of grade 7 "B" about parental control over their studies (for the questionnaire, see the Appendix).

The sample size is 22 people.

Parents check homework

Conclusion

Almost a quarter of the students do their homework without parental control, which of course affects their academic performance.

Number of homework checks per week

Median: in the sorted series 0,0,0,0,0,0,1,1,2,2,3,3,3,3,4,4,5,7,7,7,7,7 the two middle values are 3 and 3, so the median = (3 + 3) : 2 = 3

Arithmetic mean = 3
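The two values above can be checked with Python's standard `statistics` module. The sketch below simply re-enters the 22 answers from the series given for the number of homework checks per week; the variable names are illustrative.

```python
from statistics import mean, median

# Answers of the 22 surveyed students: homework checks per week.
checks_per_week = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2, 3,
                   3, 3, 3, 4, 4, 5, 7, 7, 7, 7, 7]

n = len(checks_per_week)
ordered = sorted(checks_per_week)

# With an even number of values the median is the mean of the two middle ones.
middle_pair = ordered[n // 2 - 1: n // 2 + 1]
checks_median = median(checks_per_week)
checks_mean = mean(checks_per_week)

print("sample size:", n)
print("middle values:", middle_pair)
print("median:", checks_median)
print("arithmetic mean:", checks_mean)
```

The sum of the answers is 66, so the mean is exactly 66 : 22 = 3 and coincides with the median here.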

Conclusion

On average, homework is checked three times a week. Given that the students learn in fits and starts, this is not enough.

Median = 0,0,0,0,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,5,5,6,7, 7.7 = (2 + 2): 2 = 2

Arithmetic mean = 3 (on average, diaries are checked by parents 3 times a week)

The amount of time students spend doing homework

Values (hours)

Less than 1

Relative frequency, %

  • Range R = x(max) - x(min) = 3.5 - 0.5 = 3 hours

(characterizes the spread of the observed values, i.e. shows the difference between the longest and the shortest time)

  • Mode Mo = 2.5 hours (the value that occurs more often than the others, i.e. the time that students most often spend)

Histogram of Students' Time spent on Homework

Conclusion

Most often, homework takes 2.5 hours a day, which is considered normal for students of this age.

Conclusion

As a result of the work done, I learned to process and analyze the available information.

Knowing the statistical characteristics helped me determine the average mark in various subjects, as well as the mode and range of performance indicators for which it would seem impossible to determine them. Without processing data and comparing events, it is impossible to trace how a particular problem develops. We tried not only to track the problem that had arisen, the decline in the quality of knowledge and academic performance in subjects, but also to find its cause, which, in our opinion, lay in insufficient parental control over the children's academic performance. The questionnaire survey and the performance results showed that the students of grade 7 "B" do not have sufficient self-control over their learning, while their parents believe the opposite.

The work done, I think, will be useful both to the class teacher in working with parents and to my classmates in improving their results in individual subjects in the future.

Statistics is a science that studies, processes, and analyzes quantitative data on a wide variety of mass phenomena in life. We have only begun to discover its characteristics for ourselves, and much that is unknown and interesting still lies ahead.


    Slide captions:

    Statistical data processing Prepared by: 7th grade "B" student of MAOU "Gymnasium No. 24" Anna Atyusheva Consultant: mathematics teacher Natalya Sergeevna Shchetinina

    Purpose: learn to process, analyze, and visualize the available information. Objectives: to study statistical characteristics; collect information about the progress of students in grade 7 in the quarters of the first half of the year; process information; carry out a visual presentation of information using histograms; analyze the data obtained, draw appropriate conclusions.

    Hypothesis: using the example of processing data on student performance, one can not only become acquainted with the main statistical characteristics, but also learn how to collect and group statistical data, present statistical information visually, and analyze the data obtained.

    Statistics is a science that deals with obtaining, processing, and analyzing quantitative data on various mass phenomena occurring in nature and society. The word "statistics" comes from the Latin word "status", which means "state, state of affairs". The simplest statistical characteristics: arithmetic mean, median, range, mode.

    Each characteristic is determined using the example of the series of numbers 47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52.
    Arithmetic mean: (47 + 46 + 52 + 47 + 52 + 47 + 52 + 49 + 45 + 43 + 53 + 53 + 47 + 52) : 14 = 685 : 14 ≈ 48.9.
    Median: in the sorted series 43, 45, 46, 47, 47, 47, 47, 49, 52, 52, 52, 52, 53, 53 the two middle values are 47 and 49, so the median is (47 + 49) : 2 = 48.
    Mode: the numbers 47 and 52, each of which occurs four times.
    Range: 53 - 43 = 10.
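As a check of this worked example, the four characteristics can be computed with Python's standard `statistics` module. This is a minimal sketch with illustrative variable names; `multimode` (available from Python 3.8) is used because the series has two modes. The fourteen values add up to 685, so the mean evaluates to about 48.9.

```python
from statistics import mean, median, multimode

series = [47, 46, 52, 47, 52, 47, 52, 49, 45, 43, 53, 53, 47, 52]

series_sum = sum(series)                   # numerator of the arithmetic mean
series_mean = mean(series)                 # sum divided by the 14 values
series_median = median(series)             # mean of the two middle values
series_modes = sorted(multimode(series))   # every value with the top count
series_range = max(series) - min(series)   # largest minus smallest value

print("sorted:", sorted(series))
print("arithmetic mean:", round(series_mean, 1))
print("median:", series_median)
print("modes:", series_modes)
print("range:", series_range)
```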

    Problems with academic performance in grade 7 "B"

    Russian language.
    Marks: 2, 3, 4, 5
    Frequency: none, 14, 9, 1
    Relative frequency, %: 0%, 58.3%, 37.5%, 4.2%
    Sorted sample data (marks): 3,3,3,3,3,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4,4,4,4,5.
    Average mark in the subject: (14 ∙ 3 + 9 ∙ 4 + 5 ∙ 1) : 24 = 83 : 24 ≈ 3.5 (arithmetic mean).
    The largest number of students have a "3" in the subject (mode).
    About half of the students in Russian language have a mark of "3" (median).
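The Russian language figures can be reproduced directly from the frequency table. The sketch below is illustrative only: it assumes the table reads as 14 threes, 9 fours, and 1 five, and the variable names are hypothetical.

```python
from statistics import median

# Frequency table for Russian language: mark -> number of students.
frequency = {3: 14, 4: 9, 5: 1}

total = sum(frequency.values())
weighted_sum = sum(mark * count for mark, count in frequency.items())
average = weighted_sum / total  # arithmetic mean from the table

# Relative frequency of each mark in percent.
shares = {mark: round(100 * count / total, 1)
          for mark, count in frequency.items()}

# Mode: the mark with the largest frequency.
mode_mark = max(frequency, key=frequency.get)

# Median: expand the table back into a sorted list of marks.
marks = [mark for mark, count in sorted(frequency.items())
         for _ in range(count)]
median_mark = median(marks)

print("average:", round(average, 1))
print("relative frequency, %:", shares)
print("mode:", mode_mark, "median:", median_mark)
```

Working from the table rather than from the full list of marks keeps the arithmetic short: the mean needs only three products instead of 24 separate terms.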

    For a visual presentation of data obtained as a result of a statistical study, various methods of their presentation are widely used.

    Comparative characteristics of student performance in subjects of the first quarter

    Comparative characteristics of student performance in subjects of the second quarter

    Histogram of the distribution of the average score in the subjects of the I and II quarters

    Comparison chart of all subjects by quality for the I and II quarters

    Survey of grade 7 "B" students on parental control over their children's education
    QUESTIONNAIRE
    1. Do your parents check your homework?
    2. How many times a week?
    3. How many times a week do your parents look at your diary?
    4. How much time on average do you spend each day on homework?

    Parents check homework

    Number of homework checks per week. Median: in the sorted series 0,0,0,0,0,0,1,1,2,2,3,3,3,3,4,4,5,7,7,7,7,7 the two middle values are 3 and 3, so the median = (3 + 3) : 2 = 3. Arithmetic mean = 3

    Histogram of students' time spent on homework