
Misuse of statistics

2015-2-8 13:38 | view publisher: amanda | views: 1003

Misuse of statistics can produce subtle, but serious errors in description and interpretation—subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.
Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data—which measures the extent to which a trend could be caused by random variation in the sample—may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.
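As an illustration of what a significance calculation measures, consider a hypothetical survey in which 60 of 100 respondents favour one option (the numbers here are invented): the one-sided p-value below is the probability that a pure 50/50 chance process would produce a result at least that lopsided.

```python
from math import comb

# Hypothetical survey: 60 of 100 respondents favour option A.
# How often would a fair 50/50 process give a count at least this large?
n, k = 100, 60
p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
print(f"P(at least {k} of {n} by chance) = {p_value:.4f}")
```

A p-value of roughly 0.03 says a trend this strong would arise by chance alone only a few times in a hundred, which may or may not match an intuitive sense of how striking "60 out of 100" is.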
There is a general perception that statistical knowledge is all-too-frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter.[19] A mistrust and misunderstanding of statistics is associated with the quotation, "There are three kinds of lies: lies, damned lies, and statistics". Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics[19] outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).[20]
Ways to avoid misuse of statistics include using proper diagrams and avoiding bias.[21] Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias.[22] Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs.[21] Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[22] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole.[23] According to Huff, "The dependability of a sample can be destroyed by [bias]... allow yourself some degree of skepticism."[24]
To assist in the understanding of statistics Huff proposed a series of questions to be asked in each case:[24]
Who says so? (Does he/she have an axe to grind?)
How does he/she know? (Does he/she have the resources to know the facts?)
What’s missing? (Does he/she give us a complete picture?)
Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

Misinterpretation: correlation

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.
The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, one may or may not be the cause of the other. The correlation may instead be produced by a third, previously unconsidered variable, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables. (See Correlation does not imply causation.)
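A minimal simulation sketch of this effect, with invented variable names and noise levels: a hidden variable z drives both x and y, so the two correlate strongly even though neither causes the other.

```python
import random

random.seed(0)

# A hypothetical confounder z drives both x and y; x and y have no direct link.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

def pearson_r(a, b):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

print(f"r(x, y) = {pearson_r(x, y):.2f}")  # strong correlation, no causation
```

Removing z's influence (for example, by holding z fixed) would make the apparent x–y relationship vanish, which is exactly what the confounding-variable caveat warns about.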
History of statistical science

Blaise Pascal, an early pioneer on the mathematics of probability.
Main articles: History of statistics and Founders of statistics
Statistical methods date back at least to the 5th century BC.
Some scholars pinpoint the origin of statistics to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt.[25] Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its stat- etymology. The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.
Its mathematical foundations were laid in the 17th century with the development of probability theory by Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel.[26] The method of least squares was first described by Adrien-Marie Legendre in 1805.
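Legendre's least-squares idea can be sketched with the closed-form solution for fitting a straight line y = a + bx; the data points below are made up for illustration.

```python
# Ordinary least squares for a line y = a + b*x, via the closed-form
# normal-equation solution; the data points are invented.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope: covariance of x and y divided by the variance of x.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
# Intercept: the fitted line passes through the point of means.
a = mean_y - b * mean_x
print(f"fit: y = {a:.3f} + {b:.3f}x")
```

The slope and intercept minimize the sum of squared vertical distances from the points to the line, which is the criterion Legendre described.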

Karl Pearson, the founder of mathematical statistics.
The modern field of statistics emerged in the late 19th and early 20th century in three stages.[27] The first wave, at the turn of the century, was led by the work of Sir Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. Galton's contributions to the field included introducing the concepts of standard deviation, correlation, regression and the application of these methods to the study of the variety of human characteristics – height, weight, eyelash length among others.[28] Pearson developed the correlation coefficient, defined as a product-moment,[29] the method of moments for the fitting of distributions to samples and Pearson's system of continuous curves, among many other things.[30] Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biometry, and the latter founded the world's first university statistics department at University College London.[31]

Ronald Fisher coined the term "null hypothesis".
The second wave of the 1910s and 20s was initiated by William Gosset, and reached its culmination in the insights of Sir Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher's most important publications were his 1916 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance and his classic 1925 work Statistical Methods for Research Workers. His paper was the first to use the statistical term, variance. He developed rigorous experimental models and also originated the concepts of sufficiency, ancillary statistics, Fisher's linear discriminator and Fisher information.[32]
The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of "Type II" error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.[33]
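The Neyman–Pearson notion of power (one minus the Type II error rate) can be illustrated with a small Monte Carlo sketch; the effect size, sample size, and cutoff below are illustrative assumptions, not prescriptions.

```python
import random

random.seed(1)

# Monte Carlo estimate of the power of a z-style test: how often does a
# sample of n=25 drawn from N(0.5, 1) give a mean beyond the two-sided 5%
# critical value computed under the null N(0, 1) with known sigma=1?
n, trials = 25, 20000
critical = 1.96 / n ** 0.5          # 5% cutoff for the sample mean under the null
rejections = 0
for _ in range(trials):
    sample_mean = sum(random.gauss(0.5, 1) for _ in range(n)) / n
    if abs(sample_mean) > critical:
        rejections += 1
print(f"estimated power = {rejections / trials:.2f}")
```

The fraction of simulated samples that reject the null estimates the test's power against this particular alternative; a sample that fails to reject despite the true shift of 0.5 is exactly a Type II error.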
Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations, and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze Big data.[34]
Applications
Applied statistics, theoretical statistics and mathematical statistics
"Applied statistics" comprises descriptive statistics and the application of inferential statistics.[35][verification needed] Theoretical statistics concerns the logical arguments underlying the justification of approaches to statistical inference, as well as encompassing mathematical statistics. Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.
Machine learning and data mining
There are two applications for machine learning and data mining: data management and data analysis. Statistical tools are necessary for the data analysis.
Statistics in society
Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Statistical consultants can help organizations and companies that don't have in-house expertise relevant to their particular questions.
Statistical computing

gretl, an example of an open source statistical package
Main article: Computational statistics
The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.
Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made Bayesian models more feasible to use. The computer revolution has implications for the future of statistics, with a new emphasis on "experimental" and "empirical" statistics. A large number of both general- and special-purpose statistical software packages are now available.
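As a sketch of one such resampling method, the bootstrap interval below is computed from an invented data set by resampling with replacement many times and taking the central 95% of the resampled means.

```python
import random
import statistics

random.seed(42)

# Bootstrap sketch: resample the data with replacement and use the spread of
# the resampled means as an interval estimate.  The data set is invented.
data = [4.1, 5.6, 3.8, 7.2, 5.0, 6.3, 4.9, 5.8, 6.7, 4.4]
boot_means = sorted(
    statistics.mean(random.choices(data, k=len(data))) for _ in range(10000)
)
lo, hi = boot_means[249], boot_means[9749]   # central 95% of resampled means
print(f"bootstrap 95% interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Before cheap computation, repeating an estimate ten thousand times over resampled data was impractical; this is the kind of method the computer revolution made routine.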
Statistics applied to mathematics or the arts
Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was "required learning" in most sciences. This has changed with use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree-requirement, is now viewed enthusiastically.[according to whom?] Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.
In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
Methods of statistics including predictive methods in forecasting are combined with chaos theory and fractal geometry to create video works that are considered to have great beauty.
The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed.[citation needed] With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.[citation needed]
Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
Statistics can be used predictively to create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.
Specialized disciplines
Main article: List of fields of application of statistics
Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research. Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:
Actuarial science (assesses risk in the insurance and finance industries)
Applied information economics
Astrostatistics (statistical evaluation of astronomical data)
Biostatistics
Business statistics
Chemometrics (for analysis of data from chemistry)
Data mining (applying statistics and pattern recognition to discover knowledge from data)
Demography
Econometrics (statistical analysis of economic data)
Energy statistics
Engineering statistics
Epidemiology (statistical analysis of disease)
Geography and Geographic Information Systems, specifically in Spatial analysis
Image processing
Medical Statistics
Psychological statistics
Reliability engineering
Social statistics
In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:
Bootstrap / Jackknife resampling
Multivariate statistics
Statistical classification
Structured data analysis (statistics)
Structural equation modelling
Survey methodology
Survival analysis
Statistics in various sports, particularly baseball - known as sabermetrics - and cricket
Statistics is also a fundamental tool in business and manufacturing. It is used to understand variability in measurement systems, to control processes (as in statistical process control, or SPC), to summarize data, and to make data-driven decisions. In these roles it is a key tool, and perhaps the only reliable one.
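A minimal sketch of the SPC idea, with invented measurements: control limits are set at the mean plus or minus three standard deviations of an in-control baseline, and later points outside those limits are flagged.

```python
import statistics

# Shewhart-style control chart sketch with invented data: limits come from an
# in-control baseline, then later measurements are checked against them.
baseline = [10.1, 9.8, 10.0, 10.2, 9.9, 10.1, 9.7, 10.0]   # historical, in-control
centre = statistics.mean(baseline)
sigma = statistics.pstdev(baseline)
ucl, lcl = centre + 3 * sigma, centre - 3 * sigma           # upper/lower control limits
new_points = [10.0, 13.5, 10.1]
out_of_control = [x for x in new_points if not lcl <= x <= ucl]
print(f"limits: ({lcl:.2f}, {ucl:.2f}); flagged: {out_of_control}")
```

Points inside the limits are attributed to ordinary process variation; a point outside them signals that something beyond chance is likely at work and the process should be investigated.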