Wikipedia and reliability

A study at Dartmouth College of the English Wikipedia noted that, contrary to usual social expectations, anonymous editors were some of Wikipedia's most productive contributors of valid content. Wikipedia has harnessed the work of millions of people to produce the world's largest knowledge-based site along with software to support it, resulting in more than nineteen million articles written, across more than different language versions, in fewer than twelve years.


Some examples of the methods to estimate reliability include test-retest reliability, internal consistency reliability, and parallel-test reliability. Each method comes at the problem of figuring out the source of error in the test somewhat differently.

Item response theory

It was well known to classical test theorists that measurement precision is not uniform across the scale of measurement. Tests tend to distinguish better among test-takers with moderate trait levels and worse among high- and low-scoring test-takers.

Item response theory extends the concept of reliability from a single index to a function called the information function.

The IRT information function is the inverse of the squared conditional standard error of measurement at any given test score.

Estimation

The goal of estimating reliability is to determine how much of the variability in test scores is due to errors in measurement and how much is due to variability in true scores.
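In classical test theory notation, this goal can be written as a variance decomposition. The symbols below are conventional and are assumed here rather than taken from the text: X is the observed score, T the true score, and E the measurement error, with T and E assumed to be uncorrelated.

```latex
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
```

Under these assumptions, reliability is the share of observed-score variance that comes from true scores, so it approaches 1 as the error variance shrinks.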

Four practical strategies have been developed that provide workable methods of estimating test reliability.

The test-retest method involves administering a test to a group of individuals, re-administering the same test to the same group at some later time, and correlating the first set of scores with the second. The correlation between scores on the first test and the scores on the retest is used to estimate the reliability of the test using the Pearson product-moment correlation coefficient.

The key to the parallel-forms method is the development of alternate test forms that are equivalent in terms of content, response processes and statistical characteristics.

For example, alternate forms exist for several tests of general intelligence, and these tests are generally seen as equivalent. If both forms of the test were administered to a number of people, differences between scores on form A and form B may be due to errors in measurement only.

The method involves administering one form of the test to a group of individuals, administering an alternate form of the same test to the same group at some later time, and correlating scores on form A with scores on form B. The correlation between scores on the two alternate forms is used to estimate the reliability of the test.
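The correlation step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the scores for the two forms are hypothetical, chosen only to show the computation.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two score lists of equal length (at least 2 scores)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variance")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six people on form A and, later, on form B.
form_a = [12, 15, 9, 20, 17, 11]
form_b = [13, 14, 10, 19, 18, 10]

# The correlation between the two sets of scores is the reliability estimate.
reliability = pearson_r(form_a, form_b)
print(f"estimated reliability: {reliability:.3f}")
```

The same function applies to the test-retest case: pass the first-administration scores and the retest scores instead of the two forms.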

This method provides a partial solution to many of the problems inherent in the test-retest reliability method. For example, since the two forms of the test are different, carryover effect is less of a problem.

Reactivity effects are also partially controlled, although taking the first test may change responses to the second test.


However, it is reasonable to assume that the effect will not be as strong with alternate forms of the test as with two administrations of the same test.

The method also has disadvantages. It may be very difficult to create several alternate forms of a test, and it may be difficult, if not impossible, to guarantee that two alternate forms of a test are parallel measures.

The split-half method treats the two halves of a measure as alternate forms. It provides a simple solution to the problem that the parallel-forms method faces, namely the difficulty of developing alternate forms. It involves administering a test to a group of individuals, splitting the test in half, and correlating scores on one half of the test with scores on the other half. The correlation between these two split halves is used in estimating the reliability of the test.

This half-test reliability estimate is then stepped up to the full test length using the Spearman-Brown prediction formula. There are several ways of splitting a test to estimate reliability.

For example, a vocabulary test could be split into two subtests, the first made up of items 1 through 20 and the second made up of the remaining items. However, the responses from the first half may be systematically different from responses in the second half due to an increase in item difficulty and fatigue.

The simplest method is to adopt an odd-even split, in which the odd-numbered items form one half of the test and the even-numbered items form the other.


This arrangement guarantees that each half will contain an equal number of items from the beginning, middle, and end of the original test.
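An odd-even split followed by the Spearman-Brown step-up might look like the sketch below. The function names and the response matrix are hypothetical (five test-takers, six items scored 0 or 1), and the step-up uses the standard Spearman-Brown form for doubling test length.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half, factor=2.0):
    """Step a reliability estimate up to a test `factor` times as long."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def split_half_reliability(item_scores):
    """Odd-even split-half reliability; one row of item scores per test-taker."""
    # Items 1, 3, 5, ... sit at indices 0, 2, 4, ...; items 2, 4, 6, ... at 1, 3, 5, ...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    return spearman_brown(r_half)


# Hypothetical responses: rows are test-takers, columns are items (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

print(f"split-half reliability: {split_half_reliability(scores):.3f}")
```

Because each half draws items from the beginning, middle, and end of the test, difficulty and fatigue effects are spread across both halves rather than concentrated in one.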

The most common internal consistency measure is Cronbach's alpha, which is usually interpreted as the mean of all possible split-half coefficients.
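As a rough sketch, Cronbach's alpha can be computed from item variances and the variance of total scores using the usual formula, alpha = k/(k-1) * (1 - sum of item variances / variance of totals). The function name `cronbach_alpha` and the response matrix are hypothetical; population variance is used throughout, which gives the same ratio as sample variance as long as it is applied consistently.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; one row of item scores per test-taker."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Transpose rows (test-takers) into columns (items).
    columns = list(zip(*item_scores))
    sum_item_variances = sum(pvariance(col) for col in columns)
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("alpha is undefined when total scores have no variance")
    return (k / (k - 1)) * (1.0 - sum_item_variances / total_variance)


# Hypothetical responses: rows are test-takers, columns are items (1 = correct).
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha: {alpha:.3f}")
```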

Also, reliability is a property of the scores of a measure rather than the measure itself, and reliability estimates are thus said to be sample dependent. Estimates from one sample might differ from those of a second sample, beyond what sampling variation alone would produce, if the second sample is drawn from a different population, because the true variability is different in that population.

This is true of measures of all types—yardsticks might measure houses well yet have poor reliability when used to measure the lengths of insects.

Reliability may be improved by clarity of expression (for written assessments), lengthening the measure, [8] and other informal means.


However, formal psychometric analysis, called item analysis, is considered the most effective way to increase reliability. This analysis consists of computing item difficulties and item discrimination indices, the latter involving computation of correlations between each item and the sum of the item scores of the entire test.

Wikipedia is about as good a source of accurate information as Britannica, the venerable standard-bearer of facts about the world around us, according to a study.

Many scholars and academics have denounced Wikipedia’s claims of reliability because it does not feature any of the actual scholarly influence found in print encyclopedias.

Indeed, Wikipedia is primarily edited by individuals who do not live in the ivory towers of academia.

Inter-method reliability assesses the degree to which test scores are consistent when there is a variation in the methods or instruments used.

This allows inter-rater reliability to be ruled out. When dealing with forms, it may be termed parallel-forms reliability.

The reliability of Wikipedia (predominantly of the English-language edition) has frequently been questioned, and it has often been tested statistically, through comparative review, through analysis of historical patterns, and by examining the strengths and weaknesses inherent in the editing process unique to Wikipedia.

There have been incidents of conflicted editing and of Wikipedia being used for 'revenge'. John Seigenthaler criticised Wikipedia's reliability.

The free online resource Wikipedia is about as accurate on science as the Encyclopedia Britannica, a study shows.

The British journal Nature examined a range of scientific entries on both works of reference and found few differences in accuracy.

Wikipedia cannot be considered a reliable source of information for a number of reasons, the most important of which are anonymity and the failure to introduce a system that would guarantee reliability.
