What Does Data Dredging Mean in Your Business

In the simplest sense, data dredging is the act of seeking more information from a set of data than it actually contains. In contrast, the conventional scientific method begins with a hypothesis and then proceeds to examining the data. Sometimes conducted for unethical purposes, data dredging is a data mining practice that can circumvent the traditional techniques of data mining, which may then result in premature conclusions. If you are using data mining procedures to test large data sets for 'significant' associations, be sure to correct for multiple testing and other purely statistical phenomena that might mislead interpretation.
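As a rough illustration of what correcting for multiple testing can look like, the sketch below applies two widely used corrections, Bonferroni and Benjamini-Hochberg, to a list of p-values. The function names and the p-values are hypothetical and chosen only for illustration; this is a minimal sketch, not a recommendation of one correction over another.

```python
def bonferroni(p_values, alpha=0.05):
    """Return True for each test that survives the Bonferroni correction."""
    m = len(p_values)
    # Each test is compared against alpha divided by the number of tests.
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Return True for each test rejected by the Benjamini-Hochberg procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten association tests (illustration only).
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

uncorrected = sum(p <= 0.05 for p in p_values)
print("significant without correction:", uncorrected)               # 5
print("significant after Bonferroni:", sum(bonferroni(p_values)))   # 1
print("significant after Benjamini-Hochberg:", sum(benjamini_hochberg(p_values)))  # 2
```

With these made-up values, five associations look 'significant' at the 0.05 level before correction, but only one or two remain afterwards, depending on the correction used.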
If you have a very large data set (with hundreds or thousands of samples), it may be feasible to use a random subset of samples for exploratory analysis and test any hypotheses derived therefrom on the other samples. If you use exploratory analyses to generate hypotheses, be sure to test those hypotheses on data sets other than the one used for exploratory analysis. If not, you may simply be 'massaging' the data for a (probably false) signal.
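One way to set aside samples for confirmation is to split the data at random before any exploration begins. The sketch below is a minimal example under stated assumptions: `split_samples`, the sample identifiers, and the 50/50 split are all hypothetical choices, not part of any particular tool.

```python
import random


def split_samples(sample_ids, exploratory_fraction=0.5, seed=0):
    """Randomly partition sample IDs into exploratory and confirmatory sets."""
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * exploratory_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers for a data set with 1000 samples.
sample_ids = [f"sample_{i:04d}" for i in range(1000)]
exploratory, confirmatory = split_samples(sample_ids)

# Explore freely on `exploratory`; test the resulting hypotheses
# only on `confirmatory`, which has not been looked at before.
print(len(exploratory), len(confirmatory))
```

The important design choice is that the split is made once, up front, and that the confirmatory samples are not inspected until a hypothesis has been fixed.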
If you transform or discard data, ensure that there is a solid rationale for doing so.
"Data dredging" (sometimes called "data fishing") is a real risk which may invalidate any conclusions you draw from your analysis.Įxploratory analyses are used to find subsets of data that confirm (or are more likely to confirm) an a priori hypothesis which may not be generalisable to the whole (statistical) population.Įxploratory analyses are used to generate a hypothesis from a given data set which is tested using the same data set.Įvaluate if your data supports the results of a hypothesis based on previous knowledge and research.