How can i find outliers




















Outliers can affect the shape or interpretation of your data, which is why it's necessary to understand and possibly eliminate outliers from your data set. In this article, we discuss what outliers are in statistics and describe how to find outliers in your data with examples and explanations. Outliers in statistics refer to pieces of data that differ significantly from the other points of data, so that they appear unusual. For example, if you have a set of data in which most of the data points are in the 20s, but one piece of data is in the 60s, then that piece of data in the 60s is the outlier.

Outliers can significantly change the outcome of the data, whether you're trying to calculate the mean, median, mode or range of your data set, understand trends or create a visual representation. Outliers are important to analyze because they usually hold important information about the data being studied, and may distort the findings of your data analysis. You might remove the outlier from the data set if it was an error, but analyzing it first can show you its meaning or help you predict future outliers.

They can provide insight into the data gathering, recording and analyzing process, and may be the key to discovering system inconsistencies. Even when outliers are errors, they can help you better understand your data, which is why it's important to identify and evaluate any outliers.

Here are five ways to find outliers in your data set:. An easy way to identify outliers is to sort your data, which allows you to see any unusual data points within your information.

Try sorting your data by ascending or descending order, then you can examine the data to find outliers. An unusually high or low piece of data could be an outlier. If you have a small set of data, you can do this by hand. If you have a large set of data, consider sorting it with a database program. For example, if you have these numbers in ascending order: 3, 6, 7, 10 and 54, you can see that 54 is a lot larger than the rest of the data points.

Statisticians would consider 54 an outlier. Another example could be: 2, 38, 43, 49 and You can see that 2 is much smaller than the other data points, so we can say that 2 is the outlier. Once you've identified your outliers, you can begin to research why they appeared in your data.

You can also use graphs, such as scatter plots or histograms, to find outliers. Since the outlier can be attributed to human error and because it's inaccurate to say that this room's average temperature was almost 90 degrees, we should opt to omit our outlier.

Understand the importance of sometimes retaining outliers. Scientific experiments are especially sensitive situations when dealing with outliers - omitting an outlier in error can mean omitting information that signifies some new trend or discovery. For instance, let's say that we're designing a new drug to increase the size of fish in a fish farm. In other words, the first drug gave one fish a mass of 71 grams, the second drug gave a different fish a mass of 70 grams, and so on.

In this situation, is still a big outlier, but we shouldn't omit it because, assuming it's not due to an error, it represents a significant success in our experiment.

The drug that yielded a gram fish worked better than all the other drugs, so this point is actually the most important one in our data set, rather than the least.

The range can never truly be negative. If your interquartile range is negative, you subtracted the upper quartile from the lower quartile. To correct this, either subtract the lower quartile from the upper quartile, or multiply your current answer by Not Helpful 33 Helpful Find the median of the data if it is a singular number, do not include this in either side and separate into two groups.

Then, find the median of each group. The first median is quartile 1 Q1 and the second is quartile three Q3. Use the general formula Q3 - Q1 to find the interquartile range. Not Helpful 20 Helpful Please tell me why 1. How did they come about? Are they a constant figure? This is because the definition of an outlier is any data point more than 1.

And 3 is just 1. Not Helpful 38 Helpful All measures of central tendency are influenced by outliers, but median is affected the least. For example, if the median is 5 and the number above it is 6, it doesn't matter if you have another number that is 7 or if that number is Because median is mostly about how many numbers are on each side, an outlier wouldn't affect it any more then any other number. Not Helpful 16 Helpful You use 1.

What do you think about that? Not Helpful 42 Helpful It's okay to have your lower outlier as a negative, just calculate it the same way. Not Helpful 34 Helpful Is it possible for half of my data set to be outliers if I am dealing with a large data set? Probably not. Let's say your data set is systolic blood pressure measurements.

In most studies, just to prevent the problem with human measurement errors, the blood pressure will be reported as the mean of two samples. This reduces human error greatly. Some systolic pressures are going to be way more than mmHg, while others are way lower than mmHg.

Trust your summary statistics and then do some graphics. In finding the inner fence, do I always have to multiply the inter quartile range by 1. With large amounts of data, it is possible to have multiple outliers, but it can be quite difficult to identify them as they are more likely to fall at the center of the quartiles. Not Helpful 21 Helpful Yes, it can depending on how small the sample size is. Not Helpful 13 Helpful Include your email address to get a message when this question is answered.

When outliers are found, attempt to explain their presence before discarding them from the data set; they can point to measurement errors or abnormalities in the distribution. Helpful 0 Not Helpful 0. Submit a Tip All tip submissions are carefully reviewed before being published.

Related wikiHows How to. How to. Co-authors: Updated: July 8, Categories: Probability and Statistics. Sometimes they are caused by an error. Other times outliers indicate the presence of a previously unknown phenomenon. Another reason that we need to be diligent about checking for outliers is because of all the descriptive statistics that are sensitive to outliers. The mean, standard deviation and correlation coefficient for paired data are just a few of these types of statistics.

Actively scan device characteristics for identification. Use precise geolocation data. Select personalised content. Create a personalised content profile. Measure ad performance. Select basic ads. Create a personalised ads profile. Select personalised ads. Apply market research to generate audience insights. Measure content performance. Develop and improve products. List of Partners vendors. Share Flipboard Email. Courtney Taylor. Professor of Mathematics. Courtney K. Taylor, Ph.

Featured Video. Cite this Article Format.



0コメント

  • 1000 / 1000