How to Find the Best Theoretical Distribution for Your Data

Knowing the underlying data distribution is a crucial step in data modeling, with numerous applications, including anomaly detection, synthetic data generation, and data compression.

19 min read · Feb 3, 2023
Image by Author.

Determining the underlying (probability) distribution of your data has many modeling advantages, for example in decision making, statistical inference, and predictive modeling. In general, the easiest way to get a first impression of the underlying distribution is to visually inspect the random variable(s) with a histogram. However, to determine the exact distribution and its parameters (e.g., loc, scale), quantitative methods are essential. Once we know the theoretical distribution, we can create various plots that give more insight into our variables, such as the probability density function (PDF) and cumulative distribution function (CDF) plots, and the QQ plot.

In this blog, I will describe why it is important to determine the underlying probability distribution of your data set, and explain the differences between parametric and non-parametric distributions. I will demonstrate with hands-on examples how to find the best fit using a quantitative approach, and then how to confirm it by visual inspection.
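As a rough illustration of what a "quantitative approach" to finding the best fit can look like, here is a minimal sketch that assumes NumPy and SciPy are installed. It fits a small set of candidate distributions by maximum likelihood (`scipy.stats` `dist.fit`) and ranks them by the Kolmogorov-Smirnov statistic between the empirical and fitted CDFs. The candidate list, the helper name `fit_best_distribution`, and the choice of the KS statistic as the score are assumptions made for this example, not necessarily the method used later in this post. Only the KS statistic is used for ranking, because KS p-values are not valid when the parameters were estimated from the same data.

```python
import numpy as np
from scipy import stats

# Hypothetical shortlist of candidate distributions (an illustrative assumption).
CANDIDATES = {
    "norm": stats.norm,
    "expon": stats.expon,
    "gamma": stats.gamma,
    "lognorm": stats.lognorm,
    "uniform": stats.uniform,
}


def fit_best_distribution(data, candidates=CANDIDATES):
    """Fit each candidate by maximum likelihood and rank by the KS statistic.

    Returns a list of (name, params, ks_statistic) sorted from best to worst.
    params is the tuple returned by dist.fit: shape parameter(s) first,
    followed by loc and scale.
    """
    data = np.asarray(data, dtype=float)
    results = []
    for name, dist in candidates.items():
        params = dist.fit(data)  # maximum-likelihood estimates
        # KS distance between the empirical CDF and the fitted theoretical CDF.
        ks_stat = stats.kstest(data, dist.cdf, args=params).statistic
        results.append((name, params, ks_stat))
    # A smaller KS statistic means the fitted CDF is closer to the data.
    results.sort(key=lambda r: r[2])
    return results


if __name__ == "__main__":
    # Synthetic example data: normal with loc=5, scale=2.
    rng = np.random.default_rng(42)
    sample = rng.normal(loc=5.0, scale=2.0, size=5000)
    for name, params, ks in fit_best_distribution(sample):
        print(f"{name:8s} KS={ks:.4f} params={np.round(params, 3)}")
```

Flexible families such as `gamma` and `lognorm` can mimic a normal shape closely, so the top few scores may be nearly tied; a QQ plot of the data against the leading candidates (for example with `scipy.stats.probplot`) is one way to confirm the ranking visually.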

Written by Erdogan T

Machine Learning | Statistics | Bayesian | D3js | Data Science | Ph.D | erdogant.github.io
