San Francisco, California, United States
Contributions
How can you build statistical models for Machine Learning when data is non-normal?
One way to optimize for extreme-valued data is calibration. Toy example: say we want to predict a binary outcome, and one important feature has a long-tailed distribution. We can evaluate the model by measuring validation-set performance separately within each quantile of this feature, then explicitly tune for high F1 in the tail quantiles while treating overall F1 as a guardrail that must not regress.

Other methods, like Platt scaling, can correct systematic miscalibration (skew) in the predicted scores. Platt scaling trains an additional model, a logistic regression, that predicts the probability using the original model's score as its only feature. So if the original model is overconfident and predicts 0.8 when the true probability is 0.6, the fitted parameters compress the high scores back toward the true probabilities.
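A minimal, stdlib-only Python sketch of both ideas, assuming a binary classifier that already outputs scores strictly between 0 and 1. The helper names (`f1_by_quantile`, `fit_platt`, `calibrate`) are hypothetical. `f1_by_quantile` reports F1 per equal-count bin of one feature, so the last bin shows performance in the tail. `fit_platt` is a Platt-style fit of `sigmoid(a * logit(s) + b)` by Newton's method; as assumptions, it uses the logit of the score as the single feature (so `a = 1, b = 0` is the identity) and omits Platt's original target smoothing.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def f1(y_true, y_pred):
    """Binary F1 from 0/1 labels; returns 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def f1_by_quantile(feature, y_true, y_pred, n_bins=4):
    """Sort rows by `feature`, split them into n_bins equal-count bins,
    and return F1 per bin (the last bin is the upper tail)."""
    order = sorted(range(len(feature)), key=lambda i: feature[i])
    size = len(order) / n_bins
    result = []
    for b in range(n_bins):
        idx = order[round(b * size):round((b + 1) * size)]
        result.append(f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return result


def fit_platt(scores, labels, iters=25):
    """Fit P(y=1 | s) = sigmoid(a * logit(s) + b) by Newton's method on log loss.
    a < 1 pulls overconfident scores toward 0.5."""
    xs = [logit(s) for s in scores]
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, labels):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            ga += (p - y) * x  # gradient w.r.t. a
            gb += p - y        # gradient w.r.t. b
            haa += w * x * x   # Hessian entries
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        # Newton step: solve the 2x2 system H @ [da, db] = [ga, gb]
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b


def calibrate(score, a, b):
    return sigmoid(a * logit(score) + b)


# Synthetic check: the "model" is overconfident because its logit is twice
# the true logit, so the fitted slope `a` should land near 0.5.
rng = random.Random(0)
true_p = [rng.uniform(0.1, 0.9) for _ in range(2000)]
scores = [sigmoid(2 * logit(p)) for p in true_p]
labels = [1 if rng.random() < p else 0 for p in true_p]
a, b = fit_platt(scores, labels)
```

In practice the calibrator should be fit on a held-out calibration set rather than on the data used to train the original model; otherwise it inherits the model's overfitting.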