4 implications for PMs using the Confusion Matrix to set metrics so that performance objectives are clear for developers and stakeholders

Ilona Melnychuk
3 min read · Jan 17, 2022

You need to measure whether your intervention (the AI/ML product you're developing) is showing signs that it will be better than what's already in place. "Better" means that it does the job better and/or it's cheaper. Cheaper is not always the goal; some types of value, such as damage to reputation or employee happiness, are complex to measure, and calculating a dollar amount for them may not be accurate. In any case, the quantitative performance of your ML model needs to be agreed upon before the start of a POC and revisited right through the development stages.

The Confusion Matrix will help to define how often the model needs to be right and how often it can be forgiven for being wrong.

Let’s take an example of a model that predicts whether a new YouTuber will reach 1MM subscribers by the end of the year, or not. Based on this, YouTube coaches could invest time in giving personal coaching to the YouTubers with 1MM subscriber potential.
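
Before going through each quadrant, it helps to see how the four counts get tallied. Here is a minimal Python sketch of that bookkeeping for the 100-YouTuber example; the labels are made up for illustration (1 means "reaches 1MM subscribers"), not real YouTube data.

```python
# Minimal sketch: tally the four confusion-matrix cells for the
# hypothetical 100-YouTuber example (1 = reaches 1MM subscribers).

def confusion_counts(actual, predicted):
    """Return (tp, fp, tn, fn) for two equal-length lists of 0/1 labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, fp, tn, fn

# 10 of 100 YouTubers actually hit 1MM; the model flags 8 of them plus 5 who won't.
actual    = [1] * 10 + [0] * 90
predicted = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85

tp, fp, tn, fn = confusion_counts(actual, predicted)
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")  # TP=8, FP=5, TN=85, FN=2
```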

True Positives

  • If you have a 100% true positive rate, the model correctly flags every YouTuber who really will make it. I’m pretty sure my Mom’s advice model is believed to function at this rate.
  • e.g. If out of a sample of 100 new YouTubers, 10 of them will hit 1MM subscribers and the model correctly flags 8 of those 10, the true positive rate is 80%.
  • PM Implication: You want this rate to be high. If you were a PM on this product, your model would need to catch as close to all 10 as possible, and at a higher performance than the current system. The tricky part is deciding what rate is good enough. You’ll need to calculate and decide at what true positive rate the investment into the AI/ML solution is worth it.

True Negatives

  • If you have a model that gives you 100% true negatives, this means that whenever it says a YouTuber won’t make it, it’s right.
  • e.g. If out of 100 YouTubers, 90 don’t make it and your model correctly rules out 85 of those 90 (wrongly flagging the other 5 as future stars), the true negative rate is roughly 94.4%.
  • PM Implication: The rate needs to be high enough that people don’t lose trust in a model that tells them to invest coaching time in a YouTuber who’s going to get 3 subscribers and not 1MM. (A worked calculation of both rates follows this list.)
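
To make the two rates concrete, here is a small Python calculation that recomputes them from the counts in the examples above (8 of 10 real winners flagged; 85 of 90 non-winners correctly ruled out). The counts are this article's hypotheticals, not real data.

```python
# True positive rate (recall): correctly flagged winners / actual winners.
tp, fn = 8, 2          # 8 of the 10 real 1MM YouTubers were flagged
tpr = tp / (tp + fn)
print(f"True positive rate: {tpr:.0%}")   # 80%

# True negative rate (specificity): correctly ruled-out non-winners / actual non-winners.
tn, fp = 85, 5         # 85 of the 90 non-winners were correctly ruled out
tnr = tn / (tn + fp)
print(f"True negative rate: {tnr:.1%}")   # 94.4%
```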

False Positives

  • If a model gives you 100% false positives, any YouTuber who is predicted to reach 1MM subscribers actually won’t. This is a Type I error in statistics.
  • e.g. If out of 100 YouTubers, 10 actually reach 1MM subscribers and your model predicts that 15 will, it flags 50% more winners than there will be in reality. Note that of those 15 predictions, 10 were true positives; the model just went 5 YouTubers overboard, which works out to a false positive rate of about 5.6% (5 of the 90 non-winners).
  • PM Implication: False positive rates should be low, so as not to flag everything as worthy of someone’s attention, even if flagging everything would catch all the winning YouTubers. The YouTube coaches may stop trusting the model. If everything is important, then nothing is.

False Negatives

  • If a model gives you 100% false negatives, any YouTuber who is predicted to not reach 1MM actually will. This is a Type II error in statistics.
  • e.g. If out of 100 YouTubers, 90 don’t reach 1MM subscribers, but the model says that 95 will not, it has missed 5 of the 10 eventual winners: a 50% false negative rate. The coaches stand to not coach 5 star YouTubers.
  • PM Implication: You don’t want people to miss out on business, like not coaching valuable YouTubers. Therefore, this error rate must also be low. (Both error rates are worked out in the sketch after this list.)
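
The same arithmetic applies to the error side. The sketch below recomputes both error rates from the two examples above; again, these are illustrative counts, not real data.

```python
# False positive rate: wrongly flagged non-winners / actual non-winners (Type I error).
fp, tn = 5, 85         # 5 of the 90 non-winners were wrongly flagged
fpr = fp / (fp + tn)
print(f"False positive rate: {fpr:.1%}")                 # 5.6%
print(f"Flags vs. real winners: {(10 + fp) / 10:.0%}")   # 150%: 50% more flags than actual winners

# False negative rate (miss rate): missed winners / actual winners (Type II error).
fn, tp = 5, 5          # the model ruled out 5 of the 10 real winners
fnr = fn / (fn + tp)
print(f"False negative rate: {fnr:.0%}")                 # 50%
```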

Let’s explore further

At what false positive and false negative rate do people start to lose trust in the model’s predictions? In the story ‘The Boy Who Cried Wolf’, crying wolf three times when there was no wolf was enough to get ignored. How forgiving are users of AI/ML?

This post was created with Typeshare
