Today, I’m extremely happy to announce Amazon SageMaker Clarify, a new capability of Amazon SageMaker that helps customers detect bias in machine learning (ML) models and increase transparency by helping explain model behavior to stakeholders and customers.

As ML models are built by training algorithms that learn statistical patterns present in datasets, several questions immediately come to mind. First, can we ever hope to explain why our ML model comes up with a particular prediction? Second, what if our dataset doesn’t faithfully describe the real-life problem we were trying to model? Could we even detect such issues? Would they introduce some sort of bias in imperceptible ways? As we will see, these are not speculative questions at all. They are very real, and their implications can be far-reaching.

Imagine that you’re working on a model detecting fraudulent credit card transactions. Fortunately, the vast majority of transactions are legitimate: they make up 99.9% of your dataset, meaning that only 0.1% of transactions are fraudulent, say 100 out of 100,000. If you trained a binary classification model (legitimate vs. fraudulent) on this data, there’s a strong chance that it would be heavily influenced, or biased, by the majority group. In fact, a trivial model could simply decide that transactions are always legitimate: as useless as this model would be, it would still be right 99.9% of the time! This simple example shows how careful we have to be about the statistical properties of our data, and about the metrics that we use to measure model accuracy.

There are many variants of this under-representation problem. As the number of classes, features, and unique feature values increases, your dataset may contain only a tiny number of training instances for certain groups. Some of these groups may correspond to socially sensitive features such as gender, age range, or nationality, and under-representation of such groups could result in a disproportionate impact on their predicted outcomes.

Unfortunately, even with the best of intentions, bias issues may exist in datasets and be introduced into models, with business, ethical, and regulatory consequences. It is thus important for model administrators to be aware of potential sources of bias in production systems.
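To make the fraud example concrete, here is a minimal Python sketch, assuming a synthetic label list with the 100-out-of-100,000 split described above. It compares accuracy with recall on the fraudulent class for the trivial "always legitimate" model, and computes a simple imbalance ratio (n_a - n_d) / (n_a + n_d) between two groups. That ratio has the same form as the Class Imbalance (CI) pre-training metric described in the SageMaker Clarify documentation, but the helper functions below are hypothetical illustrations, not part of any Clarify API.

```python
# A minimal sketch (not the SageMaker Clarify API): why accuracy is misleading
# on imbalanced data, and a simple ratio to quantify the imbalance.
# Assumption: synthetic labels, 100 fraudulent out of 100,000 transactions.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives (here: frauds) the model catches."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

def class_imbalance(n_a, n_d):
    """(n_a - n_d) / (n_a + n_d): 0 means balanced, values near 1 mean group d
    is heavily under-represented relative to group a (hypothetical helper)."""
    return (n_a - n_d) / (n_a + n_d)

# Labels: 0 = legitimate, 1 = fraudulent.
y_true = [1] * 100 + [0] * 99_900
# The "trivial model" from the text: every transaction is predicted legitimate.
y_pred = [0] * len(y_true)

print(f"accuracy:        {accuracy(y_true, y_pred):.3f}")         # 0.999
print(f"recall on fraud: {recall(y_true, y_pred):.3f}")           # 0.000
print(f"class imbalance: {class_imbalance(99_900, 100):.3f}")     # 0.998
```

The 99.9% accuracy hides the fact that the model catches none of the fraudulent transactions. The same imbalance ratio can be applied to any two groups, for example two values of a sensitive feature such as age range, to spot under-represented groups before training.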