
eBay’s AI can identify 40% of credit card fraud cases with ‘high precision’




Credit card fraud is more common than you might think. In 2014, of the 17.6 million incidents of identity theft filed with law enforcement, 86 percent of victims reported fraud in connection with an existing credit card or bank account. In fact, according to the Federal Trade Commission, credit card fraud is the most common form of identity theft in the U.S., with more than 130,000 reports of it annually.

Automated methods of detecting suspicious card usage patterns are nothing new, but researchers at eBay describe a cutting-edge technique in a new paper (“Credit Card Fraud Detection in e-Commerce: An Outlier Detection Approach”) published on the preprint server arXiv.org. Their proposed system uses an algorithm trained to recognize “good behavior,” as it relates to transactions and payments, and to flag activity that falls outside of the expected norm.

“Often the challenge associated with tasks like fraud and spam detection is the lack of all likely patterns needed to train suitable supervised learning models,” the paper’s authors wrote. “This problem accentuates when the fraudulent patterns are not only scarce, they also change over time … Limited data and continuously changing patterns makes learning significantly difficult. We hypothesize that good behavior does not change with time and data points representing good behavior have consistent spatial signature under different groupings.”

The researchers leveraged an “ensemble” of clustering methods (techniques used to identify groups of similar objects in a dataset), each run with different parameters. In each training run, every data point was assigned to a cluster, and that assignment yielded a mathematical representation (a vector). These vectors served as “fingerprints” of the data point and could be combined into a unique signature for it.
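To make the idea concrete, here is a minimal sketch of how an ensemble of clustering runs could produce one fingerprint per data point. It is an illustration under assumptions, not the paper's implementation: the tiny k-means routine, the helper names `kmeans_labels` and `ensemble_fingerprints`, the choice of cluster counts, and the toy data are all hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, seed, iters=20):
    """Tiny k-means (illustrative only): returns a cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every center, shape (n_points, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels

def ensemble_fingerprints(X, ks=(2, 3, 4), seed=0):
    """Run one clustering per parameter setting; stack labels column-wise."""
    runs = [kmeans_labels(X, k, seed + i) for i, k in enumerate(ks)]
    return np.column_stack(runs)  # shape (n_points, n_runs)

# Toy data (assumed): two dense groups plus one far-away point.
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, (50, 2)),
    rng.normal(5.0, 0.5, (50, 2)),
    [[20.0, 20.0]],
])
fingerprints = ensemble_fingerprints(X)
print(fingerprints.shape)
```

Each row of `fingerprints` records which cluster a point landed in during each run. The paper may encode these assignments differently; the sketch only shows the general shape of the data.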


To generate a signature that represented “good behavior” (i.e., consistency), the team combined the per-data-point vectors and weighted them by the size of the respective cluster, arriving at a single score between 0 and 1. Low consistency (a score closer to 0) corresponded to outlier behavior.
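One plausible reading of that scoring step can be sketched as follows. The formula here is an assumption made for illustration, not the one in the paper: for each run, a point receives the fraction of all points that share its cluster, and the fractions are averaged across runs. The function name `consistency_scores` and the toy label matrix are hypothetical.

```python
import numpy as np

def consistency_scores(labels):
    """labels: (n_points, n_runs) non-negative integer cluster ids.

    Returns one score per point in (0, 1]. Assumed formula: the mean,
    over runs, of (size of the point's cluster) / (number of points).
    """
    labels = np.asarray(labels)
    n_points, n_runs = labels.shape
    scores = np.zeros(n_points)
    for r in range(n_runs):
        run = labels[:, r]
        sizes = np.bincount(run)          # cluster sizes for this run
        scores += sizes[run] / n_points   # weight by own cluster's size
    return scores / n_runs

# Toy assignments (assumed): 5 points, 2 clustering runs.
toy_labels = np.array([
    [0, 0],
    [0, 0],
    [0, 0],
    [0, 1],
    [1, 2],  # alone in its cluster in both runs
])
scores = consistency_scores(toy_labels)
print(scores)
```

In this toy case the last point sits in a singleton cluster in both runs, so it receives the lowest score and would be the first candidate for an outlier flag.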

The approach had several advantages over conventional AI fraud detection, they wrote. It didn’t require prior knowledge of outliers or inliers, for one. And the underlying algorithm was both (1) highly scalable and (2) general in nature; it could be applied to virtually any clustering problem, including those in the medical domain.

To test their method, the team used a publicly available credit card dataset hosted on the data science platform Kaggle. It contains 284,807 credit card transactions made by European cardholders over two days in September 2013, 492 of which are fraudulent. After a total of 10 runs, the algorithm was able to identify 40 percent of fraud cases with “high precision.”

It wasn’t perfect — it flagged 29 legitimate transactions — but as they noted in the paper, it’s “[a] huge gain,” considering the hundreds of thousands of data points at play.
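For a rough sense of what those figures imply, the arithmetic can be worked through as a back-of-the-envelope estimate. It rests on two assumptions that the article does not confirm: that “40 percent” means about 197 of the 492 fraud cases were caught, and that the 29 false positives belong to the same flagged set. The paper’s exact counts may differ.

```python
# Figures quoted in the article.
total_fraud = 492
recall_reported = 0.40
false_positives = 29

# Assumption: 40 percent of 492 rounds to about 197 caught fraud cases.
true_positives = round(recall_reported * total_fraud)

# Precision: share of flagged transactions that were actually fraud.
precision = true_positives / (true_positives + false_positives)
# Recall: share of all fraud cases that were flagged.
recall = true_positives / total_fraud

print(f"precision ~ {precision:.3f}, recall ~ {recall:.3f}")
```

Under these assumptions, roughly 87 percent of flagged transactions would be genuine fraud, which is consistent with the “high precision” description.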

“Our [technique] can be immensely helpful, as out of 284,807 samples we can safely rule out 139,220 [transactions],” they wrote.

If you’ve purchased or sold something on eBay recently, you might have encountered the system in action. The researchers coyly noted that it was successful in picking out fraudulent transactions in data from an “ecommerce platform”:

“The motivation for [our] approach comes from trying to identify fraudulent consumers on an ecommerce platform … Each time the ecommerce company introduces new consumer aided features or imposes restrictions on certain transactional behaviors, it opens new doors and avenues for some consumers to misuse and abuse the platform. Our algorithm shows tremendous potential in identifying [fraud] … However, due to the confidentiality of the dataset, these results cannot be reported in this paper.”