Author(s): Chandra Mouli Yalamanchili
We have witnessed an enormous evolution in credit card processing over last few years, issuing chip-based credit cards, starting mobile device-based wallets like Apple Pay are some of the significant changes done to secure credit card transactions.
Despite financial institutions (banks) working hard to eliminate fraud in credit card transactions, credit card fraud has been continuously rising over the last few years. Fraudsters are getting smarter and using latest technologies to steal cardholder’s information, either through hacking or through social engineering.
Increasing fraud in the industry makes fraud prediction very critical to be able to identify and stop fraud in real time, and data science plays a significant role in analyzing and being able to predict fraud based on transactional and cardholder information. The scope of this project is to research and identify different types of predictive analysis algorithms available that can be applied to determine and stop fraudulent transactions.
Credit card processing is one of the fast-growing industries due to rapid advances in technology and with more and more customers switching to use credit cards instead of cash for purchases. Innovations like mobile wallets provided by Apple, Google, and other major technology firms have played an enormous role in increased usage of credit cards in recent years.
On a very high level, credit card transactions can be of two types, card present, and the card not present transactions. Card present transactions are the transactions from retail stores or gas stations where cardholder is present during the transaction, and that makes fraud a little bit difficult as the fraudster must either steal the physical card or copy the card details, to create a duplicate card. Fraud in card present transactions has reduced in recent years due to the introduction of chip cards (challenging to copy and reproduce) and increased usage of mobile wallets which have the same security as chip cards. That leaves us with the card not present transactions, where we are seeing an increased number of fraudulent transactions in recent years. These are usually e-commerce or online portal-based transactions. In this case, fraudsters needed very less information about the physical card and cardholder to perform the transactions.
Fraud transactions can be of different types, below are some examples of fraudulent transaction types
Any fraudulent transaction will add liability to different parties in the transaction flow like the merchant, merchant processor, networks like Visa/MasterCard, issuing processor, issuing bank and even cardholder depending on who was the weak link for that transaction.
Although financial institutions are working hard to eliminate fraud in credit card transactions, it has been continuously rising as fraudsters are using the latest technologies to steal cardholder’s information, either through hacking or through social engineering.
We can detect these fraudulent transactions by analyzing parameters from different segments of information like transactional information, historical information, etc. Also, considering the liability burden on banks, it is critical to be able to identify these fraudulent transactions in real time.
Capabilities of data science will add a great value to predict fraudulent transactions in real time and help financial institutions in preventing fraud. There are several techniques within data science to predict fraud. The goal of this project is to review and understand several fraud prediction models that are already documented.
There have been massive data breaches in the past few years, and these data breaches will make cardholder information available to fraudsters, subsequently increasing fraud. Below are some of the well-known data breaches happened in recent years
Below statistics from figure 1 shows the card-not-present being primary area of fraud in different market segment, the trend of the fraudulent transaction from the card present transaction has shifted to CNP after 2015 with introduction of EMV secure payments [2].
Figure 1: This Image Depicts the Distribution of US Retail Merchants Fraud Losses by Payment Method from Year 2020 [2].
Currently, we have so many fraud prediction applications some of them are rules-based, and some of them are score based. These solutions use transactional and historical information to come up with fraud prediction. Below are a few drawbacks I have noticed with these types of applications
We need a robust fraud detection system that can accommodate all the complexities involved with credit card transactions like high volume processing, volatility, variety of transactions, and criticality, and be able to consider the vast number of attributes available in transactional or historical data and predict fraudulent transactions with high precision in real time.
The current solution of static rules-based fraud prediction tools won’t stand a chance before rapidly evolving credit card industry as well as increasing fraud in the industry.
There are several machine learning algorithms that can be used to implement fraud prediction, I have mentioned few of them here based on the research done through several articles.
Hidden Markov Model (HMM): HMM is a statistical Markov model with finite set of hidden states. Only the outcome is visible to an external observer and so this can be a good solution for credit card fraud detection as it does not require fraud signatures and still it is capable to detect frauds just by bearing in mind a cardholder’s spending habit [4]
Figure 2: Image Depicting HMM Finite Set of States. Picture Created by Shailesh S Dhok for ‘Credit Card Fraud Detection Using Hidden Markov Model’ Research Paper [4].
Convolutional Neural Network (CNN): CNN is a class of deep neural network that is most used for detecting credit card fraud [5].
Figure 3: Image Depicting Typical CNN Architecture. Picture Created by Wikipedia for ‘Convolutional Neural Network’ Page on Wikipedia [5].
CNN models have multiple hidden layers, and they are on lower extreme with regards to complexity as they have advantage of the hierarchical pattern in data and assemble more complex patterns using smaller and simpler patterns. They have mechanism to avoid the model over-fitting [5].
Machine learning with neural network models is the best solution for fraud prediction applications as they can continue to learn new trends in the transactions and predict fraudulent transactions more precisely [6].
Apache Spark is interesting as it is analytics engine built to process large scale transactions in real time. Below is a potential data flow structure showing Fraud detection implementation using Apache Spark and Event Streaming [7,8].
Figure 4: Image Depicting Two Phase Approach for Real Time Fraud Detection Application. Picture Created by Carol McDonald for Real-Time-Credit-Card-Fraud-Detection-Apache-Spark-and- Event-Streaming Article on mapr.com [7].
Credit card fraud is a growing concern in today’s world with both credit card usage and credit card fraud on the rise. As shown in the stats, the number of data breaches, identity theft cases, and credit card fraudulent transactions are rising at an alarming level.
Data science plays a significant role in improving the fraud prediction tools we are currently using, by analyzing credit card transactions and being able to predict fraud based on transactional data, cardholder data, and historical data.
Machine learning algorithms running on neural networks should replace the current static rules-based fraud detection products to improve the fraud prediction precision and to stay up to speed with market trends in detecting new strategies of fraudsters and stopping them.