Health care insurance fraud presents a significant challenge for both the medical industry and government bodies. It is a serious concern for the insurance industry because of its financial impact on policyholders and insurance companies. According to the National Health Care Anti-Fraud Association, health care insurance fraud leads to the loss of tens of billions of dollars annually []. In 2020, the US Department of Justice announced a noteworthy accomplishment in combating health care insurance fraud: the recovery of US $2.7 billion through settlements and judgments, a 50% increase compared to the previous year []. Furthermore, the global health care insurance fraud analytics market is growing substantially, rising from US $2.43 billion in 2022 to US $3.09 billion in 2023, a compound annual growth rate of 27% []. Health insurance is crucial to protecting people’s lives given the high cost of medical treatments, but its advantages are being threatened by theft and fraudulent claims. With the increasing number of patients demanding health insurance, manual auditing for validating claims and pinpointing fraud is no longer efficient. Therefore, it is essential to create an automatic and efficient system that detects fraud.
Hence, machine learning solutions that train fraud detection models on data sets have been introduced [,]. However, they raise ethical concerns as the trained models could be biased toward the majority [], and there are also privacy and security issues [] due to the potential compromise of patients’ sensitive personally identifiable information. These issues could have severe consequences, including reputational damage to insurance firms. Moreover, machine learning models rely on high-quality data [], which cannot always be guaranteed. Therefore, they are not yet trustworthy. Recently, blockchain has emerged as a decentralized technology to implement secure transactions in a peer-to-peer network. It consists of a series of interconnected blocks of transactions. Each block contains data and is secured through cryptographic measures, such as hash functions and asymmetric encryption []. Transactions occur between nodes in a peer-to-peer network without the need for a central authority. All transactions are recorded in an immutable ledger, and peers can only add to the ledger, not alter or delete any previously recorded information []. When a new node joins the network, it downloads a copy of the ledger. Before a block is added to the blockchain, a consensus is reached among peers. In addition, blockchain can execute smart contracts [].
Blockchain has demonstrated its potential in various domains, including the health care system [,]. In particular, smart contracts in a blockchain were introduced as self-executing agents based on the transactions being executed []. However, there is a proliferation of blockchain development platforms in the literature with various characteristics, imposing challenges for software developers to determine suitable platforms that include the functionalities needed to implement insurance fraud detection solutions based on smart contracts. In this paper, we propose an automated decision map recommender system specifically designed to select the most suitable blockchain platform among the proposed platforms in the literature. We exemplify the use of our proposed recommender system by implementing smart contract–based solutions for insurance fraud detection on the selected platform. The main contributions of this research are as follows:
We proposed and developed an innovative, adaptive, and automated recommender system based on our proposed decision map. The map evaluates blockchain platforms, considering selected and categorized blockchain features, to suggest the most suitable platform. The system is flexible and responsive to changes, ensuring that, if a platform becomes unavailable or gains new features, it will generate updated results accordingly.
We introduced a decision-making map recommender system that allows us to identify the best blockchain platform for implementing health care insurance fraud detection. The decision map is generic and can be applied to any other domain.
We developed a taxonomy of blockchain development platforms, used to determine the characteristics of the platforms that are available for implementing applications in the health insurance field. The platform taxonomy is based on the investigation of 102 blockchain platforms and their application domains in the literature.
We exemplified the applicability of our automated decision map recommender system by developing and implementing blockchain smart contracts for the detection of 12 fraud scenarios.
We evaluated the implementation of our recommender system by applying it to 42 blockchain platforms. Consequently, we developed and implemented the detection of fraud on the top 2 platforms recommended by our decision-making map recommender system and evaluated their performances.
We made the recommender system toolkit and source code available on GitHub for blockchain developers.
Related Works
To our knowledge, no work in the literature has automated the selection of a suitable blockchain development platform for a specific use case, such as health insurance fraud detection.
In the study by Farshidi et al [], the authors divided blockchain features into four categories: (1) must-have, which indicates that the platform needs to include the specified blockchain feature to be deemed suitable; (2) should-have, which implies that the defined blockchain features are highly recommended; (3) could-have, which represents optional blockchain features; and (4) won’t-have, which lists the features that are not required by the developer. However, this method may not be precise as blockchain platforms often possess multiple features; for instance, some platforms offer various consensus mechanisms. Thus, classifying a single consensus mechanism into the won’t-have category could unjustly disqualify a blockchain platform that might otherwise be suitable for the use case. In addition, our system is implemented as software that can be used by any clinic or hospital interested in adopting blockchain platforms. Moreover, the system is adaptive as it allows for adding both blockchain platforms and features as well as modifying existing ones.
Some works have introduced machine learning and deep learning models for identifying fraud and overcoming the constraints of manual detection methods. Learning models automate the detection process and enhance the analysis of patterns. As shown in Table 1, the study by Lu et al [] proposes a deep learning graph model, which relies on an attributed heterogeneous information network with a hierarchical attention mechanism. The study by Sowah et al [] develops a decision support system using Genetic Support Vector Machines to enhance the detection and classification of health insurance fraud in Ghana. The study by Settipalli et al [] proposes an unsupervised multivariate analysis model named Weighted MultiTree Density-Based Clustering. However, the use of artificial intelligence for detecting health care insurance fraud has raised privacy and security concerns, largely due to the sensitive client data used in training the models. In addition, these works do not consider the bias introduced by the use of machine learning or deep learning algorithms. Consequently, we focus on solutions that leverage smart contracts, which are self-executing agreements with predefined rules that activate when conditions are fulfilled. These contracts are immutable, meaning that they cannot be altered once deployed, providing a secure and privacy-preserving blockchain solution for detecting health care insurance fraud. Moreover, throughput, latency, and Central Processing Unit (CPU) and memory use have not been taken into account in the aforementioned works.
Therefore, researchers and developers are turning toward privacy-preserving and secure blockchain-based solutions that incorporate smart contracts for the detection of health insurance fraud. Once deployed on the blockchain, these contracts execute automatically under set conditions and cannot be changed, benefiting from the platform’s immutability, decentralization, and transparency. The study by Mackey et al [] focuses on determining whether a claim adheres to the applicable provisions of the health care insurance policy. The study by Saldamli et al [] proposes a solution for preventing health insurance fraud by using 2 fraud scenarios. The study by Liu et al [] uses the Ethereum blockchain to develop a framework for recording claim data and transactions, with patients acting as validators to assist in the detection of fraud. However, none of these works takes into account all possible fraud scenarios; the quality of service of fraud detection in terms of throughput and latency; or computing resource use, such as CPU and memory. In addition, the choice of the blockchain development platform is not justified in these works. Our recommender system is adaptive to the evolution of blockchain platforms, offering a comprehensive approach. Furthermore, smart contracts are portable and can operate across different platforms. In this study, we implemented smart contracts for insurance fraud detection, covering 12 fraud scenarios, on the top 2 blockchain development platforms selected by our adaptive automatic decision map recommender system. The strengths and weaknesses of recent works using blockchain development platforms for detecting health care insurance fraud are summarized in Table 2.
Table 1. Summary of related works on fraud detection machine learning (ML) and deep learning (DL) algorithms in health insurance claims.

| Algorithm under study | Number of fraud scenarios detected | Considering privacy and security | Considering bias issue | ML or DL | Throughput | Latency | CPU^a use | Memory use | Data set | Metrics |
|---|---|---|---|---|---|---|---|---|---|---|
| MHAMFD^b [] | NR^c | X^d | X | DL | X | X | X | X | Medical-1: balanced data set with a ratio of positive to negative samples of 1:2 | |

^a CPU: Central Processing Unit.
^b MHAMFD: Multilevel Hierarchical Attention Mechanism for Fraud Detection.
^c NR: not reported.
^d X: not considered.
^e GSVM: Genetic Support Vector Machine.
^f WMTDBC: Weighted MultiTree Density-Based Clustering.
^g CMS: Centers for Medicare & Medicaid Services.
Table 2. Summary of related works on blockchain-based health care insurance fraud detection.

| Study | Throughput | Latency | CPU^a use | Memory use | Fraud scenarios considered, N | Smart contract | Recommender system | Platform | Reason for choosing the platform |
|---|---|---|---|---|---|---|---|---|---|
| Mackey et al [] | X^b | X | X | X | 1 | ✓^c | X | Ethereum | NR^d |
| Saldamli et al [] | X | X | X | X | 2 | ✓ | X | BigchainDB | NR |
| Liu et al [] | X | X | X | X | 3 | ✓ | X | NR | NR |
| Our work | ✓ | ✓ | ✓ | ✓ | 12 | ✓ | ✓ | Hyperledger Fabric and Neo | On the basis of our proposed decision-making map recommender system tailored to health care insurance fraud detection |

^a CPU: Central Processing Unit.
^b X: not considered.
^c ✓: considered.
^d NR: not reported.
The taxonomy of blockchain platforms was based on reviewing published research articles and white papers that mentioned blockchain platforms. Our study revealed 102 blockchain platforms that we classified according to the application domains they were developed for, such as financial services, social media, Internet of Things, and platforms that can be used across several domains. In addition, we gathered information on various features, such as whether the platform is open source, the consensus mechanism used, the type of blockchain used, and the availability of smart contracts. For the detection of health care insurance fraud, the fraud scenarios were based on the study by Ismail and Zeadally [], which proposes a taxonomy of 12 fraud scenarios that are divided into 7 categories, as shown in . The first category is commission-based, which includes 3 fraud scenarios. The first scenario involves a health care provider directing patients to specific hospitals, clinics, pharmacies, medications, or equipment suppliers in return for a commission. The second fraud scenario involves pharmacies dispensing specific brands of medicines in exchange for commissions from pharmaceutical companies. The third fraud scenario involves pharmaceutical companies offering incentives to physicians to recommend unapproved or off-label drugs. The second category is Pinning the System, which involves health care providers guiding patients to internal entities such as laboratories or pharmacies to keep profits within the organization. The third category, Waiving Copayments, is where the physician regularly waives patients’ copayments and overcharges the health care provider. The fourth category, Managed Care, consists of organizations limiting costs by denying necessary care, providing substandard treatment, and creating administrative barriers for patients. The fifth category, Billing Manipulation, consists of 4 fraud scenarios. 
The first involves unlicensed hospitals and physicians billing patients for care. The second scenario occurs when a physician alters a diagnosis on a claim without the patient’s knowledge. The third scenario involves health care providers offering unnecessary care, inflating service hours, submitting duplicate claims, phantom billing, or substituting diagnosis codes for higher reimbursements. The final scenario involves medical equipment providers inflating prices for insured patients or claiming expensive equipment while supplying cheaper alternatives. The sixth category, Physician Shopping, involves patients consulting multiple health care providers to obtain prescriptions for nonmedical use. Finally, the seventh category, Self-referral, occurs when physicians direct patients to clinics or health care facilities in which they have a financial interest, potentially leading to conflicts of interest.
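The taxonomy described above can be captured in a small data structure, which makes the count of 7 categories and 12 scenarios easy to check. The following is a minimal sketch: the name `FRAUD_TAXONOMY`, the helper `count_scenarios`, and the short scenario labels are our own shorthand for illustration and are not identifiers from the cited taxonomy.

```python
# Illustrative encoding of the 7 fraud categories and 12 scenarios described in the text.
# Scenario labels are paraphrased shorthand (assumption), not official names.
FRAUD_TAXONOMY = {
    "Commission-based": [
        "provider directs patients to specific suppliers for a commission",
        "pharmacy dispenses specific brands for a commission",
        "physician recommends unapproved or off-label drugs for incentives",
    ],
    "Pinning the System": [
        "patients guided to internal laboratories or pharmacies",
    ],
    "Waiving Copayments": [
        "copayments regularly waived and the cost overcharged",
    ],
    "Managed Care": [
        "necessary care denied or degraded to limit costs",
    ],
    "Billing Manipulation": [
        "billing by unlicensed hospitals or physicians",
        "diagnosis altered on a claim without the patient's knowledge",
        "unnecessary care, inflated hours, duplicate or phantom claims",
        "equipment prices inflated or cheaper equipment supplied",
    ],
    "Physician Shopping": [
        "multiple providers consulted for nonmedical prescriptions",
    ],
    "Self-referral": [
        "referral to a facility in which the physician has a financial interest",
    ],
}


def count_scenarios(taxonomy):
    """Return the total number of fraud scenarios across all categories."""
    return sum(len(scenarios) for scenarios in taxonomy.values())
```

Keeping the scenarios grouped by category mirrors how the detection logic is later organized, where several scenarios in one category share a common pattern.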
The study does not involve any personal or patient-related data, focusing solely on the features of blockchain platforms. As no human subjects are included and no identifiable information is used, the research does not require ethics review board assessment, in accordance with institutional and regional guidelines for nonhuman subjects research.
The proliferation of blockchain platforms has led to a multitude of choices for developers. However, it is important to note that the various blockchain platforms available today have different features, capabilities, and use cases [,]. Therefore, developers need to evaluate the available options and select the platform that best fits their specific needs. In this section, we provide an overview of our taxonomy, which encompasses 102 blockchain platforms. Subsequently, we present our feature-based decision map recommender system to select the best platform.
Taxonomy of Blockchain Platforms
In 2008, Bitcoin [] made its debut, and the subsequent addition of smart contract technology by Ethereum [] contributed significantly to the rapid growth and development of blockchain technology. As a result, more than 100 distinct blockchain platforms were developed for various purposes. To provide a comprehensive understanding of these platforms, we present a taxonomy of 102 blockchain platforms, which we organized based on their respective application domains. Along with the application domain, our classification takes into account the open-source nature of the platform, the consensus mechanism used, the type of blockchain, and the platform’s capability to support smart contract development. The taxonomy of blockchain platforms is presented as a graph. shows an overview of the different generic blockchain platforms that can be used to build a wide range of applications.
presents the blockchain platforms that have been specifically designed for financial services, whereas presents the platforms that are tailored to meet the needs of a particular application domain. These platforms offer specialized features and functionality to cater to the specific needs of their respective industries or sectors, thus providing a more specialized and customized solution for these specific use cases.
While a blockchain platform selection method was proposed in the study by Farshidi et al [], it included unnecessary categories of features and did not specifically focus on the detection of insurance fraud. Therefore, we propose a decision-making map that is tailored specifically to health care insurance fraud detection solutions. It classifies blockchain features into 3 main categories: compulsory features, which are essential to the platform; mandatory features, which are sufficient; and possible features, which are desirable but not necessary. These categories differ in weight, which determines the value of one feature over another. As shown in , our map offers a targeted approach to selecting a blockchain platform for developing health insurance fraud detection mechanisms.
In the health care insurance domain, privacy is a crucial aspect as insurance companies deal with sensitive patient data []. Several research works in health care have implemented blockchain technology to ensure integrity, accountability, and nonrepudiation in the claim process [,]. The study by Ismail and Zeadally [] proposes a blockchain system for health care insurance antifraud that ensures trusted medical process information entry and reading as well as a data privacy protection scheme. In the study by Mackey et al [], a blockchain system is proposed and implemented to prevent counterfeiting in health care insurance, providing a secure and private system.
To implement health care insurance fraud detection using blockchain, we should select features that ensure privacy, such as on-chain transactions and permissioned platforms. This is in addition to other technical features that should be available in the platform, such as the smart contract and user interface development tool features. In summary, we determined the most suitable features in terms of both their relevance to the task of health care insurance fraud detection and the technical capabilities of the platforms. We divided these features into 3 categories, which are compulsory, mandatory, and possible features ().
On the basis of the aforementioned selected features (compulsory, mandatory, and possible), our enforced decision-making map selects 42 platforms out of 102. This extraction of 42 platforms is derived from our proposed taxonomy of blockchain platforms. This taxonomy maps the blockchain platforms into their corresponding application domains and blockchain features. To ensure the privacy and security of patient files, the decision map recommender system selects the blockchain platforms that meet these specific criteria. Therefore, the selection process excludes platforms that are based on permissionless blockchain type, which is open to the public and may compromise data confidentiality. Instead, the recommender system prioritizes platforms that are suited for generic application domains and financial services, as per our taxonomy. In addition, the recommender system focuses on platforms that support the development of smart contracts.
After identifying the relevant blockchain features for health care insurance fraud detection, the recommender system initiates a mapping process to match each feature with the platforms that support it. illustrates the outcomes of this mapping. First, after organizing the blockchain features into categories, the recommender system maps each feature to its corresponding functionality. Next, it maps the features to the blockchain platforms and, on this basis, determines the suitability of each platform. Only the platforms that have all the compulsory features are considered suitable. As shown in , platforms R3 Corda and BigchainDB were eliminated from consideration because they lack some of the compulsory features. Our mapping process revealed that Hyperledger Fabric [] was the most suitable platform, followed by Neo [], XinFin XDC [], Quorum [], and Ethereum. These results demonstrate the effectiveness of our mapping process in identifying a suitable blockchain platform for this specific use case.
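The elimination rule, under which a platform is suitable only if it has all the compulsory features, amounts to a set-inclusion test. The following is a minimal sketch, assuming each platform is represented by the set of feature names it supports; the function name `eligible_platforms` and the placeholder platform and feature names in the usage example are hypothetical.

```python
def eligible_platforms(platform_features, compulsory):
    """Return, in alphabetical order, the platforms that have every compulsory feature.

    platform_features: dict mapping a platform name to an iterable of feature names.
    compulsory: iterable of feature names that a platform must support.
    """
    required = set(compulsory)
    return sorted(
        name
        for name, features in platform_features.items()
        if required <= set(features)  # subset test: all required features present
    )


# Usage with placeholder data (not the features of any real platform):
example = {
    "Platform A": {"smart contracts", "permissioned", "on-chain"},
    "Platform B": {"smart contracts", "on-chain"},
}
suitable = eligible_platforms(example, ["smart contracts", "permissioned"])
```

A platform that lacks even one compulsory feature is dropped before any scoring takes place, which is why the mandatory and possible features only affect the ranking among the remaining platforms.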
Table 3 streamlines the mapping process for health care insurance fraud detection by listing the top 5 platforms and highlighting the selected features. The table is designed to simplify the decision-making map by providing a concise and easy-to-read format for comparing the features.
Compulsory features
Application layer: enables the creation of a user interface and the execution of smart contracts for health care insurance.
Network layer: enables the establishment of a peer-to-peer decentralized network.
Protocol layer: enables the selection of a consensus protocol. We used Byzantine-based consensus protocols because they tolerate failing or malicious nodes [].
Interoperability technologies: technologies such as oracles that facilitate the integration of data from off-chain resources into smart contracts.
On-chain transaction: the transaction is conducted on the main blockchain for increased security, decentralization, and transparency.
Permissioned blockchain: this type of blockchain limits access to the ledger to a select group of trusted nodes.
Smart contracts: enable the development of algorithms that can identify health care insurance fraud.
Mandatory features
Enterprise system integration: provides easy access to data, seamless data flow, and time and cost savings.
Private: this type of blockchain network is only accessible to authenticated users.
Turing completeness: the virtual machine of the blockchain platform can perform any computation that a Turing machine can, given sufficient resources.
JavaScript, Python, and Solidity: these languages are specifically mentioned because they are intuitive and easily learned by programmers.
Possible features
Java and Golang: these languages, similar to the 3 mentioned in the mandatory features list, are intuitive and easily learned by programmers.
Virtual machine: it is used to execute smart contracts.
Privacy technology: ensures data privacy and certifies the eligibility of peers to participate in the network, particularly when handling sensitive patient data.
Zero-knowledge proof: this cryptographic scheme allows one party (the prover) to assure another party (the verifier) that they know a certain value (X) without revealing the value itself.
Cryptographic token: these tokens have the potential to be used as a means of payment.
Cross-chain interoperability: this feature enables the connection of 2 separate blockchains to facilitate information exchange.

Table 3. Decision-making map results simplified.
Category and feature name | Hyperledger Fabric | Neo | Ethereum | Quorum | XinFin XDC
Compulsory features

illustrates the use case diagram of our recommender system. Users can perform actions such as adding, editing, and deleting blockchain platforms and features. Following that, they are required to select their desired features, categorize them, and assign weights to mandatory and possible features. Ultimately, users will receive the outcome of the most suitable blockchain platform for their specific use case based on the chosen features.
In this subsection, we present our implementation of the decision map recommender system, which is a desktop software solution that provides a streamlined and efficient method to select the most suitable blockchain platform for a specific use case. Our software uses WinForms C# technology (.NET Foundation) and SQL as the database to deliver a user-friendly experience and recommend the top blockchain platforms. To demonstrate the effectiveness of our software, we used it to identify the top 5 blockchain platforms that are most suitable for health care insurance fraud detection.
Textbox 2 defines each function of the recommender system. As previously discussed, the blockchain feature selection process involves dividing features into 3 categories: compulsory, mandatory, and possible. Compulsory features are those that must be present in the blockchain platform for it to be considered. These features are typically critical to the platform’s functionality. Mandatory features, on the other hand, are those that are essential for a specific use case or application. They are not necessarily required for the platform to function, but they are necessary for the platform to be suitable for a particular purpose. Finally, possible features are those that provide additional functionality or value to the platform. They are not necessary for the platform to function, but they can enhance its performance or provide additional benefits.
Textbox 2. Decision map recommender system functions and their definitions.
Function and definition
Create the data set: blockchain platform and blockchain feature names are initially entered. Subsequently, the platforms are associated with their corresponding features.
Select features and their categories and set weights: specify the category of the blockchain feature by selecting 1 of the 3 options, namely, compulsory, mandatory, or possible. Then, assign the selected feature to the designated category. In addition, assign weights to the mandatory and possible features.
Obtain top platforms: retrieve blockchain platforms with compulsory features and count the number of mandatory and possible features found for each platform. Calculate a score for each platform based on its features and weights and add it to an array. Sort the array based on the calculated score to display the top-performing blockchain platforms.

Creating the Data Set
This section illustrates the data set creation process, including user interactions with the recommender system. The blue annotations in the figures represent instructions. The pink annotations indicate the textboxes for input, buttons for actions, and grid controls for displaying the added platforms and features. In this step, we focused on adding the necessary platforms and blockchain features to build a comprehensive data set. shows the user’s process of entering the platform name and selecting Add Platform to populate a table showing the added platforms. The same procedure applies to adding blockchain features, resulting in a comprehensive table showcasing both platforms and features. After that, users can edit or delete platform names and blockchain feature names. As shown in , by double-clicking on a platform and single-clicking on a feature, users can select and make changes to the respective names according to their preferences.
After that, users should establish the association between each blockchain platform and its corresponding blockchain features. They can begin by selecting a platform by double-clicking on the row corresponding to the platform name and subsequently choosing the blockchain features that apply to that particular platform, which is done by double-clicking on the rows that correspond to the blockchain features that should be mapped to the selected blockchain platform (). Subsequently, a table will be populated with the IDs of the selected blockchain platform, the chosen blockchain feature, and the name of the blockchain feature.
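The data set described above (platforms, features, and the association between them) maps naturally onto three relational tables. The following is a minimal sketch using Python's built-in `sqlite3` module in place of the WinForms C# and SQL stack of the actual software; the table names, column names, and the functions `build_dataset` and `association_rows` are assumptions made for illustration, not the schema of the published toolkit.

```python
import sqlite3


def build_dataset(platforms, features, links):
    """Create an in-memory data set of platforms, features, and their associations.

    links: list of (platform_name, feature_name) pairs to associate.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE platform (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE feature (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE platform_feature (
            platform_id INTEGER NOT NULL REFERENCES platform(id),
            feature_id INTEGER NOT NULL REFERENCES feature(id),
            PRIMARY KEY (platform_id, feature_id)
        );
        """
    )
    conn.executemany("INSERT INTO platform (name) VALUES (?)", [(p,) for p in platforms])
    conn.executemany("INSERT INTO feature (name) VALUES (?)", [(f,) for f in features])
    # Resolve names to IDs while inserting each association.
    conn.executemany(
        """
        INSERT INTO platform_feature (platform_id, feature_id)
        SELECT p.id, f.id FROM platform p, feature f
        WHERE p.name = ? AND f.name = ?
        """,
        links,
    )
    conn.commit()
    return conn


def association_rows(conn):
    """Return rows of (platform ID, feature ID, feature name), as in the association table."""
    return conn.execute(
        """
        SELECT pf.platform_id, pf.feature_id, f.name
        FROM platform_feature pf
        JOIN feature f ON f.id = pf.feature_id
        ORDER BY pf.platform_id, pf.feature_id
        """
    ).fetchall()
```

Storing the association in its own table is what lets platforms and features be added, renamed, or deleted independently, which is the adaptivity the recommender system relies on.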
shows a screenshot illustrating user interaction during the process of selecting features, assigning them to their respective categories, and assigning weights to those categories. In the initial step, users will choose a category, followed by selecting the desired feature to be assigned to that category. This selection process involves double-clicking on the feature name in the table. Subsequently, a table will display the categorized features, providing a clear overview of the features that have been assigned to their respective categories. Once the categorization of features is complete, users can proceed to set the weights for the mandatory and possible feature categories. Afterward, by clicking on the Get Platforms button, users can view the resulting platforms based on the assigned weights and feature categorization.
A and B show the sequence diagram to obtain the suitability percentage of each possible blockchain platform. The initial step involves creating a list of blockchain platforms that meet the requirements of the compulsory features. Once that is done, we determine the total number of mandatory and possible features that have been chosen (A).
After that, we iterate through the list, and for each platform, we calculate the number of mandatory and possible features (B). Finally, using equation 1, we calculate the suitability percentage of each platform (ρ). The first part of the formula calculates the contribution of the mandatory features to the suitability percentage. It takes the number of mandatory features found for the platform (Mfound), multiplies it by 100 to convert it to a percentage, divides it by the number of mandatory features selected (Mtotal), and then multiplies the result by the weight assigned to mandatory features (ωM), which is 0.7.
The second part of the formula calculates the contribution of the possible features to the suitability percentage. It takes the number of possible features found for the platform (Pfound), multiplies it by 100 to convert it to a percentage, divides it by the number of possible features selected (Ptotal), and then multiplies the result by the weight assigned to possible features (ωP), which is 0.3.
By combining these 2 contributions, the suitability percentage provides an overall assessment of how well a blockchain platform meets the selected features, with a higher percentage indicating a better match.
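$$\rho = \frac{M_{found} \times 100}{M_{total}} \times \omega_M + \frac{P_{found} \times 100}{P_{total}} \times \omega_P \tag{1}$$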
Once we have calculated the suitability percentage (ρ) for each platform, we sort the list of platforms in descending order based on their scores. shows the flowchart of the recommender system’s algorithm, which consists of the different functions involved along with their corresponding input and output parameters.
The platform with the highest score will be at the top of the list, whereas the one with the lowest score will be at the bottom. Finally, we display the top 5 platforms in the list, which are the ones that have the highest scores and, therefore, are the most suitable based on the selected features.
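The scoring and ranking steps can be sketched as follows. This is an illustrative Python version and not the C# implementation of the toolkit: the function names `suitability` and `top_platforms` are hypothetical, platforms are assumed to be given as sets of feature names, and a category with no selected features is assumed to contribute 0 to the score (the text does not specify this case).

```python
def suitability(m_found, m_total, p_found, p_total, w_m=0.7, w_p=0.3):
    """Suitability percentage: weighted share of mandatory and possible features found."""
    # Assumption: an empty category contributes nothing, which avoids division by zero.
    m_part = (m_found * 100 / m_total) * w_m if m_total else 0.0
    p_part = (p_found * 100 / p_total) * w_p if p_total else 0.0
    return m_part + p_part


def top_platforms(platform_features, compulsory, mandatory, possible, k=5):
    """Rank platforms that have all compulsory features by their suitability percentage."""
    compulsory, mandatory, possible = set(compulsory), set(mandatory), set(possible)
    scored = []
    for name, features in platform_features.items():
        features = set(features)
        if not compulsory <= features:
            continue  # missing a compulsory feature: platform is not considered
        rho = suitability(
            len(mandatory & features), len(mandatory),
            len(possible & features), len(possible),
        )
        scored.append((name, rho))
    # Highest score first; keep only the top k entries.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

As a worked example, a platform that has 2 of 4 selected mandatory features and 3 of 6 selected possible features scores 50 × 0.7 + 50 × 0.3 = 50%, whereas a platform with every selected feature scores 100%.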
Ismail and Zeadally [] identified the fraud scenarios used for detecting health care insurance fraud, as shown in . The network for detecting health care insurance fraud is made up of 9 participants, as illustrated in .
For certain fraud scenarios, we need to discover a detectable pattern, whereas for others, data from off-chain sources may be required. The data required for processing claims consist of detailed records of patient visits, including the dates on which they occurred, the departments involved, the services rendered, and patients’ information. These records are stored on-chain. However, documentation of billed services, detailed service invoices, and pharmacy records are stored off-chain in a database.
3 Referral Fraud Scenarios
As shown in , we use an algorithm to recognize 3 fraud scenarios that share the same pattern: the referral. We first check for the scenario in which the fraudster refers patients within the same health care organization. If this is confirmed, the fraud type is pinning the system. If not, we investigate whether a financial relationship exists between the fraudster and the other organizations. If such a relationship is detected, it is self-referral fraud. If no financial relationship is found, we investigate whether the fraudster received a commission from the organization; if so, it is a commission-based fraud.
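The order of the checks matters, as each test is reached only when the previous one fails. The following is a minimal off-chain sketch of this decision chain in Python, not the smart contract code itself; the function name `classify_referral` and its three boolean inputs are hypothetical and stand for facts that the contract would derive from on-chain and off-chain data.

```python
def classify_referral(same_organization, financial_relationship, commission_received):
    """Classify a referral pattern, checking the three scenarios in the order described.

    Returns the fraud type as a string, or None when no referral fraud is indicated.
    """
    if same_organization:
        return "pinning the system"
    if financial_relationship:
        return "self-referral"
    if commission_received:
        return "commission-based"
    return None
```

Because the checks are ordered, a referral inside the same organization is always reported as pinning the system, even if a commission was also paid.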
In this fraud scenario (A), we obtain all the medications that the possible fraudster has prescribed. We then check whether a specific medication is prescribed more frequently than others and determine whether the fraudster is receiving a commission for it. If that is the case, it is a fraud. In B, we obtain a list from the ministry of health containing the approved and labeled drugs and then compare it to the drugs prescribed by the fraudster; if we find a drug that does not exist on the list from the ministry of health, there is a fraud.
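Both prescription checks reduce to simple operations on the list of prescribed drugs. The following Python sketch is illustrative only: the function names are hypothetical, and the rule that a drug counts as frequently prescribed when it exceeds a given share of all prescriptions (`share_threshold`, 0.5 by default) is our assumption, because the text does not state how frequency is measured.

```python
from collections import Counter


def overprescribed(prescriptions, share_threshold=0.5):
    """Return drugs whose share of all prescriptions exceeds the threshold (assumed rule)."""
    counts = Counter(prescriptions)
    total = sum(counts.values())
    return sorted(
        drug for drug, n in counts.items() if total and n / total > share_threshold
    )


def commission_prescription_fraud(prescriptions, commissioned_drugs, share_threshold=0.5):
    """Flag fraud when a frequently prescribed drug is one the prescriber earns commission on."""
    return any(
        drug in commissioned_drugs
        for drug in overprescribed(prescriptions, share_threshold)
    )


def unapproved_drugs(prescriptions, approved_list):
    """Return prescribed drugs that do not appear on the approved list."""
    return sorted(set(prescriptions) - set(approved_list))
```

In practice, the commission information and the approved drug list would come from off-chain sources, so the contract would receive them through an oracle or a similar interoperability mechanism.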
In A, we investigate whether other patients on that date received the same service that the patient requested, and if the number of patients reaches a certain threshold, we are able to demonstrate that the managed care scenario is occurring.
The code for detecting waiving copayment fraud in B involves comparing the price listed in the claim with the price mentioned in the corresponding invoice. If there is a mismatch between the 2 prices, it indicates a potential instance of waiving copayment fraud. The code performs a comparison operation to check whether the claim price and the invoice price are equal. If they are not, it raises an alert or triggers further actions to investigate the possibility of fraud.
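The comparison can be sketched as follows. This is an illustrative Python version rather than the contract code; the function name `copayment_mismatch` is hypothetical, and the use of `Decimal` is our design choice so that prices such as 120.0 and 120.00 compare as equal without binary floating-point artifacts.

```python
from decimal import Decimal


def copayment_mismatch(claim_price, invoice_price):
    """Return True when the claim price differs from the invoice price."""
    # Convert through str so that values given as floats keep their printed digits.
    return Decimal(str(claim_price)) != Decimal(str(invoice_price))
```

A True result is only an alert for further investigation, in line with the description above, and does not by itself establish fraud.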
In A, we obtain a list of licensed health care providers to determine whether the suspected fraudster is listed. If not, there is a case of fraud. In B, we compare the diagnosis code on the claim to the one on the patient files; if they do not match, we will assume fraud.
In this fraud scenario (A), we may require the opinion of another physician, so after determining whether the claim has been duplicated, we gather all the necessary data to be reviewed by another physician, and based on the physician’s response, we determine whether the claim is fraudulent. In B, we obtain a price list from other equipment suppliers and then compare it to the price paid by the patient; if it is higher than the price on the price list, the patient paid more than necessary for the equipment, and hence, it is a fraud.
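The automatable parts of these two checks are the duplicate test and the price comparison; the physician review remains a manual step. The following Python sketch makes two assumptions that the text does not specify: a claim is treated as a duplicate when its patient, provider, service, and date all match another claim, and an equipment price is flagged only when it exceeds the highest price on the reference list. All function and field names are hypothetical.

```python
from collections import Counter


def duplicate_claims(claims):
    """Return the (patient, provider, service, date) keys that occur more than once."""
    keys = Counter(
        (c["patient"], c["provider"], c["service"], c["date"]) for c in claims
    )
    return sorted(key for key, n in keys.items() if n > 1)


def equipment_overpriced(price_paid, supplier_prices):
    """Flag a purchase whose price exceeds every price on the reference list (assumed rule)."""
    return bool(supplier_prices) and price_paid > max(supplier_prices)
```

Comparing against the highest reference price is a conservative choice that reduces false alerts; a stricter rule, such as comparing against the median price, would flag more cases for review.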
illustrates physician shopping fraud, in which an addicted individual visits multiple health care providers to obtain unprescribed drugs. To detect this fraud, we must examine 5 invoices. We check whether the patient visits the provider regularly based on the dates from the invoices and whether the visits are not to the same provider. If this is confirmed, it is a case of fraud.
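The check over 5 invoices can be sketched as follows. This Python version is illustrative and rests on two assumptions that the text leaves open: visits are considered regular when consecutive invoices are at most `max_gap_days` apart, and the visits are considered not to the same provider when the 5 most recent invoices involve more than one provider. The function name and parameters are hypothetical.

```python
from datetime import date


def physician_shopping(invoices, window=5, max_gap_days=30):
    """Flag physician shopping from a patient's invoices.

    invoices: list of (visit_date, provider_id) tuples for one patient.
    """
    recent = sorted(invoices, key=lambda inv: inv[0])[-window:]
    if len(recent) < window:
        return False  # not enough invoices to examine
    dates = [visit_date for visit_date, _ in recent]
    providers = {provider for _, provider in recent}
    # Assumed regularity rule: no gap between consecutive visits exceeds max_gap_days.
    regular = all((b - a).days <= max_gap_days for a, b in zip(dates, dates[1:]))
    return regular and len(providers) > 1
```

For example, 5 invoices issued 10 days apart by 5 different providers would be flagged, whereas the same dates with a single provider would not.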