Healthcare Claims Fraud Detection

Use the ScamVerify™ API to detect fraudulent healthcare claims through document analysis, provider phone verification, and multi-channel cross-referencing.

Healthcare fraud costs the U.S. healthcare system an estimated $100 billion annually. In fiscal year 2025, the Department of Justice recovered a record $5.7 billion through False Claims Act cases, with 83% originating from healthcare. A single genetic testing fraud scheme used fake enrollment documents to bill $2.7 million. AI-generated invoices and provider letters are becoming increasingly difficult for human reviewers to catch. The ScamVerify™ API provides multi-channel verification that detects fraudulent claims at intake, before payment is issued.

Fraud Patterns in Healthcare Claims

Document-Based Fraud

Fraudulent claims often include fabricated supporting documents:

Fake provider invoices with inflated charges, non-existent procedure codes, or AI-generated letterheads
Forged enrollment forms with fabricated patient signatures and provider credentials
Counterfeit provider letters claiming medical necessity for expensive procedures
Fictitious facility documents listing addresses that are commercial mailboxes or vacant lots

Provider Infrastructure Fraud

Fraudulent providers set up disposable infrastructure that shares common characteristics:

VoIP phone numbers on high-risk carriers, making it cheap to create and abandon provider phone lines
Newly registered domains for provider websites, often created days before claims are submitted
CMRA addresses (commercial mail receiving agencies like UPS Stores) listed as the provider facility address
Mismatched provider information where the phone carrier, domain registrar, and address do not align with a legitimate medical practice

The Multi-Channel Signal

No single signal is conclusive on its own. A VoIP phone number could belong to a legitimate telehealth provider. A new domain could be a recently opened practice. But when a claim includes a provider with a VoIP number on a high-risk carrier, a website registered 2 days ago, and a facility address at a commercial mailbox, the combined signal is strong.

Claims Intake Processor

The highest-value integration point is at claims intake, before an adjuster spends time reviewing the claim and before any payment is authorized.

import requests
import os
from dataclasses import dataclass, field

SCAMVERIFY_API_KEY = os.environ["SCAMVERIFY_API_KEY"]
BASE_URL = "https://scamverify.ai/api/v1"


@dataclass
class ClaimVerification:
    claim_id: str
    risk_level: str = "low"
    risk_score: int = 0
    flags: list = field(default_factory=list)
    document_result: dict = field(default_factory=dict)
    phone_result: dict = field(default_factory=dict)
    url_result: dict = field(default_factory=dict)
    recommended_action: str = "approve"


def verify_claim(claim_id: str, document_path: str = None,
                 provider_phone: str = None, provider_url: str = None) -> ClaimVerification:
    """
    Run multi-channel verification on a healthcare claim.
    Checks submitted documents, provider phone numbers, and provider websites.
    """
    verification = ClaimVerification(claim_id=claim_id)
    scores = []

    # Channel 1: Document analysis (invoices, enrollment forms, provider letters)
    if document_path:
        try:
            with open(document_path, "rb") as f:
                doc_response = requests.post(
                    f"{BASE_URL}/document/analyze",
                    headers={"Authorization": f"Bearer {SCAMVERIFY_API_KEY}"},
                    files={"file": (os.path.basename(document_path), f)},
                    timeout=30,  # seconds; illustrative value so a stalled upload cannot hang intake
                )
            doc_response.raise_for_status()
            doc_result = doc_response.json()

            verification.document_result = {
                "document_type": doc_result["document_type"],
                "risk_score": doc_result["risk_score"],
                "verdict": doc_result["verdict"],
                "red_flags": doc_result["red_flags"],
                "entity_verifications": doc_result["entity_verifications"],
            }
            scores.append(doc_result["risk_score"])

            # Check for CMRA (commercial mailbox) address
            for addr in doc_result.get("entity_verifications", {}).get("addresses", []):
                if addr.get("is_cmra"):
                    verification.flags.append("CMRA_PROVIDER_ADDRESS")
                # Flag only an explicit mismatch; a missing value is treated as unknown
                if addr.get("institution_matches") is False:
                    verification.flags.append("INSTITUTION_MISMATCH")

            # Check for unverified officials
            for official in doc_result.get("entity_verifications", {}).get("officials", []):
                if official.get("match_confidence") == "none":
                    verification.flags.append(f"UNVERIFIED_OFFICIAL:{official['name']}")

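        except requests.RequestException as e:
            # Also covers timeouts and connection errors, which carry no response
            status = e.response.status_code if e.response is not None else "NETWORK_ERROR"
            verification.flags.append(f"DOCUMENT_CHECK_FAILED:{status}")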

    # Channel 2: Provider phone verification
    if provider_phone:
        try:
            phone_response = requests.post(
                f"{BASE_URL}/phone/lookup",
                headers={
                    "Authorization": f"Bearer {SCAMVERIFY_API_KEY}",
                    "Content-Type": "application/json",
                },
                json={"phone_number": provider_phone},
                timeout=30,  # seconds; illustrative value
            )
            phone_response.raise_for_status()
            phone_result = phone_response.json()

            verification.phone_result = {
                "risk_score": phone_result["risk_score"],
                "verdict": phone_result["verdict"],
                "line_type": phone_result["signals"]["line_type"],
                "carrier": phone_result["signals"]["carrier"],
                "high_risk_carrier": phone_result["signals"]["high_risk_carrier"],
                "ftc_complaints": phone_result["signals"]["ftc_complaints"],
            }
            scores.append(phone_result["risk_score"])

            if phone_result["signals"]["high_risk_carrier"]:
                verification.flags.append("HIGH_RISK_VOIP_CARRIER")
            if phone_result["signals"]["ftc_complaints"] > 0:
                verification.flags.append("FTC_COMPLAINTS_ON_PROVIDER_PHONE")
            if phone_result["signals"]["robocall_flagged"]:
                verification.flags.append("ROBOCALL_ASSOCIATION")

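        except requests.RequestException as e:
            # Also covers timeouts and connection errors, which carry no response
            status = e.response.status_code if e.response is not None else "NETWORK_ERROR"
            verification.flags.append(f"PHONE_CHECK_FAILED:{status}")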

    # Channel 3: Provider website verification
    if provider_url:
        try:
            url_response = requests.post(
                f"{BASE_URL}/url/lookup",
                headers={
                    "Authorization": f"Bearer {SCAMVERIFY_API_KEY}",
                    "Content-Type": "application/json",
                },
                json={"url": provider_url},
                timeout=30,  # seconds; illustrative value
            )
            url_response.raise_for_status()
            url_result = url_response.json()

            verification.url_result = {
                "risk_score": url_result["risk_score"],
                "verdict": url_result["verdict"],
                "domain_age_days": url_result["signals"]["domain_age_days"],
                "brand_impersonation": url_result["signals"].get("brand_impersonation"),
            }
            scores.append(url_result["risk_score"])

            domain_age = url_result["signals"]["domain_age_days"]
            if domain_age is not None and domain_age < 90:
                verification.flags.append(f"NEW_PROVIDER_DOMAIN:{domain_age}_DAYS")
            # brand_impersonation may be absent or null, so fall back to an empty dict
            if (url_result["signals"].get("brand_impersonation") or {}).get("detected"):
                verification.flags.append("BRAND_IMPERSONATION_ON_PROVIDER_SITE")

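        except requests.RequestException as e:
            # Also covers timeouts and connection errors, which carry no response
            status = e.response.status_code if e.response is not None else "NETWORK_ERROR"
            verification.flags.append(f"URL_CHECK_FAILED:{status}")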

    # Calculate combined risk
    if scores:
        verification.risk_score = max(scores)

    if verification.risk_score >= 70 or len(verification.flags) >= 3:
        verification.risk_level = "high"
        verification.recommended_action = "deny_pending_investigation"
    elif verification.risk_score >= 40 or len(verification.flags) >= 1:
        verification.risk_level = "medium"
        verification.recommended_action = "escalate_to_siu"
    else:
        verification.risk_level = "low"
        verification.recommended_action = "approve"

    return verification


# Usage: process a claim with all available data
result = verify_claim(
    claim_id="CLM-2026-00847",
    document_path="/claims/intake/invoice-847.pdf",
    provider_phone="+12125559876",
    provider_url="https://advancedgenomics-lab.com",
)

print(f"Claim {result.claim_id}: {result.risk_level.upper()}")
print(f"Risk score: {result.risk_score}")
print(f"Action: {result.recommended_action}")
for flag in result.flags:
    print(f"  Flag: {flag}")
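
One consequence of the processor above is that a failed API call appends a *_CHECK_FAILED flag, and a single flag of any kind is enough to move a claim to medium risk. The sketch below shows one possible way to keep operational failures apart from fraud signals, so that an outage leads to a retry and not to an SIU referral. The helper name split_flags is hypothetical and not part of the ScamVerify™ API; it relies only on the flag naming used in the example above.

```python
def split_flags(flags):
    """Separate fraud signals from operational check failures.

    Assumes the flag naming from the intake processor example, where a
    failed API call is recorded as "<CHANNEL>_CHECK_FAILED:<status>".
    """
    marker = "_CHECK_FAILED:"
    failures = [f for f in flags if marker in f]
    signals = [f for f in flags if marker not in f]
    return signals, failures


flags = ["CMRA_PROVIDER_ADDRESS", "PHONE_CHECK_FAILED:503"]
signals, failures = split_flags(flags)
print("Fraud signals:", signals)
print("Checks to retry:", failures)
```

Whether a failed check should hold a claim or simply be retried is a policy decision; the split only makes that choice explicit.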

Multi-Channel Cross-Referencing

The strongest fraud detection comes from following the chain of entities across channels. A claim document contains a provider phone number and website. Each of those can be independently verified, and the results reinforce each other.

Claim document uploaded
         |
         v
  Document analysis extracts:
  - Provider: "Advanced Genomics Laboratory"
  - Address: "8400 Medical Pkwy, Suite 2200, Dallas, TX"
  - Phone: "(214) 555-0341"
  - Website: "advancedgenomics-lab.com"
  - NPI: 1234567890
         |
         v
  Cross-channel verification:
  ├── Document: Address is CMRA (mailbox center) ← RED FLAG
  ├── Document: No lab found at this address ← RED FLAG
  ├── Phone: VoIP on Bandwidth.com (high-risk carrier) ← RED FLAG
  ├── Phone: 0 FTC complaints (clean)
  ├── URL: Domain registered 8 days ago ← RED FLAG
  ├── URL: No brand impersonation (clean)
  └── Combined: 4 flags across 3 channels
         |
         v
  Result: risk_level "high"
  Action: "deny_pending_investigation"
  Rationale: Provider facility is a commercial mailbox, phone
  is a disposable VoIP line, and website was created last week.
  Pattern is consistent with a shell provider operation.
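
The "flags across channels" summary in the diagram can be derived from the flag list alone. The following is a minimal sketch that assumes the flag names emitted by the intake processor example; the grouping table and the function name channels_flagged are illustrative, and mapping "No lab found at this address" to INSTITUTION_MISMATCH is an assumption.

```python
# Flag prefixes from the intake processor example, grouped by the channel
# that emits them. The grouping is an illustrative assumption.
FLAG_CHANNELS = {
    "CMRA_PROVIDER_ADDRESS": "document",
    "INSTITUTION_MISMATCH": "document",
    "UNVERIFIED_OFFICIAL": "document",
    "HIGH_RISK_VOIP_CARRIER": "phone",
    "FTC_COMPLAINTS_ON_PROVIDER_PHONE": "phone",
    "ROBOCALL_ASSOCIATION": "phone",
    "NEW_PROVIDER_DOMAIN": "url",
    "BRAND_IMPERSONATION_ON_PROVIDER_SITE": "url",
}


def channels_flagged(flags):
    """Return the set of channels that produced at least one known flag."""
    channels = set()
    for flag in flags:
        prefix = flag.split(":", 1)[0]  # drop suffixes such as ":8_DAYS"
        channel = FLAG_CHANNELS.get(prefix)
        if channel:
            channels.add(channel)
    return channels


# The four red flags from the diagram above
example_flags = [
    "CMRA_PROVIDER_ADDRESS",
    "INSTITUTION_MISMATCH",
    "HIGH_RISK_VOIP_CARRIER",
    "NEW_PROVIDER_DOMAIN:8_DAYS",
]
flagged = channels_flagged(example_flags)
print(f"{len(example_flags)} flags across {len(flagged)} channels")
```

Counting channels as well as flags distinguishes a claim with several findings on one document from a claim whose document, phone, and website all look suspicious.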

Risk Scoring for Healthcare Claims

Signal	Weight	Description
CMRA provider address	High	Provider's facility address is a commercial mailbox (UPS Store, Mailboxes Etc.). Legitimate medical practices do not operate from mailbox stores.
High-risk VoIP carrier	High	Provider phone number is on a carrier disproportionately used in fraud operations
Provider domain under 90 days old	High	Medical practices typically have established web presences. A domain created in the last 3 months is unusual.
FTC complaints on provider phone	Critical	The provider's listed phone number has been reported to the FTC for scam activity
Unverified official on documents	Medium	Named physician, administrator, or director not found in available databases
Mismatched institution	High	Claimed facility does not exist at the stated address per Google Places verification
Spelling errors in official documents	Medium	Misspelled medical terms, procedure names, or facility names on invoices and letters
Brand impersonation on provider site	Critical	Provider website mimics a known healthcare brand or insurance portal
Multiple channels flagged	Critical	When all three independent channels (document, phone, URL) produce flags, the probability of fraud increases significantly
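
The weight tiers in the table can be attached to the flags the intake processor emits. The sketch below is one way to do that, under stated assumptions: the flag-to-tier mapping follows the table rows, the numeric ranks exist only to order the tiers, and highest_tier is a hypothetical helper. The "Spelling errors" and "Multiple channels flagged" rows have no corresponding flag in the example code and are left out.

```python
# Numeric ranks are an assumption used only to order the table's tiers.
TIER_RANK = {"Medium": 1, "High": 2, "Critical": 3}

# Tiers copied from the risk scoring table, keyed by flag prefix.
FLAG_TIERS = {
    "CMRA_PROVIDER_ADDRESS": "High",
    "HIGH_RISK_VOIP_CARRIER": "High",
    "NEW_PROVIDER_DOMAIN": "High",
    "FTC_COMPLAINTS_ON_PROVIDER_PHONE": "Critical",
    "UNVERIFIED_OFFICIAL": "Medium",
    "INSTITUTION_MISMATCH": "High",
    "BRAND_IMPERSONATION_ON_PROVIDER_SITE": "Critical",
}


def highest_tier(flags):
    """Return the most severe weight tier among the flags, or None."""
    prefixes = (flag.split(":", 1)[0] for flag in flags)
    tiers = [FLAG_TIERS[p] for p in prefixes if p in FLAG_TIERS]
    return max(tiers, key=TIER_RANK.get, default=None)


print(highest_tier(["UNVERIFIED_OFFICIAL:Jane Doe", "CMRA_PROVIDER_ADDRESS"]))
```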

Healthcare claims data is protected under HIPAA. Document images submitted for analysis may contain Protected Health Information (PHI) including patient names, dates of birth, diagnosis codes, and insurance identifiers. Organizations using the ScamVerify™ API for claims verification must ensure their implementation complies with HIPAA requirements for data transmission, storage, and access controls. ScamVerify™ does not store original document images after analysis. A Business Associate Agreement (BAA) is available for Enterprise plan customers.

Best Practices for Healthcare Claims Fraud Detection

Integrate at claims intake, not post-payment. Every dollar paid on a fraudulent claim is a dollar that must be recovered through litigation. Catching fraud before payment is orders of magnitude more cost-effective than pay-and-chase recovery.
Automate triage with risk thresholds. Set automated routing rules: claims with a combined risk score below 30 proceed normally, claims scoring 30 to 70 are flagged for enhanced review, and claims above 70 are held for Special Investigations Unit (SIU) review.
Verify the full provider chain. Do not check the phone number in isolation. When a claim includes a provider phone, website, and facility address, verify all three. The correlation across channels is where the strongest fraud signals emerge.
Maintain an audit trail. Store every ScamVerify™ API response alongside the claim record. The structured verification results (risk scores, verdicts, entity verifications, flags) provide documented evidence for claim denial decisions and regulatory compliance.
Track patterns across claims. When multiple claims share the same provider phone number, website domain, or facility address, and any of those entities are flagged, investigate the entire cluster. Fraud rings often submit claims through multiple patient identities but reuse provider infrastructure.
Re-verify on appeal. When a denied claim is appealed, run the verification again. Fraudsters sometimes update their infrastructure (register a new domain, switch phone numbers) between the initial submission and the appeal.
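
Two of the practices above, threshold-based triage and pattern tracking across claims, lend themselves to a short sketch. Everything named here is hypothetical: the route labels, the claim dictionary keys, and the choice to place scores of exactly 30 and 70 in the enhanced-review band, which the text leaves open. Note that the intake processor example uses a different medium cutoff (40), so the thresholds should be aligned to whichever policy an organization adopts.

```python
from collections import defaultdict


def route_claim(risk_score):
    """Route a claim using the 30 / 70 thresholds from the triage practice.

    Boundary handling is an assumption: 30 and 70 both map to enhanced review.
    """
    if risk_score < 30:
        return "proceed"
    if risk_score <= 70:
        return "enhanced_review"
    return "siu_hold"


def shared_infrastructure(claims):
    """Map each reused provider entity to the claim IDs that reference it.

    `claims` is a list of dicts with hypothetical keys: claim_id,
    provider_phone, provider_domain, provider_address.
    Only entities that appear on more than one claim are returned.
    """
    by_entity = defaultdict(set)
    for claim in claims:
        for key in ("provider_phone", "provider_domain", "provider_address"):
            value = claim.get(key)
            if value:
                by_entity[(key, value)].add(claim["claim_id"])
    return {entity: ids for entity, ids in by_entity.items() if len(ids) > 1}


# Illustrative data only
claims = [
    {"claim_id": "A", "provider_phone": "+12145550341", "provider_domain": "example-lab.test"},
    {"claim_id": "B", "provider_phone": "+12145550341", "provider_domain": "other-lab.test"},
    {"claim_id": "C", "provider_phone": "+12145550100", "provider_domain": "example-lab.test"},
]
for (kind, value), ids in shared_infrastructure(claims).items():
    print(kind, value, sorted(ids))
```

When any entity in a returned cluster is flagged, every claim ID in that cluster is a candidate for review, which matches the advice to investigate the whole cluster and not the single claim.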

Document Analysis API Reference for document scanning request and response schemas
Phone Lookup API Reference for provider phone verification
URL Verification API Reference for provider website analysis
Insurance Claims Fraud Detection for general insurance fraud patterns including email and batch analysis
