Classifiers

How Bigeye Detects Sensitive Data

Bigeye Sensitive Data Scanning uses a combination of pattern matching, checksum validation, and machine learning to detect sensitive data across your structured data sources. These techniques work together to build confidence before a finding is produced, balancing precision (avoiding false positives) against recall (avoiding missed sensitive data).
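To make "building confidence before a finding is produced" concrete, here is a minimal sketch of one way pattern matching and a validation step can be layered: a column is flagged only when a large enough fraction of sampled values both match a pattern and pass an optional validator. The function names `column_confidence` and `is_finding`, the email pattern, and the 0.8 threshold are hypothetical illustrations, not Bigeye's actual scoring logic.

```python
import re
from typing import Callable, Iterable, Optional


def column_confidence(
    values: Iterable[str],
    pattern: "re.Pattern[str]",
    validator: Optional[Callable[[str], bool]] = None,
) -> float:
    """Fraction of non-empty values that match `pattern` and pass `validator`."""
    checked = hits = 0
    for raw in values:
        value = raw.strip()
        if not value:
            continue  # empty cells are not counted as evidence either way
        checked += 1
        if pattern.fullmatch(value) and (validator is None or validator(value)):
            hits += 1
    return hits / checked if checked else 0.0


def is_finding(values, pattern, validator=None, threshold: float = 0.8) -> bool:
    """Raise a finding only when confidence reaches the (hypothetical) threshold."""
    # Raising the threshold favors precision; lowering it favors recall.
    return column_confidence(values, pattern, validator) >= threshold


# Illustrative email pattern; real classifiers are typically stricter.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
sample = ["a@example.com", "b@example.org", "not an email", "c@example.net", ""]
print(column_confidence(sample, EMAIL))  # 0.75: 3 of 4 non-empty values match
print(is_finding(sample, EMAIL))         # False at the default 0.8 threshold
```

The threshold is the knob that trades precision against recall: a column that is mostly, but not entirely, email addresses is only reported once enough evidence accumulates.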

All detection runs entirely within your environment. Bigeye does not train models on your data — classification is performed using inference only.

Key Concepts

Classifier: A classifier contains the detection logic used to identify a specific type of sensitive data. Each classifier produces a single data class as its output when a match is found (e.g., "SSN", "Email Address", "Credit Card Number"). When configuring a classifier, you will select the data class to be applied when the classifier's detection patterns are satisfied.

Data class: A label that describes the type of sensitive data discovered. Each data class can be mapped to a sensitivity level — Public, Internal Only, Confidential, or Restricted — to indicate the level of risk associated with that data.

Detector: The atomic unit of detection logic within a classifier. Detectors can use regex patterns or machine learning models. When a classifier contains multiple detectors, they can be combined using AND or OR logic.
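The relationship between detectors, classifiers, data classes, and sensitivity levels can be sketched as a small data model. This is a hypothetical illustration, not Bigeye's API: `Classifier`, `regex_detector`, the `SENSITIVITY` mapping, and the two SSN patterns are assumptions made for the example. Only regex detectors are built here; an ML-model detector could be wrapped behind the same callable signature.

```python
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Optional, Sequence


class Sensitivity(Enum):
    PUBLIC = "Public"
    INTERNAL_ONLY = "Internal Only"
    CONFIDENTIAL = "Confidential"
    RESTRICTED = "Restricted"


# A detector is modeled as any predicate over a single value (assumption).
Detector = Callable[[str], bool]


def regex_detector(pattern: str) -> Detector:
    compiled = re.compile(pattern)
    return lambda value: compiled.fullmatch(value) is not None


@dataclass
class Classifier:
    name: str
    data_class: str                 # the single data class this classifier emits
    detectors: Sequence[Detector]
    logic: str = "OR"               # how detectors combine: "AND" or "OR"

    def classify(self, value: str) -> Optional[str]:
        """Return the data class if the detector logic is satisfied, else None."""
        if not self.detectors:
            return None
        results = [detector(value) for detector in self.detectors]
        satisfied = all(results) if self.logic == "AND" else any(results)
        return self.data_class if satisfied else None


# Hypothetical, organization-specific mapping from data class to sensitivity.
SENSITIVITY: Dict[str, Sensitivity] = {
    "SSN": Sensitivity.RESTRICTED,
    "Email Address": Sensitivity.CONFIDENTIAL,
}

# Two illustrative SSN formats combined with OR: dashed, or nine bare digits.
ssn = Classifier(
    name="ssn-example",
    data_class="SSN",
    detectors=[regex_detector(r"\d{3}-\d{2}-\d{4}"), regex_detector(r"\d{9}")],
    logic="OR",
)

label = ssn.classify("123-45-6789")
print(label, SENSITIVITY[label].value)  # prints: SSN Restricted
```

OR widens a classifier to accept alternative formats; AND narrows it so that every detector must agree before the data class is applied.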

Out-of-the-Box Classifiers

Bigeye ships with a curated set of out-of-the-box classifiers ready to use in your scan jobs. These cover common sensitive data types across PII, PHI, PCI, and financial data.

Personally Identifiable Information (PII)

| Data Class | Description |
|---|---|
| Name | Full or partial person names |
| Email Address | Email addresses |
| Phone Number | Phone numbers (USA only) |
| Home or Mailing Address | Physical street and mailing addresses |
| Date of Birth | Dates of birth |
| Place of Birth | Location of birth |
| Age | Age values |
| Mother's Maiden Name | Maiden name identifiers, commonly used in security verification |
| Nationality, Religion, or Political Group | Demographic and group affiliation identifiers |
| Password | Password fields identified by column name and value patterns |
| Personal Account Usernames | User account identifiers |
| IP Address | IPv4 and IPv6 addresses |
| Location | General geographic or location data |
| Date or Datetime | Date and timestamp values |
| Web URL | Web addresses that may contain identifying information |
| Device Identifier or Serial Number | Hardware or device serial numbers |
| Crypto Wallet ID | Cryptocurrency wallet addresses |

Government & Travel Identifiers

| Data Class | Description |
|---|---|
| US SSN/TIN | US Social Security Numbers and Tax Identification Numbers |
| US ITIN | US Individual Taxpayer Identification Numbers |
| US Driver's License Number / State ID | US driver's license and state ID numbers across all US states |
| US Passport Number | US passport numbers |
| Vehicle Identifiers (VIN, Plate #, Registration #) | Vehicle identification numbers, license plates, and registration numbers |

Protected Health Information (PHI)

| Data Class | Description |
|---|---|
| Patient Name | Patient name fields in healthcare contexts |
| MRN (Medical Record Number) | Medical record number identifiers |
| Diagnosis | Diagnostic codes and descriptions (e.g., ICD codes) |
| Treatment Codes | Medical treatment and procedure codes |
| Medical Information | General medical and clinical information |
| Medical License | Medical practitioner license numbers |
| Provider NPI (National Provider Identifier) | National Provider Identifier numbers |
| Health Plan / Insurance Number | Health insurance and plan identifiers |
| Healthcare Admission Date | Patient admission dates |
| Healthcare Discharge Date | Patient discharge dates |

Payment Card Industry (PCI)

| Data Class | Description |
|---|---|
| Credit Card Number | Credit and debit card numbers, validated with checksum |
| CC Expiration Date | Card expiration dates |
| CVV | Card verification values |
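The checksum conventionally used for payment card numbers is the Luhn (mod 10) algorithm. The page does not say which checksum Bigeye applies, so the sketch below assumes Luhn. The function name `luhn_valid`, the accepted separators, and the 13 to 19 digit length bounds are illustrative assumptions, not product behavior.

```python
def luhn_valid(number: str) -> bool:
    """Return True if `number` passes the Luhn (mod 10) checksum."""
    # Allow the common space and dash separators, then require ASCII digits only.
    digits = number.replace(" ", "").replace("-", "")
    if not (digits.isascii() and digits.isdigit()):
        return False
    # Hypothetical length bounds for payment card numbers.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit; subtract 9 when a doubled
    # digit exceeds 9 (equivalent to summing its two digits).
    for position, char in enumerate(reversed(digits)):
        d = int(char)
        if position % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True: a widely published test number
print(luhn_valid("4111 1111 1111 1112"))  # False: last digit altered
```

A checksum like this is what lets a detector reject 16-digit values that merely look like card numbers (order IDs, tracking numbers), which improves precision without a separate model.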

Personal Financial Information (PFI)

| Data Class | Description |
|---|---|
| US Bank Account Number | US bank account numbers |
| IBAN Code | International Bank Account Numbers |
| Account PIN | Account PIN codes |
| Credit Score | Credit score values |

If you need a classifier that is not in the lists above, contact your Bigeye representative.

Customizing Classifiers

Clone and Adjust

You can clone any out-of-the-box classifier to create your own version. Cloned classifiers can be adjusted — for example, modifying or adding regex patterns to better match data formats specific to your organization.
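Conceptually, cloning copies a classifier, keeps its data class, and lets you change its patterns without touching the original. The sketch below models that with an immutable dataclass; `RegexClassifier`, `clone_with_patterns`, the built-in phone patterns, and the dotted "internal" phone format are all hypothetical, chosen only to illustrate the idea.

```python
import re
from dataclasses import dataclass, replace
from typing import Iterable, Tuple


@dataclass(frozen=True)
class RegexClassifier:
    name: str
    data_class: str
    patterns: Tuple[str, ...]

    def matches(self, value: str) -> bool:
        """True if the value fully matches any of the classifier's patterns."""
        return any(re.fullmatch(p, value) for p in self.patterns)


def clone_with_patterns(
    base: RegexClassifier, new_name: str, extra_patterns: Iterable[str]
) -> RegexClassifier:
    """Copy a classifier, keep its data class, and append extra patterns."""
    return replace(base, name=new_name, patterns=base.patterns + tuple(extra_patterns))


# Hypothetical stand-in for an out-of-the-box US phone classifier.
builtin_phone = RegexClassifier(
    "US Phone (built-in)",
    "Phone Number",
    (r"\(\d{3}\) \d{3}-\d{4}", r"\d{3}-\d{3}-\d{4}"),
)
# Clone it to also accept a dotted format assumed to be used internally.
custom_phone = clone_with_patterns(builtin_phone, "US Phone (custom)", [r"\d{3}\.\d{3}\.\d{4}"])

print(builtin_phone.matches("555.123.4567"))  # False
print(custom_phone.matches("555.123.4567"))   # True
```

Because the original is never mutated, the out-of-the-box version stays available as a baseline for comparing results.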

Build from Scratch

You can also create classifiers entirely from scratch. When building a custom classifier, you can add one or more detectors using regex patterns, machine learning models, or a combination of both. Multiple detectors within a single classifier can be linked with AND or OR logic to fine-tune detection accuracy.

When creating or editing any classifier, you will assign a data class that will be applied as the finding output whenever the classifier's detection logic is satisfied.
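As an illustration of building a classifier from scratch, the sketch below combines a column-name detector and a value detector with AND, similar in spirit to the Password data class, which is identified by column name and value patterns. Everything here is hypothetical: the detector signature (column name plus sampled values), `CustomClassifier`, the "Employee ID" data class, its patterns, and the 0.9 match fraction.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

# Hypothetical shape: each detector sees the column name and a sample of values.
Detector = Callable[[str, Sequence[str]], bool]


def column_name_detector(pattern: str) -> Detector:
    rx = re.compile(pattern, re.IGNORECASE)
    return lambda column, values: rx.search(column) is not None


def value_detector(pattern: str, min_fraction: float = 0.9) -> Detector:
    rx = re.compile(pattern)

    def detect(column: str, values: Sequence[str]) -> bool:
        non_empty = [v for v in values if v]
        if not non_empty:
            return False
        hits = sum(1 for v in non_empty if rx.fullmatch(v))
        return hits / len(non_empty) >= min_fraction

    return detect


@dataclass
class CustomClassifier:
    data_class: str             # the finding output when the logic is satisfied
    detectors: List[Detector]
    logic: str = "AND"          # "AND" or "OR"

    def classify(self, column: str, values: Sequence[str]) -> Optional[str]:
        """Return the data class if the combined detectors fire, else None."""
        results = [d(column, values) for d in self.detectors]
        fired = all(results) if self.logic == "AND" else any(results)
        return self.data_class if results and fired else None


employee_id = CustomClassifier(
    "Employee ID",
    [column_name_detector(r"emp(loyee)?_?id"), value_detector(r"E\d{6}")],
)
print(employee_id.classify("employee_id", ["E123456", "E654321"]))  # Employee ID
print(employee_id.classify("order_id", ["E123456"]))                # None
```

Requiring both the column name and the values to agree (AND) keeps a generic pattern like `E\d{6}` from firing on unrelated columns; switching `logic` to "OR" would trade some of that precision for recall.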