Your First Sensitive Data Scan
This guide walks you through the core workflow of the Data Sensitivity module—from creating your first scan job to understanding results. By the end, you’ll have a working scan, a custom classifier, and a clear view of how findings are produced and reviewed.
Audience note: This guide focuses on how to get value quickly. Deep dives into classifiers, billing, security, and agent infrastructure are covered in separate documentation.
Step 1: Create your first scan job
Scan jobs define what data is scanned, how it’s scanned, and when it runs.
- Navigate to Data Sensitivity → Scans.
- Click Configure scan.
- In the Scope step, select the data you want to scan at the source, schema, or table level.
- Proceed to Scan type and select how the data should be scanned (auto, full, incremental, or sampled).
- Continue through the wizard to name the scan, set a schedule, and optionally run the scan immediately.
Technical callouts
- Scan scope (the selected data) cannot be changed after the scan job is created.
- Incremental and auto scans require a valid row creation time column to detect new or updated rows.
- Full scans can be expensive on large tables—use them deliberately.
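The row creation time requirement can be pictured with a small sketch. This is not Bigeye's implementation, just an illustration of the watermark pattern incremental scans rely on: the scanner remembers the latest creation timestamp it has seen (the column name `created_at` and the helper below are hypothetical) and only selects rows created after it.

```python
from datetime import datetime, timezone

def incremental_where_clause(time_column: str, last_scanned: datetime) -> str:
    """Build a SQL predicate restricting the scan to rows created
    after the previous run's watermark (illustrative only)."""
    return f"{time_column} > '{last_scanned.isoformat()}'"

clause = incremental_where_clause(
    "created_at", datetime(2024, 1, 1, tzinfo=timezone.utc)
)
# clause == "created_at > '2024-01-01T00:00:00+00:00'"
```

Without a valid creation time column there is no watermark to compare against, which is why incremental and auto scans fall back to requiring one.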
Step 2: Create a new classifier (optional but common)
Classifiers define what kind of sensitive data the scan looks for. While Bigeye provides out-of-the-box classifiers, many teams create at least one custom classifier early on.
- Go to Data Sensitivity → Classifiers.
- Click Add classifier.
- Enter a classifier name and choose (or create) a data class.
- Configure basic detection logic (for example, a regex or ML detector).
- Save the classifier.
Technical callouts
- Each classifier produces exactly one data class.
- Detection logic can inspect both column names and column values.
- You can always refine classifiers later—changes apply to future scans, not past results.
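To make "inspect both column names and column values" concrete, here is a minimal sketch of regex-style detection logic for a hypothetical `US_SSN` data class. The patterns, threshold, and function are illustrative assumptions, not Bigeye's actual detectors:

```python
import re

# Hypothetical US_SSN classifier: match on the column name,
# or on a majority of sampled values.
NAME_PATTERN = re.compile(r"ssn|social.?security", re.IGNORECASE)
VALUE_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def matches(column_name: str, sampled_values: list[str]) -> bool:
    if NAME_PATTERN.search(column_name):
        return True
    hits = sum(1 for v in sampled_values if VALUE_PATTERN.match(v))
    # Require a majority of sampled values to match before flagging,
    # so a single stray SSN-shaped string doesn't flag the column.
    return len(sampled_values) > 0 and hits / len(sampled_values) > 0.5
```

Combining a name check with a value-sampling threshold is a common way to balance recall (catch misnamed columns) against false positives (ignore incidental matches).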
Step 3: Add the classifier to a scan job
A classifier only runs once it is explicitly attached to a scan job.
- Return to Data Sensitivity → Scans.
- Edit your scan job (or continue during initial setup).
- In the Classifier step, select one or more classifiers to include.
- Save the scan job and start the scan if it’s not already running.
Technical callouts
- A scan job can include multiple classifiers.
- Removing a classifier affects future runs only; historical results are preserved.
Step 4: Monitor the scan run
After a scan starts, it produces a scan run.
- Open your scan job and navigate to the Runs tab.
- Watch the run status as it progresses from scanning to completion.
- Review basic metadata such as duration, records scanned, and status.
Technical callouts
- While a scan is running, the scan job cannot be edited or deleted.
- If a run partially fails, retry options may be available without re-running the entire scan.
Step 5: Review snapshot results (Runs view)
Snapshot results show the findings from a single scan run.
- Click into a completed run from the Runs tab.
- Review the list of findings, where each row represents a detected sensitive column.
- Filter and search by source, schema, table, data class, or sensitivity.
- Download snapshot results as CSV if needed.
How to interpret snapshot results
- Snapshot findings are immutable—they represent what was detected at that moment in time.
- They are ideal for audits, investigations, and point-in-time reporting.
Step 6: Review aggregate results (Aggregate view)
Aggregate results consolidate findings across all runs of a scan job.
- From the scan job, open the Aggregate tab.
- Review the current sensitivity state of each column based on historical scans.
- Use filters to focus on high-risk data classes or sensitivity levels.
How to interpret aggregate results
- Aggregate views err on the side of caution: if sensitive data was ever found, it remains flagged.
- This view answers the question, “What sensitive data do we believe exists right now?”
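The "err on the side of caution" rule is effectively a logical OR over all historical runs for a column. A one-line sketch (the function name is illustrative):

```python
def aggregate_flag(run_results: list[bool]) -> bool:
    """run_results: per-run detection outcomes for one column.
    The column stays flagged if ANY run ever found sensitive data."""
    return any(run_results)

aggregate_flag([False, True, False])  # flagged: one past run found a hit
```

This is why a column that tested clean on the most recent run can still appear in the Aggregate view: only a reset scan (Step 7) clears the historical signal.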
Step 7: Use reset scans to clear outdated findings
Over time, data may be cleaned up or corrected. Reset scans let you verify that previously found sensitive data is truly gone.
- In the Aggregate view, locate a column with outdated or suspected false findings.
- Mark the column for a reset scan.
- On the next scan run, the column will be fully re-scanned.
- If no sensitive data is found, the aggregate finding is cleared.
Technical callouts
- Reset scans do not trigger an immediate run; they apply to the next scheduled or manual run.
- Resetting treats the column as “new,” ensuring a complete re-evaluation.
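In terms of the sticky-OR behavior of the Aggregate view, a reset effectively discards runs before the reset point, so only results from the next full re-scan onward count. A hypothetical sketch (function and parameter names are assumptions for illustration):

```python
def aggregate_after_reset(run_results: list[bool], reset_index: int) -> bool:
    """Only detection outcomes from the reset point forward
    contribute to the column's aggregate flag."""
    return any(run_results[reset_index:])

aggregate_after_reset([True, True, False], reset_index=2)  # False: cleared
```

If the post-reset run still finds sensitive data, the flag is simply re-established by that run.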
Step 8: Understand what happens next
At this point, you have:
- A configured scan job
- One or more classifiers
- Historical scan runs
- Aggregate visibility into sensitive data
From here, teams typically:
- Refine classifiers to reduce false positives or negatives
- Expand scan coverage to additional data sources
- Export findings for audits or compliance workflows
- Integrate findings into broader data governance processes
Where to go next
- Classifier deep dive: advanced detection logic and tuning
- Understanding billing and usage: how scans impact consumption
- Permissions and access control: securing sensitive results
- Operational best practices: scaling scans across large environments
This guide covered the essential path from setup to insight—everything else builds on this foundation.