Your First Sensitive Data Scan
This guide walks you through the core workflow of the Data Sensitivity module—from creating your first scan job to understanding results. By the end, you’ll have a working scan, a custom classifier, and a clear view of how findings are produced and reviewed.
Audience note: This guide focuses on how to get value quickly. Deep dives into classifiers, billing, security, and agent infrastructure are covered in separate documentation.
Step 1: Create your first scan job
Scan jobs define what data is scanned, how it’s scanned, and when it runs.
- Navigate to Data Sensitivity → Scans.
- Click Configure scan.
- In the Scope step, select the data you want to scan at the source, schema, or table level.
- Proceed to Scan type and select how the data should be scanned (auto, full, incremental, or sampled).
- Continue through the wizard to name the scan, set a schedule, and optionally run the scan immediately.
Technical callouts
- Scan scope (the selected data) cannot be changed after the scan job is created.
- Incremental and auto scans require a valid row creation time column to detect new or updated rows.
- Full scans can be expensive on large tables—use them deliberately.
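The row creation time requirement can be pictured with a small sketch. This is not Bigeye's implementation, just an illustration of the watermark pattern incremental scans rely on: the scanner remembers the latest creation timestamp it has seen (the column name `created_at` and the helper below are hypothetical) and only selects rows created after it.

```python
from datetime import datetime, timezone

def incremental_where_clause(time_column: str, last_scanned: datetime) -> str:
    """Build a SQL predicate restricting the scan to rows created
    after the previous run's watermark (illustrative only)."""
    return f"{time_column} > '{last_scanned.isoformat()}'"

clause = incremental_where_clause(
    "created_at", datetime(2024, 1, 1, tzinfo=timezone.utc)
)
# clause == "created_at > '2024-01-01T00:00:00+00:00'"
```

Without a valid creation time column there is no watermark to compare against, which is why incremental and auto scans fall back to requiring one.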
Step 2: Create a new classifier (optional but common)
Classifiers define what kind of sensitive data the scan looks for. While Bigeye provides out-of-the-box classifiers, many teams create at least one custom classifier early on.
- Go to Data Sensitivity → Classifiers.
- Click Add classifier.
- Enter a classifier name and choose (or create) a data class.
- Configure basic detection logic (for example, a regex or ML detector).
- Save the classifier.
Technical callouts
- Each classifier produces exactly one data class.
- Detection logic can inspect both column names and column values.
- You can always refine classifiers later—changes apply to future scans, not past results.
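To make "inspect both column names and column values" concrete, here is a minimal sketch of regex-style detection logic for a hypothetical `US_SSN` data class. The patterns, threshold, and function are illustrative assumptions, not Bigeye's actual detectors:

```python
import re

# Hypothetical US_SSN classifier: match on the column name,
# or on a majority of sampled values.
NAME_PATTERN = re.compile(r"ssn|social.?security", re.IGNORECASE)
VALUE_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

def matches(column_name: str, sampled_values: list[str]) -> bool:
    if NAME_PATTERN.search(column_name):
        return True
    hits = sum(1 for v in sampled_values if VALUE_PATTERN.match(v))
    # Require a majority of sampled values to match before flagging,
    # so a single stray SSN-shaped string doesn't flag the column.
    return len(sampled_values) > 0 and hits / len(sampled_values) > 0.5
```

Combining a name check with a value-sampling threshold is a common way to balance recall (catch misnamed columns) against false positives (ignore incidental matches).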
Step 3: Add the classifier to a scan job
A classifier only runs once it is explicitly attached to a scan job.
- Return to Data Sensitivity → Scans.
- Edit your scan job (or continue during initial setup).
- In the Classifier step, select one or more classifiers to include.
- Save the scan job and start the scan if it’s not already running.
Technical callouts
- A scan job can include multiple classifiers.
- Removing a classifier affects future runs only; historical results are preserved.
Step 4: Monitor the scan run
After a scan starts, it produces a scan run.
- Open your scan job and navigate to the Runs tab.
- Watch the run status as it progresses from scanning to completion.
- Review basic metadata such as duration, records scanned, and status.
Technical callouts
- While a scan is running, the scan job cannot be edited or deleted.
- If a run partially fails, retry options may be available without re-running the entire scan.
Step 5: Review snapshot results (Runs view)
Snapshot results show the findings from a single scan run.
- Click into a completed run from the Runs tab.
- Review the list of findings, where each row represents a detected sensitive column.
- Filter and search by source, schema, table, data class, or sensitivity.
- Download snapshot results as CSV if needed.
How to interpret snapshot results
- Snapshot findings are immutable—they represent what was detected at that moment in time.
- They are ideal for audits, investigations, and point-in-time reporting.
Step 6: Review aggregate results (Aggregate view)
Aggregate results consolidate findings across all runs of a scan job.
- From the scan job, open the Aggregate tab.
- Review the current sensitivity state of each column based on historical scans.
- Use filters to focus on high-risk data classes or sensitivity levels.
How to interpret aggregate results
- Aggregate views err on the side of caution: if sensitive data was ever found, it remains flagged.
- This view answers the question, “What sensitive data do we believe exists right now?”
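The "err on the side of caution" rule is effectively a logical OR over all historical runs for a column. A one-line sketch (the function name is illustrative):

```python
def aggregate_flag(run_results: list[bool]) -> bool:
    """run_results: per-run detection outcomes for one column.
    The column stays flagged if ANY run ever found sensitive data."""
    return any(run_results)

aggregate_flag([False, True, False])  # flagged: one past run found a hit
```

This is why a column that tested clean on the most recent run can still appear in the Aggregate view: only a reset scan (Step 7) clears the historical signal.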
Step 7: Use reset scans to clear outdated findings
Over time, data may be cleaned up or corrected. Reset scans let you verify that previously found sensitive data is truly gone.
- In the Aggregate view, locate a column with outdated or suspected false findings.
- Mark the column for a reset scan.
- On the next scan run, the column will be fully re-scanned.
- If no sensitive data is found, the aggregate finding is cleared.
Technical callouts
- Reset scans do not trigger an immediate run; they apply to the next scheduled or manual run.
- Resetting treats the column as “new,” ensuring a complete re-evaluation.
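In terms of the sticky-OR behavior of the Aggregate view, a reset effectively discards runs before the reset point, so only results from the next full re-scan onward count. A hypothetical sketch (function and parameter names are assumptions for illustration):

```python
def aggregate_after_reset(run_results: list[bool], reset_index: int) -> bool:
    """Only detection outcomes from the reset point forward
    contribute to the column's aggregate flag."""
    return any(run_results[reset_index:])

aggregate_after_reset([True, True, False], reset_index=2)  # False: cleared
```

If the post-reset run still finds sensitive data, the flag is simply re-established by that run.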
Step 8: Understand what happens next
At this point, you have:
- A configured scan job
- One or more classifiers
- Historical scan runs
- Aggregate visibility into sensitive data
From here, teams typically:
- Refine classifiers to reduce false positives or negatives
- Expand scan coverage to additional data sources
- Export findings for audits or compliance workflows
- Integrate findings into broader data governance processes
Where to go next
- Classifier deep dive: advanced detection logic and tuning
- Understanding billing and usage: how scans impact consumption
- Permissions and access control: securing sensitive results
- Operational best practices: scaling scans across large environments
This guide covered the essential path from setup to insight—everything else builds on this foundation.