Scan Outputs
Sensitive Data Scanning produces structured results every time a scan runs. This page explains how those results are organized, where to find them, how to export them, and — critically — what each output type actually tells you (and what it doesn't).
Before reading this: If you haven't set up your first scan yet, start with Build your first scan. This page assumes you have at least one scan job configured and have seen scan results.
Key concepts
Before diving in, it helps to understand some key scan terms:
| Concept | Definition |
|---|---|
| Scan job | A saved configuration: which tables to scan, which classifiers to use, and how often to run. A scan job produces many scan runs over time. |
| Scan run | One execution of a scan job. Each run is a snapshot — a record of what was found at a specific point in time. |
| Finding | The result of a classifier checking a column. A finding can be positive (data class detected) or negative (nothing found). Findings are recorded at the column level, not the row level. |
| Data class | The type of sensitive data identified, e.g. SSN, Email Address, Credit Card Number. |
| Classifier | A detection rule that evaluates column data and determines whether it matches a specific data class. A scan job can include multiple classifiers. |
| Sensitivity level | The risk tier assigned to a data class: Public (L0), Internal Only (L1), Confidential (L2), or Restricted (L3). Unassigned data classes default to Restricted. |
Two views of your results
Every scan job exposes two distinct views of its findings. Understanding the difference between them is the most important thing on this page.
Snapshot view (per scan run)
The runs view shows the output of a single scan run. It captures what the classifiers found during that specific execution and is frozen in time — it will never change, regardless of what happens to your data, data class sensitivity levels, or your classifier configuration afterward.
Navigate to a snapshot from: Data sensitivity → [Scan job name] → Runs tab → click a run
The snapshot view shows:
| Column | Description |
|---|---|
| Source / Schema / Table / Column | The full path to the scanned column |
| Data class | The sensitive data type detected |
| Rows scanned | Number of rows the classifier evaluated |
| Rows matched | Number of rows that triggered the classifier |
| Sensitivity | The assigned sensitivity level |
Tip: The default filter on the runs view shows only positive findings (where rows matched). To see columns that were scanned but produced no match, use the filter to include negative findings as well.
Aggregate view (across all runs)
The aggregate view consolidates findings across every run a scan job has ever completed. Rather than showing a single moment in time, it answers the question: "Has this column ever been found to contain this type of sensitive data?"
Note that aggregate behavior varies by scan type. For auto, incremental, and sampled scans, aggregate findings are additive — once a finding is recorded, it persists until a reset scan clears it. For full scans, findings are replaced on each run, so the aggregate reflects the most recent execution.
Navigate to the aggregate view from: Data sensitivity → [Scan job name] → Aggregate
The aggregate view surfaces a Reset Scan action per finding row for auto, incremental, and sampled scan types. Reset scans are not available for full scan jobs because full scans re-evaluate all data on every run. See Clearing stale aggregate findings below.
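The difference between the two aggregation behaviors can be sketched in a few lines of Python. This is purely illustrative: `Finding`, `merge_additive`, and `merge_full` are hypothetical names, not part of any Bigeye API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    column: str       # e.g. "users.email"
    data_class: str   # e.g. "Email Address"

def merge_additive(aggregate, run, reset_columns=frozenset()):
    """Auto/incremental/sampled scans: findings accumulate.

    A finding only leaves the aggregate when its column was marked
    for a reset scan AND the new run no longer reports it.
    """
    kept = {f for f in aggregate if f.column not in reset_columns}
    return kept | run

def merge_full(aggregate, run):
    """Full scans: the aggregate is simply replaced by the latest run."""
    return run
```

In this sketch, a stale SSN finding survives a clean run under additive merging, but is cleared once the column is reset (or, for full scans, as soon as the next run completes without it).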
Catalog sensitivity tab
In addition to the scan-specific views above, aggregated findings from all scan jobs are surfaced in the data catalog. Navigate to any source, schema, or table and open the Sensitivity tab to see all findings associated with that asset, with attribution showing which scan job and classifier produced each result.
Note: If a column has findings from two different scan jobs, both appear in the catalog — nothing is suppressed or merged. The catalog always shows the union of all scan job results.
The findings table in detail
Whether you're looking at a snapshot or aggregate view, findings are displayed as one row per column + data class pairing. A single column can appear multiple times in the table if multiple classifiers each found a different data class.
Example:
| Column | Data class | Sensitivity | Rows scanned | Rows matched |
|---|---|---|---|---|
| users.email | Email Address | Confidential | 50,000 | 48,210 |
| users.email | Username | Internal Only | 50,000 | 12 |
| orders.card_num | Credit Card Number | Restricted | 50,000 | 50,000 |
In this example, the users.email column has two separate findings because two classifiers flagged it for different data classes. This is expected behavior — a column can legitimately contain data that matches multiple classes.
Important: Findings are never deduplicated or merged across classifiers. Each classifier produces an independent result. When reviewing output for compliance purposes, account for this when counting "sensitive columns."
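When counting "sensitive columns" from a findings export, deduplicate by column path rather than counting rows. A minimal illustration (the tuples mirror the example table above and are not a real export format):

```python
# Illustrative findings rows: (column, data_class, sensitivity).
findings = [
    ("users.email", "Email Address", "Confidential"),
    ("users.email", "Username", "Internal Only"),
    ("orders.card_num", "Credit Card Number", "Restricted"),
]

# A naive row count overstates "sensitive columns": 3 rows, but only
# 2 distinct columns.
distinct_columns = {col for col, _, _ in findings}

# Counting only columns with at least one Confidential or Restricted
# finding is often what a compliance report actually needs:
high_risk = {col for col, _, sens in findings
             if sens in ("Confidential", "Restricted")}
```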
Runs vs. aggregate: which should you use?
Use the right view for the right job:
| Use case | Recommended view |
|---|---|
| Point-in-time audit or compliance report | Runs |
| Understanding current data risk posture | Aggregate (with a recent re-scan) |
| Verifying a remediation (e.g., "did we remove SSNs?") | Runs from a run after remediation |
| Reviewing all findings across scan jobs for an asset | Catalog sensitivity tab |
| Debugging a specific scan run | Runs |
Risks of misinterpreting scan outputs
This section covers the most common mistakes when reading scan results. Read it carefully before sharing reports with stakeholders.
Tables added after scan creation are not included
Tables added to a source or schema after a scan job is created will not be automatically included in the scan. If new tables appear that need scanning, create a new scan job. See the scope warning displayed during the scan job creation wizard.
Aggregate findings can be stale
For auto, incremental, and sampled scans, aggregate findings are additive and persistent. Once a finding is recorded, it remains in the aggregate view until you explicitly trigger a reset scan for that column — even if the sensitive data has since been deleted from your database. (Full scans replace findings on each run, so this concern does not apply to full scan jobs.)
Scenario: Your team manually redacts SSNs from a column that previously contained SSNs. You rerun the scan. The snapshot from the new run shows no SSN findings for that column. However, the aggregate view still shows the SSN finding from the prior run.
Why this matters: If you share an aggregate report to demonstrate compliance after a remediation, it will appear as though the sensitive data is still present.
What to do: Use the Reset scan action on the aggregate view to flag affected columns for a full rescan, then re-run the scan job. Once the rescan completes, the aggregate will reflect current reality.
Sampled scans may miss sensitive data
When a scan job uses sampled scanning, the classifier only evaluates a subset of rows — not the full column. If sensitive data exists only in rows that weren't included in the sample, the scan will produce no finding.
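To build intuition for how easily a sample can miss rare sensitive data, here is a rough back-of-the-envelope calculation. It assumes uniform random sampling without replacement, which may not match the product's actual sampling strategy.

```python
def p_miss(total_rows: int, sensitive_rows: int, sample_size: int) -> float:
    """Probability a uniform random sample (without replacement)
    contains none of the sensitive rows (hypergeometric)."""
    p = 1.0
    for i in range(sensitive_rows):
        p *= (total_rows - sample_size - i) / (total_rows - i)
    return max(p, 0.0)

# 5 sensitive rows hidden among 1,000,000; a 10,000-row sample
# misses all of them roughly 95% of the time.
print(p_miss(1_000_000, 5, 10_000))
```

The takeaway: for rare sensitive values, even a large sample leaves a substantial chance of a false "no finding."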
Auto/incremental scans only evaluate new rows
Auto/incremental scans evaluate only rows added or updated since the last run, using a row creation time column. They do not re-evaluate rows that were present in previous runs.
Scenario: Your scan job is configured as incremental. An SSN was loaded into a column two years ago, before incremental scanning was enabled. The SSN will never appear in an incremental scan finding — it's in rows that are never evaluated.
What to do: Use an auto scan, which performs a full-table scan on its first run to establish a complete baseline, then switches to incremental scanning for subsequent runs. Alternatively, run a full scan at least once. Use auto/incremental scans to catch new sensitive data efficiently, but don't rely on them to surface sensitive data that predates the scan job.
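The blind spot can be illustrated with a toy filter. This assumes the incremental run simply skips rows whose row creation time predates the last run's high-water mark (an assumption about the mechanism, not Bigeye's exact implementation):

```python
from datetime import datetime

# Illustrative rows: (value, row_creation_time). Values are made up.
rows = [
    ("123-45-6789", datetime(2023, 1, 1)),   # old SSN, loaded years ago
    ("hello",       datetime(2025, 6, 1)),   # recently added row
]

last_scan_high_water_mark = datetime(2025, 1, 1)

# An incremental run only evaluates rows past the high-water mark,
# so the classifier never sees the old SSN row at all.
evaluated = [value for value, rct in rows if rct > last_scan_high_water_mark]
```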
Data can change during a scan
Bigeye splits tables into chunks and processes each chunk independently over a period of time. If your table's data changes while a scan is in progress, some chunks may reflect pre-change data while others reflect post-change data. The window between a scan's "Started At" and "Completed At" timestamps is therefore the period during which the underlying data may have shifted. Keep this in mind when interpreting findings from long-running scans on actively written tables.
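A toy sketch of the chunking idea (the chunking key and chunk size are assumptions; Bigeye's actual chunking strategy may differ):

```python
def chunk_ranges(min_id: int, max_id: int, chunk_size: int):
    """Split a table's key range into independently scanned chunks.
    Because each chunk is read at a different moment, rows modified
    mid-scan can appear in their pre-change state in one chunk and
    their post-change state in another."""
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        yield (lo, hi)
        lo = hi + 1
```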
Snapshot staleness
Snapshots reflect the state of your data at the time of the scan run. They are labeled with their completion timestamp. A snapshot from three months ago does not tell you what your data looks like today.
Always check the "last scan completed" timestamp before treating a snapshot as current. The UI displays this prominently and will indicate if a snapshot is stale relative to your scan schedule.
Clearing stale aggregate findings
If a column's aggregate findings no longer reflect the current state of your data, kick off a Reset Scan. This option is available for auto, incremental, and sampled scan types only — full scans re-evaluate all data on every run and do not need reset scans.
- Navigate to Data sensitivity → [Scan job name] → Aggregate
- Find the column-data class row you want to refresh
- Click the Reset scan icon (↻) on that row
- The icon changes to an × — click it to undo if needed
- On the next run, the column will be fully re-evaluated from scratch, and stale findings will be cleared if the data class is no longer present
Note: Clicking the reset icon does not trigger an immediate scan. It marks the column for a full rescan on the next scheduled (or manual) run. The column will be treated as if it were newly added — all chunks will be rescanned against all classifiers.
Kicking off a reset scan requires Write permission on the scan job.
Exporting results
Both the runs and aggregate views support CSV and PDF export. Every export explicitly identifies whether it represents a run or an aggregate result.
To create an export, select the Export button in the top-right corner of the Runs or Aggregate tab.
Export format and compliance use
Each export file identifies in its filename whether it is a run (snapshot) or aggregate report. Before using an export for a compliance audit or data inventory:
- Check the report type. Aggregate reports are not point-in-time certifications. Snapshot reports are.
- Check the scan type. A report from a sampled scan carries higher uncertainty than one from a full scan. CSV exports include a column indicating the scan type; PDF exports currently reflect the scan type in the header only.
- Check the timestamp. Verify the scan completed recently enough to be meaningful for your use case.
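If you script these checks against a CSV export, it might look like the sketch below. The column names (`report_type`, `scan_type`, `completed_at`) are hypothetical; inspect your actual export header before relying on them.

```python
import csv
from datetime import datetime, timedelta

def check_export(path: str, max_age_days: int = 30) -> list:
    """Illustrative pre-flight checks before using an export for
    compliance. Column names are assumptions, not a documented schema."""
    warnings = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("report_type") == "aggregate":
                warnings.append("aggregate report: not a point-in-time certification")
            if row.get("scan_type") == "sampled":
                warnings.append("sampled scan: findings carry sampling uncertainty")
            completed = datetime.fromisoformat(row["completed_at"])
            if datetime.now() - completed > timedelta(days=max_age_days):
                warnings.append(f"scan completed {completed:%Y-%m-%d}: may be stale")
            break  # report-level metadata assumed identical on every row
    return warnings
```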
What is preserved when things are deleted
Understanding deletion behavior matters for audit trails:
| Deleted object | Effect on snapshot findings | Effect on aggregate findings |
|---|---|---|
| Scan job (deleted) | Soft-deleted; preserved indefinitely for audit | Soft-deleted; preserved indefinitely for audit |
| Classifier (deleted) | Snapshot findings persist forever | Aggregate findings persist until next re-scan |
| Scan job edited to remove a classifier | Snapshot findings persist forever | Behaves the same as "classifier deleted" — findings persist until next re-scan |
| Column (deleted from source) | Historical run outputs are preserved and marked as deleted | Findings immediately disappear from the aggregate view |
| Table (deleted from source) | Historical run outputs are preserved and marked as deleted | Findings immediately disappear from the aggregate view |
| Data class (deleted) | Snapshot findings persist forever | Aggregate findings persist until next re-scan |
Frequently asked questions
Can I drill down to see which specific rows triggered a finding? No. For security reasons, Bigeye does not display the raw data values that triggered a classifier. The findings table shows the column path, data class, and row counts only.
A column appears twice in my findings table. Is that a bug? No — it means two different classifiers found data in that column for different data classes. This can also happen if the same column is scanned by two different scan jobs using the same classifier. Review each finding independently.
My reset scan completed, but the aggregate still shows the old finding. The reset scan marks the column for re-scan, but the aggregate won't update until the scan job actually runs. Check whether a run has completed since you marked the column for re-scan. If the finding persists after a completed run, it means the data class still exists in the column.
I deleted sensitive data from my database. How do I get it out of the aggregate view? For auto, incremental, or sampled scan jobs: trigger a reset scan on the affected column(s), then run the scan job. Once the new run completes with no finding, the aggregate will no longer show that data class for that column. For full scan jobs: the aggregate is replaced on each run, so simply re-running the scan will clear findings for data that is no longer present.
Why does my snapshot show fewer rows scanned than exist in the table? You're likely using a sampled scan or an auto/incremental scan. Check the scan type on the run detail page. For full coverage, configure the scan job as a full table scan.
Why does my aggregate show fewer rows scanned than exist in the table? The aggregate reflects cumulative scan activity. If you're using an incremental or sampled scan, not all rows may have been evaluated across all runs. For a complete picture, consider running a full scan or an auto scan (which starts with a full scan on its first run).
Why do auto/incremental scans keep scanning rows even though there's no new data? Bigeye re-scans "boundary rows" — rows near the maximum row creation time (RCT) value observed in the previous run — to account for late-arriving data. It also re-scans rows with null RCT values, because it cannot determine whether they were previously scanned. See the Row Creation Time section in the scan configuration docs for more details.
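A sketch of what such a re-scan predicate might look like (the column name, boundary window width, and SQL shape are all assumptions for illustration, not Bigeye's actual query):

```python
from datetime import datetime, timedelta

def rescan_predicate(rct_col: str, prev_max_rct: datetime,
                     boundary: timedelta = timedelta(hours=1)) -> str:
    """Illustrative WHERE clause for an auto/incremental run. Rows
    inside the boundary window are re-scanned to catch late arrivals;
    NULL-RCT rows are always re-scanned because their prior scan
    status is unknowable."""
    cutoff = (prev_max_rct - boundary).isoformat(sep=" ")
    return f"{rct_col} > '{cutoff}' OR {rct_col} IS NULL"
```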
