Available Metrics

This is an overview of all metrics available in Bigeye. Bigeye categorizes metrics into data quality dimensions to better help you summarize and track progress over time.

Pipeline Reliability

Pipeline reliability metrics detect whether tables are updating on time and with the expected volume of data.

Metric Name	API Name	Description
Freshness	FRESHNESS	For a given table update, the number of hours since the previous table update (INSERT, COPY, MERGE, CREATE, etc). Available on Snowflake, BigQuery, and Redshift sources, excluding database views.
Volume	VOLUME	For a given table update, the number of rows inserted or upserted to the table (INSERT, COPY, MERGE, CREATE TABLE AS SELECT, etc). Available on Snowflake, BigQuery, and Redshift sources, excluding database views.
Freshness (Data)	FRESHNESS_DATA	This is similar to Freshness but based on data, not metadata. It is available on source types other than Snowflake, BigQuery, and Redshift, and on all views. It mimics the behavior of Freshness by recording the time of the actual loads, not by measuring the time since a load.
Volume (Data)	VOLUME_DATA	This is similar to Volume but based on data, not metadata. It is available on source types other than Snowflake, BigQuery, and Redshift, and on all views. It mimics the behavior of Volume.
Hours since latest value	HOURS_SINCE_MAX_TIMESTAMP or HOURS_SINCE_MAX_DATE	Applicable to DATE_LIKE or TIMESTAMP_LIKE columns. The difference between the metric run time and the maximum value of the timestamp column, in hours. Hours since latest value is suggested as a basic autometric on all date and timestamp columns.
Row Count (#)	COUNT_ROWS	The total number of rows in a table. It is suggested as a basic autometric once per table.
Read queries	COUNT_READ_QUERIES	The number of SELECT queries issued on a table in the past 24 hours. It is suggested as a basic autometric once per table.
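
As an illustration of the Hours since latest value definition above, the following minimal sketch computes the difference between a metric run time and the maximum timestamp in a column. The function name and the sample data are hypothetical; Bigeye computes this metric in the source database, not in Python.

```python
from datetime import datetime, timezone


def hours_since_latest_value(timestamps, run_time):
    """Hours between the metric run time and the newest timestamp (illustrative only)."""
    latest = max(timestamps)
    return (run_time - latest).total_seconds() / 3600.0


# Hypothetical run time and column values.
run_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
loaded_at = [
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 9, 30, tzinfo=timezone.utc),
]

print(hours_since_latest_value(loaded_at, run_time))  # 2.5
```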

Uniqueness

Uniqueness metrics detect when schema and data constraints are breached.

Metric Name	API Name	Column Type	Description
Distinct (#)	COUNT_DISTINCT	ANY	The count of distinct elements in the column. This metric should be used when you expect a fixed number of value options. It is suggested as an autometric if Bigeye detects 50 or fewer values during profiling.
Duplicates (#)	COUNT_DUPLICATES	ANY	The count of rows with the same value for a particular column. It is suggested as an autometric if Bigeye detects 10 or fewer duplicates during profiling.
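
The sketch below shows one plausible reading of the two uniqueness metrics. It assumes that nulls are ignored and that Duplicates (#) counts the rows beyond the first occurrence of each value; neither assumption is confirmed by the descriptions above, and the function names are hypothetical.

```python
def count_distinct(values):
    """Number of distinct non-null values in the column."""
    return len({v for v in values if v is not None})


def count_duplicates(values):
    """Rows beyond the first occurrence of each non-null value (assumed definition)."""
    non_null = [v for v in values if v is not None]
    return len(non_null) - len(set(non_null))


# Hypothetical column with a small, fixed set of expected values.
status = ["open", "closed", "open", "pending", "open", None]

print(count_distinct(status))    # 3
print(count_duplicates(status))  # 2
```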

Completeness

Completeness metrics detect when there are missing values in datasets.

Metric Name	API Name	Column Type	Description
Null (#)	COUNT_NULL	ANY	The count of rows with a null value in the column.
Not Null (#)	COUNT_NOT_NULL	ANY	The count of rows with a non-null value in the column.
Null (%)	PERCENT_NULL	ANY	The percentage of rows with a null value in the column. This metric is suggested as a basic autometric on all column types.
Not Null (%)	PERCENT_NOT_NULL	ANY	The percentage of rows with a non-null value in the column.
Empty string (#)	COUNT_EMPTY_STRING	STRING	The count of rows with a 0-length string (i.e. `""`) as the value for the column.
Empty string (%)	PERCENT_EMPTY_STRING	STRING	The percentage of rows with a 0-length string. It is suggested as an autometric if Bigeye detects >= 50% of values that match during profiling or if the column is an ID column.
NaN (#)	COUNT_NAN	NUMERIC	The count of rows where the column value is NaN. This metric will only be available for source types where NaN is a valid value for a column.
NaN (%)	PERCENT_NAN	NUMERIC	The percentage of rows where the column value is NaN. This metric will only be available for source types where NaN is a valid value for a column. It is suggested as an autometric if Bigeye detects that >= 50% of values are NaN during profiling.
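
To make the completeness definitions concrete, here is a small sketch of the percentage variants. It assumes that every percentage uses the total row count as its denominator, which the table does not state explicitly; all function names are hypothetical.

```python
import math


def percent(matches, total):
    """Share of matching rows as a percentage; 0.0 for an empty column."""
    return 100.0 * matches / total if total else 0.0


def percent_null(values):
    return percent(sum(v is None for v in values), len(values))


def percent_empty_string(values):
    return percent(sum(v == "" for v in values), len(values))


def percent_nan(values):
    is_nan = lambda v: isinstance(v, float) and math.isnan(v)
    return percent(sum(is_nan(v) for v in values), len(values))


names = ["ada", "", None, "grace"]
scores = [1.0, float("nan"), 3.0, float("nan")]

print(percent_null(names))          # 25.0
print(percent_empty_string(names))  # 25.0
print(percent_nan(scores))          # 50.0
```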

Distributions

Distribution metrics detect changes in the numeric distribution of values, including outliers, variance, skew and more.

Metric Name	API Name	Column Type	Description
Min	MIN	NUMERIC	The minimum value of the column. It is suggested as a basic autometric for all numeric columns.
Max	MAX	NUMERIC	The maximum value of the column. It is suggested as a basic autometric for all numeric columns.
Average	AVERAGE	NUMERIC	The mean value of the column. It is always suggested as a basic autometric for numeric columns, except for ID columns.
Variance	VARIANCE	NUMERIC	The statistical variance of the column. The variance is used to track the spread of numbers beyond the average. It is always suggested as a basic autometric for numeric columns, except for ID columns.
Skew	SKEW	NUMERIC	The statistical skew of the column. The skew is used to determine how evenly the values are distributed about the mean. A negative skew means that there is a larger tail below the mean, while a positive skew indicates a larger tail above the mean.
Kurtosis	KURTOSIS	NUMERIC	The statistical kurtosis of the column. The kurtosis describes how heavy the tails of the distribution are. The value displayed is the excess kurtosis, where 3 is subtracted from the kurtosis value, so a normal distribution has a metric value of 0.
Geometric mean	GEOMETRIC_MEAN	NUMERIC	The geometric mean of the column.
Harmonic mean	HARMONIC_MEAN	NUMERIC	The harmonic mean of the column.
Median	MEDIAN	NUMERIC	The median of the column. The median is computed as the 50th percentile, and will only return a value that is in the dataset. It is not valid for the MySQL source type. It is always suggested as a basic autometric for numeric columns, except for ID columns.
Percentile	PERCENTILE	NUMERIC	The statistical percentile of the column. This metric takes a parameter that determines which percentile is used. The parameter can be given either as a fraction less than 1 or as a number less than 100, so the 90th percentile can be expressed as either 0.9 or 90. Bigeye computes the discrete percentile, where only existing values are returned, except for the Presto and AWS Athena source types. It is not valid for the MySQL source type.
Sum	SUM	NUMERIC	The sum of all values in the column. It is always suggested as a basic autometric for numeric columns, except for ID columns.
False (#)	COUNT_FALSE	BOOLEAN	The count of rows where the column contains the boolean value of false.
False (%)	PERCENT_FALSE	BOOLEAN	The percentage of rows where the column contains the boolean value of false. It is suggested as a basic autometric on all boolean columns.
True (#)	COUNT_TRUE	BOOLEAN	The count of rows where the column contains the boolean value of true.
True (%)	PERCENT_TRUE	BOOLEAN	The percentage of rows where the column contains the boolean value of true. It is suggested as a basic autometric on all boolean columns.
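
The Median and Percentile rows describe a discrete percentile that only returns values present in the data. The sketch below illustrates that idea using the common definition of a discrete percentile (the smallest value whose cumulative share of rows reaches the requested fraction). The parameter handling, including the treatment of values below 1 as fractions and the rejection of 0, is an assumption based on the description above, and the function names are hypothetical.

```python
import math


def normalize_percentile(p):
    """Map 0.9 or 90 to the fraction 0.9 (values below 1 are read as fractions)."""
    if not 0 < p < 100:
        raise ValueError("percentile must be greater than 0 and less than 100")
    return p if p < 1 else p / 100


def discrete_percentile(values, p):
    """Smallest existing value whose cumulative share of rows is >= the fraction."""
    ordered = sorted(values)
    fraction = normalize_percentile(p)
    index = max(math.ceil(fraction * len(ordered)) - 1, 0)
    return ordered[index]


data = list(range(1, 11))  # 1, 2, ..., 10

print(discrete_percentile(data, 90))           # 9
print(discrete_percentile(data, 0.9))          # 9
print(discrete_percentile([1, 2, 3, 4], 50))   # 2
```

Note that the last call returns 2, a value that exists in the data, rather than the interpolated 2.5.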

Validity

Validity metrics detect whether data is formatted correctly and represents a valid value. Bigeye offers validity metrics across a number of categories shown below.

String formats

Metric Name	API Name	Column Type	Description
String Length Max	STRING_LENGTH_MAX	STRING	The maximum value of the column's length. Not valid for Oracle source type. It is suggested as a basic autometric for all string columns.
String Length Min	STRING_LENGTH_MIN	STRING	The minimum value of the column's length. Not valid for Oracle source type. It is suggested as a basic autometric for all string columns.
String Length Average	STRING_LENGTH_AVERAGE	STRING	The average value of the column's length. Not valid for Oracle source type. It is suggested as a basic autometric for all string columns.

Identification formats

You can run a debug query on alerting metrics in the Identification formats group.

Metric Name	API Name	Column Type	Description
UUID (#)	COUNT_UUID	STRING	The number of rows where the column matches the UUID format with hyphens (e.g. `123e4567-e89b-12d3-a456-426614174000`). The comparison is case insensitive.
UUID (%)	PERCENT_UUID	STRING	The percentage of rows where the column matches the UUID format with hyphens (e.g. `123e4567-e89b-12d3-a456-426614174000`). The comparison is case insensitive. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
Perm ID (#)	COUNT_PERM_ID	STRING	The number of rows where the column matches an approximation of a valid Perm ID format (currently `1-<1-15 digits>`)
Perm ID (%)	PERCENT_PERM_ID	STRING	The percentage of rows where the column matches an approximation of a valid Perm ID format (currently `1-<1-15 digits>`). It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
SSN (#)	COUNT_SSN	STRING	The number of rows where the column matches an approximation of a valid Social Security number, with or without hyphens.
SSN (%)	PERCENT_SSN	STRING	The percentage of rows where the column matches an approximation of a valid Social Security number, with or without hyphens. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
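
The UUID and Perm ID formats described above can be approximated with regular expressions. The patterns below are a sketch written from the table descriptions, not the expressions Bigeye actually runs; the names `UUID_RE`, `PERM_ID_RE`, and `count_matching` are hypothetical.

```python
import re

# Hyphenated UUID, matched case-insensitively as described in the table.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
# Perm ID approximation: "1-" followed by 1 to 15 digits.
PERM_ID_RE = re.compile(r"1-[0-9]{1,15}")


def count_matching(values, pattern):
    """Count non-null values that match the whole pattern."""
    return sum(1 for v in values if v is not None and pattern.fullmatch(v))


uuids = [
    "123e4567-e89b-12d3-a456-426614174000",
    "123E4567-E89B-12D3-A456-426614174000",  # upper case still matches
    "123e4567e89b12d3a456426614174000",      # no hyphens: not counted
    "not-a-uuid",
]
perm_ids = ["1-4295905573", "2-123", "1-"]

print(count_matching(uuids, UUID_RE))        # 2
print(count_matching(perm_ids, PERM_ID_RE))  # 1
```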

Contact Information

You can run a debug query on alerting metrics in the Contact Information group.

Metric Name	API Name	Column Type	Description
USA Phone number (#)	COUNT_USA_PHONE	STRING	The number of rows where the column matches USA phone number format, with or without country code, parentheses, or hyphens. Examples: `1 (401) 555 6789`, `405-555-6789`, `5556789`
USA Phone number (%)	PERCENT_USA_PHONE	STRING	The percentage of rows where the column matches USA phone number format, with or without country code, parentheses, or hyphens. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
USA State Code (#)	COUNT_USA_STATE_CODE	STRING	The number of rows where the column matches the state codes of the 50 US states. The comparison is case insensitive.
USA State Code (%)	PERCENT_USA_STATE_CODE	STRING	The percentage of rows where the column matches the state codes of the 50 US states. The comparison is case insensitive. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
USA ZIP Code (#)	COUNT_USA_ZIP_CODE	STRING	The number of rows where the column matches the ZIP code (`12345`) or the ZIP+4 (`12345-1234`) format.
USA ZIP Code (%)	PERCENT_USA_ZIP_CODE	STRING	The percentage of rows where the column matches the ZIP code (`12345`) or the ZIP+4 (`12345-1234`) format. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
Email (#)	COUNT_EMAIL	STRING	The number of rows where the column matches an approximation of a valid email address. Some source types are slightly more precise than others due to regex compatibility, but any source will recognize a large majority of emails.
Email (%)	PERCENT_EMAIL	STRING	The percentage of rows where the column matches an approximation of a valid email address. Some source types are slightly more precise than others due to regex compatibility, but any source will recognize a large majority of emails. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
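
As an example of how a format-based percentage in this group can be reasoned about, the sketch below checks the two ZIP code formats named above (`12345` and `12345-1234`). The regular expression and the function name are assumptions for illustration and may differ from what Bigeye runs on a given source.

```python
import re

# Five digits, optionally followed by a hyphen and four more digits (ZIP+4).
ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def percent_usa_zip_code(values):
    """Percentage of rows whose value is a ZIP or ZIP+4 code (illustrative only)."""
    if not values:
        return 0.0
    matches = sum(1 for v in values if v is not None and ZIP_RE.fullmatch(v))
    return 100.0 * matches / len(values)


zips = ["12345", "12345-1234", "1234", "12345-12"]

print(percent_usa_zip_code(zips))  # 50.0
```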

Financial

You can run a debug query on alerting metrics in the Financial group.

Metric Name	API Name	Column Type	Description
SEDOL (#)	COUNT_SEDOL	STRING	The number of rows where the column matches the Stock Exchange Daily Official List format. Bigeye does not verify the checksum.
SEDOL (%)	PERCENT_SEDOL	STRING	The percentage of rows where the column matches the Stock Exchange Daily Official List format. Bigeye does not verify the checksum. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
CUSIP (#)	COUNT_CUSIP	STRING	The number of rows where the column matches the Committee on Uniform Securities Identification Procedures format. Bigeye does not verify the check digits.
CUSIP (%)	PERCENT_CUSIP	STRING	The percentage of rows where the column matches the Committee on Uniform Securities Identification Procedures format. Bigeye does not verify the check digits. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
LEI (#)	COUNT_LEI	STRING	The number of rows where the column matches the Legal Entity Identifier format. Bigeye does not verify the check digits.
LEI (%)	PERCENT_LEI	STRING	The percentage of rows where the column matches the Legal Entity Identifier format. Bigeye does not verify the check digits. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
FIGI (#)	COUNT_FIGI	STRING	The number of rows where the column matches the Financial Instrument Global Identifier format. Bigeye does not verify the check digit.
FIGI (%)	PERCENT_FIGI	STRING	The percentage of rows where the column matches the Financial Instrument Global Identifier format. Bigeye does not verify the check digit. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
ISIN (#)	COUNT_ISIN	STRING	The number of rows where the column matches the International Securities Identification Number format. Bigeye does not verify the check digit.
ISIN (%)	PERCENT_ISIN	STRING	The percentage of rows where the column matches the International Securities Identification Number format. Bigeye does not verify the check digit. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
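
The financial metrics check the shape of an identifier without verifying check digits. The sketch below illustrates what that means for ISIN, using the standard ISIN layout (two letters, nine alphanumeric characters, one final digit). The pattern and the upper-case-only matching are assumptions; Bigeye's actual expression is not given in this document.

```python
import re

# Format-only ISIN check: 2-letter prefix, 9 alphanumerics, 1 trailing digit.
# The trailing check digit is NOT validated, mirroring the note in the table.
ISIN_RE = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")


def count_isin(values):
    """Count non-null values with an ISIN-shaped format (illustrative only)."""
    return sum(1 for v in values if v is not None and ISIN_RE.fullmatch(v))


isins = [
    "US0378331005",  # well-formed
    "US0378331006",  # altered last digit, still matches the format
    "US03783310",    # too short
]

print(count_isin(isins))  # 2
```

Because only the format is checked, the second value is counted even though its final digit was changed.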

Time

Metric Name	API Name	Column Type	Description
Timestamp (#)	COUNT_TIMESTAMP_STRING	STRING	The count of rows where the column matches an ISO-8601 date or timestamp format.
Timestamp (%)	PERCENT_TIMESTAMP_STRING	STRING	The percentage of rows where the column matches an ISO-8601 date or timestamp format. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
Not in Future (#)	COUNT_NOT_IN_FUTURE or COUNT_DATE_NOT_IN_FUTURE	DATE_LIKE, TIMESTAMP_LIKE	The count of rows where the column contains a date or time that is not after the metric execution time.
Not in Future (%)	PERCENT_NOT_IN_FUTURE or PERCENT_DATE_NOT_IN_FUTURE	DATE_LIKE, TIMESTAMP_LIKE	The percentage of rows where the column contains a date or time that is not after the metric execution time. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
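
The Not in Future metrics compare each value with the metric execution time. The following sketch shows that comparison, treating a value equal to the execution time as not in the future; the function name and sample values are hypothetical.

```python
from datetime import datetime, timezone


def percent_not_in_future(values, run_time):
    """Percentage of rows whose value is not after the run time (illustrative only)."""
    if not values:
        return 0.0
    matches = sum(1 for v in values if v is not None and v <= run_time)
    return 100.0 * matches / len(values)


run_time = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
event_times = [
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc),  # equal to run time
    datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc),   # in the future
    datetime(2030, 1, 1, 0, 0, tzinfo=timezone.utc),   # in the future
]

print(percent_not_in_future(event_times, run_time))  # 50.0
```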

Geolocation

Metric Name	API Name	Column Type	Description
Latitude (#)	COUNT_LATITUDE	NUMERIC	The count of rows where the column is a valid latitude; in the range between -90 and 90, inclusive.
Latitude (%)	PERCENT_LATITUDE	NUMERIC	The percentage of rows where the column is a valid latitude; in the range between -90 and 90, inclusive. It is suggested as an autometric if the column name contains `lat` and Bigeye detects a match greater than 80% during profiling.
Longitude (#)	COUNT_LONGITUDE	NUMERIC	The count of rows where the column is a valid longitude; in the range between -180 and 180, inclusive.
Longitude (%)	PERCENT_LONGITUDE	NUMERIC	The percentage of rows where the column is a valid longitude; in the range between -180 and 180, inclusive. It is suggested as an autometric if the column name contains `long` or `lng` and Bigeye detects a match greater than 80% during profiling.
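
The geolocation checks are inclusive range tests. A minimal sketch, with hypothetical function names, of the count variants:

```python
def count_in_range(values, low, high):
    """Count non-null values inside the inclusive range [low, high]."""
    return sum(1 for v in values if v is not None and low <= v <= high)


def count_latitude(values):
    return count_in_range(values, -90, 90)


def count_longitude(values):
    return count_in_range(values, -180, 180)


lats = [45.0, -90, 90.5, 12.3]
lngs = [-180, 179.9, 181]

print(count_latitude(lats))   # 3
print(count_longitude(lngs))  # 2
```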

User Specified

Metric Name	API Name	Column Type	Description
Value in list (%)	PERCENT_VALUE_IN_LIST	ANY	The percentage of rows that match a user-supplied, comma-separated list of values. This metric is useful to validate fields with a small number of valid values.
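
The sketch below illustrates the Value in list (%) calculation for a comma-separated list. Trimming whitespace around each list entry and comparing values case-sensitively are assumptions made for this example; the function name is hypothetical.

```python
def percent_value_in_list(values, allowed_csv):
    """Percentage of rows whose value appears in a comma-separated list."""
    if not values:
        return 0.0
    # Assumption: surrounding whitespace in the list is not significant.
    allowed = {item.strip() for item in allowed_csv.split(",")}
    matches = sum(1 for v in values if v in allowed)
    return 100.0 * matches / len(values)


countries = ["US", "CA", "FR", "US"]

print(percent_value_in_list(countries, "US, CA, MX"))  # 75.0
```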