Available Metrics

This page lists the metrics available in Bigeye, broken down by category, with a brief description of what each one does.

Metadata Metrics

Metadata metrics are available on Snowflake, BigQuery and Redshift sources and can be applied to any table, excluding database views.

Metric Name

API Name

Description

Hours since last load

HOURS_SINCE_LAST_LOAD

The number of hours since an INSERT, COPY or MERGE was performed on a table. It is suggested as an autometric once per table.

Rows inserted

ROWS_INSERTED

The number of rows added to the table via INSERT, COPY or MERGE statements in the past 24 hours. It is suggested as an autometric once per table.

Read queries

COUNT_READ_QUERIES

The number of SELECT queries issued on a table in the past 24 hours. It is suggested as an autometric once per table.

Freshness

Metric Name

API Name

Column Type

Description

Freshness (hrs)

HOURS_SINCE_MAX_TIMESTAMP or HOURS_SINCE_MAX_DATE

DATE_LIKE, TIMESTAMP_LIKE

The difference between the metric collection time and the maximum value of the column, in hours. Freshness is suggested as an autometric on all date and timestamp columns.
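As a rough illustration of the definition above, the sketch below computes freshness in hours from an in-memory list of values. The function name `freshness_hours` and the sample data are hypothetical; Bigeye evaluates the metric against the source itself, and its exact query is not described here.

```python
from datetime import datetime, timezone

def freshness_hours(values, collected_at):
    """Hours between the collection time and the newest non-null value."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # no timestamps to compare against
    return (collected_at - max(non_null)).total_seconds() / 3600.0

# Hypothetical collection time and column values.
collected_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
loaded = [
    datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc),
    None,
]
print(freshness_hours(loaded, collected_at))  # 3.0
```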

Volume

Metric Name

API Name

Column Type

Description

Count (#)

COUNT_ROWS

ANY

The total number of rows in a table. It is suggested as an autometric once per table.

Cardinality (#)

COUNT_DISTINCT

ANY

The count of distinct elements in the column. This metric should be used when you expect a fixed number of value options. It is suggested as an autometric if Bigeye detects 50 or fewer values during profiling.

Duplicates (#)

COUNT_DUPLICATES

ANY

The count of rows with the same value for a particular column. This metric is used when each value should be distinct. It is suggested as an autometric if Bigeye detects 10 or fewer duplicates during profiling.
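The relationship between the three volume metrics can be sketched on a small in-memory column. This is an illustration only: the function names are hypothetical, and the reading of "duplicates" as non-null rows beyond the first occurrence of each value is an assumption, since the description above does not pin down the exact formula.

```python
def count_distinct(values):
    """Distinct non-null values, mirroring SQL COUNT(DISTINCT col)."""
    return len({v for v in values if v is not None})

def count_duplicates(values):
    """Assumed reading: non-null rows beyond the first occurrence of each value."""
    non_null = [v for v in values if v is not None]
    return len(non_null) - len(set(non_null))

# Hypothetical column with six rows, one of them null.
statuses = ["new", "paid", "paid", "shipped", None, "paid"]
print(len(statuses))               # 6 rows in total
print(count_distinct(statuses))    # 3
print(count_duplicates(statuses))  # 2
```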

Nulls and blanks

Metric Name

API Name

Column Type

Description

Null (#)

COUNT_NULL

ANY

The count of rows with a null value in the column.

Not Null (#)

COUNT_NOT_NULL

ANY

The count of rows with a non-null value in the column.

Null (%)

PERCENT_NULL

ANY

The percentage of rows with a null value in the column. This metric is always suggested as an autometric.

Not Null (%)

PERCENT_NOT_NULL

ANY

The percentage of rows with a non-null value in the column.

Empty string (#)

COUNT_EMPTY_STRING

STRING

The count of rows with a 0-length string (i.e. "") as the value for the column.

Empty string (%)

PERCENT_EMPTY_STRING

STRING

The percentage of rows with a 0-length string (i.e. "") as the value for the column. It is suggested as an autometric if Bigeye detects that >= 50% of values match during profiling or if the column is an ID column.

NaN (#)

COUNT_NAN

NUMERIC

The count of rows where the column value is NaN. This metric will only be available for source types where NaN is a valid value for a column.

NaN (%)

PERCENT_NAN

NUMERIC

The percentage of rows where the column value is NaN. This metric will only be available for source types where NaN is a valid value for a column. It is suggested as an autometric if Bigeye detects that >= 50% of values are NaN during profiling.
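Each count metric in this category has a matching percentage metric. The sketch below shows that pairing on in-memory data. The helper `count_and_percent` is hypothetical, and using the total row count as the denominator is an assumption rather than a documented detail.

```python
import math

def count_and_percent(values, predicate):
    """Return (matching rows, percentage of all rows that match)."""
    matches = sum(1 for v in values if predicate(v))
    total = len(values)
    pct = 100.0 * matches / total if total else None
    return matches, pct

def is_nan(value):
    # NaN only exists for floating-point values.
    return isinstance(value, float) and math.isnan(value)

names = ["ada", "", None, "bob"]
print(count_and_percent(names, lambda v: v is None))  # (1, 25.0)
print(count_and_percent(names, lambda v: v == ""))    # (1, 25.0)

readings = [1.5, float("nan"), 2.0, float("nan")]
print(count_and_percent(readings, is_nan))            # (2, 50.0)
```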

Outliers

Standard numeric

Metric Name

API Name

Column Type

Description

Max

MAX

NUMERIC

The maximum value of the column. It is always suggested as an autometric for numeric columns.

Min

MIN

NUMERIC

The minimum value of the column. It is always suggested as an autometric for numeric columns.

String

Metric Name

API Name

Column Type

Description

String Length Max

STRING_LENGTH_MAX

STRING

The maximum value of the column's length. Not valid for Oracle source type. It is always suggested as an autometric for string columns.

String Length Min

STRING_LENGTH_MIN

STRING

The minimum value of the column's length. Not valid for Oracle source type. It is always suggested as an autometric for string columns.

String Length Average

STRING_LENGTH_AVERAGE

STRING

The average value of the column's length. Not valid for Oracle source type. It is always suggested as an autometric for string columns.

Geolocation

Metric Name

API Name

Column Type

Description

Latitude (#)

COUNT_LATITUDE

NUMERIC

The count of rows where the column is a valid latitude, that is, in the range between -90 and 90, inclusive.

Latitude (%)

PERCENT_LATITUDE

NUMERIC

The percentage of rows where the column is a valid latitude, that is, in the range between -90 and 90, inclusive. It is suggested as an autometric if the column name contains lat and Bigeye detects a match greater than 80% during profiling.

Longitude (#)

COUNT_LONGITUDE

NUMERIC

The count of rows where the column is a valid longitude, that is, in the range between -180 and 180, inclusive.

Longitude (%)

PERCENT_LONGITUDE

NUMERIC

The percentage of rows where the column is a valid longitude, that is, in the range between -180 and 180, inclusive. It is suggested as an autometric if the column name contains long or lng and Bigeye detects a match greater than 80% during profiling.
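The inclusive range checks behind the geolocation metrics can be sketched as follows. The function names are hypothetical, and treating nulls as non-matching rows in the percentage is an assumption.

```python
def is_valid_latitude(value):
    """True when the value lies in [-90, 90], bounds included."""
    return value is not None and -90 <= value <= 90

def is_valid_longitude(value):
    """True when the value lies in [-180, 180], bounds included."""
    return value is not None and -180 <= value <= 180

def percent_matching(values, check):
    """Percentage of all rows (nulls included) that pass the check."""
    if not values:
        return None
    return 100.0 * sum(1 for v in values if check(v)) / len(values)

# Hypothetical column values.
lats = [45.0, -90.0, 91.2, None]
lngs = [180.0, -180.5, 12.3, 0.0]
print(percent_matching(lats, is_valid_latitude))   # 50.0
print(percent_matching(lngs, is_valid_longitude))  # 75.0
```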

Distributions

Metric Name

API Name

Column Type

Description

Average

AVERAGE

NUMERIC

The mean value of the column. It is always suggested as an autometric for numeric columns, except for ID columns.

Variance

VARIANCE

NUMERIC

The statistical variance of the column. The variance is used to track the spread of numbers beyond the average. It is always suggested as an autometric for numeric columns, except for ID columns.

Skew

SKEW

NUMERIC

The statistical skew of the column. The skew is used to determine how evenly the values are distributed about the mean. A negative skew means that there is a larger tail below the mean, while a positive skew indicates a larger tail above the mean.

Kurtosis

KURTOSIS

NUMERIC

The statistical kurtosis of the column. Kurtosis measures how heavy the tails of the distribution are. The value displayed is the excess kurtosis, that is, the kurtosis minus 3, so a normal distribution has a metric value of 0.

Geometric mean

GEOMETRIC_MEAN

NUMERIC

The geometric mean of the column.

Harmonic mean

HARMONIC_MEAN

NUMERIC

The harmonic mean of the column.

Median

MEDIAN

NUMERIC

The median of the column. The median is computed as the 50th percentile, and will only return a value that is in the dataset. It is not valid for the MySQL source type. It is always suggested as an autometric for numeric columns, except for ID columns.

Percentile

PERCENTILE

NUMERIC

The statistical percentile of the column. This metric takes a parameter that sets which percentile to compute. The parameter can be given either as a fraction less than 1 or as a number less than 100, so the 90th percentile can be expressed as either 0.9 or 90. Bigeye computes a discrete percentile, which returns only values that exist in the column, except for the Presto and AWS Athena source types. It is not valid for the MySQL source type.

Sum

SUM

NUMERIC

The sum of all values in the column.
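Two of the definitions above are easier to follow with a worked sketch: excess kurtosis and the discrete percentile with its two parameter forms. The code below is an illustration under stated assumptions, not Bigeye's implementation. It uses the population form of kurtosis (sources may use a sample estimator instead), treats a parameter below 1 as a fraction and anything from 1 up to 100 as a percent, and follows the usual SQL `PERCENTILE_DISC` rule of returning the first sorted value whose cumulative share reaches the target. All function names are hypothetical.

```python
def excess_kurtosis(values):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    xs = [v for v in values if v is not None]
    if not xs:
        return None
    mean = sum(xs) / len(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)  # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / len(xs)  # fourth central moment
    if m2 == 0:
        return None  # undefined when every value is identical
    return m4 / m2 ** 2 - 3.0

def normalize_percentile(p):
    """Accept 0.9 or 90 for the 90th percentile (assumed cutoff at 1)."""
    if not 0 < p < 100:
        raise ValueError("percentile must be between 0 and 100, exclusive")
    return p if p < 1 else p / 100.0

def discrete_percentile(values, p):
    """First sorted value whose cumulative share of rows reaches the target."""
    fraction = normalize_percentile(p)
    ordered = sorted(v for v in values if v is not None)
    for i, value in enumerate(ordered):
        if (i + 1) / len(ordered) >= fraction:
            return value
    return None  # only reached for an empty column

# Hypothetical column values.
latencies = [12, 15, 11, 40, 18, 13, 14, 90, 16, 17]
print(discrete_percentile(latencies, 0.9))  # 40
print(discrete_percentile(latencies, 90))   # 40
print(discrete_percentile(latencies, 0.5))  # 15, an existing value
```

Because the discrete form never interpolates, the result is always a value that occurs in the column, which matches the behavior described for Median and Percentile.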

Formatting

Identification

Metric Name

API Name

Column Type

Description

UUID (#)

COUNT_UUID

STRING

The number of rows where the column matches the UUID format with hyphens (e.g. 123e4567-e89b-12d3-a456-426614174000). The comparison is case insensitive.

UUID (%)

PERCENT_UUID

STRING

The percentage of rows where the column matches the UUID format with hyphens (e.g. 123e4567-e89b-12d3-a456-426614174000). The comparison is case insensitive. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

Perm ID (#)

COUNT_PERM_ID

STRING

The number of rows where the column matches an approximation of a valid Perm ID format (currently 1-<1-15 digits>).

Perm ID (%)

PERCENT_PERM_ID

STRING

The percentage of rows where the column matches an approximation of a valid Perm ID format (currently 1-<1-15 digits>). It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

SSN (#)

COUNT_SSN

STRING

The number of rows where the column matches an approximation of a valid Social Security number, with or without hyphens.

SSN (%)

PERCENT_SSN

STRING

The percentage of rows where the column matches an approximation of a valid Social Security number, with or without hyphens. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
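The UUID and Perm ID descriptions above translate naturally into regular expressions. The patterns below are a sketch written from those descriptions; they are assumptions, not the expressions Bigeye runs, and the function names are hypothetical.

```python
import re

# Hyphenated 8-4-4-4-12 hex layout, matched case-insensitively.
UUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
# "1-" followed by 1 to 15 digits, per the description above.
PERM_ID_PATTERN = re.compile(r"1-[0-9]{1,15}")

def is_uuid(value):
    return isinstance(value, str) and UUID_PATTERN.fullmatch(value) is not None

def is_perm_id(value):
    return isinstance(value, str) and PERM_ID_PATTERN.fullmatch(value) is not None

print(is_uuid("123e4567-e89b-12d3-a456-426614174000"))  # True
print(is_uuid("123E4567-E89B-12D3-A456-426614174000"))  # True
print(is_uuid("123e4567e89b12d3a456426614174000"))      # False, no hyphens
print(is_perm_id("1-5000012345"))                       # True
print(is_perm_id("2-5000012345"))                       # False
```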

Financial

Metric Name

API Name

Column Type

Description

SEDOL (#)

COUNT_SEDOL

STRING

The number of rows where the column matches the Stock Exchange Daily Official List format. Bigeye does not verify the checksum.

SEDOL (%)

PERCENT_SEDOL

STRING

The percentage of rows where the column matches the Stock Exchange Daily Official List format. Bigeye does not verify the checksum. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

CUSIP (#)

COUNT_CUSIP

STRING

The number of rows where the column matches the Committee on Uniform Securities Identification Procedures format. Bigeye does not verify the check digits.

CUSIP (%)

PERCENT_CUSIP

STRING

The percentage of rows where the column matches the Committee on Uniform Securities Identification Procedures format. Bigeye does not verify the check digits. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

LEI (#)

COUNT_LEI

STRING

The number of rows where the column matches the Legal Entity Identifier format. Bigeye does not verify the check digits.

LEI (%)

PERCENT_LEI

STRING

The percentage of rows where the column matches the Legal Entity Identifier format. Bigeye does not verify the check digits. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

FIGI (#)

COUNT_FIGI

STRING

The number of rows where the column matches the Financial Instrument Global Identifier format. Bigeye does not verify the check digit.

FIGI (%)

PERCENT_FIGI

STRING

The percentage of rows where the column matches the Financial Instrument Global Identifier format. Bigeye does not verify the check digit. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.

ISIN (#)

COUNT_ISIN

STRING

The number of rows where the column matches the International Securities Identification Number format. Bigeye does not verify the check digit.

ISIN (%)

PERCENT_ISIN

STRING

The percentage of rows where the column matches the International Securities Identification Number format. Bigeye does not verify the check digit. It is suggested as an autometric if Bigeye detects a match greater than 50% during profiling.
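To show what a format match without check-digit verification means in practice, the sketch below tests strings against the general ISIN layout: a two-letter country code, nine alphanumeric characters, and one trailing digit. The pattern and the function name `looks_like_isin` are assumptions made for illustration; the exact expression Bigeye uses, including its case handling, is not stated here.

```python
import re

# Two letters, nine alphanumerics, one digit. The final digit is NOT validated.
ISIN_FORMAT = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")

def looks_like_isin(value):
    """Format-only check; a wrong check digit still passes."""
    return isinstance(value, str) and ISIN_FORMAT.fullmatch(value) is not None

print(looks_like_isin("US0378331005"))  # True
print(looks_like_isin("US0378331009"))  # True, check digit is not verified
print(looks_like_isin("US03783310"))    # False, too short
```

A format-only metric like this catches truncated or malformed identifiers, but it will not flag a well-formed identifier whose check digit is wrong.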

Contact information

Metric Name

API Name

Column Type

Description

USA Phone number (#)

COUNT_USA_PHONE

STRING

The number of rows where the column matches USA phone number format, with or without country code, parentheses, or hyphens. Examples: 1 (401) 555 6789, 405-555-6789, 5556789

USA Phone number (%)

PERCENT_USA_PHONE

STRING

The percentage of rows where the column matches USA phone number format, with or without country code, parentheses, or hyphens. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.

USA State Code (#)

COUNT_USA_STATE_CODE

STRING

The number of rows where the column matches the state codes of the 50 US states. The comparison is case insensitive.

USA State Code (%)

PERCENT_USA_STATE_CODE

STRING

The percentage of rows where the column matches the state codes of the 50 US states. The comparison is case insensitive. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.

USA ZIP Code (#)

COUNT_USA_ZIP_CODE

STRING

The number of rows where the column matches the ZIP code (12345) or the ZIP+4 (12345-1234) format.

USA ZIP Code (%)

PERCENT_USA_ZIP_CODE

STRING

The percentage of rows where the column matches the ZIP code (12345) or the ZIP+4 (12345-1234) format. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.

Email (#)

COUNT_EMAIL

STRING

The number of rows where the column matches an approximation of a valid email address. Some source types are slightly more precise than others due to regex compatibility, but any source will recognize a large majority of emails.

Email (%)

PERCENT_EMAIL

STRING

The percentage of rows where the column matches an approximation of a valid email address. Some source types are slightly more precise than others due to regex compatibility, but any source will recognize a large majority of emails. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
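The ZIP code and state code rules above are simple enough to sketch directly. The pattern and helper names below are hypothetical and written from the descriptions in this table, so they may differ from what Bigeye runs on a given source.

```python
import re

# Five digits, optionally followed by a hyphen and four more (ZIP+4).
ZIP_PATTERN = re.compile(r"[0-9]{5}(?:-[0-9]{4})?")

# Postal codes of the 50 US states only (no DC or territories).
US_STATE_CODES = frozenset(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD "
    "MA MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC "
    "SD TN TX UT VT VA WA WV WI WY".split()
)

def is_usa_zip_code(value):
    return isinstance(value, str) and ZIP_PATTERN.fullmatch(value) is not None

def is_usa_state_code(value):
    # Case-insensitive comparison, as described above.
    return isinstance(value, str) and value.upper() in US_STATE_CODES

print(is_usa_zip_code("12345"))       # True
print(is_usa_zip_code("12345-1234"))  # True
print(is_usa_zip_code("1234"))        # False
print(is_usa_state_code("ca"))        # True
print(is_usa_state_code("DC"))        # False, not one of the 50 states
```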

Time

Metric Name

API Name

Column Type

Description

Timestamp (#)

COUNT_TIMESTAMP_STRING

STRING

The count of rows where the column matches an ISO-8601 date or timestamp format.

Timestamp (%)

PERCENT_TIMESTAMP_STRING

STRING

The percentage of rows where the column matches an ISO-8601 date or timestamp format. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
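A quick way to see what counts as an ISO-8601 string is to try parsing it. The sketch below leans on Python's `datetime.fromisoformat`, which accepts a subset of ISO-8601 that varies by Python version. Which subset Bigeye accepts is not stated here, so treat this as an approximation; the function name is hypothetical.

```python
from datetime import datetime

def is_iso8601_string(value):
    """True when the string parses as an ISO-8601 date or timestamp."""
    if not isinstance(value, str):
        return False
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True

# Hypothetical column values.
samples = ["2024-01-02", "2024-01-02T03:04:05+00:00", "01/02/2024", None]
matches = sum(1 for s in samples if is_iso8601_string(s))
print(matches)                         # 2
print(100.0 * matches / len(samples))  # 50.0
```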

Custom

Metric Name

API Name

Column Type

Description

Value in list (%)

PERCENT_VALUE_IN_LIST

ANY

The percentage of rows that match a user-supplied, comma-separated list of values. This metric is useful to validate fields with a small number of valid values.
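The sketch below shows how a comma-separated list might be applied to a column. The function name is hypothetical, and trimming whitespace around each list item and dividing by the total row count are assumptions rather than documented behavior.

```python
def percent_value_in_list(values, allowed_csv):
    """Percentage of rows whose value appears in the comma-separated list."""
    allowed = {item.strip() for item in allowed_csv.split(",")}
    if not values:
        return None
    matches = sum(1 for v in values if v in allowed)
    return 100.0 * matches / len(values)

# Hypothetical column values and list.
states = ["open", "closed", "open", "pending"]
print(percent_value_in_list(states, "open, closed"))  # 75.0
```

A drop in this percentage signals that values outside the expected set have started to appear.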

Miscellaneous

Metric Name

API Name

Column Type

Description

Not in Future (#)

COUNT_NOT_IN_FUTURE or COUNT_DATE_NOT_IN_FUTURE

DATE_LIKE, TIMESTAMP_LIKE

The count of rows where the column contains a date or time that is not after the metric execution time.

Not in Future (%)

PERCENT_NOT_IN_FUTURE or PERCENT_DATE_NOT_IN_FUTURE

DATE_LIKE, TIMESTAMP_LIKE

The percentage of rows where the column contains a date or time that is not after the metric execution time. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.

False (#)

COUNT_FALSE

BOOLEAN

The count of rows where the column contains the boolean value of false.

False (%)

PERCENT_FALSE

BOOLEAN

The percentage of rows where the column contains the boolean value of false. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.

True (#)

COUNT_TRUE

BOOLEAN

The count of rows where the column contains the boolean value of true.

True (%)

PERCENT_TRUE

BOOLEAN

The percentage of rows where the column contains the boolean value of true. It is suggested as an autometric if Bigeye detects a match >= 50% during profiling.
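The "not in future" comparison can be sketched as follows. A value equal to the execution time counts as a match because it is not after that time. The function name is hypothetical, and treating nulls as non-matching rows is an assumption.

```python
from datetime import datetime, timezone

def not_in_future_metrics(values, executed_at):
    """Return (count, percentage) of rows at or before the execution time."""
    matches = sum(1 for v in values if v is not None and v <= executed_at)
    total = len(values)
    pct = 100.0 * matches / total if total else None
    return matches, pct

# Hypothetical execution time and column values.
executed_at = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
events = [
    datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc),  # equal, so not in the future
    datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc),   # in the future
    datetime(2024, 1, 2, 11, 0, tzinfo=timezone.utc),
]
print(not_in_future_metrics(events, executed_at))  # (3, 75.0)
```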