# evidently.metrics.data\_integrity

## Submodules

## column\_missing\_values\_metric module <a href="#module-evidently.metrics.data_integrity.column_missing_values_metric" id="module-evidently.metrics.data_integrity.column_missing_values_metric"></a>

### class ColumnMissingValues(number\_of\_rows: int, different\_missing\_values: Dict\[Any, int], number\_of\_different\_missing\_values: int, number\_of\_missing\_values: int, share\_of\_missing\_values: float)

Bases: `object`

Statistics about missing values in a column

#### Attributes:

&#x20;    **different\_missing\_values : Dict\[Any, int]**

&#x20;    **number\_of\_different\_missing\_values : int**

&#x20;    **number\_of\_missing\_values : int**

&#x20;    **number\_of\_rows : int**

&#x20;    **share\_of\_missing\_values : float**

### class ColumnMissingValuesMetric(column\_name: str, missing\_values: Optional\[list] = None, replace: bool = True)

Bases: [`Metric`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.Metric)\[`ColumnMissingValuesMetricResult`]

Count missing values in a column.

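By default, a missing value is a null or NaN value, an empty string, or positive or negative infinity (see `DEFAULT_MISSING_VALUES`).

The metric counts how many different kinds of missing values occur in the column and how many times each kind occurs. Null-like values such as `numpy.nan` and `pandas.NaT` are counted as a single kind.

You can set your own list of missing values with the `missing_values` parameter. A `None` value in the list means that pandas null values are included in the calculation. If `missing_values` is not set, `DEFAULT_MISSING_VALUES` is used.

If `replace` is `False`, the defaults are added to the user’s list. If `replace` is `True` (the default), only the values from the `missing_values` list are used.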

#### Attributes:

&#x20;    **DEFAULT\_MISSING\_VALUES = \['', inf, -inf, None]**

&#x20;    **column\_name : str**

&#x20;    **missing\_values : frozenset**

#### Methods:

&#x20;    **calculate(data:** [**InputData**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.InputData)**)**
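The way `missing_values` and `replace` combine, and how missing-value kinds are counted, can be sketched in plain Python. This is a minimal illustration, not Evidently's implementation: `resolve_missing_values`, `count_missing_values` and `ColumnMissingValuesSketch` are hypothetical names, and the sketch assumes that `None` and float NaN stand in for pandas null values (the real metric relies on pandas null detection, which also covers `pandas.NaT`).

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional

# Mirrors the documented DEFAULT_MISSING_VALUES = ['', inf, -inf, None]
DEFAULT_MISSING_VALUES: List[Any] = ["", math.inf, -math.inf, None]


def resolve_missing_values(missing_values: Optional[list] = None, replace: bool = True) -> frozenset:
    """Hypothetical helper: build the effective set of missing values."""
    if missing_values is None:
        values = DEFAULT_MISSING_VALUES  # no user list: use the defaults
    elif replace:
        values = missing_values  # replace=True: the user's values only
    else:
        values = DEFAULT_MISSING_VALUES + missing_values  # replace=False: defaults plus user's values
    return frozenset(values)


def _is_null(value: Any) -> bool:
    # Assumption: None and float NaN stand in for pandas null values
    return value is None or (isinstance(value, float) and math.isnan(value))


@dataclass
class ColumnMissingValuesSketch:
    number_of_rows: int
    different_missing_values: Dict[Any, int]
    number_of_different_missing_values: int
    number_of_missing_values: int
    share_of_missing_values: float


def count_missing_values(column: Iterable[Any], missing: frozenset) -> ColumnMissingValuesSketch:
    """Hypothetical helper: count each configured missing value in a column."""
    values = list(column)
    # Start every configured value at zero so unseen kinds are reported too
    counts: Dict[Any, int] = {value: 0 for value in missing}
    for value in values:
        if _is_null(value):
            if None in missing:
                counts[None] += 1  # every null-like value is one kind, keyed by None
        elif value in missing:
            counts[value] += 1
    total = sum(counts.values())
    rows = len(values)
    return ColumnMissingValuesSketch(
        number_of_rows=rows,
        different_missing_values=counts,
        number_of_different_missing_values=sum(1 for c in counts.values() if c > 0),
        number_of_missing_values=total,
        share_of_missing_values=total / rows if rows else 0.0,
    )


# Add "n/a" to the defaults instead of replacing them
missing = resolve_missing_values(["n/a"], replace=False)
stats = count_missing_values(["a", "", None, float("nan"), "n/a", math.inf, "b"], missing)
# stats.number_of_missing_values == 5; None and NaN share one kind, so different_missing_values[None] == 2
```

Because `replace` defaults to `True`, passing a list such as `["n/a"]` on its own drops the defaults, including `None`, so nulls are no longer counted unless `None` is listed explicitly.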

### class ColumnMissingValuesMetricRenderer(color\_options: Optional\[[ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)] = None)

Bases: [`MetricRenderer`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.renderers#evidently.renderers.base_renderer.MetricRenderer)

#### Attributes:

&#x20;    **color\_options :** [**ColorOptions**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)

#### Methods:

&#x20;    **render\_html(obj: ColumnMissingValuesMetric)**

&#x20;    **render\_json(obj: ColumnMissingValuesMetric)**

### class ColumnMissingValuesMetricResult(column\_name: str, current: ColumnMissingValues, reference: Optional\[ColumnMissingValues] = None)

Bases: `object`

#### Attributes:

&#x20;    **column\_name : str**

&#x20;    **current : ColumnMissingValues**

&#x20;    **reference : Optional\[ColumnMissingValues] = None**

## column\_regexp\_metric module <a href="#module-evidently.metrics.data_integrity.column_regexp_metric" id="module-evidently.metrics.data_integrity.column_regexp_metric"></a>

### class ColumnRegExpMetric(column\_name: str, reg\_exp: str, top: int = 10)

Bases: [`Metric`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.Metric)\[`DataIntegrityValueByRegexpMetricResult`]

Count the number of values in a column that match, or do not match, a regular expression (regexp).

#### Attributes:

&#x20;    **column\_name : str**

&#x20;    **reg\_exp : str**

&#x20;    **top : int**

#### Methods:

&#x20;    **calculate(data:** [**InputData**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.InputData)**)**
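A plain-Python sketch of the counting that `ColumnRegExpMetric` describes follows. It is not Evidently's code: `regexp_stats` and `RegexpStatSketch` are hypothetical names, and the sketch assumes that nulls are skipped, that each value is converted to `str` and tested with `re.match` (anchored at the start of the string), and that `top` caps each frequency table at the `top` most common values.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class RegexpStatSketch:
    number_of_matched: int
    number_of_not_matched: int
    number_of_rows: int
    table_of_matched: Dict[str, int]
    table_of_not_matched: Dict[str, int]


def regexp_stats(column: Iterable[Any], reg_exp: str, top: int = 10) -> RegexpStatSketch:
    """Hypothetical helper: split non-null values by whether they match reg_exp."""
    pattern = re.compile(reg_exp)
    values = list(column)
    matched: Counter = Counter()
    not_matched: Counter = Counter()
    for value in values:
        # Assumption: nulls (None or NaN) are neither matched nor not matched
        if value is None or (isinstance(value, float) and value != value):
            continue
        text = str(value)
        if pattern.match(text):  # re.match anchors at the start of the string
            matched[text] += 1
        else:
            not_matched[text] += 1
    return RegexpStatSketch(
        number_of_matched=sum(matched.values()),
        number_of_not_matched=sum(not_matched.values()),
        number_of_rows=len(values),
        # Keep only the `top` most frequent values in each table
        table_of_matched=dict(matched.most_common(top)),
        table_of_not_matched=dict(not_matched.most_common(top)),
    )


stats = regexp_stats(["a1", "b2", "a1", "x", None], r"[ab]\d$", top=1)
```

Under these assumptions `number_of_rows` can exceed `number_of_matched + number_of_not_matched` when the column contains nulls.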

### class ColumnRegExpMetricRenderer(color\_options: Optional\[[ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)] = None)

Bases: [`MetricRenderer`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.renderers#evidently.renderers.base_renderer.MetricRenderer)

#### Attributes:

&#x20;    **color\_options :** [**ColorOptions**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)

#### Methods:

&#x20;    **render\_html(obj: ColumnRegExpMetric)**

&#x20;    **render\_json(obj: ColumnRegExpMetric)**

### class DataIntegrityValueByRegexpMetricResult(column\_name: str, reg\_exp: str, top: int, current: DataIntegrityValueByRegexpStat, reference: Optional\[DataIntegrityValueByRegexpStat] = None)

Bases: `object`

#### Attributes:

&#x20;    **column\_name : str**

&#x20;    **current : DataIntegrityValueByRegexpStat**

&#x20;    **reference : Optional\[DataIntegrityValueByRegexpStat] = None**

&#x20;    **reg\_exp : str**

&#x20;    **top : int**

### class DataIntegrityValueByRegexpStat(number\_of\_matched: int, number\_of\_not\_matched: int, number\_of\_rows: int, table\_of\_matched: Dict\[str, int], table\_of\_not\_matched: Dict\[str, int])

Bases: `object`

Statistics, for one dataset, about the values in a column that are matched by a regular expression

#### Attributes:

&#x20;    **number\_of\_matched : int**

&#x20;    **number\_of\_not\_matched : int**

&#x20;    **number\_of\_rows : int**

&#x20;    **table\_of\_matched : Dict\[str, int]**

&#x20;    **table\_of\_not\_matched : Dict\[str, int]**

## column\_summary\_metric module <a href="#module-evidently.metrics.data_integrity.column_summary_metric" id="module-evidently.metrics.data_integrity.column_summary_metric"></a>

### class CategoricalCharacteristics(number\_of\_rows: int, count: int, unique: Optional\[int], unique\_percentage: Optional\[float], most\_common: Optional\[object], most\_common\_percentage: Optional\[float], missing: Optional\[int], missing\_percentage: Optional\[float], new\_in\_current\_values\_count: Optional\[int] = None, unused\_in\_current\_values\_count: Optional\[int] = None)

Bases: `object`

#### Attributes:

&#x20;    **count : int**

&#x20;    **missing : Optional\[int]**

&#x20;    **missing\_percentage : Optional\[float]**

&#x20;    **most\_common : Optional\[object]**

&#x20;    **most\_common\_percentage : Optional\[float]**

&#x20;    **new\_in\_current\_values\_count : Optional\[int] = None**

&#x20;    **number\_of\_rows : int**

&#x20;    **unique : Optional\[int]**

&#x20;    **unique\_percentage : Optional\[float]**

&#x20;    **unused\_in\_current\_values\_count : Optional\[int] = None**
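To illustrate how the `CategoricalCharacteristics` fields relate to each other, here is a plain-Python sketch. It is hypothetical (`categorical_characteristics` and `CategoricalSketch` are not part of Evidently) and assumes that `count` is the number of non-null values, that nulls are ignored when picking the most common value, that percentages are on a 0 to 100 scale relative to `number_of_rows` and rounded to two decimals, and that the two `*_in_current_values_count` fields compare the sets of distinct non-null values in the current and reference data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class CategoricalSketch:
    number_of_rows: int
    count: int
    unique: Optional[int]
    unique_percentage: Optional[float]
    most_common: Optional[object]
    most_common_percentage: Optional[float]
    missing: Optional[int]
    missing_percentage: Optional[float]
    new_in_current_values_count: Optional[int] = None
    unused_in_current_values_count: Optional[int] = None


def _pct(part: int, whole: int) -> Optional[float]:
    # Assumption: percentages are 0-100 and rounded to two decimals
    return round(100 * part / whole, 2) if whole else None


def categorical_characteristics(current: List[Any], reference: Optional[List[Any]] = None) -> CategoricalSketch:
    """Hypothetical helper; None is the only null marker in this sketch."""
    present = [v for v in current if v is not None]
    counts = Counter(present)
    rows = len(current)
    most_common, top_count = counts.most_common(1)[0] if counts else (None, 0)
    new_values = unused_values = None
    if reference is not None:
        cur_set, ref_set = set(counts), {v for v in reference if v is not None}
        new_values = len(cur_set - ref_set)  # categories seen only in current
        unused_values = len(ref_set - cur_set)  # categories seen only in reference
    return CategoricalSketch(
        number_of_rows=rows,
        count=len(present),
        unique=len(counts),
        unique_percentage=_pct(len(counts), rows),
        most_common=most_common,
        most_common_percentage=_pct(top_count, rows) if counts else None,
        missing=rows - len(present),
        missing_percentage=_pct(rows - len(present), rows),
        new_in_current_values_count=new_values,
        unused_in_current_values_count=unused_values,
    )


stats = categorical_characteristics(["a", "b", "a", None, "c"], reference=["a", "b", "d"])
```

The two comparison fields stay `None` when no reference data is given, matching their `Optional[int] = None` defaults.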

### class ColumnSummary(column\_name: str, column\_type: str, reference\_characteristics: Union\[NumericCharacteristics, CategoricalCharacteristics, DatetimeCharacteristics, NoneType], current\_characteristics: Union\[NumericCharacteristics, CategoricalCharacteristics, DatetimeCharacteristics], plot\_data: DataQualityPlot)

Bases: `object`

#### Attributes:

&#x20;    **column\_name : str**

&#x20;    **column\_type : str**

&#x20;    **current\_characteristics : Union\[NumericCharacteristics, CategoricalCharacteristics, DatetimeCharacteristics]**

&#x20;    **plot\_data : DataQualityPlot**

&#x20;    **reference\_characteristics : Optional\[Union\[NumericCharacteristics, CategoricalCharacteristics, DatetimeCharacteristics]]**

### class ColumnSummaryMetric(column\_name: str)

Bases: [`Metric`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.Metric)\[`ColumnSummary`]

#### Methods:

&#x20;    **calculate(data:** [**InputData**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.InputData)**)**

&#x20;    **static map\_data(stats:** [**FeatureQualityStats**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.calculations#evidently.calculations.data_quality.FeatureQualityStats)**)**

### class ColumnSummaryMetricRenderer(color\_options: Optional\[[ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)] = None)

Bases: [`MetricRenderer`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.renderers#evidently.renderers.base_renderer.MetricRenderer)

#### Attributes:

&#x20;    **color\_options :** [**ColorOptions**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)

#### Methods:

&#x20;    **render\_html(obj: ColumnSummaryMetric)**

&#x20;    **render\_json(obj: ColumnSummaryMetric)**

### class DataByTarget(data\_for\_plots: Dict\[str, Dict\[str, Union\[list, pandas.core.frame.DataFrame]]], target\_name: str, target\_type: str)

Bases: `object`

#### Attributes:

&#x20;    **data\_for\_plots : Dict\[str, Dict\[str, Union\[list, DataFrame]]]**

&#x20;    **target\_name : str**

&#x20;    **target\_type : str**

### class DataInTime(data\_for\_plots: Dict\[str, pandas.core.frame.DataFrame], freq: str, datetime\_name: str)

Bases: `object`

#### Attributes:

&#x20;    **data\_for\_plots : Dict\[str, DataFrame]**

&#x20;    **datetime\_name : str**

&#x20;    **freq : str**

### class DataQualityPlot(bins\_for\_hist: Dict\[str, pandas.core.frame.DataFrame], data\_in\_time: Optional\[DataInTime], data\_by\_target: Optional\[DataByTarget], counts\_of\_values: Optional\[Dict\[str, pandas.core.frame.DataFrame]])

Bases: `object`

#### Attributes:

&#x20;    **bins\_for\_hist : Dict\[str, DataFrame]**

&#x20;    **counts\_of\_values : Optional\[Dict\[str, DataFrame]]**

&#x20;    **data\_by\_target : Optional\[DataByTarget]**

&#x20;    **data\_in\_time : Optional\[DataInTime]**

### class DatetimeCharacteristics(number\_of\_rows: int, count: int, unique: Optional\[int], unique\_percentage: Optional\[float], most\_common: Optional\[object], most\_common\_percentage: Optional\[float], missing: Optional\[int], missing\_percentage: Optional\[float], first: Optional\[str], last: Optional\[str])

Bases: `object`

#### Attributes:

&#x20;    **count : int**

&#x20;    **first : Optional\[str]**

&#x20;    **last : Optional\[str]**

&#x20;    **missing : Optional\[int]**

&#x20;    **missing\_percentage : Optional\[float]**

&#x20;    **most\_common : Optional\[object]**

&#x20;    **most\_common\_percentage : Optional\[float]**

&#x20;    **number\_of\_rows : int**

&#x20;    **unique : Optional\[int]**

&#x20;    **unique\_percentage : Optional\[float]**

### class NumericCharacteristics(number\_of\_rows: int, count: int, mean: Union\[float, int, NoneType], std: Union\[float, int, NoneType], min: Union\[float, int, NoneType], p25: Union\[float, int, NoneType], p50: Union\[float, int, NoneType], p75: Union\[float, int, NoneType], max: Union\[float, int, NoneType], unique: Optional\[int], unique\_percentage: Optional\[float], missing: Optional\[int], missing\_percentage: Optional\[float], infinite\_count: Optional\[int], infinite\_percentage: Optional\[float], most\_common: Union\[float, int, NoneType], most\_common\_percentage: Optional\[float])

Bases: `object`

#### Attributes:

&#x20;    **count : int**

&#x20;    **infinite\_count : Optional\[int]**

&#x20;    **infinite\_percentage : Optional\[float]**

&#x20;    **max : Optional\[Union\[float, int]]**

&#x20;    **mean : Optional\[Union\[float, int]]**

&#x20;    **min : Optional\[Union\[float, int]]**

&#x20;    **missing : Optional\[int]**

&#x20;    **missing\_percentage : Optional\[float]**

&#x20;    **most\_common : Optional\[Union\[float, int]]**

&#x20;    **most\_common\_percentage : Optional\[float]**

&#x20;    **number\_of\_rows : int**

&#x20;    **p25 : Optional\[Union\[float, int]]**

&#x20;    **p50 : Optional\[Union\[float, int]]**

&#x20;    **p75 : Optional\[Union\[float, int]]**

&#x20;    **std : Optional\[Union\[float, int]]**

&#x20;    **unique : Optional\[int]**

&#x20;    **unique\_percentage : Optional\[float]**
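A similar standard-library sketch for a subset of the `NumericCharacteristics` fields follows. It is hypothetical (`numeric_characteristics` is not an Evidently function) and rests on stated assumptions: `None` and NaN count as missing, infinite values are counted in `infinite_count` and then excluded from the distribution statistics, `std` is the sample standard deviation, and the quartiles use linear interpolation via `statistics.quantiles(..., method="inclusive")`, which matches the NumPy and pandas default.

```python
import math
import statistics
from typing import Any, Dict, List, Optional


def numeric_characteristics(values: List[Optional[float]]) -> Dict[str, Any]:
    """Hypothetical helper returning a subset of NumericCharacteristics fields."""
    rows = len(values)
    # Assumption: None and NaN are missing; infinities are present but not finite
    present = [v for v in values if v is not None and not math.isnan(v)]
    finite = [v for v in present if not math.isinf(v)]
    result: Dict[str, Any] = {
        "number_of_rows": rows,
        "count": len(present),
        "missing": rows - len(present),
        "infinite_count": len(present) - len(finite),
        "mean": None, "std": None, "min": None,
        "p25": None, "p50": None, "p75": None, "max": None,
    }
    if finite:
        result["mean"] = statistics.fmean(finite)
        result["min"], result["max"] = min(finite), max(finite)
    if len(finite) >= 2:
        # stdev and quantiles need at least two data points; this sketch leaves them None otherwise
        result["std"] = statistics.stdev(finite)
        result["p25"], result["p50"], result["p75"] = statistics.quantiles(finite, n=4, method="inclusive")
    return result


stats = numeric_characteristics([1.0, 2.0, 3.0, 4.0, None, math.inf, float("nan")])
```

For the finite values `[1, 2, 3, 4]` this yields a mean of 2.5 and quartiles 1.75, 2.5 and 3.25.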

## dataset\_missing\_values\_metric module <a href="#module-evidently.metrics.data_integrity.dataset_missing_values_metric" id="module-evidently.metrics.data_integrity.dataset_missing_values_metric"></a>

### class DatasetMissingValues(different\_missing\_values: Dict\[Any, int], number\_of\_different\_missing\_values: int, different\_missing\_values\_by\_column: Dict\[str, Dict\[Any, int]], number\_of\_different\_missing\_values\_by\_column: Dict\[str, int], number\_of\_missing\_values: int, share\_of\_missing\_values: float, number\_of\_missing\_values\_by\_column: Dict\[str, int], share\_of\_missing\_values\_by\_column: Dict\[str, float], number\_of\_rows: int, number\_of\_rows\_with\_missing\_values: int, share\_of\_rows\_with\_missing\_values: float, number\_of\_columns: int, columns\_with\_missing\_values: List\[str], number\_of\_columns\_with\_missing\_values: int, share\_of\_columns\_with\_missing\_values: float)

Bases: `object`

Statistics about missing values in a dataset

#### Attributes:

&#x20;    **columns\_with\_missing\_values : List\[str]**

&#x20;    **different\_missing\_values : Dict\[Any, int]**

&#x20;    **different\_missing\_values\_by\_column : Dict\[str, Dict\[Any, int]]**

&#x20;    **number\_of\_columns : int**

&#x20;    **number\_of\_columns\_with\_missing\_values : int**

&#x20;    **number\_of\_different\_missing\_values : int**

&#x20;    **number\_of\_different\_missing\_values\_by\_column : Dict\[str, int]**

&#x20;    **number\_of\_missing\_values : int**

&#x20;    **number\_of\_missing\_values\_by\_column : Dict\[str, int]**

&#x20;    **number\_of\_rows : int**

&#x20;    **number\_of\_rows\_with\_missing\_values : int**

&#x20;    **share\_of\_columns\_with\_missing\_values : float**

&#x20;    **share\_of\_missing\_values : float**

&#x20;    **share\_of\_missing\_values\_by\_column : Dict\[str, float]**

&#x20;    **share\_of\_rows\_with\_missing\_values : float**
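The dataset-level fields aggregate per-column counts. The sketch below shows one way they fit together, using a dictionary of equal-length columns instead of a DataFrame. It is hypothetical (`dataset_missing_summary` is not an Evidently function), treats only `None` as missing for brevity, and assumes that `share_of_missing_values` is relative to the total number of cells (rows times columns).

```python
from typing import Any, Dict, List


def dataset_missing_summary(columns: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Hypothetical helper deriving a subset of DatasetMissingValues fields."""
    names = list(columns)
    rows = len(columns[names[0]]) if names else 0
    by_column = {name: sum(v is None for v in values) for name, values in columns.items()}
    total = sum(by_column.values())
    cells = rows * len(names)  # assumption: the overall share is per cell
    # A row has missing values if any of its cells is missing
    rows_with_missing = sum(any(columns[name][i] is None for name in names) for i in range(rows))
    with_missing = [name for name in names if by_column[name] > 0]
    return {
        "number_of_rows": rows,
        "number_of_columns": len(names),
        "number_of_missing_values": total,
        "share_of_missing_values": total / cells if cells else 0.0,
        "number_of_missing_values_by_column": by_column,
        "share_of_missing_values_by_column": {n: c / rows if rows else 0.0 for n, c in by_column.items()},
        "number_of_rows_with_missing_values": rows_with_missing,
        "share_of_rows_with_missing_values": rows_with_missing / rows if rows else 0.0,
        "columns_with_missing_values": with_missing,
        "number_of_columns_with_missing_values": len(with_missing),
        "share_of_columns_with_missing_values": len(with_missing) / len(names) if names else 0.0,
    }


summary = dataset_missing_summary({
    "a": [1, None, 3, None],
    "b": ["x", "y", None, "z"],
    "c": [True, False, True, True],
})
```

Here 3 of the 12 cells are missing (a share of 0.25), but they are spread over 3 of the 4 rows, which is why the row-level and cell-level shares are reported separately.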

### class DatasetMissingValuesMetric(missing\_values: Optional\[list] = None, replace: bool = True)

Bases: [`Metric`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.Metric)\[`DatasetMissingValuesMetricResult`]

Count missing values in a dataset.

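By default, a missing value is a null or NaN value, an empty string, or positive or negative infinity (see `DEFAULT_MISSING_VALUES`).

The metric counts how many different kinds of missing values occur in the dataset and how many times each kind occurs. Null-like values such as `numpy.nan` and `pandas.NaT` are counted as a single kind.

You can set your own list of missing values with the `missing_values` parameter. A `None` value in the list means that pandas null values are included in the calculation. If `missing_values` is not set, `DEFAULT_MISSING_VALUES` is used.

If `replace` is `False`, the defaults are added to the user’s list. If `replace` is `True` (the default), only the values from the `missing_values` list are used.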

#### Attributes:

&#x20;    **DEFAULT\_MISSING\_VALUES = \['', inf, -inf, None]**

&#x20;    **missing\_values : frozenset**

#### Methods:

&#x20;    **calculate(data:** [**InputData**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.InputData)**)**

### class DatasetMissingValuesMetricRenderer(color\_options: Optional\[[ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)] = None)

Bases: [`MetricRenderer`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.renderers#evidently.renderers.base_renderer.MetricRenderer)

#### Attributes:

&#x20;    **color\_options :** [**ColorOptions**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)

#### Methods:

&#x20;    **render\_html(obj: DatasetMissingValuesMetric)**

&#x20;    **render\_json(obj: DatasetMissingValuesMetric)**

### class DatasetMissingValuesMetricResult(current: DatasetMissingValues, reference: Optional\[DatasetMissingValues] = None)

Bases: `object`

#### Attributes:

&#x20;    **current : DatasetMissingValues**

&#x20;    **reference : Optional\[DatasetMissingValues] = None**

## dataset\_summary\_metric module <a href="#module-evidently.metrics.data_integrity.dataset_summary_metric" id="module-evidently.metrics.data_integrity.dataset_summary_metric"></a>

### class DatasetSummary(target: Optional\[str], prediction: Optional\[Union\[str, Sequence\[str]]], date\_column: Optional\[str], id\_column: Optional\[str], number\_of\_columns: int, number\_of\_rows: int, number\_of\_missing\_values: int, number\_of\_categorical\_columns: int, number\_of\_numeric\_columns: int, number\_of\_datetime\_columns: int, number\_of\_constant\_columns: int, number\_of\_almost\_constant\_columns: int, number\_of\_duplicated\_columns: int, number\_of\_almost\_duplicated\_columns: int, number\_of\_empty\_rows: int, number\_of\_empty\_columns: int, number\_of\_duplicated\_rows: int, columns\_type: dict, nans\_by\_columns: dict, number\_uniques\_by\_columns: dict)

Bases: `object`

Column information for a dataset

#### Attributes:

&#x20;    **columns\_type : dict**

&#x20;    **date\_column : Optional\[str]**

&#x20;    **id\_column : Optional\[str]**

&#x20;    **nans\_by\_columns : dict**

&#x20;    **number\_of\_almost\_constant\_columns : int**

&#x20;    **number\_of\_almost\_duplicated\_columns : int**

&#x20;    **number\_of\_categorical\_columns : int**

&#x20;    **number\_of\_columns : int**

&#x20;    **number\_of\_constant\_columns : int**

&#x20;    **number\_of\_datetime\_columns : int**

&#x20;    **number\_of\_duplicated\_columns : int**

&#x20;    **number\_of\_duplicated\_rows : int**

&#x20;    **number\_of\_empty\_columns : int**

&#x20;    **number\_of\_empty\_rows : int**

&#x20;    **number\_of\_missing\_values : int**

&#x20;    **number\_of\_numeric\_columns : int**

&#x20;    **number\_of\_rows : int**

&#x20;    **number\_uniques\_by\_columns : dict**

&#x20;    **prediction : Optional\[Union\[str, Sequence\[str]]]**

&#x20;    **target : Optional\[str]**

### class DatasetSummaryMetric(almost\_duplicated\_threshold: float = 0.95, almost\_constant\_threshold: float = 0.95)

Bases: [`Metric`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.Metric)\[`DatasetSummaryMetricResult`]

Common characteristics of the columns (features) in the current dataset and, optionally, the reference dataset

#### Attributes:

&#x20;    **almost\_constant\_threshold : float**

&#x20;    **almost\_duplicated\_threshold : float**

#### Methods:

&#x20;    **calculate(data:** [**InputData**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/api-reference/evidently.metrics/..#evidently.metrics.base_metric.InputData)**)**
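The two thresholds decide when a column counts as almost constant and when two columns count as almost duplicated. The sketch below is one plausible reading, not Evidently's code: `count_almost_constant` and `count_almost_duplicated` are hypothetical names, and it assumes that a column is almost constant when its most frequent value covers at least `almost_constant_threshold` of the rows, and that two columns are almost duplicated when their values are equal in at least `almost_duplicated_threshold` of the rows (this sketch counts such column pairs). Both thresholds default to 0.95, as in the metric's signature.

```python
from collections import Counter
from itertools import combinations
from typing import Any, Dict, List


def count_almost_constant(columns: Dict[str, List[Any]], threshold: float = 0.95) -> int:
    """Hypothetical: columns whose most frequent value covers >= threshold of rows."""
    result = 0
    for values in columns.values():
        if values and Counter(values).most_common(1)[0][1] / len(values) >= threshold:
            result += 1
    return result


def count_almost_duplicated(columns: Dict[str, List[Any]], threshold: float = 0.95) -> int:
    """Hypothetical: column pairs whose values agree in >= threshold of rows."""
    result = 0
    for left, right in combinations(columns.values(), 2):
        if left and sum(a == b for a, b in zip(left, right)) / len(left) >= threshold:
            result += 1
    return result


data = {
    "a": [1] * 20,
    "b": [1] * 19 + [2],  # differs from "a" in 1 of 20 rows (95% agreement)
    "c": list(range(20)),
}
```

With these columns, `a` and `b` are both almost constant and form one almost-duplicated pair, while `c` is neither; raising either threshold above 0.95 would exclude `b`.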

### class DatasetSummaryMetricRenderer(color\_options: Optional\[[ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)] = None)

Bases: [`MetricRenderer`](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.renderers#evidently.renderers.base_renderer.MetricRenderer)

#### Attributes:

&#x20;    **color\_options :** [**ColorOptions**](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions)

#### Methods:

&#x20;    **render\_html(obj: DatasetSummaryMetric)**

&#x20;    **render\_json(obj: DatasetSummaryMetric)**

### class DatasetSummaryMetricResult(almost\_duplicated\_threshold: float, current: DatasetSummary, reference: Optional\[DatasetSummary] = None)

Bases: `object`

#### Attributes:

&#x20;    **almost\_duplicated\_threshold : float**

&#x20;    **current : DatasetSummary**

&#x20;    **reference : Optional\[DatasetSummary] = None**
