# evidently.utils

## Submodules

## data\_operations module <a href="#module-evidently.utils.data_operations" id="module-evidently.utils.data_operations"></a>

Methods for cleaning null or NaN values in a dataset.

### class DatasetColumns(utility\_columns: DatasetUtilityColumns, target\_type: Optional\[str], num\_feature\_names: List\[str], cat\_feature\_names: List\[str], datetime\_feature\_names: List\[str], target\_names: Optional\[List\[str]], task: Optional\[str])

Bases: `object`

#### Attributes:

&#x20;    **cat\_feature\_names : List\[str]**

&#x20;    **datetime\_feature\_names : List\[str]**

&#x20;    **num\_feature\_names : List\[str]**

&#x20;    **target\_names : Optional\[List\[str]]**

&#x20;    **target\_type : Optional\[str]**

&#x20;    **task : Optional\[str]**

&#x20;    **utility\_columns : DatasetUtilityColumns**

#### Methods:

&#x20;    **as\_dict()**

&#x20;    **get\_all\_columns\_list(skip\_id\_column: bool = False)**

List all columns.

&#x20;    **get\_all\_features\_list(cat\_before\_num: bool = True, include\_datetime\_feature: bool = False)**

Lists all feature names. By default, returns categorical features followed by numerical features and does not return any other columns. To change the order, set cat\_before\_num to False. To include datetime columns, set include\_datetime\_feature to True.

&#x20;    **get\_features\_len(include\_time\_columns: bool = False)**

Returns the number of features. It is useful for pagination in widgets. By default, sums the categorical and numerical features. To include datetime columns, set include\_time\_columns to True.
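The ordering and counting rules described for `get_all_features_list` and `get_features_len` can be sketched with a small stand-in class. `ColumnsSketch` is hypothetical and only mirrors the documented defaults; it is not evidently's implementation, and appending datetime features at the end is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnsSketch:
    """Hypothetical stand-in for DatasetColumns; holds only the feature name lists."""

    num_feature_names: List[str] = field(default_factory=list)
    cat_feature_names: List[str] = field(default_factory=list)
    datetime_feature_names: List[str] = field(default_factory=list)

    def get_all_features_list(
        self, cat_before_num: bool = True, include_datetime_feature: bool = False
    ) -> List[str]:
        # Categorical features come first by default, numerical features second.
        if cat_before_num:
            result = self.cat_feature_names + self.num_feature_names
        else:
            result = self.num_feature_names + self.cat_feature_names
        # Datetime features are added only on request (end placement is an assumption).
        if include_datetime_feature:
            result = result + self.datetime_feature_names
        return result

    def get_features_len(self, include_time_columns: bool = False) -> int:
        # Count categorical and numerical features; datetime features are optional.
        total = len(self.num_feature_names) + len(self.cat_feature_names)
        if include_time_columns:
            total += len(self.datetime_feature_names)
        return total


columns = ColumnsSketch(
    num_feature_names=["age", "income"],
    cat_feature_names=["city"],
    datetime_feature_names=["signup_date"],
)
print(columns.get_all_features_list())                      # ['city', 'age', 'income']
print(columns.get_all_features_list(cat_before_num=False))  # ['age', 'income', 'city']
print(columns.get_features_len())                           # 3
print(columns.get_features_len(include_time_columns=True))  # 4
```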

### class DatasetUtilityColumns(date: Optional\[str], id\_column: Optional\[str], target: Optional\[str], prediction: Union\[str, Sequence\[str], NoneType])

Bases: `object`

#### Attributes:

&#x20;    **date : Optional\[str]**

&#x20;    **id\_column : Optional\[str]**

&#x20;    **prediction : Optional\[Union\[str, Sequence\[str]]]**

&#x20;    **target : Optional\[str]**

#### Methods:

&#x20;    **as\_dict()**

### process\_columns(dataset: DataFrame, column\_mapping: [ColumnMapping](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.pipeline#evidently.pipeline.column_mapping.ColumnMapping))

### recognize\_column\_type(dataset: DataFrame, column\_name: str, columns: DatasetColumns)

Try to get the column type.

### recognize\_task(target\_name: str, dataset: DataFrame)

Tries to guess the task type from the target: if the target has a numeric type and more than 5 unique values, the task is ‘regression’; in all other cases the task is ‘classification’.

* **Parameters**
  * `target_name` – name of target column.
  * `dataset` – usually the data which you used in training.
* **Returns**

  Task parameter.
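The rule above can be illustrated on a plain Python sequence. This is a minimal sketch: `guess_task` and its `max_classes` parameter are hypothetical names, and the real function works on a pandas DataFrame column, so dtype handling there may differ.

```python
from numbers import Number
from typing import Sequence


def guess_task(values: Sequence[object], max_classes: int = 5) -> str:
    """Hypothetical helper mirroring the documented rule on a plain sequence."""
    # bool is a Number subclass in Python; treating it as non-numeric is an assumption.
    is_numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    if is_numeric and len(set(values)) > max_classes:
        return "regression"
    return "classification"


print(guess_task([0.1, 2.5, 3.7, 4.2, 5.9, 6.3]))  # regression: numeric, 6 unique values
print(guess_task([0, 1, 1, 0, 1, 0]))              # classification: only 2 unique values
print(guess_task(["cat", "dog", "cat"]))           # classification: not numeric
```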

### replace\_infinity\_values\_to\_nan(dataframe: DataFrame)

## data\_preprocessing module <a href="#module-evidently.utils.data_preprocessing" id="module-evidently.utils.data_preprocessing"></a>

### class ColumnDefinition(column\_name: str, column\_type: ColumnType)

Bases: `object`

#### Attributes:

&#x20;    **column\_name : str**

&#x20;    **column\_type : ColumnType**

### class ColumnPresenceState(value)

Bases: `Enum`

An enumeration.

#### Attributes:

&#x20;    **Missing = 2**

&#x20;    **Partially = 1**

&#x20;    **Present = 0**

### class ColumnType(value)

Bases: `Enum`

An enumeration.

#### Attributes:

&#x20;    **Categorical = 'cat'**

&#x20;    **Datetime = 'datetime'**

&#x20;    **Numerical = 'num'**

### class DataDefinition(columns: List\[ColumnDefinition], target: Optional\[ColumnDefinition], prediction\_columns: Optional\[PredictionColumns], id\_column: Optional\[ColumnDefinition], datetime\_column: Optional\[ColumnDefinition], task: Optional\[str], classification\_labels: Optional\[Sequence\[str]])

Bases: `object`

#### Methods:

&#x20;    **classification\_labels()**

&#x20;    **get\_columns(filter\_def: str = 'all')**

&#x20;    **get\_datetime\_column()**

&#x20;    **get\_id\_column()**

&#x20;    **get\_prediction\_columns()**

&#x20;    **get\_target\_column()**

&#x20;    **task()**

### class PredictionColumns(predicted\_values: Optional\[ColumnDefinition] = None, prediction\_probas: Optional\[List\[ColumnDefinition]] = None)

Bases: `object`

#### Attributes:

&#x20;    **predicted\_values : Optional\[ColumnDefinition] = None**

&#x20;    **prediction\_probas : Optional\[List\[ColumnDefinition]] = None**

#### Methods:

&#x20;    **get\_columns\_list()**

### create\_data\_definition(reference\_data: Optional\[DataFrame], current\_data: DataFrame, mapping: [ColumnMapping](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.pipeline#evidently.pipeline.column_mapping.ColumnMapping))

## generators module <a href="#module-evidently.utils.generators" id="module-evidently.utils.generators"></a>

### class BaseGenerator()

Bases: `Generic`\[`TObject`]

Base class for tests and metrics generator creation.

To create a new generator:

* inherit a class from the base class
* implement the generate method and return a list of test objects from it

A Suite or a Report will call the method and add generated tests to its list instead of the generator object.

You can use the columns\_info parameter in generate to get data structure meta info such as the list of columns.

For example, to create a test generator for the 50th, 90th, and 99th quantile tests for all numeric columns, with the default condition based on reference quantiles:

```python
>>> class TestQuantiles(BaseTestGenerator):
...    def generate(self, columns_info: DatasetColumns) -> List[TestColumnQuantile]:
...        return [
...            TestColumnQuantile(column_name=name, quantile=quantile)
...            for quantile in (0.5, 0.9, 0.99)
...            for name in columns_info.num_feature_names
...        ]
```

Do not forget to set the correct test type for the generate return value.
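How a Suite or a Report might swap a generator for the tests it produces can be sketched without evidently installed. All class and function names below (`QuantileTestSketch`, `GeneratorSketch`, `expand`) are hypothetical stand-ins, and the expansion loop is an assumption about the mechanism, not the library's code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Union


@dataclass
class QuantileTestSketch:
    """Hypothetical stand-in for a test object such as TestColumnQuantile."""

    column_name: str
    quantile: float


class GeneratorSketch(ABC):
    """Hypothetical stand-in for BaseGenerator."""

    @abstractmethod
    def generate(self, num_feature_names: List[str]) -> List[QuantileTestSketch]:
        ...


class QuantilesGenerator(GeneratorSketch):
    def generate(self, num_feature_names: List[str]) -> List[QuantileTestSketch]:
        return [
            QuantileTestSketch(column_name=name, quantile=q)
            for q in (0.5, 0.9, 0.99)
            for name in num_feature_names
        ]


def expand(
    items: List[Union[QuantileTestSketch, GeneratorSketch]],
    num_feature_names: List[str],
) -> List[QuantileTestSketch]:
    """Replace each generator in the list with the tests it generates."""
    expanded: List[QuantileTestSketch] = []
    for item in items:
        if isinstance(item, GeneratorSketch):
            expanded.extend(item.generate(num_feature_names))
        else:
            expanded.append(item)
    return expanded


tests = expand(
    [QuantileTestSketch("age", 0.25), QuantilesGenerator()],
    num_feature_names=["age", "income"],
)
print(len(tests))  # 7: one explicit test plus 3 quantiles x 2 columns
```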

#### Methods:

&#x20;    **abstract generate(columns\_info: DatasetColumns)**

### make\_generator\_by\_columns(base\_class: Type, columns: Optional\[Union\[str, list]] = None, parameters: Optional\[Dict] = None, skip\_id\_column: bool = False)

Create a test generator for a columns list with a test class.

Base class is specified with the base\_class parameter. If the test has no “column\_name” parameter, a TypeError will be raised.

The list of columns is defined with the columns parameter. If it is a list, it is used as the list of columns. If it is a string, it must be one of the following values:

* “all” - make tests for all columns, including target/prediction columns
* “num” - for numeric features
* “cat” - for categorical features
* “features” - for all features, excluding target/prediction columns

A None value is the same as “all”. If columns is a string that is not one of these values, a ValueError will be raised.

The parameters argument specifies other parameters for each object; it is the same for all generated objects.
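The selector rules can be sketched as a small resolver. Everything here is hypothetical: `ColumnTestSketch`, `resolve_columns`, `make_objects`, and the `info` dict (with "other" standing for target/prediction columns) are illustration-only names, and the column order within each selector is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Type, Union


@dataclass
class ColumnTestSketch:
    """Hypothetical test class that accepts a column_name parameter."""

    column_name: str
    threshold: float = 0.0


def resolve_columns(
    columns: Optional[Union[str, list]], info: Dict[str, List[str]]
) -> List[str]:
    """Map the documented selector values to column names."""
    if isinstance(columns, list):
        return columns
    if columns is None or columns == "all":
        return info["num"] + info["cat"] + info["other"]
    if columns == "num":
        return info["num"]
    if columns == "cat":
        return info["cat"]
    if columns == "features":
        return info["num"] + info["cat"]
    raise ValueError(f"Unknown columns selector: {columns!r}")


def make_objects(
    base_class: Type,
    columns: Optional[Union[str, list]],
    info: Dict[str, List[str]],
    parameters: Optional[dict] = None,
) -> list:
    # The same parameters are passed to every generated object.
    # A class without a column_name parameter raises TypeError here.
    parameters = parameters or {}
    return [
        base_class(column_name=name, **parameters)
        for name in resolve_columns(columns, info)
    ]


info = {"num": ["age"], "cat": ["city"], "other": ["target"]}
generated = make_objects(ColumnTestSketch, "features", info, parameters={"threshold": 0.5})
print([t.column_name for t in generated])  # ['age', 'city']
```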

## numpy\_encoder module <a href="#module-evidently.utils.numpy_encoder" id="module-evidently.utils.numpy_encoder"></a>

### class NumpyEncoder(\*, skipkeys=False, ensure\_ascii=True, check\_circular=True, allow\_nan=True, sort\_keys=False, indent=None, separators=None, default=None)

Bases: `JSONEncoder`

Numpy and Pandas data types to JSON types encoder

#### Methods:

&#x20;    **default(o)**

The JSON converter calls this method when it cannot convert an object to a Python type. Convert the object to a Python type. If the object cannot be converted, leave the default JSONEncoder behaviour: raise a TypeError exception.
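The `default` hook pattern can be shown with the standard library alone. In this sketch, `ExtendedEncoder` is a hypothetical class, and `Decimal`, `date`, and `set` stand in for the NumPy and Pandas types that NumpyEncoder handles; it illustrates the fallback behaviour, not the library's conversion table.

```python
import json
from datetime import date
from decimal import Decimal


class ExtendedEncoder(json.JSONEncoder):
    """Hypothetical encoder; Decimal, date, and set stand in for NumPy/Pandas types."""

    def default(self, o):
        # json calls this method only for objects it cannot serialize itself.
        if isinstance(o, Decimal):
            return float(o)
        if isinstance(o, date):
            return o.isoformat()
        if isinstance(o, set):
            return sorted(o)
        # Fall back to the base class, which raises TypeError.
        return super().default(o)


payload = {"score": Decimal("0.75"), "day": date(2023, 1, 31), "labels": {"b", "a"}}
encoded = json.dumps(payload, cls=ExtendedEncoder)
print(encoded)  # {"score": 0.75, "day": "2023-01-31", "labels": ["a", "b"]}
```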

## types module <a href="#module-evidently.utils.types" id="module-evidently.utils.types"></a>

Additional types, classes, dataclasses, etc.

### class ApproxValue(value: Union\[float, int], relative: Optional\[Union\[float, int]] = None, absolute: Optional\[Union\[float, int]] = None)

Bases: `object`

Class for approximate scalar value calculations

&#x20;    **property tolerance : Union\[float, int]**

#### Attributes:

&#x20;    **DEFAULT\_ABSOLUTE = 1e-12**

&#x20;    **DEFAULT\_RELATIVE = 1e-06**

&#x20;    **value : Union\[float, int]**

#### Methods:

&#x20;    **as\_dict()**
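A possible reading of how the `tolerance` property of ApproxValue combines the relative and absolute settings is sketched below. `ApproxSketch` is a hypothetical class; the rule "take the larger of the relative and absolute tolerances" and the `__eq__` comparison are assumptions, and only the two default constants come from the reference above.

```python
from typing import Optional, Union

Numeric = Union[float, int]


class ApproxSketch:
    """Hypothetical stand-in for ApproxValue; the tolerance formula is an assumption."""

    DEFAULT_RELATIVE = 1e-6
    DEFAULT_ABSOLUTE = 1e-12

    def __init__(
        self,
        value: Numeric,
        relative: Optional[Numeric] = None,
        absolute: Optional[Numeric] = None,
    ):
        self.value = value
        self._relative = self.DEFAULT_RELATIVE if relative is None else relative
        self._absolute = self.DEFAULT_ABSOLUTE if absolute is None else absolute

    @property
    def tolerance(self) -> Numeric:
        # Assumed rule: the larger of the relative and absolute tolerances wins.
        return max(abs(self.value) * self._relative, self._absolute)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, (int, float)):
            return NotImplemented
        return abs(other - self.value) <= self.tolerance


print(ApproxSketch(100.0, relative=0.1) == 105.0)  # True: the difference 5 is within 10
print(ApproxSketch(100.0) == 100.1)                # False: default tolerance is about 1e-4
```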

## visualizations module <a href="#module-evidently.utils.visualizations" id="module-evidently.utils.visualizations"></a>

### class Distribution(x: Union\[array, list], y: Union\[array, list])

Bases: `object`

#### Attributes:

&#x20;    **x : Union\[array, list]**

&#x20;    **y : Union\[array, list]**

### get\_distribution\_for\_category\_column(column: Series, normalize: bool = False)

### get\_distribution\_for\_column(\*, column\_type: str, current: Series, reference: Optional\[Series] = None)

### get\_distribution\_for\_numerical\_column(column: Series, bins: Optional\[Union\[list, array]] = None)

### make\_hist\_df(hist: Tuple\[array, array])

### make\_hist\_for\_cat\_plot(curr: Series, ref: Optional\[Series] = None, normalize: bool = False, dropna=False)

### make\_hist\_for\_num\_plot(curr: Series, ref: Optional\[Series] = None)

### plot\_boxes(curr\_for\_plots: dict, ref\_for\_plots: Optional\[dict], yaxis\_title: str, xaxis\_title: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

Accepts current and reference data as dicts with box parameters (“mins”, “lowers”, “uppers”, “means”, “maxs”) and the box names under the “values” key.

### plot\_cat\_cat\_rel(curr: DataFrame, ref: DataFrame, target\_name: str, feature\_name: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

Accepts current and reference data as pandas dataframes with two columns: feature\_name and “count\_objects”.

### plot\_cat\_feature\_in\_time(curr\_data: DataFrame, ref\_data: Optional\[DataFrame], feature\_name: str, datetime\_name: str, freq: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

Accepts current and reference data as pandas dataframes with two columns: datetime\_name and feature\_name.

### plot\_conf\_mtrx(curr\_mtrx, ref\_mtrx)

### plot\_distr(\*, hist\_curr, hist\_ref=None, orientation='v', color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_distr\_subplots(\*, hist\_curr, hist\_ref=None, xaxis\_name: str = '', yaxis\_name: str = '', same\_color: bool = False, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_distr\_with\_log\_button(curr\_data: DataFrame, curr\_data\_log: DataFrame, ref\_data: Optional\[DataFrame], ref\_data\_log: Optional\[DataFrame], color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_error\_bias\_colored\_scatter(curr\_scatter\_data: Dict\[str, Dict\[str, Series]], ref\_scatter\_data: Optional\[Dict\[str, Dict\[str, Series]]], color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_line\_in\_time(\*, curr: Dict\[str, Series], ref: Optional\[Dict\[str, Series]], x\_name: str, y\_name: str, xaxis\_name: str = '', yaxis\_name: str = '', color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_num\_feature\_in\_time(curr\_data: DataFrame, ref\_data: Optional\[DataFrame], feature\_name: str, datetime\_name: str, freq: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

Accepts current and reference data as pandas dataframes with two columns: datetime\_name and feature\_name.

### plot\_num\_num\_rel(curr: Dict\[str, list], ref: Optional\[Dict\[str, list]], target\_name: str, column\_name: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_pred\_actual\_time(\*, curr: Dict\[str, Series], ref: Optional\[Dict\[str, Series]], x\_name: str = 'x', xaxis\_name: str = '', yaxis\_name: str = '', color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_scatter(\*, curr: Dict\[str, Union\[list, Series]], ref: Optional\[Dict\[str, list]], x: str, y: str, xaxis\_name: Optional\[str] = None, yaxis\_name: Optional\[str] = None, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_scatter\_for\_data\_drift(curr\_y: list, curr\_x: list, y0: float, y1: float, y\_name: str, x\_name: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

### plot\_time\_feature\_distr(curr\_data: DataFrame, ref\_data: Optional\[DataFrame], feature\_name: str, color\_options: [ColorOptions](https://francesco.gitbook.io/docs.evidentlyai.com/reference/evidently.options#evidently.options.color_scheme.ColorOptions))

Accepts current and reference data as pandas dataframes with two columns: feature\_name and “number\_of\_items”.