# Core Concepts

This page explains the key features and concepts of Evidently.

## TL;DR

Evidently helps evaluate, test and monitor ML models in production.

* A **Metric** is a core component of Evidently. You can combine multiple **Metrics** in a **Report**. Reports are best for visual analysis and debugging of your models and data.
* A **Test** is a metric with a condition. Each test returns a pass or fail result. You can combine multiple **Tests** in a **Test Suite**. Test Suites are best for automated model checks as part of an ML pipeline.

For both Tests and Metrics, Evidently has **Presets**. These are pre-built combinations of metrics or checks that fit a specific use case.

* A **Snapshot** is a JSON version of a **Report** or a **Test Suite** that contains measurements and test results for a specific period. You can log Snapshots over time and run an Evidently Monitoring Dashboard for continuous monitoring.

## Metrics and Reports

### What is a Metric?

A **Metric** is a component that evaluates a specific aspect of the data or model quality.

A **Metric** can be, literally, a single metric (for example, `DatasetMissingValuesMetric()` returns the share of missing values). It can also be a combination of metrics (for example, `DatasetSummaryMetric()` calculates various descriptive statistics for the dataset). Metrics exist on the dataset level and on the column level.

Each **Metric** has a visual render. Some visualizations simply return the values:

![RegressionQualityMetric](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-64642f4a114cbc0d97a3c344398904e3a929ddcd%2Fmetric_example_regression_quality-min.png?alt=media)

Others have rich visualizations. Here is an example of a dataset-level Metric that evaluates the error of a regression model:

![RegressionErrorDistribution](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-db0fac2a5899d67db2cc5c8670f7182fcafdd27d%2Fmetric_example_error_distribution-min.png?alt=media)

Here is an example of a column-level Metric that evaluates the value range of a certain feature:

![ColumnValueRangeMetric](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-43d4ca17edb6c28aadaab652d424b65a2ba3fd8f%2Fmetric_example_value_range-min.png?alt=media)

Evidently contains 35+ **Metrics** (with 100+ different measurements) related to data quality, integrity, drift and model performance. You can also implement a custom one.

### What is a Report?

A **Report** is a combination of different Metrics that evaluate data or ML model quality.

You can display an interactive report inside a Jupyter notebook or export it as an HTML file:

![Data Drift report example](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-41d198e2706e14e1a06b53f136be2bd03b5f1c70%2Freport_example_data_drift-min.png?alt=media)

The Report output is also available as JSON or as a Python dictionary. This "text" version returns the calculated values and, optionally, some other useful information such as histogram bins. You can also define what to include. Example:

```python
{'timestamp': '2022-10-26 17:46:47.214403',
 'metrics': {'DatasetDriftMetric': {'threshold': 0.5,
   'number_of_columns': 15,
   'number_of_drifted_columns': 5,
   'share_of_drifted_columns': 0.3333333333333333,
   'dataset_drift': False},
  'DataDriftTable': {'number_of_columns': 15,
   'number_of_drifted_columns': 5,
   'share_of_drifted_columns': 0.3333333333333333,
   'dataset_drift': False,
   'drift_by_columns': {'age': {'column_name': 'age',
     'column_type': 'num',
     'stattest_name': 'Wasserstein distance (normed)',
     'drift_score': 0.18534692319042428,
     'drift_detected': True,
     'threshold': 0.1}}}}}
```
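Once you have the dictionary, you can pull values out of it with plain Python. The sketch below assumes the output has exactly the shape shown above. The variable name `report_dict` and the helper `drifted_columns` are made up for this illustration, and the layout may differ between Evidently versions, so check your own output first.

```python
# A trimmed copy of the example output above.
report_dict = {
    "timestamp": "2022-10-26 17:46:47.214403",
    "metrics": {
        "DatasetDriftMetric": {
            "threshold": 0.5,
            "number_of_columns": 15,
            "number_of_drifted_columns": 5,
            "share_of_drifted_columns": 0.3333333333333333,
            "dataset_drift": False,
        },
        "DataDriftTable": {
            "drift_by_columns": {
                "age": {
                    "column_name": "age",
                    "stattest_name": "Wasserstein distance (normed)",
                    "drift_score": 0.18534692319042428,
                    "drift_detected": True,
                    "threshold": 0.1,
                }
            }
        },
    },
}


def drifted_columns(result: dict) -> list[str]:
    """Return the names of columns flagged as drifted (hypothetical helper)."""
    by_column = result["metrics"]["DataDriftTable"]["drift_by_columns"]
    return [name for name, info in by_column.items() if info["drift_detected"]]


summary = report_dict["metrics"]["DatasetDriftMetric"]
print("Dataset drift detected:", summary["dataset_drift"])
print("Share of drifted columns:", round(summary["share_of_drifted_columns"], 2))
print("Drifted columns in this excerpt:", drifted_columns(report_dict))
```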

You can also export the Report output as an Evidently `snapshot`. This is a more complete JSON that allows recreating the original HTML Report. Use this option if you want to enable logging and continuous monitoring of the model or data performance.

You can read more about Monitoring here:

{% content-ref url="../user-guide/monitoring/monitoring\_overview" %}
[monitoring\_overview](https://francesco.gitbook.io/docs.evidentlyai.com/user-guide/monitoring/monitoring_overview)
{% endcontent-ref %}

You can calculate most Reports for a single dataset. If you pass two datasets, they will show a side-by-side comparison.

You can generate a Report by listing individual **Metrics** to include in it. You can also run one of the **Presets** that cover a specific aspect of the model or data performance.

### What is a Metric Preset?

A **Metric Preset** is a pre-built Report that combines Metrics for a particular use case.

You can think of it as a template. For example, there is a Preset to check for Data Drift (`DataDriftPreset`), Data Quality (`DataQualityPreset`), or Regression Performance (`RegressionPreset`).

![Evidently Reports](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-a8ab3ebb39a6dc1e624db45cde83454ab00bd79d%2Fevidently_reports_min.png?alt=media)

You can explore all Metrics and Presets here:

{% content-ref url="../reference/all-metrics" %}
[all-metrics](https://francesco.gitbook.io/docs.evidentlyai.com/reference/all-metrics)
{% endcontent-ref %}

### When to use Reports

You can use Reports at different stages of the ML lifecycle: from exploratory data analysis and model validation to production monitoring and debugging.

**Debugging and exploration**. Reports are best for visual analysis of the data or model performance. For example, use them when evaluating model quality on the training set, debugging model quality decay, or comparing two models.

**Metric logging**. You can also add a model or data evaluation step in the ML pipeline, get outputs as a JSON or an Evidently `snapshot` and log it to later visualize and track model and data performance over time.

**Reporting and documentation**. You can also use Evidently reports to share results with the team and stakeholders or log them as documentation. For example, you can record the model performance results after training.
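For the metric logging scenario above, the idea of recording one result per run and reading the history back later can be sketched with the standard library alone. This is not Evidently's `snapshot` mechanism: the JSON Lines file, the `log_result` and `load_history` helpers, and the `logged_at` field are assumptions made for this illustration.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_result(log_path: Path, result: dict) -> None:
    """Append one evaluation result to the log as a single JSON line."""
    record = {"logged_at": datetime.now(timezone.utc).isoformat(), **result}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_history(log_path: Path) -> list[dict]:
    """Read all logged results back, oldest first."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Two pipeline runs, each logging the value it computed.
log_path = Path(tempfile.mkdtemp()) / "drift_log.jsonl"
log_result(log_path, {"share_of_drifted_columns": 0.20})
log_result(log_path, {"share_of_drifted_columns": 0.33})

history = load_history(log_path)
print([row["share_of_drifted_columns"] for row in history])
```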

## Tests and Test Suites

### What is a Test?

Tests help perform structured data and ML model performance checks. They explicitly verify expectations about your data and model.

A **Test** is a Metric with a condition. It calculates a value and compares it against the defined threshold.

If the condition is satisfied, the test returns a **success**.

If you choose to get a visual output with the test results, you will see the current value of the metric and the test condition. When you expand a test, you will get a supporting visualization.

Here is an example of a column-level Test that evaluates the mean value stability:

![Mean value stability test](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-dae87be054d3fbd10ac3d8dd1d7f4bed0391ce4d%2Ftest_example_success_data-min.png?alt=media)

Here is an example of a dataset-level Test that evaluates model error:

![Root mean square error test](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-079217129a6df723597a77eaaf840db2d6666a35%2Ftest_example_success_model-min.png?alt=media)

If the condition is not satisfied, the Test returns a **fail**:

![Data drift per feature test](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-da1fd7901212be4299dc4b5ab05e5ad12c3170b5%2Ftest_example_fail-min.png?alt=media)

If the Test execution fails, it will return an error.
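To make the "Metric with a condition" idea concrete, here is a plain-Python sketch. It does not use or reproduce Evidently's implementation: `share_of_missing`, `run_test`, and the status strings are hypothetical names chosen for this illustration.

```python
from typing import Callable, Optional, Sequence


def share_of_missing(values: Sequence[Optional[float]]) -> float:
    """Metric: the share of missing (None) entries in a column."""
    return sum(v is None for v in values) / len(values)


def run_test(metric: Callable[[], float], condition: Callable[[float], bool]) -> str:
    """Test: compute the metric, then compare it against the condition."""
    try:
        value = metric()
    except Exception:
        # The check itself could not be executed (for example, empty data).
        return "ERROR"
    return "SUCCESS" if condition(value) else "FAIL"


column = [1.0, None, 3.0, 4.0]  # 1 of 4 values is missing

print(run_test(lambda: share_of_missing(column), lambda v: v <= 0.3))  # SUCCESS
print(run_test(lambda: share_of_missing(column), lambda v: v <= 0.1))  # FAIL
print(run_test(lambda: share_of_missing([]), lambda v: v <= 0.1))      # ERROR
```

The same metric yields different outcomes depending only on the condition attached to it, which is the distinction between a Metric and a Test.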

Evidently contains 70+ individual Tests that cover different aspects of model and data quality.

You can set test conditions on your own or pass the reference dataset to auto-generate test conditions. You can also run most of the Tests using defaults even if you do not pass a reference dataset: the tests will use heuristics and dummy models.
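The sketch below illustrates the idea of deriving a test condition from a reference dataset. It is a conceptual example only: the helper name `mean_condition_from_reference` and the 10% tolerance are assumptions made for this illustration, not a description of the defaults Evidently applies.

```python
from statistics import mean
from typing import Callable, Sequence


def mean_condition_from_reference(
    reference: Sequence[float], tolerance: float = 0.1
) -> Callable[[float], bool]:
    """Build a condition: the current mean must stay within
    +/- tolerance (relative) of the reference mean."""
    ref_mean = mean(reference)
    allowed = abs(ref_mean) * tolerance

    def condition(current_mean: float) -> bool:
        return abs(current_mean - ref_mean) <= allowed

    return condition


reference_ages = [30, 35, 40, 45, 50]  # reference mean is 40
condition = mean_condition_from_reference(reference_ages)

print(condition(mean([38, 41, 43])))  # within 40 +/- 4 -> True
print(condition(mean([50, 55, 60])))  # mean 55, outside 40 +/- 4 -> False
```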

### What is a Test Suite?

In most cases, you’d want to run more than one check.

You can list multiple Tests and execute them together in a **Test Suite**. You will see a summary of the results:

![Custom test suite example](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-4566c0f296f765cf3eef8081e353c8b6f6aa4286%2Ftest_suite_example-min.png?alt=media)

If you include a lot of Tests, you can navigate the output by groups:

![No target performance test suite example](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-219a2872a5cd09c4fe5c9c062a6e47a513f59b03%2Ftest_suite_navigation-min.png?alt=media)

Test output is available as an interactive HTML report, JSON, a Python dictionary, or Evidently `snapshots` for logging and monitoring.

You can create your Test Suite from individual Tests or use one of the existing **Presets**.

### What is a Test Preset?

A **Test Preset** is a pre-built Test Suite that combines checks for a particular use case.

You can think of it as a template to start with. For example, there is a Preset to check for Data Quality (`DataQualityTestPreset`), Data Stability (`DataStabilityTestPreset`), or Regression model performance (`RegressionTestPreset`).

![Regression performance test suite example](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-764e3c32682ba2c2914bf1bacd0d3070c2156f14%2Ftest_preset_example-min.png?alt=media)

You can explore all Tests and Presets here:

{% content-ref url="../reference/all-tests" %}
[all-tests](https://francesco.gitbook.io/docs.evidentlyai.com/reference/all-tests)
{% endcontent-ref %}

### When to use Test Suites

For **test-based monitoring** of production ML models: tests are best suited for integration in ML prediction pipelines. You can easily integrate Evidently Tests with workflow management tools like Airflow.

You can use them to perform batch checks for your data or models.

For example, you can run the tests when you:

* get a new batch of the input data
* generate a new set of predictions
* receive a new batch of the labeled data
* want to check on your model on a schedule

![Model lifecycle](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-adb55e3ac3c2840ffbc846a7c31b4ff1b82a1be7%2Ftest_suite_lifecycle-min.png?alt=media)

You can then build a conditional workflow based on the result of the tests: for example, generate a visual report for debugging, trigger model retraining, or send an alert. You can also visualize the test results over time in the Evidently Monitoring UI.

![Evidently ML monitoring dashboard](https://3833155839-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FQkv2FmdD03Bpw5VKJjIF%2Fuploads%2Fgit-blob-74cbf354caa649bce03758b2cc5fd27345b1406e%2Fevidently_ml_monitoring_main.png?alt=media)
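The conditional workflow described above can be sketched as follows. The result structure (a list of dictionaries with `name` and `status` keys), the status strings, and the `decide_action` helper are hypothetical and serve only to show the branching logic. Adapt them to the output format you actually export from your Test Suite.

```python
def decide_action(test_results: list[dict]) -> str:
    """Map test outcomes to a follow-up step in the pipeline."""
    statuses = {result["status"] for result in test_results}
    if "ERROR" in statuses:
        # A check could not run at all: ask a human to look at it.
        return "send_alert"
    if "FAIL" in statuses:
        # An expectation was violated: produce visuals for debugging.
        return "generate_debug_report"
    return "continue"


results = [
    {"name": "mean value stability", "status": "SUCCESS"},
    {"name": "data drift per feature", "status": "FAIL"},
]

print(decide_action(results))  # generate_debug_report
```

In a workflow manager, the returned string would typically select the next task to run.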

**During model development**: you can also use tests during model development and validation. For example, you can run tests to evaluate the data quality of the new feature set or to compare test performance to training.

## Test Suites or Reports?

**Reports** and **Test Suites** are complementary interfaces.

**Reports** are best for debugging, exploratory and ad hoc analytics. They focus on interactive visualizations and do not require setting any expectations upfront. You can use them, for example, when you have just put a model in production and want to closely monitor the performance. It is best to use Reports on smaller datasets or sample your data first.

**Test Suites** are best for automation. Use them when you can set up expectations upfront (or derive them from the reference dataset). Tests force you to think through what you expect from your data and models, and you can run them at scale, only reacting to the failure alerts. You can use Test Suites on larger datasets since they do not include heavy visuals.

You can also use both Reports and Test Suites. For example, run tests for automated model checks and if tests fail, use Reports for visual debugging.