verona.evaluation.metrics package

verona.evaluation.metrics.event module

verona.evaluation.metrics.event.get_accuracy(predictions: array, ground_truths: array, preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → (float, int, int)

Calculates the accuracy score and returns the ratio of correct predictions, the number of correct predictions, and the total number of predictions. Both predictions and ground truths can be given as labels or as one-hot vectors.

Parameters:

predictions (np.array) – NumPy Array containing the model’s predictions.
ground_truths (np.array) – NumPy Array containing the ground truths.
preds_format (Literal['labels', 'onehot']) – Format of the predictions. 'labels' for labels and 'onehot' for one-hot vectors.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths. 'labels' for labels and 'onehot' for one-hot vectors.

Returns:

Float indicating the accuracy ratio, integer for the number of correct predictions, and integer for the total number of predictions.

Return type:

tuple

Examples

>>> import numpy as np
>>> from verona.evaluation.metrics.event import get_accuracy
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_onehot = np.array([[0.2, 0.7, 0.06, 0.04], [0.1, 0.2, 0.6, 0.1], [0.9, 0.05, 0.04, 0.01], [0.1, 0.5, 0.3, 0.1]])
>>> accuracy, correct, total = get_accuracy(preds_onehot, ground_truth, preds_format='onehot', gt_format='labels')
>>> print(f'{accuracy} - {correct} - {total}')
0.25 - 1 - 4
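The handling of the two input formats can be illustrated with a minimal NumPy sketch. The names `to_labels` and `accuracy_sketch` are hypothetical and not part of verona; the sketch assumes that 'onehot' inputs are reduced to labels by taking the index of the highest value in each row.

```python
import numpy as np


def to_labels(values: np.ndarray, fmt: str) -> np.ndarray:
    """Return class labels; 'onehot' rows are reduced with argmax (assumption)."""
    if fmt == "onehot":
        return np.argmax(values, axis=1)
    if fmt == "labels":
        return np.asarray(values)
    raise ValueError(f"unknown format: {fmt!r}")


def accuracy_sketch(predictions, ground_truths, preds_format, gt_format):
    """Hypothetical helper: (ratio, number correct, total) for two label arrays."""
    preds = to_labels(predictions, preds_format)
    truths = to_labels(ground_truths, gt_format)
    correct = int(np.sum(preds == truths))
    total = int(truths.shape[0])
    return correct / total, correct, total


ground_truth = np.array([1, 3, 4, 0])
preds_onehot = np.array([[0.2, 0.7, 0.06, 0.04],
                         [0.1, 0.2, 0.6, 0.1],
                         [0.9, 0.05, 0.04, 0.01],
                         [0.1, 0.5, 0.3, 0.1]])

accuracy, correct, total = accuracy_sketch(
    preds_onehot, ground_truth, preds_format="onehot", gt_format="labels"
)
print(accuracy, correct, total)  # 0.25 1 4
```

The argmax of the four rows is `[1, 2, 0, 1]`, so only the first prediction matches its ground truth.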

verona.evaluation.metrics.event.get_brier_loss(predictions: array, ground_truths: array, gt_format: Literal['labels', 'onehot']) → float

Calculates the Brier Score Loss adapted to multi-class predictions. The formula for the Brier Score Loss is

\[ \text{BSL} = \frac{1}{N} \sum_{i=1}^{N} (f_i - o_i)^2 \]

where \( f_i \) is the predicted probability for the true class for observation \( i \), \( o_i \) is the actual outcome for observation \( i \) (1 if true class, 0 otherwise), and \( N \) is the total number of observations.

As a measure of loss, the closer to 0, the better the predictions, while higher values indicate worse predictions.

Parameters:

predictions (np.array) – Array of shape (n_samples, n_classes) containing the model's predictions as probabilities.
ground_truths (np.array) – Array containing the ground truths.
gt_format (Literal['labels', 'onehot']) – Format of the ground truth. If 'labels', the ground truth array contains the labels of the correct activities/attributes, from which the one-hot vectors are internally extracted. If 'onehot', the ground truth array contains the one-hot representation of the correct values.

Returns:

float: Brier Score Loss, a value greater than or equal to zero. Values close to 0 indicate smaller error (better predictions), and larger values indicate larger error (worse predictions).

Examples:

>>> import numpy as np
>>> from verona.evaluation.metrics import event
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_onehot = np.array([[0.2, 0.7, 0.06, 0.04], [0.1, 0.2, 0.6, 0.1], [0.9, 0.05, 0.04, 0.01], [0.1, 0.5, 0.3, 0.1]])
>>> brier_loss = event.get_brier_loss(preds_onehot, ground_truth, gt_format='labels')
>>> print(brier_loss)
1.06235
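As a rough illustration of a multi-class Brier loss, the sketch below one-hot encodes the labels and averages, over observations, the squared differences summed across all classes. This is one common multi-class formulation and an assumption here; it is not confirmed to be what `get_brier_loss` computes internally, and it is not meant to reproduce the value printed above. The name `brier_loss_sketch` is hypothetical, and labels are assumed to be integer indices in the range 0 to n_classes - 1.

```python
import numpy as np


def brier_loss_sketch(probs: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical multi-class Brier loss: mean over samples of the
    squared differences summed across classes (assumed formulation)."""
    n_samples, n_classes = probs.shape
    # Build the one-hot outcome matrix o from integer class labels.
    onehot = np.zeros((n_samples, n_classes))
    onehot[np.arange(n_samples), labels] = 1.0
    per_sample = np.sum((probs - onehot) ** 2, axis=1)
    return float(np.mean(per_sample))


probs = np.array([[0.5, 0.5],   # undecided prediction, true class 0
                  [1.0, 0.0]])  # fully confident and correct
labels = np.array([0, 0])
print(brier_loss_sketch(probs, labels))  # 0.25
```

Under this formulation a fully confident wrong prediction contributes 2 to the sum, so the loss lies between 0 and 2.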

verona.evaluation.metrics.event.get_f1_score(predictions: array, ground_truths: array, average: Literal['micro', 'macro', 'weighted'], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → (float, float, float)

Calculates the F1-score, the harmonic mean of precision and recall, between the predictions and the ground truth. Equivalent to the F-beta score with beta = 1. Returns the F1-score together with the precision and recall used for the calculation.

Parameters:

predictions (np.array) – NumPy Array containing the model’s predictions.
ground_truths (np.array) – NumPy Array containing the ground truths.
average (Literal['micro', 'macro', 'weighted']) – Type of averaging to be performed on data.
preds_format (Literal['labels', 'onehot']) – Format of the predictions. 'labels' for labels and 'onehot' for one-hot vectors.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths. 'labels' for labels and 'onehot' for one-hot vectors.

Returns:

Float for the F1-score, float for the precision, and float for the recall.

Return type:

tuple

Examples

>>> import numpy as np
>>> from verona.evaluation.metrics.event import get_f1_score
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_labels = np.array([1, 2, 0, 1])
>>> f1, precision, recall = get_f1_score(preds_labels, ground_truth, average='macro', preds_format='labels', gt_format='labels')
>>> print(f'{f1} - {precision} - {recall}')
0.13333333333333333 - 0.1 - 0.2

verona.evaluation.metrics.event.get_fbeta(predictions: array, ground_truths: array, beta: float, average: Literal['micro', 'macro', 'weighted'], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → (float, float, float)

Calculates the F-beta score between the predictions and the ground truth. The F-beta score is the weighted harmonic mean of precision and recall.

Parameters:

predictions (np.array) – NumPy Array containing the model’s predictions.
ground_truths (np.array) – NumPy Array containing the ground truths.
beta (float) – Ratio of recall importance to precision importance.
average (Literal['micro', 'macro', 'weighted']) – Type of averaging to be performed on data.
preds_format (Literal['labels', 'onehot']) – Format of the predictions. 'labels' for labels and 'onehot' for one-hot vectors.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths. 'labels' for labels and 'onehot' for one-hot vectors.

Returns:

Float for the F-beta score, float for the precision, and float for the recall.

Return type:

tuple

Examples

>>> import numpy as np
>>> from verona.evaluation.metrics.event import get_fbeta
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_labels = np.array([1, 2, 0, 1])
>>> fbeta, precision, recall = get_fbeta(preds_labels, ground_truth, beta=0.5, average='weighted', preds_format='labels', gt_format='labels')
>>> print(f'{fbeta} - {precision} - {recall}')
0.1388888888888889 - 0.125 - 0.25
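The effect of beta is easiest to see on a single precision/recall pair. The sketch below uses the standard F-beta definition, (1 + beta²) · P · R / (beta² · P + R); the function name `fbeta_from_pr` is hypothetical, and returning 0 when the denominator is zero is a convention chosen for this sketch.

```python
def fbeta_from_pr(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall (standard F-beta formula)."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # convention for this sketch when the score is undefined
    return (1 + b2) * precision * recall / denominator


# Same precision and recall, three different values of beta.
for beta in (0.5, 1.0, 2.0):
    print(beta, round(fbeta_from_pr(0.5, 1.0, beta), 3))
```

With precision 0.5 and recall 1.0, beta = 0.5 leans towards precision and gives about 0.556, beta = 1 gives the harmonic mean of about 0.667, and beta = 2 leans towards recall and gives about 0.833.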

verona.evaluation.metrics.event.get_mcc(predictions: array, ground_truths: array, preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → float

Calculates the Matthews correlation coefficient (MCC), a value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction, and -1 an inverse prediction.

Parameters:

predictions (np.array) – Array of predictions from the model.
ground_truths (np.array) – Array of ground truth labels.
preds_format (Literal['labels', 'onehot']) – Format of the predictions.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths.

Returns:

Matthews Correlation Coefficient, between -1 and +1.

Return type:

float

Examples

>>> import numpy as np
>>> from verona.evaluation.metrics import event
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_labels = np.array([1, 2, 0, 1])
>>> mcc = event.get_mcc(preds_labels, ground_truth, preds_format='labels', gt_format='labels')
>>> print(mcc)
0.09128709291752768

verona.evaluation.metrics.event.get_precision(predictions: array, ground_truths: array, average: Literal['micro', 'macro', 'weighted'], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → float

Calculates the precision using the formula

\[ \frac{\text{tp}}{\text{tp} + \text{fp}} \]

where 'tp' is the number of true positives and 'fp' the number of false positives.

Parameters:

predictions (np.array) – Array of predictions from the model.
ground_truths (np.array) – Array of ground truth labels.
average (Literal['micro', 'macro', 'weighted']) – Type of averaging performed on the data.
preds_format (Literal['labels', 'onehot']) – Format of the predictions.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths.

Returns:

float: Precision score between 0 and 1.

Examples:

>>> import numpy as np
>>> from verona.evaluation.metrics.event import get_precision
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_labels = np.array([1, 2, 0, 1])
>>> precision = get_precision(preds_labels, ground_truth, average='macro', preds_format='labels', gt_format='labels')
>>> print(precision)
0.1

verona.evaluation.metrics.event.get_recall(predictions: array, ground_truths: array, average: Literal['micro', 'macro', 'weighted'], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot']) → float

Calculates the recall using the formula

\[ \frac{\text{tp}}{\text{tp} + \text{fn}} \]

where 'tp' is the number of true positives and 'fn' the number of false negatives.

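Parameters:

predictions (np.array) – Array of predictions from the model.
ground_truths (np.array) – Array of ground truth labels.
average (Literal['micro', 'macro', 'weighted']) – Type of averaging performed on the data.
preds_format (Literal['labels', 'onehot']) – Format of the predictions.
gt_format (Literal['labels', 'onehot']) – Format of the ground truths.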

Returns:

float: Recall score between 0 and 1.

Examples:

>>> import numpy as np
>>> from verona.evaluation.metrics.event import get_recall
>>> ground_truth = np.array([1, 3, 4, 0])
>>> preds_labels = np.array([1, 2, 0, 1])
>>> recall = get_recall(preds_labels, ground_truth, average='macro', preds_format='labels', gt_format='labels')
>>> print(recall)
0.2
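To make the 'macro' values in the precision and recall examples easier to follow, here is a minimal sketch that counts tp, fp, and fn per class and then takes the unweighted mean over classes. The name `macro_precision_recall` is hypothetical. Treating a class with an empty denominator as scoring 0, and taking the class set as the union of predicted and true labels, are assumptions of this sketch.

```python
import numpy as np


def macro_precision_recall(preds: np.ndarray, truths: np.ndarray):
    """Hypothetical helper: unweighted mean of per-class precision and recall."""
    classes = np.union1d(preds, truths)  # every label seen on either side
    precisions, recalls = [], []
    for c in classes:
        tp = np.sum((preds == c) & (truths == c))
        fp = np.sum((preds == c) & (truths != c))
        fn = np.sum((preds != c) & (truths == c))
        # A class with no predictions (or no true samples) scores 0 here.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 0.0)
    return float(np.mean(precisions)), float(np.mean(recalls))


ground_truth = np.array([1, 3, 4, 0])
preds_labels = np.array([1, 2, 0, 1])
precision, recall = macro_precision_recall(preds_labels, ground_truth)
print(round(precision, 3), round(recall, 3))  # 0.1 0.2
```

Only class 1 has a true positive: it is predicted twice and correct once (precision 0.5), and its single true sample is recovered (recall 1.0). Averaging over the five classes 0 to 4 gives 0.1 and 0.2.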

verona.evaluation.metrics.suffix module

verona.evaluation.metrics.suffix.get_damerau_levenshtein_score(predictions: list[array], ground_truths: list[array], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot'], eoc: str | int = None) → float

Calculates the Damerau-Levenshtein score between the predictions and the real values.

The Damerau-Levenshtein distance is the number of insertions, deletions, substitutions, and transpositions required to change the first sequence into the second. In this function, the distance is normalized by the length of the longer sequence, and the score is obtained by subtracting the normalized distance from 1.

Parameters:

predictions (list[np.array]) – List containing the predicted suffixes as NumPy Arrays.
ground_truths (list[np.array]) – List containing the ground truth suffixes as NumPy Arrays.
preds_format (Literal['labels', 'onehot']) – Format of the predictions. If 'labels', the predictions array contains the labels of the activities/attributes predicted. If 'onehot', the predictions array contains vectors of probabilities, and the labels are internally extracted based on the highest value element for the metric calculation.
gt_format (Literal['labels', 'onehot']) – Format of the ground truth. If 'labels', the ground truth array contains the labels of the correct activities/attributes. If 'onehot', the ground truth array contains the one-hot representation of the correct values, and the labels are internally extracted for the metric calculation.
eoc (Union[str, int], optional) – Label of the End-of-Case (EOC) which is an element that signifies the end of the trace/suffix.

Returns:

Damerau-Levenshtein score between 0 and 1. A lower value indicates worse suffix prediction, whereas a higher value indicates a prediction closer to the actual suffix.

Return type:

float

Examples

>>> import numpy as np
>>> from verona.evaluation.metrics import suffix
>>> ground_truths = [np.array([0, 1, 2, 3, 4])]
>>> predictions = [np.array([0, 12, 2])]
>>> dl_score = suffix.get_damerau_levenshtein_score(predictions, ground_truths, preds_format='labels', gt_format='labels')
>>> print(dl_score)
0.4
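The normalization described above can be sketched for a single prediction/ground-truth pair. The sketch uses the optimal string alignment variant of the Damerau-Levenshtein distance (adjacent transpositions only), which is an assumption and may differ from the variant verona uses; it also ignores the eoc parameter. The names `osa_distance` and `dl_score_sketch` are hypothetical.

```python
import numpy as np


def osa_distance(a, b) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent elements."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]


def dl_score_sketch(pred, truth) -> float:
    """1 minus the distance normalized by the length of the longer sequence."""
    longest = max(len(pred), len(truth))
    if longest == 0:
        return 1.0  # two empty sequences are identical
    return 1.0 - osa_distance(pred, truth) / longest


score = dl_score_sketch(np.array([0, 12, 2]), np.array([0, 1, 2, 3, 4]))
print(round(score, 3))  # 0.4
```

Turning `[0, 12, 2]` into `[0, 1, 2, 3, 4]` takes one substitution and two insertions, so the distance is 3, the normalized distance is 3/5, and the score is 0.4.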

verona.evaluation.metrics.time module

verona.evaluation.metrics.time.get_mae(predictions: array, ground_truths: array, reduction: Literal['mean', 'none'] = 'mean') → float | array

Calculates the Mean Absolute Error (MAE) between the predicted and real times.

Parameters:

predictions (np.array) – NumPy Array containing the predicted times as floats.
ground_truths (np.array) – NumPy Array containing the real times as floats.
reduction (Literal['mean', 'none'], optional) – Determines the type of reduction applied to the MAE. If 'mean', calculates the average MAE for all pairs of prediction and ground truth. If 'none', returns all MAE values for the individual pairs without reduction. Default is 'mean'.

Returns:

MAE as a single float if reduction is 'mean', or as a NumPy Array if reduction is 'none'.

Return type:

Union[float, np.array]

verona.evaluation.metrics.time.get_mse(predictions: array, ground_truths: array, reduction: Literal['mean', 'none'] = 'mean') → float | array

Calculates the Mean Square Error (MSE) between the predicted and real times.

Parameters:

predictions (np.array) – NumPy Array containing the predicted times as floats.
ground_truths (np.array) – NumPy Array containing the real times as floats.
reduction (Literal['mean', 'none'], optional) – Determines the type of reduction applied to the MSE. If 'mean', calculates the average MSE for all pairs of prediction and ground truth. If 'none', returns all MSE values for the individual pairs without reduction. Default is 'mean'.

Returns:

MSE as a single float if reduction is 'mean', or as a NumPy Array if reduction is 'none'.

Return type:

Union[float, np.array]
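The reduction parameter of the two time metrics can be illustrated with a short NumPy sketch. The names `mae_sketch` and `mse_sketch` are hypothetical stand-ins rather than the verona functions, and the input values are made up for illustration.

```python
import numpy as np


def _check_reduction(reduction: str) -> None:
    if reduction not in ("mean", "none"):
        raise ValueError(f"unknown reduction: {reduction!r}")


def mae_sketch(predictions, ground_truths, reduction="mean"):
    """Absolute errors per pair, averaged when reduction is 'mean'."""
    _check_reduction(reduction)
    errors = np.abs(np.asarray(predictions, dtype=float)
                    - np.asarray(ground_truths, dtype=float))
    return float(errors.mean()) if reduction == "mean" else errors


def mse_sketch(predictions, ground_truths, reduction="mean"):
    """Squared errors per pair, averaged when reduction is 'mean'."""
    _check_reduction(reduction)
    errors = (np.asarray(predictions, dtype=float)
              - np.asarray(ground_truths, dtype=float)) ** 2
    return float(errors.mean()) if reduction == "mean" else errors


pred_times = np.array([2.0, 4.0, 7.0, 10.0])  # illustrative values
real_times = np.array([1.0, 4.0, 5.0, 9.0])

print(mae_sketch(pred_times, real_times))                    # 1.0
print(mse_sketch(pred_times, real_times))                    # 1.5
print(mae_sketch(pred_times, real_times, reduction="none"))  # per-pair errors
```

With 'none', the caller receives one error per pair (here 1, 0, 2, and 1 for the MAE), which is useful when errors need to be grouped or plotted before averaging.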

verona.evaluation.metrics.utils module

verona.evaluation.metrics.utils.get_metric_by_prefix_len(metric: Literal['accuracy', 'fbeta', 'f1_score', 'precision', 'recall', 'mcc', 'brier_loss', 'damerau_levenshtein', 'mae', 'mse'], predictions: array, ground_truths: array, prefixes: list[DataFrame], preds_format: Literal['labels', 'onehot'], gt_format: Literal['labels', 'onehot'], average: Literal['micro', 'macro', 'weighted'] = None, beta: float = None, eoc: str | int = None) → DataFrame

Calculates the value of the specified metric individually for each prefix size.

Generates a Pandas DataFrame in which each column represents a prefix size and holds two values: (1) the value of the selected metric for that size and (2) the number of prefixes with that length.

Parameters:

metric (Literal['accuracy', 'fbeta', 'f1_score', 'precision', 'recall', 'mcc', 'brier_loss', 'damerau_levenshtein', 'mae', 'mse']) – Metric to be calculated.
predictions (np.array) – Array of shape (n_samples, n_classes) containing the predictions done by the model as probabilities. The predictions on the array should respect the same order as their respective prefixes and their ground_truths.
ground_truths (np.array) – Array containing the ground truths. The ground truths in the array should respect the same order as their respective prefixes and predictions.
prefixes (list[pd.DataFrame]) – List containing the prefixes as Pandas DataFrames. The prefixes in the list should respect the same order as their respective predictions and ground_truths.
preds_format (Literal['labels', 'onehot'], optional) – Format of the predictions. 'labels' for labels and 'onehot' for one-hot vectors.
gt_format (Literal['labels', 'onehot'], optional) – Format of the ground truths. 'labels' for labels and 'onehot' for one-hot vectors.
average (Literal['micro', 'macro', 'weighted'], optional) – Type of averaging to be performed on data. Only needed for 'fbeta', 'f1_score', 'precision' and 'recall' value in metric parameter.
beta (float, optional) – Ratio of recall importance to precision importance. Only needed for 'fbeta' value in metric parameter.
eoc (Union[str, int], optional) – Label of the End-of-Case (EOC) which is an element that signifies the end of the trace/suffix. Only needed for 'damerau_levenshtein' value in metric parameter.

Returns:

Pandas DataFrame where the columns indicate the size of the prefix and its two values indicate: 1- the value of the metric, 2- the number of prefixes with that size.

Return type:

pd.DataFrame
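The grouping by prefix size can be sketched for the accuracy metric as follows. The sketch assumes that the length of a prefix is the number of rows in its DataFrame; the function name `accuracy_by_prefix_len`, the `activity` column, and the row labels `accuracy` and `num_prefixes` are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def accuracy_by_prefix_len(pred_labels, true_labels, prefixes):
    """Hypothetical helper: accuracy and prefix count per prefix length."""
    # Assumption: the prefix length is the number of events (rows) it contains.
    lengths = np.array([len(prefix) for prefix in prefixes])
    results = {}
    for size in np.unique(lengths):
        mask = lengths == size
        acc = float(np.mean(pred_labels[mask] == true_labels[mask]))
        results[int(size)] = [acc, int(mask.sum())]
    # One column per prefix size, two rows of values.
    return pd.DataFrame(results, index=["accuracy", "num_prefixes"])


prefixes = [
    pd.DataFrame({"activity": ["a"]}),
    pd.DataFrame({"activity": ["a", "b"]}),
    pd.DataFrame({"activity": ["c"]}),
]
pred_labels = np.array([1, 2, 0])
true_labels = np.array([1, 3, 0])

df_results = accuracy_by_prefix_len(pred_labels, true_labels, prefixes)
print(df_results)
```

Here the two prefixes of length 1 are both predicted correctly and the single prefix of length 2 is not, so column 1 holds an accuracy of 1.0 with 2 prefixes and column 2 holds 0.0 with 1 prefix.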