Model evaluation

cbar.cross_validation.cv(dataset, codebook_size, multi_word_queries=False, threshold=1, n_folds=3, method='loreta', **kwargs)

Perform cross-validation

This function performs cross-validation of the retrieval methods on different datasets.

Parameters:
  • dataset (str, 'cal500', 'cal10k', or 'freesound') – The dataset on which the retrieval method should be evaluated
  • codebook_size (int) – The codebook size the dataset should be encoded with. The data loading utility cbar.datasets.fetch_cal500() contains more information about the codebook representation of sounds.
  • multi_word_queries (bool, default: False) – Whether the retrieval method should be evaluated with multi-word queries. Only relevant when dataset == 'freesound'.
  • threshold (int, default: 1) – Only queries with at least threshold relevant examples in both X_train and X_test are evaluated.
  • n_folds (int, default: 3) – The number of folds used. Only applies to the CAL500 and Freesound datasets. The CAL10k dataset has 5 pre-defined folds.
  • method (str, 'loreta', 'pamir', or 'random-forest', default: 'loreta') – The retrieval method to be evaluated.
  • kwargs (key-value pairs) – Additional keyword arguments are passed on to the retrieval method.
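
A minimal usage sketch; the codebook size of 512 and the threshold of 5 are example values, not library defaults:

    from cbar.cross_validation import cv

    # 3-fold cross-validation of the LORETA retrieval method on CAL500,
    # encoded with a 512-word codebook (example value).
    cv('cal500', 512, method='loreta', n_folds=3)

    # Freesound with multi-word queries, evaluating only queries that have
    # at least 5 relevant examples in both the training and the test set.
    cv('freesound', 512, multi_word_queries=True, threshold=5, method='pamir')
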
cbar.cross_validation.dataset_for_train_test_split(X_train, X_test, Y_train, Y_test, threshold=1, multi_word_queries=False, scaler='standard')

Make dataset from a train-test-split

This function scales the input data and generates queries and query weights from the training set vocabulary.

Parameters:
  • X_train (array-like, shape = [n_train_samples, n_features]) – Training set data
  • X_test (array-like, shape = [n_test_samples, n_features]) – Test set data.
  • Y_train (array-like, shape = [n_train_samples, n_classes]) – Training set labels.
  • Y_test (array-like, shape = [n_test_samples, n_classes]) – Test set labels.
  • threshold (int, default: 1) – Only queries with at least threshold relevant examples in X_train and X_test are evaluated.
  • multi_word_queries (bool, default: False) – If set to True, generate multi-word queries from real-world user queries for the Freesound dataset. Ultimately calls cbar.preprocess.get_relevant_queries().
  • scaler (str, 'standard' or 'robust', or None) – Use either sklearn.preprocessing.StandardScaler() or sklearn.preprocessing.RobustScaler() to scale the input data.
Returns:

  • X_train (array-like, shape = [n_train_samples, n_features]) – The scaled training data.
  • X_test (array-like, shape = [n_test_samples, n_features]) – The scaled test data.
  • Y_train_bin (array-like, shape = [n_train_samples, n_classes]) – The training labels in binary indicator format.
  • Y_test_bin (array-like, shape = [n_test_samples, n_classes]) – The test labels in binary indicator format.
  • Q_vec (array-like, shape = [n_queries, n_classes]) – The query vectors to evaluate
  • weights (array-like, shape = [n_queries]) – The weights used to weight the queries during evaluation. For one-word queries the weight for each query is the same. For multi-word queries the counts from the aggregated query log of user queries are used to weight the queries accordingly.
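
A usage sketch with toy data, assuming the six return values come back in the order documented above; the random arrays merely illustrate the expected shapes:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from cbar.cross_validation import dataset_for_train_test_split

    # Toy data: 100 sounds with 64 codebook features and 5 tags.
    X = np.random.rand(100, 64)
    Y = np.random.randint(0, 2, size=(100, 5))
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

    (X_train, X_test,
     Y_train_bin, Y_test_bin,
     Q_vec, weights) = dataset_for_train_test_split(
        X_tr, X_te, Y_tr, Y_te, threshold=1, scaler='standard')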

cbar.cross_validation.validate_fold(X_train, X_test, Y_train, Y_test, Q_vec, weights, evaluator, retrieval_method, **kwargs)

Perform validation on one fold of the data

This function evaluates a retrieval method on one split of a dataset.

Parameters:
  • X_train (pd.DataFrame, shape = [n_train_samples, codebook_size]) – Training data.
  • X_test (pd.DataFrame, shape = [n_test_samples, codebook_size]) – Test data.
  • Y_train (pd.DataFrame, shape = [n_train_samples, n_classes]) – Training tags.
  • Y_test (pd.DataFrame, shape = [n_test_samples, n_classes]) – Test tags.
  • Q_vec (array-like, shape = [n_queries, n_classes]) – The queries to evaluate
  • weights (array-like, shape = [n_queries]) – Query weights. Multi-word queries can be weighted to reflect importance to users.
  • evaluator (object) – An instance of cbar.evaluation.Evaluator.
  • retrieval_method (str, 'loreta', 'pamir', or 'random-forest') – The retrieval method to be evaluated.
  • kwargs (key-value pairs) – Additional keyword arguments are passed on to the retrieval method.
Returns:

params – The retrieval_method's parameters used for the evaluation.

Return type:

dict
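
A sketch of evaluating a single fold, continuing from the split produced above and using an Evaluator instance (documented below) to collect the results:

    from cbar.cross_validation import validate_fold
    from cbar.evaluation import Evaluator

    evaluator = Evaluator()
    params = validate_fold(X_train, X_test, Y_train_bin, Y_test_bin,
                           Q_vec, weights, evaluator,
                           retrieval_method='pamir')
    # `params` contains the parameters the retrieval method was run with.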

class cbar.evaluation.Evaluator

The Evaluator evaluates a retrieval method, collects the performance measures, and keeps track of values across multiple runs (for example in k-fold cross-validation).

eval(queries, weights, Y_score, Y_test, n_relevant)
Parameters:
  • queries (array-like, shape = [n_queries, n_classes]) – The queries to evaluate
  • weights (int, default: 1) – Query weights. Multi-word queries can be weighted to reflect importance to users.
  • Y_score (array-like, shape = [n_queries, n_classes]) – Scores of queries and sounds.
  • Y_test (array-like, shape = [n_samples, n_classes]) – Test set tags associated with each test set song in binary indicator format.
  • n_relevant (array-like, shape = [n_queries]) – The number of relevant sounds in X_train for each query.
to_json(dataset, method, codebook_size, params)

Write the retrieval performance results to a file.

Parameters:
  • dataset (str) – The name of the evaluated dataset.
  • method (str) – The name of the evaluated retrieval method.
  • codebook_size (int) – The codebook size the dataset is encoded with.
  • params (dict) – The method's parameters used during the evaluation.
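
Continuing the single-fold sketch above, the collected performance measures can be written to a file; the codebook size of 512 is again only an example value:

    evaluator.to_json('cal500', 'pamir', 512, params)
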
cbar.evaluation.ranking_precision_score(y_true, y_score, k=10)

Precision at rank k

Parameters:
  • y_true (array-like, shape = [n_samples]) – Ground truth (true relevance labels).
  • y_score (array-like, shape = [n_samples]) – Predicted scores.
  • k (int) – Rank.
Returns:

precision@k – Precision at rank k.

Return type:

float
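
A toy call with binary relevance labels; in this example the two common normalisations (dividing by k, or by the number of relevant items when it is smaller than k) give the same result:

    import numpy as np
    from cbar.evaluation import ranking_precision_score

    y_true = np.array([1, 0, 0, 1, 1, 0])               # relevance labels
    y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.1])  # predicted scores

    # The top-3 ranked items are 0, 1, and 3; two of them are relevant,
    # so precision at rank 3 is 2/3 here.
    p_at_3 = ranking_precision_score(y_true, y_score, k=3)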