Model evaluation

cbar.cross_validation.cv(dataset, codebook_size, multi_word_queries=False, threshold=1, n_folds=3, method='loreta', **kwargs)

Perform cross-validation

This function performs cross-validation of the retrieval methods on different datasets.

Parameters:
  • dataset (str, 'cal500', 'cal10k', or 'freesound') – The dataset on which the retrieval method should be evaluated
  • codebook_size (int) – The codebook size the dataset should be encoded with. The data loading utility cbar.datasets.fetch_cal500() contains more information about the codebook representation of sounds.
  • multi_word_queries (bool, default: False) – Whether the retrieval method should be evaluated with multi-word queries. Only relevant when dataset == 'freesound'.
  • threshold (int, default: 1) – Only queries with at least threshold relevant examples in both X_train and X_test are evaluated.
  • n_folds (int, default: 3) – The number of folds used. Only applies to the CAL500 and Freesound datasets. The CAL10k dataset has 5 pre-defined folds.
  • method (str, 'loreta', 'pamir', or 'random-forest', default: 'loreta') – The retrieval method to be evaluated.
  • kwargs (key-value pairs) – Additional keyword arguments are passed on to the retrieval method.
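
A minimal usage sketch; the codebook size of 512 and the threshold of 5 are example values, not library defaults:

    from cbar.cross_validation import cv

    # 3-fold cross-validation of the LORETA retrieval method on CAL500,
    # encoded with a 512-word codebook (example value).
    cv('cal500', 512, method='loreta', n_folds=3)

    # Freesound with multi-word queries, evaluating only queries that have
    # at least 5 relevant examples in both the training and the test set.
    cv('freesound', 512, multi_word_queries=True, threshold=5, method='pamir')
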
cbar.cross_validation.dataset_for_train_test_split(X_train, X_test, Y_train, Y_test, threshold=1, multi_word_queries=False, scaler='standard')

Make dataset from a train-test-split

This function scales the input data and generates queries and query weights from the training set vocabulary.

Parameters:
  • X_train (array-like, shape = [n_train_samples, n_features]) – Training set data
  • X_test (array-like, shape = [n_test_samples, n_features]) – Test set data.
  • Y_train (array-like, shape = [n_train_samples, n_classes]) – Training set labels.
  • Y_test (array-like, shape = [n_test_samples, n_classes]) – Test set labels.
  • threshold (int, default: 1) – Only queries with at least threshold relevant examples in X_train and X_test are evaluated.
  • multi_word_queries (bool, default: False) – If set to True, generate multi-word queries from real-world user queries for the Freesound dataset. Ultimately calls cbar.preprocess.get_relevant_queries().
  • scaler (str, 'standard' or 'robust', or None) – Use either sklearn.preprocessing.StandardScaler() or sklearn.preprocessing.RobustScaler() to scale the input data.
Returns:

  • X_train (array-like, shape = [n_train_samples, n_features]) – The scaled training data.
  • X_test (array-like, shape = [n_test_samples, n_features]) – The scaled test data.
  • Y_train_bin (array-like, shape = [n_train_samples, n_classes]) – The training labels in binary indicator format.
  • Y_test_bin (array-like, shape = [n_test_samples, n_classes]) – The test labels in binary indicator format.
  • Q_vec (array-like, shape = [n_queries, n_classes]) – The query vectors to evaluate
  • weights (array-like, shape = [n_queries]) – The weights used to weight the queries during evaluation. For one-word queries the weight for each query is the same. For multi-word queries the counts from the aggregated query log of user queries are used to weight the queries accordingly.
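
A usage sketch with toy data, assuming the six return values come back in the order documented above; the random arrays merely illustrate the expected shapes:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from cbar.cross_validation import dataset_for_train_test_split

    # Toy data: 100 sounds with 64 codebook features and 5 tags.
    X = np.random.rand(100, 64)
    Y = np.random.randint(0, 2, size=(100, 5))
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=0)

    (X_train, X_test,
     Y_train_bin, Y_test_bin,
     Q_vec, weights) = dataset_for_train_test_split(
        X_tr, X_te, Y_tr, Y_te, threshold=1, scaler='standard')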

cbar.cross_validation.validate_fold(X_train, X_test, Y_train, Y_test, Q_vec, weights, evaluator, retrieval_method, **kwargs)

Perform validation on one fold of the data

This function evaluates a retrieval method on one split of a dataset.

Parameters:
  • X_train (pd.DataFrame, shape = [n_train_samples, codebook_size]) – Training data.
  • X_test (pd.DataFrame, shape = [n_test_samples, codebook_size]) – Test data.
  • Y_train (pd.DataFrame, shape = [n_train_samples, n_classes]) – Training tags.
  • Y_test (pd.DataFrame, shape = [n_test_samples, n_classes]) – Test tags.
  • Q_vec (array-like, shape = [n_queries, n_classes]) – The queries to evaluate
  • weights (array-like, shape = [n_queries]) – Query weights. Multi-word queries can be weighted to reflect importance to users.
  • evaluator (object) – An instance of cbar.evaluation.Evaluator.
  • retrieval_method (str, 'loreta', 'pamir', or 'random-forest') – The retrieval method to be evaluated.
  • kwargs (key-value pairs) – Additional keyword arguments are passed on to the retrieval method.
Returns:

params – The retrieval_method's parameters used for the evaluation.

Return type:

dict
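
A sketch of evaluating a single fold, continuing from the split produced above and using an Evaluator instance (documented below) to collect the results:

    from cbar.cross_validation import validate_fold
    from cbar.evaluation import Evaluator

    evaluator = Evaluator()
    params = validate_fold(X_train, X_test, Y_train_bin, Y_test_bin,
                           Q_vec, weights, evaluator,
                           retrieval_method='pamir')
    # `params` contains the parameters the retrieval method was run with.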

class cbar.evaluation.Evaluator

The Evaluator evaluates a retrieval method, collects the performance measures, and keeps track of values across multiple runs (for example in k-fold cross-validation).

eval(queries, weights, Y_score, Y_test, n_relevant)
Parameters:
  • queries (array-like, shape = [n_queries, n_classes]) – The queries to evaluate
  • weights (int, default: 1) – Query weights. Multi-word queries can be weighted to reflect importance to users.
  • Y_score (array-like, shape = [n_queries, n_classes]) – Scores of queries and sounds.
  • Y_test (array-like, shape = [n_samples, n_classes]) – Test set tags associated with each test set song in binary indicator format.
  • n_relevant (array-like, shape = [n_queries]) – The number of relevant sounds in X_train for each query.
to_json(dataset, method, codebook_size, params)

Write the retrieval performance results to a file.

Parameters:
  • dataset (str) – The name of the evaluated dataset.
  • method (str) – The name of the evaluated retrieval method.
  • codebook_size (int) – The codebook size the dataset is encoded with.
  • params (dict) – The method's parameters used during the evaluation.
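
Continuing the single-fold sketch above, the collected performance measures can be written to a file; the codebook size of 512 is again only an example value:

    evaluator.to_json('cal500', 'pamir', 512, params)
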
cbar.evaluation.ranking_precision_score(y_true, y_score, k=10)

Precision at rank k

Parameters:
  • y_true (array-like, shape = [n_samples]) – Ground truth (true relevance labels).
  • y_score (array-like, shape = [n_samples]) – Predicted scores.
  • k (int) – Rank.
Returns:

precision@k – Precision at rank k.

Return type:

float
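
A toy call with binary relevance labels; in this example the two common normalisations (dividing by k, or by the number of relevant items when it is smaller than k) give the same result:

    import numpy as np
    from cbar.evaluation import ranking_precision_score

    y_true = np.array([1, 0, 0, 1, 1, 0])               # relevance labels
    y_score = np.array([0.9, 0.8, 0.3, 0.7, 0.2, 0.1])  # predicted scores

    # The top-3 ranked items are 0, 1, and 3; two of them are relevant,
    # so precision at rank 3 is 2/3 here.
    p_at_3 = ranking_precision_score(y_true, y_score, k=3)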