LoretaWARP

class cbar.loreta.LoretaWARP(max_iter=100000, k=30, n0=1.0, n1=0.0, rank_thresh=0.1, lambda_=0.1, loss='warp', valid_interval=10000, max_dips=10, verbose=True)

Low Rank Retraction Algorithm with Weighted Approximate-Rank Pairwise loss (WARP loss).

Parameters:
  • max_iter (int) – The maximum number of iterations. One iteration equals one run of sampling the triplet \((q, x_q^+, x_q^-)\).
  • k (int) – The rank of the parameter matrix \(W = AB^T \in \mathbb{R}^{n \times m}\) with \(A \in \mathbb{R}^{n \times k}\) and \(B \in \mathbb{R}^{m \times k}\).
  • n0 (float) – The first step size parameter.
  • n1 (float) – The second step size parameter.
  • rank_thresh (float) – The threshold for early stopping of the violator sampling procedure: sampling of \(x_q^-\) stops after a fraction rank_thresh of the irrelevant examples has been drawn without finding a violator.
  • lambda_ (float) – The regularization constant \(\lambda\).
  • loss (str, 'warp' or 'auc') – The loss function.
  • valid_interval (int) – The interval (in iterations) at which a validation of the current model state is performed.
  • max_dips (int) – The maximum number of dips in validation performance the algorithm is allowed to make before training is stopped early.
  • verbose (bool, optional) – Turns validation output at each validation interval on or off.
A

array, shape = [n_classes, k] – Factor \(A\) of the factored parameter matrix \(W\) such that \(W = AB^T\) with \(A \in \mathbb{R}^{n \times k}\).

B

array, shape = [n_features, k] – Factor \(B\) of the factored parameter matrix \(W\) such that \(W = AB^T\) with \(B \in \mathbb{R}^{m \times k}\).

A_pinv

array, shape = [k, n_classes] – The pseudo-inverse of A.

B_pinv

array, shape = [k, n_features] – The pseudo-inverse of B.

Notes

This class implements the Low Rank Retraction Algorithm with rank-one gradients as described in [1]. The rank-one gradients allow for an efficient, iterative update of the pseudo-inverses based on [2], instead of two expensive \(\mathcal{O}(n^3)\) spectral decompositions per iteration to compute the pseudo-inverses.

Moreover, the algorithm features the Weighted Approximate-Rank Pairwise (WARP) loss [3], which employs a clever sampling scheme to speed up training.
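
For intuition, the rank-based weighting from [3] can be sketched as follows (illustrative only; the function and variable names are not part of the library). The rank of the relevant example is estimated from the number of samples drawn before a violator is found, and the update is weighted by \(\Phi(k) = \sum_{j=1}^{k} 1/j\):

    import numpy as np

    def warp_weight(n_drawn, n_irrelevant):
        # The fewer draws needed to find a violator, the worse the estimated
        # rank of the relevant example (Weston et al. [3]).
        est_rank = int(np.floor((n_irrelevant - 1) / n_drawn)) if n_drawn > 0 else 0
        # Phi(k) = sum_{j=1..k} 1/j weights violations near the top more heavily.
        return np.sum(1.0 / np.arange(1, est_rank + 1)) if est_rank > 0 else 0.0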

A similar algorithm with the additional constraint that \(W\) is positive semidefinite (PSD) was developed in [4].

References

[1] Shalit, U., Weinshall, D. and Chechik, G., 2012. Online learning in the embedded manifold of low-rank matrices. Journal of Machine Learning Research, 13(Feb), pp.429-458.
[2] Meyer, Jr, C.D., 1973. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3), pp.315-323.
[3] Weston, J., Bengio, S. and Usunier, N., 2010. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1), pp.21-35.
[4] Lim, D. and Lanckriet, G., 2014. Efficient learning of Mahalanobis metrics for ranking. In Proceedings of the 31st International Conference on Machine Learning (pp. 1980-1988).
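
Examples

A minimal usage sketch with synthetic data (shapes follow the parameter descriptions above; the random labels and queries are illustrative only, not a meaningful dataset):

    import numpy as np
    from cbar.loreta import LoretaWARP

    n_samples, n_features, n_classes, n_queries = 200, 40, 10, 50

    X = np.random.randn(n_samples, n_features)                      # training data
    Y = (np.random.rand(n_samples, n_classes) > 0.8).astype(float)  # relevance labels
    Q = (np.random.rand(n_queries, n_classes) > 0.7).astype(float)  # queries

    model = LoretaWARP(max_iter=10000, k=30, valid_interval=1000, verbose=False)
    model = model.fit(X, Y, Q)

    # The learned low-rank factors are exposed as model.A and model.B;
    # the full parameter matrix can be recovered as W = A B^T.
    W = model.A.dot(model.B.T)   # shape (n_classes, n_features)
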
_initialize_W(n, m)

Initialize the parameter matrix in factored form \(W = AB^T\) as self.A and self.B and initialize the respective pseudo-inverses as self.A_pinv and self.B_pinv.

Parameters:
  • k (int) – The rank of W
  • n (int) – Dimension for A, such that \(A \in \mathbb{R}^{n \times k}\)
  • m (int) – Dimension for B, such that \(B \in \mathbb{R}^{m \times k}\)
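
As a rough illustration, such an initialization might look like the following (a sketch only; the random initialization scheme, the scaling, and the use of self.k are assumptions, not necessarily what the library does):

    import numpy as np

    def _initialize_W(self, n, m):
        # Random low-rank factors A (n x k) and B (m x k), so that W = A B^T.
        self.A = np.random.randn(n, self.k) / np.sqrt(self.k)
        self.B = np.random.randn(m, self.k) / np.sqrt(self.k)
        # Keep the pseudo-inverses alongside the factors; the rank-one updates
        # described in the Notes maintain them incrementally during training.
        self.A_pinv = np.linalg.pinv(self.A)   # shape (k, n)
        self.B_pinv = np.linalg.pinv(self.B)   # shape (k, m)
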
_sample_violator(X, Q, q_id, rel_id)

Sample a violator according to the WARP sampling scheme; see [3] for details. A simplified sketch is given after the parameter and return descriptions below.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • q_id (int) – The sampled query id.
  • rel_id (int) – The sampled id of a relevant training example for the given q_id.
Returns:

  • n_drawn (int) – The number of samples drawn until a violator was found.
  • n_irrelevant (int) – The number of irrelevant training examples for the given q_id.
  • rel_minus_irr (array, shape = [n_features]) – The vector difference of the relevant and the irrelevant training example, \(d = x_q^+ - x_q^-\).
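
A simplified sketch of this sampling scheme (illustrative only; the unit margin, the uniform sampling over all irrelevant examples, and the function signature are assumptions, and the real method additionally honours rank_thresh and the query weights):

    import numpy as np

    def sample_violator(X, W, q, rel_id, irrelevant_ids, margin=1.0):
        # Score of the relevant example for query q under the current model W.
        pos_score = q.dot(W).dot(X[rel_id])
        n_drawn = 0
        # Draw irrelevant examples at random until one violates the margin.
        for irr_id in np.random.permutation(irrelevant_ids):
            n_drawn += 1
            if q.dot(W).dot(X[irr_id]) > pos_score - margin:   # violator found
                return n_drawn, len(irrelevant_ids), X[rel_id] - X[irr_id]
        return n_drawn, len(irrelevant_ids), None              # no violator found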

fit(X, Y, Q, X_val=None, Y_val=None, weights=None)

Fit the model.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Y (array-like, shape = [n_samples, n_classes]) – Training labels
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • X_val (array-like, shape = [n_samples, n_features], optional) – Validation data
  • Y_val (array-like, shape = [n_samples, n_classes], optional) – Validation labels
  • weights (array-like, shape = [n_queries], optional) – Query weights
Returns:

self – The fitted model.

Return type:

LoretaWARP instance

fit_partial(X, Y, Q, X_val=None, Y_val=None, weights=None)

Fit the model. Repeated calls to this method will cause training to resume from the current model state.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Y (array-like, shape = [n_samples, n_classes]) – Training labels
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • X_val (array-like, shape = [n_samples, n_features], optional) – Validation data
  • Y_val (array-like, shape = [n_samples, n_classes], optional) – Validation labels
  • weights (array-like, shape = [n_queries], optional) – Query weights
Returns:

self – The fitted model.

Return type:

LoretaWARP instance
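
For example, with X, Y and Q as in fit() (a sketch only; whether the first call must go through fit() or may go through fit_partial() directly depends on the implementation, and this sketch assumes the latter):

    model = LoretaWARP(max_iter=10000, verbose=False)
    model.fit_partial(X, Y, Q)   # first call initializes W and trains
    model.fit_partial(X, Y, Q)   # resumes from the current A, B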