LoretaWARP

class cbar.loreta.LoretaWARP(max_iter=100000, k=30, n0=1.0, n1=0.0, rank_thresh=0.1, lambda_=0.1, loss='warp', valid_interval=10000, max_dips=10, verbose=True)

Low Rank Retraction Algorithm with Weighted Approximate-Rank Pairwise loss (WARP loss).

Parameters:
  • max_iter (int) – The maximum number of iterations. One iteration equals one run of sampling the triplet \((q, x_q^+, x_q^-)\).
  • k (int) – The rank of the parameter matrix \(W = AB^T \in \mathbb{R}^{n \times m}\) with \(A \in \mathbb{R}^{n \times k}\) and \(B \in \mathbb{R}^{m \times k}\).
  • n0 (float) – The first step size parameter.
  • n1 (float) – The second step size parameter.
  • rank_thresh (float) – The threshold for early stopping of the violator sampling procedure: sampling of \(x_q^-\) stops after a fraction rank_thresh of the irrelevant examples has been drawn without finding a violator.
  • lambda_ (float) – The regularization constant \(\lambda\).
  • loss (str, 'warp' or 'auc') – The loss function.
  • valid_interval (int) – The interval (in iterations) at which a validation of the current model state is performed.
  • max_dips (int) – The maximum number of dips in validation performance the algorithm is allowed to make before training is stopped early.
  • verbose (bool, optional) – Turns validation output at each validation interval on or off.
A

array, shape = [n_classes, k] – Factor \(A\) of the factored parameter matrix \(W\) such that \(W = AB^T\) with \(A \in \mathbb{R}^{n \times k}\).

B

array, shape = [n_features, k] – Factor \(B\) of the factored parameter matrix \(W\) such that \(W = AB^T\) with \(B \in \mathbb{R}^{m \times k}\).

A_pinv

array, shape = [k, n_classes] – The pseudo-inverse of A.

B_pinv

array, shape = [k, n_features] – The pseudo-inverse of B.

Notes

This class implements the Low Rank Retraction Algorithm with rank-one gradients as described in [1]. The rank-one gradients allow for an efficient, iterative update of the pseudo-inverses based on [2], instead of two expensive \(\mathcal{O}(n^3)\) spectral decompositions per iteration to compute the pseudo-inverses.

Moreover, the algorithm features the Weighted Approximate-Rank Pairwise (WARP) loss [3], which employs a clever sampling scheme to speed up training.
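
For intuition, the rank-based weighting from [3] can be sketched as follows (illustrative only; the function and variable names are not part of the library). The rank of the relevant example is estimated from the number of samples drawn before a violator is found, and the update is weighted by \(\Phi(k) = \sum_{j=1}^{k} 1/j\):

    import numpy as np

    def warp_weight(n_drawn, n_irrelevant):
        # The fewer draws needed to find a violator, the worse the estimated
        # rank of the relevant example (Weston et al. [3]).
        est_rank = int(np.floor((n_irrelevant - 1) / n_drawn)) if n_drawn > 0 else 0
        # Phi(k) = sum_{j=1..k} 1/j weights violations near the top more heavily.
        return np.sum(1.0 / np.arange(1, est_rank + 1)) if est_rank > 0 else 0.0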

A similar algorithm with the additional constraint that \(W\) is positive semidefinite (PSD) was developed in [4].

References

[1] Shalit, U., Weinshall, D. and Chechik, G., 2012. Online learning in the embedded manifold of low-rank matrices. Journal of Machine Learning Research, 13(Feb), pp.429-458.
[2] Meyer, Jr, C.D., 1973. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3), pp.315-323.
[3] Weston, J., Bengio, S. and Usunier, N., 2010. Large scale image annotation: learning to rank with joint word-image embeddings. Machine Learning, 81(1), pp.21-35.
[4] Lim, D. and Lanckriet, G., 2014. Efficient learning of Mahalanobis metrics for ranking. In Proceedings of the 31st International Conference on Machine Learning (pp. 1980-1988).
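
Examples

A minimal usage sketch with synthetic data (shapes follow the parameter descriptions above; the random labels and queries are illustrative only, not a meaningful dataset):

    import numpy as np
    from cbar.loreta import LoretaWARP

    n_samples, n_features, n_classes, n_queries = 200, 40, 10, 50

    X = np.random.randn(n_samples, n_features)                      # training data
    Y = (np.random.rand(n_samples, n_classes) > 0.8).astype(float)  # relevance labels
    Q = (np.random.rand(n_queries, n_classes) > 0.7).astype(float)  # queries

    model = LoretaWARP(max_iter=10000, k=30, valid_interval=1000, verbose=False)
    model = model.fit(X, Y, Q)

    # The learned low-rank factors are exposed as model.A and model.B;
    # the full parameter matrix can be recovered as W = A B^T.
    W = model.A.dot(model.B.T)   # shape (n_classes, n_features)
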
_initialize_W(n, m)

Initialize the parameter matrix in factored form \(W = AB^T\) as self.A and self.B and initialize the respective pseudo-inverses as self.A_pinv and self.B_pinv.

Parameters:
  • k (int) – The rank of W
  • n (int) – Dimension for A, such that \(A \in \mathbb{R}^{n \times k}\)
  • m (int) – Dimension for B, such that \(B \in \mathbb{R}^{m \times k}\)
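
As a rough illustration, such an initialization might look like the following (a sketch only; the random initialization scheme, the scaling, and the use of self.k are assumptions, not necessarily what the library does):

    import numpy as np

    def _initialize_W(self, n, m):
        # Random low-rank factors A (n x k) and B (m x k), so that W = A B^T.
        self.A = np.random.randn(n, self.k) / np.sqrt(self.k)
        self.B = np.random.randn(m, self.k) / np.sqrt(self.k)
        # Keep the pseudo-inverses alongside the factors; the rank-one updates
        # described in the Notes maintain them incrementally during training.
        self.A_pinv = np.linalg.pinv(self.A)   # shape (k, n)
        self.B_pinv = np.linalg.pinv(self.B)   # shape (k, m)
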
_sample_violator(X, Q, q_id, rel_id)

Sample a violator according to the WARP sampling scheme; see [3] for details. A simplified sketch is given after the parameter and return descriptions below.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • q_id (int) – The sampled query id.
  • rel_id (int) – The sampled id of a relevant training example for the given q_id.
Returns:

  • n_drawn (int) – The number of samples drawn until a violator was found.
  • n_irrelevant (int) – The number of irrelevant training examples for the given q_id.
  • rel_minus_irr (array, shape = [n_features]) – The vector difference of the relevant and the irrelevant training example, \(d = x_q^+ - x_q^-\).
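
A simplified sketch of this sampling scheme (illustrative only; the unit margin, the uniform sampling over all irrelevant examples, and the function signature are assumptions, and the real method additionally honours rank_thresh and the query weights):

    import numpy as np

    def sample_violator(X, W, q, rel_id, irrelevant_ids, margin=1.0):
        # Score of the relevant example for query q under the current model W.
        pos_score = q.dot(W).dot(X[rel_id])
        n_drawn = 0
        # Draw irrelevant examples at random until one violates the margin.
        for irr_id in np.random.permutation(irrelevant_ids):
            n_drawn += 1
            if q.dot(W).dot(X[irr_id]) > pos_score - margin:   # violator found
                return n_drawn, len(irrelevant_ids), X[rel_id] - X[irr_id]
        return n_drawn, len(irrelevant_ids), None              # no violator found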

fit(X, Y, Q, X_val=None, Y_val=None, weights=None)

Fit the model.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Y (array-like, shape = [n_samples, n_classes]) – Training labels
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • X_val (array-like, shape = [n_samples, n_features], optional) – Validation data
  • Y_val (array-like, shape = [n_samples, n_classes], optional) – Validation labels
  • weights (array-like, shape = [n_queries], optional) – Query weights
Returns:

self – The fitted model.

Return type:

LoretaWARP instance

fit_partial(X, Y, Q, X_val=None, Y_val=None, weights=None)

Fit the model. Repeated calls to this method will cause training to resume from the current model state.

Parameters:
  • X (array-like, shape = [n_samples, n_features]) – Training data
  • Y (array-like, shape = [n_samples, n_classes]) – Training labels
  • Q (array-like, shape = [n_queries, n_classes]) – Training queries
  • X_val (array-like, shape = [n_samples, n_features], optional) – Validation data
  • Y_val (array-like, shape = [n_samples, n_classes], optional) – Validation labels
  • weights (array-like, shape = [n_queries], optional) – Query weights
Returns:

self – The fitted model.

Return type:

LoretaWARP instance
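
For example, with X, Y and Q as in fit() (a sketch only; whether the first call must go through fit() or may go through fit_partial() directly depends on the implementation, and this sketch assumes the latter):

    model = LoretaWARP(max_iter=10000, verbose=False)
    model.fit_partial(X, Y, Q)   # first call initializes W and trains
    model.fit_partial(X, Y, Q)   # resumes from the current A, B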