{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "# Quickstart guide" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This example demonstrates how to build a simple content-based audio retrieval model and evaluate the retrieval accuracy on a small song dataset, CAL500. This dataset consists of 502 western pop songs, performed by 499 unique artists. Each song is tagged by at least three people using a standard survey and a fixed tag vocabulary of 174 musical concepts.\n", "\n", "This package includes a loading utility for getting and processing this dataset, which makes loading quite easy." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from cbar.datasets import fetch_cal500\n", "\n", "X, Y = fetch_cal500()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Calling `fetch_cal500()` initally downloads the CAL500 dataset to a subfolder of your home directory. You can specify a different location using the `data_home` parameter (`fetch_cal500(data_home='path')`). Subsequents calls simply load the dataset.\n", "\n", "The raw dataset consists of about 10,000 39-dimensional features vectors\n", "per minute of audio content which were created by\n", "\n", "1. Sliding a half-overlapping short-time window of 12 milliseconds over each song's waveform data.\n", "2. Extracting the 13 mel-frequency cepstral coefficients.\n", "3. Appending the instantaneous first-order and second-order derivatives.\n", "\n", "Each song is, then, represented by exactly 10,000 randomly subsampled, real-valued feature vectors as a *bag-of-frames*. The *bag-of-frames* features are further processed into one *k*-dimensional feature vector by encoding the feature vectors using a codebook and pooling them into one compact vector.\n", "\n", "Specifically, *k*-means is used to cluster all frame vectors into *k* clusters. The resulting cluster centers correspond to the codewords in the codebook. Each frame vector is assigned to its closest cluster center and a song represented as the counts of frames assigned to each of the *k* cluster centers.\n", "\n", "By default, `fetch_cal500()` uses a codebook size of 512 but this size is easily modified with the `codebook_size` parameter (`fetch_cal500(codebook_size=1024)`)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "((502, 512), (502, 174))" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "X.shape, Y.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's split the data into training data and test data, fit the model on the training data, and evaluate it on the test data. Import and instantiate the model first." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from cbar.loreta import LoretaWARP\n", "\n", "model = LoretaWARP(n0=0.1, valid_interval=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then split the data and fit the model using the training data." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/David/.virtualenvs/machine-learning/lib/python2.7/site-packages/scipy/linalg/basic.py:884: RuntimeWarning: internal gelsd driver lwork query error, required iwork dimension not returned. This is likely the result of LAPACK bug 0038, fixed in LAPACK 3.2.2 (released July 21, 2010). Falling back to 'gelss' driver.\n", " warnings.warn(mesg, RuntimeWarning)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "iter: 0, stepsize: 0.100, P10: 0.168, AP: 0.517, loss: 0.000\n", "iter: 1000, stepsize: 0.100, P10: 0.212, AP: 0.176, loss: 0.999\n", "iter: 2000, stepsize: 0.100, P10: 0.197, AP: 0.177, loss: 0.993\n", "iter: 3000, stepsize: 0.100, P10: 0.205, AP: 0.179, loss: 0.975\n", "iter: 4000, stepsize: 0.100, P10: 0.215, AP: 0.181, loss: 0.966\n", "iter: 5000, stepsize: 0.100, P10: 0.191, AP: 0.178, loss: 0.962\n", "iter: 6000, stepsize: 0.100, P10: 0.189, AP: 0.176, loss: 0.941\n", "iter: 7000, stepsize: 0.100, P10: 0.185, AP: 0.175, loss: 0.937\n", "iter: 8000, stepsize: 0.100, P10: 0.186, AP: 0.175, loss: 0.919\n", "iter: 9000, stepsize: 0.100, P10: 0.191, AP: 0.175, loss: 0.919\n", "iter: 10000, stepsize: 0.100, P10: 0.184, AP: 0.173, loss: 0.904\n", "iter: 11000, stepsize: 0.100, P10: 0.188, AP: 0.171, loss: 0.895\n", "iter: 12000, stepsize: 0.100, P10: 0.180, AP: 0.171, loss: 0.883\n", "iter: 13000, stepsize: 0.100, P10: 0.198, AP: 0.172, loss: 0.881\n", "2016-12-29 17:36:30,046 [MainThread ] [WARNI] max_dips reached, stopped at 14000 iterations.\n", "CPU times: user 11.5 s, sys: 375 ms, total: 11.9 s\n", "Wall time: 12 s\n" ] }, { "data": { "text/plain": [ "cbar.loreta.LoretaWARP(max_iter=100000, k=30, n0=0.1, n1=0.0, rank_thresh=0.1, lambda_=0.1, loss='warp', max_dips=10, valid_interval=1000)" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cbar.cross_validation import train_test_split_plus\n", "\n", "(X_train, X_test,\n", " Y_train, Y_test,\n", " Q_vec, weights) = train_test_split_plus(X, Y)\n", "\n", "%time model.fit(X_train, Y_train, Q_vec, X_test, Y_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, predict the scores for each query with all songs. Ordering the songs from highest to lowest score corresponds to the ranking." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": false }, "outputs": [], "source": [ "Y_score = model.predict(Q_vec, X_test)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Evaluate the predictions." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {1: [0.17682926829268295],\n", " 2: [0.14634146341463417],\n", " 3: [0.15040650406504064],\n", " 4: [0.1443089430894309],\n", " 5: [0.14339430894308947],\n", " 6: [0.14705284552845529],\n", " 7: [0.15779616724738676],\n", " 8: [0.17079703832752613],\n", " 9: [0.17630178087495163],\n", " 10: [0.17976432442895857],\n", " 11: [0.18487488121634463],\n", " 12: [0.19447699996480486],\n", " 13: [0.20046400076887883],\n", " 14: [0.20617928819148332],\n", " 15: [0.21475927527756794],\n", " 16: [0.22658865524719185],\n", " 17: [0.23965035741398727],\n", " 18: [0.24535472896305033],\n", " 19: [0.25569593647729272],\n", " 20: [0.25993391964659807]})" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from cbar.evaluation import Evaluator\n", "from cbar.utils import make_relevance_matrix\n", "\n", "n_relevant = make_relevance_matrix(Q_vec, Y_train).sum(axis=1)\n", "\n", "evaluator = Evaluator()\n", "evaluator.eval(Q_vec, weights, Y_score, Y_test, n_relevant)\n", "evaluator.prec_at" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cross-validation\n", "\n", "The `cv` function in the `cross_validation` module offers an easy way to evaluate a retrieval method on multiple splits of the data. Let's run the same experiment on three folds." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from cbar.cross_validation import cv" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2016-12-29 17:36:30,273 [MainThread ] [INFO ] Running CV with 3 folds ...\n", "2016-12-29 17:36:58,463 [MainThread ] [INFO ] Validating fold 0 ...\n", "iter: 0, stepsize: 0.100, P10: 0.160, AP: 0.515, loss: 0.000\n", "iter: 1000, stepsize: 0.100, P10: 0.175, AP: 0.167, loss: 0.999\n", "iter: 2000, stepsize: 0.100, P10: 0.193, AP: 0.171, loss: 0.989\n", "iter: 3000, stepsize: 0.100, P10: 0.185, AP: 0.168, loss: 0.980\n", "iter: 4000, stepsize: 0.100, P10: 0.182, AP: 0.167, loss: 0.957\n", "iter: 5000, stepsize: 0.100, P10: 0.183, AP: 0.169, loss: 0.953\n", "iter: 6000, stepsize: 0.100, P10: 0.191, AP: 0.168, loss: 0.923\n", "iter: 7000, stepsize: 0.100, P10: 0.177, AP: 0.168, loss: 0.916\n", "iter: 8000, stepsize: 0.100, P10: 0.164, AP: 0.166, loss: 0.902\n", "iter: 9000, stepsize: 0.100, P10: 0.166, AP: 0.163, loss: 0.892\n", "iter: 10000, stepsize: 0.100, P10: 0.165, AP: 0.164, loss: 0.886\n", "iter: 11000, stepsize: 0.100, P10: 0.162, AP: 0.163, loss: 0.864\n", "2016-12-29 17:37:09,004 [MainThread ] [WARNI] max_dips reached, stopped at 12000 iterations.\n", "2016-12-29 17:37:09,165 [MainThread ] [INFO ] Validating fold 1 ...\n", "iter: 0, stepsize: 0.100, P10: 0.182, AP: 0.512, loss: 0.000\n", "iter: 1000, stepsize: 0.100, P10: 0.177, AP: 0.161, loss: 1.000\n", "iter: 2000, stepsize: 0.100, P10: 0.174, AP: 0.165, loss: 0.994\n", "iter: 3000, stepsize: 0.100, P10: 0.165, AP: 0.165, loss: 0.975\n", "iter: 4000, stepsize: 0.100, P10: 0.169, AP: 0.162, loss: 0.954\n", "iter: 5000, stepsize: 0.100, P10: 0.172, AP: 0.164, loss: 0.931\n", "iter: 6000, stepsize: 0.100, P10: 0.167, AP: 0.160, loss: 0.922\n", "iter: 7000, stepsize: 0.100, P10: 0.170, AP: 0.164, loss: 0.903\n", "iter: 8000, stepsize: 0.100, P10: 0.179, AP: 0.163, loss: 0.897\n", "iter: 9000, stepsize: 0.100, P10: 0.167, AP: 0.162, loss: 0.888\n", "iter: 10000, stepsize: 0.100, P10: 0.164, AP: 0.161, loss: 0.869\n", "iter: 11000, stepsize: 0.100, P10: 0.162, AP: 0.162, loss: 0.868\n", "2016-12-29 17:37:19,624 [MainThread ] [WARNI] max_dips reached, stopped at 12000 iterations.\n", "2016-12-29 17:37:19,781 [MainThread ] [INFO ] Validating fold 2 ...\n", "iter: 0, stepsize: 0.100, P10: 0.152, AP: 0.517, loss: 0.000\n", "iter: 1000, stepsize: 0.100, P10: 0.175, AP: 0.166, loss: 1.000\n", "iter: 2000, stepsize: 0.100, P10: 0.178, AP: 0.168, loss: 0.990\n", "iter: 3000, stepsize: 0.100, P10: 0.178, AP: 0.169, loss: 0.970\n", "iter: 4000, stepsize: 0.100, P10: 0.165, AP: 0.169, loss: 0.961\n", "iter: 5000, stepsize: 0.100, P10: 0.161, AP: 0.162, loss: 0.953\n", "iter: 6000, stepsize: 0.100, P10: 0.153, AP: 0.160, loss: 0.913\n", "iter: 7000, stepsize: 0.100, P10: 0.158, AP: 0.160, loss: 0.909\n", "iter: 8000, stepsize: 0.100, P10: 0.159, AP: 0.160, loss: 0.901\n", "iter: 9000, stepsize: 0.100, P10: 0.156, AP: 0.162, loss: 0.886\n", "iter: 10000, stepsize: 0.100, P10: 0.162, AP: 0.160, loss: 0.885\n", "iter: 11000, stepsize: 0.100, P10: 0.157, AP: 0.158, loss: 0.863\n", "iter: 12000, stepsize: 0.100, P10: 0.159, AP: 0.161, loss: 0.864\n", "iter: 13000, stepsize: 0.100, P10: 0.168, AP: 0.160, loss: 0.852\n", "2016-12-29 17:37:32,020 [MainThread ] [WARNI] max_dips reached, stopped at 14000 iterations.\n" ] } ], "source": [ "cv('cal500', 512, n_folds=3, method='loreta', n0=0.1, valid_interval=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cross-validation results including retrieval method parameters are written to a JSON file. For each dataset three separate result files for mean average precision (MAP), precision-at-*k*, and precision-at-10 as a function of relevant training examples are written to disk. Here are the mean average precision values of the last cross-validation run." ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": false }, "outputs": [ { "data": { "text/plain": [ "[0.15721287889679258, 0.1559216267824481, 0.16173633269857096]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "import os\n", "from cbar.settings import RESULTS_DIR\n", "\n", "results = json.load(open(os.path.join(RESULTS_DIR, 'cal500_ap.json')))\n", "results[results.keys()[-1]]['precision']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Start cross-validation with the CLI\n", "\n", "This package comes with a simple CLI which makes it easy to start cross-validation experiments from the command line. The CLI enables you to specify a dataset and a retrieval method as well as additional options in one line.\n", "\n", "To start an experiment on the CAL500 dataset with the LORETA retrieval method, use the following command.\n", "\n", "```\n", "$ cbar crossval --dataset cal500 loreta \n", "```\n", "\n", "This simple command uses all the default parameters for LORETA but you can specify all parameters as arguments to the `loreta` command. To see the available options for the `loreta` command, ask for help like this.\n", "\n", "```\n", "$ cbar crossval loreta --help\n", "Usage: cbar crossval loreta [OPTIONS]\n", "\n", "Options:\n", " -n, --max-iter INTEGER Maximum number of iterations\n", " -i, --valid-interval INTEGER Rank of parameter matrix W\n", " -k INTEGER Rank of parameter matrix W\n", " --n0 FLOAT Step size parameter 1\n", " --n1 FLOAT Step size parameter 2\n", " -t, --rank-thresh FLOAT Threshold for early stopping\n", " -l, --lambda FLOAT Regularization constant\n", " --loss [warp|auc] Loss function\n", " -d, --max-dips INTEGER Maximum number of dips\n", " -v, --verbose Verbosity\n", " --help Show this message and exit.\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }