114 |
115 | Boil the Frog lets you create a playlist of tracks that gradually takes you from one music style
116 | to another. It's like the proverbial frog in the pot of water: if you heat the pot slowly enough, the
117 | frog never notices that it's being made into a stew, and it never jumps out of the pot. With a Boil the
118 | Frog playlist you can do the same thing with music. You can generate a playlist that takes the
119 | listener from one style of music to another without the listener ever noticing that they are being made
120 | into a stew.
121 |
122 |
123 |
124 |
How does it work?
125 |
126 | To create a Boil the Frog playlist, just type in the names of two artists, and a playlist will be
127 | generated that takes you gradually, step by step, from the first artist to the second. You can
128 | click on any track to hear it, or click on the first track to hear the whole playlist. If you don't
129 | like a particular artist, you can route around them by clicking the 'bypass' button.
130 | The 'New Track' button will select a different track for an artist.
131 |
132 |
133 | Boil the Frog plays 30-second previews of the tracks. When you find a playlist you like, you can save it
134 | to Spotify to listen to the full-length versions.
135 |
151 | To create this app, The Echo Nest artist similarity data is
152 | used to build an artist similarity graph of about
153 | 100,000 of the most popular artists. Each artist in the graph is connected to its most similar neighbors
154 | according to the Echo Nest artist similarity algorithm.
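The construction can be sketched as follows (a toy Python 3 example with invented artist names and similarity lists; the real graph covers roughly 100,000 artists):

```python
# Connect each artist to its top-N most similar neighbors.
MAX_EDGES = 2  # the real app likewise caps edges per artist

similar = {            # neighbors ordered most-similar first (invented data)
    "A": ["B", "C", "D"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "A", "C"],
}

graph = {}
for artist, neighbors in similar.items():
    for neighbor in neighbors[:MAX_EDGES]:
        graph.setdefault(artist, set()).add(neighbor)
        graph.setdefault(neighbor, set()).add(artist)  # undirected graph

print(sorted(graph["A"]))  # ['B', 'C', 'D']
```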
155 |
156 |
157 |
158 |
159 | When a playlist between two artists is created, the graph is used to find the path between the two artists.
160 | The path isn't necessarily the shortest path through the graph. Instead, priority is given to paths that
161 | travel through artists of similar popularity. If you start and end with a popular artist, you are more
162 | likely to find a path that takes you through other popular artists, and if you start with a long-tail artist
163 | you will likely find a path through other long-tail artists.
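The popularity bias can be sketched by making edges between artists of similar popularity cheap, then running Dijkstra's algorithm. This is a simplified, self-contained illustration (Python 3, invented artists and popularity scores), not the app's exact code:

```python
import heapq

popularity = {"A": 90, "B": 85, "C": 20, "D": 80}   # invented scores
edges = {"A": ["B", "C"], "B": ["A", "C", "D"],
         "C": ["A", "B", "D"], "D": ["B", "C"]}
POP_WEIGHT = 100.0

def weight(u, v):
    # edges between artists of similar popularity cost little,
    # so the cheapest path stays at a consistent popularity level
    return 1.0 + POP_WEIGHT * abs(popularity[u] - popularity[v]) / 100.0

def find_path(source, target):
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v in edges[u]:
            nd = d + weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

print(find_path("A", "D"))  # ['A', 'B', 'D'] -- avoids the unpopular artist C
```

Even though going through C is just as short in hop count, the popularity penalty steers the path through B instead.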
164 |
165 | Once the path of artists is found, we need to select the best tracks for the playlist. To do this, we pick
166 | a well-known track for each artist that minimizes the difference in energy between this track, the previous
167 | track and the next track.
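A simplified greedy version of that track-selection step (the real app also considers the next track; track names and energy values here are invented):

```python
# For each artist on the path, pick the candidate track whose energy
# is closest to the previously chosen track's energy.
candidates = {                       # (track, energy) pairs, invented data
    "A": [("a1", 0.9), ("a2", 0.5)],
    "B": [("b1", 0.6), ("b2", 0.1)],
    "C": [("c1", 0.55), ("c2", 0.95)],
}

def pick_tracks(path):
    chosen = []
    prev_energy = None
    for artist in path:
        tracks = candidates[artist]
        if prev_energy is None:
            track = tracks[0]        # start with the best-known track
        else:
            # minimize the energy jump from the previous track
            track = min(tracks, key=lambda t: abs(t[1] - prev_energy))
        chosen.append(track[0])
        prev_energy = track[1]
    return chosen

print(pick_tracks(["A", "B", "C"]))  # ['a1', 'b1', 'c1']
```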
168 |
169 | Once we have selected the best tracks, we build a playlist using Spotify's nifty web API.
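The save step posts the track URIs to a playlist through the Spotify Web API, whose add-tracks endpoint accepts at most 100 URIs per request, so long playlists go up in batches. A minimal sketch of the batching, with a placeholder playlist id and token (the requests are built but not sent here):

```python
import json
from urllib.request import Request

API = "https://api.spotify.com/v1"

def chunk(uris, size=100):
    # Spotify's add-tracks endpoint takes at most 100 URIs per call
    return [uris[i:i + size] for i in range(0, len(uris), size)]

def add_tracks_requests(playlist_id, uris, token):
    reqs = []
    for batch in chunk(uris):
        reqs.append(Request(
            "%s/playlists/%s/tracks" % (API, playlist_id),
            data=json.dumps({"uris": batch}).encode(),
            headers={"Authorization": "Bearer " + token,
                     "Content-Type": "application/json"},
            method="POST"))
    return reqs  # the caller would urlopen() each request in order

uris = ["spotify:track:%d" % i for i in range(250)]
reqs = add_tracks_requests("PLAYLIST_ID", uris, "TOKEN")
print(len(reqs))  # 3 batches: 100 + 100 + 50 tracks
```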
170 |
171 |
172 |
Who made this?
173 |
174 | This app was built by Paul Lamere. If you like this sort of
175 | thing, you may be interested in my blog, Music Machinery.
176 |
177 |
178 |
179 |
180 |
181 |
182 |
183 |
Path created in 10 ms.
184 |
185 |
186 |
195 |
196 |
197 |
198 |
840 |
841 |
842 |
843 |
844 |
859 |
860 |
861 |
862 |
--------------------------------------------------------------------------------
/new-web/gstyles.css:
--------------------------------------------------------------------------------
1 | .non-hero-unit {
2 | margin-top:66px;
3 | padding-bottom:30px;
4 | color: #fff;
5 | background-color: #84BD00;
6 | text-align:center;
7 | }
8 |
9 | .hero-unit h1 {
10 | font-size:64px;
11 | }
12 |
13 | #gallery h1 {
14 | color: #84BD00;
15 | }
16 |
17 | #about h1 {
18 | color: #84BD00;
19 | }
20 |
21 | #about h2 {
22 | color: #84BD00;
23 | }
24 |
25 |
26 | .reg-unit {
27 | color:white;
28 | margin-top:66px;
29 | /*background-color: #84BD00;*/
30 | background-color: #6F7073;
31 | }
32 |
33 | .navbar-nav li a {
34 | cursor:pointer;
35 | }
36 |
37 | .artist:hover {
38 | cursor:pointer;
39 | }
40 |
41 | #options li a:hover {
42 | background-color: #727272;
43 | }
44 |
45 | .option-active {
46 | /*background-color: #6f7073;*/
47 | color: #84BD00 !important;
48 | }
49 |
50 | /*
51 | .reg-unit a {
52 | color:red;
53 | }
54 |
55 | .reg-unit a:hover {
56 | color:red;
57 | }
58 |
59 | .reg-unit a:visited {
60 | color:orange;
61 | }
62 | */
63 |
64 |
65 |
66 | #gallery {
67 | display:none;
68 | }
69 |
70 | #time-info {
71 | display:none;
72 | }
73 |
74 | .gallery-list {
75 | font-size:24px;
76 | }
77 |
78 | #about {
79 | display:none;
80 | }
81 |
82 |
83 |
84 | #main {
85 | margin-top:20px;
86 | margin-left:8px;
87 | }
88 |
89 | .adiv {
90 | width:294px;
91 | height:344px;
92 | background-size:100%;
93 | background-repeat:no-repeat;
94 | overflow:hidden;
95 | position:relative;
96 | /*background-color: #122;*/
97 | margin-bottom:2px;
98 | padding:4px;
99 | background-color:#ddd;
100 | }
101 |
102 | #go {
103 | }
104 |
105 | #xbuttons {
106 | width:100%;
107 | margin-left:auto;
108 | margin-right:auto;
109 | text-align:center;
110 | display:none;
111 | }
112 |
113 | #tweet-span {
114 | position:relative;
115 | top:10px;
116 | }
117 |
118 | .tadiv {
119 | display:inline-block;
120 | width:310px;
121 | height:364px;
122 | position:relative;
123 | /*background-color: #122;*/
124 | margin:3px;
125 | }
126 |
127 |
128 | .is-current {
129 | background-color:#e36b23;
130 | font-size:18px;
131 | }
132 |
133 | #list {
134 | margin-left:4px;
135 | text-align:center;
136 | }
137 |
138 | .adiv:hover {
139 | background-color: #84BD00;
140 | }
141 |
142 | #info {
143 | margin-left:10px;
144 | width:100%;
145 | text-align:center;
146 | margin-bottom:10px;
147 | height:32px;
148 | font-size:18px;
149 | }
150 |
151 |
152 | .track-info {
153 | position:absolute;
154 | bottom: 4px;
155 | margin-left:6px;
156 | margin-right:6px;
157 | line-height:18px;
158 | font-size:14px;
159 | height:40px;
160 | overflow:hidden;
161 | }
162 |
163 | .playbutton {
164 | width:100px;
165 | position:absolute;
166 | top:100px;
167 | left:100px;
168 | }
169 |
170 | .buttons {
171 | opacity:.8;
172 | }
173 |
174 | #footer {
175 | margin-top:10px;
176 | margin-left:20px;
177 | margin-bottom:20px;
178 | }
179 |
180 |
181 |
182 | .album-label {
183 | overflow:hidden;
184 | height:18px;
185 | text-overflow:ellipsis;
186 | width:280px;
187 | white-space:nowrap;
188 | }
189 |
190 | .change {
191 | float:left;
192 | }
193 |
194 | .bypass {
195 | float:right;
196 | }
197 |
198 | #search {
199 | margin-bottom: 20px;
200 | }
201 |
202 | #search input {
203 | color:black;
204 | }
205 |
206 | .faq {
207 | margin-right:10px;
208 | }
209 |
210 | #frog-image {
211 | margin-right:15px;
212 | margin-top:5px;
213 | float:left;
214 | border-radius:15px;
215 |
216 | }
217 |
218 | #lz-graph {
219 | float:right;
220 | border-radius:15px;
221 | margin-left:10px;
222 | margin-bottom:20px;
223 | }
224 |
225 | #time-info {
226 | margin-right:20px;
227 | font-size:8px;
228 | }
229 |
230 | .empty-link {
231 | cursor:pointer;
232 | }
233 |
--------------------------------------------------------------------------------
/new-web/images/frog.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/frog.jpg
--------------------------------------------------------------------------------
/new-web/images/lz.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/lz.png
--------------------------------------------------------------------------------
/new-web/images/missing-1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/missing-1.jpg
--------------------------------------------------------------------------------
/new-web/images/missing.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/missing.png
--------------------------------------------------------------------------------
/new-web/images/pause.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/pause.png
--------------------------------------------------------------------------------
/new-web/images/play.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/images/play.png
--------------------------------------------------------------------------------
/new-web/lib/underscore-min.js:
--------------------------------------------------------------------------------
1 | (function(){var n=this,t=n._,r={},e=Array.prototype,u=Object.prototype,i=Function.prototype,a=e.push,o=e.slice,c=e.concat,l=u.toString,f=u.hasOwnProperty,s=e.forEach,p=e.map,v=e.reduce,h=e.reduceRight,g=e.filter,d=e.every,m=e.some,y=e.indexOf,b=e.lastIndexOf,x=Array.isArray,_=Object.keys,j=i.bind,w=function(n){return n instanceof w?n:this instanceof w?(this._wrapped=n,void 0):new w(n)};"undefined"!=typeof exports?("undefined"!=typeof module&&module.exports&&(exports=module.exports=w),exports._=w):n._=w,w.VERSION="1.4.3";var A=w.each=w.forEach=function(n,t,e){if(null!=n)if(s&&n.forEach===s)n.forEach(t,e);else if(n.length===+n.length){for(var u=0,i=n.length;i>u;u++)if(t.call(e,n[u],u,n)===r)return}else for(var a in n)if(w.has(n,a)&&t.call(e,n[a],a,n)===r)return};w.map=w.collect=function(n,t,r){var e=[];return null==n?e:p&&n.map===p?n.map(t,r):(A(n,function(n,u,i){e[e.length]=t.call(r,n,u,i)}),e)};var O="Reduce of empty array with no initial value";w.reduce=w.foldl=w.inject=function(n,t,r,e){var u=arguments.length>2;if(null==n&&(n=[]),v&&n.reduce===v)return e&&(t=w.bind(t,e)),u?n.reduce(t,r):n.reduce(t);if(A(n,function(n,i,a){u?r=t.call(e,r,n,i,a):(r=n,u=!0)}),!u)throw new TypeError(O);return r},w.reduceRight=w.foldr=function(n,t,r,e){var u=arguments.length>2;if(null==n&&(n=[]),h&&n.reduceRight===h)return e&&(t=w.bind(t,e)),u?n.reduceRight(t,r):n.reduceRight(t);var i=n.length;if(i!==+i){var a=w.keys(n);i=a.length}if(A(n,function(o,c,l){c=a?a[--i]:--i,u?r=t.call(e,r,n[c],c,l):(r=n[c],u=!0)}),!u)throw new TypeError(O);return r},w.find=w.detect=function(n,t,r){var e;return E(n,function(n,u,i){return t.call(r,n,u,i)?(e=n,!0):void 0}),e},w.filter=w.select=function(n,t,r){var e=[];return null==n?e:g&&n.filter===g?n.filter(t,r):(A(n,function(n,u,i){t.call(r,n,u,i)&&(e[e.length]=n)}),e)},w.reject=function(n,t,r){return w.filter(n,function(n,e,u){return!t.call(r,n,e,u)},r)},w.every=w.all=function(n,t,e){t||(t=w.identity);var u=!0;return 
null==n?u:d&&n.every===d?n.every(t,e):(A(n,function(n,i,a){return(u=u&&t.call(e,n,i,a))?void 0:r}),!!u)};var E=w.some=w.any=function(n,t,e){t||(t=w.identity);var u=!1;return null==n?u:m&&n.some===m?n.some(t,e):(A(n,function(n,i,a){return u||(u=t.call(e,n,i,a))?r:void 0}),!!u)};w.contains=w.include=function(n,t){return null==n?!1:y&&n.indexOf===y?-1!=n.indexOf(t):E(n,function(n){return n===t})},w.invoke=function(n,t){var r=o.call(arguments,2);return w.map(n,function(n){return(w.isFunction(t)?t:n[t]).apply(n,r)})},w.pluck=function(n,t){return w.map(n,function(n){return n[t]})},w.where=function(n,t){return w.isEmpty(t)?[]:w.filter(n,function(n){for(var r in t)if(t[r]!==n[r])return!1;return!0})},w.max=function(n,t,r){if(!t&&w.isArray(n)&&n[0]===+n[0]&&65535>n.length)return Math.max.apply(Math,n);if(!t&&w.isEmpty(n))return-1/0;var e={computed:-1/0,value:-1/0};return A(n,function(n,u,i){var a=t?t.call(r,n,u,i):n;a>=e.computed&&(e={value:n,computed:a})}),e.value},w.min=function(n,t,r){if(!t&&w.isArray(n)&&n[0]===+n[0]&&65535>n.length)return Math.min.apply(Math,n);if(!t&&w.isEmpty(n))return 1/0;var e={computed:1/0,value:1/0};return A(n,function(n,u,i){var a=t?t.call(r,n,u,i):n;e.computed>a&&(e={value:n,computed:a})}),e.value},w.shuffle=function(n){var t,r=0,e=[];return A(n,function(n){t=w.random(r++),e[r-1]=e[t],e[t]=n}),e};var F=function(n){return w.isFunction(n)?n:function(t){return t[n]}};w.sortBy=function(n,t,r){var e=F(t);return w.pluck(w.map(n,function(n,t,u){return{value:n,index:t,criteria:e.call(r,n,t,u)}}).sort(function(n,t){var r=n.criteria,e=t.criteria;if(r!==e){if(r>e||void 0===r)return 1;if(e>r||void 0===e)return-1}return n.indexi;){var o=i+a>>>1;u>r.call(e,n[o])?i=o+1:a=o}return i},w.toArray=function(n){return n?w.isArray(n)?o.call(n):n.length===+n.length?w.map(n,w.identity):w.values(n):[]},w.size=function(n){return null==n?0:n.length===+n.length?n.length:w.keys(n).length},w.first=w.head=w.take=function(n,t,r){return null==n?void 
0:null==t||r?n[0]:o.call(n,0,t)},w.initial=function(n,t,r){return o.call(n,0,n.length-(null==t||r?1:t))},w.last=function(n,t,r){return null==n?void 0:null==t||r?n[n.length-1]:o.call(n,Math.max(n.length-t,0))},w.rest=w.tail=w.drop=function(n,t,r){return o.call(n,null==t||r?1:t)},w.compact=function(n){return w.filter(n,w.identity)};var R=function(n,t,r){return A(n,function(n){w.isArray(n)?t?a.apply(r,n):R(n,t,r):r.push(n)}),r};w.flatten=function(n,t){return R(n,t,[])},w.without=function(n){return w.difference(n,o.call(arguments,1))},w.uniq=w.unique=function(n,t,r,e){w.isFunction(t)&&(e=r,r=t,t=!1);var u=r?w.map(n,r,e):n,i=[],a=[];return A(u,function(r,e){(t?e&&a[a.length-1]===r:w.contains(a,r))||(a.push(r),i.push(n[e]))}),i},w.union=function(){return w.uniq(c.apply(e,arguments))},w.intersection=function(n){var t=o.call(arguments,1);return w.filter(w.uniq(n),function(n){return w.every(t,function(t){return w.indexOf(t,n)>=0})})},w.difference=function(n){var t=c.apply(e,o.call(arguments,1));return w.filter(n,function(n){return!w.contains(t,n)})},w.zip=function(){for(var n=o.call(arguments),t=w.max(w.pluck(n,"length")),r=Array(t),e=0;t>e;e++)r[e]=w.pluck(n,""+e);return r},w.object=function(n,t){if(null==n)return{};for(var r={},e=0,u=n.length;u>e;e++)t?r[n[e]]=t[e]:r[n[e][0]]=n[e][1];return r},w.indexOf=function(n,t,r){if(null==n)return-1;var e=0,u=n.length;if(r){if("number"!=typeof r)return e=w.sortedIndex(n,t),n[e]===t?e:-1;e=0>r?Math.max(0,u+r):r}if(y&&n.indexOf===y)return n.indexOf(t,r);for(;u>e;e++)if(n[e]===t)return e;return-1},w.lastIndexOf=function(n,t,r){if(null==n)return-1;var e=null!=r;if(b&&n.lastIndexOf===b)return e?n.lastIndexOf(t,r):n.lastIndexOf(t);for(var u=e?r:n.length;u--;)if(n[u]===t)return u;return-1},w.range=function(n,t,r){1>=arguments.length&&(t=n||0,n=0),r=arguments[2]||1;for(var e=Math.max(Math.ceil((t-n)/r),0),u=0,i=Array(e);e>u;)i[u++]=n,n+=r;return i};var I=function(){};w.bind=function(n,t){var r,e;if(n.bind===j&&j)return 
j.apply(n,o.call(arguments,1));if(!w.isFunction(n))throw new TypeError;return r=o.call(arguments,2),e=function(){if(!(this instanceof e))return n.apply(t,r.concat(o.call(arguments)));I.prototype=n.prototype;var u=new I;I.prototype=null;var i=n.apply(u,r.concat(o.call(arguments)));return Object(i)===i?i:u}},w.bindAll=function(n){var t=o.call(arguments,1);return 0==t.length&&(t=w.functions(n)),A(t,function(t){n[t]=w.bind(n[t],n)}),n},w.memoize=function(n,t){var r={};return t||(t=w.identity),function(){var e=t.apply(this,arguments);return w.has(r,e)?r[e]:r[e]=n.apply(this,arguments)}},w.delay=function(n,t){var r=o.call(arguments,2);return setTimeout(function(){return n.apply(null,r)},t)},w.defer=function(n){return w.delay.apply(w,[n,1].concat(o.call(arguments,1)))},w.throttle=function(n,t){var r,e,u,i,a=0,o=function(){a=new Date,u=null,i=n.apply(r,e)};return function(){var c=new Date,l=t-(c-a);return r=this,e=arguments,0>=l?(clearTimeout(u),u=null,a=c,i=n.apply(r,e)):u||(u=setTimeout(o,l)),i}},w.debounce=function(n,t,r){var e,u;return function(){var i=this,a=arguments,o=function(){e=null,r||(u=n.apply(i,a))},c=r&&!e;return clearTimeout(e),e=setTimeout(o,t),c&&(u=n.apply(i,a)),u}},w.once=function(n){var t,r=!1;return function(){return r?t:(r=!0,t=n.apply(this,arguments),n=null,t)}},w.wrap=function(n,t){return function(){var r=[n];return a.apply(r,arguments),t.apply(this,r)}},w.compose=function(){var n=arguments;return function(){for(var t=arguments,r=n.length-1;r>=0;r--)t=[n[r].apply(this,t)];return t[0]}},w.after=function(n,t){return 0>=n?t():function(){return 1>--n?t.apply(this,arguments):void 0}},w.keys=_||function(n){if(n!==Object(n))throw new TypeError("Invalid object");var t=[];for(var r in n)w.has(n,r)&&(t[t.length]=r);return t},w.values=function(n){var t=[];for(var r in n)w.has(n,r)&&t.push(n[r]);return t},w.pairs=function(n){var t=[];for(var r in n)w.has(n,r)&&t.push([r,n[r]]);return t},w.invert=function(n){var t={};for(var r in 
n)w.has(n,r)&&(t[n[r]]=r);return t},w.functions=w.methods=function(n){var t=[];for(var r in n)w.isFunction(n[r])&&t.push(r);return t.sort()},w.extend=function(n){return A(o.call(arguments,1),function(t){if(t)for(var r in t)n[r]=t[r]}),n},w.pick=function(n){var t={},r=c.apply(e,o.call(arguments,1));return A(r,function(r){r in n&&(t[r]=n[r])}),t},w.omit=function(n){var t={},r=c.apply(e,o.call(arguments,1));for(var u in n)w.contains(r,u)||(t[u]=n[u]);return t},w.defaults=function(n){return A(o.call(arguments,1),function(t){if(t)for(var r in t)null==n[r]&&(n[r]=t[r])}),n},w.clone=function(n){return w.isObject(n)?w.isArray(n)?n.slice():w.extend({},n):n},w.tap=function(n,t){return t(n),n};var S=function(n,t,r,e){if(n===t)return 0!==n||1/n==1/t;if(null==n||null==t)return n===t;n instanceof w&&(n=n._wrapped),t instanceof w&&(t=t._wrapped);var u=l.call(n);if(u!=l.call(t))return!1;switch(u){case"[object String]":return n==t+"";case"[object Number]":return n!=+n?t!=+t:0==n?1/n==1/t:n==+t;case"[object Date]":case"[object Boolean]":return+n==+t;case"[object RegExp]":return n.source==t.source&&n.global==t.global&&n.multiline==t.multiline&&n.ignoreCase==t.ignoreCase}if("object"!=typeof n||"object"!=typeof t)return!1;for(var i=r.length;i--;)if(r[i]==n)return e[i]==t;r.push(n),e.push(t);var a=0,o=!0;if("[object Array]"==u){if(a=n.length,o=a==t.length)for(;a--&&(o=S(n[a],t[a],r,e)););}else{var c=n.constructor,f=t.constructor;if(c!==f&&!(w.isFunction(c)&&c instanceof c&&w.isFunction(f)&&f instanceof f))return!1;for(var s in n)if(w.has(n,s)&&(a++,!(o=w.has(t,s)&&S(n[s],t[s],r,e))))break;if(o){for(s in t)if(w.has(t,s)&&!a--)break;o=!a}}return r.pop(),e.pop(),o};w.isEqual=function(n,t){return S(n,t,[],[])},w.isEmpty=function(n){if(null==n)return!0;if(w.isArray(n)||w.isString(n))return 0===n.length;for(var t in n)if(w.has(n,t))return!1;return!0},w.isElement=function(n){return!(!n||1!==n.nodeType)},w.isArray=x||function(n){return"[object Array]"==l.call(n)},w.isObject=function(n){return 
n===Object(n)},A(["Arguments","Function","String","Number","Date","RegExp"],function(n){w["is"+n]=function(t){return l.call(t)=="[object "+n+"]"}}),w.isArguments(arguments)||(w.isArguments=function(n){return!(!n||!w.has(n,"callee"))}),w.isFunction=function(n){return"function"==typeof n},w.isFinite=function(n){return isFinite(n)&&!isNaN(parseFloat(n))},w.isNaN=function(n){return w.isNumber(n)&&n!=+n},w.isBoolean=function(n){return n===!0||n===!1||"[object Boolean]"==l.call(n)},w.isNull=function(n){return null===n},w.isUndefined=function(n){return void 0===n},w.has=function(n,t){return f.call(n,t)},w.noConflict=function(){return n._=t,this},w.identity=function(n){return n},w.times=function(n,t,r){for(var e=Array(n),u=0;n>u;u++)e[u]=t.call(r,u);return e},w.random=function(n,t){return null==t&&(t=n,n=0),n+(0|Math.random()*(t-n+1))};var T={escape:{"&":"&","<":"<",">":">",'"':""","'":"'","/":"/"}};T.unescape=w.invert(T.escape);var M={escape:RegExp("["+w.keys(T.escape).join("")+"]","g"),unescape:RegExp("("+w.keys(T.unescape).join("|")+")","g")};w.each(["escape","unescape"],function(n){w[n]=function(t){return null==t?"":(""+t).replace(M[n],function(t){return T[n][t]})}}),w.result=function(n,t){if(null==n)return null;var r=n[t];return w.isFunction(r)?r.call(n):r},w.mixin=function(n){A(w.functions(n),function(t){var r=w[t]=n[t];w.prototype[t]=function(){var n=[this._wrapped];return a.apply(n,arguments),z.call(this,r.apply(w,n))}})};var N=0;w.uniqueId=function(n){var t=""+ ++N;return n?n+t:t},w.templateSettings={evaluate:/<%([\s\S]+?)%>/g,interpolate:/<%=([\s\S]+?)%>/g,escape:/<%-([\s\S]+?)%>/g};var q=/(.)^/,B={"'":"'","\\":"\\","\r":"r","\n":"n"," ":"t","\u2028":"u2028","\u2029":"u2029"},D=/\\|'|\r|\n|\t|\u2028|\u2029/g;w.template=function(n,t,r){r=w.defaults({},r,w.templateSettings);var e=RegExp([(r.escape||q).source,(r.interpolate||q).source,(r.evaluate||q).source].join("|")+"|$","g"),u=0,i="__p+='";n.replace(e,function(t,r,e,a,o){return 
i+=n.slice(u,o).replace(D,function(n){return"\\"+B[n]}),r&&(i+="'+\n((__t=("+r+"))==null?'':_.escape(__t))+\n'"),e&&(i+="'+\n((__t=("+e+"))==null?'':__t)+\n'"),a&&(i+="';\n"+a+"\n__p+='"),u=o+t.length,t}),i+="';\n",r.variable||(i="with(obj||{}){\n"+i+"}\n"),i="var __t,__p='',__j=Array.prototype.join,print=function(){__p+=__j.call(arguments,'');};\n"+i+"return __p;\n";try{var a=Function(r.variable||"obj","_",i)}catch(o){throw o.source=i,o}if(t)return a(t,w);var c=function(n){return a.call(this,n,w)};return c.source="function("+(r.variable||"obj")+"){\n"+i+"}",c},w.chain=function(n){return w(n).chain()};var z=function(n){return this._chain?w(n).chain():n};w.mixin(w),A(["pop","push","reverse","shift","sort","splice","unshift"],function(n){var t=e[n];w.prototype[n]=function(){var r=this._wrapped;return t.apply(r,arguments),"shift"!=n&&"splice"!=n||0!==r.length||delete r[0],z.call(this,r)}}),A(["concat","join","slice"],function(n){var t=e[n];w.prototype[n]=function(){return z.call(this,t.apply(this._wrapped,arguments))}}),w.extend(w.prototype,{chain:function(){return this._chain=!0,this},value:function(){return this._wrapped}})}).call(this);
--------------------------------------------------------------------------------
/new-web/ss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/plamere/BoilTheFrog/1f2cb60a1f08d9cec3a06675a633ca35f09c8708/new-web/ss.png
--------------------------------------------------------------------------------
/new-web/styles.css:
--------------------------------------------------------------------------------
1 | .hero-unit {
2 | margin-top:6px;
3 | padding-bottom:30px;
4 | color: #fff;
5 | background-color: #1ED760;
6 | text-align:center;
7 | }
8 |
9 | .hero-unit h1 {
10 | font-weight:lighter;
11 | font-size:64px;
12 | }
13 |
14 | .btn-narrow:hover {
15 | background:green;
16 | width:12px !important;
17 | }
18 |
19 | .artist-popularity {
20 | font-size:7px;
21 | }
22 |
23 | #gallery h1 {
24 | color: #1ED760;
25 | }
26 |
27 | #about h1 {
28 | color: #1ED760;
29 | }
30 |
31 | #about h2 {
32 | color: #1ED760;
33 | }
34 |
35 |
36 | .reg-unit {
37 | color:white;
38 | margin-top:6px;
39 | /*background-color: #1ED760;*/
40 | /*background-color: #6F7073;*/
41 | background-color: #444;
42 | }
43 |
44 | .navbar-nav li a {
45 | cursor:pointer;
46 | }
47 |
48 | .artist:hover {
49 | cursor:pointer;
50 | }
51 |
52 | #options li a:hover {
53 | background-color: #727272;
54 | }
55 |
56 | .option-active {
57 | /*background-color: #6f7073;*/
58 | color: #1ED760 !important;
59 | }
60 |
61 | /*
62 | .reg-unit a {
63 | color:red;
64 | }
65 |
66 | .reg-unit a:hover {
67 | color:red;
68 | }
69 |
70 | .reg-unit a:visited {
71 | color:orange;
72 | }
73 | */
74 |
75 |
76 |
77 | #gallery {
78 | display:none;
79 | }
80 |
81 | #time-info {
82 | display:none;
83 | }
84 |
85 | .gallery-list {
86 | font-size:24px;
87 | }
88 |
89 | #about {
90 | display:none;
91 | font-weight:lighter !important;
92 | }
93 |
94 |
95 |
96 | #main {
97 | margin-top:20px;
98 | margin-left:8px;
99 | }
100 |
101 | .adiv {
102 | width:300px;
103 | height:350px;
104 | background-size:100%;
105 | background-repeat:no-repeat;
106 | overflow:hidden;
107 | position:relative;
108 | /*background-color: #122;*/
109 | margin-bottom:2px;
110 | /*padding:4px; */
111 | background-color:#ddd;
112 | }
113 |
114 | #go {
115 | }
116 |
117 | #xbuttons {
118 | width:100%;
119 | margin-left:auto;
120 | margin-right:auto;
121 | text-align:center;
122 | display:none;
123 | }
124 |
125 | #tweet-span {
126 | position:relative;
127 | top:10px;
128 | }
129 |
130 | .tadiv {
131 | display:inline-block;
132 | width:310px;
133 | height:364px;
134 | position:relative;
135 | /*background-color: #122;*/
136 | margin:3px;
137 | }
138 |
139 |
140 | .is-current {
141 | background-color:#e36b23;
142 | font-size:18px;
143 | }
144 |
145 | #list {
146 | margin-left:4px;
147 | text-align:center;
148 | }
149 |
150 | .adiv:hover {
151 | background-color: #1ED760;
152 | }
153 |
154 | #info {
155 | margin-left:10px;
156 | width:100%;
157 | text-align:center;
158 | margin-bottom:10px;
159 | height:32px;
160 | font-size:18px;
161 | }
162 |
163 |
164 | .track-info {
165 | position:absolute;
166 | bottom: 4px;
167 | margin-left:6px;
168 | margin-right:6px;
169 | line-height:18px;
170 | font-size:14px;
171 | height:40px;
172 | overflow:hidden;
173 | }
174 |
175 | .playbutton {
176 | width:100px;
177 | position:absolute;
178 | top:100px;
179 | left:100px;
180 | }
181 |
182 | .buttons {
183 | opacity:.8;
184 | }
185 |
186 | #footer {
187 | margin-top:10px;
188 | margin-left:20px;
189 | margin-bottom:20px;
190 | }
191 |
192 |
193 |
194 | .album-label {
195 | overflow:hidden;
196 | height:18px;
197 | text-overflow:ellipsis;
198 | width:280px;
199 | white-space:nowrap;
200 | }
201 |
202 | .change {
203 | float:left;
204 | }
205 |
206 | .bypass {
207 | float:right;
208 | }
209 |
210 | #search {
211 | margin-bottom: 20px;
212 | }
213 |
214 | #search input {
215 | color:black;
216 | }
217 |
218 | .faq {
219 | margin-right:10px;
220 | }
221 |
222 | #frog-image {
223 | margin-right:15px;
224 | margin-top:5px;
225 | float:left;
226 | border-radius:15px;
227 |
228 | }
229 |
230 | #lz-graph {
231 | float:right;
232 | border-radius:15px;
233 | margin-left:10px;
234 | margin-bottom:20px;
235 | }
236 |
237 | #time-info {
238 | margin-right:20px;
239 | font-size:8px;
240 | }
241 |
242 | .empty-link {
243 | cursor:pointer;
244 | }
245 |
246 | #the-plot {
247 | width:1000px;
248 | height:600px;
249 | }
250 |
--------------------------------------------------------------------------------
/new_crawler/artist_graph.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import networkx as nx
4 | import json
5 | import rocksdb
6 | import collections
7 | import search
8 |
9 |
10 | class ArtistGraph:
11 | def __init__(self, db_path='rocks.db'):
12 | self.db = rocksdb.DB(db_path, rocksdb.Options(), read_only=True)
13 | self.trace = True
14 | self.searcher = search.Searcher(exact=True)
15 |
16 | #important configs
17 | self.skip_artists_with_no_tracks = True
18 | self.simple_edges = False
19 | self.max_edges_per_artist = 4
20 | self.pop_weight = 100.00
21 | self.min_popularity = 30
22 |
23 |
24 | self.artist_blacklist = set()
25 | self.edge_blacklist = collections.defaultdict(set)
26 | self.load_blacklist()
27 |
28 | self.load_graph()
29 |
30 |
31 | def load_blacklist(self):
32 | f = open("blacklist.csv")
33 | for lineno, line in enumerate(f):
34 | line = line.strip()
35 | if len(line) == 0:
36 | continue
37 | if line[0] == '#':
38 | continue
39 | fields = [field.strip() for field in line.split(',')]
40 | if len(fields) > 1 and fields[0] == 'artist':
41 | aid = to_aid(fields[1])
42 | self.artist_blacklist.add(aid)
43 | elif fields[0] == 'edge' and len(fields) > 2:
44 | aid1 = to_aid(fields[1])
45 | aid2 = to_aid(fields[2])
46 | self.edge_blacklist[aid1].add(aid2)
47 | self.edge_blacklist[aid2].add(aid1)
48 | else:
49 | print "unknown blacklist type", fields[0], "at line", lineno
50 |
51 | def load_graph(self):
52 | self.G = nx.Graph()
53 | popularity = collections.defaultdict(int)
54 | skips = set()
55 |
56 | it = self.db.itervalues()
57 | it.seek_to_first()
58 | missing = []
59 |
60 | print "loading popularity"
61 | for i, tartist_js in enumerate(it):
62 | artist = json.loads(tartist_js)
63 | popularity[artist['id']] = artist['popularity']
64 | if self.skip_artists_with_no_tracks:
65 | if 'tracks' not in artist or len(artist['tracks']) == 0:
66 | skips.add(artist['id'])
67 | print len(popularity), "artists", "skipping", len(skips)
68 |
69 | print "building graph"
70 | nnodes = 0
71 | it.seek_to_first()
72 | for i, tartist_js in enumerate(it):
73 | artist = json.loads(tartist_js)
74 | node = artist['id']
75 | if node in skips:
76 | continue
77 |
78 | if node in self.artist_blacklist:
79 | print 'skipped artist', node
80 | continue
81 |
82 |
83 | pop = popularity[node]
84 | if pop < self.min_popularity:
85 | continue
86 |
87 | nnodes += 1
88 | self.index(artist['name'],node)
89 | if 'edges' in artist:
90 | edges = [edge for edge in artist['edges'] if self.is_good_edge(node, edge, skips, popularity)]
91 |
92 | if self.simple_edges:
93 | # just weighed by order
94 | edges = edges[:self.max_edges_per_artist]
95 | nedges = float(len(edges))
96 | weighted_edges = [(1.0 + nedge / nedges, edge) for nedge, edge in enumerate(edges)]
97 | else:
98 | weighted_edges = [ (1 + self.pop_weight * abs(pop - popularity[edge]) / 100.0, edge) for edge in edges]
99 | weighted_edges.sort()
100 | weighted_edges = weighted_edges[:self.max_edges_per_artist]
101 |
102 | for weight, target in weighted_edges:
103 | self.add_edge(node, target, weight)
104 |
105 |
106 | if self.trace and nnodes % 1000 == 0:
107 | print "loading %d artists" % (nnodes, )
108 |
109 | print "nodes", self.G.number_of_nodes()
110 | print "edges", self.G.number_of_edges()
111 | components = list(nx.connected_components(self.G))
112 | print "connected components", len(components)
113 | clens = [len(c) for c in components]
114 | clens.sort(reverse=True)
115 | for cl in clens:
116 | print cl,
117 |
118 | def is_good_edge(self, src, dest, skips, popularity):
119 | if dest in skips:
120 | return False
121 |
122 | if dest in self.artist_blacklist:
123 | return False
124 |
125 | if dest in self.edge_blacklist[src]:
126 | return False
127 |
128 | if popularity[dest] < self.min_popularity:
129 | return False
130 |
131 | return True
132 |
133 |
134 | def index(self, name, aid):
135 | self.searcher.add(name, aid)
136 |
137 | def search(self, name):
138 | if name is None or len(name) == 0:
139 | return None
140 | if name.startswith('spotify:artist:'):
141 | fields = name.split(':')
142 | if len(fields) == 3:
143 | return fields[2]
144 |
145 | matches = self.searcher.search(name)
146 | for match in matches:
147 | if match in self.G:
148 | return match
149 | return None
150 |
151 | def get_artist(self, aid):
152 | tjs = self.db.get(aid)
153 | if tjs:
154 | tartist = json.loads(tjs)
155 | if not 'edges' in tartist:
156 | tartist['edges'] = []
157 | if not 'incoming_edges' in tartist:
158 | tartist['incoming_edges'] = []
159 | else:
160 | tartist = None
161 | return tartist
162 |
163 | def get_skipset(self):
164 | skips = set()
165 | it = self.db.itervalues()
166 | it.seek_to_first()
167 | missing = []
168 | for i, tartist_js in enumerate(it):
169 | artist = json.loads(tartist_js)
170 | if 'tracks' not in artist or len(artist['tracks']) == 0:
171 | skips.add(artist['id'])
172 | if self.trace:
173 | print "found %d artists with no tracks" % (len(skips),)
174 | return skips
175 |
176 | def path(self, source_name, target_name, skipset=frozenset()):
177 | def get_weight(src, dest, attrs):
178 | if src in skipset or dest in skipset:
179 | # print "gw", srx, dest, attrs, 10000
180 | return 10000
181 | # print "gw", src, dest, attrs, 1
182 | return attrs['weight']
183 |
184 | results = {
185 | 'status': 'ok'
186 | }
187 |
188 | if len(source_name) == 0:
189 | results['status'] = 'error'
190 | results['reason'] = "No artist given"
191 | else:
192 | source_aid = self.search(source_name)
193 | if source_aid is None:
194 | results['status'] = 'error'
195 | results['reason'] = "Can't find " + source_name
196 |
197 | target_aid = self.search(target_name)
198 | if target_aid is None:
199 | results['status'] = 'error'
200 | results['reason'] = "Can't find " + target_name
201 |
202 | print "s=t", source_aid, target_aid
203 | if source_aid not in self.G:
204 | results['status'] = 'error'
205 | results['reason'] = "Can't find " + source_name + " in the artist graph"
206 |
207 | if target_aid not in self.G:
208 | results['status'] = 'error'
209 | results['reason'] = "Can't find " + target_name + " in the artist graph"
210 |
211 | if source_aid and target_aid and results['status'] == 'ok':
212 | start = time.time()
213 | if len(skipset) > 0:
214 | rpath = nx.dijkstra_path(self.G, source_aid, target_aid, get_weight)
215 | score = len(rpath)
216 | else:
217 | score, rpath = nx.bidirectional_dijkstra(self.G, source_aid, target_aid)
218 | pdelta = time.time() - start
219 | results['score'] = score
220 | populated_path = [self.get_artist(aid) for aid in rpath]
221 | fdelta = time.time() - start
222 |
223 | results['status'] = 'ok'
224 | results['raw_path'] = rpath
225 | results['path'] = populated_path
226 | results['pdelta'] = pdelta * 1000
227 | results['fdelta'] = fdelta * 1000
228 | return results
229 |
230 | def add_node(self, node):
231 | if node not in self.G:
232 | self.G.add_node(node)
233 |
234 | def add_edge(self, source, target, weight):
235 | self.add_node(source)
236 | self.add_node(target)
237 | self.G.add_edges_from([(source, target, {"weight": weight})])
238 |
239 |
240 | def normalize_name(self, name):
241 | name = name.lower().strip()
242 | return name
243 |
244 | def an(self, aid):
245 | #return self.get_artist(aid)['name']
246 | artist = self.get_artist(aid)
247 | return "%s(%d)" % (artist['name'], artist['popularity'])
248 |
249 | def edge_check(self, uri):
250 | aid = to_aid(uri)
251 | artist = self.get_artist(aid)
252 |
253 | print "edge check", self.an(aid)
254 |
255 |         # artists linked in both directions
256 |         combined = set(artist['edges'])
257 |         combined &= set(artist['incoming_edges'])
258 |
259 | print "combined:"
260 | for aid in combined:
261 | print " ", self.an(aid)
262 | print
263 |
264 | print "outgoing:"
265 | for aid in artist['edges']:
266 | if aid not in combined:
267 | print " ", self.an(aid)
268 | print
269 | print "incoming:"
270 | for aid in artist['incoming_edges']:
271 | if aid not in combined:
272 | print " ", self.an(aid)
273 | print
274 |
275 | def sim_check(self, uri):
276 | aid = to_aid(uri)
277 | artist = self.get_artist(aid)
278 |
279 | if not 'edges' in artist:
280 | print "leaf node, nothing to do"
281 | return
282 |
283 | sim_counts = collections.Counter()
284 | osim_counts = collections.Counter()
285 |
286 | simset = set(artist['edges'])
287 | print "sim_check", self.an(aid)
288 | print
289 | print "normal sims"
290 | for i, edge in enumerate(artist['edges']):
291 | print " %d %s %s" % (i, edge, self.an(edge))
292 | sim_artist = self.get_artist(edge)
293 | if 'edges' in sim_artist:
294 | for sedge in sim_artist['edges']:
295 | if sedge in simset:
296 | sim_counts[sedge] += 1
297 | osim_counts[sedge] += 1
298 | print
299 |
300 | print "ranked sims"
301 | print artist['name']
302 | for edge, count in sim_counts.most_common():
303 | print " %d %s %s"% (count, edge, self.an(edge))
304 |
305 | print
306 |         print "sim neighborhood"
307 | for edge, count in osim_counts.most_common():
308 | if count > 1:
309 | print "%d %s %s"% (count, edge, self.an(edge))
310 |
311 |
312 | def to_aid(uri_or_aid):
313 | if uri_or_aid:
314 | fields = uri_or_aid.split(':')
315 | if len(fields) == 3:
316 | return fields[2]
317 | return uri_or_aid
318 |
319 | if __name__ == '__main__':
320 | args = sys.argv[1:]
321 | uris = []
322 |
323 | ag = ArtistGraph()
324 |
325 | while args:
326 | arg = args.pop(0)
327 | if arg == '--path':
328 | pass
329 |
330 |
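The soft-skip routing in `ArtistGraph.path` above gives any edge that touches a skipped artist a huge weight (10000), so Dijkstra detours around it instead of failing when no skip-free path exists. A dependency-free sketch of the same idea (toy adjacency-list graph and a hypothetical `dijkstra_path` helper, not the project's NetworkX-backed code):

```python
import heapq

def dijkstra_path(graph, source, target, skipset=frozenset()):
    # Edges touching a skipped node cost 10000 instead of being removed,
    # so the search routes around skips but can still use them as a
    # last resort -- the same trick get_weight() plays above.
    def weight(u, v, w):
        return 10000 if u in skipset or v in skipset else w

    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + weight(u, v, w)
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    # walk the predecessor chain back from the target
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return list(reversed(path))

graph = {
    'a': [('b', 1), ('d', 2)],
    'b': [('c', 1)],
    'd': [('c', 2)],
}
print(dijkstra_path(graph, 'a', 'c'))         # ['a', 'b', 'c']
print(dijkstra_path(graph, 'a', 'c', {'b'}))  # ['a', 'd', 'c']
```

Using a large finite weight rather than deleting the node means a path still comes back even when every route crosses a skipped artist.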
331 |
--------------------------------------------------------------------------------
/new_crawler/blacklist.csv:
--------------------------------------------------------------------------------
1 | # the blacklist. You can block artists like so:
2 |
3 | artist, spotify:artist:61zv3hX7l838ZyhaDyAx8S, gary glitter
4 | artist, spotify:artist:1faxe25Wp3Nk43xVVxsdSB, billy davis, too many bad sims due to ambiguous artist
5 |
6 | edge, spotify:artist:1uKR3ihZmv8a93heLPYKQ8,spotify:artist:4NgfOZCL9Ml67xzM0xzIvC, janice to janice joplin
7 |
--------------------------------------------------------------------------------
/new_crawler/build_db.py:
--------------------------------------------------------------------------------
1 | import db
2 | import json
3 | import sys
4 | def process_file(in_path):
5 | f = open(in_path)
6 |
7 | for line in f:
8 |         line = line.strip()
9 |
10 |
11 | if __name__ == '__main__':
12 |
13 | in_path = sys.argv[1]
14 |     db_path = sys.argv[2]
15 |
--------------------------------------------------------------------------------
/new_crawler/db.py:
--------------------------------------------------------------------------------
1 | import json
2 | import sys
3 | import os
4 |
5 | artists = {}
6 | edges = {}
7 |
8 | artist_by_name = {}
9 |
10 | def get_artist(uri):
11 | if uri in artists:
12 | return artists[uri]
13 | else:
14 | return None
15 |
16 | def get_artists(uris):
17 | return [artists[uri] for uri in uris if uri in artists]
18 |
19 | def get_artist_name(uri):
20 | artist = get_artist(uri)
21 | if artist:
22 | return artist['name']
23 | else:
24 | return None
25 |
26 | def get_artists_with_edges(uris):
27 | ret_artists = []
28 | for uri in uris:
29 | artist = get_artist(uri)
30 | if artist:
31 | ret_artists.append(artist)
32 | edges = get_edges(uri)
33 | if edges:
34 | artist['edges'] = edges
35 | return ret_artists
36 |
37 | def get_edges(uri):
38 | if uri in edges:
39 | return edges[uri]
40 | else:
41 | return None
42 |
43 | def get_all_edges():
44 | return edges
45 |
46 | def get_all_artists():
47 | return artists
48 |
49 |
50 | def load_db(prefix="g1"):
51 | if len(artists) == 0:
52 | f = open(prefix + "/nodes.js")
53 | for line in f:
54 | try:
55 | artist = json.loads(line.strip())
56 | artists[artist['uri']] = artist
57 | nname = normalize_name(artist['name'])
58 | artist_by_name[nname] = artist
59 | except:
60 | print "skipped bad line in db", line
61 | print "loaded", len(artists), "artists"
62 |
63 | if len(edges) == 0:
64 | f = open(prefix + "/edges.js")
65 | for line in f:
66 | try:
67 | edge = json.loads(line.strip())
68 | for uri, targets in edge.items():
69 | edges[uri] = targets
70 | except:
71 | print "skipped bad edge in db", line
72 | print "loaded", len(edges), "edges"
73 |
74 |
75 | def normalize_name(n):
76 | return ''.join(e.lower() for e in n if e.isalnum())
77 |
78 | def get_artist_by_name(name):
79 | nname = normalize_name(name)
80 | print "nname", nname
81 | if nname in artist_by_name:
82 | return artist_by_name[nname]
83 | else:
84 | return None
85 |
86 |
87 | if __name__ == '__main__':
88 | load_db()
89 |
90 | for uri in sys.argv[1:]:
91 | print uri
92 | print json.dumps(get_artist(uri), indent=4)
93 | print json.dumps(get_edges(uri), indent=4)
94 | print
95 |
96 |
--------------------------------------------------------------------------------
/new_crawler/flask_server.py:
--------------------------------------------------------------------------------
1 | """ the http server for SFC
2 | """
3 | import sys
4 | import logging
5 | import atexit
6 | import time
7 | import collections
8 |
9 | from flask import Flask, request, jsonify
10 | from flask_cors import cross_origin
11 | from werkzeug.contrib.fixers import ProxyFix
12 | import artist_graph
13 |
14 | APP = Flask(__name__)
15 | APP.debug = False
16 | APP.trace = False
17 | APP.testing = False
18 | APP.ag = artist_graph.ArtistGraph()
19 |
20 | @APP.route('/frog/path')
21 | @cross_origin()
22 | def api_path():
23 | start = time.time()
24 | src = request.args.get("src", None)
25 | dest = request.args.get("dest", None)
26 | skips = request.args.get("skips", None)
27 |
28 | if skips and len(skips) > 0:
29 | skipset = set(skips.split(','))
30 | else:
31 | skipset = set()
32 |
33 | if src and dest:
34 | results = APP.ag.path(src, dest, skipset)
35 | if results['status'] == 'ok' and results['path']:
36 | src_name = results['path'][0]['name']
37 | src_id = results['path'][0]['id']
38 | dest_name = results['path'][-1]['name']
39 | dest_id = results['path'][-1]['id']
40 | text = "From " + src_name + " to " + dest_name
41 | add_to_history(src_id, dest_id, text, skips)
42 | else:
43 | results = {
44 | "status": "error",
45 | "reason": "missing src and/or dest",
46 | }
47 | return jsonify(results)
48 |
49 |
50 | history = []
51 | max_history = 100
52 | popular = collections.Counter()
53 | popular_text = {}
54 |
55 | def add_to_history(src, dest, text, skips):
56 | global history
57 | if not found_in_history(src, dest):
58 | history.append( (src, dest, skips, text, time.time()) )
59 |         history = history[-max_history:]
60 |
61 | key = src + ":::" + dest
62 | popular[key] += 1
63 | popular_text[key] = text
64 |
65 |
66 | def found_in_history(src, dest):
67 | for hsrc, hdest, skips, text, ts in history:
68 | if src == hsrc and dest == hdest:
69 | return True
70 | return False
71 |
72 |
73 | @APP.route('/frog/history')
74 | @cross_origin()
75 | def api_get_history():
76 | out = []
77 | for hist in reversed(history):
78 | src, dest, skips, text, ts = hist
79 | h = {
80 | "src": src,
81 | "dest": dest,
82 | "skips": skips,
83 | "text": text,
84 | "ts": ts,
85 | }
86 | out.append(h)
87 | results = {
88 | "status": 'ok',
89 | "history": out
90 | }
91 | return jsonify(results)
92 |
93 | @APP.route('/frog/popular')
94 | @cross_origin()
95 | def api_get_popular():
96 | pop = []
97 | for key, count in popular.most_common(100):
98 | src, dest = key.split(':::')
99 | text = popular_text[key]
100 | h = {
101 | "src": src,
102 | "dest": dest,
103 | "text": text,
104 | "count": count
105 | }
106 | pop.append(h)
107 | results = {
108 | "status": 'ok',
109 | "popular": pop
110 | }
111 | return jsonify(results)
112 |
113 |
114 | @APP.route('/frog/get')
115 | @cross_origin()
116 | def api_get():
117 | """ get artist info for the given aids/uris
118 | """
119 | start = time.time()
120 | aids = request.args.get("aids", None)
121 |
122 | if aids and len(aids) > 0:
123 | artist_ids = aids.split(',')
124 | out = []
125 | for artist_id in artist_ids:
126 | artist = APP.ag.get_artist(artist_id)
127 | out.append(artist)
128 |
129 | results = {
130 | "status": "ok",
131 | "artists": out
132 | }
133 | else:
134 | results = {
135 | "status": "error",
136 | "reason": "no input artist ids given"
137 | }
138 |
139 | fdelta = time.time() - start
140 | results['fdelta'] = fdelta
141 | return jsonify(results)
142 |
143 | @APP.route('/frog/sims')
144 | @cross_origin()
145 | def api_sims():
146 | """ get sim artists for the given aid
147 | """
148 | start = time.time()
149 | name = request.args.get("artist", None)
150 |
151 | aid = APP.ag.search(name)
152 |
153 | out = []
154 | if aid:
155 | artist = APP.ag.get_artist(aid)
156 | for edge in artist["edges"]:
157 | out.append(APP.ag.get_artist(edge))
158 |
159 | results = {
160 | "status": "ok",
161 | "sims": out,
162 | "seed": artist
163 | }
164 | else:
165 | results = {
166 | "status": "error",
167 |             "reason": "can't find artist " + str(name)
168 | }
169 |
170 | fdelta = time.time() - start
171 | results['fdelta'] = fdelta
172 | return jsonify(results)
173 |
174 |
175 | #@APP.errorhandler(Exception)
176 | def handle_invalid_usage(error):
177 | """ implements the standard error processing
178 | """
179 | print "error", error
180 | results = {'status': 'internal_error', "message": str(error)}
181 | return jsonify(results)
182 |
183 |
184 | APP.wsgi_app = ProxyFix(APP.wsgi_app)
185 |
186 |
187 |
188 | def shutdown():
189 | """ performs any server shutdown cleanup
190 | """
191 | print 'shutting down server ...'
192 | print 'done'
193 |
194 |
195 | if __name__ == '__main__':
196 | APP.debug = False
197 | APP.trace = False
198 | APP.wsgi = False
199 | HOST = '0.0.0.0'
200 |     # PORT = 3457  # debugging port
201 | PORT = 4682
202 |
203 | logging.basicConfig(
204 | stream=sys.stdout,
205 | level=logging.INFO,
206 | format='%(asctime)s %(levelname)s %(message)s')
207 |
208 | atexit.register(shutdown)
209 |
210 | for arg in sys.argv[1:]:
211 | if arg == '--debug':
212 | APP.debug = True
213 | if arg == '--trace':
214 | APP.trace = True
215 | if APP.debug:
216 | print 'debug mode', 'host', HOST, 'port', PORT
217 | APP.run(threaded=False, debug=True, host=HOST, port=PORT)
218 | elif APP.wsgi:
219 | from gevent.wsgi import WSGIServer
220 | print 'WSGI production mode', 'port', PORT
221 | print 'WSGI production mode - ready'
222 | HTTP_SERVER = WSGIServer(('', PORT), APP)
223 | HTTP_SERVER.serve_forever()
224 | else:
225 | print 'production mode', 'port/host', PORT, HOST
226 | print 'production mode - ready'
227 | APP.run(threaded=True, debug=False, host=HOST, port=PORT)
228 |
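`add_to_history` and `api_get_popular` above track demand with a `collections.Counter` keyed on a composite `src:::dest` string. A minimal standalone sketch of that pattern (the ids and labels below are hypothetical):

```python
import collections

popular = collections.Counter()   # request count per src:::dest pair
popular_text = {}                 # human-readable label for each pair

def record(src, dest, text):
    # composite key; split back apart with key.split(':::') when reporting
    key = src + ":::" + dest
    popular[key] += 1
    popular_text[key] = text

record('a1', 'a2', 'From Miles Davis to Ed Sheeran')
record('a1', 'a2', 'From Miles Davis to Ed Sheeran')
record('a3', 'a4', 'From The Beatles to Led Zeppelin')

for key, count in popular.most_common(1):
    src, dest = key.split(':::')
    print("%d %s -> %s: %s" % (count, src, dest, popular_text[key]))
    # 2 a1 -> a2: From Miles Davis to Ed Sheeran
```

The `:::` separator works because Spotify artist ids never contain colons; `most_common(n)` then yields the n busiest pairs without any extra sorting.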
--------------------------------------------------------------------------------
/new_crawler/foo.js:
--------------------------------------------------------------------------------
1 | {
2 | "external_urls": {
3 | "spotify": "https://open.spotify.com/artist/3hE8S8ohRErocpkY7uJW4a"
4 | },
5 | "followers": {
6 | "href": null,
7 | "total": 393621
8 | },
9 | "genres": [
10 | "gothic metal",
11 | "gothic symphonic metal",
12 | "power metal",
13 | "symphonic metal"
14 | ],
15 | "href": "https://api.spotify.com/v1/artists/3hE8S8ohRErocpkY7uJW4a",
16 | "id": "3hE8S8ohRErocpkY7uJW4a",
17 | "images": [
18 | {
19 | "height": 640,
20 | "url": "https://i.scdn.co/image/ffaf90c9047adffbccc3af6f6f783ec608ced282",
21 | "width": 640
22 | },
23 | {
24 | "height": 320,
25 | "url": "https://i.scdn.co/image/7312779b10c0a90d1ef61f29addb0b1f9b17c3b3",
26 | "width": 320
27 | },
28 | {
29 | "height": 160,
30 | "url": "https://i.scdn.co/image/452dda43bb548452614feb72a87ad93fb6515f7a",
31 | "width": 160
32 | }
33 | ],
34 | "name": "Within Temptation",
35 | "popularity": 64,
36 | "type": "artist",
37 | "uri": "spotify:artist:3hE8S8ohRErocpkY7uJW4a"
38 | }
39 |
--------------------------------------------------------------------------------
/new_crawler/rdb.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import rocksdb
3 | import time
4 | import collections
5 | import json
6 | import random
7 | import spotipy
8 | import spotipy_util as util
9 | from spotipy.oauth2 import SpotifyClientCredentials
10 |
11 | # js_nodes = 'g2/nodes.js'   # earlier graph snapshot
12 | # js_edges = 'g2/edges.js'
13 | js_nodes = 'g4/nodes.js'
14 | js_edges = 'g4/edges.js'
15 | db_path = 'rocks2.db'
16 | read_only = False
17 | total_runs = 1
18 |
19 |
20 | server_side_credentials = False
21 |
22 | if server_side_credentials:
23 | client_credentials_manager = SpotifyClientCredentials()
24 | spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
25 | else:
26 | scope = ''
27 | username = 'plamere'
28 | token = util.prompt_for_user_token(username, scope, use_web_browser=False)
29 | if token:
30 | spotify = spotipy.Spotify(auth=token)
31 | else:
32 | print "can't get token"
33 |
34 |
35 | def to_tiny_artist(artist):
36 | image = None
37 | if len(artist['images']) > 0:
38 | image = artist['images'][0]['url']
39 | ta = {
40 | "id": artist['id'],
41 | "name": artist['name'],
42 | "followers": artist['followers']['total'],
43 | "popularity": artist['popularity'],
44 | "image": image,
45 | "genres": artist['genres']
46 | }
47 | return ta
48 |
49 | def to_tiny_track(track):
50 | if len(track['album']['images']) > 0:
51 | image = track['album']['images'][0]['url']
52 | else:
53 | image = None
54 |
55 | tt = {
56 | "id": track['id'],
57 | "name": track['name'],
58 | "audio": track['preview_url'],
59 | "image": image
60 | }
61 | return tt
62 |
63 |
64 | def build():
65 | db = rocksdb.DB(db_path, rocksdb.Options(create_if_missing=True))
66 | edge_map = load_edges(js_edges)
67 |
68 | f = open(js_nodes)
69 | for i, line in enumerate(f):
70 | try:
71 | artist = json.loads(line.strip())
72 | tiny_artist = to_tiny_artist(artist)
73 | if tiny_artist['id'] in edge_map:
74 | tiny_artist['edges'] = edge_map[tiny_artist['id']]
75 | else:
76 | tiny_artist['edges'] = []
77 | db.put(tiny_artist['id'], json.dumps(tiny_artist))
78 | print i, tiny_artist['name']
79 | except:
80 | print "trouble with artist", line
81 | continue
82 | f.close()
83 |
84 | def dump_nodes():
85 | f = open(js_nodes)
86 | for i, line in enumerate(f):
87 | try:
88 | artist = json.loads(line.strip())
89 | tiny_artist = to_tiny_artist(artist)
90 | print "%7d %s" % (tiny_artist['followers'], tiny_artist['name'])
91 | except:
92 | print "trouble with", line
93 | continue
94 | f.close()
95 |
96 | def load_edges(path):
97 | edge_map = {}
98 | f = open(path)
99 | for line in f:
100 | try:
101 | edge_dict = json.loads(line.strip())
102 | for uri, edges in edge_dict.items():
103 | tid = uri_to_tid(uri)
104 | tedges = []
105 | for edge in edges:
106 | tedges.append(uri_to_tid(edge))
107 | edge_map[tid] = tedges
108 | except:
109 | print "trouble with edge", line
110 | continue
111 | f.close()
112 | return edge_map
113 |
114 |
115 | def add_track_info():
116 | db = rocksdb.DB(db_path, rocksdb.Options(create_if_missing=False))
117 | it = db.itervalues()
118 | it.seek_to_first()
119 | missing = []
120 | for i, tartist_js in enumerate(it):
121 | tartist = json.loads(tartist_js)
122 | if 'tracks' not in tartist or len(tartist['tracks']) == 0 or has_no_audio(tartist['tracks']):
123 | missing.append(tartist)
124 |
125 | print "artists missing tracks", len(missing)
126 | missing.sort(key=lambda a:a['followers'], reverse=True)
127 | for i, artist in enumerate(missing):
128 | add_tracks(artist)
129 | if not 'incoming_edges' in artist:
130 | artist['incoming_edges'] = []
131 | if not 'edges' in artist:
132 | artist['edges'] = []
133 | db.put(artist['id'], json.dumps(artist))
134 | print i, len(missing), len(artist['tracks']), artist['followers'], artist['id'], artist['name']
135 | print "artists missing tracks", len(missing)
136 |
137 | def port_tracks(old_db, new_db):
138 | odb = rocksdb.DB(old_db, rocksdb.Options(create_if_missing=False))
139 | ndb = rocksdb.DB(new_db, rocksdb.Options(create_if_missing=False))
140 | it = ndb.itervalues()
141 | it.seek_to_first()
142 | for i, tartist_js in enumerate(it):
143 | tartist = json.loads(tartist_js)
144 | oartist = get_artist(odb, tartist['id'])
145 | if oartist and 'tracks' in oartist and len(oartist['tracks']) > 0:
146 | tartist['tracks'] = oartist['tracks']
147 | ndb.put(tartist['id'], json.dumps(tartist))
148 | print i, len(tartist['tracks'])
149 |
150 | def has_no_audio(tracks):
151 | for track in tracks:
152 | if 'audio' in track and track['audio'] != None:
153 | return False
154 | return True
155 |
162 |
163 |
164 | def add_incoming_edges():
165 |
166 | db = rocksdb.DB(db_path, rocksdb.Options(create_if_missing=False))
167 | it = db.itervalues()
168 | it.seek_to_first()
169 |
170 | incoming_edges = collections.defaultdict(list)
171 | all_ids = set()
172 |
173 | for i, tartist_js in enumerate(it):
174 | tartist = json.loads(tartist_js)
175 | source = tartist['id']
176 | all_ids.add(source)
177 | if i % 1000 == 0:
178 | print i, tartist['name']
179 | if 'edges' in tartist:
180 | for edge in tartist['edges']:
181 | incoming_edges[edge].append(source)
182 |
183 | for i, aid in enumerate(all_ids):
184 | artist = get_artist(db, aid)
185 | artist['incoming_edges'] = incoming_edges[artist['id']]
186 | db.put(artist['id'], json.dumps(artist))
187 | if i % 1000 == 0:
188 | print i, artist['name'], len(artist['incoming_edges'])
189 |
190 | def add_tracks(artist):
191 | results = spotify.artist_top_tracks(artist['id'], country='SE')
192 | #print json.dumps(results, indent=4)
193 | tracks = []
194 | for track in results['tracks']:
195 | ttrack = to_tiny_track(track)
196 | tracks.append(ttrack)
197 | artist['tracks'] = tracks
198 |
199 | def id_to_uri(tid):
200 | if not tid.startswith('spotify:artist:'):
201 | return 'spotify:artist:' + tid
202 | return tid
203 |
204 | def uri_to_tid(uri):
205 | return uri.split(':')[-1]
206 |
207 | def get_artist(db, uri_or_tid):
208 | tid = uri_to_tid(uri_or_tid)
209 | tjs = db.get(tid)
210 | if tjs:
211 | tartist = json.loads(tjs)
212 | else:
213 | tartist = None
214 | return tartist
215 |
216 | def test_getter(total_runs=1):
217 | db = rocksdb.DB(db_path, rocksdb.Options(), read_only=True)
218 | errs = 0
219 | total_time = 0
220 | count = 0
221 | f = open(js_nodes)
222 | tartists = []
223 | for i, line in enumerate(f):
224 | artist = json.loads(line.strip())
225 | tiny_artist = to_tiny_artist(artist)
226 | tartists.append(tiny_artist)
227 |
228 | while total_runs:
229 | random.shuffle(tartists)
230 | for i, tiny_artist in enumerate(tartists):
231 | start = time.time()
232 | tartist = get_artist(db, tiny_artist['id'])
233 | delta = time.time() - start
234 | total_time += delta
235 |
236 | count += 1
237 | if tiny_artist['name'] == tartist['name']:
238 | print i, total_runs, errs, tiny_artist['name'], '==', tartist['name']
239 | else:
240 | print 'MISMATCH', i, total_runs, errs, tiny_artist['name'], '==', tartist['name']
241 | errs += 1
242 | total_runs -= 1
243 | f.close()
244 |
245 | print "errors", errs
246 | print "total_time", total_time, "ms per read", total_time * 1000 / count
247 |
248 | if __name__ == '__main__':
249 | args = sys.argv[1:]
250 | uris = []
251 | while args:
252 | arg = args.pop(0)
253 |
254 | if arg == '--build':
255 | build()
256 | elif arg == '--artist' and args and not args[0].startswith('--'):
257 | uris.append(args.pop(0))
258 | elif arg == '--dump':
259 | db = rocksdb.DB(db_path, rocksdb.Options(), read_only=True)
260 | for uri in uris:
261 | artist = get_artist(db, uri)
262 | print json.dumps(artist, indent=4)
263 | print
264 | elif arg == '--dump-nodes':
265 | dump_nodes()
266 | elif arg == '--add-tracks':
267 | add_track_info()
268 | elif arg == '--add-incoming-edges':
269 | add_incoming_edges()
270 |
271 | elif arg == '--port-tracks':
272 | old_path = args.pop(0)
273 | port_tracks(old_path, db_path)
274 | elif arg == '--db' and args and not args[0].startswith('--'):
275 | db_path = args.pop(0)
276 | elif arg == '--test':
277 | test_getter(total_runs)
278 | elif arg == '--ptest':
279 | artist = { 'id': '6jJ0s89eD6GaHleKKya26X' }
280 | add_tracks(artist)
281 | print json.dumps(artist, indent=4)
282 | elif arg == '--runs' and args:
283 | total_runs = int(args.pop(0))
284 |
--------------------------------------------------------------------------------
/new_crawler/search.py:
--------------------------------------------------------------------------------
1 | # -*- coding: utf-8 -*-
2 | # This Python file uses the following encoding: utf-8
3 | import bisect
4 | import collections
5 | import re
6 | import stringutils
7 | import spotipy
8 | from spotipy.oauth2 import SpotifyClientCredentials
9 |
10 | client_credentials_manager = SpotifyClientCredentials()
11 | spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
12 |
13 |
14 | class Searcher:
15 |
16 | def __init__(self, exact=False):
17 | self.items = collections.defaultdict(list)
18 | self.names = []
19 | self.exact = exact
20 |
21 | def add(self, name, o):
22 | # print 'adding', name, o
23 | s = de_norm(name)
24 | if o not in self.items[s]:
25 | self.items[s].append(o)
26 | bisect.insort_left(self.names, s)
27 |
28 | def search(self, s, force_exact=False):
29 | exact = self.exact or force_exact
30 |
31 | org_name = s
32 | s = de_norm(s)
33 |         p = bisect.bisect_left(self.names, s)
34 |
35 | matches = []
36 | for i in xrange(p, len(self.names)):
37 | if exact and self.names[i] == s:
38 | matches.append( (len(self.names[i]) - len(s), self.names[i]) )
39 | elif not exact and self.names[i].startswith(s):
40 | matches.append( (len(self.names[i]) - len(s), self.names[i]) )
41 | else:
42 | break
43 |
44 | matches.sort()
45 | results = []
46 |
47 | # TODO: don't add dups
48 | for l, name in matches:
49 | for o in self.items[name]:
50 | results.append(o)
51 | if len(results) == 0:
52 | print 'ssearch', s
53 | sresults = spotify.search(q=s, type='artist')
54 | for item in sresults['artists']['items']:
55 | aid = item['id']
56 | print ' ss', item['name']
57 | self.add(org_name, aid)
58 | self.add(item['name'], aid)
59 | results.append(aid)
60 | return results
61 |
62 |
63 | def de_norm(name, space=''):
64 | ''' Dan Ellis normalization
65 | '''
66 | s = name
67 | s = s.replace("'", "")
68 | s = s.replace(".", "")
69 | s = strip_accents(s)
70 | s = s.lower()
71 | s = re.sub(r'&', ' and ', s)
72 | s = re.sub(r'^the ', '', s)
73 | s = re.sub(r'[\W+]', '_', s)
74 | s = re.sub(r'_+', '_', s)
75 | s = s.strip('_')
76 | s = s.replace('_', space)
77 |
78 | # if we've normalized away everything
79 | # keep it.
80 | if len(s) == 0:
81 | s = name
82 | return s
83 |
84 | def de_equals(n1, n2):
85 | return n1 == n2 or de_norm(n1) == de_norm(n2)
86 |
87 | def de_match(n1, n2):
88 | if de_equals(n1, n2):
89 | return True
90 | else:
91 | dn1 = de_norm(n1)
92 | dn2 = de_norm(n2)
93 | return dn1.find(dn2) >= 0 or dn2.find(dn1) >= 0
94 |
95 | def strip_accents(s):
96 | return stringutils.unaccent(s)
97 |
98 | def test_norm(s):
99 | print s, de_norm(s)
100 |
101 | def norm_test():
102 | test_norm("N'sync")
103 | test_norm("D'Angelo")
104 | test_norm("R. Kelly")
105 | test_norm("P.J. Harvey")
106 | test_norm("Beyoncé")
107 | test_norm("The Bangles")
108 | test_norm("Run-D.M.C.")
109 | test_norm("The Presidents of the United States of America")
110 | test_norm("Emerson Lake & Palmer")
111 | test_norm("Emerson, Lake & Palmer")
112 | test_norm("Emerson, Lake and Palmer")
113 | test_norm("Emerson Lake and Palmer")
114 |
115 |
116 | if __name__ == '__main__':
117 | norm_test()
118 |
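`Searcher` above keeps its normalized names in a sorted list so that `bisect` lands on the first candidate and every prefix match sits contiguously after it. A minimal sketch of that lookup (hypothetical `PrefixIndex` class; the real `Searcher` also normalizes names with `de_norm` and falls back to the Spotify search API on a miss):

```python
import bisect

class PrefixIndex(object):
    def __init__(self):
        self.names = []  # kept sorted at all times

    def add(self, name):
        # insert in sorted position, skipping exact duplicates
        pos = bisect.bisect_left(self.names, name)
        if pos == len(self.names) or self.names[pos] != name:
            self.names.insert(pos, name)

    def search(self, prefix):
        # all prefix matches are contiguous starting at the insertion
        # point, so scan forward until the prefix stops matching
        matches = []
        for i in range(bisect.bisect_left(self.names, prefix), len(self.names)):
            if not self.names[i].startswith(prefix):
                break
            matches.append(self.names[i])
        return matches

idx = PrefixIndex()
for n in ['radiohead', 'raffi', 'ramones', 'run dmc']:
    idx.add(n)
print(idx.search('ra'))   # ['radiohead', 'raffi', 'ramones']
```

Each lookup costs O(log n) for the bisect plus O(k) for the k matches, which is why the searcher can answer as-you-type queries over 100,000 artist names.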
--------------------------------------------------------------------------------
/new_crawler/shell.py:
--------------------------------------------------------------------------------
1 | import cmd
2 | import artist_graph
3 | import simplejson as json
4 | import time
5 |
6 | class ArtistGraphShell(cmd.Cmd):
7 | prompt = "ag% "
8 | ag = artist_graph.ArtistGraph()
9 | raw = False
10 | skips = set()
11 |
12 | def do_test(self, line):
13 |         # simple smoke-test command
14 |         print 'hello world'
15 |
16 | def do_EOF(self, line):
17 | return True
18 |
19 | def do_toggle_raw(self, line):
20 | self.raw = not self.raw
21 |
22 |
23 | def do_skip(self, line):
24 | if len(line) == 0:
25 | for s in self.skips:
26 | print self.an(s),
27 | print
28 | elif line == 'clear':
29 | self.skips = set()
30 | else:
31 | artists = line.split(",")
32 | for artist in artists:
33 |                 aid = self.ag.search(artist.strip())
34 | if aid:
35 | self.skips.add(aid)
36 |
37 | def an(self, aid):
38 | return self.ag.get_artist(aid)['name']
39 |
40 | def do_path(self, line):
41 | artists = line.split(",")
42 | if len(artists) == 2:
43 | results = self.ag.path(artists[0].strip(), artists[1].strip(), self.skips)
44 | if self.raw:
45 | print json.dumps(results, indent=4)
46 | if results['status'] == 'ok':
47 | dump_path(results['path'])
48 | print "time:", results['pdelta'], results['fdelta']
49 | else:
50 | print results['status']
51 | print results['reason']
52 | else:
53 | print "usage: path artist1, artist2"
54 |
55 | def do_edge_check(self, line):
56 | aid = self.ag.search(line)
57 | if aid:
58 | self.ag.edge_check(aid)
59 |
60 | def do_sim_check(self, line):
61 | aid = self.ag.search(line)
62 | if aid:
63 | self.ag.sim_check(aid)
64 |
65 |
66 | def dump_path(path):
67 | for i, artist in enumerate(path):
68 | print "%2d %2d %s %s" % (i, artist['popularity'], artist['id'], artist['name'])
69 |
70 | if __name__ == '__main__':
71 | ArtistGraphShell().cmdloop()
72 |
--------------------------------------------------------------------------------
/new_crawler/sim_crawl.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import json
3 | import spotipy
4 | import db
5 | from spotipy.oauth2 import SpotifyClientCredentials
6 |
7 | max_artists = 1000000
8 | superseeds = 50000
9 | client_credentials_manager = SpotifyClientCredentials()
10 | spotify = spotipy.Spotify(client_credentials_manager=client_credentials_manager)
11 |
12 | known_artists = set()
13 | expanded_artists = set()
14 | queue = []
15 |
16 | ta = 'spotify:artist:7dGJo4pcD2V6oG8kP0tJRR' # troublesome artist
17 | def check_ta(where, artist):
18 | if artist['uri'] == ta:
19 | print 'TA', where
20 |
21 | def check_ta_uri(where, uri):
22 | if uri == ta:
23 | print 'TAU', where
24 |
25 | def queue_append(artist):
26 | check_ta('queue_append', artist)
27 | queue.append( (artist['followers']['total'], artist['uri'], artist['name']))
28 |
29 | def queue_sort():
30 | queue.sort(reverse=True)
31 |
32 | def process_queue(nodefile, edgefile):
33 | edge_count = 0
34 |
35 | queue_sort()
36 | while queue and len(known_artists) < max_artists:
37 | followers, uri, artist_name = queue.pop(0)
38 | print len(queue), followers, uri, artist_name
39 | if uri in expanded_artists:
40 | print " done"
41 | check_ta_uri('already expanded', uri)
42 | continue
43 |
44 | expanded_artists.add(uri)
45 | results = spotify.artist_related_artists(uri)
46 | if not results['artists']:
47 | print "NO SIMS FOR", artist_name
48 |         check_ta_uri('got sims', uri)
49 | for sim_artist in results['artists']:
50 | print " %s => %s" % (artist_name, sim_artist['name'])
51 |
52 | sim_uris = []
53 | for sim_artist in results['artists']:
54 | edge_count += 1
55 | sim_uri = sim_artist['uri']
56 | if sim_uri not in known_artists:
57 | known_artists.add(sim_uri)
58 | print "%5d/%-7d %7d %s %3d %7d %s" % (len(known_artists), len(queue), edge_count, sim_uri,
59 | sim_artist['popularity'], sim_artist['followers']['total'], sim_artist['name'])
60 | queue_append(sim_artist)
61 | print >> nodefile, json.dumps(sim_artist)
62 | sim_uris.append(sim_artist['uri'])
63 | queue_sort()
64 |
65 | check_ta_uri('appended sims', uri)
66 |         edge_dict = { uri: sim_uris }
67 |         print >> edgefile, json.dumps(edge_dict)
68 |
69 | # print " %s %s => %s %s" % (artist['uri'], artist['name'], sim_artist['uri'], sim_artist['name'])
70 |
71 | def load_external_artist_list(top, nodefile, dbpath=None):
72 | if dbpath:
73 | db.load_db(dbpath)
74 | for i, line in enumerate(open('top_artists.txt')):
75 | if i < top:
76 | fields = line.strip().split()
77 | uri = fields[0]
78 | count = int(fields[1])
79 | name = ' '.join(fields[2:])
80 |
81 | if uri not in known_artists:
82 | print "NEW", i, uri, count, name
83 | artist = None
84 | if dbpath:
85 | artist = db.get_artist(uri)
86 | if not artist:
87 | artist = spotify.artist(uri)
88 | else:
89 | print " cache hit for", name
90 | known_artists.add(uri)
91 | queue_append(artist)
92 | print >> nodefile, json.dumps(artist)
93 | else:
94 | break
95 |
96 | if __name__ == '__main__':
97 |
98 | seeds = [
99 | 'spotify:artist:3hE8S8ohRErocpkY7uJW4a', # within temptation
100 | 'spotify:artist:0kbYTNQb4Pb1rPbbaF0pT4', # miles davis
101 | 'spotify:artist:3WrFJ7ztbogyGnTHbHJFl2', # the beatles
102 | 'spotify:artist:6eUKZXaKkcviH0Ku9w2n3V', # ed sheeran
103 | 'spotify:artist:36QJpDe2go2KgaRleHCDTp', # led zeppelin
104 | ]
105 |
106 | args = sys.argv[1:]
107 | prefix = "./"
108 |
109 | while args:
110 | arg = args.pop(0)
111 |
112 | if arg == '--path':
113 | prefix = args.pop(0)
114 |
115 | elif arg == '--load':
116 | db.load_db(prefix)
117 |
118 | artists = db.get_all_artists()
119 |
120 | for uri, artist in artists.items():
121 | known_artists.add(uri)
122 |
123 | edges = db.get_all_edges()
124 |
125 | for source, targets in edges.items():
126 | check_ta_uri('load expanded', source)
127 | expanded_artists.add(source)
128 |
129 | for uri, artist in artists.items():
130 | if uri not in expanded_artists:
131 | queue_append(artist)
132 |
133 | for source, targets in edges.items():
134 | for target in targets:
135 | if target not in expanded_artists:
136 | artist = db.get_artist(target)
137 | if artist:
138 | queue_append(artist)
139 | else:
140 | print "trouble on restart, unknown artist", target
141 |
142 | nodefile = open(prefix + '/nodes.js', 'a')
143 | edgefile = open(prefix + '/edges.js', 'a')
144 | #load_external_artist_list(superseeds, nodefile)
145 | queue_sort()
146 | process_queue(nodefile, edgefile)
147 |
148 | elif arg == '--fresh':
149 | nodefile = open(prefix + '/nodes.js', 'w')
150 | edgefile = open(prefix + '/edges.js', 'w')
151 | for seed in seeds:
152 | artist = spotify.artist(seed)
153 | known_artists.add(seed)
154 | queue_append(artist)
155 | print >> nodefile, json.dumps(artist)
156 |
157 | process_queue(nodefile, edgefile)
158 |
159 | elif arg == '--superseeds':
160 | seed_count = 100
161 | if args:
162 | seed_count = int(args.pop(0))
163 | nodefile = open(prefix + '/nodes.js', 'w')
164 | edgefile = open(prefix + '/edges.js', 'w')
165 | load_external_artist_list(seed_count, nodefile, "g2")
166 | process_queue(nodefile, edgefile)
167 |
168 |
--------------------------------------------------------------------------------
/new_crawler/spotipy_util.py:
--------------------------------------------------------------------------------
1 |
2 | # utility for obtaining a Spotify user OAuth token (via spotipy)
3 |
4 | from __future__ import print_function
5 | import os
6 | from spotipy import oauth2
7 | import spotipy
8 |
9 | def prompt_for_user_token(username, scope=None, client_id = None,
10 | client_secret = None, redirect_uri = None, cache_path = None, use_web_browser=True):
11 | ''' prompts the user to login if necessary and returns
12 | the user token suitable for use with the spotipy.Spotify
13 | constructor
14 |
15 | Parameters:
16 |
17 | - username - the Spotify username
18 | - scope - the desired scope of the request
19 | - client_id - the client id of your app
20 | - client_secret - the client secret of your app
21 | - redirect_uri - the redirect URI of your app
22 | - cache_path - path to location to save tokens
23 | - use_web_browser - whether to open the authorization URL in a web browser
23 |
24 | '''
25 |
26 | if not client_id:
27 | client_id = os.getenv('SPOTIPY_CLIENT_ID')
28 |
29 | if not client_secret:
30 | client_secret = os.getenv('SPOTIPY_CLIENT_SECRET')
31 |
32 | if not redirect_uri:
33 | redirect_uri = os.getenv('SPOTIPY_REDIRECT_URI')
34 |
35 | if not client_id:
36 | print('''
37 | You need to set your Spotify API credentials. You can do this by
38 | setting environment variables like so:
39 |
40 | export SPOTIPY_CLIENT_ID='your-spotify-client-id'
41 | export SPOTIPY_CLIENT_SECRET='your-spotify-client-secret'
42 | export SPOTIPY_REDIRECT_URI='your-app-redirect-url'
43 |
44 | Get your credentials at
45 | https://developer.spotify.com/my-applications
46 | ''')
47 | raise spotipy.SpotifyException(550, -1, 'no credentials set')
48 |
49 | cache_path = cache_path or ".cache-" + username
50 | sp_oauth = oauth2.SpotifyOAuth(client_id, client_secret, redirect_uri,
51 | scope=scope, cache_path=cache_path)
52 |
53 | # try to get a valid token for this user, from the cache,
54 | # if not in the cache, then create a new one (this will send
55 | # the user to a web page where they can authorize this app)
56 |
57 | token_info = sp_oauth.get_cached_token()
58 |
59 | if not token_info:
60 | print('''
61 |
62 | User authentication requires interaction with your
63 | web browser. Once you enter your credentials and
64 | give authorization, you will be redirected to
65 | a URL. Paste that URL back here to
66 | complete the authorization.
67 |
68 | ''')
69 | auth_url = sp_oauth.get_authorize_url()
70 | try:
71 | if use_web_browser:
72 | import webbrowser
73 | webbrowser.open(auth_url)
74 | print("Opened %s in your browser" % auth_url)
75 | else:
76 | print("Please navigate here: %s" % auth_url)
77 | except:
78 | print("Please navigate here: %s" % auth_url)
79 |
80 | print()
81 | print()
82 | try:
83 | response = raw_input("Enter the URL you were redirected to: ")
84 | except NameError:
85 | response = input("Enter the URL you were redirected to: ")
86 |
87 | print()
88 | print()
89 |
90 | code = sp_oauth.parse_response_code(response)
91 | token_info = sp_oauth.get_access_token(code)
92 | # Auth'ed API request
93 | if token_info:
94 | return token_info['access_token']
95 | else:
96 | return None
97 |
--------------------------------------------------------------------------------
/new_crawler/start_simple_server:
--------------------------------------------------------------------------------
1 | source credentials.sh
2 | export PBL_CACHE=REDIS
3 | # python flask_server.py
4 | #gunicorn flask_server:app -b localhost:8000
5 | until python flask_server.py; do
6 | echo "flask server crashed with exit code $?. Respawning.." >&2
7 | sleep 1
8 | done
9 |
--------------------------------------------------------------------------------
/new_crawler/stringutils.py:
--------------------------------------------------------------------------------
1 | import datetime
2 | import hashlib
3 | import htmlentitydefs
4 | import logging
5 | import random
6 | import re
7 | import string
8 | import sys
9 | import time
10 | from types import BooleanType, FloatType, IntType, ListType, LongType, StringType, UnicodeType
11 | import unicodedata
12 | import uuid
13 | import urllib
14 |
15 | from chartable import simplerchars
16 | import htmlstripper
16 |
17 | COMMENT = re.compile(r'<!--.*?-->', re.DOTALL)  # HTML comments
18 |
19 | # doesn't handle higher range control chars - will interfere with unicode chars:
20 | # + range(127,160)
21 | control_chars = ''.join(map(unichr, range(0,32)))
22 | CONTROL_CHARS = re.compile('[%s]' % re.escape(control_chars))
23 |
24 | logger = logging.getLogger(__name__)
25 |
26 | class NestOpener(urllib.FancyURLopener):
27 | version = "nestReader/0.2 (discovery; http://the.echonest.com/reader.html; reader at echonest.com)"
28 |
29 | def random_alphanumeric(length):
30 | chars = string.letters + string.digits
31 | return ''.join(random.choice(chars) for i in xrange(length))
32 |
33 | def randomString(length=10):
34 | return ''.join(random.choice(string.letters) for x in xrange(length))
35 |
36 | def randomType():
37 | return random.choice(["artist","track","release","doc"])
38 |
39 | def randomInt(max=1000):
40 | return random.randint(0,max)
41 |
42 | def randomFloat():
43 | return random.random()
44 |
45 | def randomUUID():
46 | return str(uuid.uuid4())
47 |
48 | def randomDocument():
49 | return {"name":randomString(15), "enid":randomUUID(), "type":randomType(), "grackleCount":randomInt(), "hotttnesss":randomFloat()}
50 |
51 | def bandNameNormalize(name):
52 | # Does name normalization for myspace etc name matching
53 | out = name.lower()
54 | out = re.sub(r' group', '', out)
55 | out = re.sub(r' band', '', out)
56 | out = re.sub(r' and ', ' ', out)
57 | out = re.sub(r'\(.*?\)', '', out)
58 | out = re.sub(r'\[.*?\]', '', out)
59 | out = re.sub(r'[\-\,\.\&\$\%\!\@\#\*\:\"\'\?\;]',' ', out)
60 | out = re.sub(r'\ {2,}', ' ', out)
61 | out = re.sub(r'^ ', '', out)
62 | out = re.sub(r' $', '', out)
63 | out = re.sub(r'^the', '', out)
64 | out = re.sub(r'^ ', '', out)
65 | out = re.sub(r' $', '', out)
66 | out = out[:25]
67 | out = unaccent(out, erase_unrecognized=False)
68 | return out
69 |
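The normalization pipeline in `bandNameNormalize()` can be sketched in Python 3 (function name is ours, and the `chartable`-based accent folding is omitted since that table lives in another module):

```python
import re

def normalize_band_name(name):
    """Simplified sketch of bandNameNormalize(): lowercase, drop
    ' group'/' band', parentheticals, punctuation, and a leading
    'the', then collapse whitespace (accent folding omitted)."""
    out = name.lower()
    out = re.sub(r' group| band', '', out)
    out = re.sub(r'\(.*?\)|\[.*?\]', '', out)
    out = re.sub(r'[-,.&$%!@#*:"\'?;]', ' ', out)
    out = re.sub(r'^the\b', '', out)
    out = re.sub(r' {2,}', ' ', out).strip()
    return out[:25]
```

This is the kind of fuzzing that lets "The Beatles", "Beatles" and "beatles band" all collide on the same key when matching names across services.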
70 | def uncomma(s, dumb=False):
71 | if type(s) not in (StringType, UnicodeType):
72 | raise ValueError, "Argument must be a string."
73 | if ', ' not in s:
74 | return s
75 |
76 | if dumb:
77 | return re.sub('(.*?), ([^ ]*)', r'\2 \1', s)
78 |
79 | a, b = s.split(', ', 1)
80 | suffix = ''
81 |
82 | for amp in [' & ', ' And ', ' and ', ' AND ', ' / ', ' + ']:
83 | if amp in b:
84 | commable, suffix = b.split(amp, 1)
85 | suffix = amp + suffix
86 | return "%s %s%s" % (commable, a, suffix)
87 | return "%s %s" % (b, a)
88 |
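The comma-flipping logic above ("Beatles, The" becomes "The Beatles", with any `&`/`and` co-credit left in place so only the first artist's name is flipped) can be sketched in Python 3 as follows (function name is ours):

```python
import re

def uncomma_sketch(s, dumb=False):
    """Move a ', The'-style suffix back to the front of a name,
    mirroring uncomma() above: the part after the first ', ' is
    moved to the front; a ' & '/' and '-joined co-credit keeps
    its own comma untouched."""
    if ', ' not in s:
        return s
    if dumb:
        return re.sub(r'(.*?), ([^ ]*)', r'\2 \1', s)
    a, b = s.split(', ', 1)
    for amp in [' & ', ' And ', ' and ', ' AND ', ' / ', ' + ']:
        if amp in b:
            commable, suffix = b.split(amp, 1)
            return '%s %s%s' % (commable, a, amp + suffix)
    return '%s %s' % (b, a)
```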
89 | def str2bool(s):
90 | if(isinstance(s,bool)):
91 | return s
92 | if s in ['Y', 'y']:
93 | return True
94 | if s in ['N', 'n']:
95 | return False
96 | if s in ['True', 'true']:
97 | return True
98 | elif s in ['False', 'false']:
99 | return False
100 | else:
101 | raise ValueError, "Bool-looking string required."
102 |
103 | def delist(item):
104 | if item == []:
105 | return ''
106 | if type(item) is ListType:
107 | return item[0]
108 | return item
109 |
110 | def summarize_string(s, length=50):
111 | ss = str(s)
112 | if len(ss) > length:
113 | return '%s: %s ... [%s more chars]' % (type(s), ss[:length], len(ss) - length)
114 | else:
115 | return s
116 |
117 | def reallyunicode(s, encoding="utf-8"):
118 | """
119 | Try the user's encoding first, then others in order; break the loop as
120 | soon as we find an encoding we can read it with. If we get to ascii,
121 | include the "replace" argument so it can't fail (it'll just turn
122 | everything fishy into question marks).
123 |
124 | Usually this will just try utf-8 twice, because we will rarely if ever
125 | specify an encoding. But we could!
126 | """
127 | if type(s) is StringType:
128 | for args in ((encoding,), ('utf-8',), ('latin-1',), ('ascii', 'replace')):
129 | try:
130 | s = s.decode(*args)
131 | break
132 | except UnicodeDecodeError:
133 | continue
134 | if type(s) is not UnicodeType:
135 | raise ValueError, "%s is not a string at all." % s
136 | return s
137 |
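The cascading-decode strategy in `reallyunicode()` translates naturally to Python 3 bytes handling; a sketch (name is ours):

```python
def really_text(b, encoding='utf-8'):
    """Decode bytes by trying encodings in order, as reallyunicode()
    does: the caller's encoding first, then utf-8, latin-1, and
    finally ascii with 'replace' so the last attempt cannot fail."""
    if isinstance(b, str):
        return b
    for args in ((encoding,), ('utf-8',), ('latin-1',), ('ascii', 'replace')):
        try:
            return b.decode(*args)
        except UnicodeDecodeError:
            continue
```

latin-1 never raises, so in practice the ascii fallback is only reached if an explicit single-byte `encoding` argument somehow fails.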
138 | def reallyUTF8(s):
139 | return reallyunicode(s).encode("utf-8")
140 |
141 | def unfancy(s):
142 | "Removes smartquotes, smartellipses, and nbsps. Always returns Unicode."
143 | simplerpunc = {145: "'", 146: "'", 147: '"', 148: '"', 133: '...', 160: ' ', 173: '-',
144 | 8211: "-", 8212: "--", 8216:"'", 8217: "'", 8220:'"', 8221:'"', 8222: '"', 8230: '...'}
145 | ret = "".join([simplerpunc.get(ord(char), char) for char in reallyunicode(s)])
146 | return ret
147 |
148 | def unaccent(s, erase_unrecognized=True):
149 | """Removes umlauts and accents, etc. Unless erase_unrecognized=False,
150 | any characters that don't have an ASCII simplified form are
151 | removed entirely."""
152 | ## The dict "simplerchars" is in another file because it's
153 | ## so huge. See import statement at top.
154 | if not isinstance(s,basestring):
155 | raise ValueError, "unaccent argument %s must be a string." % str(s)
156 | unistr = reallyunicode(s)
157 | ret = u''
158 | for c in unistr:
159 | if c in simplerchars:
160 | ret += simplerchars[c]
161 | else:
162 | decomp = unicodedata.normalize('NFKD', c)
163 | basechar = decomp[0]
164 | ## These will all be unicode characters, so
165 | ## technically none of them are in string.printable.
166 | ## But "in" uses equality, not identity, and
167 | ## since u'a' == 'a' it will all work out.
168 | if not erase_unrecognized or basechar in string.printable:
169 | ret += basechar
170 | return ret
171 |
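The fallback branch of `unaccent()` relies on NFKD decomposition: each accented character splits into a base character plus combining marks, and keeping only the base gives the ASCII-ish form. A stdlib-only sketch (name is ours):

```python
import unicodedata

def strip_accents(s):
    # NFKD splits 'ö' into 'o' + a combining diaeresis; dropping
    # every combining mark leaves the plain base characters.
    decomposed = unicodedata.normalize('NFKD', s)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))
```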
172 | def convertentity(m):
173 | """Convert a HTML entity into normal string (UTF-8)"""
174 | prefix, entity = m.groups()
175 | try:
176 | if prefix != '#':
177 | ## Look up name, change it to a unicode code point (integer).
178 | entity = htmlentitydefs.name2codepoint[entity]
179 | else:
180 | if entity.startswith('x'):
181 | entity = int(entity[1:], 16)
182 | else:
183 | entity = int(entity)
184 | except (KeyError, ValueError):
185 | ## Give back original unchanged.
186 | return "&%s%s;" % (prefix, entity)
187 |
188 | return unichr(int(entity))
189 |
190 | def decode_htmlentities(string):
191 | """Uses convertentity to convert the HTML entities
192 | in a string into normal strings (UTF-8)."""
193 | entity_re = re.compile("&(#?)(\d{1,5}|\w{1,8});")
194 | return entity_re.subn(convertentity, reallyunicode(string))[0]
195 |
196 | def convertentities(s):
197 | """Convert an HTML-quoted string into a normal string (UTF-8).
198 | Works with numeric entities like &#88; and named ones like &gt;."""
199 | s = reallyunicode(s)
200 | rep = re.compile(r'&(#?)([a-zA-Z0-9]+?);')
201 | unquoted = rep.sub(convertentity,s)
202 | return unquoted
203 |
204 | def unquotehtml(s):
205 | unquoted = convertentities(s)
206 | return unfancy(unquoted).encode('utf-8')
207 |
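In Python 3 the standard library covers this whole entity-decoding pipeline; a one-call equivalent of `convertentities()` (wrapper name is ours) handles named, decimal, and hex entities alike:

```python
import html

def unquote_html_sketch(s):
    # html.unescape handles named (&amp;), decimal (&#88;) and
    # hex (&#x3e;) entities in one pass, matching what
    # convertentity()/convertentities() do by hand above.
    return html.unescape(s)
```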
208 | # YES I WANT TO CALL IT CHOMP
209 | def chomp(str):
210 | return str.rstrip('\r\n')
211 |
212 | def long_time(t=None):
213 | if t is None:
214 | t = datetime.datetime.now()
215 | st = datetime.datetime.isoformat(t)
216 | st = re.sub(r"[\-\:\.A-Za-z]","",st)
217 | st = st[0:17]
218 | return st
219 |
220 | def solr_time(when=None):
221 | "Returns solr-specific UTC time string (1995-12-31T23:59:59.999Z)."
222 | if when is None:
223 | when = time.gmtime()
224 | return time.strftime('%Y-%m-%dT%H:%M:%SZ', when)
225 |
226 | timeToLongTime = long_time
227 | timeToSolrTime = solr_time
228 |
229 | def readable_time(t=None):
230 | "Returns second-accuracy time string suitable for sorting or reading by humans, like '2009-04-24-12-45-04'."
231 | if t is None:
232 | t = time.localtime()
233 | return time.strftime('%Y-%m-%d-%H-%M-%S', t)
234 |
235 | def MD5(text):
236 | ## We will convert to UTF-8 if given Unicode,
237 | ## BUT if fed a bytestring we just checksum it,
238 | ## so don't go MD5ing strings in encodings
239 | ## incompatible with UTF-8 and expecting it
240 | ## to work out okay!
241 | if type(text) == UnicodeType:
242 | sys.stderr.write("md5 of Unicode requested - encoding to UTF-8.\n")
243 | sys.stderr.write("text (asciified for display): %s\n" % ascii(text))
244 | text = text.encode('utf-8')
245 | m = hashlib.md5()
246 | m.update(text)
247 | return m.hexdigest()
248 |
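The bytes-vs-unicode caveat in `MD5()` is the same in Python 3, where `hashlib.md5` only accepts bytes; a sketch (name is ours):

```python
import hashlib

def md5_hex(text):
    # As MD5() above warns: hashing needs bytes. Encode str input
    # as UTF-8; bytes pass through unchanged, so bytestrings in
    # other encodings checksum as-is.
    if isinstance(text, str):
        text = text.encode('utf-8')
    return hashlib.md5(text).hexdigest()
```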
249 | def parseEntererLine(line):
250 | line = line.rstrip("\r\n")
251 | s = line.split(' ### ')
252 | l = {}
253 | if(len(s) == 9):
254 | l["foreignID_AR"] = s[0]
255 | l["foreignID_RE"] = s[1]
256 | l["foreignID_TR"] = s[2]
257 | l["name_AR"] = s[3]
258 | l["name_RE"] = s[4]
259 | l["name_TR"] = s[5]
260 | l["type"] = s[6]
261 | l["tagname"] = s[7]
262 | l["tagvalue"] = s[8]
263 | else:
264 | logger.error("Can't parse line %s got %d out of it", line, len(s))
265 | l = None
266 | return l
267 |
268 | def makeNiceLucene(text):
269 | #http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
270 | text = re.sub(r'\bAND\b', r'\\AND', text)
271 | text = re.sub(r'\bOR\b', r'\\OR', text)
272 | text = re.sub(r'\bNOT\b', r'\\NOT', text)
273 | return re.sub(r"([\+\-\&\|\!\(\)\{\}\[\]\;\^\"\~\*\?\:\\])",r"\\\1", text)
274 |
275 | def normalizeString(text):
276 | "Does ryan mckinley text normalization"
277 | digitMap = ["zero","one","two","three","four","five","six","seven","eight","nine"]
278 | LOWASCII = range(0,128)
279 | charList = (['E', chr(129) ,',', 'f', ',', '.', 't', '+', '^', '%', 'S', '<','D', ' ', 'Z', ' ', ' ', '\'', '\'', '\"','\"', '-','-', '-', '-', ' ', 's', '>', 'c', ' ', 'z', 'Y',' ', '!', 'c', 'L', '.', 'Y', '|', 'S', '.', 'c',' ', '<', '-', '-', 'R', '-', 'o', '+', '.','3','\'', 'u', 'P','.',',', '1', 'o', '>', '.', '.', ' ', '?', 'A', 'A','A', 'A', 'A', 'A', 'A', 'C','E', 'E', 'E', 'E', 'I','I', 'I', 'I', 'D', 'N', 'O', 'O','O','O', 'O', 'x','O','U','U', 'U', 'U', 'Y','P','B', 'a', 'a','a','a', 'a', 'a', 'A', 'c', 'e', 'e','e','e','i','i','i', 'i','o', 'n', 'o', 'o', 'o', 'o', 'o', '/','o', 'u','u', 'u', 'u', 'y', 'p', 'y'])
280 | for c in charList:
281 | LOWASCII.append(ord(c))
282 |
283 | # Strip whitespace at ends
284 | r = text.strip().lower()
285 | r = re.sub(r"[\/\,\:\.\&\(\)\<\>\:\;\-\_\+]"," ",r)
286 | words = r.split()  # split the normalized string, not the raw input
287 | newphrase = []
288 | for w in words:
289 | if(w=="and" or w=="the" or w=="of" or w=="und"):
290 | pass
291 | else:
292 | newword = ""
293 | for c in w:
294 | if(ord(c)>1 and ord(c)<255):
295 | c = chr(LOWASCII[ord(c)]).lower()
296 | if(ord(c)>=ord('a') and ord(c) <= ord('z')):
297 | newword = newword + c
298 | if(ord(c)>=ord('0') and ord(c) <= ord('9')):
299 | if(len(newword)>0):
300 | newphrase.append(newword)
301 | newphrase.append(digitMap[ord(c)-ord('0')])
302 |
303 | if(len(newword)>0):
304 | newphrase.append(newword)
305 |
306 | normalized = " ".join(newphrase)
307 | return normalized
308 |
309 | def undo_wtf8(s):
310 | try:
311 | if type(s) == str:
312 | s2 = s.decode('utf-8')
313 | else:
314 | s2 = s
315 | s3 = s2.encode('raw-unicode-escape')
316 | s4 = s3.decode('utf-8')
317 | return s4
318 | except (UnicodeEncodeError, UnicodeDecodeError):
319 | return s
320 |
321 | def undo_windows_wtf8(s):
322 | try:
323 | if type(s) == str:
324 | s2 = s.decode('utf-8')
325 | else:
326 | s2 = s
327 | s3 = s2.encode('windows-1252')
328 | s4 = s3.decode('utf-8')
329 | return s4
330 | except (UnicodeEncodeError, UnicodeDecodeError):
331 | return s
332 |
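`undo_windows_wtf8()` reverses the classic mojibake where UTF-8 bytes were mis-decoded as windows-1252 ("â€™" instead of a right single quote). The Python 3 equivalent (name is ours):

```python
def undo_windows_mojibake(s):
    """Reverse text that was UTF-8 on disk but decoded as
    windows-1252: re-encoding with windows-1252 recovers the
    original UTF-8 bytes, which then decode correctly. On any
    failure the input is returned unchanged, as above."""
    try:
        return s.encode('windows-1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s
```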
333 | def utf8(text):
334 | return unicode(text, "utf8", errors="replace")
335 |
336 | def ascii(text, errors='ignore'):
337 | if type(text) not in (StringType, UnicodeType):
338 | raise ValueError, "Argument %s must be string or Unicode!" % str(text)
339 | text = unaccent(text, erase_unrecognized=True)
340 | ## unaccent returns Unicode-- should be ascii safe, but
341 | ## in case it's not...
342 | return text.encode('ascii', errors)
343 |
344 | def cleanup(text):
345 | ltRem = text.replace("\r","").replace("\n","")
346 | ltRem = re.sub(r" {2,}"," ",ltRem)
347 | ltRem = re.sub(r"\<.{1,20}\>","",ltRem)
348 | return ltRem
349 |
350 | def striphtml(text):
351 | return re.sub('<.*?>', '', text)
352 |
353 | def clean(html):
354 | """strip html and unquotehtml"""
355 | for tag in ['&nbsp;', '<br>', '<br/>']:  # markup to collapse into plain spaces
356 | html = html.replace(tag, ' ')
357 | html = COMMENT.sub('', html)
358 | return unquotehtml(htmlstripper.stripHTML(html,'UTF-8'))
359 |
360 | def link(url, timeout=5, version=False):
361 | """save URL link to temp file, return html
362 | if it fails, retry after timeout=5 seconds; use input ua"""
363 | if version:
364 | NestOpener.version = version
365 | myOpener = NestOpener()
366 | try:
367 | page = myOpener.open(url)
368 | except (IOError, AttributeError):
369 | time.sleep(timeout)
370 | try:
371 | page = myOpener.open(url)
372 | except (IOError, AttributeError):
373 | return False
374 | try:
375 | html = page.read()
376 | except Exception:
377 | time.sleep(timeout)
378 | try:
379 | html = page.read()
380 | except Exception:
381 | logger.exception('SCRAPPY: After waiting page could not be read.')
382 | return ""
383 | return reallyunicode(html)
384 |
385 |
386 |
387 | def istyperight(doc):
388 | for (field, val) in doc.items():
389 | ## Some docs in sands have None in them (how?) so
390 | ## we need to clean them to re-add them.
391 | if val is None:
392 | doc.pop(field)
393 | if isinstance(val, list):
394 | while None in val:
395 | val.remove(None)
396 |
397 | if not is_right_type(field, val):
398 | logger.error("field %s and value %s were not right type", field, val)
399 | if isinstance(val, list):
400 | logger.error("BAD TYPE; DID NOT ADD DOCUMENT. Field '%s' had value %s, which is %s.", field, val, set(type(x) for x in val))
401 | else:
402 | logger.error("BAD TYPE; DID NOT ADD DOCUMENT. Field '%s' had value %s, which is %s.", field, val, type(val))
403 | return False
404 |
405 | return True
406 |
407 | def is_right_type(fieldname, value):
408 | OurDateType = type(datetime.datetime.today())
409 | if fieldname in ['thingID', 'url', 'id'] or fieldname.startswith('_'):
410 | return type(value) in (StringType, UnicodeType)
411 | if fieldname in ['indexed', 'modified']:
412 | return type(value) == OurDateType
413 | if fieldname in ['score']:
414 | return type(value) == FloatType
415 |
416 | righttypes = {'i_': (IntType,) ,
417 | 'f_': (FloatType,IntType) ,
418 | 's_': (StringType, UnicodeType) ,
419 | 'v_': (StringType, UnicodeType) ,
420 | 't_': (StringType, UnicodeType) ,
421 | 'n_': (StringType, UnicodeType) ,
422 | 'b_': (BooleanType,) ,
423 | 'd_': (OurDateType,) ,
424 | 'l_': (IntType, LongType) }
425 | rightfuncs = {'i_': int,
426 | 'f_': float,
427 | 'l_': long,
428 | 'b_': str2bool}
429 |
430 | prefix = fieldname[:2]
431 | if prefix not in righttypes:
432 | raise ValueError("Field called %s has an invalid prefix for sands." % fieldname, 'unknown doc ID')
433 | if type(value) is ListType:
434 | return bool(False not in [is_right_type(fieldname, x) for x in value])
435 | if type(value) in (StringType, UnicodeType) and prefix in rightfuncs:
436 | try:
437 | rightfuncs[prefix](value)
438 | ## What the HELL people
439 | if prefix == 'f_' and str(float(value)) in ['inf', '-inf', 'nan']:
440 | return False
441 | sys.stderr.write("Warning: string '%s' being added with prefix '%s'.\n" % (value, prefix))
442 | return True
443 | except ValueError:
444 | return False
445 | ## Finally, if the prefix is valid and it's not an array
446 | ## AND it's not a string that Solr will numericalize,
447 | ## then answer the obvious way: is it the right type?
448 | return type(value) in righttypes[prefix]
449 |
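The prefix-typed field check in `is_right_type()` can be sketched in Python 3 for a subset of the prefixes (function name is ours; the string-coercion and date cases above are omitted):

```python
def right_type_sketch(fieldname, value):
    """The field name's two-letter prefix selects the acceptable
    value types; lists are validated element-wise, mirroring
    is_right_type() above."""
    righttypes = {
        'i_': (int,),
        'f_': (float, int),
        's_': (str,),
        'b_': (bool,),
    }
    prefix = fieldname[:2]
    if prefix not in righttypes:
        raise ValueError('field %r has an unknown prefix' % fieldname)
    if isinstance(value, list):
        return all(right_type_sketch(fieldname, x) for x in value)
    # type() rather than isinstance(), so bools don't pass as ints
    return type(value) in righttypes[prefix]
```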
450 | def remove_control_chars(s):
451 | return CONTROL_CHARS.sub('', s)
452 |
453 | def is_valid_unicode_xml_char(character):
454 | '''
455 | test whether a unicode character can exist in an xml document,
456 | according to the characters specified in:
457 |
458 | Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
459 |
460 | as defined in:
461 | http://www.w3.org/TR/2000/REC-xml-20001006#NT-Char
462 |
463 | raises TypeError if `character` is not unicode
464 | '''
465 |
466 | if not isinstance(character, unicode):
467 | raise TypeError('character must be unicode')
468 |
469 | # if len(character) != 1:
470 | # raise ValueError('character must be a single character: %s' % character)
471 |
472 | if character in (u'\u0009', u'\u000A', u'\u000D'):
473 | return True
474 |
475 | if character < u'\u0020':
476 | return False
477 |
478 | if character > u'\uD7FF' and character < u'\uE000':
479 | return False
480 |
481 | if character > u'\uFFFD' and character < u'\U00010000':
482 | return False
483 |
484 | if character > u'\U0010FFFF':
485 | return False
486 |
487 | return True
488 |
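The same XML 1.0 `Char` production check, written against code points in Python 3 (name is ours):

```python
def is_valid_xml_char(ch):
    # Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    #        | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)
```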
489 | def truncate_words(s, num, end_text='...'):
490 | """Truncates a string after a certain number of words. Takes an optional
491 | argument of what should be used to notify that the string has been
492 | truncated, defaults to ellipsis (...)"""
493 | s = reallyunicode(s)
494 | length = int(num)
495 | words = s.split()
496 | if len(words) > length:
497 | words = words[:length]
498 | if not words[-1].endswith(end_text):
499 | words.append(end_text)
500 | return u' '.join(words)
501 |
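`truncate_words` keeps the first N whitespace-separated words and appends the marker only when something was actually cut; a Python 3 sketch (name is ours):

```python
def truncate_words_sketch(s, num, end_text='...'):
    # Split on whitespace, keep the first `num` words, and append
    # the marker only if the string was actually truncated.
    words = s.split()
    if len(words) > num:
        words = words[:num]
        if not words[-1].endswith(end_text):
            words.append(end_text)
    return ' '.join(words)
```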
--------------------------------------------------------------------------------
/server/deploy:
--------------------------------------------------------------------------------
1 | deploy_labs "graph.py full_spotify.dat server.py spotify_songs.dat web.conf" BoilTheFrog
2 |
--------------------------------------------------------------------------------
/server/graph.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import networkx as nx
4 | import search
5 | import random
6 | import os
7 | import pprint
8 | import simplejson as json
9 |
10 | RS = ' '
11 | artists = {}
12 | aid_to_sid = {}
13 | max_edges_per_node = 4
14 | min_hotttnesss = .5
15 |
16 | G = nx.Graph()
17 |
18 | searcher = search.Searcher()
19 | songs = {}
20 | skips = set()
21 |
22 | skip_artists_with_no_songs = True
23 |
24 |
25 | def stats():
26 | print 'nodes', G.number_of_nodes()
27 | print 'edges', G.number_of_edges()
28 | cc = nx.connected_components(G)
29 | print 'components', len(cc)
30 |
31 | print ' ',
32 | for c in cc:
33 | print len(c),
34 | print
35 | #print 'diameter', nx.diameter(G)
36 |
37 | def get_artist(id):
38 | artist = artists[id]
39 | return artist
40 |
41 |
42 | def get_random_artist():
43 | id = random.choice(G.nodes())
44 | return artists[id]
45 |
46 | def add_artist(artist):
47 | id = artist['id']
48 | if not id in artists:
49 | artists[id] = artist
50 | searcher.add(artist['name'], artist)
51 | G.add_node(id)
52 | if id in songs:
53 | artist['songs'] = songs[id]
54 | # print 'found', len(artist['songs']), 'songs for', artist['name']
55 |
56 | def add_edge(sid, did, weight):
57 | G.add_edge(sid, did, weight=weight)
58 |
59 |
60 | def npair(n1, n2):
61 | return n1 + ' // ' + n2
62 |
63 |
64 | def load_skiplist(path):
65 | for line in open(path):
66 | fields = line.strip().split(RS)
67 | if len(fields) == 4:
68 | skips.add(npair(fields[0], fields[2]))
69 | skips.add(npair(fields[2], fields[0]))
70 | else:
71 | skips.add(fields[0])
72 |
73 |
74 | def has_songs(id):
75 | return id in songs and len(songs[id]) > 0
76 |
77 | def skipped(n1, n2):
78 | if n1 in skips:
79 | return True
80 | if n2 in skips:
81 | return True
82 |
83 | if skip_artists_with_no_songs:
84 | if not has_songs(n1):
85 | return True
86 |
87 | if not has_songs(n2):
88 | return True
89 |
90 | skip = npair(n1, n2) in skips
91 | return skip
92 |
93 |
94 | def get_edge_weight(id1, id2):
95 | hot1 = artists[id1]['hot']
96 | hot2 = artists[id2]['hot']
97 | edge_weight = 1 + int(100 * (abs(hot1 - hot2)))
98 | # edge_weight = 1 + int(1000 * (abs(hot1 - hot2)))
99 | return edge_weight
100 |
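This weighting is what biases paths toward artists of similar popularity: an edge between equally "hottt" artists costs the minimum of 1, while a big popularity gap costs up to 101, so Dijkstra strongly prefers small hops in popularity. A standalone version of the formula (name is ours):

```python
def edge_weight(hot1, hot2):
    # 1 plus 100x the absolute popularity ("hotttnesss") gap, as in
    # get_edge_weight() above; equal popularity gives the minimum cost.
    return 1 + int(100 * abs(hot1 - hot2))
```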
101 |
102 | def add_ids(aid, sid):
103 | if aid in aid_to_sid:
104 | if sid != aid_to_sid[aid]:
105 | print 'mismatched ids', aid, sid
106 | sys.exit(-1)
107 | else:
108 | aid_to_sid[aid] = sid
109 |
110 |
111 | def load_graph(path):
112 | last_source = ''
113 | edge_count = 0
114 | for i, line in enumerate(open(path)):
115 | fields = line.strip().split(RS)
116 | if i % 100000 == 0:
117 | print i, fields[2]
118 | if fields[0] == 'artist':
119 | aid = fields[1]
120 | # sid = fields[3].split(':')[2]
121 | sid = fields[3]
122 | add_ids(aid, sid)
123 | hot = float(fields[4])
124 | artist = { 'id' : sid, 'name' : fields[2], 'hot': hot }
125 | if has_songs(sid) and hot >= min_hotttnesss:
126 | add_artist(artist)
127 | elif fields[0] == 'sim' and len(fields) > 5:
128 | source_aid = fields[1]
129 | source_sid = aid_to_sid[source_aid]
130 | if source_sid != last_source:
131 | last_source = source_sid
132 | edge_count = 0
133 |
134 | if edge_count < max_edges_per_node:
135 | dest_aid = fields[3]
136 | #dest_sid = fields[6].split(':')[2]
137 | dest_sid = fields[6]
138 | add_ids(dest_aid, dest_sid)
139 |
140 | if not skipped(source_sid, dest_sid) and source_sid in artists:
141 | source = artists[source_sid]
142 | shot = float(fields[5])
143 | dest = { 'id' : dest_sid, 'name' : fields[4], 'hot': shot }
144 | if has_songs(dest['id']) and shot >= min_hotttnesss:
145 | add_artist(dest)
146 | edge_weight = get_edge_weight(source_sid, dest_sid)
147 | add_edge(source_sid, dest['id'], edge_weight)
148 | edge_count += 1
149 |
150 | def find_artist(name):
151 | results = searcher.search(name)
152 | if len(results) > 0:
153 | return results[0]
154 | return None
155 |
156 | def is_id(name_or_id):
157 | return len(name_or_id) == 18 and name_or_id.startswith('AR')
158 |
159 | def sim_artist(name_or_id):
160 | if is_id(name_or_id):
161 | a = artists[name_or_id]
162 | else:
163 | a = find_artist(name_or_id)
164 | if a:
165 | id = a['id']
166 | return id, G[id].keys()
167 | return None, None
168 |
169 |
170 | def sims(artist):
171 | return [get_artist(id) for id in G[artist['id']]]
172 |
173 | def find_path(n1, n2, skip = []):
174 | start = time.time()
175 | path = None
176 | status = 'ok'
177 |
178 | a1 = find_artist(n1)
179 | a2 = find_artist(n2)
180 |
181 | if not a1:
182 | status = "Can't find " + n1
183 | if not a2:
184 | status = "Can't find " + n2
185 |
186 | if a1 and a2:
187 | # bypassed nodes are handled by remove_nodes() temporarily
188 | # inflating edge weights on the shared graph, so no copy is needed
189 | graph = G
192 |
193 | remove_nodes(graph, skip)
194 | try:
195 | l, path = nx.bidirectional_dijkstra(graph, a1['id'], a2['id'], 'weight')
196 |
197 | except nx.NetworkXNoPath:
198 | status = 'No path found between ' + n1 + " and " + n2
199 |
200 | restore_nodes(graph, skip)
201 |
202 | print 'find_path took %s seconds' % (time.time() - start,)
203 | return status, path
204 |
205 | def qfind(a1, a2):
206 | start = time.time()
207 | path = None
208 |
209 | if a1 and a2:
210 | graph = G
211 | try:
212 | l, path = nx.bidirectional_dijkstra(graph, a1['id'], a2['id'], 'weight')
213 | except nx.NetworkXNoPath:
214 | pass
215 | return path
216 |
217 | def remove_nodes(graph, nodes):
218 | if nodes:
219 | for n in nodes:
220 | for other, edge in graph[n].items():
221 | edge['weight'] = 10000000
222 |
223 | def restore_nodes(graph, nodes):
224 | if nodes:
225 | for n in nodes:
226 | for other, edge in graph[n].items():
227 | edge['weight'] = get_edge_weight(n, other)
228 |
229 |
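The 'bypass' trick used by `remove_nodes()`/`restore_nodes()` — inflating a skipped artist's edge weights to 10,000,000 instead of deleting the node, so the router avoids it but the graph can be restored afterwards — can be illustrated with a plain-Python Dijkstra on a toy graph (all names and the `dijkstra`/`bypass` helpers are ours):

```python
import heapq

def dijkstra(graph, src, dst):
    """Cheapest path in a {node: {neighbor: weight}} graph."""
    dist, prev, seen = {src: 0}, {}, set()
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == dst:
            break
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:
        path.append(node)
        node = prev[node]
    path.append(src)
    return path[::-1]

def bypass(graph, node, weight=10000000):
    # Mirror remove_nodes(): inflate the node's edge weights in both
    # directions instead of deleting it, so it can be restored later.
    for other in graph[node]:
        graph[node][other] = weight
        graph[other][node] = weight

# Toy undirected graph; A-B-D (cost 2) beats A-C-D (cost 6) until B is bypassed.
toy = {
    'A': {'B': 1, 'C': 3},
    'B': {'A': 1, 'D': 1},
    'C': {'A': 3, 'D': 3},
    'D': {'B': 1, 'C': 3},
}
```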
230 | def sp(n1, n2, skip=[]):
231 | print '===', n1, 'to', n2, 'with', len(skip), 'skips', '==='
232 | iskip = []
233 | for n in skip:
234 | artist = find_artist(n)
235 | if artist:
236 | iskip.append(artist['id'])
237 |
238 | status, path = find_path(n1, n2, iskip)
239 |
240 | if path:
241 | for a in path:
242 | print artists[a]['name']
243 | pprint.pprint( artists[a])
244 | print
245 |
246 | else:
247 | print status
248 |
249 | edges = set()
250 |
251 | def edge_exists(a1, a2):
252 | n1 = a1 + '--' + a2
253 | found = n1 in edges
254 | if not found:
255 | n2 = a2 + '--' + a1
256 | edges.add(n1)
257 | edges.add(n2)
258 | return found
259 |
260 | def gv(n1, n2, skip=[]):
261 |
262 | gv = open('graph.gv', 'w')
263 | print >>gv, "digraph {"
264 | iskip = []
265 | for n in skip:
266 | artist = find_artist(n)
267 | if artist:
268 | iskip.append(artist['id'])
269 |
270 | status, path = find_path(n1, n2, iskip)
271 |
272 | extra = 4
273 | if path:
274 | last = None
275 | for a in path:
276 | if last:
277 | neighbors = list(G[last].keys())
278 | neighbors.remove(a)
279 | #for n, attr in G[last].items()[:2]:
280 | for n in neighbors[0:extra]:
281 | if not edge_exists(last, n):
282 | print >>gv, q(last), '->', q(n) + ';'
283 | print >>gv, q(last), '->', q(a), '[color=red,style=bold];'
284 | edge_exists(last, a)
285 | for n in neighbors[extra: extra + 4]:
286 | if not edge_exists(last, n):
287 | print >>gv, q(last), '->', q(n) + ';'
288 | print >>gv, q(a), '[color=red,style=bold];'
289 | last = a
290 | else:
291 | print status
292 | print >>gv, "}"
293 | gv.close()
294 |
295 | def q(a):
296 | return '"' + artists[a]['name'] + '"'
297 |
298 | def init():
299 | global songs
300 |
301 | #load_skiplist('skip_list.dat')
302 | songs = load_song_data('spotify_songs.dat')
303 | load_graph('full_spotify.dat')
304 | #load_graph('tiny_spotify.dat')
305 | stats()
306 |
307 | def load_song_data(path):
308 | hash = {}
309 | if os.path.exists(path):
310 | file = open(path)
311 | shash = file.read()
312 | hash = json.loads(shash)
313 | file.close()
314 | print 'loaded', len(hash), 'songs from', path
315 | return hash
316 |
317 | def test():
318 | sp('Miley Cyrus', 'Miles Davis')
319 | sp('Miley Cyrus', 'Miles Davis', [ 'Beth Orton'] )
320 | sp('Miley Cyrus', 'Miles Davis', [ 'Beth Orton', 'Miles Davis' ] )
321 | sp('Miley Cyrus', 'Miles Davis')
322 |
323 | def test1():
324 | sp('Miley Cyrus', 'Britney Spears')
325 |
326 | def test2():
327 | #gv('Miley Cyrus', 'Miles Davis')
328 | #gv('Cannibal Corpse', 'Dora the Explorer')
329 | gv('Kenny G', 'Nile')
330 |
331 | if __name__ == '__main__':
332 | init()
333 | test1()
334 | #test2()
335 |
--------------------------------------------------------------------------------
/server/sdeploy:
--------------------------------------------------------------------------------
1 | deploy_labs "graph.py server.py web.conf" BoilTheFrog
2 |
--------------------------------------------------------------------------------
/server/server.py:
--------------------------------------------------------------------------------
1 | import os
2 | import sys
3 | import cherrypy
4 | import ConfigParser
5 | import urllib2
6 | import simplejson as json
7 | import webtools
8 | import time
9 |
10 |
11 | import graph
12 |
13 |
14 | class ArtistGraphServer(object):
15 |     def __init__(self, config):
16 |         self.production_mode = config.getboolean('settings', 'production')
17 |         graph.init()
18 |
19 |
20 |     def find_path(self, start, end, skips=None, callback=None, _=''):
21 |         cherrypy.response.headers["Access-Control-Allow-Origin"] = "*"
22 |         if callback:
23 |             cherrypy.response.headers['Content-Type'] = 'text/javascript'
24 |         else:
25 |             cherrypy.response.headers['Content-Type'] = 'application/json'
26 |
27 |         start_time = time.time()
28 |         skips = make_list(skips)
29 |         status, path = graph.find_path(start, end, skips)
30 |         results = {}
31 |         results['status'] = status
32 |         if path:
33 |             results['path'] = [graph.get_artist(id) for id in path]
34 |         results['time'] = time.time() - start_time
35 |         return to_json(results, callback)
36 |     find_path.exposed = True
37 |
38 |     def similar(self, artist, callback=None, _=''):
39 |         cherrypy.response.headers["Access-Control-Allow-Origin"] = "*"
40 |         if callback:
41 |             cherrypy.response.headers['Content-Type'] = 'text/javascript'
42 |         else:
43 |             cherrypy.response.headers['Content-Type'] = 'application/json'
44 |
45 |         start_time = time.time()
46 |         results = {}
47 |         seed, sims = graph.sim_artist(artist)
48 |         if sims:
49 |             results['status'] = 'ok'
50 |             results['seed'] = graph.get_artist(seed)
51 |             results['sims'] = [graph.get_artist(id) for id in sims]
52 |         else:
53 |             results['status'] = 'error'
54 |         results['time'] = time.time() - start_time
55 |         return to_json(results, callback)
56 |     similar.exposed = True
57 |
58 |     def random(self, callback=None, _=''):
59 |         cherrypy.response.headers["Access-Control-Allow-Origin"] = "*"
60 |         if callback:
61 |             cherrypy.response.headers['Content-Type'] = 'text/javascript'
62 |         else:
63 |             cherrypy.response.headers['Content-Type'] = 'application/json'
64 |
65 |         results = graph.get_random_artist()
66 |         return to_json(results, callback)
67 |     random.exposed = True
68 |
69 |
70 |
71 | def make_list(item):
72 |     if item and not isinstance(item, list):
73 |         item = [item]
74 |     return item
75 |
76 | def to_json(dict, callback=None):
77 |     results = json.dumps(dict, sort_keys=True, indent=4)
78 |     if callback:
79 |         results = callback + "(" + results + ")"
80 |     return results
81 |
82 | if __name__ == '__main__':
83 |     urllib2.install_opener(urllib2.build_opener())
84 |     conf_path = os.path.abspath('web.conf')
85 |     print 'reading config from', conf_path
86 |     cherrypy.config.update(conf_path)
87 |
88 |     config = ConfigParser.ConfigParser()
89 |     config.read(conf_path)
90 |     production_mode = config.getboolean('settings', 'production')
91 |
92 |     current_dir = os.path.dirname(os.path.abspath(__file__))
93 |     # Set up site-wide config first so we get a log if errors occur.
94 |
95 |     if production_mode:
96 |         print "Starting in production mode"
97 |         cherrypy.config.update({'environment': 'production',
98 |                                 'log.error_file': 'simdemo.log',
99 |                                 'log.screen': True})
100 |     else:
101 |         print "Starting in development mode"
102 |         cherrypy.config.update({'noenvironment': 'production',
103 |                                 'log.error_file': 'site.log',
104 |                                 'log.screen': True})
105 |
106 |     conf = webtools.get_export_map_for_directory("static")
107 |     cherrypy.quickstart(ArtistGraphServer(config), '/ArtistGraphServer', config=conf)
108 |
109 |
--------------------------------------------------------------------------------
/server/web.conf:
--------------------------------------------------------------------------------
1 | [global]
2 | #server.socket_host = '127.0.0.1'
3 | server.socket_host = '0.0.0.0'
4 | #server.socket_host = 'localhost'
5 | server.socket_port = 8444
6 | server.thread_pool = 10
7 |
8 | [settings]
9 | production = True
10 |
--------------------------------------------------------------------------------
/web/callback.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
121 |
141 |
142 |
143 |
119 |
120 | Boil the Frog lets you create a playlist of songs that gradually takes you from one music style
121 | to another. It's like the proverbial frog in the pot of water. If you heat up the pot slowly enough, the
122 | frog will never notice that he's being made into a stew and will never jump out of the pot. With a Boil the
123 | Frog playlist you can do the same, but with music. You can generate a playlist that takes the
124 | listener from one style of music to another, without the listener ever noticing that they are being made
125 | into a stew.
126 |
127 |
128 |
129 |
How does it work?
130 |
131 | To create a Boil The Frog playlist, just type in the names of two artists and a playlist will be
132 | generated that takes you gradually, step by step, from the first artist to the second. You can
133 | click on any song to hear it, or click on the first song to hear the whole playlist. If you don't
134 | like a particular artist, you can route around it by clicking the 'bypass' button.
135 | The 'New Track' button will select a different song for an artist.
136 |
137 |
138 | Boil the Frog plays 30-second versions of your songs. When you find a playlist you like, you can save it
139 | to Spotify to listen to the full-length versions.
140 |
156 | To create this app, The Echo Nest artist similarity data is
157 | used to build an artist similarity graph of about
158 | 100,000 of the most popular artists. Each artist in the graph is connected to its most similar neighbors
159 | according to the Echo Nest artist similarity algorithm.
160 |
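Conceptually, a graph like that is just an adjacency map from each artist to its neighbors. Here is a minimal sketch of the idea; the `build_graph` helper and the artist pairings are invented for illustration and are not the actual Echo Nest data:

```python
def build_graph(similarity_pairs):
    """Build an undirected adjacency map from (artist, similar_artist) pairs."""
    graph = {}
    for artist, neighbor in similarity_pairs:
        # record the similarity in both directions
        graph.setdefault(artist, []).append(neighbor)
        graph.setdefault(neighbor, []).append(artist)
    return graph

# made-up example pairs
pairs = [
    ('Miley Cyrus', 'Britney Spears'),
    ('Britney Spears', 'Madonna'),
    ('Madonna', 'Miles Davis'),
]
graph = build_graph(pairs)
# graph['Britney Spears'] -> ['Miley Cyrus', 'Madonna']
```

In the real app each artist keeps only its top-N most similar neighbors, which keeps the graph sparse enough to search quickly.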
161 |
162 |
163 |
164 | When a playlist between two artists is created, the graph is used to find the path between the two artists.
165 | The path isn't necessarily the shortest path through the graph. Instead, priority is given to paths that
166 | travel through artists of similar popularity. If you start and end with a popular artist, you are more
167 | likely to find a path that takes you through other popular artists, and if you start with a long-tail artist
168 | you will likely find a path through other long-tail artists.
169 |
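One way to get that bias is a standard shortest-path search whose edge cost grows with the popularity gap between adjacent artists. The sketch below uses Dijkstra's algorithm with a made-up cost function (1 plus the popularity difference); the app's actual weighting is not shown here:

```python
import heapq

def popularity_path(graph, popularity, start, end):
    """Dijkstra-style search where each hop costs 1 plus the popularity
    gap, so paths through artists of similar popularity beat shortcuts
    through artists of very different popularity."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, artist, path = heapq.heappop(queue)
        if artist == end:
            return path
        if artist in visited:
            continue
        visited.add(artist)
        for neighbor in graph.get(artist, []):
            if neighbor not in visited:
                hop = 1 + abs(popularity[artist] - popularity[neighbor])
                heapq.heappush(queue, (cost + hop, neighbor, path + [neighbor]))
    return None  # no path between the two artists
```

With popularities A=90, B=85, C=10, D=80 and edges A-B, A-C, B-D, C-D, the search prefers A-B-D over the equally short A-C-D because the popularity jumps are smaller.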
170 | Once the path of artists is found, we need to select the best songs for the playlist. To do this, we pick
171 | a well-known song for each artist that minimizes the difference in energy between that song, the previous
172 | song, and the next song.
173 |
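A greedy simplification of that selection step looks like this. The sketch only compares each candidate against the previous pick (the description above also considers the next song), and the song data shape is invented for the example:

```python
def pick_songs(artist_path, songs_by_artist):
    """For each artist on the path, pick the song whose energy is closest
    to the previously chosen song's energy, smoothing out energy jumps.
    Greedy, backward-looking simplification of the app's selection."""
    playlist = []
    prev_energy = None
    for artist in artist_path:
        candidates = songs_by_artist[artist]   # list of (title, energy)
        if prev_energy is None:
            title, energy = candidates[0]      # no constraint on the first song
        else:
            title, energy = min(
                candidates, key=lambda song: abs(song[1] - prev_energy))
        playlist.append(title)
        prev_energy = energy
    return playlist
```

For example, if the previous pick has energy 0.5 and the next artist offers songs at energy 0.9 and 0.55, the 0.55 song is chosen.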
174 | Once we have selected the best songs, we build a playlist using Spotify's nifty web API.
175 |
176 |
177 |
Who made this?
178 |
179 | This app was built by Paul Lamere. If you like this sort of
180 | thing you may be interested in my blog at Music Machinery.
181 |