├── .gitignore
├── .travis.yml
├── LICENSE
├── MANIFEST.in
├── README.md
├── requirements.txt
├── setup.py
├── soundscrape
│   ├── .gitignore
│   ├── __init__.py
│   └── soundscrape.py
├── test.sh
└── tests
    └── test.py
/.gitignore:
--------------------------------------------------------------------------------
1 | env/
2 | *.DS_Store
3 | *.pyc
4 | *.bak
5 | build/
6 | dist/
7 |
--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
1 | language: python
2 | python:
3 | - "3.4"
4 | - "3.5"
5 | - "3.8"
6 | - "3.9"
7 | # command to install dependencies
8 | install:
9 | - "pip install setuptools --upgrade; python setup.py install"
10 | # command to run tests
11 | script: nosetests
12 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2013 Rich Jones
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of
6 | this software and associated documentation files (the "Software"), to deal in
7 | the Software without restriction, including without limitation the rights to
8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
9 | the Software, and to permit persons to whom the Software is furnished to do so,
10 | subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
21 |
--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include README.md LICENSE requirements.txt
2 | recursive-include soundscrape *.py
3 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | 
2 |
3 | SoundScrape
4 | ==============
5 |
6 | **SoundScrape** makes it super easy to download artists from SoundCloud (and Bandcamp and MixCloud) - even those which don't have download links! It automatically creates ID3 tags as well (including album art), which is handy.
7 |
8 | Usage
9 | ---------
10 |
11 | First, install it:
12 |
13 | ```bash
14 | pip install soundscrape
15 | ```
16 |
17 | Note that if you are having problems, please first try updating to the latest version:
18 |
19 | ```bash
20 | pip install soundscrape --upgrade
21 | ```
22 |
23 | Then, just call soundscrape and the name of the artist you want to scrape:
24 |
25 | ```bash
26 | soundscrape rabbit-i-am
27 | ```
28 |
29 | And you're done! Hooray! Files are stored as mp3s in the format **Artist name - Track title.mp3**.
30 |
31 | You can also use the *-n* argument to only download a certain number of songs.
32 |
33 | ```bash
34 | soundscrape rabbit-i-am -n 3
35 | ```
36 |
37 | Sets
38 | -------
39 |
40 | Soundscrape can also download sets, but you have to include the full URL of the set you want to download:
41 |
42 | ```bash
43 | soundscrape https://soundcloud.com/vsauce-awesome/sets/awesome
44 | ```
45 |
46 | Groups
47 | --------
48 |
49 | Soundscrape can also download tracks from SoundCloud groups with the *-g* argument.
50 |
51 | ```bash
52 | soundscrape chopped-and-screwed -gn 2
53 | ```
54 |
55 | Tracks
56 | --------
57 |
58 | Soundscrape can also download specific tracks with *-t*:
59 |
60 | ```bash
61 | soundscrape foolsgoldrecs -t danny-brown-dip
62 | ```
63 |
64 | or with just the straight URL:
65 |
66 | ```bash
67 | soundscrape https://soundcloud.com/foolsgoldrecs/danny-brown-dip
68 | ```
69 |
70 | Likes
71 | --------
72 |
73 | Soundscrape can also download all of an Artist's Liked items with *-l*:
74 |
75 | ```bash
76 | soundscrape troyboi -l
77 | ```
78 |
79 | or with just the straight URL:
80 |
81 | ```bash
82 | soundscrape https://soundcloud.com/troyboi/likes
83 | ```
84 |
85 | High-Quality Downloads Only
86 | --------
87 |
88 | By default, SoundScrape will try to rip everything it can. However, if you only want to download tracks that have an official download available (which are typically at a higher-quality 320kbps bitrate), you can use the *-d* argument.
89 |
90 | ```bash
91 | soundscrape sly-dogg -d
92 | ```
93 |
94 | Keep Preview Tracks
95 | --------
96 |
97 | By default, SoundScrape will skip the 30-second preview tracks that SoundCloud now provides. You can choose to keep these preview snippets with the *-k* argument.
98 |
99 | ```bash
100 | soundscrape chromeo -k
101 | ```
102 |
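Here's a minimal sketch of the preview check, assuming each track is a dict carrying `duration` and `full_duration` fields, which is what `download_track` in `soundscrape.py` compares. The helper name `is_preview` and the sample tracks are made up for illustration.

```python
def is_preview(track):
    """Return True when the available stream is shorter than the full track."""
    # Missing fields default to 0, so a track with neither field is kept.
    return track.get('duration', 0) < track.get('full_duration', 0)


# Hypothetical sample metadata (values chosen only for the example).
tracks = [
    {'title': 'Full song', 'duration': 215000, 'full_duration': 215000},
    {'title': 'Snippet', 'duration': 30000, 'full_duration': 215000},
]

# Without -k, previews are skipped; with -k, this filter would not be applied.
kept = [t['title'] for t in tracks if not is_preview(t)]
print(kept)
```
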
103 | Folders
104 | --------
105 |
106 | By default, SoundScrape aims to act like _wget_, downloading in place in the current directory. With the *-f* argument, however, SoundScrape acts more like a download manager and sorts songs into the following format:
107 |
108 | ```
109 | ./ARTIST_NAME - ALBUM_NAME/SONG_NUMBER - SONG_TITLE.mp3
110 | ```
111 |
112 | It will also skip previously downloaded tracks.
113 |
114 | ```bash
115 | soundscrape murdercitydevils -f
116 | ```
117 |
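As a rough sketch, the folder layout above could be assembled like this. `folder_path` is a hypothetical helper, not a function from SoundScrape, and the two-digit zero padding of the song number is an assumption borrowed from how the Bandcamp code formats `track_num`.

```python
from os.path import exists, join


def folder_path(artist, album, number, title, root='.'):
    """Build ROOT/ARTIST - ALBUM/NN - TITLE.mp3 (hypothetical helper)."""
    directory = '%s - %s' % (artist, album)
    filename = '%s - %s.mp3' % (str(number).zfill(2), title)
    return join(root, directory, filename)


path = folder_path('Warsaw', 'Demos', 3, 'Intro')
# A file that already exists at this path would be skipped, not re-downloaded.
if not exists(path):
    print('would download to: ' + path)
```
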
118 | Bandcamp
119 | --------
120 |
121 | SoundScrape can also pull down albums from Bandcamp. For Bandcamp pages, use the *-b* argument along with an artist's username or a specific URL. It only downloads one album at a time. This works with all of the other arguments, except *-d* as Bandcamp streams only come at one bitrate, as far as I can tell.
122 |
123 | Note: Currently, when using the *-n* argument, the limit is evaluated for each album separately.
124 |
125 | ```bash
126 | soundscrape warsaw -b -f
127 | ```
128 |
129 | This also works for sites that are hosted on Bandcamp under their own domain:
130 |
131 | ```bash
132 | soundscrape -b http://music.monstercat.com/
133 | ```
134 |
135 | Note that the full URL must be included.
136 |
137 | Mixcloud
138 | --------
139 |
140 | SoundScrape can also grab mixes from Mixcloud. This feature is extremely experimental and is in no way guaranteed to work!
141 |
142 | It finds the original mp3 of a mix and grabs that (with tags and album art) if it can, or else just gets the raw m4a stream.
143 |
144 | Mixcloud support currently only handles an individual mix. Support for a whole artist's profile is due shortly.
145 |
146 | ```bash
147 | soundscrape https://www.mixcloud.com/corenewsuploads/flume-essential-mix-2015-10-03/ -of
148 | ```
149 |
150 | Audiomack
151 | --------
152 |
153 | Just for fun, SoundScrape can also download individual songs from Audiomack. Not that you'd ever want to.
154 |
155 | ```bash
156 | soundscrape -a http://www.audiomack.com/song/bottomfeedermusic/top-shottas
157 | ```
158 |
159 | MusicBed
160 | --------
161 |
162 | For some strange reason, it also works for MusicBed.com. Thanks @brachna for this feature.
163 |
164 | ```bash
165 | soundscrape https://www.musicbed.com/albums/be-still/2828
166 | ```
167 |
168 | Opening Files
169 | --------
170 |
171 | As a convenience method, SoundScrape can automatically _'open'_ files that it downloads. This uses your system's 'open' command for file associations.
172 |
173 | ```bash
174 | soundscrape lorn -of
175 | ```
176 |
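For the curious, here's a rough sketch of how a tool can pick an "open" command per platform. This is not SoundScrape's actual implementation; `open_command` and the command choices (`open` on macOS, `xdg-open` on Linux, `start` on Windows) are assumptions for illustration.

```python
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that would open `path` with its default app."""
    if platform == 'darwin':
        return ['open', path]
    if platform.startswith('linux'):
        return ['xdg-open', path]
    if platform == 'win32':
        # `start` is a cmd.exe builtin; the empty string is the window title.
        return ['cmd', '/c', 'start', '', path]
    raise ValueError('unsupported platform: %s' % platform)


print(open_command('song.mp3', platform='darwin'))
```

The returned list can be handed to `subprocess.Popen` to actually launch the player.
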
177 | Issues
178 | -------
179 |
180 | There's probably a lot more that can be done to improve this. Please file issues if you find them!
181 |
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | args>=0.1.0
2 | clint>=0.3.2
3 | demjson>=2.2.2
4 | fudge>=1.0.3
5 | nose>=1.3.7
6 | requests[security]>=2.9.0
7 | setuptools>=18.0.0
8 | simplejson>=3.3.1
9 | soundcloud>=0.4.1
10 | wheel>=0.24.0
11 | mutagen>=1.31.0
12 |
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
1 | import os
2 | import setuptools
3 | import soundscrape
4 | import sys
5 |
6 | from setuptools import setup
7 |
8 | # To support 2/3 installation
9 | setup_version = int(setuptools.__version__.split('.')[0])
10 | if setup_version < 18:
11 |     print("Please upgrade your setuptools to install SoundScrape: ")
12 |     print("pip install -U pip wheel setuptools")
13 |     sys.exit(1)
14 |
15 | # Set external files
16 | try:
17 |     from pypandoc import convert
18 |     README = convert('README.md', 'rst')
19 | except ImportError:
20 |     README = open(os.path.join(os.path.dirname(__file__), 'README.md')).read()
21 |
22 | with open(os.path.join(os.path.dirname(__file__), 'requirements.txt')) as f:
23 |     required = f.read().splitlines()
24 |
25 | # allow setup.py to be run from any path
26 | os.chdir(os.path.normpath(os.path.join(os.path.abspath(__file__), os.pardir)))
27 |
28 | setup(
29 |     name='soundscrape',
30 |     version=soundscrape.__version__,
31 |     packages=['soundscrape'],
32 |     install_requires=required,
33 |     extras_require={ ':python_version < "3.0"': [ 'wsgiref>=0.1.2', ], },
34 |     include_package_data=True,
35 |     license='MIT License',
36 |     description='Scrape an artist from SoundCloud',
37 |     long_description=README,
38 |     url='https://github.com/Miserlou/SoundScrape',
39 |     author='Rich Jones',
40 |     author_email='rich@openwatch.net',
41 |     entry_points={
42 |         'console_scripts': [
43 |             'soundscrape = soundscrape.soundscrape:main',
44 |         ]
45 |     },
46 |     classifiers=[
47 |         'Environment :: Console',
48 |         'License :: OSI Approved :: MIT License',
49 |         'Operating System :: OS Independent',
50 |         'Programming Language :: Python',
51 |         'Programming Language :: Python :: 3.4',
52 |         'Programming Language :: Python :: 3.5',
53 |         'Programming Language :: Python :: 3.7',
54 |         'Programming Language :: Python :: 3.8',
55 |         'Programming Language :: Python :: 3.9',
56 |         'Topic :: Internet :: WWW/HTTP',
57 |         'Topic :: Internet :: WWW/HTTP :: Dynamic Content',
58 |     ],
59 | )
60 |
--------------------------------------------------------------------------------
/soundscrape/.gitignore:
--------------------------------------------------------------------------------
1 | *.mp3
--------------------------------------------------------------------------------
/soundscrape/__init__.py:
--------------------------------------------------------------------------------
1 | __version__ = '0.31.0'
2 |
--------------------------------------------------------------------------------
/soundscrape/soundscrape.py:
--------------------------------------------------------------------------------
1 | #! /usr/bin/env python
2 | import argparse
3 | import demjson
4 | import html
5 | import os
6 | import re
7 | import requests
8 | import soundcloud
9 | import sys
10 | import urllib
11 |
12 | from clint.textui import colored, puts, progress
13 | from datetime import datetime
14 | from mutagen.mp3 import MP3, EasyMP3
15 | from mutagen.id3 import APIC, WXXX
16 | from mutagen.id3 import ID3 as OldID3
17 | from subprocess import Popen, PIPE
18 | from os.path import dirname, exists, join
19 | from os import access, mkdir, W_OK
20 |
21 | if sys.version_info.minor < 4:
22 |     html_unescape = html.parser.HTMLParser().unescape
23 | else:
24 |     html_unescape = html.unescape
25 |
26 | ####################################################################
27 |
28 | # Please be nice with this!
29 | CLIENT_ID = 'a3dd183a357fcff9a6943c0d65664087'
30 | CLIENT_SECRET = '7e10d33e967ad42574124977cf7fa4b7'
31 | MAGIC_CLIENT_ID = 'b45b1aa10f1ac2941910a7f0d10f8e28'
32 |
33 | AGGRESSIVE_CLIENT_ID = 'OmTFHKYSMLFqnu2HHucmclAptedxWXkq'
34 | APP_VERSION = '1481046241'
35 |
36 | ####################################################################
37 |
38 |
39 | def main():
40 | """
41 | Main function.
42 |
43 | Converts arguments to Python and processes accordingly.
44 |
45 | """
46 |
47 | # Hack related to #58
48 | if sys.platform == "win32":
49 | os.system("chcp 65001")
50 |
51 | parser = argparse.ArgumentParser(description='SoundScrape. Scrape an artist from SoundCloud.\n')
52 | parser.add_argument('artist_url', metavar='U', type=str, nargs='*',
53 | help='An artist\'s SoundCloud username or URL')
54 | parser.add_argument('-n', '--num-tracks', type=int, default=sys.maxsize,
55 | help='The number of tracks to download')
56 | parser.add_argument('-g', '--group', action='store_true',
57 | help='Use if downloading tracks from a SoundCloud group')
58 | parser.add_argument('-b', '--bandcamp', action='store_true',
59 | help='Use if downloading from Bandcamp rather than SoundCloud')
60 | parser.add_argument('-m', '--mixcloud', action='store_true',
61 | help='Use if downloading from Mixcloud rather than SoundCloud')
62 | parser.add_argument('-a', '--audiomack', action='store_true',
63 | help='Use if downloading from Audiomack rather than SoundCloud')
64 | parser.add_argument('-c', '--hive', action='store_true',
65 | help='Use if downloading from Hive.co rather than SoundCloud')
66 | parser.add_argument('-l', '--likes', action='store_true',
67 | help='Download all of a user\'s Likes.')
68 | parser.add_argument('-L', '--login', type=str, default='soundscrape123@mailinator.com',
69 | help='Set login')
70 | parser.add_argument('-d', '--downloadable', action='store_true',
71 | help='Only fetch tracks with a Downloadable link.')
72 | parser.add_argument('-t', '--track', type=str, default='',
73 | help='The name of a specific track by an artist')
74 | parser.add_argument('-f', '--folders', action='store_true',
75 | help='Organize saved songs in folders by artists')
76 | parser.add_argument('-p', '--path', type=str, default='',
77 | help='Set directory path where downloads should be saved to')
78 | parser.add_argument('-P', '--password', type=str, default='soundscraperocks',
79 | help='Set password')
80 | parser.add_argument('-o', '--open', action='store_true',
81 | help='Open downloaded files after downloading.')
82 | parser.add_argument('-k', '--keep', action='store_true',
83 | help='Keep 30-second preview tracks')
84 | parser.add_argument('-v', '--version', action='store_true', default=False,
85 | help='Display the current version of SoundScrape')
86 |
87 | args = parser.parse_args()
88 | vargs = vars(args)
89 |
90 | if vargs['version']:
91 | import pkg_resources
92 | version = pkg_resources.require("soundscrape")[0].version
93 | print(version)
94 | return
95 |
96 | if not vargs['artist_url']:
97 | parser.error('Please supply an artist\'s username or URL!')
98 |
99 | if sys.version_info < (3,0,0):
100 | vargs['artist_url'] = urllib.quote(vargs['artist_url'][0], safe=':/')
101 | else:
102 | vargs['artist_url'] = urllib.parse.quote(vargs['artist_url'][0], safe=':/')
103 |
104 | artist_url = vargs['artist_url']
105 |
106 | if not exists(vargs['path']):
107 | if not access(dirname(vargs['path']), W_OK):
108 | vargs['path'] = ''
109 | else:
110 | mkdir(vargs['path'])
111 |
112 | if 'bandcamp.com' in artist_url or vargs['bandcamp']:
113 | process_bandcamp(vargs)
114 | elif 'mixcloud.com' in artist_url or vargs['mixcloud']:
115 | process_mixcloud(vargs)
116 | elif 'audiomack.com' in artist_url or vargs['audiomack']:
117 | process_audiomack(vargs)
118 | elif 'hive.co' in artist_url or vargs['hive']:
119 | process_hive(vargs)
120 | elif 'musicbed.com' in artist_url:
121 | process_musicbed(vargs)
122 | else:
123 | process_soundcloud(vargs)
124 |
125 |
126 | ####################################################################
127 | # SoundCloud
128 | ####################################################################
129 |
130 |
131 | def process_soundcloud(vargs):
132 | """
133 | Main SoundCloud path.
134 | """
135 |
136 | artist_url = vargs['artist_url']
137 | track_permalink = vargs['track']
138 | keep_previews = vargs['keep']
139 | folders = vargs['folders']
140 |
141 | id3_extras = {}
142 | one_track = False
143 | likes = False
144 | client = get_client()
145 | if 'soundcloud' not in artist_url.lower():
146 | if vargs['group']:
147 | artist_url = 'https://soundcloud.com/groups/' + artist_url.lower()
148 | elif len(track_permalink) > 0:
149 | one_track = True
150 | track_url = 'https://soundcloud.com/' + artist_url.lower() + '/' + track_permalink.lower()
151 | else:
152 | artist_url = 'https://soundcloud.com/' + artist_url.lower()
153 | if vargs['likes'] or 'likes' in artist_url.lower():
154 | likes = True
155 |
156 | if 'likes' in artist_url.lower():
157 | artist_url = artist_url[0:artist_url.find('/likes')]
158 | likes = True
159 |
160 | if one_track:
161 | num_tracks = 1
162 | else:
163 | num_tracks = vargs['num_tracks']
164 |
165 | try:
166 | if one_track:
167 | resolved = client.get('/resolve', url=track_url, limit=200)
168 |
169 | elif likes:
170 | userId = str(client.get('/resolve', url=artist_url).id)
171 |
172 | resolved = client.get('/users/' + userId + '/favorites', limit=200, linked_partitioning=1)
173 | next_href = False
174 | if(hasattr(resolved, 'next_href')):
175 | next_href = resolved.next_href
176 | while (next_href):
177 |
178 | resolved2 = requests.get(next_href).json()
179 | if('next_href' in resolved2):
180 | next_href = resolved2['next_href']
181 | else:
182 | next_href = False
183 | resolved2 = soundcloud.resource.ResourceList(resolved2['collection'])
184 | resolved.collection.extend(resolved2)
185 | resolved = resolved.collection
186 |
187 | else:
188 | resolved = client.get('/resolve', url=artist_url, limit=200)
189 |
190 | except Exception as e: # HTTPError?
191 |
192 | # SoundCloud is trying to prevent us from downloading this.
193 | # We're going to have to stop trusting the API/client and
194 | # do all our own scraping. Boo.
195 |
196 | if '404 Client Error' in str(e):
197 | puts(colored.red("Problem downloading [404]: ") + colored.white("Item Not Found"))
198 | return None
199 |
200 | message = str(e)
201 | item_id = message.rsplit('/', 1)[-1].split('.json')[0].split('?client_id')[0]
202 | hard_track_url = get_hard_track_url(item_id)
203 |
204 | track_data = get_soundcloud_data(artist_url)
205 | puts_safe(colored.green("Scraping") + colored.white(": " + track_data['title']))
206 |
207 | filenames = []
208 | filename = sanitize_filename(track_data['artist'] + ' - ' + track_data['title'] + '.mp3')
209 |
210 | if folders:
211 | name_path = join(vargs['path'], track_data['artist'])
212 | if not exists(name_path):
213 | mkdir(name_path)
214 | filename = join(name_path, filename)
215 | else:
216 | filename = join(vargs['path'], filename)
217 |
218 | if exists(filename):
219 | puts_safe(colored.yellow("Track already downloaded: ") + colored.white(track_data['title']))
220 | return None
221 |
222 | filename = download_file(hard_track_url, filename)
223 | tagged = tag_file(filename,
224 | artist=track_data['artist'],
225 | title=track_data['title'],
226 | year='2018',
227 | genre='',
228 | album='',
229 | artwork_url='')
230 |
231 | if not tagged:
232 | wav_filename = filename[:-3] + 'wav'
233 | os.rename(filename, wav_filename)
234 | filename = wav_filename
235 |
236 | filenames.append(filename)
237 |
238 | else:
239 |
240 | aggressive = False
241 |
242 | # This is likely a 'likes' page.
243 | if not hasattr(resolved, 'kind'):
244 | tracks = resolved
245 | else:
246 | if resolved.kind == 'artist':
247 | artist = resolved
248 | artist_id = str(artist.id)
249 | tracks = client.get('/users/' + artist_id + '/tracks', limit=200)
250 | elif resolved.kind == 'playlist':
251 | id3_extras['album'] = resolved.title
252 | if resolved.tracks != []:
253 | tracks = resolved.tracks
254 | else:
255 | tracks = get_soundcloud_api_playlist_data(resolved.id)['tracks']
256 | tracks = tracks[:num_tracks]
257 | aggressive = True
258 | for track in tracks:
259 | download_track(track, resolved.title, keep_previews, folders, custom_path=vargs['path'])
260 |
261 | elif resolved.kind == 'track':
262 | tracks = [resolved]
263 | elif resolved.kind == 'group':
264 | group = resolved
265 | group_id = str(group.id)
266 | tracks = client.get('/groups/' + group_id + '/tracks', limit=200)
267 | else:
268 | artist = resolved
269 | artist_id = str(artist.id)
270 | tracks = client.get('/users/' + artist_id + '/tracks', limit=200)
271 | if tracks == [] and artist.track_count > 0:
272 | aggressive = True
273 | filenames = []
274 |
275 | # this might be buggy
276 | data = get_soundcloud_api2_data(artist_id)
277 |
278 | for track in data['collection']:
279 |
280 | if len(filenames) >= num_tracks:
281 | break
282 |
283 | if track['type'] == 'playlist':
284 | track['playlist']['tracks'] = track['playlist']['tracks'][:num_tracks]
285 | for playlist_track in track['playlist']['tracks']:
286 | album_name = track['playlist']['title']
287 | filename = download_track(playlist_track, album_name, keep_previews, folders, filenames, custom_path=vargs['path'])
288 | if filename:
289 | filenames.append(filename)
290 | else:
291 | d_track = track['track']
292 | filename = download_track(d_track, custom_path=vargs['path'])
293 | if filename:
294 | filenames.append(filename)
295 |
296 | if not aggressive:
297 | filenames = download_tracks(client, tracks, num_tracks, vargs['downloadable'], vargs['folders'], vargs['path'],
298 | id3_extras=id3_extras)
299 |
300 | if vargs['open']:
301 | open_files(filenames)
302 |
303 |
304 | def get_client():
305 |     """
306 |     Return a new SoundCloud Client object.
307 |     """
308 |     client = soundcloud.Client(client_id=CLIENT_ID)
309 |     return client
310 |
311 | def download_track(track, album_name=u'', keep_previews=False, folders=False, filenames=[], custom_path=''):
312 | """
313 | Given a track, force scrape it.
314 | """
315 |
316 | hard_track_url = get_hard_track_url(track['id'])
317 |
318 | # We have no info on this track whatsoever.
319 | if not 'title' in track:
320 | return None
321 |
322 | if not keep_previews:
323 | if (track.get('duration', 0) < track.get('full_duration', 0)):
324 | puts_safe(colored.yellow("Skipping preview track") + colored.white(": " + track['title']))
325 | return None
326 |
327 | # May not have a "full name"
328 | name = track['user'].get('full_name', '')
329 | if name == '':
330 | name = track['user']['username']
331 |
332 | filename = sanitize_filename(name + ' - ' + track['title'] + '.mp3')
333 |
334 | if folders:
335 | name_path = join(custom_path, name)
336 | if not exists(name_path):
337 | mkdir(name_path)
338 | filename = join(name_path, filename)
339 | else:
340 | filename = join(custom_path, filename)
341 |
342 | if exists(filename):
343 | puts_safe(colored.yellow("Track already downloaded: ") + colored.white(track['title']))
344 | return None
345 |
346 | # Skip already downloaded track.
347 | if filename in filenames:
348 | return None
349 |
350 | if hard_track_url:
351 | puts_safe(colored.green("Scraping") + colored.white(": " + track['title']))
352 | else:
353 | # Region coded?
354 | puts_safe(colored.yellow("Unable to download") + colored.white(": " + track['title']))
355 | return None
356 |
357 | filename = download_file(hard_track_url, filename)
358 | tagged = tag_file(filename,
359 | artist=name,
360 | title=track['title'],
361 | year=track['created_at'][:4],
362 | genre=track['genre'],
363 | album=album_name,
364 | artwork_url=track['artwork_url'])
365 | if not tagged:
366 | wav_filename = filename[:-3] + 'wav'
367 | os.rename(filename, wav_filename)
368 | filename = wav_filename
369 |
370 | return filename
371 |
372 | def download_tracks(client, tracks, num_tracks=sys.maxsize, downloadable=False, folders=False, custom_path='', id3_extras={}):
373 | """
374 | Given a list of tracks, iteratively download all of them.
375 |
376 | """
377 |
378 | filenames = []
379 |
380 | for i, track in enumerate(tracks):
381 |
382 | # "Track" and "Resource" objects are actually different,
383 | # even though they're the same.
384 | if isinstance(track, soundcloud.resource.Resource):
385 |
386 | try:
387 |
388 | t_track = {}
389 | t_track['downloadable'] = track.downloadable
390 | t_track['streamable'] = track.streamable
391 | t_track['title'] = track.title
392 | t_track['user'] = {'username': track.user['username']}
393 | t_track['release_year'] = track.release
394 | t_track['genre'] = track.genre
395 | t_track['artwork_url'] = track.artwork_url
396 | if track.downloadable:
397 | t_track['stream_url'] = track.download_url
398 | else:
399 | if downloadable:
400 | puts_safe(colored.red("Skipping") + colored.white(": " + track.title))
401 | continue
402 | if hasattr(track, 'stream_url'):
403 | t_track['stream_url'] = track.stream_url
404 | else:
405 | t_track['direct'] = True
406 | streams_url = "https://api.soundcloud.com/i1/tracks/%s/streams?client_id=%s&app_version=%s" % (
407 | str(track.id), AGGRESSIVE_CLIENT_ID, APP_VERSION)
408 | response = requests.get(streams_url).json()
409 | t_track['stream_url'] = response['http_mp3_128_url']
410 |
411 | track = t_track
412 | except Exception as e:
413 | puts_safe(colored.white(track.title) + colored.red(' is not downloadable.'))
414 | continue
415 |
416 | if i > num_tracks - 1:
417 | continue
418 | try:
419 | if not track.get('stream_url', False):
420 | puts_safe(colored.white(track['title']) + colored.red(' is not downloadable.'))
421 | continue
422 | else:
423 | track_artist = sanitize_filename(track['user']['username'])
424 | track_title = sanitize_filename(track['title'])
425 | track_filename = track_artist + ' - ' + track_title + '.mp3'
426 |
427 | if folders:
428 | track_artist_path = join(custom_path, track_artist)
429 | if not exists(track_artist_path):
430 | mkdir(track_artist_path)
431 | track_filename = join(track_artist_path, track_filename)
432 | else:
433 | track_filename = join(custom_path, track_filename)
434 |
435 | if exists(track_filename):
436 | puts_safe(colored.yellow("Track already downloaded: ") + colored.white(track_title))
437 | continue
438 |
439 | puts_safe(colored.green("Downloading") + colored.white(": " + track['title']))
440 |
441 |
442 | if track.get('direct', False):
443 | location = track['stream_url']
444 | else:
445 | stream = client.get(track['stream_url'], allow_redirects=False, limit=200)
446 | if hasattr(stream, 'location'):
447 | location = stream.location
448 | else:
449 | location = stream.url
450 |
451 | filename = download_file(location, track_filename)
452 | tagged = tag_file(filename,
453 | artist=track['user']['username'],
454 | title=track['title'],
455 | year=track['release_year'],
456 | genre=track['genre'],
457 | album=id3_extras.get('album', None),
458 | artwork_url=track['artwork_url'])
459 |
460 | if not tagged:
461 | wav_filename = filename[:-3] + 'wav'
462 | os.rename(filename, wav_filename)
463 | filename = wav_filename
464 |
465 | filenames.append(filename)
466 | except Exception as e:
467 | puts_safe(colored.red("Problem downloading ") + colored.white(track['title']))
468 | puts_safe(str(e))
469 |
470 | return filenames
471 |
472 |
473 |
474 | def get_soundcloud_data(url):
475 | """
476 | Scrapes a SoundCloud page for a track's important information.
477 |
478 | Returns:
479 | dict: of audio data
480 |
481 | """
482 |
483 | data = {}
484 |
485 | request = requests.get(url)
486 |
487 | title_tag = request.text.split('
')[1].split(' num_tracks - 1:
615 | continue
616 |
617 | try:
618 | track_name = track["title"]
619 | if track["track_num"]:
620 | track_number = str(track["track_num"]).zfill(2)
621 | else:
622 | track_number = None
623 | if track_number and folders:
624 | track_filename = '%s - %s.mp3' % (track_number, track_name)
625 | else:
626 | track_filename = '%s.mp3' % (track_name)
627 | track_filename = sanitize_filename(track_filename)
628 |
629 | if folders:
630 | path = join(directory, track_filename)
631 | else:
632 | path = join(custom_path, sanitize_filename(artist) + ' - ' + track_filename)
633 |
634 | if exists(path):
635 | puts_safe(colored.yellow("Track already downloaded: ") + colored.white(track_name))
636 | continue
637 |
638 | if not track['file']:
639 | puts_safe(colored.yellow("Track unavailable for scraping: ") + colored.white(track_name))
640 | continue
641 |
642 | puts_safe(colored.green("Downloading") + colored.white(": " + track_name))
643 | path = download_file(track['file']['mp3-128'], path)
644 |
645 | album_year = album_data['album_release_date']
646 | if album_year:
647 | album_year = datetime.strptime(album_year, "%d %b %Y %H:%M:%S GMT").year
648 |
649 | tag_file(path,
650 | artist,
651 | track_name,
652 | album=album_name,
653 | year=album_year,
654 | genre=album_data['genre'],
655 | artwork_url=album_data['artFullsizeUrl'],
656 | track_number=track_number,
657 | url=album_data['url'])
658 |
659 | filenames.append(path)
660 |
661 | except Exception as e:
662 | puts_safe(colored.red("Problem downloading ") + colored.white(track_name))
663 | print(e)
664 | return filenames
665 |
666 |
667 | def extract_embedded_json_from_attribute(request, attribute, debug=False):
668 |     """
669 |     Extract JSON object embedded in an element's attribute value.
670 |
671 |     The JSON is "sloppy". The native python JSON parser often can't deal,
672 |     so we use the more tolerant demjson instead.
673 |
674 |     Args:
675 |         request (obj:`requests.Response`): HTTP GET response from which to extract
676 |         attribute (str): name of the attribute holding the desired JSON object
677 |         debug (bool, optional): whether to print debug messages
678 |
679 |     Returns:
680 |         The embedded JSON object as a dict, or None if extraction failed
681 |     """
682 |     try:
683 |         embed = request.text.split('{}="'.format(attribute))[1]
684 |         embed = html_unescape(
685 |             embed.split('"')[0]
686 |         )
687 |         output = demjson.decode(embed)
688 |         if debug:
689 |             print(
690 |                 'extracted JSON: '
691 |                 + demjson.encode(
692 |                     output,
693 |                     compactly=False,
694 |                     indent_amount=2,
695 |                 )
696 |             )
697 |     except Exception as e:
698 |         output = None
699 |         if debug:
700 |             print(e)
701 |     return output
702 |
703 |
704 | def get_bandcamp_metadata(url):
705 | """
706 | Read information from Bandcamp embedded JavaScript object notation.
707 | The method may return a list of URLs (indicating this is probably a "main" page which links to one or more albums),
708 | or a JSON if we can already parse album/track info from the given url.
709 | """
710 | request = requests.get(url)
711 | output = {}
712 | try:
713 | for attr in ['data-tralbum', 'data-embed']:
714 | output.update(
715 | extract_embedded_json_from_attribute(
716 | request, attr
717 | )
718 | )
719 | # if the JSON parser failed, we should consider it's a "/music" page,
720 | # so we generate a list of albums/tracks and return it immediately
721 | except Exception as e:
722 | regex_all_albums = r''
723 | all_albums = re.findall(regex_all_albums, request.text, re.MULTILINE)
724 | album_url_list = list()
725 | for album in all_albums:
726 | album_url = re.sub(r'music/?$', '', url) + album
727 | album_url_list.append(album_url)
728 | return album_url_list
729 | # if the JSON parser was successful, use a regex to get all tags
730 | # from this album/track, join them and set it as the "genre"
731 | regex_tags = r']+>([^<]+)'
732 | tags = re.findall(regex_tags, request.text, re.MULTILINE)
733 | # make sure we treat integers correctly with join()
734 | # according to http://stackoverflow.com/a/7323861
735 | # (very unlikely, but better safe than sorry!)
736 | output['genre'] = ' '.join(str(s) for s in tags)
737 |
738 | try:
739 | artUrl = request.text.split("\"tralbumArt\">")[1].split("\">")[0].split("href=\"")[1]
740 | output['artFullsizeUrl'] = artUrl
741 | except Exception:
742 | puts_safe(colored.red("Couldn't get full artwork"))
743 | output['artFullsizeUrl'] = None
744 |
745 | return output
746 |
747 |
748 | ####################################################################
749 | # Mixcloud
750 | ####################################################################
751 |
752 |
753 | def process_mixcloud(vargs):
754 | """
755 | Main MixCloud path.
756 | """
757 |
758 | artist_url = vargs['artist_url']
759 |
760 | if 'mixcloud.com' in artist_url:
761 | mc_url = artist_url
762 | else:
763 | mc_url = 'https://mixcloud.com/' + artist_url
764 |
765 | filenames = scrape_mixcloud_url(mc_url, num_tracks=vargs['num_tracks'], folders=vargs['folders'], custom_path=vargs['path'])
766 |
767 | if vargs['open']:
768 | open_files(filenames)
769 |
770 | return
771 |
772 |
773 | def scrape_mixcloud_url(mc_url, num_tracks=sys.maxsize, folders=False, custom_path=''):
774 | """
775 | Returns:
776 | list: filenames to open
777 |
778 | """
779 |
780 | try:
781 | data = get_mixcloud_data(mc_url)
782 | except Exception as e:
783 | puts_safe(colored.red("Problem downloading ") + mc_url)
784 | print(e)
785 | return []
786 |
787 | filenames = []
788 |
789 | track_artist = sanitize_filename(data['artist'])
790 | track_title = sanitize_filename(data['title'])
791 | track_filename = track_artist + ' - ' + track_title + data['mp3_url'][-4:]
792 |
793 | if folders:
794 | track_artist_path = join(custom_path, track_artist)
795 | if not exists(track_artist_path):
796 | mkdir(track_artist_path)
797 | track_filename = join(track_artist_path, track_filename)
798 | if exists(track_filename):
799 | puts_safe(colored.yellow("Skipping") + colored.white(': ' + data['title'] + " - it already exists!"))
800 | return []
801 | else:
802 | track_filename = join(custom_path, track_filename)
803 |
804 | puts_safe(colored.green("Downloading") + colored.white(
805 | ': ' + data['artist'] + " - " + data['title'] + " (" + track_filename[-4:] + ")"))
806 | download_file(data['mp3_url'], track_filename)
807 | if track_filename[-4:] == '.mp3':
808 | tag_file(track_filename,
809 | artist=data['artist'],
810 | title=data['title'],
811 | year=data['year'],
812 | genre="Mix",
813 | artwork_url=data['artwork_url'])
814 | filenames.append(track_filename)
815 |
816 | return filenames
817 |
818 |
819 | def get_mixcloud_data(url):
820 | """
821 | Scrapes a Mixcloud page for a track's important information.
822 |
823 | Returns:
824 | dict: containing audio data
825 |
826 | """
827 |
828 | data = {}
829 | request = requests.get(url)
830 | preview_mp3_url = request.text.split('m-preview="')[1].split('" m-preview-light')[0]
831 | song_uuid = preview_mp3_url.split('previews/')[1].split('.mp3')[0]
832 |
833 | # Fish for the m4a..
834 | for server in range(1, 23):
835 | # Ex: https://stream6.mixcloud.com/c/m4a/64/1/2/0/9/30fe-23aa-40da-9bf3-4bee2fba649d.m4a
836 | mp3_url = "https://stream" + str(server) + ".mixcloud.com/c/m4a/64/" + song_uuid + '.m4a'
837 | try:
838 | if requests.head(mp3_url).status_code == 200:
839 | if '?' in mp3_url:
840 | mp3_url = mp3_url.split('?')[0]
841 | break
842 | except Exception as e:
843 | continue
844 |
845 | full_title = request.text.split("<title>")[1].split(" | Mixcloud")[0]
846 | title = full_title.split(' by ')[0].strip()
847 | artist = full_title.split(' by ')[1].strip()
848 |
849 | img_thumbnail_url = request.text.split('m-thumbnail-url="')[1].split(" ng-class")[0]
850 | artwork_url = (img_thumbnail_url.replace('60/', '300/')
851 | .replace('//', 'https://').replace('"', ''))
852 |
853 | data['mp3_url'] = mp3_url
854 | data['title'] = title
855 | data['artist'] = artist
856 | data['artwork_url'] = artwork_url
857 | data['year'] = None
858 |
859 | return data
860 |
861 |
862 | ####################################################################
863 | # Audiomack
864 | ####################################################################
865 |
866 |
867 | def process_audiomack(vargs):
868 | """
869 | Main Audiomack path.
870 | """
871 |
872 | artist_url = vargs['artist_url']
873 |
874 | if 'audiomack.com' in artist_url:
875 | mc_url = artist_url
876 | else:
877 | mc_url = 'https://audiomack.com/' + artist_url
878 |
879 | filenames = scrape_audiomack_url(mc_url, num_tracks=vargs['num_tracks'], folders=vargs['folders'], custom_path=vargs['path'])
880 |
881 | if vargs['open']:
882 | open_files(filenames)
883 |
884 | return
885 |
886 |
887 | def scrape_audiomack_url(mc_url, num_tracks=sys.maxsize, folders=False, custom_path=''):
888 | """
889 | Returns:
890 | list: filenames to open
891 |
892 | """
893 |
894 | try:
895 | data = get_audiomack_data(mc_url)
896 | except Exception as e:
897 | puts_safe(colored.red("Problem downloading ") + mc_url)
898 | print(e)
899 | return []
900 | filenames = []
901 |
902 | track_artist = sanitize_filename(data['artist'])
903 | track_title = sanitize_filename(data['title'])
904 | track_filename = track_artist + ' - ' + track_title + '.mp3'
905 |
906 | if folders:
907 | track_artist_path = join(custom_path, track_artist)
908 | if not exists(track_artist_path):
909 | mkdir(track_artist_path)
910 | track_filename = join(track_artist_path, track_filename)
911 | if exists(track_filename):
912 | puts_safe(colored.yellow("Skipping") + colored.white(': ' + data['title'] + " - it already exists!"))
913 | return []
914 | else:
915 | track_filename = join(custom_path, track_filename)
916 |
917 | puts_safe(colored.green("Downloading") + colored.white(': ' + data['artist'] + " - " + data['title']))
918 | download_file(data['mp3_url'], track_filename)
919 | tag_file(track_filename,
920 | artist=data['artist'],
921 | title=data['title'],
922 | year=data['year'],
923 | genre=None,
924 | artwork_url=data['artwork_url'])
925 | filenames.append(track_filename)
926 |
927 | return filenames
928 |
929 |
930 | def get_audiomack_data(url):
931 | """
932 | Scrapes an Audiomack page for a track's important information.
933 |
934 | Returns:
935 | dict: containing audio data
936 |
937 | """
938 |
939 | data = {}
940 | request = requests.get(url)
941 |
942 | mp3_url = request.text.split('class="player-icon download-song" title="Download" href="')[1].split('"')[0]
943 | artist = request.text.split('')[1].split('')[0].strip()
944 | title = request.text.split('')[1].split('')[1].split('')[0].strip()
945 | artwork_url = request.text.split('')[1].split('')[0].strip()
1041 | # title = request.text.split('')[1].split('')[1].split('')[0].strip()
1042 | # artwork_url = request.text.split('/' - a number of albums will be downloaded.
1091 | If provided url is of pattern 'https://www.musicbed.com/albums//' - only one album will be downloaded.
1092 | If provided url is of pattern 'https://www.musicbed.com/songs//' - will be treated as one album (but download only 1st track).
1093 | Metadata and urls are obtained from JavaScript data that's treated as JSON data.
1094 |
1095 | Returns:
1096 | list: filenames to open
1097 | """
1098 |
1099 | session = requests.Session()
1100 |
1101 | response = session.get( url )
1102 | if response.status_code != 200:
1103 | puts( colored.red( 'scrape_musicbed_url: couldn\'t open provided url. Status code: ' + str( response.status_code ) + '. Aborting.' ) )
1104 | session.close()
1105 | return []
1106 |
1107 | albums = []
1108 | # let's determine what url type we got
1109 | # '/artists/' - search for and download many albums
1110 | # '/albums/' - means we're downloading 1 album
1111 | # '/songs/' - means 1 album as well, but we're forcing num_tracks=1 in order to download only first relevant track
1112 | if url.startswith( 'https://www.musicbed.com/artists/' ):
1113 | # a hackjob code to get a list of available albums
1114 | main_index = 0
1115 | while response.text.find( 'https://www.musicbed.com/albums/', main_index ) != -1:
1116 | start_index = response.text.find( 'https://www.musicbed.com/albums/', main_index )
1117 | end_index = response.text.find( '">', start_index )
1118 | albums.append( response.text[start_index:end_index] )
1119 | main_index = end_index
1120 | elif url.startswith( 'https://www.musicbed.com/songs/' ):
1121 | albums.append( url )
1122 | num_tracks = 1
1123 | else: # url.startswith( 'https://www.musicbed.com/albums/' )
1124 | albums.append( url )
1125 |
1126 | # let's get our token and try to login (csrf_token seems to be present on every page)
1127 | token = response.text.split( 'var csrf_token = "' )[1].split( '";' )[0]
1128 | details = { '_token': token, 'login': login, 'password': password }
1129 | response = session.post( 'https://www.musicbed.com/ajax/login', data=details )
1130 | if response.status_code != 200:
1131 | puts( colored.red( 'scrape_musicbed_url: couldn\'t login. Aborting. ' ) + colored.white( 'Couldn\'t access login page.' ) )
1132 | session.close()
1133 | return []
1134 | login_response_data = demjson.decode( response.text )
1135 | if not login_response_data['body']['status']:
1136 | puts( colored.red( 'scrape_musicbed_url: couldn\'t login. Aborting. ' ) + colored.white( 'Did you provide correct login and password?' ) )
1137 | session.close()
1138 | return []
1139 |
1140 | # now let's actually scrape collected pages
1141 | filenames = []
1142 | for each_album_url in albums:
1143 | response = session.get( each_album_url )
1144 | if response.status_code != 200:
1145 | puts_safe( colored.red( 'scrape_musicbed_url: couldn\'t open url: ' + each_album_url +
1146 | '. Status code: ' + str( response.status_code ) + '. Skipping.' ) )
1147 | continue
1148 |
1149 | # actually not a JSON, but a JS object, but so far so good
1150 | json = response.text.split( 'App.components.SongRows = ' )[1].split( '' )[0]
1151 | data = demjson.decode( json )
1152 |
1153 | song_count = 1
1154 | for each_song in data['loadedSongs']:
1155 | if song_count > num_tracks:
1156 | break
1157 |
1158 | try:
1159 | url, params = each_song['playback_url'].split( '?' )
1160 |
1161 | details = dict()
1162 | for each_param in params.split( '&' ):
1163 | name, value = each_param.split( '=' )
1164 | details.update( { name: value } )
1165 | # musicbed warns about it if it's not fixed
1166 | details['X-Amz-Credential'] = details['X-Amz-Credential'].replace( '%2F', '/' )
1167 |
1168 | directory = custom_path
1169 | if folders:
1170 | sanitized_artist = sanitize_filename( each_song['album']['data']['artist']['data']['name'] )
1171 | sanitized_album = sanitize_filename( each_song['album']['data']['name'] )
1172 | directory = join( directory, sanitized_artist + ' - ' + sanitized_album )
1173 | if not exists( directory ):
1174 | mkdir( directory )
1175 | filename = join( directory, str( song_count ) + ' - ' + sanitize_filename( each_song['name'] ) + '.mp3' )
1176 |
1177 | if exists( filename ):
1178 | puts_safe( colored.yellow( 'Skipping' ) + colored.white( ': ' + each_song['name'] + ' - it already exists!' ) )
1179 | song_count += 1
1180 | continue
1181 |
1182 | puts_safe( colored.green( 'Downloading' ) + colored.white( ': ' + each_song['name'] ) )
1183 | path = download_file( url, filename, session=session, params=details )
1184 |
1185 | # example of genre_string:
1186 | # "Ambient Cinematic"
1187 | genres = ''
1188 | for each in each_song['genre_string'].split( '' ):
1189 | if ( each != "" ):
1190 | genres += each.split( '">' )[1] + '/'
1191 | genres = genres[:-1] # removing the trailing '/'
1192 |
1193 | tag_file(path,
1194 | each_song['album']['data']['artist']['data']['name'],
1195 | each_song['name'],
1196 | album=each_song['album']['data']['name'],
1197 | year=int( each_song['album']['data']['released_at'].split( '-' )[0] ),
1198 | genre=genres,
1199 | artwork_url=each_song['album']['data']['imageObject']['data']['paths']['original'],
1200 | track_number=str( song_count ),
1201 | url=each_song['song_url'])
1202 |
1203 | filenames.append( path )
1204 | song_count += 1
1205 | except Exception:
1206 | puts_safe( colored.red( 'Problem downloading ' ) + colored.white( each_song['name'] ) + '. Skipping.' )
1207 | song_count += 1
1208 |
1209 | session.close()
1210 |
1211 | return filenames
1212 |
1213 |
1214 | ####################################################################
1215 | # File Utility
1216 | ####################################################################
1217 |
1218 |
1219 | def download_file(url, path, session=None, params=None):
1220 | """
1221 | Download an individual file.
1222 | """
1223 |
1224 | if url[0:2] == '//':
1225 | url = 'https://' + url[2:]
1226 |
1227 | # Use a temporary file so that we don't import incomplete files.
1228 | tmp_path = path + '.tmp'
1229 |
1230 | if session and params:
1231 | r = session.get( url, params=params, stream=True )
1232 | elif session and not params:
1233 | r = session.get( url, stream=True )
1234 | else:
1235 | r = requests.get(url, stream=True)
1236 | with open(tmp_path, 'wb') as f:
1237 | total_length = int(r.headers.get('content-length', 0))
1238 | for chunk in progress.bar(r.iter_content(chunk_size=1024), expected_size=(total_length // 1024) + 1):
1239 | if chunk: # filter out keep-alive new chunks
1240 | f.write(chunk)
1241 | f.flush()
1242 |
1243 | os.rename(tmp_path, path)
1244 |
1245 | return path
1246 |
1247 |
1248 | def tag_file(filename, artist, title, year=None, genre=None, artwork_url=None, album=None, track_number=None, url=None):
1249 | """
1250 | Attempt to put ID3 tags on a file.
1251 |
1252 | Args:
1253 | artist (str):
1254 | title (str):
1255 | year (int):
1256 | genre (str):
1257 | artwork_url (str):
1258 | album (str):
1259 | track_number (str):
1260 | filename (str):
1261 | url (str):
1262 | """
1263 |
1264 | try:
1265 | audio = EasyMP3(filename)
1266 | audio.tags = None
1267 | audio["artist"] = artist
1268 | audio["title"] = title
1269 | if year:
1270 | audio["date"] = str(year)
1271 | if album:
1272 | audio["album"] = album
1273 | if track_number:
1274 | audio["tracknumber"] = track_number
1275 | if genre:
1276 | audio["genre"] = genre
1277 | if url: # saves the tag as WOAR
1278 | audio["website"] = url
1279 | audio.save()
1280 |
1281 | if artwork_url:
1282 |
1283 | artwork_url = artwork_url.replace('https', 'http')
1284 |
1285 | mime = 'image/jpeg'
1286 | if '.jpg' in artwork_url:
1287 | mime = 'image/jpeg'
1288 | if '.png' in artwork_url:
1289 | mime = 'image/png'
1290 |
1291 | if '-large' in artwork_url:
1292 | new_artwork_url = artwork_url.replace('-large', '-t500x500')
1293 | try:
1294 | image_data = requests.get(new_artwork_url).content
1295 | except Exception as e:
1296 | # No very large image available.
1297 | image_data = requests.get(artwork_url).content
1298 | else:
1299 | image_data = requests.get(artwork_url).content
1300 |
1301 | audio = MP3(filename, ID3=OldID3)
1302 | audio.tags.add(
1303 | APIC(
1304 | encoding=3, # 3 is for utf-8
1305 | mime=mime,
1306 | type=3, # 3 is for the cover image
1307 | desc='Cover',
1308 | data=image_data
1309 | )
1310 | )
1311 | audio.save()
1312 |
1313 | # because there is software that doesn't seem to use WOAR we save url tag again as WXXX
1314 | if url:
1315 | audio = MP3(filename, ID3=OldID3)
1316 | audio.tags.add( WXXX( encoding=3, url=url ) )
1317 | audio.save()
1318 |
1319 | return True
1320 |
1321 | except Exception as e:
1322 | puts(colored.red("Problem tagging file: ") + colored.white("Is this file a WAV?"))
1323 | return False
1324 |
1325 | def open_files(filenames):
1326 | """
1327 | Call the system 'open' command on a file.
1328 | """
1329 | command = ['open'] + filenames
1330 | process = Popen(command, stdout=PIPE, stderr=PIPE)
1331 | stdout, stderr = process.communicate()
1332 |
1333 |
1334 | def sanitize_filename(filename):
1335 | """
1336 | Make sure filenames are valid paths.
1337 |
1338 | Returns:
1339 | str:
1340 | """
1341 | sanitized_filename = re.sub(r'[/\\:*?"<>|]', '-', filename)
1342 | sanitized_filename = sanitized_filename.replace('&', 'and')
1343 | sanitized_filename = sanitized_filename.replace('"', '')
1344 | sanitized_filename = sanitized_filename.replace("'", '')
1345 | sanitized_filename = sanitized_filename.replace("/", '')
1346 | sanitized_filename = sanitized_filename.replace("\\", '')
1347 |
1348 | # Annoying.
1349 | if sanitized_filename.startswith('.'):
1350 | sanitized_filename = u'dot' + sanitized_filename[1:]
1351 |
1352 | return sanitized_filename
1353 |
1354 | def puts_safe(text):
1355 | if sys.platform == "win32":
1356 | if sys.version_info < (3,0,0):
1357 | puts(text)
1358 | else:
1359 | puts(text.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
1360 | else:
1361 | puts(text)
1362 |
1363 |
1364 | ####################################################################
1365 | # Main
1366 | ####################################################################
1367 |
1368 | if __name__ == '__main__':
1369 | try:
1370 | sys.exit(main())
1371 | except Exception as e:
1372 | print(e)
1373 |
--------------------------------------------------------------------------------
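The `download_file` helper in the listing above writes to `path + '.tmp'` and renames the file once the download finishes, so a partial download never appears under its final name. The following is a minimal, network-free sketch of that pattern, not the project's code. The names `write_via_temp_file` and `demo` are hypothetical, and it uses `os.replace` instead of `os.rename` on the assumption that overwriting an existing destination is acceptable.

```python
import os
import tempfile


def write_via_temp_file(chunks, path):
    """Write byte chunks to path + '.tmp', then move the result onto path."""
    tmp_path = path + '.tmp'
    with open(tmp_path, 'wb') as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                f.write(chunk)
    # os.replace overwrites an existing destination on every platform,
    # whereas os.rename raises on Windows if the destination exists.
    os.replace(tmp_path, path)
    return path


def demo():
    """Write a fake track into a scratch directory and report what is left."""
    with tempfile.TemporaryDirectory() as directory:
        target = os.path.join(directory, 'track.mp3')
        write_via_temp_file([b'ID3', b'', b'\x00\x01'], target)
        with open(target, 'rb') as f:
            return f.read(), os.path.exists(target + '.tmp')
```

Calling `demo()` returns the bytes written to the final file and whether the `.tmp` file still exists afterwards. In a real download the `chunks` argument would be something like the iterator returned by `iter_content` on a streamed response.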
/test.sh:
--------------------------------------------------------------------------------
1 | #! /bin/bash
2 | nosetests
3 |
--------------------------------------------------------------------------------
/tests/test.py:
--------------------------------------------------------------------------------
1 | import glob
2 | import os
3 | import sys
4 | import unittest
5 |
6 | from mutagen.mp3 import EasyMP3
7 | from soundscrape.soundscrape import get_client
8 | from soundscrape.soundscrape import process_soundcloud
9 | from soundscrape.soundscrape import process_bandcamp
10 |
11 |
12 | def rm_mp3():
13 | """ deletes all ``*.mp3`` files in current directory
14 | """
15 | for f in glob.glob('*.mp3'):
16 | os.unlink(f)
17 |
18 |
19 | class TestSoundscrape(unittest.TestCase):
20 |
21 | ##
22 | # Basic Tests
23 | ##
24 |
25 | def test_test(self):
26 | self.assertTrue(True)
27 |
28 | def test_get_client(self):
29 | client = get_client()
30 | self.assertTrue(bool(client))
31 |
32 | def test_soundcloud(self):
33 | rm_mp3()
34 | mp3_count = len(glob.glob1('', "*.mp3"))
35 | vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://soundcloud.com/fzpz/revised', 'keep': True}
36 | process_soundcloud(vargs)
37 | new_mp3_count = len(glob.glob1('', "*.mp3"))
38 | self.assertTrue(new_mp3_count > mp3_count)
39 | rm_mp3()
40 |
41 | def test_soundcloud_hard(self):
42 | rm_mp3()
43 | mp3_count = len(glob.glob1('', "*.mp3"))
44 | vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 1, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'puptheband', 'keep': False}
45 | process_soundcloud(vargs)
46 | new_mp3_count = len(glob.glob1('', "*.mp3"))
47 | self.assertTrue(new_mp3_count > mp3_count)
48 | self.assertTrue(new_mp3_count == 1) # This used to be 3, but is now 'Not available in United States.'
49 | rm_mp3()
50 |
51 | def test_soundcloud_hard_2(self):
52 | rm_mp3()
53 | mp3_count = len(glob.glob1('', "*.mp3"))
54 | vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 1, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://soundcloud.com/lostdogz/snuggles-chapstick', 'keep': False}
55 | process_soundcloud(vargs)
56 | new_mp3_count = len(glob.glob1('', "*.mp3"))
57 | self.assertTrue(new_mp3_count > mp3_count)
58 | self.assertTrue(new_mp3_count == 1) # This used to be 3, but is now 'Not available in United States.'
59 | rm_mp3()
60 |
61 | # The test URL for this is no longer a WAV. Need a new testcase.
62 | #
63 | # def test_soundcloud_wav(self):
64 | # for f in glob.glob('*.wav'):
65 | # os.unlink(f)
66 |
67 | # wav_count = len(glob.glob1('', "*.wav"))
68 | # vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 1, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://soundcloud.com/coastal/major-lazer-aerosol-can-coastal-flip', 'keep': False}
69 | # process_soundcloud(vargs)
70 | # new_wav_count = len(glob.glob1('', "*.wav"))
71 | # self.assertTrue(new_wav_count > wav_count)
72 | # self.assertTrue(new_wav_count == 1)
73 |
74 | # for f in glob.glob('*.wav'):
75 | # os.unlink(f)
76 |
77 | def test_bandcamp(self):
78 | rm_mp3()
79 | mp3_count = len(glob.glob1('', "*.mp3"))
80 | vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://atenrays.bandcamp.com/track/who-u-think'}
81 | process_bandcamp(vargs)
82 | new_mp3_count = len(glob.glob1('', "*.mp3"))
83 | self.assertTrue(new_mp3_count > mp3_count)
84 | rm_mp3()
85 |
86 | def test_bandcamp_slashes(self):
87 | rm_mp3()
88 | mp3_count = len(glob.glob1('', "*.mp3"))
89 | vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://defill.bandcamp.com/track/amnesia-chamber-harvest-skit'}
90 | process_bandcamp(vargs)
91 | new_mp3_count = len(glob.glob1('', "*.mp3"))
92 | self.assertTrue(new_mp3_count > mp3_count)
93 | rm_mp3()
94 |
95 | def test_bandcamp_html_entities(self):
96 | rm_mp3()
97 | vargs = {'path': '', 'folders': False, 'num_tracks': sys.maxsize, 'open': False, 'artist_url': 'https://anaalnathrakh.bandcamp.com/track/man-at-c-a-bonus-track'}
98 | process_bandcamp(vargs)
99 | mp3s = glob.glob('*.mp3')
100 | self.assertEqual(1, len(mp3s))
101 | fn = mp3s[0]
102 | self.assertTrue('CandA' in fn)
103 | t = EasyMP3(fn)['title']
104 | self.assertTrue('C&A' in t[0])
105 | rm_mp3()
106 |
107 |
108 | # def test_musicbed(self):
109 | # for f in glob.glob('*.mp3'):
110 | # os.unlink(f)
111 |
112 | # mp3_count = len(glob.glob1('', "*.mp3"))
113 | # vargs = {'login':'musicbedtest@gmail.com', 'password':'oo6alY9T', 'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://www.musicbed.com/albums/be-still/2828'}
114 | # process_musicbed(vargs)
115 | # new_mp3_count = len(glob.glob1('', "*.mp3"))
116 | # self.assertTrue(new_mp3_count > mp3_count)
117 |
118 | # for f in glob.glob('*.mp3'):
119 | # os.unlink(f)
120 |
121 | def test_mixcloud(self):
122 | """
123 | MixCloud is being blocked from Travis, interestingly.
124 | """
125 |
126 | # rm_mp3()
127 | # for f in glob.glob('*.m4a'):
128 | # os.unlink(f)
129 |
130 | # shortest mix I could find that was still semi tolerable
131 | #mp3_count = len(glob.glob1('', "*.mp3"))
132 | #m4a_count = len(glob.glob1('', "*.m4a"))
133 | #vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://www.mixcloud.com/Bobby_T_FS15/coffee-cigarettes-saturday-morning-hip-hop-fix/'}
134 | #process_mixcloud(vargs)
135 | #new_mp3_count = len(glob.glob1('', "*.mp3"))
136 | #new_m4a_count = len(glob.glob1('', "*.m4a"))
137 | #self.assertTrue((new_mp3_count > mp3_count) or (new_m4a_count > m4a_count))
138 |
139 | # rm_mp3()
140 | # for f in glob.glob('*.m4a'):
141 | # os.unlink(f)
142 |
143 | # def test_audiomack(self):
144 | # for f in glob.glob('*.mp3'):
145 | # os.unlink(f)
146 |
147 | # mp3_count = len(glob.glob1('', "*.mp3"))
148 | # vargs = {'path':'', 'folders': False, 'group': False, 'track': '', 'num_tracks': 9223372036854775807, 'bandcamp': False, 'audiomack': True, 'downloadable': False, 'likes': False, 'open': False, 'artist_url': 'https://www.audiomack.com/song/bottomfeedermusic/power'}
149 | # process_audiomack(vargs)
150 | # new_mp3_count = len(glob.glob1('', "*.mp3"))
151 | # self.assertTrue(new_mp3_count > mp3_count)
152 |
153 | # for f in glob.glob('*.mp3'):
154 | # os.unlink(f)
155 |
156 | if __name__ == '__main__':
157 | unittest.main()
158 |
--------------------------------------------------------------------------------
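Every active test in the file above downloads from a live site, so the suite fails whenever a track is removed or a host blocks the CI runner. As a hedged sketch of a complementary offline test, the block below checks the filename rules without any network access. It assumes the rules can be reproduced locally: `sanitize_filename` here is an illustrative copy of the logic in `soundscrape.py`, not an import of it, and `TestSanitizeFilename` is a hypothetical test case name.

```python
import re
import unittest


def sanitize_filename(filename):
    """Illustrative local copy of the sanitizing rules, for offline testing."""
    # Characters that are invalid in paths on common filesystems become '-'.
    sanitized = re.sub(r'[/\\:*?"<>|]', '-', filename)
    sanitized = sanitized.replace('&', 'and').replace("'", '')
    # Avoid creating hidden files on Unix-like systems.
    if sanitized.startswith('.'):
        sanitized = 'dot' + sanitized[1:]
    return sanitized


class TestSanitizeFilename(unittest.TestCase):

    def test_ampersand_is_spelled_out(self):
        self.assertEqual(sanitize_filename('C&A'), 'CandA')

    def test_path_separators_are_replaced(self):
        self.assertEqual(sanitize_filename('AC/DC: Live?'), 'AC-DC- Live-')

    def test_leading_dot_is_rewritten(self):
        self.assertEqual(sanitize_filename('.hidden'), 'dothidden')

    def test_empty_name_does_not_raise(self):
        self.assertEqual(sanitize_filename(''), '')
```

In the real suite the local copy would be replaced by `from soundscrape.soundscrape import sanitize_filename`, so the assertions exercise the shipped function.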