├── .github ├── FUNDING.yml ├── ISSUE_TEMPLATE.md ├── PULL_REQUEST_TEMPLATE.md └── workflows │ └── ci.yml ├── .gitignore ├── CHANGELOG.md ├── LICENSE ├── README.md ├── deprecated.md ├── domains.yml ├── index.js ├── lib ├── error.js ├── index.js ├── is_error_fatal.js └── providers │ ├── clck.ru.js │ ├── flic.kr.js │ ├── google.com.js │ ├── index.js │ └── vk.com.js ├── package.json ├── support └── unshort.js └── test ├── cache.js ├── default.js ├── expand.js ├── services.js └── services.yml /.github/FUNDING.yml: -------------------------------------------------------------------------------- 1 | open_collective: puzrin 2 | patreon: puzrin 3 | -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Prior to request adding new site to defaults, please check: 2 | 3 | - It should be popular 4 | - It should not be restricted to single domain (fb.me and others) 5 | 6 | Anything else can be added via library API at user's side. 7 | -------------------------------------------------------------------------------- /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | Requirements for adding new site to defaults: 2 | 3 | - It should be popular 4 | - It should not be restricted to single domain (fb.me and others) 5 | - Test for new site should exist (`npm run test-all`) 6 | 7 | Anything else can be added via library API at user's side. 
8 | -------------------------------------------------------------------------------- /.github/workflows/ci.yml: -------------------------------------------------------------------------------- 1 | name: CI 2 | 3 | on: 4 | push: 5 | pull_request: 6 | schedule: 7 | - cron: '0 0 * * 3' 8 | 9 | jobs: 10 | test: 11 | runs-on: ubuntu-latest 12 | 13 | steps: 14 | - uses: actions/checkout@v2 15 | - uses: actions/setup-node@v2 16 | 17 | - run: npm install 18 | 19 | - name: Test 20 | run: | 21 | npm test 22 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | node_modules 2 | doc 3 | *.log 4 | *.swp 5 | -------------------------------------------------------------------------------- /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | 6.1.0 / 2022-10-22 2 | ------------------ 3 | 4 | - Deps bump. 5 | - Deprecated `soo.gd` & `korta.nu`. 6 | - Fixed `vk.cc` & `vurl.com`. 7 | 8 | 9 | 6.0.0 / 2022-05-17 10 | ------------------ 11 | 12 | - Cleanup deprecated redirectors. Move deprecation info to separate file. 13 | - node.js v14+ required. 14 | - Renamed option `select` => `link_selector`. 15 | - Added `.remove()` method. 16 | - Add `isErrorFatal` helper. 17 | - Deps bump. 18 | 19 | 20 | 5.0.0 / 2017-06-08 21 | ------------------ 22 | 23 | - Switch to native async/await (need nodejs 7.+) 24 | - Drop callbacks support. 25 | 26 | 27 | 4.1.0 / 2017-06-08 28 | ------------------ 29 | 30 | - Maintenance, deps bump. `got` 6.x -> 7.x. `got` timeouts may work a bit 31 | differently but should not affect result. 32 | 33 | 34 | 4.0.0 / 2016-12-08 35 | ------------------ 36 | 37 | - Move request options to `options.request`. 38 | - Update default User-Agent string. 39 | - Deprecate `error.status` (use `error.statusCode`). 40 | - Add more info (code) to error messages. 41 | - flic.kr should use `.request()` method. 
42 | - Increase default request timeout to 30 seconds. 43 | 44 | 45 | 3.1.0 / 2016-12-05 46 | ------------------ 47 | 48 | - `err.status` -> `err.statusCode` (old `err.status` still exists for backward 49 | compatibility, but will be deprecated). 50 | 51 | 52 | 3.0.0 / 2016-11-27 53 | ------------------ 54 | 55 | - Rewrite internals to promises (including .require() / cache.get() / 56 | cache.set()). 57 | - Drop old node.js support, now v4.+ required. 58 | 59 | 60 | 2.1.0 / 2016-07-15 61 | ------------------ 62 | 63 | - Added `google.*/url` unshortening. 64 | - Reenabled some glitching services. 65 | - Added incident dates to default config for tracking progress in future. 66 | 67 | 68 | 2.0.0 / 2016-05-24 69 | ------------------ 70 | 71 | - Added Promise support in `.expand` method. 72 | - Services cleanup. 73 | 74 | 75 | 1.1.3 / 2016-01-17 76 | ------------------ 77 | 78 | - Maintenance: deps update. 79 | 80 | 81 | 1.1.2 / 2015-12-07 82 | ------------------ 83 | 84 | - Enhanced error info with `code` & `status` properties. 85 | 86 | 87 | 1.1.1 / 2015-11-27 88 | ------------------ 89 | 90 | - Improved cache use for edge case with empty result. 91 | 92 | 93 | 1.1.0 / 2015-11-25 94 | ------------------ 95 | 96 | - Optimized cache use. Store data only if fetch happened. 97 | - Increased request timeout to 10 seconds. 98 | 99 | 100 | 1.0.1 / 2015-10-28 101 | ------------------ 102 | 103 | - Added `vk.com/away.php` support. 104 | 105 | 106 | 1.0.0 / 2015-08-16 107 | ------------------ 108 | 109 | - First release. 110 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2015 Vitaly Puzrin. 
2 | 3 | Permission is hereby granted, free of charge, to any person 4 | obtaining a copy of this software and associated documentation 5 | files (the "Software"), to deal in the Software without 6 | restriction, including without limitation the rights to use, 7 | copy, modify, merge, publish, distribute, sublicense, and/or sell 8 | copies of the Software, and to permit persons to whom the 9 | Software is furnished to do so, subject to the following 10 | conditions: 11 | 12 | The above copyright notice and this permission notice shall be 13 | included in all copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES 17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND 18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 23 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # url-unshort 2 | 3 | [![CI](https://github.com/nodeca/url-unshort/actions/workflows/ci.yml/badge.svg)](https://github.com/nodeca/url-unshort/actions/workflows/ci.yml) 4 | [![NPM version](https://img.shields.io/npm/v/url-unshort.svg?style=flat)](https://www.npmjs.org/package/url-unshort) 5 | 6 | > This library expands urls provided by url shortening services (see [full list](https://github.com/nodeca/url-unshort/blob/master/domains.yml)). 7 | 8 | 9 | ## Why should I use it? 10 | 11 | It has been [argued](http://joshua.schachter.org/2009/04/on-url-shorteners) that 12 | “shorteners are bad for the ecosystem as a whole”. 
In particular, if you're 13 | running a forum or a blog, such services might cause trouble for your users: 14 | 15 | - such links load slower than usual (shortening services require an extra DNS 16 | and HTTP request) 17 | - it adds another point of failure (should this service go down, the links will 18 | die; [301works](https://archive.org/details/301works) tries to solve this, 19 | but it's better to avoid the issue in the first place) 20 | - users don't see where the link points to (tinyurl previews don't *really* 21 | solve this) 22 | - it can be used for user activity tracking 23 | - certain shortening services display ads before redirecting 24 | - shortening services can be malicious or be hacked so they could redirect to 25 | a completely different place next month 26 | 27 | Also, short links are used to bypass spam filters. So if you're implementing 28 | a domain black list for your blog comments, you might want to check where all 29 | those short links *actually* point to. 30 | 31 | 32 | ## Installation 33 | 34 | ```bash 35 | $ npm install url-unshort 36 | ``` 37 | 38 | ## Basic usage 39 | 40 | ```js 41 | const uu = require('url-unshort')() 42 | 43 | try { 44 | const url = await uu.expand('http://goo.gl/HwUfwd') 45 | 46 | if (url) console.log(`Original url is: ${url}`) 47 | else console.log('This url can\'t be expanded') 48 | 49 | } catch (err) { 50 | console.log(err); 51 | } 52 | ``` 53 | 54 | ## Retrying errors 55 | 56 | Temporary network errors are retried automatically once (`options.request.retry=1` by default). 
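As a rough illustration of which failures count as temporary, the sketch below copies the check from `lib/is_error_fatal.js` in this repository into a standalone function and runs it on a few made-up error objects. The sample objects are assumptions for demonstration only; real errors come from `expand()`.

```js
// Standalone copy of the check in lib/is_error_fatal.js, for illustration only.
function isErrorFatal (err) {
  // HTTP errors are fatal unless they are 5xx, 429 (rate limit) or 408 (timeout)
  if (err.statusCode && !String(+err.statusCode).match(/^(5..|429|408)$/)) {
    return true
  }

  // bad urls like http://1234
  if (err.code === 'EINVAL') return true

  // server returned an invalid url or caused a redirect loop
  if (err.code === 'EBADREDIRECT') return true

  return false
}

// Hypothetical sample errors (assumed shapes, not captured from real requests)
const samples = [
  { code: 'EHTTP', statusCode: 404 },
  { code: 'EHTTP', statusCode: 503 },
  { code: 'EHTTP', statusCode: 429 },
  { code: 'ECONNRESET' },
  { code: 'EBADREDIRECT' }
]

for (const err of samples) {
  console.log(JSON.stringify(err), isErrorFatal(err) ? 'fatal' : 'retry later')
}
```

Only errors classified as non-fatal are worth retrying after a delay; anything else is unlikely to succeed on a second attempt.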
57 | 58 | You may choose to retry some errors after an extended period of time using code like this: 59 | 60 | ```js 61 | const uu = require('url-unshort')() 62 | const { isErrorFatal } = require('url-unshort') 63 | let tries = 0 64 | 65 | while (true) { 66 | try { 67 | tries++ 68 | const url = await uu.expand('http://goo.gl/HwUfwd') 69 | 70 | // If url is expanded, it returns string (expanded url); 71 | // `null` is returned if service is unknown 72 | if (url) console.log(`Original url is: ${url}`) 73 | else console.log("This url can't be expanded") 74 | break 75 | 76 | } catch (err) { 77 | // use isErrorFatal function to check if url can be retried or not 78 | if (isErrorFatal(err)) { 79 | // this url can't be expanded (e.g. 404 error) 80 | console.log(`Unshort error (fatal): ${err}`) 81 | break 82 | } 83 | 84 | // Temporary error, trying again in 10 minutes 85 | // (5xx errors, ECONNRESET, etc.) 86 | console.log(`Unshort error (retrying): ${err}`) 87 | if (tries >= 3) { 88 | console.log(`Too many errors, aborting`) 89 | break 90 | } 91 | await new Promise(resolve => setTimeout(resolve, 10 * 60 * 1000)) 92 | } 93 | } 94 | ``` 95 | 96 | 97 | ## API 98 | 99 | ### Creating an instance 100 | 101 | When you create an instance, you can pass an options object to fine-tune unshortener behavior. 102 | 103 | ```js 104 | const uu = require('url-unshort')({ 105 | nesting: 3, 106 | cache: { 107 | get: async key => {}, 108 | set: async (key, value) => {} 109 | } 110 | }); 111 | ``` 112 | 113 | Available options are: 114 | 115 | - **nesting** (Number, default: `3`) - stop resolving urls 116 | when `nesting` amount of redirects is reached. 117 | 118 | It happens if one shortening service refers to a link belonging to 119 | another shortening service which in turn points to yet another one 120 | and so on. 121 | 122 | If this limit is reached, `expand()` will return an error. 123 | 124 | - **cache** (Object) - set a custom cache implementation (e.g. 
if you wish 125 | to store urls in Redis). 126 | 127 | You need to specify 2 promise-based functions, `set(key, value)` & `get(key)`. 128 | 129 | - **request** (Object) - default options for 130 | [got](https://github.com/sindresorhus/got) in `.request()` method. Can be 131 | used to set custom `User-Agent` and other headers. 132 | 133 | 134 | ### uu.expand(url) -> Promise 135 | 136 | Expand the supplied URL. If we don't know how to expand it, returns `null`. 137 | 138 | ```js 139 | const uu = require('url-unshort')(); 140 | 141 | try { 142 | const url = await uu.expand('http://goo.gl/HwUfwd') 143 | 144 | if (url) console.log(`Original url is: ${url}`) 145 | // no shortening service or an unknown one is used 146 | else console.log('This url can\'t be expanded') 147 | 148 | } catch (err) { 149 | console.log(err) 150 | } 151 | ``` 152 | 153 | ### uu.add(domain [, options]) 154 | 155 | Add a new url shortening service (domain name or an array of them) to the white 156 | list of domains we know how to expand. 157 | 158 | ```js 159 | uu.add([ 'tinyurl.com', 'bit.ly' ]) 160 | ``` 161 | 162 | The default behavior is to request the URL and check 163 | the status code. If it's `3xx`, return the `Location` header. You can override 164 | this behavior by supplying your own function in options. 165 | 166 | Options: 167 | 168 | - **aliases** (Array) - Optional. List of alternate domain names, if any exist. 169 | - **match** (String|RegExp) - Optional. Custom regexp to use for URL match. 170 | For example, if you need to match wildcard prefixes or country-specific 171 | suffixes. If used with `validate`, the regexp does not have to be precise; it only needs to 172 | filter out noise. If `match` is not passed, an exact value is auto-generated from 173 | `domain` & `aliases`. 174 | - **validate** (Function) - Optional. Does exact URL check, when complex logic 175 | is required and a regexp is not enough (when `match` is only preliminary). See 176 | `./lib/providers/*` for examples. 
177 | - **fetch** (Function) - Optional. Specifies custom function to retrieve expanded 178 | url, see `./lib/providers/*` for examples. If not set, the default method is used 179 | (it checks 30X redirect codes & `<meta http-equiv="refresh">` 180 | in HTML). 181 | - **link_selector** (String) - Optional. Some sites may return HTML pages instead 182 | of 302 redirects. This option allows using a jquery-like selector to extract 183 | the `<a href="...">` value. 184 | 185 | Example: 186 | 187 | ```js 188 | const uu = require('url-unshort')() 189 | 190 | uu.add('notlong.com', { 191 | match: '^(https?:)//[a-zA-Z0-9_-]+[.]notlong[.]com/' 192 | }) 193 | 194 | uu.add('tw.gs', { 195 | link_selector: '#lurllink > a' 196 | }) 197 | ``` 198 | 199 | ### uu.remove(domain) 200 | 201 | (String|Array|Undefined). Opposite to `.add()`. Remove selected domains from 202 | instance config. If no params passed - remove everything. 203 | 204 | 205 | ## Security considerations 206 | 207 | Only `http` and `https` protocols are allowed in the output. Browsers technically 208 | support redirects to other protocols (like `ftp` or `magnet`), but most url 209 | shortening services limit redirects to `http` and `https` anyway. If a 210 | service redirects to an unknown protocol, `expand()` will return an error. 211 | 212 | `expand()` function returns url from the url shortening service **as is** without any 213 | escaping or even ensuring that the url is valid. 
If you want to guarantee a 214 | valid url as an output, you're encouraged to re-encode it like this: 215 | 216 | ```js 217 | const URL = require('url') 218 | 219 | let url = await uu.expand('http://goo.gl/HwUfwd') 220 | 221 | if (url) url = URL.format(URL.parse(url, null, true)) 222 | 223 | console.log(url) 224 | ``` 225 | 226 | ## License 227 | 228 | [MIT](https://raw.github.com/nodeca/url-unshort/master/LICENSE) 229 | -------------------------------------------------------------------------------- /deprecated.md: -------------------------------------------------------------------------------- 1 | List of outdated domains, removed from `domains.yml`. 2 | 3 | **2.gp** 4 | 5 | Alias: 7.ly 6 | 7 | 2021.11.30. Redirects to another site, old links removed. 8 | 9 | **adfa.st** 10 | 11 | 2016.07.13. Domain lost. 12 | 13 | **b23.ru** 14 | 15 | 2016.07.13. Not responding. 16 | 17 | **budurl.me** 18 | 19 | 2016.07.13. Become adware, old links removed. 20 | 21 | **fur.ly** 22 | 23 | 2016.12.06. Empty main page. 404 to all links. 24 | 25 | **korta.nu** 26 | 27 | 2022.10.22. Not working (access denied) 28 | 29 | **macte.ch** 30 | 31 | 2016.07.13. Removed old links & disabled foreign domains 32 | 33 | **migre.me** 34 | 35 | 2021.11.30. Not working. 36 | 37 | **minu.me** 38 | 39 | 2021.11.30. Not responding. 40 | 41 | **nsfw.in** 42 | 43 | 2021.11.30. Cyclic redirect. 44 | 45 | **o-x.fr** 46 | 47 | 2016.07.13. Domain lost. 48 | 49 | **qr.net** 50 | 51 | 2016.12.06. Not working, strange default redirects. 52 | 53 | **scrnch.me** 54 | 55 | 2021.11.30. Domain lost. 56 | 57 | **smsh.me** 58 | 59 | 2016.07.13. Not working. 60 | 61 | **snipurl.com** 62 | 63 | 2021.11.30. "We're migrating to a new server" (and nothing changed) 64 | 65 | **soo.gd** 66 | 67 | 2022.10.22. Domain lost. 68 | 69 | **thecow.me** 70 | 71 | 2021.11.30. Only root page works (and nothing changed) 72 | 73 | **tiny.ly** 74 | 75 | 2016.07.13. Not responding. 76 | 77 | **tnij.org** 78 | 79 | 2021.11.30. Not working. 
80 | 81 | **to.ly** 82 | 83 | 2016.07.13. Not responding. Currently redirects to another site, old links removed. 84 | 85 | **tr.im** 86 | 87 | 2022.02.02. Domain lost. 88 | 89 | **trim.li** 90 | 91 | 2021.11.30. Not working. 92 | 93 | **url.az** 94 | 95 | 2016.07.13. Become adware with intermediate page. Old links removed. 96 | 97 | **http://ur1.ca** 98 | 99 | 2021.11.30. Domain lost. 100 | 101 | **➡.ws** 102 | 103 | With aliases: ➯.ws, ➔.ws, ➞.ws, ➽.ws, ➹.ws, ✩.ws, ✿.ws, ❥.ws, ›.ws, ⌘.ws, ‽.ws, 104 | ☁.ws, ta.gd, ri.ms. 105 | 106 | 2021.11.30. All domains unconfigured or lost. -------------------------------------------------------------------------------- /domains.yml: -------------------------------------------------------------------------------- 1 | # 2 | # The list of domains handled by internal rules 3 | # 4 | 5 | - 0rz.tw 6 | - alturl.com 7 | - amzn.to 8 | - bit.do 9 | - bit.ly: 10 | aliases: [ aaja.de, adct.me, archdai.ly, aspt.co, ccwc.me, crks.me, 11 | bcool.bz, detne.ws, digs.by, drudge.tw, emu.sc, go72.de, got.cr, j.mp, 12 | j-tv.me, hnnng.de, s.htc.com, kon.gg, livesi.de, perez.ly, rol.st, 13 | scr.bi, theatln.tc, tgr.ph, trib.in, utsd.us, yhoo.it ] 14 | - chilp.it 15 | - clck.ru 16 | - cort.as 17 | - cutt.us 18 | - db.tt 19 | - fave.co 20 | - flic.kr 21 | - goo.gl 22 | - google.com: 23 | match: '^(https?:)//(www[.])?google([.]\w+)([.]\w+)?/' 24 | - is.gd 25 | - merky.de 26 | - notlong.com: 27 | match: '^(https?:)//[a-zA-Z0-9_-]+[.]notlong[.]com/' 28 | - ow.ly: 29 | aliases: [ owl.li, ht.ly ] 30 | - shorl.com 31 | - smu.gs 32 | - t.co 33 | # 2016.07.13, Domain blacklisted by idiots from RKN. Viva Russia! 34 | - tiny.cc 35 | - tiny.pl 36 | - tinyurl.com 37 | - tmblr.co: 38 | aliases: [ tumblr.com ] 39 | # 2022.10.22 Shows DB Error 40 | - tw.gs: 41 | link_selector: '#lurllink > a' 42 | # 2021.11.30 Unstable, but works. 
43 | - url.ie 44 | - v.gd: 45 | link_selector: '.biglink' 46 | - vk.cc 47 | - vk.com: 48 | aliases: [ vkontakte.ru ] 49 | - vurl.com: 50 | link_selector: '.padder a:nth-child(3)' 51 | - wp.me 52 | - xurl.es 53 | -------------------------------------------------------------------------------- /index.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | module.exports = require('./lib') 4 | module.exports.Error = require('./lib/error') 5 | module.exports.isErrorFatal = require('./lib/is_error_fatal') 6 | -------------------------------------------------------------------------------- /lib/error.js: -------------------------------------------------------------------------------- 1 | // Error class based on http://stackoverflow.com/questions/8458984 2 | // 3 | 'use strict' 4 | 5 | class UnshortError extends Error { 6 | constructor (message, code, statusCode) { 7 | super(message) 8 | this.name = this.constructor.name 9 | Error.captureStackTrace(this, this.constructor) 10 | 11 | if (code) this.code = code 12 | if (statusCode) this.statusCode = statusCode 13 | } 14 | } 15 | 16 | module.exports = UnshortError 17 | -------------------------------------------------------------------------------- /lib/index.js: -------------------------------------------------------------------------------- 1 | // Main class 2 | // 3 | 4 | 'use strict' 5 | 6 | const $ = require('cheerio/lib/slim').load('') 7 | const read = require('fs').readFileSync 8 | const yaml = require('js-yaml') 9 | const path = require('path') 10 | const punycode = require('punycode/') 11 | const got = require('got') 12 | const escapeRe = require('escape-string-regexp') 13 | const merge = require('lodash.merge') 14 | const URL = require('url').URL 15 | const UnshortError = require('./error') 16 | const pkg = require('../package.json') 17 | 18 | const config = yaml.load(read(path.join(__dirname, '..', 'domains.yml'), 'utf8')) 19 | 20 | const defaultAgent = 
`${pkg.name}/${pkg.version} (+https://github.com/nodeca/url-unshort)` 21 | 22 | const defaultOptions = { 23 | timeout: 30 * 1000, 24 | retry: 1, 25 | followRedirect: false, // redirects are handled manually 26 | headers: { 27 | 'User-Agent': defaultAgent 28 | } 29 | } 30 | 31 | const customProviders = require('./providers/index') 32 | 33 | // Create an unshortener instance 34 | // 35 | // options: 36 | // - cache (Object) - cache instance 37 | // - get(key) -> Promise 38 | // - set(key, value) -> Promise 39 | // - nesting (Number) - max amount of redirects to follow, default: `3` 40 | // - request (Object) - default options for `got` in `.request()` method 41 | // 42 | function Unshort (options = {}) { 43 | if (!(this instanceof Unshort)) return new Unshort(options) 44 | 45 | this._options = merge({}, defaultOptions, options.request || {}) 46 | 47 | // config data with compiled regexps and fetch functions attached 48 | this._sites = [] 49 | this._compiled_sites = [] 50 | 51 | this.cache = options.cache || { 52 | get: async () => {}, 53 | set: async () => {} 54 | } 55 | 56 | this.nesting = options.nesting || 3 57 | 58 | // Regexp that matches links to all the known services, it is used 59 | // to determine whether url should be processed at all or not. 60 | // 61 | // Initialized to regexp that matches nothing, it gets overwritten 62 | // when domains are added. 63 | // 64 | this._matchAllRE = /(?!)/ 65 | 66 | // Merge config data & custom providers 67 | config.forEach(site => { 68 | let domain, options 69 | 70 | if (typeof site === 'string') { 71 | [domain, options] = [site, {}] 72 | } else { 73 | [domain, options] = Object.entries(site)[0] 74 | } 75 | 76 | if (customProviders[domain]) { 77 | Object.assign(options, customProviders[domain]) 78 | } 79 | 80 | this.add(domain, options) 81 | }) 82 | } 83 | 84 | // Remove previously added domain. 
85 | // 86 | // - domain (String|Array) - list of domain names (leave undefined to drop all) 87 | // 88 | Unshort.prototype.remove = function (domain) { 89 | if (!domain) { 90 | this._sites.length = 0 91 | } else if (!Array.isArray(domain)) { 92 | this._sites = this._sites.filter(s => s.id !== domain) 93 | } else { 94 | for (const d of domain) { 95 | this._sites = this._sites.filter(s => s.id !== d) 96 | } 97 | } 98 | 99 | this._compile() 100 | } 101 | 102 | // Add a domain name to the list of known domains 103 | // 104 | // - domain (String|Array) - list of domain names 105 | // - options (Object) - options for these domains 106 | // - link_selector (String) - jquery-like selector to retrieve url with 107 | // - match (String|RegExp) - custom regexp to use to match this domain 108 | // - fetch (Function) - custom function to retrieve expanded url 109 | // 110 | Unshort.prototype.add = function (domain, options = {}) { 111 | if (Array.isArray(domain)) { 112 | for (const d of domain) { 113 | this._sites.push(Object.assign({ id: d }, options)) 114 | } 115 | } else { 116 | this._sites.push(Object.assign({ id: domain }, options)) 117 | } 118 | 119 | this._compile() 120 | } 121 | 122 | // Normalize site data: 123 | // 124 | // - create default handlers if not exist 125 | // - build `match` regexp, depending on other fields 126 | // 127 | // Returns normalized object, suitable for unified processing. 128 | // 129 | Unshort.prototype._compileSingle = function (site) { 130 | // Prepare list of all the domain names, including aliases 131 | // and punycode variations 132 | let dList = [site.id].concat(site.aliases || []) 133 | 134 | // create variations + make unique 135 | dList = Array.from(new Set( 136 | dList.map(punycode.toASCII).concat(dList.map(punycode.toUnicode)) 137 | )) 138 | 139 | let match 140 | 141 | if (site.match) { 142 | // regexp is specified by a user 143 | match = typeof site.match === 'string' 144 | ? 
new RegExp(site.match, 'i') 145 | : site.match 146 | } else { 147 | // regexp is auto-generated out of domain list 148 | match = new RegExp( 149 | `^(https?:)?//(www[.])?(${dList.map(escapeRe).join('|')})/`, 150 | 'i' 151 | ) 152 | } 153 | 154 | return Object.assign( 155 | { fetch: this._defaultFetch, validate: () => true }, 156 | site, 157 | { match } 158 | ) 159 | } 160 | 161 | // Rebuild all regexps & default handlers for fast run 162 | Unshort.prototype._compile = function () { 163 | this._compiled_sites.length = 0 164 | 165 | for (const site of this._sites) { 166 | this._compiled_sites.push(this._compileSingle(site)) 167 | } 168 | 169 | // Create global search regexp 170 | this._matchAllRE = new RegExp( 171 | this._compiled_sites.map(cs => cs.match.source).join('|'), 172 | 'i' 173 | ) 174 | } 175 | 176 | // Internal method to perform an http(s) request, it's supposed to be used 177 | // in fetchers. You can override it with custom implementation (for example, 178 | // if you want to avoid http requests at all and use cache only, you can 179 | // replace this with a stub). 
180 | // 181 | Unshort.prototype.request = function (url, options) { 182 | const opts = merge({}, this._options, options || {}) 183 | 184 | return got(url, opts).catch(err => { 185 | let statusCode = err.statusCode 186 | 187 | if (err.code === 'ERR_NON_2XX_3XX_RESPONSE' && err.response) { 188 | // https://github.com/sindresorhus/got/blob/main/documentation/8-errors.md 189 | statusCode = err.response.statusCode 190 | } 191 | 192 | throw new UnshortError( 193 | `Remote server error, code ${err.code}, statusCode ${statusCode}`, 194 | 'EHTTP', 195 | statusCode) 196 | }) 197 | } 198 | 199 | // Expand an URL 200 | // 201 | // - url (String) - url to expand 202 | // 203 | Unshort.prototype.expand = function (url) { 204 | return this._expand(url) 205 | } 206 | 207 | // Internal method that expands url recursively up to `nesting` times, 208 | // on each execution it parses input url and calls a fetcher of the 209 | // matching domain. 210 | // 211 | Unshort.prototype._expand = async function (origUrl) { 212 | if (origUrl.startsWith('//')) { 213 | try { 214 | /* eslint-disable no-new */ 215 | new URL(origUrl) 216 | } catch (e) { 217 | try { 218 | // set protocol for relative links like `//example.com` 219 | new URL('http:' + origUrl) 220 | origUrl = 'http:' + origUrl 221 | } catch {} 222 | } 223 | } 224 | 225 | let url = origUrl 226 | let shouldCache = false 227 | let nestingLeft = this.nesting 228 | 229 | for (; nestingLeft >= 0; nestingLeft--) { 230 | let hash = '' 231 | 232 | // 233 | // Normalize url & pre-validate 234 | // 235 | 236 | let u = new URL(url) 237 | 238 | // user-submitted url has weird protocol, just return `null` in this case 239 | if (u.protocol !== 'http:' && u.protocol !== 'https:') break 240 | 241 | if (u.hash) { 242 | // Copying browser-like behavior here: if we're not redirected to a hash, 243 | // but original url has one, set it as a final hash. 
244 | hash = u.hash 245 | u.hash = '' 246 | } 247 | 248 | const urlNormalized = u.toString() 249 | 250 | // 251 | // At top level try cache first. On recursive calls skip cache. 252 | // !! Cache should be probed even for disabled services, to resolve old links. 253 | // 254 | let result 255 | 256 | if (nestingLeft === this.nesting) { 257 | result = await this.cache.get(urlNormalized) 258 | 259 | // If cache exists - use it. 260 | if (result || result === null) { 261 | // forward hash if needed 262 | if (hash && result) { 263 | u = new URL(result) 264 | u.hash = u.hash || hash 265 | result = u.toString() 266 | } 267 | 268 | return result 269 | } 270 | } 271 | 272 | // 273 | // First pass validation (quick). 274 | // 275 | 276 | if (!this._matchAllRE.test(urlNormalized)) break 277 | 278 | // Something found - run additional checks. 279 | 280 | const siteConfig = this._compiled_sites.find(cs => cs.match.exec(urlNormalized)) 281 | 282 | if (!siteConfig || !siteConfig.validate(urlNormalized)) break 283 | 284 | // Valid redirector => should cache result 285 | shouldCache = true 286 | 287 | result = await siteConfig.fetch.call(this, urlNormalized, siteConfig) 288 | 289 | // If unshortener has persistent fail - stop. 290 | if (!result) break 291 | 292 | // Parse and check url 293 | // 294 | try { 295 | u = new URL(result) 296 | } catch (e) { 297 | if (e instanceof TypeError && e.message === 'Invalid URL') { 298 | throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT') 299 | } 300 | 301 | throw e 302 | } 303 | 304 | if (u.protocol !== 'http:' && u.protocol !== 'https:') { 305 | // Accept: 306 | // 307 | // - http:// protocol (e.g. http://example.org/) 308 | // - https:// protocol (e.g. https://example.org/) 309 | // 310 | // Restriction is done for security reasons. Even though browsers 311 | // can redirect anywhere, most shorteners have similar restrictions. 
312 | // 313 | throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT') 314 | } 315 | 316 | // restore hash if needed 317 | if (hash && !u.hash) { 318 | u.hash = hash 319 | result = u.toString() 320 | } 321 | 322 | url = result 323 | } 324 | 325 | if (nestingLeft < 0) { 326 | throw new UnshortError('Too many redirects', 'EBADREDIRECT') 327 | } 328 | 329 | const result = (url !== origUrl) ? url : null 330 | 331 | if (shouldCache) { 332 | // Cache result. 333 | // !! use normalized original URL for cache key. 334 | const uo = new URL(origUrl) 335 | 336 | uo.hash = '' 337 | 338 | await this.cache.set(uo.toString(), result) 339 | } 340 | 341 | return result 342 | } 343 | 344 | Unshort.prototype._isRedirect = function (code) { 345 | return [301, 302, 303, 307, 308].includes(code) 346 | } 347 | 348 | // Default fetcher, it requests an url and retrieves url it redirects to 349 | // using following data sources: 350 | // 351 | // - "Location" header if response code is 3xx 352 | // - meta tag 353 | // - $(selector).attr('href, src') if selector is specified 354 | // 355 | Unshort.prototype._defaultFetch = async function (url, options) { 356 | let res 357 | 358 | try { 359 | res = await this.request(url) 360 | } catch (e) { 361 | if (e.statusCode >= 400 && e.statusCode < 500) return null 362 | throw e 363 | } 364 | 365 | if (this._isRedirect(res.statusCode)) { 366 | return res.headers.location ? 
res.headers.location.trim() : null 367 | } 368 | 369 | if (res.statusCode >= 200 && res.statusCode < 300) { 370 | if (!res.headers['content-type'] || 371 | res.headers['content-type'].split(';')[0].trim() !== 'text/html') { 372 | return null 373 | } 374 | 375 | const body = String(res.body) 376 | 377 | if (options.link_selector) { 378 | // try to lookup selector if it's defined in the config 379 | const el = $(body).find(options.link_selector) 380 | const result = el.attr('href') 381 | 382 | if (result) return result.trim() 383 | } 384 | 385 | // try <meta http-equiv="refresh"> tag 386 | let refresh = $(body) 387 | .find('meta[http-equiv="refresh"]') 388 | .attr('content') 389 | 390 | if (!refresh) return null 391 | 392 | // parse meta-tag and remove timeout, 393 | // refresh at this point is like `0.5; url=http://example.org` 394 | refresh = refresh.replace(/^[^;]+;\s*url=/i, '').trim() 395 | 396 | return refresh 397 | } 398 | 399 | throw new UnshortError( 400 | `Remote server error, code ${res.code}, statusCode ${res.statusCode}`, 401 | 'EHTTP', 402 | res.statusCode) 403 | } 404 | 405 | module.exports = Unshort 406 | -------------------------------------------------------------------------------- /lib/is_error_fatal.js: -------------------------------------------------------------------------------- 1 | // Function checks whether an error can be retried later or not 2 | // 3 | 'use strict' 4 | 5 | function isErrorFatal (err) { 6 | // HTTP errors, fatal errors are everything except: 7 | // - 5xx - server-side errors 8 | // - 429 - rate limit 9 | // - 408 - request timeout 10 | if (err.statusCode && !String(+err.statusCode).match(/^(5..|429|408)$/)) { 11 | return true 12 | } 13 | 14 | // EINVAL - bad urls like http://1234 15 | if (err.code === 'EINVAL') return true 16 | 17 | // server returned invalid url or caused redirect loop 18 | if (err.code === 'EBADREDIRECT') return true 19 | 20 | return false 21 | } 22 | 23 | module.exports = isErrorFatal 24 | 
-------------------------------------------------------------------------------- /lib/providers/clck.ru.js: -------------------------------------------------------------------------------- 1 | // clck.ru redirects to https://sba.yandex.net/redirect?url=..., which 2 | // restricts allowed user agents. 3 | // Let's extract url directly. 4 | 5 | 'use strict' 6 | 7 | const URL = require('url').URL 8 | const UnshortError = require('../error') 9 | 10 | exports.fetch = async function (url) { 11 | let res 12 | try { 13 | res = await this.request(url, { method: 'HEAD' }) 14 | } catch (e) { 15 | if (e.statusCode >= 400 && e.statusCode < 500) return null 16 | throw e 17 | } 18 | 19 | if (!this._isRedirect(res.statusCode)) { 20 | throw new UnshortError( 21 | `Unexpected server response ${res.statusCode}, expect redirect`, 22 | 'EHTTP', 23 | 500 24 | ) 25 | } 26 | 27 | let dest 28 | try { 29 | const u = new URL(res.headers.location) 30 | dest = u.searchParams.get('url') 31 | } catch (e) { 32 | throw new UnshortError( 33 | 'Redirected to an invalid location', 34 | 'EBADREDIRECT' 35 | ) 36 | } 37 | 38 | return dest 39 | } 40 | -------------------------------------------------------------------------------- /lib/providers/flic.kr.js: -------------------------------------------------------------------------------- 1 | // Process flic.kr redirects (including relative urls default fetcher can't do) 2 | // 3 | 4 | 'use strict' 5 | 6 | const URL = require('url').URL 7 | const UnshortError = require('../error') 8 | 9 | exports.fetch = async function (url) { 10 | let nestingLeft = 5 11 | 12 | while (nestingLeft--) { 13 | let res 14 | 15 | try { 16 | res = await this.request(url, { method: 'HEAD' }) 17 | } catch (e) { 18 | if (e.statusCode >= 400 && e.statusCode < 500) return null 19 | throw e 20 | } 21 | 22 | if (this._isRedirect(res.statusCode)) { 23 | try { 24 | const uDst = new URL(res.headers.location, url) 25 | 26 | url = uDst.toString() 27 | continue 28 | } catch (e) { 29 | if (e 
instanceof TypeError && e.message === 'Invalid URL') { 30 | throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT') 31 | } 32 | 33 | throw e 34 | } 35 | } 36 | 37 | // reached destination 38 | if (res.statusCode >= 200 && res.statusCode < 300) return url 39 | 40 | throw new UnshortError(`Unexpected status code: ${res.statusCode}`, 'EHTTP', res.statusCode) 41 | } 42 | 43 | return null 44 | } 45 | -------------------------------------------------------------------------------- /lib/providers/google.com.js: -------------------------------------------------------------------------------- 1 | // Process google.com/url?... redirects 2 | // 3 | 4 | 'use strict' 5 | 6 | const URL = require('url').URL 7 | const isGoogle = require('is-google-domain') 8 | 9 | exports.validate = url => { 10 | const u = new URL(url) 11 | 12 | if (!isGoogle(u.hostname)) return false 13 | 14 | return u.pathname === '/url' && u.searchParams.get('url') 15 | } 16 | 17 | exports.fetch = async url => { 18 | const u = new URL(url) 19 | 20 | return u.searchParams.get('url') 21 | } 22 | -------------------------------------------------------------------------------- /lib/providers/index.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | module.exports = { 4 | 'clck.ru': require('./clck.ru'), 5 | 'flic.kr': require('./flic.kr'), 6 | 'google.com': require('./google.com'), 7 | 'vk.com': require('./vk.com') 8 | } 9 | -------------------------------------------------------------------------------- /lib/providers/vk.com.js: -------------------------------------------------------------------------------- 1 | // Process vk.com/away.php redirects 2 | // 3 | 4 | 'use strict' 5 | 6 | const URL = require('url').URL 7 | 8 | exports.validate = url => { 9 | const u = new URL(url) 10 | 11 | return u.pathname === '/away.php' && u.searchParams.get('to') 12 | } 13 | 14 | exports.fetch = async url => { 15 | const u = new URL(url) 16 | 17 | return 
u.searchParams.get('to') 18 | } 19 | -------------------------------------------------------------------------------- /package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "url-unshort", 3 | "version": "6.1.0", 4 | "description": "Expand urls provided by url shortening services.", 5 | "keywords": [ 6 | "unshort", 7 | "expand", 8 | "url" 9 | ], 10 | "repository": "nodeca/url-unshort", 11 | "license": "MIT", 12 | "scripts": { 13 | "lint": "standardx -v .", 14 | "test": "npm run lint && mocha", 15 | "test-all": "npm run lint && LINKS_CHECK=all mocha" 16 | }, 17 | "files": [ 18 | "index.js", 19 | "domains.yml", 20 | "lib/" 21 | ], 22 | "dependencies": { 23 | "cheerio": "^1.0.0-rc.12", 24 | "escape-string-regexp": "^4.0.0", 25 | "got": "^11.8.3", 26 | "is-google-domain": "^1.0.0", 27 | "js-yaml": "^4.1.0", 28 | "lodash.merge": "^4.6.2", 29 | "mdurl": "^1.0.0", 30 | "punycode": "^2.0.1" 31 | }, 32 | "devDependencies": { 33 | "mocha": "^10.1.0", 34 | "mocha.parallel": "^0.15.2", 35 | "nock": "^13.2.4", 36 | "standardx": "^7.0.0" 37 | }, 38 | "mocha": { 39 | "timeout": 60000 40 | }, 41 | "engines": { 42 | "node": ">=14" 43 | } 44 | } 45 | -------------------------------------------------------------------------------- /support/unshort.js: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env node 2 | 3 | 'use strict' 4 | 5 | /* eslint-disable no-console */ 6 | 7 | const params = process.argv.slice(2) 8 | 9 | if (!params.length) { 10 | console.error('Usage: unshort.js URL') 11 | require('process').exit() 12 | } 13 | 14 | const url = params[0] 15 | 16 | require('../')().expand(url).then((to) => { 17 | console.log(to) 18 | }) 19 | -------------------------------------------------------------------------------- /test/cache.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | /* eslint-env mocha */ 4 | 5 | const 
assert = require('assert') 6 | 7 | describe('Cache', function () { 8 | let uu 9 | let fetchCount = 0 10 | let cache = {} 11 | let result 12 | 13 | before(() => { 14 | uu = require('../')({ 15 | cache: { 16 | get: async key => cache[key], 17 | set: async (key, value) => { 18 | cache[key] = value 19 | return true 20 | } 21 | } 22 | }) 23 | 24 | uu.add('example.org', { 25 | async fetch () { 26 | fetchCount++ 27 | return 'http://foo.bar/' 28 | } 29 | }) 30 | }) 31 | 32 | it('should cache urls', async () => { 33 | cache = {} 34 | 35 | result = await uu.expand('http://example.org/foo') 36 | assert.strictEqual(result, 'http://foo.bar/') 37 | 38 | result = await uu.expand('http://example.org/foo') 39 | assert.strictEqual(result, 'http://foo.bar/') 40 | assert.strictEqual(fetchCount, 1) 41 | }) 42 | 43 | it('should not cache invalid urls', async () => { 44 | cache = {} 45 | 46 | result = await uu.expand('http://invalid-url.com/foo') 47 | assert.strictEqual(result, null) 48 | assert.deepStrictEqual(cache, {}) 49 | }) 50 | 51 | it('should resolve disabled services from cache, if used before', async () => { 52 | cache = { 'http://old.service.com/123': 'http://redirected.to/' } 53 | 54 | result = await uu.expand('http://old.service.com/123') 55 | assert.strictEqual(result, 'http://redirected.to/') 56 | }) 57 | 58 | it('should forward hash to cached value', async () => { 59 | cache = { 'http://old.service.com/123': 'http://redirected.to/' } 60 | 61 | result = await uu.expand('http://old.service.com/123#foo') 62 | assert.strictEqual(result, 'http://redirected.to/#foo') 63 | }) 64 | 65 | it('should cache null result after first fetch', async () => { 66 | uu.add('example2.org', { 67 | fetch: async () => null 68 | }) 69 | 70 | cache = {} 71 | 72 | result = await uu.expand('http://example2.org/foo') 73 | assert.strictEqual(result, null) 74 | assert.deepStrictEqual(cache, { 'http://example2.org/foo': null }) 75 | 76 | result = await uu.expand('http://example2.org/foo') 77 | 
assert.strictEqual(result, null) 78 | }) 79 | 80 | it('should properly cache last null fetch in nested redirects', async () => { 81 | uu.add('example3.org', { 82 | fetch: async () => 'http://example4.org/test' 83 | }) 84 | 85 | uu.add('example4.org', { 86 | fetch: async () => null 87 | }) 88 | 89 | cache = {} 90 | 91 | result = await uu.expand('http://example3.org/foo') 92 | assert.strictEqual(result, 'http://example4.org/test') 93 | assert.deepStrictEqual(cache, { 'http://example3.org/foo': 'http://example4.org/test' }) 94 | }) 95 | }) 96 | -------------------------------------------------------------------------------- /test/default.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | /* eslint-env mocha */ 4 | 5 | const assert = require('assert') 6 | const nock = require('nock') 7 | const { isErrorFatal } = require('../') 8 | 9 | describe('Default', function () { 10 | let uu 11 | 12 | before(async () => { 13 | uu = require('..')({ 14 | request: { 15 | retry: 0 16 | } 17 | }) 18 | uu.add('example.org') 19 | }) 20 | 21 | it('should process redirect', async () => { 22 | nock('http://example.org') 23 | .get('/foo') 24 | .reply(301, '', { location: 'https://github.com/0' }) 25 | 26 | const result = await uu.expand('http://example.org/foo') 27 | assert.strictEqual(result, 'https://github.com/0') 28 | }) 29 | 30 | it('should parse meta tags', async () => { 31 | const html = '<html><head><meta http-equiv="refresh" content="0.5; url=https://github.com/1"></head></html>' 32 | nock('http://example.org') 33 | .get('/bar') 34 | .reply(200, html, { 'content-type': 'text/html' }) 35 | 36 | const result = await uu.expand('http://example.org/bar') 37 | assert.strictEqual(result, 'https://github.com/1') 38 | }) 39 | 40 | it("should not process file if it's not html", async () => { 41 | const html = '' 42 | nock('http://example.org') 43 | .get('/zzz') 44 | .reply(200, html, { 'content-type': 'application/json' }) 45 | 46 | const result = await uu.expand('http://example.org/zzz') 47 | assert.strictEqual(result, null) 
48 | }) 49 | 50 | it('should return nothing on 404', async () => { 51 | nock('http://example.org') 52 | .get('/baz') 53 | .reply(404, '') 54 | 55 | const result = await uu.expand('http://example.org/baz') 56 | assert.strictEqual(result, null) 57 | }) 58 | 59 | it('should return errors on unknown status codes', async () => { 60 | nock('http://example.org') 61 | .get('/bazzz') 62 | .reply(503, '') 63 | 64 | await assert.rejects( 65 | async () => uu.expand('http://example.org/bazzz'), 66 | err => { 67 | assert.match(err.message, /Remote server error/) 68 | assert.strictEqual(err.code, 'EHTTP') 69 | assert.strictEqual(err.statusCode, 503) 70 | assert.strictEqual(isErrorFatal(err), false) 71 | return true 72 | } 73 | ) 74 | }) 75 | 76 | it('should treat invalid urls as fatal error', async () => { 77 | nock('http://example.org') 78 | .get('/invalid') 79 | .reply(301, '', { location: 'http://xn--/1' }) 80 | 81 | await assert.rejects( 82 | async () => uu.expand('http://example.org/invalid'), 83 | err => { 84 | assert.match(err.message, /Redirected to an invalid location/) 85 | assert.strictEqual(err.code, 'EBADREDIRECT') 86 | assert.strictEqual(isErrorFatal(err), true) 87 | return true 88 | } 89 | ) 90 | }) 91 | 92 | it.skip('should fail on page > 100K', async () => { 93 | const html = ' '.repeat(110000) + '' 94 | nock('http://example.org') 95 | .get('/large') 96 | .reply(200, html, { 'content-type': 'text/html' }) 97 | 98 | const result = await uu.expand('http://example.org/large') 99 | assert.strictEqual(result, null) 100 | }) 101 | }) 102 | -------------------------------------------------------------------------------- /test/expand.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | /* eslint-env mocha */ 4 | 5 | const assert = require('assert') 6 | 7 | const urls = { 8 | 'http://example.org/regular': 'https://github.com/', 9 | 10 | // loop1 -> loop2 -> loop3 -> loop4 -> github 11 | 'http://example.org/loop1': 
'http://example.org/loop2', 12 | 'http://example.org/loop2': 'http://example.org/loop3', 13 | 'http://example.org/loop3': 'http://example.org/loop4', 14 | 'http://example.org/loop4': 'https://github.com/', 15 | 16 | // self-referenced 17 | 'http://example.org/cycle': 'http://example.org/cycle', 18 | 19 | // control characters in the output 20 | 'http://example.org/control': 'https://github.com/', 21 | 22 | // invalid protocol 23 | 'http://example.org/file': 'file:///etc/passwd', 24 | 25 | // result has anchor in it 26 | 'http://example.org/hashy': 'https://github.com/foo#bar', 27 | 28 | // relative urls 29 | 'http://example.org/rel1': '//github.com/foo', 30 | 'http://example.org/rel2': '/foo', 31 | 32 | // invalid urls 33 | 'http://example.org/invalid_punycode': 'https://xn--/', 34 | 35 | // internationalized domain names 36 | 'http://example.org/idn1': 'http://www.bücher.de/', 37 | 'http://example.org/idn2': 'http://www.xn--bcher-kva.de/', 38 | 39 | // l1 -> l2 -> null 40 | 'http://example.org/l1': 'http://example.org/l2', 41 | 'http://example.org/l2': null 42 | } 43 | 44 | describe('Expand', function () { 45 | let uu 46 | let result 47 | 48 | before(function () { 49 | uu = require('../')() 50 | 51 | uu.add('example.org', { 52 | fetch: url => urls[url.replace(/^https/, 'http')] 53 | }) 54 | }) 55 | 56 | it('should expand regular url via Promise', async () => { 57 | result = await uu.expand('http://example.org/regular') 58 | assert.strictEqual(result, 'https://github.com/') 59 | }) 60 | 61 | it('should expand url up to 3 levels', async () => { 62 | result = await uu.expand('http://example.org/loop2') 63 | assert.strictEqual(result, 'https://github.com/') 64 | }) 65 | 66 | it('should fail on url nested more than 3 levels', async () => { 67 | await assert.rejects( 68 | async () => uu.expand('http://example.org/loop1'), 69 | /Too many redirects/ 70 | ) 71 | }) 72 | 73 | it('should fail on links redirecting to themselves', async () => { 74 | await assert.rejects( 75 | 
async () => uu.expand('http://example.org/cycle'), 76 | /Too many redirects/ 77 | ) 78 | }) 79 | 80 | it('should fail on bad protocols', async () => { 81 | await assert.rejects( 82 | async () => uu.expand('http://example.org/file'), 83 | /Redirected to an invalid location/ 84 | ) 85 | }) 86 | 87 | it('should not encode non-url characters', async () => { 88 | result = await uu.expand('http://example.org/control') 89 | assert.strictEqual(result, 'https://github.com/') 90 | }) 91 | 92 | it('should preserve an anchor', async () => { 93 | result = await uu.expand('http://example.org/regular#foobar') 94 | assert.strictEqual(result, 'https://github.com/#foobar') 95 | }) 96 | 97 | it('should respect destination anchor', async () => { 98 | result = await uu.expand('http://example.org/hashy#quux') 99 | assert.strictEqual(result, 'https://github.com/foo#bar') 100 | }) 101 | 102 | it('should accept relative urls without protocol', async () => { 103 | result = await uu.expand('//example.org/regular') 104 | assert.strictEqual(result, 'https://github.com/') 105 | }) 106 | 107 | it('should reject links to relative urls without protocol', async () => { 108 | await assert.rejects( 109 | async () => uu.expand('http://example.org/rel1'), 110 | /Redirected to an invalid location/ 111 | ) 112 | }) 113 | 114 | it('should reject links to relative urls without host', async () => { 115 | await assert.rejects( 116 | async () => uu.expand('http://example.org/rel2'), 117 | /Redirected to an invalid location/ 118 | ) 119 | }) 120 | 121 | it('should reject links to invalid urls', async () => { 122 | await assert.rejects( 123 | async () => uu.expand('http://example.org/invalid_punycode'), 124 | /Redirected to an invalid location/ 125 | ) 126 | }) 127 | 128 | it('should accept IDN (decoded)', async () => { 129 | result = await uu.expand('http://example.org/idn1') 130 | assert.strictEqual(result, 'http://www.bücher.de/') 131 | }) 132 | 133 | it('should accept IDN (punycode)', async () => { 134 | 
result = await uu.expand('http://example.org/idn2') 135 | assert.strictEqual(result, 'http://www.xn--bcher-kva.de/') 136 | }) 137 | 138 | it('should properly expand url with last null fetch in nested redirects', async () => { 139 | result = await uu.expand('http://example.org/l1') 140 | assert.strictEqual(result, 'http://example.org/l2') 141 | }) 142 | }) 143 | -------------------------------------------------------------------------------- /test/services.js: -------------------------------------------------------------------------------- 1 | 'use strict' 2 | 3 | /* eslint-env mocha */ 4 | 5 | const assert = require('assert') 6 | const read = require('fs').readFileSync 7 | const YAML = require('js-yaml') 8 | const path = require('path') 9 | const punycode = require('punycode/') 10 | const URL = require('url').URL 11 | const uu = require('../')() 12 | const parallel = require('mocha.parallel') 13 | 14 | const urls = YAML.load(read(path.join(__dirname, 'services.yml'), 'utf8')) 15 | const domains = YAML.load(read(path.join(__dirname, '..', 'domains.yml'), 'utf8')) 16 | 17 | const checkAll = (process.env.LINKS_CHECK === 'all') 18 | 19 | // get 2nd level domain, e.g. 
"foo.example.org" -> "example.org" 20 | function truncateDomain (str) { 21 | return str.split('.').slice(-2).join('.') 22 | } 23 | 24 | describe('Services', function () { 25 | it('all services should be tested', function () { 26 | let expected = [] 27 | const actual = [] 28 | 29 | domains.forEach(function (d) { 30 | if (typeof d === 'string') { 31 | expected.push(d) 32 | } else { 33 | expected = expected.concat(Object.keys(d)) 34 | } 35 | }) 36 | 37 | Object.keys(urls).forEach(function (url) { 38 | const u = new URL(url) 39 | 40 | actual.push(u.host) 41 | }) 42 | 43 | assert.deepStrictEqual( 44 | expected.map(truncateDomain).map(punycode.toUnicode).sort(), 45 | actual.map(truncateDomain).map(punycode.toUnicode).sort() 46 | ) 47 | }) 48 | 49 | parallel('ping services', function () { 50 | let links = Object.keys(urls) 51 | 52 | if (!checkAll) { links = links.slice(0, 1) } 53 | 54 | links.forEach(function (link) { 55 | it(link, async () => { 56 | const result = await uu.expand(link) 57 | assert.strictEqual(result, urls[link]) 58 | }) 59 | }) 60 | }) 61 | }) 62 | -------------------------------------------------------------------------------- /test/services.yml: -------------------------------------------------------------------------------- 1 | # In normal mode we run only the first test, to avoid unnecessary errors in CY. 
2 | http://bit.ly/1gMSVzZ: https://github.com/nodeca/url-unshort 3 | 4 | 5 | http://0rz.tw/DvhxQ: https://www.google.ru/maps 6 | http://alturl.com/8iyap: https://github.com/nodeca/url-unshort 7 | http://amzn.to/MyKindleBook: http://www.amazon.com/The-Most-Useful-Websites-ebook/dp/B006R4RN3U 8 | http://bit.do/9ZiH: https://github.com/nodeca/url-unshort 9 | http://chilp.it/e8dcdce: https://github.com/nodeca/url-unshort 10 | https://clck.ru/9ZEvf: https://github.com/nodeca/url-unshort 11 | http://cort.as/VgV0: https://github.com/nodeca/url-unshort 12 | http://cutt.us/0B2XI: https://github.com/nodeca/url-unshort 13 | https://db.tt/c0mFuu1Y: https://www.dropbox.com/s/toyzur6e0m34t7v/dropbox-logos_dropbox-glyph-blue.png 14 | http://fave.co/1Tba9m5: https://github.com/nodeca/url-unshort 15 | https://flic.kr/p/p6kuZs: https://www.flickr.com/photos/europeanspaceagency/15156592796/ 16 | https://goo.gl/HwUfwd: https://github.com/nodeca/url-unshort 17 | https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwj5m5GD_bjLAhUrMJoKHVQiBWsQFggcMAA&url=http%3A%2F%2Fwww.rcdesign.ru%2Farticles%2Favia%2Fdvs_trnr&usg=AFQjCNHTKYt-4cx-4_r7vljdRdm1wspAlA: http://www.rcdesign.ru/articles/avia/dvs_trnr 18 | http://is.gd/YkmUG5: https://github.com/nodeca/url-unshort 19 | http://merky.de/1fce5b: https://github.com/nodeca/url-unshort 20 | http://ow.ly/QCoNe: https://github.com/nodeca/url-unshort 21 | http://shorl.com/gridejynybyso: https://github.com/nodeca/url-unshort 22 | http://smu.gs/1JUWjme: http://www.ianbrodiephoto.net/Image-of-the-Day/Image-of-the-Day/i-mchdBZ3/ 23 | https://t.co/DD3MKQZtXj: https://github.com/nodeca/url-unshort 24 | http://tiny.cc/6d2muz: https://www.youtube.com/watch?v=EHLTVVMxXuA 25 | http://tiny.pl/gx13v: https://github.com/nodeca/url-unshort 26 | https://tinyurl.com/nzezbl8: https://github.com/nodeca/url-unshort 27 | http://tmblr.co/ZIaJJw1raVase: https://calciofication.tumblr.com/post/126240050600/1990 28 | http://tw.gs/3xT0fX: 
https://github.com/nodeca/url-unshort 29 | http://url.ie/z346: https://github.com/nodeca/url-unshort 30 | http://v.gd/rSZr8O: https://github.com/nodeca/url-unshort 31 | http://vk.cc/45cFoR: https://github.com/nodeca/url-unshort 32 | http://vk.com/away.php?to=https%3A%2F%2Fgithub.com%2Fnodeca%2Furl-unshort: https://github.com/nodeca/url-unshort 33 | http://vurl.com/mZGMO: https://github.com/nodeca/url-unshort 34 | http://winpe.notlong.com/: 'http://technet2.microsoft.com/WindowsVista/en/library/08629d0b-56b0-4194-9782-88d01a488ae01033.mspx?mfr=true' 35 | http://wp.me/pBMYe-Lu: https://wptavern.com/?p=2944 36 | http://xurl.es/p8xti: https://github.com/nodeca/url-unshort 37 | --------------------------------------------------------------------------------
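A minimal sketch of how the "all services should be tested" check in `test/services.js` derives the list of covered domains from this fixture. The `fixture` object below is a three-entry excerpt of `test/services.yml` written out by hand (an assumption made to keep the snippet free of `js-yaml` and file I/O), and the `punycode.toUnicode` step is omitted because none of the sampled hosts are IDNs.

```javascript
'use strict'

// Hand-copied excerpt of test/services.yml: short link -> expected destination
const fixture = {
  'http://bit.ly/1gMSVzZ': 'https://github.com/nodeca/url-unshort',
  'http://vk.com/away.php?to=https%3A%2F%2Fgithub.com%2Fnodeca%2Furl-unshort': 'https://github.com/nodeca/url-unshort',
  'http://winpe.notlong.com/': 'http://technet2.microsoft.com/WindowsVista/en/library/08629d0b-56b0-4194-9782-88d01a488ae01033.mspx?mfr=true'
}

// Same helper as in test/services.js: keep only the last two labels of a host
function truncateDomain (str) {
  return str.split('.').slice(-2).join('.')
}

// Hosts of the fixture keys, reduced to 2nd-level domains and sorted,
// which is the shape the test compares against domains.yml
const testedDomains = Object.keys(fixture)
  .map(link => new URL(link).host)
  .map(truncateDomain)
  .sort()

console.log(testedDomains)
```

Because `truncateDomain` keeps only the last two labels, a subdomain-based shortener such as `winpe.notlong.com` is compared as `notlong.com`.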