├── .github
    ├── FUNDING.yml
    ├── ISSUE_TEMPLATE.md
    ├── PULL_REQUEST_TEMPLATE.md
    └── workflows
    │   └── ci.yml
├── .gitignore
├── CHANGELOG.md
├── LICENSE
├── README.md
├── deprecated.md
├── domains.yml
├── index.js
├── lib
    ├── error.js
    ├── index.js
    ├── is_error_fatal.js
    └── providers
    │   ├── clck.ru.js
    │   ├── flic.kr.js
    │   ├── google.com.js
    │   ├── index.js
    │   └── vk.com.js
├── package.json
├── support
    └── unshort.js
└── test
    ├── cache.js
    ├── default.js
    ├── expand.js
    ├── services.js
    └── services.yml


/.github/FUNDING.yml:
--------------------------------------------------------------------------------
1 | open_collective: puzrin
2 | patreon: puzrin
3 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | Before requesting that a new site be added to the defaults, please check:
2 | 
3 | - It should be popular
4 | - It should not be restricted to a single domain (fb.me and others)
5 | 
6 | Anything else can be added on the user's side via the library API.
7 | 


--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | Requirements for adding a new site to the defaults:
2 | 
3 | - It should be popular
4 | - It should not be restricted to a single domain (fb.me and others)
5 | - A test for the new site should exist (`npm run test-all`)
6 | 
7 | Anything else can be added on the user's side via the library API.
8 | 


--------------------------------------------------------------------------------
/.github/workflows/ci.yml:
--------------------------------------------------------------------------------
 1 | name: CI
 2 | 
 3 | on:
 4 |   push:
 5 |   pull_request:
 6 |   schedule:
 7 |     - cron: '0 0 * * 3'
 8 | 
 9 | jobs:
10 |   test:
11 |     runs-on: ubuntu-latest
12 | 
13 |     steps:
14 |     - uses: actions/checkout@v2
15 |     - uses: actions/setup-node@v2
16 | 
17 |     - run: npm install
18 | 
19 |     - name: Test
20 |       run: |
21 |         npm test
22 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | node_modules
2 | doc
3 | *.log
4 | *.swp
5 | 


--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
  1 | 6.1.0 / 2022-10-22
  2 | ------------------
  3 | 
  4 | - Deps bump.
  5 | - Deprecated `soo.gd` & `korta.nu`.
  6 | - Fixed `vk.cc` & `vurl.com`.
  7 | 
  8 | 
  9 | 6.0.0 / 2022-05-17
 10 | ------------------
 11 | 
 12 | - Cleanup deprecated redirectors. Move deprecation info to separate file.
 13 | - node.js v14+ required.
 14 | - Renamed option `select` => `link_selector`.
 15 | - Added `.remove()` method.
 16 | - Add `isErrorFatal` helper.
 17 | - Deps bump.
 18 | 
 19 | 
 20 | 5.0.0 / 2017-06-08
 21 | ------------------
 22 | 
 23 | - Switch to native async/await (need nodejs 7.+)
 24 | - Drop callbacks support.
 25 | 
 26 | 
 27 | 4.1.0 / 2017-06-08
 28 | ------------------
 29 | 
 30 | - Maintenance, deps bump. `got` 6.x -> 7.x. `got` timeouts may work a bit
 31 |   differently but should not affect the result.
 32 | 
 33 | 
 34 | 4.0.0 / 2016-12-08
 35 | ------------------
 36 | 
 37 | - Move request options to `options.request`.
 38 | - Update default User-Agent string.
 39 | - Deprecate `error.status` (use `error.statusCode`).
 40 | - Add more info (code) to error messages.
 41 | - flic.kr should use `.request()` method.
 42 | - Increase default request timeout to 30 seconds.
 43 | 
 44 | 
 45 | 3.1.0 / 2016-12-05
 46 | ------------------
 47 | 
 48 |  - `err.status` -> `err.statusCode` (old `err.status` still exists for backward
 49 |    compatibility, but will be deprecated).
 50 | 
 51 | 
 52 | 3.0.0 / 2016-11-27
 53 | ------------------
 54 | 
 55 | - Rewrite internals to promises (including .require() / cache.get() /
 56 |   cache.set()).
 57 | - Drop old node.js support, now v4.+ required.
 58 | 
 59 | 
 60 | 2.1.0 / 2016-07-15
 61 | ------------------
 62 | 
 63 | - Added `google.*/url` unshortening.
 64 | - Reenabled some glitching services.
 65 | - Added incident dates to default config for tracking progress in future.
 66 | 
 67 | 
 68 | 2.0.0 / 2016-05-24
 69 | ------------------
 70 | 
 71 | - Added Promise support in `.expand` method.
 72 | - Services cleanup.
 73 | 
 74 | 
 75 | 1.1.3 / 2016-01-17
 76 | ------------------
 77 | 
 78 | - Maintenance: deps update.
 79 | 
 80 | 
 81 | 1.1.2 / 2015-12-07
 82 | ------------------
 83 | 
 84 | - Enhanced error info with `code` & `status` properties.
 85 | 
 86 | 
 87 | 1.1.1 / 2015-11-27
 88 | ------------------
 89 | 
 90 | - Improved cache use for edge case with empty result.
 91 | 
 92 | 
 93 | 1.1.0 / 2015-11-25
 94 | ------------------
 95 | 
 96 | - Optimized cache use. Store data only if fetch happened.
 97 | - Increased request timeout to 10 seconds.
 98 | 
 99 | 
100 | 1.0.1 / 2015-10-28
101 | ------------------
102 | 
103 | - Added `vk.com/away.php` support.
104 | 
105 | 
106 | 1.0.0 / 2015-08-16
107 | ------------------
108 | 
109 | - First release.
110 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | Copyright (c) 2015 Vitaly Puzrin.
 2 | 
 3 | Permission is hereby granted, free of charge, to any person
 4 | obtaining a copy of this software and associated documentation
 5 | files (the "Software"), to deal in the Software without
 6 | restriction, including without limitation the rights to use,
 7 | copy, modify, merge, publish, distribute, sublicense, and/or sell
 8 | copies of the Software, and to permit persons to whom the
 9 | Software is furnished to do so, subject to the following
10 | conditions:
11 | 
12 | The above copyright notice and this permission notice shall be
13 | included in all copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17 | OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18 | NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21 | FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22 | OTHER DEALINGS IN THE SOFTWARE.
23 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # url-unshort
  2 | 
  3 | [![CI](https://github.com/nodeca/url-unshort/actions/workflows/ci.yml/badge.svg)](https://github.com/nodeca/url-unshort/actions/workflows/ci.yml)
  4 | [![NPM version](https://img.shields.io/npm/v/url-unshort.svg?style=flat)](https://www.npmjs.org/package/url-unshort)
  5 | 
  6 | > This library expands urls provided by url shortening services (see [full list](https://github.com/nodeca/url-unshort/blob/master/domains.yml)).
  7 | 
  8 | 
  9 | ## Why should I use it?
 10 | 
 11 | It has been [argued](http://joshua.schachter.org/2009/04/on-url-shorteners) that
 12 | “shorteners are bad for the ecosystem as a whole”. In particular, if you're
 13 | running a forum or a blog, such services might cause trouble for your users:
 14 | 
 15 |  - such links load slower than usual (shortening services require an extra DNS
 16 |    and HTTP request)
 17 |  - it adds another point of failure (should this service go down, the links will
 18 |    die; [301works](https://archive.org/details/301works) tries to solve this,
 19 |    but it's better to avoid the issue in the first place)
 20 |  - users don't see where the link points to (tinyurl previews don't *really*
 21 |    solve this)
 22 |  - it can be used for user activity tracking
 23 |  - certain shortening services display ads before redirecting
 24 |  - shortening services can be malicious or be hacked so they could redirect to
 25 |    a completely different place next month
 26 | 
 27 | Also, short links are used to bypass spam filters. So if you're implementing
 28 | a domain black list for your blog comments, you might want to check where all
 29 | those short links *actually* point to.
 30 | 
 31 | 
 32 | ## Installation
 33 | 
 34 | ```sh
 35 | $ npm install url-unshort
 36 | ```
 37 | 
 38 | ## Basic usage
 39 | 
 40 | ```js
 41 | const uu = require('url-unshort')()
 42 | 
 43 | try {
 44 |   const url = await uu.expand('http://goo.gl/HwUfwd')
 45 | 
 46 |   if (url) console.log(`Original url is: ${url}`)
 47 |   else console.log('This url can\'t be expanded')
 48 | 
 49 | } catch (err) {
 50 |   console.log(err);
 51 | }
 52 | ```
 53 | 
 54 | ## Retrying errors
 55 | 
 56 | Temporary network errors are retried automatically once (`options.request.retry=1` by default).
 57 | 
 58 | You may choose to retry some errors after an extended period of time using code like this:
 59 | 
 60 | ```js
 61 | const uu = require('url-unshort')()
 62 | const { isErrorFatal } = require('url-unshort')
 63 | let tries = 0
 64 | 
 65 | while (true) {
 66 |   try {
 67 |     tries++
 68 |     const url = await uu.expand('http://goo.gl/HwUfwd')
 69 | 
 70 |     // If url is expanded, it returns string (expanded url);
 71 |     // "null" is returned if the service is unknown
 72 |     if (url) console.log(`Original url is: ${url}`)
 73 |     else console.log("This url can't be expanded")
 74 |     break
 75 | 
 76 |   } catch (err) {
 77 |     // use isErrorFatal function to check if url can be retried or not
 78 |     if (isErrorFatal(err)) {
 79 |       // this url can't be expanded (e.g. 404 error)
 80 |       console.log(`Unshort error (fatal): ${err}`)
 81 |       break
 82 |     }
 83 | 
 84 |     // Temporary error, trying again in 10 minutes
 85 |     // (5xx errors, ECONNRESET, etc.)
 86 |     console.log(`Unshort error (retrying): ${err}`)
 87 |     if (tries >= 3) {
 88 |       console.log(`Too many errors, aborting`)
 89 |       break
 90 |     }
 91 |     await new Promise(resolve => setTimeout(resolve, 10 * 60 * 1000))
 92 |   }
 93 | }
 94 | ```
 95 | 
 96 | 
 97 | ## API
 98 | 
 99 | ### Creating an instance
100 | 
101 | When you create an instance, you can pass an options object to fine-tune unshortener behavior.
102 | 
103 | ```js
104 | const uu = require('url-unshort')({
105 |   nesting: 3,
106 |   cache: {
107 |     get: async key => {},
108 |     set: async (key, value) => {}
109 |   }
110 | });
111 | ```
112 | 
113 | Available options are:
114 | 
115 | - **nesting** (Number, default: `3`) - stop resolving urls
116 |   when `nesting` amount of redirects is reached.
117 | 
118 |   It happens if one shortening service refers to a link belonging to
119 |   another shortening service which in turn points to yet another one
120 |   and so on.
121 | 
122 |   If this limit is reached, `expand()` fails with a "Too many redirects" error.
123 | 
124 | - **cache** (Object) - set a custom cache implementation (e.g. if you wish
125 |   to store urls in Redis).
126 | 
127 |   You need to specify 2 promise-based functions, `set(key, value)` & `get(key)`.
128 | 
129 | - **request** (Object) - default options for
130 |   [got](https://github.com/sindresorhus/got) in `.request()` method. Can be
131 |   used to set custom `User-Agent` and other headers.
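
As an illustration of the `cache` option, here is a minimal in-memory sketch. The name `createMemoryCache` is hypothetical and not part of the library. It assumes, based on `lib/index.js`, that a cache miss is signalled by `get()` resolving to `undefined`, while a stored `null` means the url was already checked and could not be expanded.

```javascript
// Hypothetical helper (not part of url-unshort): a Map-backed cache exposing
// the two promise-based functions the `cache` option expects.
function createMemoryCache () {
  const store = new Map()

  return {
    // Resolves to the stored value, or `undefined` when the key is unknown.
    get: async key => store.get(key),
    // Stores either an expanded url (string) or `null` (nothing to expand).
    set: async (key, value) => { store.set(key, value) }
  }
}

// Usage sketch, assuming url-unshort is installed:
//   const uu = require('url-unshort')({ cache: createMemoryCache() })
```

A persistent store such as Redis would likely follow the same shape, as long as it can tell "never seen" apart from a stored `null`.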
132 | 
133 | 
134 | ### uu.expand(url) -> Promise
135 | 
136 | Expand the supplied URL. If we don't know how to expand it, the promise resolves to `null`.
137 | 
138 | ```js
139 | const uu = require('url-unshort')();
140 | 
141 | try {
142 |   const url = await uu.expand('http://goo.gl/HwUfwd')
143 | 
144 |   if (url) console.log(`Original url is: ${url}`)
145 |   // no shortening service or an unknown one is used
146 |   else console.log('This url can\'t be expanded')
147 | 
148 | } catch (err) {
149 |   console.log(err)
150 | }
151 | ```
152 | 
153 | ### uu.add(domain [, options])
154 | 
155 | Add a new url shortening service (domain name or an array of them) to the white
156 | list of domains we know how to expand.
157 | 
158 | ```js
159 | uu.add([ 'tinyurl.com', 'bit.ly' ])
160 | ```
161 | 
162 | The default behavior is to request the URL and check the status code. If it's
163 | a redirect (`301`, `302`, `303`, `307` or `308`), the `Location` header is
164 | returned. You can override this behavior by supplying your own function in options.
165 | 
166 | Options:
167 | 
168 | - **aliases** (Array) - Optional. List of alternate domain names, if any.
169 | - **match** (String|RegExp) - Optional. Custom regexp to use for URL match.
170 |   For example, if you need to match wildcard prefixes or country-specific
171 |   suffixes. If used with `validate`, the regexp need not be precise and may
172 |   only filter out noise. If `match` is not passed, an exact regexp is
173 |   generated from `domain` & `aliases`.
174 | - **validate** (Function) - Optional. Performs an exact URL check when complex
175 |   logic is required and a regexp is not enough (when `match` is only
176 |   preliminary). See `./lib/providers/*` for examples.
177 | - **fetch** (Function) - Optional. Specifies a custom function to retrieve the
178 |   expanded url, see `./lib/providers/*` for examples. If not set, the default
179 |   method is used (it checks 30X redirect codes &
180 |   `<meta http-equiv="refresh" content='...'>` in HTML).
181 | - **link_selector** (String) - Optional. Some sites may return HTML pages instead
182 |   of 302 redirects. This option lets you use a jQuery-like selector to extract
183 |   the `<a href="...">` value.
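
To make the auto-generated `match` more concrete, the following sketch mirrors the pattern built in `lib/index.js` from `domain` and `aliases`. It is a simplified illustration: `buildMatch` and `escapeRe` are hypothetical stand-ins (the library uses the `escape-string-regexp` package and also adds punycode variants of each name).

```javascript
// Hypothetical stand-in for the `escape-string-regexp` package.
const escapeRe = s => s.replace(/[|\\{}()[\]^$+*?.]/g, '\\$&')

// Simplified version of the default matcher: optional protocol, optional
// `www.` prefix, then any of the known names followed by a slash.
function buildMatch (domain, aliases = []) {
  const names = [domain, ...aliases].map(escapeRe).join('|')

  return new RegExp(`^(https?:)?//(www[.])?(${names})/`, 'i')
}

const re = buildMatch('ow.ly', ['owl.li', 'ht.ly'])

console.log(re.test('http://ow.ly/abc'))      // true
console.log(re.test('https://www.owl.li/x'))  // true
console.log(re.test('http://owlxli/x'))       // false, the dot is escaped
```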
184 | 
185 | Example:
186 | 
187 | ```js
188 | const uu = require('url-unshort')()
189 | 
190 | uu.add('notlong.com', {
191 |   match: '^(https?:)//[a-zA-Z0-9_-]+[.]notlong[.]com/'
192 | })
193 | 
194 | uu.add('tw.gs', {
195 |   link_selector: '#lurllink > a'
196 | })
197 | ```
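
For services that need more than a regexp, `validate` and `fetch` can be combined. The sketch below targets a made-up redirector at `redirector.example` that carries the destination in a `to` query parameter; the domain, the parameter name and the `awayProvider` object are assumptions for illustration only. Per `lib/index.js`, `validate` receives the normalized url, and `fetch` is called with the url and the site config.

```javascript
// Hypothetical provider for an imaginary service, e.g.
// http://redirector.example/away?to=https%3A%2F%2Fexample.org%2F
const awayProvider = {
  // Coarse filter; the exact check is done in `validate`.
  match: '^(https?:)//(www[.])?redirector[.]example/away',

  // Accept only urls that actually carry the `to` parameter.
  validate: url => new URL(url).searchParams.has('to'),

  // No network request needed: the target is already in the query string.
  fetch: async url => new URL(url).searchParams.get('to')
}

// Usage sketch, assuming `uu` is a url-unshort instance:
//   uu.add('redirector.example', awayProvider)
```

Returning a falsy value from `fetch` stops the expansion, so a missing parameter simply yields no result rather than an error.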
198 | 
199 | ### uu.remove(domain)
200 | 
201 | `domain` (String|Array|Undefined). The opposite of `.add()`: removes the selected
202 | domains from the instance config. If no params are passed, removes everything.
203 | 
204 | 
205 | ## Security considerations
206 | 
207 | Only `http` and `https` protocols are allowed in the output. Browsers technically
208 | support redirects to other protocols (like `ftp` or `magnet`), but most url
209 | shortening services limit redirects to `http` and `https` anyway. If a
210 | service redirects to any other protocol, `expand()` fails with an error.
211 | 
212 | The `expand()` function returns the url from the url shortening service **as is**,
213 | without any escaping or even ensuring that the url is valid. If you want to
214 | guarantee a valid url as the output, you're encouraged to re-encode it like this:
215 | 
216 | ```js
217 | const URL = require('url')
218 | 
219 | let url = await uu.expand('http://goo.gl/HwUfwd')
220 | 
221 | if (url) url = URL.format(URL.parse(url, null, true))
222 | 
223 | console.log(url)
224 | ```
225 | 
226 | ## License
227 | 
228 | [MIT](https://raw.github.com/nodeca/url-unshort/master/LICENSE)
229 | 


--------------------------------------------------------------------------------
/deprecated.md:
--------------------------------------------------------------------------------
  1 | List of outdated domains, removed from `domains.yml`.
  2 | 
  3 | **2.gp**
  4 | 
  5 | Alias: 7.ly
  6 | 
  7 | 2021.11.30. Redirects to another site, old links removed.
  8 | 
  9 | **adfa.st**
 10 | 
 11 | 2016.07.13. Domain lost.
 12 | 
 13 | **b23.ru**
 14 | 
 15 | 2016.07.13. Not responding.
 16 | 
 17 | **budurl.me**
 18 | 
 19 | 2016.07.13. Became adware, old links removed.
 20 | 
 21 | **fur.ly**
 22 | 
 23 | 2016.12.06. Empty main page. 404 to all links.
 24 | 
 25 | **korta.nu**
 26 | 
 27 | 2022.10.22. Not working (access denied).
 28 | 
 29 | **macte.ch**
 30 | 
 31 | 2016.07.13. Removed old links & disabled foreign domains
 32 | 
 33 | **migre.me**
 34 | 
 35 | 2021.11.30. Not working.
 36 | 
 37 | **minu.me**
 38 | 
 39 | 2021.11.30. Not responding.
 40 | 
 41 | **nsfw.in**
 42 | 
 43 | 2021.11.30. Cyclic redirect.
 44 | 
 45 | **o-x.fr**
 46 | 
 47 | 2016.07.13. Domain lost.
 48 | 
 49 | **qr.net**
 50 | 
 51 | 2016.12.06. Not working, strange default redirects.
 52 | 
 53 | **scrnch.me**
 54 | 
 55 | 2021.11.30. Domain lost.
 56 | 
 57 | **smsh.me**
 58 | 
 59 | 2016.07.13. Not working.
 60 | 
 61 | **snipurl.com**
 62 | 
 63 | 2021.11.30. "We're migrating to a new server" (and nothing changed)
 64 | 
 65 | **soo.gd**
 66 | 
 67 | 2022.10.22. Domain lost.
 68 | 
 69 | **thecow.me**
 70 | 
 71 | 2021.11.30. Only root page works (and nothing changed)
 72 | 
 73 | **tiny.ly**
 74 | 
 75 | 2016.07.13. Not responding.
 76 | 
 77 | **tnij.org**
 78 | 
 79 | 2021.11.30. Not working.
 80 | 
 81 | **to.ly**
 82 | 
 83 | 2016.07.13. Not responding. Currently redirects to another site, old links removed.
 84 | 
 85 | **tr.im**
 86 | 
 87 | 2022.02.02. Domain lost.
 88 | 
 89 | **trim.li**
 90 | 
 91 | 2021.11.30. Not working.
 92 | 
 93 | **url.az**
 94 | 
 95 | 2016.07.13. Became adware with intermediate page. Old links removed.
 96 | 
 97 | **ur1.ca**
 98 | 
 99 | 2021.11.30. Domain lost.
100 | 
101 | **➡.ws**
102 | 
103 | With aliases: ➯.ws, ➔.ws, ➞.ws, ➽.ws, ➹.ws, ✩.ws, ✿.ws, ❥.ws, ›.ws, ⌘.ws, ‽.ws,
104 | ☁.ws, ta.gd, ri.ms.
105 | 
106 | 2021.11.30. All domains unconfigured or lost.


--------------------------------------------------------------------------------
/domains.yml:
--------------------------------------------------------------------------------
 1 | #
 2 | # The list of domains handled by internal rules
 3 | #
 4 | 
 5 | - 0rz.tw
 6 | - alturl.com
 7 | - amzn.to
 8 | - bit.do
 9 | - bit.ly:
10 |     aliases: [ aaja.de, adct.me, archdai.ly, aspt.co, ccwc.me, crks.me,
11 |       bcool.bz, detne.ws, digs.by, drudge.tw, emu.sc, go72.de, got.cr, j.mp,
12 |       j-tv.me, hnnng.de, s.htc.com, kon.gg, livesi.de, perez.ly, rol.st,
13 |       scr.bi, theatln.tc, tgr.ph, trib.in, utsd.us, yhoo.it ]
14 | - chilp.it
15 | - clck.ru
16 | - cort.as
17 | - cutt.us
18 | - db.tt
19 | - fave.co
20 | - flic.kr
21 | - goo.gl
22 | - google.com:
23 |     match: '^(https?:)//(www[.])?google([.]\w+)([.]\w+)?/'
24 | - is.gd
25 | - merky.de
26 | - notlong.com:
27 |     match: '^(https?:)//[a-zA-Z0-9_-]+[.]notlong[.]com/'
28 | - ow.ly:
29 |     aliases: [ owl.li, ht.ly ]
30 | - shorl.com
31 | - smu.gs
32 | - t.co
33 | # 2016.07.13, Domain blacklisted by idiots from RKN. Viva Russia!
34 | - tiny.cc
35 | - tiny.pl
36 | - tinyurl.com
37 | - tmblr.co:
38 |     aliases: [ tumblr.com ]
39 | # 2022.10.22 Shows DB Error
40 | - tw.gs:
41 |     link_selector: '#lurllink > a'
42 | # 2021.11.30 Unstable, but works.
43 | - url.ie
44 | - v.gd:
45 |     link_selector: '.biglink'
46 | - vk.cc
47 | - vk.com:
48 |     aliases: [ vkontakte.ru ]
49 | - vurl.com:
50 |     link_selector: '.padder a:nth-child(3)'
51 | - wp.me
52 | - xurl.es
53 | 


--------------------------------------------------------------------------------
/index.js:
--------------------------------------------------------------------------------
1 | 'use strict'
2 | 
3 | module.exports = require('./lib')
4 | module.exports.Error = require('./lib/error')
5 | module.exports.isErrorFatal = require('./lib/is_error_fatal')
6 | 


--------------------------------------------------------------------------------
/lib/error.js:
--------------------------------------------------------------------------------
 1 | // Error class based on http://stackoverflow.com/questions/8458984
 2 | //
 3 | 'use strict'
 4 | 
 5 | class UnshortError extends Error {
 6 |   constructor (message, code, statusCode) {
 7 |     super(message)
 8 |     this.name = this.constructor.name
 9 |     Error.captureStackTrace(this, this.constructor)
10 | 
11 |     if (code) this.code = code
12 |     if (statusCode) this.statusCode = statusCode
13 |   }
14 | }
15 | 
16 | module.exports = UnshortError
17 | 


--------------------------------------------------------------------------------
/lib/index.js:
--------------------------------------------------------------------------------
  1 | // Main class
  2 | //
  3 | 
  4 | 'use strict'
  5 | 
  6 | const $ = require('cheerio/lib/slim').load('')
  7 | const read = require('fs').readFileSync
  8 | const yaml = require('js-yaml')
  9 | const path = require('path')
 10 | const punycode = require('punycode/')
 11 | const got = require('got')
 12 | const escapeRe = require('escape-string-regexp')
 13 | const merge = require('lodash.merge')
 14 | const URL = require('url').URL
 15 | const UnshortError = require('./error')
 16 | const pkg = require('../package.json')
 17 | 
 18 | const config = yaml.load(read(path.join(__dirname, '..', 'domains.yml'), 'utf8'))
 19 | 
 20 | const defaultAgent = `${pkg.name}/${pkg.version} (+https://github.com/nodeca/url-unshort)`
 21 | 
 22 | const defaultOptions = {
 23 |   timeout: 30 * 1000,
 24 |   retry: 1,
 25 |   followRedirect: false, // redirects are handled manually
 26 |   headers: {
 27 |     'User-Agent': defaultAgent
 28 |   }
 29 | }
 30 | 
 31 | const customProviders = require('./providers/index')
 32 | 
 33 | // Create an unshortener instance
 34 | //
 35 | // options:
 36 | //  - cache   (Object) - cache instance
 37 | //    - get(key) -> Promise
 38 | //    - set(key, value)  -> Promise
 39 | //  - nesting (Number) - max amount of redirects to follow, default: `3`
 40 | //  - request (Object) - default options for `got` in `.request()` method
 41 | //
 42 | function Unshort (options = {}) {
 43 |   if (!(this instanceof Unshort)) return new Unshort(options)
 44 | 
 45 |   this._options = merge({}, defaultOptions, options.request || {})
 46 | 
 47 |   // config data with compiled regexps and fetch functions attached
 48 |   this._sites = []
 49 |   this._compiled_sites = []
 50 | 
 51 |   this.cache = options.cache || {
 52 |     get: async () => {},
 53 |     set: async () => {}
 54 |   }
 55 | 
 56 |   this.nesting = options.nesting || 3
 57 | 
 58 |   // Regexp that matches links to all the known services, it is used
 59 |   // to determine whether url should be processed at all or not.
 60 |   //
 61 |   // Initialized to regexp that matches nothing, it gets overwritten
 62 |   // when domains are added.
 63 |   //
 64 |   this._matchAllRE = /(?!)/
 65 | 
 66 |   // Merge config data & custom providers
 67 |   config.forEach(site => {
 68 |     let domain, options
 69 | 
 70 |     if (typeof site === 'string') {
 71 |       [domain, options] = [site, {}]
 72 |     } else {
 73 |       [domain, options] = Object.entries(site)[0]
 74 |     }
 75 | 
 76 |     if (customProviders[domain]) {
 77 |       Object.assign(options, customProviders[domain])
 78 |     }
 79 | 
 80 |     this.add(domain, options)
 81 |   })
 82 | }
 83 | 
 84 | // Remove previously added domain.
 85 | //
 86 | //  - domain (String|Array) - list of domain names (leave undefined to drop all)
 87 | //
 88 | Unshort.prototype.remove = function (domain) {
 89 |   if (!domain) {
 90 |     this._sites.length = 0
 91 |   } else if (!Array.isArray(domain)) {
 92 |     this._sites = this._sites.filter(s => s.id !== domain)
 93 |   } else {
 94 |     for (const d of domain) {
 95 |       this._sites = this._sites.filter(s => s.id !== d)
 96 |     }
 97 |   }
 98 | 
 99 |   this._compile()
100 | }
101 | 
102 | // Add a domain name to the list of known domains
103 | //
104 | //  - domain (String|Array) - list of domain names
105 | //  - options (Object)       - options for these domains
106 | //    - link_selector (String)  - jquery-like selector to retrieve url with
107 | //    - match (String|RegExp)   - custom regexp to use to match this domain
108 | //    - fetch (Function)         - custom function to retrieve expanded url
109 | //
110 | Unshort.prototype.add = function (domain, options = {}) {
111 |   if (Array.isArray(domain)) {
112 |     for (const d of domain) {
113 |       this._sites.push(Object.assign({ id: d }, options))
114 |     }
115 |   } else {
116 |     this._sites.push(Object.assign({ id: domain }, options))
117 |   }
118 | 
119 |   this._compile()
120 | }
121 | 
122 | // Normalize site data:
123 | //
124 | // - create default handlers if not exist
125 | // - build `match` regexp, depending on other fields
126 | //
127 | // Returns normalized object, suitable for unified processing.
128 | //
129 | Unshort.prototype._compileSingle = function (site) {
130 |   // Prepare list of all the domain names, including aliases
131 |   // and punycode variations
132 |   let dList = [site.id].concat(site.aliases || [])
133 | 
134 |   // create variations + make unique
135 |   dList = Array.from(new Set(
136 |     dList.map(punycode.toASCII).concat(dList.map(punycode.toUnicode))
137 |   ))
138 | 
139 |   let match
140 | 
141 |   if (site.match) {
142 |     // regexp is specified by a user
143 |     match = typeof site.match === 'string'
144 |       ? new RegExp(site.match, 'i')
145 |       : site.match
146 |   } else {
147 |     // regexp is auto-generated out of domain list
148 |     match = new RegExp(
149 |       `^(https?:)?//(www[.])?(${dList.map(escapeRe).join('|')})/`,
150 |       'i'
151 |     )
152 |   }
153 | 
154 |   return Object.assign(
155 |     { fetch: this._defaultFetch, validate: () => true },
156 |     site,
157 |     { match }
158 |   )
159 | }
160 | 
161 | // Rebuild all regexps & default handlers for fast run
162 | Unshort.prototype._compile = function () {
163 |   this._compiled_sites.length = 0
164 | 
165 |   for (const site of this._sites) {
166 |     this._compiled_sites.push(this._compileSingle(site))
167 |   }
168 | 
169 |   // Create global search regexp
170 |   this._matchAllRE = new RegExp(
171 |     this._compiled_sites.map(cs => cs.match.source).join('|'),
172 |     'i'
173 |   )
174 | }
175 | 
176 | // Internal method to perform an http(s) request, it's supposed to be used
177 | // in fetchers. You can override it with custom implementation (for example,
178 | // if you want to avoid http requests at all and use cache only, you can
179 | // replace this with a stub).
180 | //
181 | Unshort.prototype.request = function (url, options) {
182 |   const opts = merge({}, this._options, options || {})
183 | 
184 |   return got(url, opts).catch(err => {
185 |     let statusCode = err.statusCode
186 | 
187 |     if (err.code === 'ERR_NON_2XX_3XX_RESPONSE' && err.response) {
188 |       // https://github.com/sindresorhus/got/blob/main/documentation/8-errors.md
189 |       statusCode = err.response.statusCode
190 |     }
191 | 
192 |     throw new UnshortError(
193 |       `Remote server error, code ${err.code}, statusCode ${statusCode}`,
194 |       'EHTTP',
195 |       statusCode)
196 |   })
197 | }
198 | 
199 | // Expand an URL
200 | //
201 | //  - url      (String)   - url to expand
202 | //
203 | Unshort.prototype.expand = function (url) {
204 |   return this._expand(url)
205 | }
206 | 
207 | // Internal method that expands url recursively up to `nesting` times,
208 | // on each execution it parses input url and calls a fetcher of the
209 | // matching domain.
210 | //
211 | Unshort.prototype._expand = async function (origUrl) {
212 |   if (origUrl.startsWith('//')) {
213 |     try {
214 |       /* eslint-disable no-new */
215 |       new URL(origUrl)
216 |     } catch (e) {
217 |       try {
218 |         // set protocol for relative links like `//example.com`
219 |         new URL('http:' + origUrl)
220 |         origUrl = 'http:' + origUrl
221 |       } catch {}
222 |     }
223 |   }
224 | 
225 |   let url = origUrl
226 |   let shouldCache = false
227 |   let nestingLeft = this.nesting
228 | 
229 |   for (; nestingLeft >= 0; nestingLeft--) {
230 |     let hash = ''
231 | 
232 |     //
233 |     // Normalize url & pre-validate
234 |     //
235 | 
236 |     let u = new URL(url)
237 | 
238 |     // user-submitted url has weird protocol, just return `null` in this case
239 |     if (u.protocol !== 'http:' && u.protocol !== 'https:') break
240 | 
241 |     if (u.hash) {
242 |       // Copying browser-like behavior here: if we're not redirected to a hash,
243 |       // but original url has one, set it as a final hash.
244 |       hash = u.hash
245 |       u.hash = ''
246 |     }
247 | 
248 |     const urlNormalized = u.toString()
249 | 
250 |     //
251 |     // At top level try cache first. On recursive calls skip cache.
252 |     // !! Cache should be probed even for disabled services, to resolve old links.
253 |     //
254 |     let result
255 | 
256 |     if (nestingLeft === this.nesting) {
257 |       result = await this.cache.get(urlNormalized)
258 | 
259 |       // If cache exists - use it.
260 |       if (result || result === null) {
261 |         // forward hash if needed
262 |         if (hash && result) {
263 |           u = new URL(result)
264 |           u.hash = u.hash || hash
265 |           result = u.toString()
266 |         }
267 | 
268 |         return result
269 |       }
270 |     }
271 | 
272 |     //
273 |     // First pass validation (quick).
274 |     //
275 | 
276 |     if (!this._matchAllRE.test(urlNormalized)) break
277 | 
278 |     // Something found - run additional checks.
279 | 
280 |     const siteConfig = this._compiled_sites.find(cs => cs.match.exec(urlNormalized))
281 | 
282 |     if (!siteConfig || !siteConfig.validate(urlNormalized)) break
283 | 
284 |     // Valid redirector => should cache result
285 |     shouldCache = true
286 | 
287 |     result = await siteConfig.fetch.call(this, urlNormalized, siteConfig)
288 | 
289 |     // If unshortener has persistent fail - stop.
290 |     if (!result) break
291 | 
292 |     // Parse and check url
293 |     //
294 |     try {
295 |       u = new URL(result)
296 |     } catch (e) {
297 |       if (e instanceof TypeError && e.message === 'Invalid URL') {
298 |         throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT')
299 |       }
300 | 
301 |       throw e
302 |     }
303 | 
304 |     if (u.protocol !== 'http:' && u.protocol !== 'https:') {
305 |       // Accept:
306 |       //
307 |       //  - http:// protocol (e.g. http://example.org/)
308 |       //  - https:// protocol (e.g. https://example.org/)
309 |       //
310 |       // Restriction is done for security reasons. Even though browsers
311 |       // can redirect anywhere, most shorteners have similar restrictions.
312 |       //
313 |       throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT')
314 |     }
315 | 
316 |     // restore hash if needed
317 |     if (hash && !u.hash) {
318 |       u.hash = hash
319 |       result = u.toString()
320 |     }
321 | 
322 |     url = result
323 |   }
324 | 
325 |   if (nestingLeft < 0) {
326 |     throw new UnshortError('Too many redirects', 'EBADREDIRECT')
327 |   }
328 | 
329 |   const result = (url !== origUrl) ? url : null
330 | 
331 |   if (shouldCache) {
332 |     // Cache result.
333 |     // !! use normalized original URL for cache key.
334 |     const uo = new URL(origUrl)
335 | 
336 |     uo.hash = ''
337 | 
338 |     await this.cache.set(uo.toString(), result)
339 |   }
340 | 
341 |   return result
342 | }
343 | 
344 | Unshort.prototype._isRedirect = function (code) {
345 |   return [301, 302, 303, 307, 308].includes(code)
346 | }
347 | 
348 | // Default fetcher, it requests an url and retrieves url it redirects to
349 | // using following data sources:
350 | //
351 | //  - "Location" header if response code is 3xx
352 | //  - <meta http-equiv="refresh"> meta tag
353 | //  - $(selector).attr('href') if `link_selector` is specified
354 | //
355 | Unshort.prototype._defaultFetch = async function (url, options) {
356 |   let res
357 | 
358 |   try {
359 |     res = await this.request(url)
360 |   } catch (e) {
361 |     if (e.statusCode >= 400 && e.statusCode < 500) return null
362 |     throw e
363 |   }
364 | 
365 |   if (this._isRedirect(res.statusCode)) {
366 |     return res.headers.location ? res.headers.location.trim() : null
367 |   }
368 | 
369 |   if (res.statusCode >= 200 && res.statusCode < 300) {
370 |     if (!res.headers['content-type'] ||
371 |         res.headers['content-type'].split(';')[0].trim() !== 'text/html') {
372 |       return null
373 |     }
374 | 
375 |     const body = String(res.body)
376 | 
377 |     if (options.link_selector) {
378 |       // try to lookup selector if it's defined in the config
379 |       const el = $(body).find(options.link_selector)
380 |       const result = el.attr('href')
381 | 
382 |       if (result) return result.trim()
383 |     }
384 | 
385 |     // try <meta http-equiv="refresh" content="..."> tag
386 |     let refresh = $(body)
387 |       .find('meta[http-equiv="refresh"]')
388 |       .attr('content')
389 | 
390 |     if (!refresh) return null
391 | 
392 |     // parse meta-tag and remove timeout,
393 |     // refresh at this point is like `0.5; url=http://example.org`
394 |     refresh = refresh.replace(/^[^;]+;\s*url=/i, '').trim()
395 | 
396 |     return refresh
397 |   }
398 | 
399 |   throw new UnshortError(
400 |     `Remote server error, statusCode ${res.statusCode}`,
401 |     'EHTTP',
402 |     res.statusCode)
403 | }
404 | 
405 | module.exports = Unshort
406 | 
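
The meta-refresh step of `_defaultFetch` above can be tried in isolation. This is a minimal sketch, assuming only the regex shown in that function; `parseRefresh` is a hypothetical helper name and is not part of the library.

```javascript
'use strict'

// Hypothetical helper mirroring the regex used in _defaultFetch above.
function parseRefresh (content) {
  // Drop the timeout and everything up to and including "url="
  // (case-insensitive), then trim surrounding whitespace.
  return content.replace(/^[^;]+;\s*url=/i, '').trim()
}

console.log(parseRefresh('10; url=https://github.com/1 ')) // https://github.com/1
console.log(parseRefresh('0.5; URL=http://example.org')) // http://example.org

// A value without a "url=" part does not match and passes through unchanged.
console.log(parseRefresh('5')) // 5
```

Note that a `content` value carrying only a timeout is returned as-is, so the caller still has to validate the result as a URL.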


--------------------------------------------------------------------------------
/lib/is_error_fatal.js:
--------------------------------------------------------------------------------
 1 | // Checks whether an error is fatal, i.e. retrying the request later won't help
 2 | //
 3 | 'use strict'
 4 | 
 5 | function isErrorFatal (err) {
 6 |   // HTTP errors: every status code is fatal except:
 7 |   // - 5xx - server-side errors
 8 |   // - 429 - rate limit
 9 |   // - 408 - request timeout
10 |   if (err.statusCode && !String(+err.statusCode).match(/^(5..|429|408)$/)) {
11 |     return true
12 |   }
13 | 
14 |   // EINVAL - bad urls like http://1234
15 |   if (err.code === 'EINVAL') return true
16 | 
17 |   // server returned invalid url or caused redirect loop
18 |   if (err.code === 'EBADREDIRECT') return true
19 | 
20 |   return false
21 | }
22 | 
23 | module.exports = isErrorFatal
24 | 
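
One way a caller might use this classification is to retry only non-fatal errors. The sketch below is an assumption about usage, not something the library ships: `withRetry` is a hypothetical helper, and `isErrorFatal` is copied locally from the file above so the example runs standalone.

```javascript
'use strict'

// Local copy of the check above, so the sketch is self-contained.
function isErrorFatal (err) {
  if (err.statusCode && !String(+err.statusCode).match(/^(5..|429|408)$/)) {
    return true
  }
  if (err.code === 'EINVAL') return true
  if (err.code === 'EBADREDIRECT') return true
  return false
}

// Hypothetical helper: run `task` until it succeeds, fails with a fatal
// error, or the attempt budget is exhausted.
async function withRetry (task, attempts = 3) {
  let lastErr

  for (let i = 0; i < attempts; i++) {
    try {
      return await task()
    } catch (err) {
      // Fatal errors (404, bad redirect, ...) are rethrown immediately.
      if (isErrorFatal(err)) throw err
      lastErr = err
    }
  }

  throw lastErr
}
```

With the library, `task` could be something like `() => uu.expand(url)`; a real caller would probably also add a delay between attempts.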


--------------------------------------------------------------------------------
/lib/providers/clck.ru.js:
--------------------------------------------------------------------------------
 1 | // clck.ru redirects to https://sba.yandex.net/redirect?url=..., which
 2 | // restricts allowed user agents.
 3 | // Let's extract url directly.
 4 | 
 5 | 'use strict'
 6 | 
 7 | const URL = require('url').URL
 8 | const UnshortError = require('../error')
 9 | 
10 | exports.fetch = async function (url) {
11 |   let res
12 |   try {
13 |     res = await this.request(url, { method: 'HEAD' })
14 |   } catch (e) {
15 |     if (e.statusCode >= 400 && e.statusCode < 500) return null
16 |     throw e
17 |   }
18 | 
19 |   if (!this._isRedirect(res.statusCode)) {
20 |     throw new UnshortError(
21 |       `Unexpected server response ${res.statusCode}, expected a redirect`,
22 |       'EHTTP',
23 |       500
24 |     )
25 |   }
26 | 
27 |   let dest
28 |   try {
29 |     const u = new URL(res.headers.location)
30 |     dest = u.searchParams.get('url')
31 |   } catch (e) {
32 |     throw new UnshortError(
33 |       'Redirected to an invalid location',
34 |       'EBADREDIRECT'
35 |     )
36 |   }
37 | 
38 |   return dest
39 | }
40 | 


--------------------------------------------------------------------------------
/lib/providers/flic.kr.js:
--------------------------------------------------------------------------------
 1 | // Process flic.kr redirects (including relative urls default fetcher can't do)
 2 | //
 3 | 
 4 | 'use strict'
 5 | 
 6 | const URL = require('url').URL
 7 | const UnshortError = require('../error')
 8 | 
 9 | exports.fetch = async function (url) {
10 |   let nestingLeft = 5
11 | 
12 |   while (nestingLeft--) {
13 |     let res
14 | 
15 |     try {
16 |       res = await this.request(url, { method: 'HEAD' })
17 |     } catch (e) {
18 |       if (e.statusCode >= 400 && e.statusCode < 500) return null
19 |       throw e
20 |     }
21 | 
22 |     if (this._isRedirect(res.statusCode)) {
23 |       try {
24 |         const uDst = new URL(res.headers.location, url)
25 | 
26 |         url = uDst.toString()
27 |         continue
28 |       } catch (e) {
29 |         if (e instanceof TypeError && e.message === 'Invalid URL') {
30 |           throw new UnshortError('Redirected to an invalid location', 'EBADREDIRECT')
31 |         }
32 | 
33 |         throw e
34 |       }
35 |     }
36 | 
37 |     // reached destination
38 |     if (res.statusCode >= 200 && res.statusCode < 300) return url
39 | 
40 |     throw new UnshortError(`Unexpected status code: ${res.statusCode}`, 'EHTTP', res.statusCode)
41 |   }
42 | 
43 |   return null
44 | }
45 | 
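
The provider above handles relative `Location` values by passing the current URL as the base to the `URL` constructor. A minimal sketch of that resolution step follows; `resolveLocation` is a hypothetical name and the sample paths are made up for illustration.

```javascript
'use strict'

const { URL } = require('url')

// Hypothetical helper: resolve a Location header against the URL that
// produced it, the same way `new URL(res.headers.location, url)` does above.
function resolveLocation (location, base) {
  return new URL(location, base).toString()
}

// Path-only redirect keeps the scheme and host of the base.
console.log(resolveLocation('/photos/x/', 'https://flic.kr/p/abc'))
// https://flic.kr/photos/x/

// Protocol-relative redirect keeps only the scheme of the base.
console.log(resolveLocation('//www.flickr.com/a', 'https://flic.kr/p/abc'))
// https://www.flickr.com/a

// Absolute redirect ignores the base entirely.
console.log(resolveLocation('https://www.flickr.com/photos/', 'https://flic.kr/p/abc'))
// https://www.flickr.com/photos/
```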


--------------------------------------------------------------------------------
/lib/providers/google.com.js:
--------------------------------------------------------------------------------
 1 | // Process google.com/url?... redirects
 2 | //
 3 | 
 4 | 'use strict'
 5 | 
 6 | const URL = require('url').URL
 7 | const isGoogle = require('is-google-domain')
 8 | 
 9 | exports.validate = url => {
10 |   const u = new URL(url)
11 | 
12 |   if (!isGoogle(u.hostname)) return false
13 | 
14 |   return u.pathname === '/url' && u.searchParams.get('url')
15 | }
16 | 
17 | exports.fetch = async url => {
18 |   const u = new URL(url)
19 | 
20 |   return u.searchParams.get('url')
21 | }
22 | 


--------------------------------------------------------------------------------
/lib/providers/index.js:
--------------------------------------------------------------------------------
1 | 'use strict'
2 | 
3 | module.exports = {
4 |   'clck.ru': require('./clck.ru'),
5 |   'flic.kr': require('./flic.kr'),
6 |   'google.com': require('./google.com'),
7 |   'vk.com': require('./vk.com')
8 | }
9 | 


--------------------------------------------------------------------------------
/lib/providers/vk.com.js:
--------------------------------------------------------------------------------
 1 | // Process vk.com/away.php redirects
 2 | //
 3 | 
 4 | 'use strict'
 5 | 
 6 | const URL = require('url').URL
 7 | 
 8 | exports.validate = url => {
 9 |   const u = new URL(url)
10 | 
11 |   return u.pathname === '/away.php' && u.searchParams.get('to')
12 | }
13 | 
14 | exports.fetch = async url => {
15 |   const u = new URL(url)
16 | 
17 |   return u.searchParams.get('to')
18 | }
19 | 
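
Both the google.com and vk.com providers need no network request because the destination is already in the query string. The sketch below shows that extraction on its own, assuming nothing beyond the standard `URL` API; `extractParam` is a hypothetical helper name.

```javascript
'use strict'

const { URL } = require('url')

// Hypothetical helper: read a single query parameter from a URL.
// searchParams.get() percent-decodes the value and returns null
// when the parameter is absent.
function extractParam (url, name) {
  return new URL(url).searchParams.get(name)
}

const link = 'http://vk.com/away.php?to=https%3A%2F%2Fgithub.com%2Fnodeca%2Furl-unshort'

console.log(extractParam(link, 'to')) // https://github.com/nodeca/url-unshort
console.log(extractParam('http://vk.com/away.php', 'to')) // null
```

Because a missing parameter yields `null`, the `validate` functions above return a falsy value for such links and the provider is skipped.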


--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
 1 | {
 2 |   "name": "url-unshort",
 3 |   "version": "6.1.0",
 4 |   "description": "Expand urls provided by url shortening services.",
 5 |   "keywords": [
 6 |     "unshort",
 7 |     "expand",
 8 |     "url"
 9 |   ],
10 |   "repository": "nodeca/url-unshort",
11 |   "license": "MIT",
12 |   "scripts": {
13 |     "lint": "standardx -v .",
14 |     "test": "npm run lint && mocha",
15 |     "test-all": "npm run lint && LINKS_CHECK=all mocha"
16 |   },
17 |   "files": [
18 |     "index.js",
19 |     "domains.yml",
20 |     "lib/"
21 |   ],
22 |   "dependencies": {
23 |     "cheerio": "^1.0.0-rc.12",
24 |     "escape-string-regexp": "^4.0.0",
25 |     "got": "^11.8.3",
26 |     "is-google-domain": "^1.0.0",
27 |     "js-yaml": "^4.1.0",
28 |     "lodash.merge": "^4.6.2",
29 |     "mdurl": "^1.0.0",
30 |     "punycode": "^2.0.1"
31 |   },
32 |   "devDependencies": {
33 |     "mocha": "^10.1.0",
34 |     "mocha.parallel": "^0.15.2",
35 |     "nock": "^13.2.4",
36 |     "standardx": "^7.0.0"
37 |   },
38 |   "mocha": {
39 |     "timeout": 60000
40 |   },
41 |   "engines": {
42 |     "node": ">=14"
43 |   }
44 | }
45 | 


--------------------------------------------------------------------------------
/support/unshort.js:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env node
 2 | 
 3 | 'use strict'
 4 | 
 5 | /* eslint-disable no-console */
 6 | 
 7 | const params = process.argv.slice(2)
 8 | 
 9 | if (!params.length) {
10 |   console.error('Usage: unshort.js URL')
11 |   process.exit(1)
12 | }
13 | 
14 | const url = params[0]
15 | 
16 | require('../')().expand(url).then((to) => {
17 |   console.log(to)
18 | })
19 | 


--------------------------------------------------------------------------------
/test/cache.js:
--------------------------------------------------------------------------------
 1 | 'use strict'
 2 | 
 3 | /* eslint-env mocha */
 4 | 
 5 | const assert = require('assert')
 6 | 
 7 | describe('Cache', function () {
 8 |   let uu
 9 |   let fetchCount = 0
10 |   let cache = {}
11 |   let result
12 | 
13 |   before(() => {
14 |     uu = require('../')({
15 |       cache: {
16 |         get: async key => cache[key],
17 |         set: async (key, value) => {
18 |           cache[key] = value
19 |           return true
20 |         }
21 |       }
22 |     })
23 | 
24 |     uu.add('example.org', {
25 |       async fetch () {
26 |         fetchCount++
27 |         return 'http://foo.bar/'
28 |       }
29 |     })
30 |   })
31 | 
32 |   it('should cache urls', async () => {
33 |     cache = {}
34 | 
35 |     result = await uu.expand('http://example.org/foo')
36 |     assert.strictEqual(result, 'http://foo.bar/')
37 | 
38 |     result = await uu.expand('http://example.org/foo')
39 |     assert.strictEqual(result, 'http://foo.bar/')
40 |     assert.strictEqual(fetchCount, 1)
41 |   })
42 | 
43 |   it('should not cache invalid urls', async () => {
44 |     cache = {}
45 | 
46 |     result = await uu.expand('http://invalid-url.com/foo')
47 |     assert.strictEqual(result, null)
48 |     assert.deepStrictEqual(cache, {})
49 |   })
50 | 
51 |   it('should resolve disabled services from cache, if used before', async () => {
52 |     cache = { 'http://old.service.com/123': 'http://redirected.to/' }
53 | 
54 |     result = await uu.expand('http://old.service.com/123')
55 |     assert.strictEqual(result, 'http://redirected.to/')
56 |   })
57 | 
58 |   it('should forward hash to cached value', async () => {
59 |     cache = { 'http://old.service.com/123': 'http://redirected.to/' }
60 | 
61 |     result = await uu.expand('http://old.service.com/123#foo')
62 |     assert.strictEqual(result, 'http://redirected.to/#foo')
63 |   })
64 | 
65 |   it('should cache null result after first fetch', async () => {
66 |     uu.add('example2.org', {
67 |       fetch: async () => null
68 |     })
69 | 
70 |     cache = {}
71 | 
72 |     result = await uu.expand('http://example2.org/foo')
73 |     assert.strictEqual(result, null)
74 |     assert.deepStrictEqual(cache, { 'http://example2.org/foo': null })
75 | 
76 |     result = await uu.expand('http://example2.org/foo')
77 |     assert.strictEqual(result, null)
78 |   })
79 | 
80 |   it('should properly cache last null fetch in nested redirects', async () => {
81 |     uu.add('example3.org', {
82 |       fetch: async () => 'http://example4.org/test'
83 |     })
84 | 
85 |     uu.add('example4.org', {
86 |       fetch: async () => null
87 |     })
88 | 
89 |     cache = {}
90 | 
91 |     result = await uu.expand('http://example3.org/foo')
92 |     assert.strictEqual(result, 'http://example4.org/test')
93 |     assert.deepStrictEqual(cache, { 'http://example3.org/foo': 'http://example4.org/test' })
94 |   })
95 | })
96 | 


--------------------------------------------------------------------------------
/test/default.js:
--------------------------------------------------------------------------------
  1 | 'use strict'
  2 | 
  3 | /* eslint-env mocha */
  4 | 
  5 | const assert = require('assert')
  6 | const nock = require('nock')
  7 | const { isErrorFatal } = require('../')
  8 | 
  9 | describe('Default', function () {
 10 |   let uu
 11 | 
 12 |   before(async () => {
 13 |     uu = require('..')({
 14 |       request: {
 15 |         retry: 0
 16 |       }
 17 |     })
 18 |     uu.add('example.org')
 19 |   })
 20 | 
 21 |   it('should process redirect', async () => {
 22 |     nock('http://example.org')
 23 |       .get('/foo')
 24 |       .reply(301, '', { location: 'https://github.com/0' })
 25 | 
 26 |     const result = await uu.expand('http://example.org/foo')
 27 |     assert.strictEqual(result, 'https://github.com/0')
 28 |   })
 29 | 
 30 |   it('should parse meta tags', async () => {
 31 |     const html = '<html><head><meta http-equiv="refresh" content="10; url=https://github.com/1 "></head><body></body></html>'
 32 |     nock('http://example.org')
 33 |       .get('/bar')
 34 |       .reply(200, html, { 'content-type': 'text/html' })
 35 | 
 36 |     const result = await uu.expand('http://example.org/bar')
 37 |     assert.strictEqual(result, 'https://github.com/1')
 38 |   })
 39 | 
 40 |   it("should not process file if it's not html", async () => {
 41 |     const html = '<html><head><meta http-equiv="refresh" content="10; url=https://github.com/1 "></head><body></body></html>'
 42 |     nock('http://example.org')
 43 |       .get('/zzz')
 44 |       .reply(200, html, { 'content-type': 'application/json' })
 45 | 
 46 |     const result = await uu.expand('http://example.org/zzz')
 47 |     assert.strictEqual(result, null)
 48 |   })
 49 | 
 50 |   it('should return nothing on 404', async () => {
 51 |     nock('http://example.org')
 52 |       .get('/baz')
 53 |       .reply(404, '')
 54 | 
 55 |     const result = await uu.expand('http://example.org/baz')
 56 |     assert.strictEqual(result, null)
 57 |   })
 58 | 
 59 |   it('should return errors on unknown status codes', async () => {
 60 |     nock('http://example.org')
 61 |       .get('/bazzz')
 62 |       .reply(503, '')
 63 | 
 64 |     await assert.rejects(
 65 |       async () => uu.expand('http://example.org/bazzz'),
 66 |       err => {
 67 |         assert.match(err.message, /Remote server error/)
 68 |         assert.strictEqual(err.code, 'EHTTP')
 69 |         assert.strictEqual(err.statusCode, 503)
 70 |         assert.strictEqual(isErrorFatal(err), false)
 71 |         return true
 72 |       }
 73 |     )
 74 |   })
 75 | 
 76 |   it('should treat invalid urls as fatal error', async () => {
 77 |     nock('http://example.org')
 78 |       .get('/invalid')
 79 |       .reply(301, '', { location: 'http://xn--/1' })
 80 | 
 81 |     await assert.rejects(
 82 |       async () => uu.expand('http://example.org/invalid'),
 83 |       err => {
 84 |         assert.match(err.message, /Redirected to an invalid location/)
 85 |         assert.strictEqual(err.code, 'EBADREDIRECT')
 86 |         assert.strictEqual(isErrorFatal(err), true)
 87 |         return true
 88 |       }
 89 |     )
 90 |   })
 91 | 
 92 |   it.skip('should fail on page > 100K', async () => {
 93 |     const html = ' '.repeat(110000) + '<html><head><meta http-equiv="refresh" content="10; url=https://github.com/1 "></head><body></body></html>'
 94 |     nock('http://example.org')
 95 |       .get('/large')
 96 |       .reply(200, html, { 'content-type': 'text/html' })
 97 | 
 98 |     const result = await uu.expand('http://example.org/large')
 99 |     assert.strictEqual(result, null)
100 |   })
101 | })
102 | 


--------------------------------------------------------------------------------
/test/expand.js:
--------------------------------------------------------------------------------
  1 | 'use strict'
  2 | 
  3 | /* eslint-env mocha */
  4 | 
  5 | const assert = require('assert')
  6 | 
  7 | const urls = {
  8 |   'http://example.org/regular': 'https://github.com/',
  9 | 
 10 |   // loop1 -> loop2 -> loop3 -> loop4 -> github
 11 |   'http://example.org/loop1': 'http://example.org/loop2',
 12 |   'http://example.org/loop2': 'http://example.org/loop3',
 13 |   'http://example.org/loop3': 'http://example.org/loop4',
 14 |   'http://example.org/loop4': 'https://github.com/',
 15 | 
 16 |   // self-referenced
 17 |   'http://example.org/cycle': 'http://example.org/cycle',
 18 | 
 19 |   // control characters in the output
 20 |   'http://example.org/control': 'https://github.com/<foo\rbar baz>',
 21 | 
 22 |   // invalid protocol
 23 |   'http://example.org/file': 'file:///etc/passwd',
 24 | 
 25 |   // result has anchor in it
 26 |   'http://example.org/hashy': 'https://github.com/foo#bar',
 27 | 
 28 |   // relative urls
 29 |   'http://example.org/rel1': '//github.com/foo',
 30 |   'http://example.org/rel2': '/foo',
 31 | 
 32 |   // invalid urls
 33 |   'http://example.org/invalid_punycode': 'https://xn--/',
 34 | 
 35 |   // internationalized domain names
 36 |   'http://example.org/idn1': 'http://www.bücher.de/',
 37 |   'http://example.org/idn2': 'http://www.xn--bcher-kva.de/',
 38 | 
 39 |   // l1 -> l2 -> null
 40 |   'http://example.org/l1': 'http://example.org/l2',
 41 |   'http://example.org/l2': null
 42 | }
 43 | 
 44 | describe('Expand', function () {
 45 |   let uu
 46 |   let result
 47 | 
 48 |   before(function () {
 49 |     uu = require('../')()
 50 | 
 51 |     uu.add('example.org', {
 52 |       fetch: url => urls[url.replace(/^https/, 'http')]
 53 |     })
 54 |   })
 55 | 
 56 |   it('should expand regular url via Promise', async () => {
 57 |     result = await uu.expand('http://example.org/regular')
 58 |     assert.strictEqual(result, 'https://github.com/')
 59 |   })
 60 | 
 61 |   it('should expand url up to 3 levels', async () => {
 62 |     result = await uu.expand('http://example.org/loop2')
 63 |     assert.strictEqual(result, 'https://github.com/')
 64 |   })
 65 | 
 66 |   it('should fail on url nested more than 3 levels', async () => {
 67 |     await assert.rejects(
 68 |       async () => uu.expand('http://example.org/loop1'),
 69 |       /Too many redirects/
 70 |     )
 71 |   })
 72 | 
 73 |   it('should fail on links redirecting to themselves', async () => {
 74 |     await assert.rejects(
 75 |       async () => uu.expand('http://example.org/cycle'),
 76 |       /Too many redirects/
 77 |     )
 78 |   })
 79 | 
 80 |   it('should fail on bad protocols', async () => {
 81 |     await assert.rejects(
 82 |       async () => uu.expand('http://example.org/file'),
 83 |       /Redirected to an invalid location/
 84 |     )
 85 |   })
 86 | 
 87 |   it('should not encode non-url characters', async () => {
 88 |     result = await uu.expand('http://example.org/control')
 89 |     assert.strictEqual(result, 'https://github.com/<foo\rbar baz>')
 90 |   })
 91 | 
 92 |   it('should preserve an anchor', async () => {
 93 |     result = await uu.expand('http://example.org/regular#foobar')
 94 |     assert.strictEqual(result, 'https://github.com/#foobar')
 95 |   })
 96 | 
 97 |   it('should respect destination anchor', async () => {
 98 |     result = await uu.expand('http://example.org/hashy#quux')
 99 |     assert.strictEqual(result, 'https://github.com/foo#bar')
100 |   })
101 | 
102 |   it('should accept relative urls without protocol', async () => {
103 |     result = await uu.expand('//example.org/regular')
104 |     assert.strictEqual(result, 'https://github.com/')
105 |   })
106 | 
107 |   it('should reject links to relative urls without protocol', async () => {
108 |     await assert.rejects(
109 |       async () => uu.expand('http://example.org/rel1'),
110 |       /Redirected to an invalid location/
111 |     )
112 |   })
113 | 
114 |   it('should reject links to relative urls without host', async () => {
115 |     await assert.rejects(
116 |       async () => uu.expand('http://example.org/rel2'),
117 |       /Redirected to an invalid location/
118 |     )
119 |   })
120 | 
121 |   it('should reject links to invalid urls', async () => {
122 |     await assert.rejects(
123 |       async () => uu.expand('http://example.org/invalid_punycode'),
124 |       /Redirected to an invalid location/
125 |     )
126 |   })
127 | 
128 |   it('should accept IDN (decoded)', async () => {
129 |     result = await uu.expand('http://example.org/idn1')
130 |     assert.strictEqual(result, 'http://www.bücher.de/')
131 |   })
132 | 
133 |   it('should accept IDN (punycode)', async () => {
134 |     result = await uu.expand('http://example.org/idn2')
135 |     assert.strictEqual(result, 'http://www.xn--bcher-kva.de/')
136 |   })
137 | 
138 |   it('should properly expand url with last null fetch in nested redirects', async () => {
139 |     result = await uu.expand('http://example.org/l1')
140 |     assert.strictEqual(result, 'http://example.org/l2')
141 |   })
142 | })
143 | 


--------------------------------------------------------------------------------
/test/services.js:
--------------------------------------------------------------------------------
 1 | 'use strict'
 2 | 
 3 | /* eslint-env mocha */
 4 | 
 5 | const assert = require('assert')
 6 | const read = require('fs').readFileSync
 7 | const YAML = require('js-yaml')
 8 | const path = require('path')
 9 | const punycode = require('punycode/')
10 | const URL = require('url').URL
11 | const uu = require('../')()
12 | const parallel = require('mocha.parallel')
13 | 
14 | const urls = YAML.load(read(path.join(__dirname, 'services.yml'), 'utf8'))
15 | const domains = YAML.load(read(path.join(__dirname, '..', 'domains.yml'), 'utf8'))
16 | 
17 | const checkAll = (process.env.LINKS_CHECK === 'all')
18 | 
19 | // get 2nd level domain, e.g. "foo.example.org" -> "example.org"
20 | function truncateDomain (str) {
21 |   return str.split('.').slice(-2).join('.')
22 | }
23 | 
24 | describe('Services', function () {
25 |   it('all services should be tested', function () {
26 |     let expected = []
27 |     const actual = []
28 | 
29 |     domains.forEach(function (d) {
30 |       if (typeof d === 'string') {
31 |         expected.push(d)
32 |       } else {
33 |         expected = expected.concat(Object.keys(d))
34 |       }
35 |     })
36 | 
37 |     Object.keys(urls).forEach(function (url) {
38 |       const u = new URL(url)
39 | 
40 |       actual.push(u.host)
41 |     })
42 | 
43 |     assert.deepStrictEqual(
44 |       expected.map(truncateDomain).map(punycode.toUnicode).sort(),
45 |       actual.map(truncateDomain).map(punycode.toUnicode).sort()
46 |     )
47 |   })
48 | 
49 |   parallel('ping services', function () {
50 |     let links = Object.keys(urls)
51 | 
52 |     if (!checkAll) { links = links.slice(0, 1) }
53 | 
54 |     links.forEach(function (link) {
55 |       it(link, async () => {
56 |         const result = await uu.expand(link)
57 |         assert.strictEqual(result, urls[link])
58 |       })
59 |     })
60 |   })
61 | })
62 | 


--------------------------------------------------------------------------------
/test/services.yml:
--------------------------------------------------------------------------------
 1 | # In normal mode we run only the first test, to avoid unnecessary errors in CI.
 2 | http://bit.ly/1gMSVzZ:          https://github.com/nodeca/url-unshort
 3 | 
 4 | 
 5 | http://0rz.tw/DvhxQ:            https://www.google.ru/maps
 6 | http://alturl.com/8iyap:        https://github.com/nodeca/url-unshort
 7 | http://amzn.to/MyKindleBook:    http://www.amazon.com/The-Most-Useful-Websites-ebook/dp/B006R4RN3U
 8 | http://bit.do/9ZiH:             https://github.com/nodeca/url-unshort
 9 | http://chilp.it/e8dcdce:        https://github.com/nodeca/url-unshort
10 | https://clck.ru/9ZEvf:          https://github.com/nodeca/url-unshort
11 | http://cort.as/VgV0:            https://github.com/nodeca/url-unshort
12 | http://cutt.us/0B2XI:           https://github.com/nodeca/url-unshort
13 | https://db.tt/c0mFuu1Y:         https://www.dropbox.com/s/toyzur6e0m34t7v/dropbox-logos_dropbox-glyph-blue.png
14 | http://fave.co/1Tba9m5:         https://github.com/nodeca/url-unshort
15 | https://flic.kr/p/p6kuZs:       https://www.flickr.com/photos/europeanspaceagency/15156592796/
16 | https://goo.gl/HwUfwd:          https://github.com/nodeca/url-unshort
17 | https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=0ahUKEwj5m5GD_bjLAhUrMJoKHVQiBWsQFggcMAA&url=http%3A%2F%2Fwww.rcdesign.ru%2Farticles%2Favia%2Fdvs_trnr&usg=AFQjCNHTKYt-4cx-4_r7vljdRdm1wspAlA: http://www.rcdesign.ru/articles/avia/dvs_trnr
18 | http://is.gd/YkmUG5:            https://github.com/nodeca/url-unshort
19 | http://merky.de/1fce5b:         https://github.com/nodeca/url-unshort
20 | http://ow.ly/QCoNe:             https://github.com/nodeca/url-unshort
21 | http://shorl.com/gridejynybyso: https://github.com/nodeca/url-unshort
22 | http://smu.gs/1JUWjme:          http://www.ianbrodiephoto.net/Image-of-the-Day/Image-of-the-Day/i-mchdBZ3/
23 | https://t.co/DD3MKQZtXj:        https://github.com/nodeca/url-unshort
24 | http://tiny.cc/6d2muz:          https://www.youtube.com/watch?v=EHLTVVMxXuA
25 | http://tiny.pl/gx13v:           https://github.com/nodeca/url-unshort
26 | https://tinyurl.com/nzezbl8:    https://github.com/nodeca/url-unshort
27 | http://tmblr.co/ZIaJJw1raVase:  https://calciofication.tumblr.com/post/126240050600/1990
28 | http://tw.gs/3xT0fX:            https://github.com/nodeca/url-unshort
29 | http://url.ie/z346:             https://github.com/nodeca/url-unshort
30 | http://v.gd/rSZr8O:             https://github.com/nodeca/url-unshort
31 | http://vk.cc/45cFoR:            https://github.com/nodeca/url-unshort
32 | http://vk.com/away.php?to=https%3A%2F%2Fgithub.com%2Fnodeca%2Furl-unshort: https://github.com/nodeca/url-unshort
33 | http://vurl.com/mZGMO:          https://github.com/nodeca/url-unshort
34 | http://winpe.notlong.com/:      'http://technet2.microsoft.com/WindowsVista/en/library/08629d0b-56b0-4194-9782-88d01a488ae01033.mspx?mfr=true'
35 | http://wp.me/pBMYe-Lu:          https://wptavern.com/?p=2944
36 | http://xurl.es/p8xti:           https://github.com/nodeca/url-unshort
37 | 


--------------------------------------------------------------------------------