├── .github
│   └── FUNDING.yml
├── README.md
├── download.sh
└── download_articles.sh

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
github: AlexanderMelde
custom: "https://www.paypal.me/melde"

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Downloader for Heise Magazines
This is a simple bash script to download magazines as PDF files from https://www.heise.de/select.

You will need an active subscription to download anything. The script is just an alternative to clicking buttons in your browser.


## Usage
1) Download the script and mark it as executable if needed
2) Edit the script to include your email address and password for heise.de (at the very beginning of the script)
3) Windows users only: install [Ubuntu for Windows](https://ubuntu.com/tutorials/ubuntu-on-windows#1-overview)
4) Open the (Ubuntu) bash terminal window
5) Change to the directory you downloaded the script to (e.g. `cd dl_for_heise`)
6) Run the script, e.g. to download all issues of the magazine c't from the year 2021:
   `./download.sh ct 2021`
7) You will find all downloaded PDF files as well as .jpg cover thumbnails in newly created subfolders, organized by magazine name and year.

## Further Options
- download all c't magazines between 2014 and 2022: `./download.sh ct 2014 2022`
- download other magazines: replace `ct` with whatever is in the URL of the [heise archive page](https://www.heise.de/select), e.g. for the Make archive at `https://www.heise.de/select/make/archiv` the correct name is `make`, and for Retro Gamer at `https://www.heise.de/select/retro-gamer/archiv` it is `retro-gamer`. Further options include `ix`, `tr`, `mac-and-i`, `ct-foto`, `ct-wissen`, `ix-special`, ...
- display additional console output by adding `-v` at the beginning of the arguments: `./download.sh -v ct 2014 2022`

## Common Failures
- Sometimes heise does not serve proper PDF files but internal server errors; the script detects this and retries a few times (see `max_tries_per_download` in the script).
- Already downloaded files will not be downloaded again.
- If you are not authorized to download a certain issue, the script will retry a few times and finally skip the file.
- Make sure to replace your email and password in the script.

### I only get `Server refused connection, you might not be allowed to download this issue` errors
Please check via your web browser whether your subscription allows you to download full-page PDFs. Go to a page like https://www.heise.de/select/ct/archiv/2022/3 (replace `ct` with the magazine you want to download) and check whether you have a "Download Issue as PDF" button. If you only see the green "Buy issue" and "Buy Subscription" buttons instead, you are not permitted to download full PDFs. Sometimes you will also only be able to download full PDFs for the last few months.

Some Heise+ subscriptions only allow you to download single articles. In this case, you can use the second script `download_articles.sh`, which downloads the articles individually and merges them into one PDF per issue.

For merging the PDF files you will need to install Ghostscript (under Linux, the package is called `ghostscript`) and mark the download script executable:
```
sudo apt-get install ghostscript
chmod a+x download_articles.sh
```

Edit the script `download_articles.sh` and adapt the email and password. The usage is exactly the same as for `download.sh`.

### I get the errors `download.sh: line 2: $'\r': command not found` or `: not foundsh: 2:`
Sometimes your text editor converts the line endings to the Windows format `\r\n` (CRLF) instead of `\n` (LF). Most editors allow you to convert the line endings back to LF.
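If your editor cannot do the conversion, stripping the carriage returns from the terminal also works. A minimal sketch (the `-i` in-place flag assumes GNU sed; `download_crlf_demo.sh` is just a stand-in file created for the demonstration):

```shell
# Create a demo script with Windows (CRLF) line endings, as a broken editor might save it.
printf '#!/bin/sh\r\necho hello\r\n' > download_crlf_demo.sh

# Remove the trailing carriage return on every line, editing the file in place (GNU sed).
sed -i 's/\r$//' download_crlf_demo.sh
```

To fix the real script, run the same `sed -i 's/\r$//' download.sh` on the downloaded file.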
You can also use the command `tr -d '\015' < download.sh > download_lf.sh` to create a new file with converted line endings.

## Example Output
```
Heise Magazine Downloader v1.2
Logging in...
[ct][2022/01][SKIP] Already downloaded.
[ct][2022/02] Downloading...
################################################################################################################# 100.0%
[ct][2022/02][SUCCESS] Downloaded ct/2022/ct.2022.02.pdf (size: 18488221)
[ct][2022/03][SKIP] Magazine issue does not exist on the server, skipping.
...
```

## Thank you to everyone who made this possible!
MyDealz usernames: *tehlers*, *joboza*, *dasd1*

Please submit pull requests and open issues in this project if you want to further improve this script.

## Disclaimer
This project is a community-based, non-commercial project and is not affiliated with Heise Medien GmbH & Co. KG. The script only acts as a client to download files otherwise available via your web browser. It does not circumvent any security measures taken by the magazines' publishers; without an active subscription to their services, no downloads are possible.

--------------------------------------------------------------------------------
/download.sh:
--------------------------------------------------------------------------------
#!/bin/sh

# User Configuration
email='name@example.com'
password='nutella123'

minimum_pdf_size=5000000  # minimum file size used to check whether a downloaded file is a valid pdf
wait_between_downloads=80 # seconds to wait between retries after an error, to prevent rate limiting
max_tries_per_download=3  # if a download fails (or is not a valid pdf), retry this many times

max_nr_of_magazines_per_year=13 # number of issues per year, e.g. ct=27, ix=13 (due to special editions)

echo 'Heise Magazine Downloader v1.2'

usage()
{
    echo "Usage: $0 [-v] magazine year [end_year]"
    echo "Example: $0 ct 2022"
    echo "Example: $0 ct 2011 2022"
    echo "-v: Verbose Output"
    exit 1
}

# Initialize defaults
verbose=false
curl_session_file="/tmp/curl_session$(date +%s)"
count_success=0; count_fail=0; count_skip=0
info="[\033[0;36mINFO\033[0m]"

# Read Flags
while getopts v name; do
    case $name in
        v) verbose=true;;
        ? | h) usage
               exit 2;;
    esac; done
shift $(($OPTIND - 1))
$verbose && silent_param='' || silent_param='-s'

# Read Arguments
[ "$2" = "" ] && usage
magazine=${1}
start_year=${2}
[ "$3" = "" ] && end_year=${start_year} || end_year=${3}


# Define function to sleep with a progress bar
sleepbar()
{
    count=0
    total=$1
    pstr="[=============================================================]"

    while [ $count -lt $total ]; do
        sleep 1
        count=$(( $count + 1 ))
        pd=$(( $count * ${#pstr} / $total ))
        printf "\rWaiting for retry... ${count}/${total}s - %3d.%1d%% %.${pd}s" $(( $count * 100 / $total )) $(( ($count * 1000 / $total) % 10 )) $pstr
    done
    printf "\33[2K\r"
}

# Login
echo "Logging in..."
curlparams="--no-progress-meter -b ${curl_session_file} -c ${curl_session_file} -k -L"
curl ${curlparams} "https://www.heise.de/sso/login" >/dev/null 2>&1
curl ${curlparams} -F 'forward=' -F "username=${email}" -F "password=${password}" -F 'ajax=1' "https://www.heise.de/sso/login/login" -o ${curl_session_file}.html
token1=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -1 | cut -f 3 -d '"')
token2=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -2 | tail -1 | cut -f 3 -d '"')
curl ${curlparams} -F "token=${token1}" "https://m.heise.de/sso/login/remote-login" >/dev/null 2>&1
curl ${curlparams} -F "token=${token2}" "https://shop.heise.de/customer/account/loginRemote" >/dev/null 2>&1

# Download PDFs and Thumbnails
for year in $(seq -f %g ${start_year} ${end_year}); do
    $verbose && printf "${info} YEAR ${year}\n"
    for i in $(seq -f %g 1 ${max_nr_of_magazines_per_year}); do
        $verbose && printf "${info} ISSUE ${i}\n"
        i_formatted=$(printf "%02d" ${i})
        file_base_path="${magazine}/${year}/${magazine}.${year}.${i_formatted}"
        actual_pdf_size=0
        downloads_tried=1
        logp="[${magazine}][${year}/${i_formatted}]"
        if [ ! -f "${file_base_path}.pdf" ]; then
            # If the file is not already downloaded, start by downloading the thumbnail
            $verbose && printf "${logp}${info} Downloading Thumbnail\n"
            curl ${silent_param} -b ${curl_session_file} -f -k -L --retry 99 "https://heise.cloudimg.io/v7/_www-heise-de_/select/thumbnail/${magazine}/${year}/${i}.jpg" -o "${file_base_path}.jpg" --create-dirs
            if [ $? -eq 22 ]; then
                # If the thumbnail could not be downloaded, the requested issue most likely does not exist
                printf "${logp}[\033[0;33mSKIP\033[0m] Magazine issue does not exist on the server, skipping.\n"
            else
                $verbose && printf "${logp}${info} Thumbnail downloaded\n"
                # Try downloading the requested issue until a PDF of minimum size is downloaded or the maximum number of tries has been reached
                until [ ${actual_pdf_size} -gt ${minimum_pdf_size} ] || [ ${downloads_tried} -gt ${max_tries_per_download} ]; do
                    try="[TRY ${downloads_tried}/${max_tries_per_download}]"
                    # Download the header of the requested issue
                    $verbose && printf "${logp}${try}${info} Downloading Header\n"
                    content_type=$(curl ${silent_param} -f -I -b ${curl_session_file} -k -L "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/download")
                    response_code=$?
                    content_type=$(echo "${content_type}" | grep -i "^Content-Type: " | cut -c15- | tr -d '\r')
                    if [ ${response_code} -eq 22 ]; then
                        # If the header could not be loaded, you most likely have no permission to request this file
                        echo "${logp}${try} Server refused connection, you might not be allowed to download this issue."
                        sleepbar ${wait_between_downloads}
                    elif [ "${content_type}" = 'binary/octet-stream' ]; then
                        # If the header states that this is a pdf file, download it
                        echo "${logp} Downloading..."
                        actual_pdf_size=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/download" -o "${file_base_path}.pdf" --create-dirs -w '%{size_download}')
                        # actual_pdf_size=$(wc -c < "${file_base_path}.pdf")
                        if [ ${actual_pdf_size} -lt ${minimum_pdf_size} ]; then
                            # If the downloaded pdf is not reasonably big (too small), we retry.
                            # This prevents saving error pages, but should already be avoided by the content type check.
                            echo "${logp}${try} Downloaded file is too small (size: ${actual_pdf_size}/${minimum_pdf_size})."
                            sleepbar ${wait_between_downloads}
                        else
                            printf "${logp}[\033[0;32mSUCCESS\033[0m] Downloaded ${file_base_path}.pdf (size: ${actual_pdf_size})\n"
                        fi
                    else
                        # If the header says it is not a pdf, we try again.
                        echo "${logp}${try} Server did not serve a valid pdf (instead: ${content_type})."
                        sleepbar ${wait_between_downloads}
                    fi
                    downloads_tried=$((downloads_tried+1))
                done
                if [ ! -f "${file_base_path}.pdf" ]; then
                    # If for any of the above reasons the download was not successful, we log this to the console.
                    printf "${logp}[\033[0;31mERROR\033[0m] Could not download magazine issue. Please try again later.\n"
                    count_fail=$((count_fail+1))
                else
                    $verbose && printf "${logp}${info} Finished Successfully\n"
                    count_success=$((count_success+1))
                fi
            fi
        else
            printf "${logp}[\033[0;33mSKIP\033[0m] Already downloaded.\n"
            count_skip=$((count_skip+1))
        fi
    done
done

# Summary
echo "Summary: ${count_success} files downloaded successfully, ${count_fail} failed, ${count_skip} were skipped."

# Cleanup Temp Session
if [ -f "${curl_session_file}" ]; then
    $verbose && printf "${info} Clearing Session\n"
    rm ${curl_session_file}.html ${curl_session_file}
fi

--------------------------------------------------------------------------------
/download_articles.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# User Configuration
email='name@example.com'
password='nutella123'

minimum_pdf_size=5000     # minimum file size used to check whether a downloaded file is a valid pdf
wait_between_downloads=80 # seconds to wait between retries after an error, to prevent rate limiting
max_tries_per_download=3  # if a download fails (or is not a valid pdf), retry this many times

max_nr_of_magazines_per_year=13 # number of issues per year, e.g. ct=27, ix=13 (due to special editions)

echo 'Heise Magazine Downloader v1.2'

usage()
{
    echo "Usage: $0 [-v] magazine year [end_year]"
    echo "Example: $0 ct 2022"
    echo "Example: $0 ct 2011 2022"
    echo "-v: Verbose Output"
    exit 1
}

# Initialize defaults
verbose=false
curl_session_file="/tmp/curl_session$(date +%s)"
count_success=0; count_fail=0; count_skip=0
info="[\033[0;36mINFO\033[0m]"

# Read Flags
while getopts v name; do
    case $name in
        v) verbose=true;;
        ? | h) usage
               exit 2;;
    esac; done
shift $(($OPTIND - 1))
$verbose && silent_param='' || silent_param='-s'

# Read Arguments
[ "$2" = "" ] && usage
magazine=${1}
start_year=${2}
[ "$3" = "" ] && end_year=${start_year} || end_year=${3}


# Define function to sleep with a progress bar
sleepbar()
{
    count=0
    total=$1
    pstr="[=============================================================]"

    while [ $count -lt $total ]; do
        sleep 1
        count=$(( $count + 1 ))
        pd=$(( $count * ${#pstr} / $total ))
        printf "\rWaiting for retry... ${count}/${total}s - %3d.%1d%% %.${pd}s" $(( $count * 100 / $total )) $(( ($count * 1000 / $total) % 10 )) $pstr
    done
    printf "\33[2K\r"
}

# Login
echo "Logging in..."
curlparams="--no-progress-meter -b ${curl_session_file} -c ${curl_session_file} -k -L"
curl ${curlparams} "https://www.heise.de/sso/login" >/dev/null 2>&1
curl ${curlparams} -F 'forward=' -F "username=${email}" -F "password=${password}" -F 'ajax=1' "https://www.heise.de/sso/login/login" -o ${curl_session_file}.html
token1=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -1 | cut -f 3 -d '"')
token2=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -2 | tail -1 | cut -f 3 -d '"')
curl ${curlparams} -F "token=${token1}" "https://m.heise.de/sso/login/remote-login" >/dev/null 2>&1
curl ${curlparams} -F "token=${token2}" "https://shop.heise.de/customer/account/loginRemote" >/dev/null 2>&1

# Download PDFs and Thumbnails
for year in $(seq -f %g ${start_year} ${end_year}); do
    $verbose && printf "${info} YEAR ${year}\n"
    for i in $(seq -f %g 1 ${max_nr_of_magazines_per_year}); do
        $verbose && printf "${info} ISSUE ${i}\n"
        i_formatted=$(printf "%02d" ${i})
        file_base_path="${magazine}/${year}/${magazine}.${year}.${i_formatted}"
        logp="[${magazine}][${year}/${i_formatted}]"
        if [ ! -f "${file_base_path}.pdf" ]; then
            # If the file is not already downloaded, start by downloading the thumbnail
            $verbose && printf "${logp}${info} Downloading Thumbnail\n"
            curl ${silent_param} -b ${curl_session_file} -f -k -L --retry 99 "https://heise.cloudimg.io/v7/_www-heise-de_/select/thumbnail/${magazine}/${year}/${i}.jpg" -o "${file_base_path}.jpg" --create-dirs
            # Note: the exit status check must directly follow the curl call above
            if [ $? -eq 22 ]; then
                # If the thumbnail could not be downloaded, the requested issue most likely does not exist
                printf "${logp}[\033[0;33mSKIP\033[0m] Magazine issue does not exist on the server, skipping.\n"
            else
                $verbose && printf "${logp}${info} Thumbnail downloaded\n"

                articles=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}" | grep /select/${magazine}/archiv/${year}/${i}/seite-[0-9]*/pdf -o | cut -d- -f2 | cut -d/ -f1)
                for a in $articles; do
                    file_base_path_article="${magazine}/${year}/${i_formatted}/${magazine}.${year}.${i_formatted}.${a}"
                    actual_pdf_size=0
                    downloads_tried=1
                    # Try downloading the requested article until a PDF of minimum size is downloaded or the maximum number of tries has been reached
                    until [ ${actual_pdf_size} -gt ${minimum_pdf_size} ] || [ ${downloads_tried} -gt ${max_tries_per_download} ]; do
                        try="[TRY ${downloads_tried}/${max_tries_per_download}]"
                        # Download the header of the requested article
                        $verbose && printf "${logp}${try}${info} Downloading Header\n"
                        content_type=$(curl ${silent_param} -f -I -b ${curl_session_file} -k -L "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/seite-${a}/pdf")
                        response_code=$?
                        content_type=$(echo "${content_type}" | grep -i "^Content-Type: " | cut -c15- | tr -d '\r')
                        if [ ${response_code} -eq 22 ]; then
                            # If the header could not be loaded, you most likely have no permission to request this file
                            echo "${logp}${try} Server refused connection, you might not be allowed to download this issue."
                            sleepbar ${wait_between_downloads}
                        elif [ "${content_type}" = 'binary/octet-stream' ] || [ "${content_type}" = 'application/pdf' ]; then
                            # If the header states that this is a pdf file, download it
                            echo "${logp} Downloading..."
                            actual_pdf_size=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/seite-${a}/pdf" -o "${file_base_path_article}.pdf" --create-dirs -w '%{size_download}')
                            # actual_pdf_size=$(wc -c < "${file_base_path_article}.pdf")
                            if [ ${actual_pdf_size} -lt ${minimum_pdf_size} ]; then
                                # If the downloaded pdf is not reasonably big (too small), we retry.
                                # This prevents saving error pages, but should already be avoided by the content type check.
                                echo "${logp}${try} Downloaded file is too small (size: ${actual_pdf_size}/${minimum_pdf_size})."
                                sleepbar ${wait_between_downloads}
                            else
                                printf "${logp}[\033[0;32mSUCCESS\033[0m] Downloaded ${file_base_path_article}.pdf (size: ${actual_pdf_size})\n"
                            fi
                        else
                            # If the header says it is not a pdf, we try again.
                            echo "${logp}${try} Server did not serve a valid pdf (instead: ${content_type})."
                            sleepbar ${wait_between_downloads}
                        fi
                        downloads_tried=$((downloads_tried+1))
                    done
                    if [ ! -f "${file_base_path_article}.pdf" ]; then
                        # If for any of the above reasons the download was not successful, we log this to the console.
                        printf "${logp}[\033[0;31mERROR\033[0m] Could not download magazine issue. Please try again later.\n"
                        count_fail=$((count_fail+1))
                    else
                        $verbose && printf "${logp}${info} Finished Successfully\n"
                        count_success=$((count_success+1))
                    fi
                done
                # Merge the downloaded article PDFs into one issue PDF using Ghostscript
                files=(${articles})
                if [ ${#files[@]} -gt 0 ]; then
                    files=(${files[@]/#/${magazine}/${year}/${i_formatted}/${magazine}.${year}.${i_formatted}.})
                    files=(${files[@]/%/.pdf})
                    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="${file_base_path}.pdf" ${files[@]}
                    rm ${files[@]}
                fi
                rmdir ${magazine}/${year}/${i_formatted} 2> /dev/null
            fi
        else
            printf "${logp}[\033[0;33mSKIP\033[0m] Already downloaded.\n"
            count_skip=$((count_skip+1))
        fi
    done
done

# Summary
echo "Summary: ${count_success} files downloaded successfully, ${count_fail} failed, ${count_skip} were skipped."

# Cleanup Temp Session
if [ -f "${curl_session_file}" ]; then
    $verbose && printf "${info} Clearing Session\n"
    rm ${curl_session_file}.html ${curl_session_file}
fi
--------------------------------------------------------------------------------