├── .github
│   └── FUNDING.yml
├── README.md
├── download.sh
└── download_articles.sh

--------------------------------------------------------------------------------
/.github/FUNDING.yml:
--------------------------------------------------------------------------------
github: AlexanderMelde
custom: "https://www.paypal.me/melde"

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Downloader for Heise Magazines
This is a simple bash script to download magazines as PDF files from https://www.heise.de/select.

You will need an active subscription to download anything. The script is just an alternative to clicking buttons in your browser.


## Usage
1) Download the script and mark it as executable if needed
2) Edit the script to include your email address and password for heise.de (at the very beginning of the script)
3) Windows users only: install [Ubuntu for Windows](https://ubuntu.com/tutorials/ubuntu-on-windows#1-overview)
4) Open the (Ubuntu) bash terminal window
5) Change to the directory you downloaded the script to (e.g. `cd dl_for_heise`)
6) Run the script, e.g. to download all issues of the magazine c't from the year 2021:
   `./download.sh ct 2021`
7) You will find all downloaded PDF files as well as .jpg cover thumbnails in newly created subfolders, organized by magazine name and year.

## Further Options
- download all c't magazines between 2014 and 2022: `./download.sh ct 2014 2022`
- download other magazines: replace `ct` with whatever is in the URL of the [heise archive page](https://www.heise.de/select), e.g. for the Make archive at `https://www.heise.de/select/make/archiv` the correct name is `make`, and for Retro Gamer at `https://www.heise.de/select/retro-gamer/archiv` it is `retro-gamer`. Further options include `ix`, `tr`, `mac-and-i`, `ct-foto`, `ct-wissen`, `ix-special`, ...
- display additional console output by adding `-v` at the beginning of the arguments: `./download.sh -v ct 2014 2022`

## Common Failures
- Sometimes heise does not serve proper PDF files but internal server errors; the script detects this and retries a few times (see `max_tries_per_download` in the script).
- Already downloaded files will not be downloaded again.
- If you are not authorized to download a certain issue, the script will retry a few times and finally skip the file.
- Make sure to replace your email and password in the script.

### I only get `Server refused connection, you might not be allowed to download this issue` errors
Please check via your web browser whether your subscription allows you to download full-page PDFs. Go to a page like https://www.heise.de/select/ct/archiv/2022/3 (replace `ct` with the magazine you want to download) and check whether you have a "Download Issue as PDF" button. If you only see the green "Buy issue" and "Buy Subscription" buttons instead, you are not permitted to download full PDFs. Sometimes you will also only be able to download full PDFs for the last few months.

Some Heise+ subscriptions only allow you to download single articles. In this case, you can use the second script `download_articles.sh`, which downloads the articles individually and merges them into one PDF per issue.

For merging the PDF files you will need to install Ghostscript (under Linux, the package is called `ghostscript`) and mark the download script executable:
```
sudo apt-get install ghostscript
chmod a+x download_articles.sh
```

Edit the script `download_articles.sh` and adapt the email and password. The usage is exactly the same as for `download.sh`.

### I get the errors `download.sh: line 2: $'\r': command not found` or `: not foundsh: 2:`
Sometimes your text editor converts the line endings to the Windows format `\r\n` (CRLF) instead of `\n` (LF). Most editors allow you to convert the line endings back to LF.
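If your editor cannot do the conversion, stripping the carriage returns from the terminal also works. A minimal sketch (the `-i` in-place flag assumes GNU sed; `download_crlf_demo.sh` is just a stand-in file created for the demonstration):

```shell
# Create a demo script with Windows (CRLF) line endings, as a broken editor might save it.
printf '#!/bin/sh\r\necho hello\r\n' > download_crlf_demo.sh

# Remove the trailing carriage return on every line, editing the file in place (GNU sed).
sed -i 's/\r$//' download_crlf_demo.sh
```

To fix the real script, run the same `sed -i 's/\r$//' download.sh` on the downloaded file.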
You can also use the command `tr -d '\015' < download.sh > download_lf.sh` to create a new file with converted line endings.

## Example Output
```
Heise Magazine Downloader v1.2
Logging in...
[ct][2022/01][SKIP] Already downloaded.
[ct][2022/02] Downloading...
################################################################################################################# 100.0%
[ct][2022/02][SUCCESS] Downloaded ct/2022/ct.2022.02.pdf (size: 18488221)
[ct][2022/03][SKIP] Magazine issue does not exist on the server, skipping.
...
```

## Thank you to everyone who made this possible!
MyDealz usernames: *tehlers*, *joboza*, *dasd1*

Please submit pull requests and open issues in this project if you want to further improve this script.

## Disclaimer
This project is a community-based, non-commercial project and is not affiliated with Heise Medien GmbH & Co. KG. The script only acts as a client to download files otherwise available via your web browser. It does not circumvent any security measures taken by the magazines' publishers; without an active subscription to their services, no downloads are possible.

--------------------------------------------------------------------------------
/download.sh:
--------------------------------------------------------------------------------
#!/bin/sh

# User Configuration
email='name@example.com'
password='nutella123'

minimum_pdf_size=5000000  # minimum file size used to check whether a downloaded file is a valid pdf
wait_between_downloads=80 # seconds to wait between retries after an error, to prevent rate limiting
max_tries_per_download=3  # if a download fails (or is not a valid pdf), retry this many times

max_nr_of_magazines_per_year=13 # number of issues per year, e.g. ct=27, ix=13 (due to special editions)

echo 'Heise Magazine Downloader v1.2'

usage()
{
    echo "Usage: $0 [-v] magazine year [end_year]"
    echo "Example: $0 ct 2022"
    echo "Example: $0 ct 2011 2022"
    echo "-v: Verbose Output"
    exit 1
}

# Initialize defaults
verbose=false
curl_session_file="/tmp/curl_session$(date +%s)"
count_success=0; count_fail=0; count_skip=0
info="[\033[0;36mINFO\033[0m]"

# Read Flags
while getopts v name; do
    case $name in
        v) verbose=true;;
        ? | h) usage
               exit 2;;
    esac; done
shift $(($OPTIND - 1))
$verbose && silent_param='' || silent_param='-s'

# Read Arguments
[ "$2" = "" ] && usage
magazine=${1}
start_year=${2}
[ "$3" = "" ] && end_year=${start_year} || end_year=${3}


# Define function to sleep with a progress bar
sleepbar()
{
    count=0
    total=$1
    pstr="[=============================================================]"

    while [ $count -lt $total ]; do
        sleep 1
        count=$(( $count + 1 ))
        pd=$(( $count * ${#pstr} / $total ))
        printf "\rWaiting for retry... ${count}/${total}s - %3d.%1d%% %.${pd}s" $(( $count * 100 / $total )) $(( ($count * 1000 / $total) % 10 )) $pstr
    done
    printf "\33[2K\r"
}

# Login
echo "Logging in..."
curlparams="--no-progress-meter -b ${curl_session_file} -c ${curl_session_file} -k -L"
curl ${curlparams} "https://www.heise.de/sso/login" >/dev/null 2>&1
curl ${curlparams} -F 'forward=' -F "username=${email}" -F "password=${password}" -F 'ajax=1' "https://www.heise.de/sso/login/login" -o ${curl_session_file}.html
token1=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -1 | cut -f 3 -d '"')
token2=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -2 | tail -1 | cut -f 3 -d '"')
curl ${curlparams} -F "token=${token1}" "https://m.heise.de/sso/login/remote-login" >/dev/null 2>&1
curl ${curlparams} -F "token=${token2}" "https://shop.heise.de/customer/account/loginRemote" >/dev/null 2>&1

# Download PDFs and Thumbnails
for year in $(seq -f %g ${start_year} ${end_year}); do
    $verbose && printf "${info} YEAR ${year}\n"
    for i in $(seq -f %g 1 ${max_nr_of_magazines_per_year}); do
        $verbose && printf "${info} ISSUE ${i}\n"
        i_formatted=$(printf "%02d" ${i})
        file_base_path="${magazine}/${year}/${magazine}.${year}.${i_formatted}"
        actual_pdf_size=0
        downloads_tried=1
        logp="[${magazine}][${year}/${i_formatted}]"
        if [ ! -f "${file_base_path}.pdf" ]; then
            # If the file is not already downloaded, start by downloading the thumbnail
            $verbose && printf "${logp}${info} Downloading Thumbnail\n"
            curl ${silent_param} -b ${curl_session_file} -f -k -L --retry 99 "https://heise.cloudimg.io/v7/_www-heise-de_/select/thumbnail/${magazine}/${year}/${i}.jpg" -o "${file_base_path}.jpg" --create-dirs
            if [ $? -eq 22 ]; then
                # If the thumbnail could not be downloaded, the requested issue most likely does not exist
                printf "${logp}[\033[0;33mSKIP\033[0m] Magazine issue does not exist on the server, skipping.\n"
            else
                $verbose && printf "${logp}${info} Thumbnail downloaded\n"
                # Try downloading the requested issue until a PDF of minimum size is downloaded or the maximum number of tries has been reached
                until [ ${actual_pdf_size} -gt ${minimum_pdf_size} ] || [ ${downloads_tried} -gt ${max_tries_per_download} ]; do
                    try="[TRY ${downloads_tried}/${max_tries_per_download}]"
                    # Download the header of the requested issue
                    $verbose && printf "${logp}${try}${info} Downloading Header\n"
                    content_type=$(curl ${silent_param} -f -I -b ${curl_session_file} -k -L "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/download")
                    response_code=$?
                    content_type=$(echo "${content_type}" | grep -i "^Content-Type: " | cut -c15- | tr -d '\r')
                    if [ ${response_code} -eq 22 ]; then
                        # If the header could not be loaded, you most likely have no permission to request this file
                        echo "${logp}${try} Server refused connection, you might not be allowed to download this issue."
                        sleepbar ${wait_between_downloads}
                    elif [ "${content_type}" = 'binary/octet-stream' ]; then
                        # If the header states that this is a pdf file, download it
                        echo "${logp} Downloading..."
                        actual_pdf_size=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/download" -o "${file_base_path}.pdf" --create-dirs -w '%{size_download}')
                        # actual_pdf_size=$(wc -c < "${file_base_path}.pdf")
                        if [ ${actual_pdf_size} -lt ${minimum_pdf_size} ]; then
                            # If the downloaded pdf is not reasonably big (too small), we retry.
                            # This prevents saving error pages, but should already be avoided by the content type check.
                            echo "${logp}${try} Downloaded file is too small (size: ${actual_pdf_size}/${minimum_pdf_size})."
                            sleepbar ${wait_between_downloads}
                        else
                            printf "${logp}[\033[0;32mSUCCESS\033[0m] Downloaded ${file_base_path}.pdf (size: ${actual_pdf_size})\n"
                        fi
                    else
                        # If the header says it is not a pdf, we try again.
                        echo "${logp}${try} Server did not serve a valid pdf (instead: ${content_type})."
                        sleepbar ${wait_between_downloads}
                    fi
                    downloads_tried=$((downloads_tried+1))
                done
                if [ ! -f "${file_base_path}.pdf" ]; then
                    # If for any of the above reasons the download was not successful, we log this to the console.
                    printf "${logp}[\033[0;31mERROR\033[0m] Could not download magazine issue. Please try again later.\n"
                    count_fail=$((count_fail+1))
                else
                    $verbose && printf "${logp}${info} Finished Successfully\n"
                    count_success=$((count_success+1))
                fi
            fi
        else
            printf "${logp}[\033[0;33mSKIP\033[0m] Already downloaded.\n"
            count_skip=$((count_skip+1))
        fi
    done
done

# Summary
echo "Summary: ${count_success} files downloaded successfully, ${count_fail} failed, ${count_skip} were skipped."

# Cleanup Temp Session
if [ -f "${curl_session_file}" ]; then
    $verbose && printf "${info} Clearing Session\n"
    rm ${curl_session_file}.html ${curl_session_file}
fi

--------------------------------------------------------------------------------
/download_articles.sh:
--------------------------------------------------------------------------------
#!/bin/bash

# User Configuration
email='name@example.com'
password='nutella123'

minimum_pdf_size=5000     # minimum file size used to check whether a downloaded file is a valid pdf
wait_between_downloads=80 # seconds to wait between retries after an error, to prevent rate limiting
max_tries_per_download=3  # if a download fails (or is not a valid pdf), retry this many times

max_nr_of_magazines_per_year=13 # number of issues per year, e.g. ct=27, ix=13 (due to special editions)

echo 'Heise Magazine Downloader v1.2'

usage()
{
    echo "Usage: $0 [-v] magazine year [end_year]"
    echo "Example: $0 ct 2022"
    echo "Example: $0 ct 2011 2022"
    echo "-v: Verbose Output"
    exit 1
}

# Initialize defaults
verbose=false
curl_session_file="/tmp/curl_session$(date +%s)"
count_success=0; count_fail=0; count_skip=0
info="[\033[0;36mINFO\033[0m]"

# Read Flags
while getopts v name; do
    case $name in
        v) verbose=true;;
        ? | h) usage
               exit 2;;
    esac; done
shift $(($OPTIND - 1))
$verbose && silent_param='' || silent_param='-s'

# Read Arguments
[ "$2" = "" ] && usage
magazine=${1}
start_year=${2}
[ "$3" = "" ] && end_year=${start_year} || end_year=${3}


# Define function to sleep with a progress bar
sleepbar()
{
    count=0
    total=$1
    pstr="[=============================================================]"

    while [ $count -lt $total ]; do
        sleep 1
        count=$(( $count + 1 ))
        pd=$(( $count * ${#pstr} / $total ))
        printf "\rWaiting for retry... ${count}/${total}s - %3d.%1d%% %.${pd}s" $(( $count * 100 / $total )) $(( ($count * 1000 / $total) % 10 )) $pstr
    done
    printf "\33[2K\r"
}

# Login
echo "Logging in..."
curlparams="--no-progress-meter -b ${curl_session_file} -c ${curl_session_file} -k -L"
curl ${curlparams} "https://www.heise.de/sso/login" >/dev/null 2>&1
curl ${curlparams} -F 'forward=' -F "username=${email}" -F "password=${password}" -F 'ajax=1' "https://www.heise.de/sso/login/login" -o ${curl_session_file}.html
token1=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -1 | cut -f 3 -d '"')
token2=$(sed "s/token/\ntoken/g" ${curl_session_file}.html | grep ^token | head -2 | tail -1 | cut -f 3 -d '"')
curl ${curlparams} -F "token=${token1}" "https://m.heise.de/sso/login/remote-login" >/dev/null 2>&1
curl ${curlparams} -F "token=${token2}" "https://shop.heise.de/customer/account/loginRemote" >/dev/null 2>&1

# Download PDFs and Thumbnails
for year in $(seq -f %g ${start_year} ${end_year}); do
    $verbose && printf "${info} YEAR ${year}\n"
    for i in $(seq -f %g 1 ${max_nr_of_magazines_per_year}); do
        $verbose && printf "${info} ISSUE ${i}\n"
        i_formatted=$(printf "%02d" ${i})
        file_base_path="${magazine}/${year}/${magazine}.${year}.${i_formatted}"
        logp="[${magazine}][${year}/${i_formatted}]"
        if [ ! -f "${file_base_path}.pdf" ]; then
            # If the file is not already downloaded, start by downloading the thumbnail
            $verbose && printf "${logp}${info} Downloading Thumbnail\n"
            curl ${silent_param} -b ${curl_session_file} -f -k -L --retry 99 "https://heise.cloudimg.io/v7/_www-heise-de_/select/thumbnail/${magazine}/${year}/${i}.jpg" -o "${file_base_path}.jpg" --create-dirs
            # Note: the exit status check must directly follow the curl call above
            if [ $? -eq 22 ]; then
                # If the thumbnail could not be downloaded, the requested issue most likely does not exist
                printf "${logp}[\033[0;33mSKIP\033[0m] Magazine issue does not exist on the server, skipping.\n"
            else
                $verbose && printf "${logp}${info} Thumbnail downloaded\n"

                articles=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}" | grep /select/${magazine}/archiv/${year}/${i}/seite-[0-9]*/pdf -o | cut -d- -f2 | cut -d/ -f1)
                for a in $articles; do
                    file_base_path_article="${magazine}/${year}/${i_formatted}/${magazine}.${year}.${i_formatted}.${a}"
                    actual_pdf_size=0
                    downloads_tried=1
                    # Try downloading the requested article until a PDF of minimum size is downloaded or the maximum number of tries has been reached
                    until [ ${actual_pdf_size} -gt ${minimum_pdf_size} ] || [ ${downloads_tried} -gt ${max_tries_per_download} ]; do
                        try="[TRY ${downloads_tried}/${max_tries_per_download}]"
                        # Download the header of the requested article
                        $verbose && printf "${logp}${try}${info} Downloading Header\n"
                        content_type=$(curl ${silent_param} -f -I -b ${curl_session_file} -k -L "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/seite-${a}/pdf")
                        response_code=$?
                        content_type=$(echo "${content_type}" | grep -i "^Content-Type: " | cut -c15- | tr -d '\r')
                        if [ ${response_code} -eq 22 ]; then
                            # If the header could not be loaded, you most likely have no permission to request this file
                            echo "${logp}${try} Server refused connection, you might not be allowed to download this issue."
                            sleepbar ${wait_between_downloads}
                        elif [ "${content_type}" = 'binary/octet-stream' ] || [ "${content_type}" = 'application/pdf' ]; then
                            # If the header states that this is a pdf file, download it
                            echo "${logp} Downloading..."
                            actual_pdf_size=$(curl -# -b ${curl_session_file} -f -k -L --retry 99 "https://www.heise.de/select/${magazine}/archiv/${year}/${i}/seite-${a}/pdf" -o "${file_base_path_article}.pdf" --create-dirs -w '%{size_download}')
                            # actual_pdf_size=$(wc -c < "${file_base_path_article}.pdf")
                            if [ ${actual_pdf_size} -lt ${minimum_pdf_size} ]; then
                                # If the downloaded pdf is not reasonably big (too small), we retry.
                                # This prevents saving error pages, but should already be avoided by the content type check.
                                echo "${logp}${try} Downloaded file is too small (size: ${actual_pdf_size}/${minimum_pdf_size})."
                                sleepbar ${wait_between_downloads}
                            else
                                printf "${logp}[\033[0;32mSUCCESS\033[0m] Downloaded ${file_base_path_article}.pdf (size: ${actual_pdf_size})\n"
                            fi
                        else
                            # If the header says it is not a pdf, we try again.
                            echo "${logp}${try} Server did not serve a valid pdf (instead: ${content_type})."
                            sleepbar ${wait_between_downloads}
                        fi
                        downloads_tried=$((downloads_tried+1))
                    done
                    if [ ! -f "${file_base_path_article}.pdf" ]; then
                        # If for any of the above reasons the download was not successful, we log this to the console.
                        printf "${logp}[\033[0;31mERROR\033[0m] Could not download magazine issue. Please try again later.\n"
                        count_fail=$((count_fail+1))
                    else
                        $verbose && printf "${logp}${info} Finished Successfully\n"
                        count_success=$((count_success+1))
                    fi
                done
                # Merge the downloaded article PDFs into one issue PDF using Ghostscript
                files=(${articles})
                if [ ${#files[@]} -gt 0 ]; then
                    files=(${files[@]/#/${magazine}/${year}/${i_formatted}/${magazine}.${year}.${i_formatted}.})
                    files=(${files[@]/%/.pdf})
                    gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile="${file_base_path}.pdf" ${files[@]}
                    rm ${files[@]}
                fi
                rmdir ${magazine}/${year}/${i_formatted} 2> /dev/null
            fi
        else
            printf "${logp}[\033[0;33mSKIP\033[0m] Already downloaded.\n"
            count_skip=$((count_skip+1))
        fi
    done
done

# Summary
echo "Summary: ${count_success} files downloaded successfully, ${count_fail} failed, ${count_skip} were skipped."

# Cleanup Temp Session
if [ -f "${curl_session_file}" ]; then
    $verbose && printf "${info} Clearing Session\n"
    rm ${curl_session_file}.html ${curl_session_file}
fi
--------------------------------------------------------------------------------