├── package.json
├── requirements.txt
├── .gitignore
├── LICENSE
├── README.md
├── singlefile.py
└── export.py


/package.json:
--------------------------------------------------------------------------------
1 | {
2 |   "dependencies": {
3 |     "single-file-cli": "2.0.75"
4 |   }
5 | }
6 | 


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | beautifulsoup4
2 | requests
3 | jsonpickle
4 | canvasapi
5 | python-dateutil
6 | PyYAML
7 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | .vscode
2 | __pycache__/
3 | node_modules/
4 | output/
5 | 
6 | credentials.yaml
7 | cookies.txt
8 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2025 David Katsandres
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Introduction
  2 | 
  3 | The Canvas Student Data Export Tool exports nearly all of a student's data from the Instructure Canvas Learning Management System (Canvas LMS).  
  4 | This is useful when you are graduating or leaving your college or university and would like a backup of all the data you had in Canvas.
  5 | 
  6 | The tool exports the following data:
  7 | - Course Assignments (including submissions and attachments)
  8 | - Course Announcements
  9 | - Course Discussions
 10 | - Course Pages
 11 | - Course Files
 12 | - Course Modules
 13 | - (Optional) HTML snapshots of:
 14 |     - Course Home Page
 15 |     - Grades Page
 16 |     - Assignments
 17 |     - Announcements
 18 |     - Discussions
 19 |     - Modules
 20 | 
 21 | Data is saved in JSON (and optionally HTML) format and organized into folders by academic term and course.
 22 | 
 23 | Example output structure:
 24 | - Fall 2023
 25 |   - CS 101
 26 |     - announcements/
 27 |       - First Announcement/
 28 |         - announcement_1.html
 29 |       - announcement_list.html
 30 |     - assignments/
 31 |       - Sample Assignment/
 32 |         - assignment.html
 33 |         - submission.html
 34 |       - assignment_list.html
 35 |     - course files/
 36 |       - file_1.docx
 37 |       - file_2.png
 38 |     - discussions/
 39 |       - Sample Discussion
 40 |         - discussion_1.html
 41 |       - discussion_list.html
 42 |     - modules/
 43 |       - Sample Module
 44 |         - Sample Assignment.html
 45 |         - Sample Discussion.html
 46 |         - Sample Page.html
 47 |         - Sample Quiz.html
 48 |       - modules_list.html
 49 |     - grades.html
 50 |     - homepage.html
 51 |     - CS 101.json
 52 |   - ENGL 101
 53 |     - ...
 54 | - Spring 2024
 55 |   - ...
 56 | - all_output.json
 57 | 
 58 | # Getting Started
 59 | 
 60 | ## Dependencies
 61 | - Python 3.8 or newer
 62 | - Node.js 16 or newer (only needed for HTML snapshots)
 63 | 
 64 | 1.  **Install Python dependencies:**
 65 |     ```bash
 66 |     pip install -r requirements.txt
 67 |     ```
 68 | 
 69 | 2.  **(Optional) Install SingleFile for HTML snapshots:**
 70 |     This step requires Node.js.
 71 |     ```bash
 72 |     npm install
 73 |     ```
 74 | 
 75 | ## Configuration
 76 | 
 77 | To use the tool, you must create a `credentials.yaml` file in the project root. You can also specify a different path using the `-c` or `--config` command-line option.
 78 | 
 79 | Create the `credentials.yaml` file with the following content:
 80 | 
 81 | ```yaml
 82 | # The URL of your Canvas instance (e.g., https://your-school.instructure.com)
 83 | API_URL: https://example.instructure.com
 84 | # Your Canvas API token
 85 | API_KEY: <Your Canvas API token>
 86 | # Your Canvas User ID
 87 | USER_ID: 123456
 88 | # Path to your browser cookies file (Netscape format).
 89 | # This is only required when using the --singlefile flag.
 90 | COOKIES_PATH: ./cookies.txt
 91 | # (Optional) Path to your Chrome/Chromium executable if SingleFile cannot find it.
 92 | # CHROME_PATH: C:\Program Files\Google\Chrome\Application\chrome.exe
 93 | # (Optional) Timeout in seconds for SingleFile to capture a page. Default: 60
 94 | # Increase this if you see "Capture timeout" errors during HTML snapshots.
 95 | # SINGLEFILE_TIMEOUT: 180
 96 | # (Optional) A list of course IDs to skip when exporting data.
 97 | # COURSES_TO_SKIP:
 98 | #   - 12345
 99 | #   - 67890
100 | ```
101 | 
102 | ### Finding Your Credentials
103 | 
104 | -   **`API_URL`**: Your institution's Canvas URL.
105 | -   **`API_KEY`**: In Canvas, go to `Account` > `Settings`, scroll down to `Approved Integrations`, and click `+ New Access Token`.
106 | -   **`USER_ID`**: After logging into Canvas, visit `https://<your-canvas-url>/api/v1/users/self`. Your browser will show a JSON response; find the `id` field.
107 | -   **`COOKIES_PATH`**: Required **only if** you use the `--singlefile` flag. Browser cookies are needed to download complete HTML pages as if you were logged in. The script detects expired or invalid cookies and stops downloading HTML pages to prevent errors. For best results, log into Canvas and export your cookies right before running the script. Use a browser extension such as "Get cookies.txt Clean" for Chrome to export them in Netscape format.
108 | -   **`CHROME_PATH`** (Optional): The script attempts to auto-detect Chrome/Chromium on Windows, macOS, and Linux. If it fails, you can specify the path here.
109 | -   **`SINGLEFILE_TIMEOUT`** (Optional): Maximum time in seconds to wait for SingleFile to capture a single HTML page. Default is `60` seconds. If you have a slow connection or a busy computer and see "Capture timeout" errors, increase this value.
110 | -   **`COURSES_TO_SKIP`** (Optional): A list of course IDs to exclude from the export. To find a course ID, go to the course's homepage and look at the URL for the number that follows `/courses/`.
111 | 
112 | ## Running the Exporter
113 | 
114 | Once your `credentials.yaml` is set up, run the script:
115 | 
116 | ```bash
117 | python export.py [options]
118 | ```
119 | 
120 | **Options:**
121 | 
122 | | Flag                    | Description                                   | Default            |
123 | | ----------------------- | --------------------------------------------- | ------------------ |
124 | | `-c`, `--config <path>` | Path to your YAML credentials file.           | `credentials.yaml` |
125 | | `-o`, `--output <path>` | Directory to store exported data.             | `./output`         |
126 | | `--singlefile`          | Enable HTML snapshot capture with SingleFile. | Disabled           |
127 | | `-v`, `--verbose`       | Enable verbose output for debugging.          | Disabled           |
128 | | `--version`             | Show the version of the tool and exit.        | N/A                |
129 | 
130 | **Example:**
131 | 
132 | ```bash
133 | # Run with default settings (uses ./credentials.yaml, outputs to ./output)
134 | python export.py
135 | 
136 | # Run with a custom output directory and enable HTML snapshots
137 | python export.py -o /path/to/my-canvas-backup --singlefile
138 | ```
139 | 
140 | After the export is complete, the tool will display a detailed summary of all the data that was successfully extracted, including counts of assignments, files, and pages, as well as any warnings or errors encountered.
141 | 
142 | # Contribute
143 | 
144 | I would love to see this script's functionality expanded and improved! I welcome all pull requests 🙂  
145 | Thank you!
146 | 
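
The configuration rules in the README above can be checked before a long export starts. The following is a minimal sketch, not part of the tool: the helper name `check_credentials` and the choice of which keys count as required are assumptions based on the README text (`API_URL`, `API_KEY`, and `USER_ID` always; `COOKIES_PATH` only with `--singlefile`). It works on an already-parsed dictionary so it does not depend on how the YAML file is loaded.

```python
# Hypothetical helper, not part of export.py. Key names are taken from the README.
REQUIRED_KEYS = ("API_URL", "API_KEY", "USER_ID")


def check_credentials(config: dict, singlefile: bool = False) -> list:
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    for key in REQUIRED_KEYS:
        if not config.get(key):
            problems.append(f"Missing required key: {key}")

    # COOKIES_PATH is only needed when HTML snapshots are requested.
    if singlefile and not config.get("COOKIES_PATH"):
        problems.append("COOKIES_PATH is required when using --singlefile")

    # SINGLEFILE_TIMEOUT, when present, should be a positive number of seconds.
    timeout = config.get("SINGLEFILE_TIMEOUT")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append("SINGLEFILE_TIMEOUT must be a positive number")

    return problems


if __name__ == "__main__":
    example = {
        "API_URL": "https://example.instructure.com",
        "API_KEY": "token",
        "USER_ID": 123456,
    }
    print(check_credentials(example))                   # plain export: no problems expected
    print(check_credentials(example, singlefile=True))  # reports the missing COOKIES_PATH
```

In practice the dictionary would come from parsing `credentials.yaml` with PyYAML, which is listed in `requirements.txt`.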


--------------------------------------------------------------------------------
/singlefile.py:
--------------------------------------------------------------------------------
  1 | from subprocess import CalledProcessError, run
  2 | import os
  3 | import platform
  4 | import shutil
  5 | import time
  6 | 
  7 | if platform.system() == "Windows":
  8 |     SINGLEFILE_BINARY_PATH = os.path.join("node_modules", ".bin", "single-file.cmd")
  9 | else:
 10 |     SINGLEFILE_BINARY_PATH = os.path.join("node_modules", ".bin", "single-file")
 11 | 
 12 | # Prefer calling the Node entry directly for reliable cross-platform arg passing
 13 | SINGLEFILE_NODE_ENTRY = os.path.join("node_modules", "single-file-cli", "single-file-node.js")
 14 | 
 15 | # Default Chrome/Chromium executable path is determined heuristically per-OS.
 16 | 
 17 | 
 18 | def _detect_chrome_path() -> str:
 19 |     """Return a best-guess path to a Chrome/Chromium executable for the current OS."""
 20 |     system = platform.system().lower()
 21 | 
 22 |     candidates = []
 23 | 
 24 |     if system == "windows":
 25 |         candidates = [
 26 |             r"C:\Program Files\Google\Chrome\Application\chrome.exe",
 27 |             r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
 28 |             r"C:\Program Files\Chromium\Application\chrome.exe",
 29 |         ]
 30 |     elif system == "darwin":  # macOS
 31 |         candidates = [
 32 |             "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
 33 |             "/Applications/Chromium.app/Contents/MacOS/Chromium",
 34 |             "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary",
 35 |         ]
 36 |     else:  # assume Linux/Unix
 37 |         for name in ["google-chrome", "google-chrome-stable", "chromium-browser", "chromium", "chrome"]:
 38 |             path = shutil.which(name)
 39 |             if path:
 40 |                 return path
 41 | 
 42 |     for path in candidates:
 43 |         if os.path.exists(path):
 44 |             return path
 45 | 
 46 |     # Fallback – rely on SingleFile auto-detect; returns empty string
 47 |     return ""
 48 | 
 49 | 
 50 | # Mutable global – can be overridden at runtime by export.py
 51 | CHROME_PATH = _detect_chrome_path()
 52 | 
 53 | 
 54 | # Default timeout in seconds for SingleFile to complete. Can be overridden.
 55 | SINGLEFILE_TIMEOUT = 60.0  # 1 minute
 56 | 
 57 | 
 58 | def override_chrome_path(path: str):
 59 |     """Allow callers to override the detected Chrome path at runtime."""
 60 |     global CHROME_PATH
 61 |     CHROME_PATH = path.strip()
 62 | 
 63 | 
 64 | def override_singlefile_timeout(timeout: float):
 65 |     """Allow callers to override the SingleFile timeout at runtime."""
 66 |     global SINGLEFILE_TIMEOUT
 67 |     if timeout > 0:
 68 |         SINGLEFILE_TIMEOUT = timeout
 69 | 
 70 | 
 71 | def addQuotes(value):
 72 |     return "\"" + value.strip("\"") + "\""
 73 | 
 74 | 
 75 | def download_page(url, cookies_path, output_path, output_name_template="", additional_args=(), verbose=False):
 76 |     # Build full output path we expect SingleFile to create
 77 |     expected_output = os.path.join(output_path, output_name_template) if output_name_template else output_path
 78 | 
 79 |     # Prepare argument list for robust cross-platform execution
 80 |     node_path = shutil.which("node")
 81 |     use_shell_string = False
 82 | 
 83 |     # Convert timeout to milliseconds for SingleFile CLI argument
 84 |     timeout_ms = str(int(SINGLEFILE_TIMEOUT * 1000))
 85 | 
 86 |     if node_path and os.path.exists(SINGLEFILE_NODE_ENTRY):
 87 |         cmd_args = [
 88 |             node_path,
 89 |             SINGLEFILE_NODE_ENTRY,
 90 |             url,
 91 |             expected_output,
 92 |             "--filename-conflict-action=overwrite",
 93 |             "--browser-capture-max-time=" + timeout_ms,
 94 |         ]
 95 |         if CHROME_PATH:
 96 |             cmd_args.append("--browser-executable-path=" + CHROME_PATH.strip("\""))
 97 |         if cookies_path:
 98 |             cmd_args.append("--browser-cookies-file=" + cookies_path)
 99 |         # Append any additional CLI args as-is
100 |         cmd_args.extend(list(additional_args))
101 |     else:
102 |         # Fallback to the shim in node_modules/.bin using a shell command
103 |         use_shell_string = True
104 |         args = [
105 |             addQuotes(SINGLEFILE_BINARY_PATH),
106 |             addQuotes(url),
107 |             addQuotes(expected_output),
108 |             "--filename-conflict-action=overwrite",
109 |             "--browser-capture-max-time=" + timeout_ms,
110 |         ]
111 |         if CHROME_PATH:
112 |             args.append("--browser-executable-path=" + addQuotes(CHROME_PATH.strip("\"")))
113 |         if cookies_path:
114 |             args.append("--browser-cookies-file=" + addQuotes(cookies_path))
115 |         args.extend(additional_args)
116 |         cmd_args = " ".join(args)
117 | 
118 |     try:
119 |         if verbose:
120 |             if isinstance(cmd_args, list):
121 |                 print(f"    Executing: {' '.join(cmd_args)}")
122 |             else:
123 |                 print(f"    Executing: {cmd_args}")
124 | 
125 |         proc = run(cmd_args, shell=use_shell_string, check=True, capture_output=True)
126 | 
127 |         # Decode outputs immediately so we can surface them even if the file check fails
128 |         stdout_text = proc.stdout.decode("utf-8", errors="replace").strip()
129 |         stderr_text = proc.stderr.decode("utf-8", errors="replace").strip()
130 | 
131 |         # Optionally show SingleFile logs right after the process exits
132 |         if verbose:
133 |             if stdout_text:
134 |                 print(stdout_text)
135 |             if stderr_text:
136 |                 # SingleFile prints non-error info to stderr; show only in verbose mode
137 |                 print(stderr_text)
138 | 
139 |         # Wait for the file to exist and be readable (handles Windows write/lock delays)
140 |         start_time = time.monotonic()
141 |         deadline = start_time + SINGLEFILE_TIMEOUT + 5.0  # seconds, add buffer
142 |         delay = 0.1
143 |         while True:
144 |             try:
145 |                 if not os.path.exists(expected_output):
146 |                     raise FileNotFoundError(expected_output)
147 |                 with open(expected_output, "r", encoding="utf-8") as f:
148 |                     content = f.read()
149 | 
150 |                 # Detect login page content
151 |                 login_indicators = [
152 |                     "<title>Log in to Canvas</title>",
153 |                     'id="new_login_data"',
154 |                     'autocomplete="current-password"',
155 |                 ]
156 |                 if any(indicator in content for indicator in login_indicators):
157 |                     # Clean up the invalid file
158 |                     try:
159 |                         os.remove(expected_output)
160 |                     except Exception:
161 |                         pass
162 |                     raise Exception("Authentication failed, downloaded a login page. Please update your cookies.")
163 | 
164 |                 break  # success
165 |             except (PermissionError, FileNotFoundError) as e:
166 |                 now = time.monotonic()
167 |                 if now >= deadline:
168 |                     # Enrich the error with SingleFile logs for better diagnostics
169 |                     elapsed = now - start_time
170 |                     details = [
171 |                         f"SingleFile produced no readable output within {elapsed:.1f}s",
172 |                         f"URL: {url}",
173 |                         f"Expected path: {expected_output}",
174 |                         f"Exit code: {proc.returncode}",
175 |                     ]
176 |                     if stdout_text:
177 |                         details.append(f"stdout:\n{stdout_text}")
178 |                     if stderr_text:
179 |                         details.append(f"stderr:\n{stderr_text}")
180 |                     raise Exception("\n".join(details)) from e
181 |                 time.sleep(min(delay, deadline - now))
182 |                 delay = min(delay * 1.5, 1.0)
183 | 
184 |     except CalledProcessError as e:
185 |         # Re-raise with more context including both stdout and stderr
186 |         stderr_text = ""
187 |         stdout_text = ""
188 |         try:
189 |             stderr_text = e.stderr.decode('utf-8', errors='replace') if e.stderr is not None else ""
190 |         except Exception:
191 |             pass
192 |         try:
193 |             stdout_text = e.stdout.decode('utf-8', errors='replace') if e.stdout is not None else ""
194 |         except Exception:
195 |             pass
196 |         msg_parts = [f"SingleFile failed for {url}."]
197 |         if stdout_text:
198 |             msg_parts.append(f"stdout:\n{stdout_text}")
199 |         if stderr_text:
200 |             msg_parts.append(f"stderr:\n{stderr_text}")
201 |         raise Exception("\n".join(msg_parts)) from e
202 |     except Exception:
203 |         # Propagate our own exceptions unchanged
204 |         raise
205 | 
206 | #if __name__ == "__main__":
207 |     #download_page("https://www.google.com/", "", "./output/test", "test.html")
208 | 
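
The wait loop in `download_page` above polls for the output file with a growing delay, capped at one second, until a deadline passes. The same pattern can be sketched in isolation as follows. This is an illustrative standalone version, not code from the repository: the function name `wait_for_readable` and its parameters are assumptions, and it raises `TimeoutError` rather than the enriched `Exception` used in `singlefile.py`.

```python
import os
import time


def wait_for_readable(path, timeout=5.0, initial_delay=0.1, max_delay=1.0):
    """Poll until `path` can be opened and read, then return its text.

    Raises TimeoutError if the file is still missing or locked after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
        except (FileNotFoundError, PermissionError):
            now = time.monotonic()
            if now >= deadline:
                raise TimeoutError(f"{path} was not readable within {timeout:.1f}s")
            # Sleep no longer than the time remaining, then grow the delay up to the cap.
            time.sleep(min(delay, deadline - now))
            delay = min(delay * 1.5, max_delay)


if __name__ == "__main__":
    try:
        wait_for_readable(os.path.join("output", "missing.html"), timeout=0.5)
    except TimeoutError as exc:
        print(exc)
```

Using `time.monotonic()` keeps the deadline immune to system clock adjustments, which is the same choice the original loop makes.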


--------------------------------------------------------------------------------
/export.py:
--------------------------------------------------------------------------------
   1 | # built in
   2 | import json
   3 | import os
   4 | import itertools
   5 | import re
   6 | import string
   7 | import unicodedata
   8 | import argparse
   9 | import sys
  10 | 
  11 | # external
  12 | from bs4 import BeautifulSoup
  13 | from canvasapi import Canvas
  14 | from canvasapi.exceptions import ResourceDoesNotExist, Unauthorized, Forbidden, InvalidAccessToken, CanvasException
  15 | from singlefile import download_page, override_chrome_path, override_singlefile_timeout
  16 | import dateutil.parser
  17 | import jsonpickle
  18 | import requests
  19 | import yaml
  20 | 
  21 | # Canvas API Error Handling Utility
  22 | class CanvasErrorHandler:
  23 |     @staticmethod
  24 |     def handle_canvas_exception(e, operation_description="operation"):
  25 |         """
  26 |         Handle Canvas API exceptions with appropriate messaging and classification.
  27 |         Returns (error_type, message)
  28 |         """
  29 |         if isinstance(e, InvalidAccessToken):
  30 |             return "authentication", "Invalid Canvas API token. Please check your credentials.yaml file."
  31 |         
  32 |         elif isinstance(e, Unauthorized):
  33 |             # Check if this is a known student limitation
  34 |             if "submissions" in operation_description.lower():
  35 |                 return "student_limitation", "Not authorized to download every student's assignment submission. This is normal for student accounts."
  36 |             elif "file" in operation_description.lower():
  37 |                 return "student_limitation", "Not authorized to download some course files. This is normal for student accounts."
  38 |             else:
  39 |                 return "authorization", f"Not authorized to perform {operation_description}. Check your Canvas permissions."
  40 |         
  41 |         elif isinstance(e, Forbidden):
  42 |             return "student_limitation", f"Access forbidden for {operation_description}. This may be normal for student accounts."
  43 |         
  44 |         elif isinstance(e, ResourceDoesNotExist):
  45 |             return "not_found", f"Resource not found for {operation_description}. It may have been deleted or moved."
  46 |         
  47 |         elif isinstance(e, CanvasException):
  48 |             return "canvas_error", f"Canvas API error during {operation_description}: {str(e)}"
  49 |         
  50 |         else:
  51 |             return "unknown_error", f"Unexpected error during {operation_description}: {str(e)}"
  52 |     
  53 |     @staticmethod
  54 |     def log_error(error_type, message, show_details=True, verbose=False):
  55 |         """Log error messages with appropriate formatting"""
  56 |         if error_type == "student_limitation":
  57 |             if show_details:
  58 |                 print(f"    Note: {message}")
  59 |         elif error_type == "not_found":
  60 |             print(f"    Skipping: {message}")
  61 |         elif error_type in ["authentication", "authorization", "canvas_error", "unknown_error"]:
  62 |             print(f"    ERROR: {message}")
  63 |             if verbose:
  64 |                 import traceback
  65 |                 traceback.print_exc()
  66 |         else:
  67 |             print(f"    {message}")
  68 |             
  69 |     @staticmethod
  70 |     def is_fatal_error(error_type):
  71 |         """Check if an error type should stop execution"""
  72 |         return error_type in ["authentication", "canvas_error", "authorization"]
  73 | 
  74 | # Add counters for tracking successful extractions
  75 | class ExtractionStats:
  76 |     def __init__(self):
  77 |         self.assignments_found = 0
  78 |         self.submissions_found = 0
  79 |         self.announcements_found = 0
  80 |         self.discussions_found = 0
  81 |         self.pages_found = 0
  82 |         self.modules_found = 0
  83 |         self.module_items_found = 0
  84 |         self.files_downloaded = 0
  85 |         self.attachments_downloaded = 0
  86 |         self.html_pages_downloaded = 0
  87 |         self.json_files_created = 0
  88 |         self.student_limitation_warnings = 0
  89 |         self.error_count = 0
  90 |         
  91 |     def summary(self, dl_location, singlefile_enabled=False):
  92 |         summary_text = f"""
  93 | Data Extraction Summary:
  94 |   • {self.assignments_found} assignments found
  95 |   • {self.submissions_found} submissions found (your own)
  96 |   • {self.announcements_found} announcements found
  97 |   • {self.discussions_found} discussions found
  98 |   • {self.pages_found} pages found
  99 |   • {self.modules_found} modules found
 100 |   • {self.module_items_found} module items found
 101 | 
 102 | Files Downloaded:
 103 |   • {self.files_downloaded} course files downloaded
 104 |   • {self.attachments_downloaded} assignment attachments downloaded"""
 105 | 
 106 |         if singlefile_enabled:
 107 |             summary_text += f"\n  • {self.html_pages_downloaded} HTML pages captured"
 108 | 
 109 |         summary_text += f"""
 110 | 
 111 | Data Exports Created:
 112 |   • {self.json_files_created} JSON data files created
 113 |   • Individual course data: {dl_location}/[Term]/[Course]/[Course].json
 114 |   • Combined data: {dl_location}/all_output.json
 115 | 
 116 | Student Account Limitations: {self.student_limitation_warnings} (expected)
 117 | Errors Encountered: {self.error_count}
 118 | """
 119 |         return summary_text
 120 | 
 121 | # Global stats tracker
 122 | extraction_stats = ExtractionStats()
 123 | 
 124 | def _load_credentials(path: str) -> dict:
 125 |     """Return a dict with API_URL, API_KEY, USER_ID, COOKIES_PATH or empty dict if file missing."""
 126 |     try:
 127 |         with open(path, "r", encoding="utf-8") as f:
 128 |             return yaml.safe_load(f) or {}
 129 |     except FileNotFoundError:
 130 |         return {}
 131 | 
 132 | # Placeholder globals – will be overwritten in __main__ once we have parsed CLI args.
 133 | API_URL = ""
 134 | API_KEY = ""
 135 | USER_ID = 0
 136 | COOKIES_PATH = ""
 137 | 
 138 | # Directory in which to download course information to (will be created if not
 139 | # present)
 140 | DL_LOCATION = "./output"
 141 | # List of Course IDs that should be skipped
 142 | COURSES_TO_SKIP = []
 143 | 
 144 | DATE_TEMPLATE = "%B %d, %Y %I:%M %p"
 145 | 
 146 | # Max PATH length is 260 characters on Windows. 70 is just an estimate for a reasonable max folder name to prevent the chance of reaching the limit
 147 | # Applies to modules, assignments, announcements, and discussions
 148 | # If a folder exceeds this limit, a "-" will be added to the end to indicate it was shortened ("..." not valid)
 149 | MAX_FOLDER_NAME_SIZE = 70
 150 | 
 151 | # Global flag to stop HTML downloads if cookies are invalid
 152 | stop_html_downloads = False
 153 | 
 154 | 
 155 | class moduleItemView():
 156 |     id = 0
 157 |     
 158 |     title = ""
 159 |     content_type = ""
 160 |     
 161 |     url = ""
 162 |     external_url = ""
 163 | 
 164 | 
 165 | class moduleView():
 166 |     id = 0
 167 | 
 168 |     name = ""
 169 |     items = []
 170 | 
 171 |     def __init__(self):
 172 |         self.items = []
 173 | 
 174 | 
 175 | class pageView():
 176 |     id = 0
 177 | 
 178 |     title = ""
 179 |     body = ""
 180 |     created_date = ""
 181 |     last_updated_date = ""
 182 | 
 183 | 
 184 | class topicReplyView():
 185 |     id = 0
 186 | 
 187 |     author = ""
 188 |     posted_date = ""
 189 |     body = ""
 190 | 
 191 | 
 192 | class topicEntryView():
 193 |     id = 0
 194 | 
 195 |     author = ""
 196 |     posted_date = ""
 197 |     body = ""
 198 |     topic_replies = []
 199 | 
 200 |     def __init__(self):
 201 |         self.topic_replies = []
 202 | 
 203 | 
 204 | class discussionView():
 205 |     id = 0
 206 | 
 207 |     title = ""
 208 |     author = ""
 209 |     posted_date = ""
 210 |     body = ""
 211 |     topic_entries = []
 212 | 
 213 |     url = ""
 214 |     amount_pages = 0
 215 | 
 216 |     def __init__(self):
 217 |         self.topic_entries = []
 218 | 
 219 | 
 220 | class submissionView():
 221 |     id = 0
 222 | 
 223 |     attachments = []
 224 |     grade = ""
 225 |     raw_score = ""
 226 |     submission_comments = ""
 227 |     total_possible_points = ""
 228 |     attempt = 0
 229 |     user_id = "no-id"
 230 | 
 231 |     preview_url = ""
 232 |     ext_url = ""
 233 | 
 234 |     def __init__(self):
 235 |         self.attachments = []
 236 | 
 237 | class attachmentView():
 238 |     id = 0
 239 | 
 240 |     filename = ""
 241 |     url = ""
 242 | 
 243 | class assignmentView():
 244 |     id = 0
 245 | 
 246 |     title = ""
 247 |     description = ""
 248 |     assigned_date = ""
 249 |     due_date = ""
 250 |     submissions = []
 251 | 
 252 |     html_url = ""
 253 |     ext_url = ""
 254 |     updated_url = ""
 255 |     
 256 |     def __init__(self):
 257 |         self.submissions = []
 258 | 
 259 | 
 260 | class courseView():
 261 |     course_id = 0
 262 |     
 263 |     term = ""
 264 |     course_code = ""
 265 |     name = ""
 266 |     assignments = []
 267 |     announcements = []
 268 |     discussions = []
 269 |     modules = []
 270 | 
 271 |     def __init__(self):
 272 |         self.assignments = []
 273 |         self.announcements = []
 274 |         self.discussions = []
 275 |         self.modules = []
 276 | 
 277 | def makeValidFilename(input_str):
 278 |     if(not input_str):
 279 |         return input_str
 280 | 
 281 |     # Normalize Unicode and whitespace
 282 |     input_str = unicodedata.normalize('NFKC', input_str)
 283 |     input_str = input_str.replace("\u00A0", " ") # NBSP to space
 284 |     input_str = re.sub(r"\s+", " ", input_str)
 285 | 
 286 |     # Remove invalid characters
 287 |     valid_chars = "-_.() %s%s" % (string.ascii_letters, string.digits)
 288 |     input_str = input_str.replace("+"," ") # Canvas default for spaces
 289 |     input_str = input_str.replace(":","-")
 290 |     input_str = input_str.replace("/","-")
 291 |     input_str = "".join(c for c in input_str if c in valid_chars)
 292 | 
 293 |     # Remove leading and trailing whitespace
 294 |     input_str = input_str.strip()
 295 | 
 296 |     # Remove trailing periods
 297 |     input_str = input_str.rstrip(".")
 298 | 
 299 |     return input_str
 300 | 
 301 | def makeValidFolderPath(input_str):
 302 |     # Normalize Unicode and whitespace
 303 |     input_str = unicodedata.normalize('NFKC', input_str)
 304 |     input_str = input_str.replace("\u00A0", " ") # NBSP to space
 305 |     input_str = re.sub(r"\s+", " ", input_str)
 306 | 
 307 |     # Remove invalid characters
 308 |     valid_chars = "-_.()/ %s%s" % (string.ascii_letters, string.digits)
 309 |     input_str = input_str.replace("+"," ") # Canvas default for spaces
 310 |     input_str = input_str.replace(":","-")
 311 |     input_str = "".join(c for c in input_str if c in valid_chars)
 312 | 
 313 |     # Remove leading and trailing whitespace, separators
 314 |     input_str = input_str.strip().strip("/").strip("\\")
 315 | 
 316 |     # Remove trailing periods
 317 |     input_str = input_str.rstrip(".")
 318 | 
 319 |     # Replace path separators with OS default
 320 |     input_str=input_str.replace("/",os.sep)
 321 | 
 322 |     return input_str
 323 | 
 324 | def shortenFileName(string, shorten_by) -> str:
 325 |     if (not string or shorten_by <= 0):
 326 |         return string
 327 | 
 328 |     # Shorten string by specified value + 1 for "-" to indicate incomplete file name (trailing periods not allowed)
 329 |     string = string[:len(string)-(shorten_by + 1)]
 330 | 
 331 |     string = string.rstrip().rstrip(".").rstrip("-")
 332 |     string += "-"
 333 |     
 334 |     return string
 335 | 
 336 | 
 337 | def findCourseModules(course, course_view):
 338 |     modules_dir = os.path.join(DL_LOCATION, course_view.term,
 339 |                                course_view.course_code, "modules")
 340 | 
 341 |     # Create modules directory if not present
 342 |     if not os.path.exists(modules_dir):
 343 |         os.makedirs(modules_dir)
 344 | 
 345 |     module_views = []
 346 | 
 347 |     try:
 348 |         modules = course.get_modules()
 349 |         modules_list = list(modules)  # Convert to list to get count
 350 |         
 351 |         if not modules_list:
 352 |             print("    No modules found in this course")
 353 |         else:
 354 |             print(f"    Found {len(modules_list)} modules")
 355 | 
 356 |         for module in modules_list:
 357 |             module_view = moduleView()
 358 | 
 359 |             # ID
 360 |             module_view.id = module.id if hasattr(module, "id") else 0
 361 | 
 362 |             # Name
 363 |             module_view.name = str(module.name) if hasattr(module, "name") else ""
 364 |             print(f"      Processing module: {module_view.name}")
 365 | 
 366 |             try:
 367 |                 # Get module items
 368 |                 module_items = module.get_module_items()
 369 |                 module_items_list = list(module_items)
 370 |                 
 371 |                 if module_items_list:
 372 |                     print(f"        Found {len(module_items_list)} items")
 373 |                 
 374 |                 for module_item in module_items_list:
 375 |                     module_item_view = moduleItemView()
 376 | 
 377 |                     # ID
 378 |                     module_item_view.id = module_item.id if hasattr(module_item, "id") else 0
 379 | 
 380 |                     # Title
 381 |                     module_item_view.title = str(module_item.title) if hasattr(module_item, "title") else ""
 382 |                     # Type
 383 |                     module_item_view.content_type = str(module_item.type) if hasattr(module_item, "type") else ""
 384 | 
 385 |                     # URL
 386 |                     module_item_view.url = str(module_item.html_url) if hasattr(module_item, "html_url") else ""
 387 |                     # External URL
 388 |                     module_item_view.external_url = str(module_item.external_url) if hasattr(module_item, "external_url") else ""
 389 | 
 390 |                     if module_item_view.content_type == "File":
 391 |                         # If problems arise due to long pathnames, changing module.name to module.id might help
 392 |                         # A change would also have to be made in downloadCourseModulePages(api_url, course_view, cookies_path)
 393 |                         module_name = makeValidFilename(str(module.name))
 394 |                         module_name = shortenFileName(module_name, len(module_name) - MAX_FOLDER_NAME_SIZE)
 395 |                         module_dir = os.path.join(modules_dir, module_name, "files")
 396 | 
 397 |                         try:
 398 |                             # Create directory for current module if not present
 399 |                             if not os.path.exists(module_dir):
 400 |                                 os.makedirs(module_dir)
 401 | 
 402 |                             # Get the file object
 403 |                             module_file = course.get_file(str(module_item.content_id))
 404 | 
 405 |                             # Create path for module file download
 406 |                             module_file_path = os.path.join(module_dir, makeValidFilename(str(module_file.display_name)))
 407 | 
 408 |                             # Download file if it doesn't already exist
 409 |                             if not os.path.exists(module_file_path):
 410 |                                 module_file.download(module_file_path)
 411 |                                 extraction_stats.files_downloaded += 1
 412 |                                 print(f"        Downloaded: {module_file.display_name}")
 413 |                             else:
 414 |                                 print(f"        File already exists: {module_file.display_name}")
 415 |                         except Exception as e:
 416 |                             error_type, message = CanvasErrorHandler.handle_canvas_exception(
 417 |                                 e, "module file download"
 418 |                             )
 419 |                             if error_type == "student_limitation":
 420 |                                 extraction_stats.student_limitation_warnings += 1
 421 |                             elif error_type == "not_found":
 422 |                                 pass  # Already handled by log_error
 423 |                             else:
 424 |                                 extraction_stats.error_count += 1
 425 |                             CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 426 | 
 427 |                     module_view.items.append(module_item_view)
 428 |                     extraction_stats.module_items_found += 1
 429 |             except Exception as e:
 430 |                 error_type, message = CanvasErrorHandler.handle_canvas_exception(
 431 |                     e, "module item processing"
 432 |                 )
 433 |                 CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 434 |                 extraction_stats.error_count += 1
 435 | 
 436 |             module_views.append(module_view)
 437 |             extraction_stats.modules_found += 1
 438 | 
 439 |     except Exception as e:
 440 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 441 |             e, "module processing"
 442 |         )
 443 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 444 |         extraction_stats.error_count += 1
 445 | 
 446 |     return module_views
 447 | 
 448 | 
 449 | def downloadCourseFiles(course, course_view):
 450 |     # The full_name of each Canvas folder starts with "course files"
 451 |     dl_dir = os.path.join(DL_LOCATION, course_view.term,
 452 |                           course_view.course_code)
 453 | 
 454 |     # Create directory if not present
 455 |     if not os.path.exists(dl_dir):
 456 |         os.makedirs(dl_dir)
 457 | 
 458 |     try:
 459 |         files = course.get_files()
 460 |         files_list = list(files)  # Convert to list for consistency and count
 461 | 
 462 |         for file in files_list:
 463 |             file_folder=course.get_folder(file.folder_id)
 464 |             
 465 |             folder_dl_dir=os.path.join(dl_dir, makeValidFolderPath(file_folder.full_name))
 466 |             
 467 |             if not os.path.exists(folder_dl_dir):
 468 |                 os.makedirs(folder_dl_dir)
 469 |         
 470 |             dl_path = os.path.join(folder_dl_dir, makeValidFilename(str(file.display_name)))
 471 |             
 472 |             print(f"    Downloading: {file.display_name}...")
 473 |             if not os.path.exists(dl_path):
 474 |                 try:
 475 |                     file.download(dl_path)
 476 |                     extraction_stats.files_downloaded += 1
 477 |                     print(f"      ✓ Saved: {file.display_name}")
 478 |                 except Exception as e:
 479 |                     error_type, message = CanvasErrorHandler.handle_canvas_exception(e, f"file download for {file.display_name}")
 480 |                     CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 481 |                     extraction_stats.error_count += 1
 482 |             else:
 483 |                 print(f"      ✓ Already exists: {file.display_name}")
 484 | 
 485 |     except Exception as e:
 486 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 487 |             e, "course file download"
 488 |         )
 489 |         if error_type == "student_limitation":
 490 |             extraction_stats.student_limitation_warnings += 1
 491 |         else:
 492 |             extraction_stats.error_count += 1
 493 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 494 | 
 495 | 
 496 | def download_submission_attachments(course, course_view):
 497 |     course_dir = os.path.join(DL_LOCATION, course_view.term,
 498 |                               course_view.course_code)
 499 | 
 500 |     # Create directory if not present
 501 |     if not os.path.exists(course_dir):
 502 |         os.makedirs(course_dir)
 503 | 
 504 |     for assignment in course_view.assignments:
 505 |         for submission in assignment.submissions:
 506 |             assignment_title = makeValidFilename(str(assignment.title))
 507 |             assignment_title = shortenFileName(assignment_title, len(assignment_title) - MAX_FOLDER_NAME_SIZE)
 508 |             attachment_dir = os.path.join(course_dir, "assignments", assignment_title)
 509 |             if(len(assignment.submissions)!=1):
 510 |                 attachment_dir = os.path.join(attachment_dir,str(submission.user_id))
 511 |             if (not os.path.exists(attachment_dir)) and (submission.attachments):
 512 |                 os.makedirs(attachment_dir)
 513 |             for attachment in submission.attachments:
 514 |                 filepath = os.path.join(attachment_dir, makeValidFilename(str(attachment.id) +
 515 |                                         "_" + attachment.filename))
 516 |                 
 517 |                 print(f"    Downloading attachment: {attachment.filename}...")
 518 |                 if not os.path.exists(filepath):
 519 |                     try:
 520 |                         r = requests.get(attachment.url, allow_redirects=True, timeout=60)  # Timeout so a stalled connection cannot hang the export
 521 |                         r.raise_for_status()
 522 |                         with open(filepath, 'wb') as f:
 523 |                             f.write(r.content)
 524 |                         extraction_stats.attachments_downloaded += 1
 525 |                         print(f"      ✓ Saved: {attachment.filename}")
 526 |                     except Exception as e:
 527 |                         print(f"      ❌ Failed to download {attachment.filename}: {e}")
 528 |                         extraction_stats.error_count += 1
 529 |                 else:
 530 |                     print(f"      ✓ Already exists: {attachment.filename}")
 531 | 
 532 | 
 533 | def getCoursePageUrls(course):
 534 |     page_urls = []
 535 | 
 536 |     try:
 537 |         # Get all pages
 538 |         pages = course.get_pages()
 539 | 
 540 |         for page in pages:
 541 |             if hasattr(page, "url"):
 542 |                 page_urls.append(str(page.url))
 543 |     except Exception as e:
 544 |         error_msg = str(e)
 545 |         if "Not Found" not in error_msg:
 546 |             error_type, message = CanvasErrorHandler.handle_canvas_exception(
 547 |                 e, "page URL retrieval"
 548 |             )
 549 |             CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 550 |             if error_type != "student_limitation":
 551 |                 extraction_stats.error_count += 1
 552 |             else:
 553 |                 extraction_stats.student_limitation_warnings += 1
 554 | 
 555 |     return page_urls
 556 | 
 557 | 
 558 | def findCoursePages(course):
 559 |     page_views = []
 560 | 
 561 |     try:
 562 |         # Get all page URLs
 563 |         page_urls = getCoursePageUrls(course)
 564 | 
 565 |         for url in page_urls:
 566 |             page = course.get_page(url)
 567 | 
 568 |             page_view = pageView()
 569 | 
 570 |             # ID
 571 |             page_view.id = page.id if hasattr(page, "id") else 0
 572 | 
 573 |             # Title
 574 |             page_view.title = str(page.title) if hasattr(page, "title") else ""
 575 |             # Body
 576 |             page_view.body = str(page.body) if hasattr(page, "body") else ""
 577 |             # Date created
 578 |             try:
 579 |                 page_view.created_date = dateutil.parser.parse(page.created_at).strftime(DATE_TEMPLATE) if \
 580 |                     hasattr(page, "created_at") else ""
 581 |             except (ValueError, TypeError):
 582 |                 page_view.created_date = ""
 583 |                 
 584 |             # Date last updated
 585 |             try:
 586 |                 page_view.last_updated_date = dateutil.parser.parse(page.updated_at).strftime(DATE_TEMPLATE) if \
 587 |                     hasattr(page, "updated_at") else ""
 588 |             except (ValueError, TypeError):
 589 |                 page_view.last_updated_date = ""
 590 | 
 591 |             page_views.append(page_view)
 592 |             extraction_stats.pages_found += 1
 593 |     except Exception as e:
 594 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 595 |             e, "page download"
 596 |         )
 597 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 598 |         extraction_stats.error_count += 1
 599 | 
 600 |     return page_views
 601 | 
 602 | 
 603 | def findCourseAssignments(course):
 604 |     assignment_views = []
 605 | 
 606 |     try:
 607 |         # Get all assignments; list() issues the API requests, so keep it inside the try
 608 |         assignments = course.get_assignments()
 609 |         assignments_list = list(assignments)  # Convert to list for consistency
 610 | 
 611 |         for assignment in assignments_list:
 612 |             # Create a new assignment view
 613 |             assignment_view = assignmentView()
 614 | 
 615 |             # ID
 616 |             assignment_view.id = assignment.id if \
 617 |                 hasattr(assignment, "id") else 0
 618 | 
 619 |             # Title
 620 |             assignment_view.title = makeValidFilename(str(assignment.name)) if \
 621 |                 hasattr(assignment, "name") else ""
 622 |             # Description
 623 |             assignment_view.description = str(assignment.description) if \
 624 |                 hasattr(assignment, "description") else ""
 625 |             
 626 |             # Assigned date
 627 |             try:
 628 |                 assignment_view.assigned_date = dateutil.parser.parse(assignment.created_at).strftime(DATE_TEMPLATE) if \
 629 |                     hasattr(assignment, "created_at") and assignment.created_at else ""
 630 |             except (ValueError, TypeError):
 631 |                 assignment_view.assigned_date = ""
 632 |             
 633 |             # Due date
 634 |             try:
 635 |                 assignment_view.due_date = dateutil.parser.parse(assignment.due_at).strftime(DATE_TEMPLATE) if \
 636 |                     hasattr(assignment, "due_at") and assignment.due_at else ""
 637 |             except (ValueError, TypeError):
 638 |                 assignment_view.due_date = ""
 639 | 
 640 |             # HTML Url
 641 |             assignment_view.html_url = assignment.html_url if \
 642 |                 hasattr(assignment, "html_url") else ""   
 643 |             # External URL
 644 |             assignment_view.ext_url = str(assignment.url) if \
 645 |                 hasattr(assignment, "url") else ""
 646 |             # Other URL (more up-to-date)
 647 |             assignment_view.updated_url = str(assignment.submissions_download_url).split("submissions?")[0] if \
 648 |                 hasattr(assignment, "submissions_download_url") else ""
 649 | 
 650 |             try:
 651 |                 try: # Download all submissions for entire class
 652 |                     submissions = assignment.get_submissions()
 653 |                     submissions[0] # Trigger Unauthorized if not allowed
 654 |                 except (Unauthorized, Forbidden) as e:
 655 |                     error_type, message = CanvasErrorHandler.handle_canvas_exception(
 656 |                         e, "class submission download"
 657 |                     )
 658 |                     if error_type == "student_limitation":
 659 |                         extraction_stats.student_limitation_warnings += 1
 660 |                         if extraction_stats.student_limitation_warnings == 1:
 661 |                             print(f"    Note: Not authorized to download every student's assignment submission. Downloading submission for user {USER_ID} only.")
 662 |                     else:
 663 |                         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 664 |                         extraction_stats.error_count += 1
 665 |                     
 666 |                     # Download submission for this user only
 667 |                     submissions = [assignment.get_submission(USER_ID)]
 668 |                 submissions[0]  # Raise IndexError if no submissions were returned at all
 669 |             except (ResourceDoesNotExist, NameError, IndexError) as e:
 670 |                 error_type, message = CanvasErrorHandler.handle_canvas_exception(
 671 |                     e, "submission retrieval"
 672 |                 )
 673 |                 CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 674 |                 extraction_stats.error_count += 1
 675 |             except Exception as e:
 676 |                 error_type, message = CanvasErrorHandler.handle_canvas_exception(
 677 |                     e, "submission retrieval"
 678 |                 )
 679 |                 CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 680 |                 extraction_stats.error_count += 1
 681 |             else:
 682 |                 try:
 683 |                     for submission in submissions:
 684 | 
 685 |                         sub_view = submissionView()
 686 | 
 687 |                         # Submission ID
 688 |                         sub_view.id = submission.id if \
 689 |                             hasattr(submission, "id") else 0
 690 |                             
 691 |                         # My grade
 692 |                         sub_view.grade = str(submission.grade) if \
 693 |                             hasattr(submission, "grade") else ""
 694 |                         # My raw score
 695 |                         sub_view.raw_score = str(submission.score) if \
 696 |                             hasattr(submission, "score") else ""
 697 |                         # Total possible score
 698 |                         sub_view.total_possible_points = str(assignment.points_possible) if \
 699 |                             hasattr(assignment, "points_possible") else ""
 700 |                         # Submission comments
 701 |                         sub_view.submission_comments = str(submission.submission_comments) if \
 702 |                             hasattr(submission, "submission_comments") else ""
 703 |                         # Attempt
 704 |                         sub_view.attempt = submission.attempt if \
 705 |                             hasattr(submission, "attempt") and submission.attempt is not None else 0
 706 |                         # User ID
 707 |                         sub_view.user_id = str(submission.user_id) if \
 708 |                             hasattr(submission, "user_id") else ""
 709 |                             
 710 |                         # Submission URL
 711 |                         sub_view.preview_url = str(submission.preview_url) if \
 712 |                             hasattr(submission, "preview_url") else ""
 713 |                         # External URL
 714 |                         sub_view.ext_url = str(submission.url) if \
 715 |                             hasattr(submission, "url") else ""
 716 | 
 717 |                         try:
 718 |                             submission.attachments
 719 |                         except AttributeError:
 720 |                             pass  # No attachments message removed for cleaner output
 721 |                         else:
 722 |                             attachment_count = len(submission.attachments) if submission.attachments else 0
 723 |                             if attachment_count > 0:
 724 |                                 print(f"        Found {attachment_count} attachments")
 725 |                             for attachment in submission.attachments or []:
 726 |                                 attach_view = attachmentView()
 727 |                                 attach_view.url = attachment.url
 728 |                                 attach_view.id = attachment.id
 729 |                                 attach_view.filename = attachment.filename
 730 |                                 sub_view.attachments.append(attach_view)
 731 |                         assignment_view.submissions.append(sub_view)
 732 |                         extraction_stats.submissions_found += 1
 733 |                 except Exception as e:
 734 |                     error_type, message = CanvasErrorHandler.handle_canvas_exception(
 735 |                         e, "submission processing"
 736 |                     )
 737 |                     CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 738 |                     extraction_stats.error_count += 1
 739 | 
 740 |             assignment_views.append(assignment_view)
 741 |             extraction_stats.assignments_found += 1
 742 |     except Exception as e:
 743 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 744 |             e, "course assignments processing"
 745 |         )
 746 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 747 |         extraction_stats.error_count += 1
 748 | 
 749 |     return assignment_views
 750 | 
 751 | 
 752 | def findCourseAnnouncements(course):
 753 |     announcement_views = []
 754 | 
 755 |     try:
 756 |         announcements = course.get_discussion_topics(only_announcements=True)
 757 | 
 758 |         for announcement in announcements:
 759 |             discussion_view = getDiscussionView(announcement)
 760 | 
 761 |             announcement_views.append(discussion_view)
 762 |             extraction_stats.announcements_found += 1
 763 |     except Exception as e:
 764 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 765 |             e, "announcement processing"
 766 |         )
 767 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 768 |         extraction_stats.error_count += 1
 769 | 
 770 |     return announcement_views
 771 | 
 772 | 
 773 | def getDiscussionView(discussion_topic):
 774 |     # Create discussion view
 775 |     discussion_view = discussionView()
 776 | 
 777 |     # ID
 778 |     discussion_view.id = discussion_topic.id if hasattr(discussion_topic, "id") else 0
 779 | 
 780 |     # Title
 781 |     discussion_view.title = str(discussion_topic.title) if hasattr(discussion_topic, "title") else ""
 782 |     # Author
 783 |     discussion_view.author = str(discussion_topic.user_name) if hasattr(discussion_topic, "user_name") else ""
 784 |     # Posted date
 785 |     try:
 786 |         discussion_view.posted_date = dateutil.parser.parse(discussion_topic.created_at).strftime("%B %d, %Y %I:%M %p") if \
 787 |             hasattr(discussion_topic, "created_at") and discussion_topic.created_at else ""
 788 |     except (ValueError, TypeError):
 789 |         discussion_view.posted_date = ""
 790 |     # Body
 791 |     discussion_view.body = str(discussion_topic.message) if hasattr(discussion_topic, "message") else ""
 792 | 
 793 |     # URL
 794 |     discussion_view.url = str(discussion_topic.html_url) if hasattr(discussion_topic, "html_url") else ""
 795 |     
 796 |     # Keeps track of how many topic_entries there are.
 797 |     topic_entries_counter = 0
 798 | 
 799 |     # Topic entries
 800 |     if hasattr(discussion_topic, "discussion_subentry_count") and discussion_topic.discussion_subentry_count > 0:
 801 |         # Need to get replies to entries recursively?
 802 | 
 803 |         discussion_topic_entries = discussion_topic.get_topic_entries()
 804 | 
 805 |         try:
 806 |             for topic_entry in discussion_topic_entries:
 807 |                 topic_entries_counter += 1
 808 |                 
 809 |                 # Create new discussion view for the topic_entry
 810 |                 topic_entry_view = topicEntryView()
 811 | 
 812 |                 # ID
 813 |                 topic_entry_view.id = topic_entry.id if hasattr(topic_entry, "id") else 0
 814 |                 # Author
 815 |                 topic_entry_view.author = str(topic_entry.user_name) if hasattr(topic_entry, "user_name") else ""
 816 |                 # Posted date
 817 |                 try:
 818 |                     topic_entry_view.posted_date = dateutil.parser.parse(topic_entry.created_at).strftime("%B %d, %Y %I:%M %p") if \
 819 |                         hasattr(topic_entry, "created_at") and topic_entry.created_at else ""
 820 |                 except (ValueError, TypeError):
 821 |                     topic_entry_view.posted_date = ""
 822 |                 # Body
 823 |                 topic_entry_view.body = str(topic_entry.message) if hasattr(topic_entry, "message") else ""
 824 | 
 825 |                 # Get this topic's replies
 826 |                 topic_entry_replies = topic_entry.get_replies()
 827 | 
 828 |                 try:
 829 |                     for topic_reply in topic_entry_replies:
 830 |                         # Create new topic reply view
 831 |                         topic_reply_view = topicReplyView()
 832 |                         
 833 |                         # ID
 834 |                         topic_reply_view.id = topic_reply.id if hasattr(topic_reply, "id") else 0
 835 | 
 836 |                         # Author
 837 |                         topic_reply_view.author = str(topic_reply.user_name) if hasattr(topic_reply, "user_name") else ""
 838 |                         # Posted Date
 839 |                         try:
 840 |                             topic_reply_view.posted_date = dateutil.parser.parse(topic_reply.created_at).strftime("%B %d, %Y %I:%M %p") if \
 841 |                                 hasattr(topic_reply, "created_at") and topic_reply.created_at else ""
 842 |                         except (ValueError, TypeError):
 843 |                             topic_reply_view.posted_date = ""
 844 |                         # Body
 845 |                         topic_reply_view.body = str(topic_reply.message) if hasattr(topic_reply, "message") else ""
 846 | 
 847 |                         topic_entry_view.topic_replies.append(topic_reply_view)
 848 |                 except Exception as e:
 849 |                     error_type, message = CanvasErrorHandler.handle_canvas_exception(
 850 |                         e, "discussion topic reply processing"
 851 |                     )
 852 |                     CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 853 |                     if error_type == "student_limitation":
 854 |                         extraction_stats.student_limitation_warnings += 1
 855 |                     elif error_type == "not_found":
 856 |                         pass  # Already handled by log_error
 857 |                     else:
 858 |                         extraction_stats.error_count += 1
 859 | 
 860 |                 discussion_view.topic_entries.append(topic_entry_view)
 861 |         except Exception as e:
 862 |             error_type, message = CanvasErrorHandler.handle_canvas_exception(
 863 |                 e, "discussion topic entry processing"
 864 |             )
 865 |             CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 866 |             if error_type == "student_limitation":
 867 |                 extraction_stats.student_limitation_warnings += 1
 868 |             elif error_type == "not_found":
 869 |                 pass  # Already handled by log_error
 870 |             else:
 871 |                 extraction_stats.error_count += 1
 872 |         
 873 |     # Amount of pages  
 874 |     discussion_view.amount_pages = max(1, (topic_entries_counter + 49) // 50) # Typically 50 topic entries are stored per page; ceiling division avoids an extra page at exact multiples of 50.
 875 |     
 876 |     return discussion_view
 877 | 
 878 | 
 879 | def findCourseDiscussions(course):
 880 |     discussion_views = []
 881 | 
 882 |     try:
 883 |         discussion_topics = course.get_discussion_topics()
 884 | 
 885 |         for discussion_topic in discussion_topics:
 886 |             discussion_view = None
 887 |             discussion_view = getDiscussionView(discussion_topic)
 888 | 
 889 |             discussion_views.append(discussion_view)
 890 |             extraction_stats.discussions_found += 1
 891 |     except Exception as e:
 892 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
 893 |             e, "discussion processing"
 894 |         )
 895 |         CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
 896 |         extraction_stats.error_count += 1
 897 | 
 898 |     return discussion_views
 899 | 
 900 | 
 901 | def getCourseView(course):
 902 |     course_view = courseView()
 903 | 
 904 |     # Course ID
 905 |     course_view.course_id = course.id if hasattr(course, "id") else 0
 906 | 
 907 |     # Course term
 908 |     course_view.term = makeValidFilename(course.term.name if hasattr(course, "term") and hasattr(course.term, "name") else "")
 909 | 
 910 |     # Course code
 911 |     course_view.course_code = makeValidFilename(course.course_code if hasattr(course, "course_code") else "")
 912 | 
 913 |     # Course name
 914 |     course_view.name = course.name if hasattr(course, "name") else ""
 915 | 
 916 |     print(f"Working on: {course_view.term}: {course_view.name}")
 917 | 
 918 |     # Track HTML pages saved per course
 919 |     html_pages_saved_in_course = 0
 920 | 
 921 |     # Course assignments
 922 |     print("  Getting assignments")
 923 |     course_view.assignments = findCourseAssignments(course)
 924 |     print(f"    Found {len(course_view.assignments)} assignments")
 925 | 
 926 |     # Course announcements
 927 |     print("  Getting announcements")
 928 |     course_view.announcements = findCourseAnnouncements(course)
 929 |     print(f"    Found {len(course_view.announcements)} announcements")
 930 | 
 931 |     # Course discussions
 932 |     print("  Getting discussions")
 933 |     course_view.discussions = findCourseDiscussions(course)
 934 |     print(f"    Found {len(course_view.discussions)} discussions")
 935 | 
 936 |     # Course pages
 937 |     print("  Getting pages")
 938 |     course_view.pages = findCoursePages(course)
 939 |     print(f"    Found {len(course_view.pages)} pages")
 940 | 
 941 |     return course_view
 942 | 
 943 | 
 944 | def exportAllCourseData(course_view):
 945 |     json_str = json.dumps(json.loads(jsonpickle.encode(course_view, unpicklable = False)), indent = 4)
 946 | 
 947 |     course_output_dir = os.path.join(DL_LOCATION, course_view.term,
 948 |                                      course_view.course_code)
 949 | 
 950 |     # Create directory if not present
 951 |     if not os.path.exists(course_output_dir):
 952 |         os.makedirs(course_output_dir)
 953 | 
 954 |     course_output_path = os.path.join(course_output_dir,
 955 |                                       course_view.course_code + ".json")
 956 | 
 957 |     print(f"    Exporting JSON data for {course_view.course_code}...")
 958 |     with open(course_output_path, "w") as out_file:
 959 |         out_file.write(json_str)
 960 |         
 961 |     extraction_stats.json_files_created += 1
 962 |     print(f"      ✓ Data saved to: {course_output_path}")
 963 | 
 964 | def _download_page_if_not_exists(url, output_path, cookies_path, additional_args=(), verbose=False):
 965 |     """
 966 |     Downloads a single HTML page if it doesn't already exist, updating stats.
 967 |     Returns True if the page was downloaded or already exists, False otherwise.
 968 |     """
 969 |     global stop_html_downloads
 970 |     if stop_html_downloads:
 971 |         return False
 972 |         
 973 |     filename = os.path.basename(output_path)
 974 |     print(f"    Downloading: {filename}...")
 975 | 
 976 |     if not os.path.exists(output_path):
 977 |         output_dir = os.path.dirname(output_path)
 978 |         os.makedirs(output_dir, exist_ok=True)
 979 |         
 980 |         try:
 981 |             download_page(url, cookies_path, output_dir, filename, additional_args, verbose)
 982 |             extraction_stats.html_pages_downloaded += 1
 983 |             print(f"      ✓ Saved: {filename}")
 984 |             return True
 985 |         except Exception as e:
 986 |             print(f"      ❌ Failed: {e}")
 987 |             extraction_stats.error_count += 1
 988 |             if "Authentication failed" in str(e):
 989 |                 print("      Stopping all subsequent HTML downloads.")
 990 |                 stop_html_downloads = True
 991 |             return False
 992 |     else:
 993 |         print(f"      ✓ Already exists: {filename}")
 994 |         return True # Return True because the file exists, which is a success condition for the caller
 995 | 
 996 | def downloadCourseHTML(api_url, cookies_path, verbose=False):
 997 |     if not cookies_path or stop_html_downloads:
 998 |         return 0
 999 |     
1000 |     course_list_path = os.path.join(DL_LOCATION, "course_list.html")
1001 |     url = f"{api_url}/courses/"
1002 |     
1003 |     if _download_page_if_not_exists(url, course_list_path, cookies_path, verbose=verbose):
1004 |         return 1
1005 |     return 0
1006 | 
1007 | def downloadCourseHomePageHTML(api_url, course_view, cookies_path, verbose=False):
1008 |     if not cookies_path or stop_html_downloads:
1009 |         return 0
1010 | 
1011 |     dl_dir = os.path.join(DL_LOCATION, course_view.term, course_view.course_code)
1012 |     homepage_path = os.path.join(dl_dir, "homepage.html")
1013 |     url = f"{api_url}/courses/{course_view.course_id}"
1014 |     
1015 |     if _download_page_if_not_exists(url, homepage_path, cookies_path, verbose=verbose):
1016 |         return 1
1017 |     return 0
1018 | 
1019 | def downloadCourseGradesHTML(api_url, course_view, cookies_path, verbose=False):
1020 |     if not cookies_path or stop_html_downloads:
1021 |         return 0
1022 | 
1023 |     dl_dir = os.path.join(DL_LOCATION, course_view.term,
1024 |                          course_view.course_code)
1025 |     grades_path = os.path.join(dl_dir, "grades.html")
1026 |     url = f"{api_url}/courses/{course_view.course_id}/grades"
1027 |     additional_args = ("--remove-hidden-elements=false",)
1028 | 
1029 |     if _download_page_if_not_exists(url, grades_path, cookies_path, additional_args, verbose=verbose):
1030 |         # We only proceed with BeautifulSoup modifications if the file was newly downloaded or already existed.
1031 |         with open(grades_path, "r+t", encoding="utf-8") as grades_file:
1032 |             grades_html = BeautifulSoup(grades_file, "html.parser")
1033 | 
1034 |             button = grades_html.select_one("#show_all_details_button")
1035 |             if button is not None:
1036 |                 button_class = button.get_attribute_list("class", [])
1037 |                 if "showAll" not in button_class:
1038 |                     button_class.append("showAll")
1039 |                 button["class"] = button_class
1040 |                 button.string = "Hide All Details" # Unfortunately this cannot handle i18n.
1041 | 
1042 |             assignments = grades_html.select("tr.student_assignment.editable")
1043 |             for assignment in assignments:
1044 |                 assignment_id = str(assignment.get("id", "")).removeprefix("submission_")
1045 |                 muted = str(assignment.get("data-muted", "")).casefold() == "true"
1046 |                 if not muted:
1047 |                     for element in itertools.chain(
1048 |                         grades_html.select(f"#comments_thread_{assignment_id}"),
1049 |                         grades_html.select(f"#rubric_{assignment_id}"),
1050 |                         grades_html.select(f"#grade_info_{assignment_id}"),
1051 |                         grades_html.select(f"#final_grade_info_{assignment_id}"),
1052 |                         grades_html.select(f".parent_assignment_id_{assignment_id}"),
1053 |                     ):
1054 |                         element_style = str(element.get("style", ""))
1055 |                         element_style = re.sub(r"display:\s*none\s*;?", "", element_style)
1056 |                         element["style"] = element_style
1057 | 
1058 |                     assignment_arrow = grades_html.select_one(f"#parent_assignment_id_{assignment_id} i")
1059 |                     if assignment_arrow is not None:
1060 |                         assignment_arrow_class = [c for c in assignment_arrow.get_attribute_list("class", [])
1061 |                                                   if c not in ("icon-arrow-open-end", "icon-arrow-open-down")]
1062 |                         assignment_arrow_class.append("icon-arrow-open-down")
1063 |                         assignment_arrow["class"] = assignment_arrow_class
1064 | 
1065 |             grades_file.seek(0)
1066 |             grades_file.write(grades_html.prettify(formatter="html"))
1067 |             grades_file.truncate()
1068 |         return 1
1069 |     return 0
1070 |         
1071 | def downloadAssignmentPages(api_url, course_view, cookies_path, verbose=False):
1072 |     pages_saved = 0
1073 |     if not cookies_path or not course_view.assignments or stop_html_downloads:
1074 |         return pages_saved
1075 | 
1076 |     base_assign_dir = os.path.join(DL_LOCATION, course_view.term,
1077 |         course_view.course_code, "assignments")
1078 | 
1079 |     # Download assignment list page
1080 |     assignment_list_path = os.path.join(base_assign_dir, "assignment_list.html")
1081 |     list_url = f"{api_url}/courses/{course_view.course_id}/assignments/"
1082 |     if _download_page_if_not_exists(list_url, assignment_list_path, cookies_path, verbose=verbose):
1083 |         pages_saved += 1
1084 | 
1085 |     for assignment in course_view.assignments:
1086 |         assignment_title = makeValidFilename(str(assignment.title))
1087 |         assignment_title = shortenFileName(assignment_title, len(assignment_title) - MAX_FOLDER_NAME_SIZE)  
1088 |         assign_dir = os.path.join(base_assign_dir, assignment_title)
1089 | 
1090 |         if assignment.html_url:
1091 |             assignment_page_path = os.path.join(assign_dir, "assignment.html")
1092 |             if _download_page_if_not_exists(assignment.html_url, assignment_page_path, cookies_path, verbose=verbose):
1093 |                 pages_saved += 1
1094 | 
1095 |         for submission in assignment.submissions:
1096 |             submission_dir = assign_dir
1097 | 
1098 |             if len(assignment.submissions) != 1:
1099 |                 submission_dir = os.path.join(assign_dir, str(submission.user_id))
1100 | 
1101 |             if submission.preview_url:
1102 |                 submission_page_path = os.path.join(submission_dir, "submission.html")
1103 |                 if _download_page_if_not_exists(submission.preview_url, submission_page_path, cookies_path, verbose=verbose):
1104 |                     pages_saved += 1
1105 | 
1106 |             if (submission.attempt and submission.attempt > 1 and assignment.updated_url and assignment.html_url 
1107 |                 and assignment.html_url.rstrip("/") != assignment.updated_url.rstrip("/")):
1108 |                 attempts_dir = os.path.join(assign_dir, "attempts")
1109 |                 
1110 |                 for i in range(submission.attempt):
1111 |                     filename = f"attempt_{i+1}.html"
1112 |                     attempt_path = os.path.join(attempts_dir, filename)
1113 |                     attempt_url = f"{assignment.updated_url}/history?version={i+1}"
1114 |                     if _download_page_if_not_exists(attempt_url, attempt_path, cookies_path, verbose=verbose):
1115 |                         pages_saved += 1
1116 |     return pages_saved
1117 | 
1118 | def downloadCourseModulePages(api_url, course_view, cookies_path, verbose=False): 
1119 |     pages_saved = 0
1120 |     if not cookies_path or not course_view.modules or stop_html_downloads:
1121 |         return pages_saved
1122 | 
1123 |     modules_dir = os.path.join(DL_LOCATION, course_view.term,
1124 |         course_view.course_code, "modules")
1125 | 
1126 |     # Downloads the modules page
1127 |     module_list_path = os.path.join(modules_dir, "modules_list.html")
1128 |     list_url = f"{api_url}/courses/{course_view.course_id}/modules/"
1129 |     if _download_page_if_not_exists(list_url, module_list_path, cookies_path, verbose=verbose):
1130 |         pages_saved += 1
1131 | 
1132 |     for module in course_view.modules:
1133 |         for item in module.items:
1134 |             module_name = makeValidFilename(str(module.name))
1135 |             module_name = shortenFileName(module_name, len(module_name) - MAX_FOLDER_NAME_SIZE)
1136 |             items_dir = os.path.join(modules_dir, module_name)
1137 |             
1138 |             if item.url:
1139 |                 filename = makeValidFilename(str(item.title)) + ".html"
1140 |                 module_item_path = os.path.join(items_dir, filename)
1141 |                 if _download_page_if_not_exists(item.url, module_item_path, cookies_path, verbose=verbose):
1142 |                     pages_saved += 1
1143 |     return pages_saved
1144 | 
1145 | def downloadCourseAnnouncementPages(api_url, course_view, cookies_path, verbose=False):
1146 |     pages_saved = 0
1147 |     if not cookies_path or not course_view.announcements or stop_html_downloads:
1148 |         return pages_saved
1149 | 
1150 |     base_announce_dir = os.path.join(DL_LOCATION, course_view.term,
1151 |         course_view.course_code, "announcements")
1152 | 
1153 |     # Download announcement list
1154 |     announcement_list_path = os.path.join(base_announce_dir, "announcement_list.html")
1155 |     list_url = f"{api_url}/courses/{course_view.course_id}/announcements/"
1156 |     if _download_page_if_not_exists(list_url, announcement_list_path, cookies_path, verbose=verbose):
1157 |         pages_saved += 1
1158 | 
1159 |     for announcement in course_view.announcements:
1160 |         if not announcement.url:
1161 |             continue
1162 | 
1163 |         announcements_title = makeValidFilename(str(announcement.title))
1164 |         announcements_title = shortenFileName(announcements_title, len(announcements_title) - MAX_FOLDER_NAME_SIZE)
1165 |         announce_dir = os.path.join(base_announce_dir, announcements_title)
1166 | 
1167 |         if not os.path.exists(announce_dir):
1168 |             os.makedirs(announce_dir)
1169 | 
1170 |         for i in range(announcement.amount_pages):
1171 |             filename = f"announcement_{i+1}.html"
1172 |             page_path = os.path.join(announce_dir, filename)
1173 |             page_url = f"{announcement.url}/page-{i+1}"
1174 |             if _download_page_if_not_exists(page_url, page_path, cookies_path, verbose=verbose):
1175 |                 pages_saved += 1
1176 |     return pages_saved
1177 |         
1178 | def downloadCourseDiscussionPages(api_url, course_view, cookies_path, verbose=False):
1179 |     pages_saved = 0
1180 |     if not cookies_path or not course_view.discussions or stop_html_downloads:
1181 |         return pages_saved
1182 | 
1183 |     base_discussion_dir = os.path.join(DL_LOCATION, course_view.term,
1184 |         course_view.course_code, "discussions")
1185 | 
1186 |     # Download discussion list
1187 |     discussion_list_path = os.path.join(base_discussion_dir, "discussion_list.html")
1188 |     list_url = f"{api_url}/courses/{course_view.course_id}/discussion_topics/"
1189 |     if _download_page_if_not_exists(list_url, discussion_list_path, cookies_path, verbose=verbose):
1190 |         pages_saved += 1
1191 | 
1192 |     for discussion in course_view.discussions:
1193 |         if not discussion.url:
1194 |             continue
1195 | 
1196 |         discussion_title = makeValidFilename(str(discussion.title))
1197 |         discussion_title = shortenFileName(discussion_title, len(discussion_title) - MAX_FOLDER_NAME_SIZE)
1198 |         discussion_dir = os.path.join(base_discussion_dir, discussion_title)
1199 | 
1200 |         if not os.path.exists(discussion_dir):
1201 |             os.makedirs(discussion_dir)
1202 | 
1203 |         for i in range(discussion.amount_pages):
1204 |             filename = f"discussion_{i+1}.html"
1205 |             page_path = os.path.join(discussion_dir, filename)
1206 |             page_url = f"{discussion.url}/page-{i+1}"
1207 |             if _download_page_if_not_exists(page_url, page_path, cookies_path, verbose=verbose):
1208 |                 pages_saved += 1
1209 |     return pages_saved
1210 | 
1211 | if __name__ == "__main__":
1212 | 
1213 |     print("Welcome to the Canvas Student Data Export Tool\n")
1214 | 
1215 |     parser = argparse.ArgumentParser(description="Export nearly all of a student's Canvas LMS data.")
1216 |     parser.add_argument("-c", "--config", default="credentials.yaml", help="Path to YAML credentials file (default: credentials.yaml)")
1217 |     parser.add_argument("-o", "--output", default="./output", help="Directory to store exported data (default: ./output)")
1218 |     parser.add_argument("--singlefile", action="store_true", help="Enable HTML snapshot capture with SingleFile.")
1219 |     parser.add_argument("-v", "--verbose", action="store_true", help="Enable verbose output for debugging.")
1220 |     parser.add_argument("--version", action="version", version="Canvas Student Data Export Tool 1.0")
1221 | 
1222 |     args = parser.parse_args()
1223 | 
1224 |     # Load credentials from YAML
1225 |     creds = _load_credentials(args.config)
1226 |     
1227 |     # Validate credentials
1228 |     required = ["API_URL", "API_KEY", "USER_ID"]
1229 |     missing = [k for k in required if not creds.get(k)]
1230 | 
1231 |     # COOKIES_PATH is only required when --singlefile is enabled, so check for it separately.
1232 |     if args.singlefile:
1233 |         print("Note: --singlefile is enabled. Please ensure your browser cookies")
1234 |         print("      are fresh by logging into Canvas and then re-exporting")
1235 |         print("      them using the Chrome extension right before running this script.\n")
1236 |         input("Press Enter to continue...")
1237 |         if "COOKIES_PATH" not in creds or not creds["COOKIES_PATH"]:
1238 |             missing.append("COOKIES_PATH")
1239 | 
1240 |     if missing:
1241 |         print(f"Error: {args.config} is missing required field(s): {', '.join(missing)}.")
1242 |         print("Please create the YAML file with the following structure:\n"
1243 |               "API_URL: https://<your>.instructure.com\n"
1244 |               "API_KEY: <your key>\n"
1245 |               "USER_ID: 123456\n"
1246 |               "COOKIES_PATH: path/to/cookies.txt\n")
1247 |         sys.exit(1)
1248 | 
1249 |     # Populate globals expected throughout the script
1250 |     API_URL = creds["API_URL"].strip().rstrip('/')
1251 |     API_KEY = creds["API_KEY"].strip()  # Remove leading/trailing whitespace which is a common issue
1252 |     USER_ID = creds["USER_ID"]
1253 |     # Use .get() to safely access optional/conditionally required keys
1254 |     COOKIES_PATH = creds.get("COOKIES_PATH", "")
1255 |     COURSES_TO_SKIP = creds.get("COURSES_TO_SKIP", [])
1256 | 
1257 |     chrome_path_override = creds.get("CHROME_PATH")
1258 |     if chrome_path_override:
1259 |         override_chrome_path(chrome_path_override)
1260 | 
1261 |     # Optional: Override SingleFile capture timeout (in seconds)
1262 |     singlefile_timeout_override = creds.get("SINGLEFILE_TIMEOUT")
1263 |     if singlefile_timeout_override is not None:
1264 |         try:
1265 |             override_singlefile_timeout(float(singlefile_timeout_override))
1266 |         except (ValueError, TypeError):
1267 |             print(f"Warning: Invalid SINGLEFILE_TIMEOUT value in {args.config}; using default.")
1268 | 
1269 |     # Update output directory
1270 |     DL_LOCATION = args.output
1271 | 
1272 |     print("\nConnecting to Canvas…\n")
1273 | 
1274 |     # Initialize a new Canvas object
1275 |     canvas = Canvas(API_URL, API_KEY)
1276 |     
1277 |     # Test the connection and API key
1278 |     try:
1279 |         user = canvas.get_current_user()
1280 |         print(f"Successfully authenticated as: {user.name} (ID: {user.id})")
1281 |         if str(user.id) != str(USER_ID):  # USER_ID may be loaded from YAML as a string
1282 |             print(f"Warning: Authenticated user ID ({user.id}) does not match configured USER_ID ({USER_ID})")
1283 |     except Exception as e:
1284 |         error_type, message = CanvasErrorHandler.handle_canvas_exception(
1285 |             e, "Canvas authentication"
1286 |         )
1287 |         if CanvasErrorHandler.is_fatal_error(error_type):
1288 |             print(f"FATAL: {message}")
1289 |             sys.exit(1)
1290 |         else:
1291 |             CanvasErrorHandler.log_error(error_type, message, verbose=args.verbose)
1292 |  
1293 |     print(f"Creating output directory: {DL_LOCATION}\n")
1294 |     os.makedirs(DL_LOCATION, exist_ok=True)
1295 |  
1296 |     all_courses_views = []
1297 | 
1298 |     print("Getting list of all courses\n")
1299 |     courses_list = [
1300 |         canvas.get_courses(enrollment_state="active", include="term"),
1301 |         canvas.get_courses(enrollment_state="completed", include="term")
1302 |     ]
1303 | 
1304 |     skip = set(COURSES_TO_SKIP)
1305 | 
1306 | 
1307 |     if COOKIES_PATH and args.singlefile:
1308 |         print("  Downloading course list page")
1309 |         downloadCourseHTML(API_URL, COOKIES_PATH, verbose=args.verbose)
1310 | 
1311 |     for courses in courses_list:
1312 |         for course in courses:
1313 |             if course.id in skip or not hasattr(course, "name") or not hasattr(course, "term"):
1314 |                 continue
1315 |             
1316 |             html_pages_saved_in_course = 0
1317 | 
1318 |             course_view = getCourseView(course)
1319 | 
1320 |             all_courses_views.append(course_view)
1321 | 
1322 |             print("  Downloading all files")
1323 |             downloadCourseFiles(course, course_view)
1324 | 
1325 |             print("  Downloading submission attachments")
1326 |             download_submission_attachments(course, course_view)
1327 | 
1328 |             print("  Getting modules and downloading module files")
1329 |             course_view.modules = findCourseModules(course, course_view)
1330 | 
1331 |             if COOKIES_PATH and args.singlefile:
1332 |                 print("  Downloading course home page")
1333 |                 html_pages_saved_in_course += downloadCourseHomePageHTML(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)
1334 | 
1335 |                 print("  Downloading course grades")
1336 |                 html_pages_saved_in_course += downloadCourseGradesHTML(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)
1337 | 
1338 |                 print("  Downloading assignment pages")
1339 |                 html_pages_saved_in_course += downloadAssignmentPages(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)
1340 | 
1341 |                 print("  Downloading course module pages")
1342 |                 html_pages_saved_in_course += downloadCourseModulePages(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)
1343 | 
1344 |                 print("  Downloading course announcements pages")
1345 |                 html_pages_saved_in_course += downloadCourseAnnouncementPages(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)   
1346 | 
1347 |                 print("  Downloading course discussion pages")
1348 |                 html_pages_saved_in_course += downloadCourseDiscussionPages(API_URL, course_view, COOKIES_PATH, verbose=args.verbose)
1349 | 
1350 |             print("  Exporting all course data")
1351 |             exportAllCourseData(course_view)
1352 |             
1353 |             # Show mini-summary for this course
1354 |             assignments_count = len(course_view.assignments)
1355 |             submissions_count = sum(len(a.submissions) for a in course_view.assignments)
1356 |             modules_count = len(course_view.modules)
1357 |             pages_count = len(course_view.pages)
1358 |             announcements_count = len(course_view.announcements)
1359 |             discussions_count = len(course_view.discussions)
1360 |             
1361 |             print("  ✓ Course data exported:")
1362 |             print(f"    • {assignments_count} assignments with {submissions_count} submissions (JSON)")
1363 |             print(f"    • {modules_count} modules (JSON)")
1364 |             print(f"    • {pages_count} pages (JSON)")
1365 |             print(f"    • {announcements_count} announcements (JSON)")
1366 |             print(f"    • {discussions_count} discussions (JSON)")
1367 |             if COOKIES_PATH and args.singlefile:
1368 |                 print(f"    • {html_pages_saved_in_course} HTML snapshots saved")
1369 |             print()
1370 | 
1371 |     print("Exporting data from all courses combined as one file: "
1372 |           "all_output.json")
1373 |     json_str = jsonpickle.encode(all_courses_views, unpicklable=False, indent=4)
1374 | 
1375 |     all_output_path = os.path.join(DL_LOCATION, "all_output.json")
1376 | 
1377 |     with open(all_output_path, "w") as out_file:
1378 |         out_file.write(json_str)
1379 |     
1380 |     extraction_stats.json_files_created += 1
1381 |     print(f"Combined JSON data exported to: {all_output_path}")
1382 | 
1383 |     print("\nProcess complete. All Canvas data exported!")
1384 |     print(extraction_stats.summary(DL_LOCATION, singlefile_enabled=args.singlefile))
1385 | 


--------------------------------------------------------------------------------