├── assets
    ├── citadel.png
    ├── AIAC-1.1.0.png
    └── AIAC-Governance-1.1.0.png
├── CHANGELOG.md
├── .github
    ├── CODE_OF_CONDUCT.md
    ├── ISSUE_TEMPLATE.md
    └── PULL_REQUEST_TEMPLATE.md
├── LICENSE.md
├── CONTRIBUTING.md
├── .gitignore
├── README.md
├── Citadel-WAF-Alignment.md
└── CITADEL-TECHNICAL-GUIDE.md


/assets/citadel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/citadel.png
--------------------------------------------------------------------------------
/assets/AIAC-1.1.0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/AIAC-1.1.0.png
--------------------------------------------------------------------------------
/assets/AIAC-Governance-1.1.0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/AIAC-Governance-1.1.0.png
--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
1 | ## [project-title] Changelog
2 |
3 |
4 | # x.y.z (yyyy-mm-dd)
5 |
6 | *Features*
7 | * ...
8 |
9 | *Bug Fixes*
10 | * ...
11 |
12 | *Breaking Changes*
13 | * ...
14 |
--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
4 | > Please provide us with the following information:
5 | > ---------------------------------------------------------------
6 |
7 | ### This issue is for a: (mark with an `x`)
8 | ```
9 | - [ ] bug report -> please search issues before submitting
10 | - [ ] feature request
11 | - [ ] documentation issue or request
12 | - [ ] regression (a behavior that used to work and stopped in a new release)
13 | ```
14 |
15 | ### Minimal steps to reproduce
16 | >
17 |
18 | ### Any log messages given by the failure
19 | >
20 |
21 | ### Expected/desired behavior
22 | >
23 |
24 | ### OS and Version?
25 | > Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
26 |
27 | ### Versions
28 | >
29 |
30 | ### Mention any other details that might be useful
31 |
32 | > ---------------------------------------------------------------
33 | > Thanks! We'll be in touch soon.
34 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | ## Purpose
2 |
3 | * ...
4 |
5 | ## Does this introduce a breaking change?
6 |
7 | ```
8 | [ ] Yes
9 | [ ] No
10 | ```
11 |
12 | ## Pull Request Type
13 | What kind of change does this Pull Request introduce?
14 |
15 |
16 | ```
17 | [ ] Bugfix
18 | [ ] Feature
19 | [ ] Code style update (formatting, local variables)
20 | [ ] Refactoring (no functional changes, no api changes)
21 | [ ] Documentation content changes
22 | [ ] Other... Please describe:
23 | ```
24 |
25 | ## How to Test
26 | * Get the code
27 |
28 | ```
29 | git clone [repo-address]
30 | cd [repo-name]
31 | git checkout [branch-name]
32 | npm install
33 | ```
34 |
35 | * Test the code
36 |
37 | ```
38 | ```
39 |
40 | ## What to Check
41 | Verify that the following are valid
42 | * ...
43 |
44 | ## Other Information
45 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to [project-title]
2 |
3 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
5 | the rights to use your contribution. For details, visit [Contributor License Agreements](https://cla.opensource.microsoft.com).
6 |
7 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
8 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
9 | provided by the bot. You will only need to do this once across all repos using our CLA.
10 |
11 | - [Code of Conduct](#coc)
12 | - [Issues and Bugs](#issue)
13 | - [Feature Requests](#feature)
14 | - [Submission Guidelines](#submit)
15 |
16 | ## Code of Conduct
17 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
18 |
19 | ## Found an Issue?
20 | If you find a bug in the source code or a mistake in the documentation, you can help us by
21 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can
22 | [submit a Pull Request](#submit-pr) with a fix.
23 |
24 | ## Want a Feature?
25 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub
26 | Repository. If you would like to *implement* a new feature, please submit an issue with
27 | a proposal for your work first, to be sure that we can use it.
28 |
29 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr).
30 |
31 | ## Submission Guidelines
32 |
33 | ### Submitting an Issue
34 | Before you submit an issue, search the archive; your question may already have been answered.
35 |
36 | If your issue appears to be a bug and hasn't been reported, open a new issue.
37 | Help us maximize the effort we can spend fixing issues and adding new
38 | features by not reporting duplicate issues. Providing the following information will increase the
39 | chances of your issue being dealt with quickly:
40 |
41 | * **Overview of the Issue** - if an error is being thrown, a non-minified stack trace helps
42 | * **Version** - what version is affected (e.g. 0.1.2)
43 | * **Motivation for or Use Case** - explain what you are trying to do and why the current behavior is a bug for you
44 | * **Browsers and Operating System** - is this a problem with all browsers?
45 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps
46 | * **Related Issues** - has a similar issue been reported before?
47 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be
48 | causing the problem (line of code or commit)
49 |
50 | You can file new issues by providing the above information at the corresponding repository's issues link:
51 | replace `[organization-name]` and `[repository-name]` in
52 | `https://github.com/[organization-name]/[repository-name]/issues/new`.
53 |
54 | ### Submitting a Pull Request (PR)
55 | Before you submit your Pull Request (PR), consider the following guidelines:
56 |
57 | * Search the repository's [pull requests](https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR
58 | that relates to your submission. You don't want to duplicate effort.
59 |
60 | * Make your changes in a new git fork:
61 |
62 | * Commit your changes using a descriptive commit message
63 | * Push your fork to GitHub:
64 | * In GitHub, create a pull request
65 | * If we suggest changes then:
66 |   * Make the required updates.
67 |   * Rebase your fork and force push to your GitHub repository (this will update your Pull Request):
68 |
69 |     ```shell
70 |     git rebase main -i
71 |     git push -f
72 |     ```
73 |
74 | That's it! Thank you for your contribution!
75 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | ## Ignore Visual Studio temporary files, build results, and
2 | ## files generated by popular Visual Studio add-ons.
3 | ##
4 | ## Get latest from https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
5 |
6 | # User-specific files
7 | *.rsuser
8 | *.suo
9 | *.user
10 | *.userosscache
11 | *.sln.docstates
12 | *.env
13 |
14 | # User-specific files (MonoDevelop/Xamarin Studio)
15 | *.userprefs
16 |
17 | # Mono auto generated files
18 | mono_crash.*
19 |
20 | # Build results
21 | [Dd]ebug/
22 | [Dd]ebugPublic/
23 | [Rr]elease/
24 | [Rr]eleases/
25 | x64/
26 | x86/
27 | [Ww][Ii][Nn]32/
28 | [Aa][Rr][Mm]/
29 | [Aa][Rr][Mm]64/
30 | [Aa][Rr][Mm]64[Ee][Cc]/
31 | bld/
32 | [Oo]bj/
33 | [Oo]ut/
34 | [Ll]og/
35 | [Ll]ogs/
36 |
37 | # Build results on 'Bin' directories
38 | **/[Bb]in/*
39 | # Uncomment if you have tasks that rely on *.refresh files to move binaries
40 | # (https://github.com/github/gitignore/pull/3736)
41 | #!**/[Bb]in/*.refresh
42 |
43 | # Visual Studio 2015/2017 cache/options directory
44 | .vs/
45 | # Uncomment if you have tasks that create the project's static files in wwwroot
46 | #wwwroot/
47 |
48 | # Visual Studio 2017 auto generated files
49 | Generated\ Files/
50 |
51 | # MSTest test Results
52 | [Tt]est[Rr]esult*/
53 | [Bb]uild[Ll]og.*
54 | *.trx
55 |
56 | # NUnit
57 | *.VisualState.xml
58 | TestResult.xml
59 | nunit-*.xml
60 |
61 | # Approval Tests result files
62 | *.received.*
63 |
64 | # Build Results of an ATL Project
65 | [Dd]ebugPS/
66 | [Rr]eleasePS/
67 | dlldata.c
68 |
69 | # Benchmark Results
70 | BenchmarkDotNet.Artifacts/
71 |
72 | # .NET Core
73 | project.lock.json
74 | project.fragment.lock.json
75 | artifacts/
76 |
77 | # ASP.NET Scaffolding
78 | ScaffoldingReadMe.txt
79 |
80 | # StyleCop
81 | StyleCopReport.xml
82 |
83 | # Files built by Visual Studio
84 | *_i.c
85 | *_p.c
86 | *_h.h
87 | *.ilk
88 | *.meta
89 | *.obj
90 | *.idb
91 | *.iobj
92 | *.pch
93 | *.pdb
94 | *.ipdb
95 | *.pgc
96 | *.pgd
97 | *.rsp
98 | # but not Directory.Build.rsp, as it configures directory-level build defaults
99 | !Directory.Build.rsp
100 | *.sbr
101 | *.tlb
102 | *.tli
103 | *.tlh
104 | *.tmp
105 | *.tmp_proj
106 | *_wpftmp.csproj
107 | *.log
108 | *.tlog
109 | *.vspscc
110 | *.vssscc
111 | .builds
112 | *.pidb
113 | *.svclog
114 | *.scc
115 |
116 | # Chutzpah Test files
117 | _Chutzpah*
118 |
119 | # Visual C++ cache files
120 | ipch/
121 | *.aps
122 | *.ncb
123 | *.opendb
124 | *.opensdf
125 | *.sdf
126 | *.cachefile
127 | *.VC.db
128 | *.VC.VC.opendb
129 |
130 | # Visual Studio profiler
131 | *.psess
132 | *.vsp
133 | *.vspx
134 | *.sap
135 |
136 | # Visual Studio Trace Files
137 | *.e2e
138 |
139 | # TFS 2012 Local Workspace
140 | $tf/
141 |
142 | # Guidance Automation Toolkit
143 | *.gpState
144 |
145 | # ReSharper is a .NET coding add-in
146 | _ReSharper*/
147 | *.[Rr]e[Ss]harper
148 | *.DotSettings.user
149 |
150 | # TeamCity is a build add-in
151 | _TeamCity*
152 |
153 | # DotCover is a Code Coverage Tool
154 | *.dotCover
155 |
156 | # AxoCover is a Code Coverage Tool
157 | .axoCover/*
158 | !.axoCover/settings.json
159 |
160 | # Coverlet is a free, cross platform Code Coverage Tool
161 | coverage*.json
162 | coverage*.xml
163 | coverage*.info
164 |
165 | # Visual Studio code coverage results
166 | *.coverage
167 | *.coveragexml
168 |
169 | # NCrunch
170 | _NCrunch_*
171 | .NCrunch_*
172 | .*crunch*.local.xml
173 | nCrunchTemp_*
174 |
175 | # MightyMoose
176 | *.mm.*
177 | AutoTest.Net/
178 |
179 | # Web workbench (sass)
180 | .sass-cache/
181 |
182 | # Installshield output folder
183 | [Ee]xpress/
184 |
185 | # DocProject is a documentation generator add-in
186 | DocProject/buildhelp/
187 | DocProject/Help/*.HxT
188 | DocProject/Help/*.HxC
189 | DocProject/Help/*.hhc
190 | DocProject/Help/*.hhk
191 | DocProject/Help/*.hhp
192 | DocProject/Help/Html2
193 | DocProject/Help/html
194 |
195 | # Click-Once directory
196 | publish/
197 |
198 | # Publish Web Output
199 | *.[Pp]ublish.xml
200 | *.azurePubxml
201 | # Note: Comment the next line if you want to checkin your web deploy settings,
202 | # but database connection strings (with potential passwords) will be unencrypted
203 | *.pubxml
204 | *.publishproj
205 |
206 | # Microsoft Azure Web App publish settings. Comment the next line if you want to
207 | # checkin your Azure Web App publish settings, but sensitive information contained
208 | # in these scripts will be unencrypted
209 | PublishScripts/
210 |
211 | # NuGet Packages
212 | *.nupkg
213 | # NuGet Symbol Packages
214 | *.snupkg
215 | # The packages folder can be ignored because of Package Restore
216 | **/[Pp]ackages/*
217 | # except build/, which is used as an MSBuild target.
218 | !**/[Pp]ackages/build/
219 | # Uncomment if necessary however generally it will be regenerated when needed
220 | #!**/[Pp]ackages/repositories.config
221 | # NuGet v3's project.json files produces more ignorable files
222 | *.nuget.props
223 | *.nuget.targets
224 |
225 | # Microsoft Azure Build Output
226 | csx/
227 | *.build.csdef
228 |
229 | # Microsoft Azure Emulator
230 | ecf/
231 | rcf/
232 |
233 | # Windows Store app package directories and files
234 | AppPackages/
235 | BundleArtifacts/
236 | Package.StoreAssociation.xml
237 | _pkginfo.txt
238 | *.appx
239 | *.appxbundle
240 | *.appxupload
241 |
242 | # Visual Studio cache files
243 | # files ending in .cache can be ignored
244 | *.[Cc]ache
245 | # but keep track of directories ending in .cache
246 | !?*.[Cc]ache/
247 |
248 | # Others
249 | ClientBin/
250 | ~$*
251 | *~
252 | *.dbmdl
253 | *.dbproj.schemaview
254 | *.jfm
255 | *.pfx
256 | *.publishsettings
257 | orleans.codegen.cs
258 |
259 | # Including strong name files can present a security risk
260 | # (https://github.com/github/gitignore/pull/2483#issue-259490424)
261 | #*.snk
262 |
263 | # Since there are multiple workflows, uncomment next line to ignore bower_components
264 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
265 | #bower_components/
266 |
267 | # RIA/Silverlight projects
268 | Generated_Code/
269 |
270 | # Backup & report files from converting an old project file
271 | # to a newer Visual Studio version. Backup files are not needed,
272 | # because we have git ;-)
273 | _UpgradeReport_Files/
274 | Backup*/
275 | UpgradeLog*.XML
276 | UpgradeLog*.htm
277 | ServiceFabricBackup/
278 | *.rptproj.bak
279 |
280 | # SQL Server files
281 | *.mdf
282 | *.ldf
283 | *.ndf
284 |
285 | # Business Intelligence projects
286 | *.rdl.data
287 | *.bim.layout
288 | *.bim_*.settings
289 | *.rptproj.rsuser
290 | *- [Bb]ackup.rdl
291 | *- [Bb]ackup ([0-9]).rdl
292 | *- [Bb]ackup ([0-9][0-9]).rdl
293 |
294 | # Microsoft Fakes
295 | FakesAssemblies/
296 |
297 | # GhostDoc plugin setting file
298 | *.GhostDoc.xml
299 |
300 | # Node.js Tools for Visual Studio
301 | .ntvs_analysis.dat
302 | node_modules/
303 |
304 | # Visual Studio 6 build log
305 | *.plg
306 |
307 | # Visual Studio 6 workspace options file
308 | *.opt
309 |
310 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
311 | *.vbw
312 |
313 | # Visual Studio 6 auto-generated project file (contains which files were open etc.)
314 | *.vbp
315 |
316 | # Visual Studio 6 workspace and project file (working project files containing files to include in project)
317 | *.dsw
318 | *.dsp
319 |
320 | # Visual Studio 6 technical files
321 | *.ncb
322 | *.aps
323 |
324 | # Visual Studio LightSwitch build output
325 | **/*.HTMLClient/GeneratedArtifacts
326 | **/*.DesktopClient/GeneratedArtifacts
327 | **/*.DesktopClient/ModelManifest.xml
328 | **/*.Server/GeneratedArtifacts
329 | **/*.Server/ModelManifest.xml
330 | _Pvt_Extensions
331 |
332 | # Paket dependency manager
333 | **/.paket/paket.exe
334 | paket-files/
335 |
336 | # FAKE - F# Make
337 | **/.fake/
338 |
339 | # CodeRush personal settings
340 | **/.cr/personal
341 |
342 | # Python Tools for Visual Studio (PTVS)
343 | **/__pycache__/
344 | *.pyc
345 |
346 | # Cake - Uncomment if you are using it
347 | #tools/**
348 | #!tools/packages.config
349 |
350 | # Tabs Studio
351 | *.tss
352 |
353 | # Telerik's JustMock configuration file
354 | *.jmconfig
355 |
356 | # BizTalk build output
357 | *.btp.cs
358 | *.btm.cs
359 | *.odx.cs
360 | *.xsd.cs
361 |
362 | # OpenCover UI analysis results
363 | OpenCover/
364 |
365 | # Azure Stream Analytics local run output
366 | ASALocalRun/
367 |
368 | # MSBuild Binary and Structured Log
369 | *.binlog
370 | MSBuild_Logs/
371 |
372 | # AWS SAM Build and Temporary Artifacts folder
373 | .aws-sam
374 |
375 | # NVidia Nsight GPU debugger configuration file
376 | *.nvuser
377 |
378 | # MFractors (Xamarin productivity tool) working folder
379 | **/.mfractor/
380 |
381 | # Local History for Visual Studio
382 | **/.localhistory/
383 |
384 | # Visual Studio History (VSHistory) files
385 | .vshistory/
386 |
387 | # BeatPulse healthcheck temp database
388 | healthchecksdb
389 |
390 | # Backup folder for Package Reference Convert tool in Visual Studio 2017
391 | MigrationBackup/
392 |
393 | # Ionide (cross platform F# VS Code tools) working folder
394 | **/.ionide/
395 |
396 | # Fody - auto-generated XML schema
397 | FodyWeavers.xsd
398 |
399 | # VS Code files for those working on multiple tools
400 | .vscode/*
401 | !.vscode/settings.json
402 | !.vscode/tasks.json
403 | !.vscode/launch.json
404 | !.vscode/extensions.json
405 | !.vscode/*.code-snippets
406 |
407 | # Local History for Visual Studio Code
408 | .history/
409 |
410 | # Built Visual Studio Code Extensions
411 | *.vsix
412 |
413 | # Windows Installer files from build outputs
414 | *.cab
415 | *.msi
416 | *.msix
417 | *.msm
418 | *.msp
419 |
420 | # Local files
421 | local/**
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Executive Summary: AI Governance at Speed
2 |
3 | ## Bridging Governance Requirements and Developer Velocity with Foundry Citadel Platform
4 |
5 | ---
6 |
7 | ## The AI Governance Imperative
8 |
9 | As AI systems become more powerful and integrated into everyday life, **governance is no longer a "nice-to-have"; it's a must**. Whether you're aligning with emerging regulations like the EU AI Act, meeting internal standards for risk and safety, or ensuring your AI systems meet your enterprise's business goals efficiently and at scale, the ability to govern AI responsibly at speed is a game-changer.
10 |
11 | ---
12 |
13 | ## The Governance-Velocity Paradox
14 |
15 | Yet, **governance and developer velocity often feel fundamentally misaligned**. Organizations face critical bottlenecks:
16 |
17 | - **Manual Risk Assessments**: Frequently time-consuming and lacking standardization
18 | - **Scattered Evaluation Tools**: Fragmented across different teams and systems
19 | - **Unclear Governance Requirements**: Ambiguous policies that are difficult to operationalize
20 | - **Implementation Gaps**: Policies rarely map cleanly to real-world technical implementation
21 |
22 | **The result?** Bottlenecks and delays that frustrate both governance teams and developers, slowing AI adoption and increasing organizational risk.
23 |
24 | ---
25 |
26 | ## The Collaboration Challenge
27 |
28 | Effective AI governance demands a new balance: **one that enforces oversight without impeding innovation**. It also requires multiple stakeholders to collaborate effectively:
29 |
30 | ### 👔 **Compliance Officers & Chief AI Officers**
31 | Must determine **what needs to be assessed** to comply with company policies and regulations
32 |
33 | ### 👨‍💻 **AI Developers & Engineering Teams**
34 | Need to **operationalize these requirements** by generating the right qualitative and quantitative evidence
35 |
36 | **Unfortunately**, the handshake between these personas is often not smooth, and traditional methods tend to create friction in the governance process, slowing down deployment or leading to incomplete compliance. **It's a trade-off most organizations can no longer afford.**
37 |
38 | ---
39 |
40 | ## The Foundry Citadel Platform Solution
41 |
42 | **That's where Foundry Citadel Platform steps in.**
43 |
44 | Foundry Citadel Platform is a comprehensive solution accelerator that bridges the gap between governance requirements and technical implementation, enabling organizations to:
45 |
46 | ### 🛡️ **Govern AI Responsibly**
47 | - **Unified AI Gateway**: Single control point for all AI model access with enterprise-wide policy enforcement
48 | - **Automated Compliance**: Built-in safety checks, content filtering, and policy validation without manual intervention
49 | - **Central AI Registry**: Catalog and govern all AI assets (models, agents, and tools) across the enterprise
50 |
51 | ### 📊 **Maintain Complete Visibility**
52 | - **Platform-Level Observability**: Centralized monitoring across all AI workloads without code changes
53 | - **Agent-Level Tracing**: Detailed execution paths for debugging and quality assurance
54 | - **Automated Evaluations**: Continuous quality, safety, and compliance assessments applied consistently
55 |
56 | ### 🚀 **Accelerate Innovation**
57 | - **Pre-built Templates**: One-click deployment of secure, governed AI environments
58 | - **Flexible Development Options**: From low-code (Azure Logic Apps Agent Loop) and managed agent runtimes (AI Foundry Agents) to pro-code frameworks (Microsoft Agent Framework, LangChain, and others)
59 | - **DevOps Integration**: CI/CD pipelines with automated testing and evaluation
60 |
61 | ---
62 |
63 | ## Key Business Outcomes
64 |
65 | Foundry Citadel Platform is designed to help organizations achieve:
66 |
67 | | Outcome | Impact |
68 | |---------|--------|
69 | | **🎯 Faster Time-to-Value** | Deploy AI solutions in days, not months, with pre-configured infrastructure |
70 | | **🔒 Reduced Risk** | Automated governance ensures compliance from day one |
71 | | **💰 Cost Control** | Granular usage tracking and quota enforcement per team/project |
72 | | **📈 Scalable Adoption** | Repeatable patterns that grow with your organization |
73 | | **🤝 Cross-Functional Alignment** | Clear contracts between governance and development teams |
74 | | **🌐 Universal Gateway & Registry** | Unified access, governance, and discovery of central AI assets |
75 |
76 | ---
77 |
78 | ## Enterprise Statistics Driving Citadel Adoption
79 |
80 | Real-world enterprise challenges that Citadel addresses:
81 |
82 | - **62%** of practitioners cite **security concerns** as the top blocker to wider AI adoption
83 | - **71%** of enterprises struggle to **track AI usage, enforce quotas, and report costs** per team
84 | - **47%** of organizations require **explicit guardrails** before they can deploy autonomous AI agents safely
85 | - **70%** of customers need an **AI registry** for LLMs, agents, and tools to adopt AI at scale
86 |
87 | ---
88 |
89 | ## The Three Pillars of Foundry Citadel Platform
90 |
91 | ### 1️⃣ **Governance & Security** – *Trustworthy AI Operations at Scale*
92 | Without centralized AI governance, organizations face unpredictable costs, reliability issues, security risks, and compliance nightmares. Citadel builds guardrails into every AI call through:
93 | - Unified AI Gateway for centralized control
94 | - Granular access control and key management
95 | - Multi-cloud and hybrid support
96 | - AI content safety and prompt shields
97 | - Central AI registry for agents and tools
98 |
99 | ### 2️⃣ **Observability & Compliance** – *End-to-End Monitoring, Evaluation & Trust*
100 | Full visibility creates trust and confidence. Citadel provides holistic observability through:
101 | - **Platform-Level**: Centralized APM, usage tracking, automated evaluations, and enterprise alerting
102 | - **Agent-Level**: Detailed execution traces, performance monitoring, and debugging tools
103 | - **Rich Dashboards**: Integrated views for both operational and development teams
104 |
105 | ### 3️⃣ **AI Development Velocity** – *Accelerating Innovation with Templates & Tools*
106 | Build fast, build right. Citadel empowers teams to innovate quickly within established guardrails:
107 | - Pre-built deployment templates
108 | - Flexible agent development options (low-code to pro-code)
109 | - Citadel AI Registry for asset discovery and reuse
110 | - DevOps integration for continuous delivery
111 |
112 | ---
113 |
114 | ## Two-Tier Architecture for Enterprise Scale
115 |
116 | ### **Citadel Governance Hub (CGH)** – Central Control Plane
117 | The enterprise-wide governance layer providing:
118 | - Unified AI gateway for access to all centralized AI models and MCP tools
119 | - Universal AI registry for discovery and cataloging
120 | - Platform-level evaluations and compliance reporting
121 | - Usage analytics and cost allocation
122 | - Centralized security and safety enforcement
123 |
124 | ### **Citadel Agent Spoke (CAS)** – Domain-Specific Deployments
125 | Secure, isolated environments for AI agent workloads featuring:
126 | - Azure AI Foundry with agent capabilities
127 | - Comprehensive AI services (Search, Cosmos DB, Storage)
128 | - Zero Trust architecture with private endpoints
129 | - Auto-scaling container infrastructure
130 | - Hub-spoke integration with enterprise networks
131 |
132 | ---
133 |
134 | ## From Challenge to Solution: The Citadel Advantage
135 |
136 | | Traditional Approach | Foundry Citadel Platform |
137 | |---------------------|-------------------------|
138 | | ❌ Manual risk assessments | ✅ Automated compliance checks |
139 | | ❌ Scattered evaluation tools | ✅ Unified observability platform |
140 | | ❌ Unclear requirements | ✅ Codified governance contracts |
141 | | ❌ Implementation gaps | ✅ Pre-built, proven patterns |
142 | | ❌ Friction between teams | ✅ Streamlined collaboration |
143 | | ❌ Slow deployment cycles | ✅ Rapid, repeatable deployments |
144 |
145 | ---
146 |
147 | ## Strategic Partnerships & Integrations
148 |
149 | Foundry Citadel Platform connects governance requirements to technical implementation through integrations with the following services:
150 |
151 | - **Azure AI Foundry**: Enterprise AI platform with an advanced model catalog, managed agent services, and AI evaluations/observability
152 | - **Azure API Management**: Unified AI gateway for governance and policy enforcement
153 | - **Azure Monitor & Application Insights**: Comprehensive observability
154 | - **Azure AI Content Safety**: Automated GenAI safety checks and content filtering
155 | - **Microsoft Entra ID**: Identity and access management
156 | - **Microsoft Defender for AI**: Threat detection and security monitoring
157 | - **Microsoft Purview**: Data governance and sensitivity labeling
158 |
159 | ---
160 |
161 | ## The Bottom Line
162 |
163 | **Effective AI governance no longer means choosing between speed and safety.**
164 |
165 | Foundry Citadel Platform enables organizations to:
166 | - ✅ **Deploy AI with confidence** – knowing governance is built-in, not bolted-on
167 | - ✅ **Scale AI responsibly** – with consistent policies across all projects
168 | - ✅ **Accelerate innovation** – within secure, compliant guardrails
169 | - ✅ **Bridge organizational silos** – aligning governance, development, and operations
170 |
171 | ---
172 |
173 | ## Call to Action
174 |
175 | The challenge of governing AI at speed is precisely why Foundry Citadel Platform exists. By providing a comprehensive solution that addresses governance, observability, and development velocity in an integrated way, Citadel transforms the traditional trade-off between control and innovation into a **synergistic relationship**.
176 |
177 | **Organizations can now:**
178 | 1. Establish enterprise-wide AI governance from day one
179 | 2. Empower developers with self-service, governed AI capabilities
180 | 3. Maintain complete visibility and control as AI adoption scales
181 | 4. Meet regulatory requirements with automated compliance evidence
182 | 5. Accelerate time-to-value while reducing organizational risk
183 |
184 | ---
185 |
186 | ## Next Steps
187 |
188 | To learn more about how Foundry Citadel Platform can help your organization govern AI responsibly at speed:
189 |
190 | - **📘 Review the Full Documentation**: See [CITADEL-TECHNICAL-GUIDE.md](./CITADEL-TECHNICAL-GUIDE.md) for comprehensive technical details
191 | - **🏗️ Explore the AI Hub Gateway (Citadel Governance Hub)**: Visit the [AI Hub Gateway repository](https://aka.ms/ai-hub-gateway)
192 | - **🤖 Deploy the AI Landing Zones (Citadel Agent Spoke)**: Check out the [AI Landing Zones repository](https://github.com/Azure/AI-Landing-Zones)
193 | - **💬 Engage with Our Team**: Reach out to discuss your specific governance and AI adoption challenges
194 |
195 | ---
196 |
197 | *"Build the future, safely"* – Foundry Citadel Platform provides the **speed** that business demands with the **safeguards** that IT requires, all in one comprehensive, evolving platform.
198 |
--------------------------------------------------------------------------------
/Citadel-WAF-Alignment.md:
--------------------------------------------------------------------------------
1 | # Foundry Citadel Platform - Azure Well-Architected Framework Alignment
2 |
3 | > **How Citadel Implements Azure Well-Architected Framework Principles for AI Workloads**
4 |
5 | **Document Version:** 1.0
6 | **Last Updated:** November 10, 2025
7 | **Reference:** [Azure Well-Architected Framework - AI Design Principles](https://learn.microsoft.com/en-us/azure/well-architected/ai/design-principles)
8 |
9 | ---
10 |
11 | ## Overview
12 |
13 | The **Foundry Citadel Platform** is architected to align with the Azure Well-Architected Framework (WAF) for AI workloads, delivering enterprise-grade AI solutions through three core pillars:
14 |
15 | - **Governance & Security** - Enterprise-grade controls, responsible AI, and data protection
16 | - **Observability & Compliance** - Comprehensive monitoring, auditing, and regulatory compliance
17 | - **AI Development Velocity** - Accelerated development with best practices and automation
18 |
19 | This document shows how Citadel's concrete technical implementations address the five WAF pillars: **Reliability**, **Security**, **Cost Optimization**, **Operational Excellence**, and **Performance Efficiency**.

---

## Architecture Alignment Summary

| WAF Pillar | Alignment Status | Key Citadel Capabilities |
|------------|------------------|--------------------------|
| **Reliability** | 🟢 Strong | Multi-region support, high availability, automated failover, resilient architecture |
| **Security** | 🟢 Strong | Zero Trust, content safety, RBAC, encryption, network isolation |
| **Cost Optimization** | 🟢 Strong | Usage tracking, quota management, auto-scaling, cost attribution |
| **Operational Excellence** | 🟢 Strong | Automated monitoring, CI/CD integration, DevOps/AIOps support |
| **Performance Efficiency** | 🟢 Strong | Load balancing, auto-scaling, performance monitoring, quality metrics |

**Legend:** 🟢 Strong | 🟡 Partial | 🔴 Limited

---

## 1. Reliability - Building Resilient AI Workloads

### WAF Principle: Design Reliable AI Systems

Citadel ensures AI workloads remain available and can recover from failures while maintaining model performance over time.
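
Recovering from transient failures, as described in this section, usually combines two patterns that this document attributes to the AI Gateway: retries with exponential backoff, and failover to another regional LLM backend. The Python sketch below illustrates only that control flow and is a simplified, hypothetical model. `TransientError` stands in for retryable failures such as HTTP 429 or 503 responses, each backend is a plain callable, and the delay values are illustrative. In Citadel, this logic lives in the AI Gateway (Azure API Management) rather than in application code.

```python
import random
import time
from typing import Callable, Optional, Sequence


class TransientError(Exception):
    """Hypothetical error type for retryable failures (e.g. HTTP 429 or 503)."""


def call_with_failover(
    backends: Sequence[Callable[[str], str]],
    prompt: str,
    max_attempts_per_backend: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call backends in priority order (e.g. primary region first).

    Transient errors are retried with exponential backoff and full jitter;
    once a backend exhausts its attempts, the next backend is tried.
    """
    last_error: Optional[Exception] = None
    for backend in backends:
        for attempt in range(max_attempts_per_backend):
            try:
                return backend(prompt)
            except TransientError as err:
                last_error = err
                if attempt + 1 < max_attempts_per_backend:
                    # Backoff ceiling doubles per attempt: 0.5s, 1s, 2s, ... capped at max_delay.
                    ceiling = min(max_delay, base_delay * (2 ** attempt))
                    sleep(random.uniform(0, ceiling))  # full jitter
        # This backend is exhausted; fail over to the next one in the list.
    raise RuntimeError("all LLM backends exhausted") from last_error


if __name__ == "__main__":
    calls = {"primary": 0}

    def primary(prompt: str) -> str:
        calls["primary"] += 1
        raise TransientError("503 from primary region")

    def secondary(prompt: str) -> str:
        return f"secondary answered: {prompt}"

    print(call_with_failover([primary, secondary], "hello", sleep=lambda s: None))
    print("primary attempts:", calls["primary"])
```

Full jitter spreads retries from many clients over time, which avoids synchronized retry bursts against a backend that is already under pressure.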

### Citadel Implementation

#### Multi-Region High Availability
- **Multi-region LLM deployments** with automated failover for continuous service availability
- **Reliable state** for conversation history and agent state, stored in Cosmos DB and Azure Monitor
- **Availability zone support** for critical components in supported regions

#### Fault Tolerance & Resilience
- **AI Gateway (Azure API Management)** provides circuit breakers, retry logic, and bulkhead patterns
- **Distributed architecture** with separate Citadel Governance Hub and Citadel Agent Spoke landing zones following the hub-spoke model
- **Network isolation** with NSGs and private endpoints preventing cascading failures
- **Service isolation** through containerization and separate resource boundaries, with every agentic deployment running in its own Citadel Agent Spoke under central RBAC through the Citadel Governance Hub

#### Operational Reliability
- **Automated workflows via Logic Apps** reducing manual intervention and human error
- **Azure AI Foundry managed runtime** providing a reliable, maintained agent execution environment
- **Version-controlled infrastructure** using Bicep and source-controlled configurations (Citadel Contracts) for consistent, repeatable deployments of both central components and day-2 configurations

### Key Features

| Feature | Benefit | Implementation |
|---------|---------|----------------|
| Multi-Region LLM | Ensures API availability even during regional outages | Automated failover between LLM backends/regions |
| High-Availability Gateway | SLA of at least 99.95% for API requests | Azure API Management Premium tier deployed across availability zones and/or multiple regions |
| Distributed Data Stores | Data remains accessible during failures | Cosmos DB and Azure Monitor Log Analytics |
| Auto-Recovery | Minimizes downtime from transient failures | Circuit breakers and exponential backoff retry policies |

---

## 2. Security - Protecting AI Workloads and Data

### WAF Principle: Secure AI Systems and Earn User Trust

Citadel implements defense-in-depth security with Zero Trust architecture, content safety, and comprehensive data protection.
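
The gateway-keys pattern listed under Robust Access Management can be illustrated with a small in-memory model: clients receive a gateway-issued key, and only the gateway can resolve that key to the backend credential. The `GatewayKeyStore` class and its methods are hypothetical names used for illustration. In Citadel, the equivalent role is played by the AI Gateway (Azure API Management), with backend credentials held in Key Vault or replaced by managed identities.

```python
import hashlib
import secrets
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class GatewayKeyStore:
    """Hypothetical gateway-side registry of client keys.

    Backend credentials never leave this object; clients only ever hold
    their own gateway key, which can be revoked without touching the backend.
    """

    backend_credentials: Dict[str, str]
    # Maps SHA-256(client key) -> (team, backend name); raw keys are not stored.
    _issued: Dict[str, Tuple[str, str]] = field(default_factory=dict, repr=False)

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, team: str, backend: str) -> str:
        """Create a random client key scoped to one team and one backend."""
        if backend not in self.backend_credentials:
            raise KeyError(f"unknown backend: {backend}")
        key = secrets.token_urlsafe(32)
        self._issued[self._digest(key)] = (team, backend)
        return key

    def revoke(self, key: str) -> None:
        """Invalidate a client key; other teams' keys are unaffected."""
        self._issued.pop(self._digest(key), None)

    def resolve(self, presented_key: str) -> Tuple[str, str]:
        """Return (team, backend credential) for a valid key, else raise."""
        entry = self._issued.get(self._digest(presented_key))
        if entry is None:
            raise PermissionError("invalid or revoked gateway key")
        team, backend = entry
        return team, self.backend_credentials[backend]


if __name__ == "__main__":
    store = GatewayKeyStore({"aoai-eastus": "<backend-secret-from-key-vault>"})
    finance_key = store.issue("finance", "aoai-eastus")
    print(store.resolve(finance_key)[0])  # finance
    store.revoke(finance_key)
```

Storing only a hash of each issued key means the gateway can validate and revoke keys without keeping them in plain text, and revoking one team's key leaves other teams untouched.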

### Citadel Implementation

#### Earn User Trust with Responsible AI
- **Azure AI Content Safety integration** for all incoming requests and outgoing responses
- **Prompt Shields protection** against jailbreak attempts and prompt injection attacks
- **Protected content detection** screening for sensitive data at both the AI Gateway level and the LLM model level (through Microsoft Purview)
- **Bidirectional content moderation** ensuring both user inputs and AI outputs are safe
- **Groundedness detection** through AI Foundry evaluations, validating AI responses against source documents to reduce hallucinations

#### Data Protection at All Layers
- **Encryption at rest** with platform-managed keys
- **Encryption in transit** enforced via HTTPS/TLS 1.2+ for all communication
- **Private endpoints** for all AI services, eliminating public internet exposure
- **Network security groups (NSGs)** controlling traffic flow between subnets
- **Virtual network integration** for all compute and data services

#### Robust Access Management
- **Gateway-keys pattern** - No direct API key exposure to users or applications
- **Managed identities** for service-to-service authentication, eliminating stored credentials
- **Azure RBAC integration** providing granular permissions across all components
- **Role-based authorization at the AI Gateway** enforcing least-privilege access per user/team
- **Azure Key Vault** for centralized secrets management with audit logging

#### Network Segmentation & Zero Trust
- **Zero Trust architecture** with an assume-breach mentality
- **Dedicated subnets with NSGs** for each service tier (web, app, data, AI)
- **Private networking** for container images, training data, and source code
- **Separate landing zones** (CGH for governance, CAS for agents) with controlled connectivity
- **Hub-spoke network topology** with centralized security controls

#### Security Testing & Compliance
- **CI/CD integration** for automated security scanning in deployment pipelines
- **Security policy enforcement** at the gateway level before requests reach AI services
- **Container vulnerability scanning** in Azure Container Registry
- **Microsoft Purview integration** for data classification and governance
- **Audit logging** of all access and operations for compliance reporting

#### Minimize Attack Surface
- **Authentication required** for all inferencing endpoints - no anonymous access
- **Constrained API design** through the AI Gateway, limiting exposed functionality
- **API versioning and deprecation** allowing secure evolution of interfaces
- **Rate/token limiting and throttling** preventing abuse and resource exhaustion
- **Input validation** at multiple layers preventing injection attacks

### Key Features

| Feature | Benefit | Implementation |
|---------|---------|----------------|
| Content Safety | Prevents harmful content from entering or leaving the system | Azure AI Content Safety with custom policies |
| Zero Trust Networking | Eliminates implicit trust, reduces breach impact | Private endpoints, NSGs, no public internet access |
| Managed Identities | No credentials in code or config files | Azure Managed Identity for all service-to-service auth |
| Gateway-Keys Pattern | Centralized access control and monitoring | API keys managed at gateway, not exposed to clients |
| Data Encryption | Protects sensitive data at rest and in transit | Platform-managed keys or customer-managed keys (CMK) with Key Vault integration |

---

## 3. Cost Optimization - Maximizing ROI

### WAF Principle: Optimize Costs Without Sacrificing Quality

Citadel provides comprehensive cost visibility, tracking, and optimization to maximize return on AI investments.

### Citadel Implementation

#### Determine Cost Drivers
- **Granular usage analytics** tracking consumption by team, use case, and individual agent
- **Token consumption trends** with historical analysis in Cosmos DB
- **Cost attribution dashboard** in Citadel Governance Hub showing spend breakdown
- **Resource tagging strategy** enabling chargeback and showback models
- **Integrated Azure Cost Management** with budget alerts and forecasting

#### Pay for What You Intend to Use
- **Auto-scaling Container Apps & Foundry Agents** automatically adjusting compute based on demand
- **Multiple AI service tiers** supporting different performance and cost profiles
- **Serverless options** via Logic Apps and Azure Functions for event-driven workloads
- **Consumption-based pricing** for applicable components (such as Azure OpenAI pay-as-you-go)
- **Flexible deployment options** allowing teams to choose their cost-performance balance

#### Use What You Pay For (Minimize Waste)
- **Token quotas and rate limiting** preventing accidental overspending
- **Auto-scaling with scale-to-zero** deallocating resources during idle periods
- **Centralized monitoring** of utilization metrics, identifying underused resources
- **Cost accountability** assigned to operations teams with regular reviews
- **Automated resource cleanup** removing unused deployments and test environments

#### Optimize Operational Costs
- **Automated workflows** via Logic Apps reducing manual operational overhead
- **PaaS-first approach** minimizing infrastructure management costs
- **Shared infrastructure** across multiple agents and teams, reducing duplication
- **DevOps automation** reducing time-to-market and manual deployment costs

### Key Features

| Feature | Benefit | Implementation |
|---------|---------|----------------|
| Usage Analytics | Understand where AI costs are incurred | Real-time dashboards with drill-down by dimension |
| Token Quotas | Prevent runaway costs from misbehaving agents | Configurable limits per user/team/agent |
| Auto-Scaling | Pay only for active workloads | Container Apps/Foundry Agents with scale-to-zero capability |
| Cost Attribution | Chargeback/showback to business units | Tagging and reporting by cost center |
| Monitoring & Alerts | Proactive cost anomaly detection | Azure Monitor alerts on budget thresholds |

---

## 4. Operational Excellence - Automation and Continuous Improvement

### WAF Principle: Streamline Operations and Enable Innovation

Citadel enables DevOps and GenAIOps practices with comprehensive automation, monitoring, and safe deployment patterns.
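
As one example of the safe deployment patterns mentioned above, a CI/CD pipeline can compare evaluation scores for a candidate agent version against the current production baseline and block the release on regression. The sketch below is hypothetical: it assumes scores normalized to a 0 to 1 range, uses metric names that mirror this document (groundedness, relevance, coherence), and the function name and 0.05 tolerance are illustrative rather than Citadel defaults. In practice, the scores would come from an evaluation run such as AI Foundry evaluations.

```python
from typing import List, Mapping, Tuple


def quality_gate(
    baseline: Mapping[str, float],
    candidate: Mapping[str, float],
    max_drop: float = 0.05,
) -> Tuple[bool, List[str]]:
    """Return (passed, failures) comparing candidate scores to the baseline.

    A metric fails if it is missing from the candidate run or drops by more
    than ``max_drop`` (absolute difference on a 0-1 scale).
    """
    failures: List[str] = []
    for metric, base_score in baseline.items():
        score = candidate.get(metric)
        if score is None:
            failures.append(f"{metric}: missing from candidate run")
        elif base_score - score > max_drop:
            failures.append(f"{metric}: {base_score:.2f} -> {score:.2f}")
    return not failures, failures


if __name__ == "__main__":
    baseline = {"groundedness": 0.90, "relevance": 0.85, "coherence": 0.88}
    candidate = {"groundedness": 0.91, "relevance": 0.78, "coherence": 0.87}
    passed, failures = quality_gate(baseline, candidate)
    for line in failures:
        print("regression:", line)  # regression: relevance: 0.85 -> 0.78
    print("gate passed:", passed)  # gate passed: False
```

In a pipeline step, the script would typically exit with a non-zero status when `passed` is false (for example `sys.exit(0 if passed else 1)`) so that the deployment stage does not run.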

### Citadel Implementation Recommendations

#### Minimize Operational Burden
- **PaaS-first architecture** using managed services (AI Foundry, API Management, Container Apps)
- **Managed identities** eliminating manual credential rotation and secret management overhead
- **Automated workflow orchestration** via Logic Apps for common operational tasks
- **Infrastructure-as-Code (Bicep)** enabling one-click deployments and consistent environments
- **Template-based agent deployment** empowering developers while maintaining governance

#### Automated Monitoring with Actionable Alerts
- **Azure Monitor Application Insights** integrated across all components
- **Comprehensive dashboards** at platform and individual agent levels
- **Actionable alerts** with context-specific remediation guidance
- **Enterprise notification integration** (Teams, email, ticketing systems)
- **Automated quality measurements** with trend analysis and anomaly detection
- **End-to-end tracing** from user request through AI processing to response

#### Detect and Mitigate Model Performance Issues
- **Automated evaluations** at platform level measuring groundedness, relevance, and coherence
- **CI/CD integration** for regression testing before deployment
- **Quality metrics tracking** over time, identifying model drift
- **Conversation replay capability** for debugging and quality analysis
- **Feedback collection** from users, feeding continuous improvement

#### Safe Deployments
- **CI/CD pipelines** with automated testing gates
- **Support for multiple deployment strategies** (blue/green, canary, rolling updates)
- **Pre-production testing environments** mirroring production configuration
- **Automated rollback capabilities** when health checks fail
- **Change tracking and audit logs** for compliance and troubleshooting
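
The canary strategy and automated rollback listed under Safe Deployments can be reduced to a decision rule that compares the canary's health signals with those of the stable version. The sketch below is a simplified, hypothetical rule: the `HealthSample` fields, the minimum-traffic check, and the thresholds (one percentage point of extra error rate, 20% higher p95 latency) are illustrative assumptions rather than Citadel settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthSample:
    """Aggregated health signals for one deployment over an observation window."""

    requests: int
    errors: int
    p95_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(
    canary: HealthSample,
    stable: HealthSample,
    max_error_rate_increase: float = 0.01,
    max_latency_ratio: float = 1.2,
    min_requests: int = 100,
) -> str:
    """Return "wait", "rollback", or "promote" for a canary deployment."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic yet to judge the canary
    if canary.error_rate - stable.error_rate > max_error_rate_increase:
        return "rollback"  # canary fails noticeably more often than stable
    if canary.p95_latency_ms > stable.p95_latency_ms * max_latency_ratio:
        return "rollback"  # canary is noticeably slower than stable
    return "promote"


if __name__ == "__main__":
    stable = HealthSample(requests=5000, errors=25, p95_latency_ms=900.0)
    print(canary_decision(HealthSample(250, 12, 950.0), stable))  # rollback
```

Requiring a minimum request count before judging avoids promoting or rolling back on traffic volumes too small to be meaningful.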
#### Evaluate and Improve User Experience
- **User feedback mechanisms** integrated into agent interfaces
- **Conversation logging with consent** enabling analysis and improvement
- **Engagement metrics** tracking user satisfaction and agent effectiveness
- **Session analytics** to understand user behavior patterns
- **Continuous improvement loop** from feedback to model refinement

### Key Features

| Feature | Benefit | Implementation |
|---------|---------|----------------|
| Comprehensive Monitoring | Full visibility into AI workload health | Application Insights with custom metrics and logs |
| Automated Evaluations | Ensure quality before and after deployment | AI Foundry evaluation pipelines in CI/CD |
| DevOps Integration | Accelerate development while maintaining quality | GitHub/Azure DevOps with automated gates |
| Feedback Loops | Continuous improvement from production insights | User feedback, conversation analytics, quality metrics |
| Infrastructure-as-Code | Consistent, repeatable deployments | Bicep templates with version control |

---

## 5. Performance Efficiency - Optimizing AI Workload Performance

### WAF Principle: Meet Performance Requirements Efficiently

Citadel ensures AI workloads meet performance targets through proper resource allocation, monitoring, and continuous optimization.
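
Benchmarks such as the p50/p95/p99 latency percentiles mentioned in this section can be computed from raw request timings and compared with a stored baseline. The sketch below uses Python's standard `statistics.quantiles` function; the function names and the 10% tolerance are illustrative assumptions. In Citadel, the raw timings would come from Application Insights telemetry rather than an in-memory list.

```python
import statistics
from typing import Dict, List, Sequence


def latency_percentiles(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Return p50/p95/p99 for a list of request latencies (needs >= 2 samples)."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


def regressed_percentiles(
    current: Dict[str, float], baseline: Dict[str, float], tolerance: float = 0.10
) -> List[str]:
    """List percentiles that exceed the baseline by more than `tolerance` (10% by default)."""
    return [
        name
        for name, value in current.items()
        if name in baseline and value > baseline[name] * (1 + tolerance)
    ]


if __name__ == "__main__":
    samples = [float(ms) for ms in range(1, 101)]  # 1 ms .. 100 ms
    current = latency_percentiles(samples)
    print(current)
    print(regressed_percentiles(current, {"p50": 50.0, "p95": 80.0, "p99": 99.0}))
```

The `"inclusive"` method treats the samples as the whole population and interpolates between observations, so with small sample counts the p99 value is driven by the few largest requests; benchmarks are more stable over larger windows.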

### Citadel Implementation

#### Establish Performance Benchmarks
- **Agent-level performance monitoring** tracking latency, throughput, and token consumption
- **Quality metrics tracking** measuring groundedness, relevance, coherence, and fluency
- **Continuous re-evaluation** ensuring performance remains within acceptable ranges
- **Baseline establishment** for each agent type and use case
- **Performance trend analysis** identifying degradation over time

#### Evaluate and Right-Size Resources
- **Multiple SKU options** allowing teams to balance performance and cost
- **Load balancing via Application Gateway** distributing traffic for optimal resource utilization
- **Auto-scaling Container Apps** dynamically adjusting resources based on actual demand
- **Container resource quotas** preventing resource contention and ensuring fair allocation

#### Collect and Analyze Performance Metrics
- **Telemetry from all layers** - data pipeline, orchestration, model inference, and UI
- **Query latency and throughput tracking** with percentile analysis (p50, p95, p99)
- **End-to-end tracing** of agent execution, identifying bottlenecks
- **Token consumption monitoring** to guide prompt engineering for efficiency
- **Near real-time dashboards** enabling quick identification of performance issues

#### Continuous Performance Improvement
- **Automated metric collection** feeding analysis and optimization
- **CI/CD integration** for performance regression testing
- **Production feedback loops** informing optimization decisions
- **Performance optimization recommendations** based on observed patterns
- **Caching strategies** reducing redundant processing and API calls

#### Load Balancing and Distribution
- **Multi-region load distribution** balancing traffic across Azure LLM deployments

### Key Features

| Feature | Benefit | Implementation |
|---------|---------|----------------|
| Performance Monitoring | Near real-time visibility into latency and throughput | Application Insights with custom telemetry |
| Auto-Scaling | Automatically match resources to demand | Container Apps with CPU/memory-based triggers |
| Load Balancing | Distribute traffic for optimal performance | Application Gateway with backend health monitoring |
| Quality Metrics | Ensure AI outputs meet standards efficiently | Automated evaluation of groundedness and relevance |
| Resource Optimization | Right-size compute for cost-performance balance | Monitoring with recommendations engine |

---

## Cross-Cutting Capabilities

### Governance & Control

Citadel Governance Hub (CGH) provides centralized governance across all WAF pillars:

- **Policy Enforcement** - Centralized security, cost, and quality policies applied consistently
- **Usage Analytics** - Real-time visibility into consumption patterns and costs
- **Compliance Reporting** - Audit trails, access logs, and regulatory compliance dashboards
- **Resource Management** - Centralized control over AI model deployments and configurations
- **Team Isolation** - Multi-tenancy with resource boundaries and access controls

### Observability

Comprehensive observability supports all WAF pillars:

- **Application Insights Integration** - Full-stack monitoring from UI to AI backend
- **Custom Dashboards** - Role-specific views for developers, operations, security, and executives
- **Distributed Tracing** - End-to-end request tracking across service boundaries
- **Log Aggregation** - Centralized logging with advanced query and analysis capabilities
- **Alerting & Notification** - Context-aware alerts with automated remediation

### DevOps & Automation

Platform automation accelerates delivery while maintaining quality:

- **CI/CD Pipelines** - Automated build, test, and deploy for agents and infrastructure
- **Infrastructure-as-Code** - Bicep templates for consistent environment provisioning
- **Automated Testing** - Unit, integration, and quality tests in the deployment pipeline
- **Version Control** - Git-based workflow for code, configuration, and policies
- **Self-Service Deployment** - Empowering teams while maintaining governance guardrails

---

## Well-Architected Framework Trade-offs

Citadel provides balanced approaches to common WAF trade-offs:

### Security vs. Performance
- **Configurable security levels** - Adjust Content Safety strictness based on use case
- **Optional private endpoints** - Choose between network isolation and simplified connectivity

### Cost vs. Reliability
- **Optional multi-region deployment** - Deploy single-region for cost, multi-region for high availability
- **Tiered deployment patterns** - Basic, standard, and premium configurations with clear trade-offs
- **Auto-scaling boundaries** - Set a maximum scale to control costs while ensuring performance

### Performance vs. Cost
- **PTU vs. PAYG** - Choose reserved capacity (Provisioned Throughput Units) or consumption-based (pay-as-you-go) pricing for Azure-hosted LLMs
- **Caching strategies** - Reduce costs and improve performance for repeated queries

### Developer Velocity vs. Governance
- **Template-based with guardrails** - Teams deploy reusable templates within policy boundaries
- **Automated compliance** - Security scanning and policy enforcement applied centrally
- **Flexible approval gates** - Required for production, optional for development

---

## Getting Started with WAF Alignment

### 1. Assess Your Requirements

Determine your priorities across WAF pillars:

- **Mission-critical workloads** - Emphasize reliability and security
- **Cost-sensitive projects** - Focus on cost optimization and right-sizing
- **Innovation initiatives** - Prioritize developer velocity and experimentation
- **Regulated industries** - Ensure security and compliance

### 2. Configure Citadel for Your Needs

Citadel's modular architecture allows customization:

- **Network topology** - Hub-spoke vs. single VNet based on isolation needs
- **Deployment scope** - Single-region vs. multi-region based on availability requirements
- **Compute tier** - Container Apps vs. AI Foundry based on control and cost needs
- **Monitoring depth** - Adjust telemetry collection based on operational requirements

### 3. Implement Best Practices

Follow Citadel's reference implementations:

- **Use Infrastructure-as-Code** - Deploy via Bicep templates for consistency
- **Enable all security features** - Private endpoints, managed identities, Content Safety
- **Configure monitoring** - Set up dashboards and alerts appropriate for your team
- **Establish governance** - Define policies, quotas, and approval workflows

### 4. Continuous Improvement

Leverage Citadel's observability for ongoing optimization:

- **Review cost reports** - Monthly analysis of spending patterns and optimization opportunities
- **Monitor performance** - Track latency and quality metrics, adjust resources as needed
- **Security audits** - Regular review of access logs, security alerts, and compliance status
- **Model quality** - Continuous evaluation and refinement based on production feedback

---

## Additional Resources

### Documentation
- [Citadel Technical Guide](./CITADEL-TECHNICAL-GUIDE.md) - Complete platform architecture and components
- [Contributing Guide](./CONTRIBUTING.md) - How to extend and customize Citadel

### External References
- [Azure Well-Architected Framework](https://learn.microsoft.com/en-us/azure/well-architected/)
- [Azure Well-Architected Framework for AI](https://learn.microsoft.com/en-us/azure/well-architected/ai/)
- [Azure AI Foundry Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/)
- [Responsible AI Principles](https://www.microsoft.com/en-us/ai/responsible-ai)

---

## Conclusion

The **Foundry Citadel Platform** provides comprehensive alignment with the Microsoft Well-Architected Framework for AI workloads through:

✅ **Strong Security** - Zero Trust, content safety, encryption, and access management
✅ **High Reliability** - Multi-region support, fault tolerance, and automated recovery
✅ **Cost Efficiency** - Granular tracking, quotas, auto-scaling, and optimization
✅ **Operational Excellence** - Comprehensive monitoring, automation, and safe deployments
✅ **Performance Optimization** - Load balancing, auto-scaling, and continuous monitoring

By building on Azure's platform services and implementing proven patterns, Citadel enables organizations to deploy
enterprise-grade AI solutions that balance governance, security, cost, performance, and innovation velocity, all while maintaining alignment with Microsoft's Well-Architected Framework principles.

---

--------------------------------------------------------------------------------
/CITADEL-TECHNICAL-GUIDE.md:
--------------------------------------------------------------------------------
# Foundry Citadel Platform

>*Scalable **AI Landing Zone** with Governance, Observability & Rapid Development*

Foundry **Citadel** Platform is a solution accelerator designed as a **supplemental AI landing zone** that integrates seamlessly with your Azure environment. It provides a **secure, scalable foundation** for running AI applications and agents in production – with **unified governance**, **end-to-end observability**, and tools to **accelerate development**. Citadel delivers a **pre-configured reference architecture** (aligned to Azure’s Cloud Adoption and Well-Architected Frameworks) that can be deployed with one click and includes ready-made code, templates, and documentation following Microsoft’s best practices. This comprehensive approach helps organisations adopt AI **responsibly and efficiently**, ensuring that advanced AI agents can be developed **quickly** while remaining **well-managed** and **compliant** with enterprise requirements.

> ### Citadel Adoption Signals
> _Enterprise teams highlight these blockers and enablers for scaling AI responsibly._

| 🛡️ **Security** | 📊 **Consumption** | 🧭 **Guardrails** | 🗂️ **Registry** |
| --- | --- | --- | --- |
| **62 %** of practitioners cite security concerns as the top blocker to wider AI or agent adoption. | **71 %** of enterprises struggle to track AI usage, enforce quotas, and report costs per team. | **47 %** of organisations require explicit guardrails before deploying autonomous AI agents safely. | **70 %** of customers need an AI registry for both agents and tools to adopt AI at scale. |

> 🧩 _Citadel turns these pain points into platform strengths: governed access, transparent consumption, defensible guardrails, and a shared catalog of reusable AI capabilities._

These challenges highlight why **Citadel’s capabilities** are crucial. **Foundry Citadel Platform** focuses on **three key pillars** – **Governance & Security**, **Observability & Compliance**, and **AI Development Velocity** – to address these concerns end-to-end. Below, we outline each pillar and the core capabilities provided, with architecture components and features that ensure enterprise-grade AI deployments:

***

## **1. Governance & Security Pillar** – *Trustworthy AI Operations at Scale*

> ### Why Governance Matters
> Without centralized AI governance, organisations face **unpredictable costs, reliability issues, security risks, developer friction,** and compliance nightmares. Citadel addresses this by building guardrails into every AI call.

**Foundry Citadel Platform** implements strong governance and security controls so that enterprises can adopt generative AI **safely and in compliance**. Key capabilities of this pillar include a **unified AI gateway** for all model access, granular policy enforcement, and robust safety mechanisms:

* **🔐 Unified AI Gateway:** At the core of Citadel’s security is the **“AI Gateway”** – a central entry point (built on Azure API Management) through which **all AI model requests** are routed. This gateway enforces organisation-wide policies consistently. For example, it implements **universal LLM policies like rate limiting and token quotas** to prevent misuse or cost overrun. **No application calls the model directly**; instead, apps call the gateway, which authenticates and forwards requests to the appropriate model (Azure OpenAI, open-source, or even third-party services like Amazon Bedrock) while applying the required controls. This design **centralises oversight** of all AI consumption.

* **🗝️ Granular Access Control & Key Management:** Citadel’s gateway introduces a **gateway-keys model access pattern** for developers. Rather than embedding master API keys for various AI services, teams use **managed credentials issued by the gateway**. The gateway can map these to backend keys or identity tokens, ensuring that **no master keys are directly exposed** in code. Access can be segmented by team or use case, with **role-based authorisation** (e.g. only approved apps or users can invoke certain AI endpoints) for greater security. This prevents uncontrolled use of AI services and allows rapid **revocation or rotation** of credentials from a single place.

* **🔑 Credential Management:** Citadel secures API keys and service credentials by leveraging Azure Key Vault. Secrets are stored securely and accessed at runtime, ensuring that **no raw keys are exposed in code or logs**.

* **🛡️ Policy Enforcement and Compliance:** The governance layer allows administrators to define and enforce a range of **custom policies**. These include **traffic mediation rules** (e.g. routing requests to different model endpoints based on content or load) and **usage policies** (per-user or per-app call rate limits and monthly token budgets). It also supports complex **policy expressions** – for example, automatically choosing an Azure OpenAI instance in a specific region for compliance, or requiring certain **request headers/tags for auditing**. All usage is captured centrally, enabling compliance auditing and simplifying the answer to the question *“Who is using which model, and how?”*

* **🌐 Multi-Cloud and Hybrid Support:** Citadel’s governance is flexible – it can govern not only Azure OpenAI, but also **open-source model servers or third-party AI APIs**. The AI Gateway speaks **OpenAI-compatible APIs** natively, so it can front-end any generative model service that exposes (or is adapted to) an OpenAI-compatible interface. For instance, it can direct certain requests to Azure OpenAI, to a model hosted on an on-premises GPU VM, or even to Amazon Bedrock, all under the same policy umbrella. This multi-cloud ability gives organisations a **single control plane** for heterogeneous AI systems. Citadel’s gateway and related services can themselves run on-premises if needed (via APIM self-hosted gateways), supporting scenarios with strict data residency requirements or partially air-gapped networks.

* **🛡️ AI Content Safety & Guardrails:** Citadel includes built-in **AI safety** mechanisms to enforce responsible AI usage. Every request and response can be scanned by **Azure AI Content Safety**, which detects **hate speech, violent or sexual content, self-harm indications, and other harmful outputs**. If an application user tries to prompt an agent to produce disallowed content, or if a model’s answer contains such content, the system can **block or filter** the request or response automatically. Citadel’s safety system also includes **“prompt shields”** that detect attempts to jailbreak the agent with malicious instructions hidden in user input or documents, protecting agents from executing unintended commands. Additionally, **“protected material”** checks can recognise when a model’s answer includes large verbatim excerpts of known copyrighted text (lyrics, articles, etc.) and prevent accidental leakage of such content. These guardrails reduce the risk of AI systems going off-policy or creating liability.

* **📊 Central Monitoring & Cost Governance:** All AI usage through the gateway is logged centrally (calls, tokens used, timings, outcome). Citadel provides **built-in reports and dashboards** to track this usage by application or department. This **addresses the cost attribution problem**: for example, you can see how many tokens the Finance team’s chatbot consumed this week and enforce per-team quotas. It also enables **cost optimisation** by detecting anomalous spikes or inefficient prompt usage. Combined with Azure Monitor, admins can set **alerts** (e.g. if a project exceeds its monthly AI budget, or if a spike in requests suggests a rogue script). By providing this transparency and control, Citadel helps prevent the “blank cheque” scenario of uncontrolled AI API spend. It also addresses the **“shadow AI”** governance problem by keeping AI calls routed through the gateway within managed guardrails.

* **📘 Central AI Registry for Agents and Tools:** Citadel provides a unified **AI Registry**, built on Azure API Center and cataloguing tools exposed as **Model Context Protocol (MCP)** servers, enabling organisations to manage and discover both **first-party** and **third-party** AI agents and tools. This registry acts as a central catalog where teams can securely share, document, and govern AI capabilities across the enterprise. By standardising metadata and access policies, the registry makes all agents and tools, whether developed in-house or sourced externally, easy to discover and integrate into workflows. This capability fosters collaboration, reduces duplication of effort, and supports consistent governance for all AI assets.

* **🔒 Data Security:** Citadel protects sensitive data in AI workflows by integrating with Microsoft Purview. This enables governance through **data sensitivity labels and policies**, helping keep sensitive information within approved boundaries. For example, an AI agent accessing a database operates under Purview’s oversight, with all usage logged and any policy violations (such as accessing restricted customer data) flagged for review.
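
The per-team quotas and monthly token budgets described above can be modelled as a simple counter keyed by team and month. This is a minimal, hypothetical sketch: the `TokenBudget` class, the `YYYY-MM` month key and the budget figures are illustrative assumptions. In Citadel, enforcement happens in the AI Gateway's policies, and the usage data feeds the central reports.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class TokenBudget:
    """Hypothetical per-team monthly token budget, as a gateway might enforce it."""

    monthly_limits: Dict[str, int]  # team -> tokens allowed per month
    used: Dict[Tuple[str, str], int] = field(default_factory=dict)  # (team, "YYYY-MM") -> tokens

    def try_consume(self, team: str, month: str, tokens: int) -> bool:
        """Record usage and return True if it fits the budget; otherwise reject."""
        key = (team, month)
        already_used = self.used.get(key, 0)
        # Teams without a configured budget get a limit of 0, i.e. no access.
        if already_used + tokens > self.monthly_limits.get(team, 0):
            return False  # the gateway would reject the request here
        self.used[key] = already_used + tokens
        return True

    def usage_report(self, month: str) -> Dict[str, int]:
        """Tokens consumed per team in the given month (for cost attribution)."""
        return {team: n for (team, m), n in self.used.items() if m == month}


if __name__ == "__main__":
    budget = TokenBudget({"finance": 1_000_000, "hr": 250_000})
    print(budget.try_consume("finance", "2025-11", 400_000))  # True
    print(budget.try_consume("finance", "2025-11", 700_000))  # False: would exceed 1,000,000
    print(budget.usage_report("2025-11"))  # {'finance': 400000}
```

Keeping usage keyed by team and month also yields the data needed for the per-team cost reports described above.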

**Governance & Security Features and Components:** The table below summarises some of the key governance components of Citadel and their roles:

| Governance Feature | Description |
|---|---|
| Unified AI Gateway | Central gateway that mediates every AI call. It applies global policies (rate limits, authentication, routing) and provides a single secure endpoint for clients. This ensures all AI usage is centrally visible and controlled. |
| Policy Engine | Rich rule framework to enforce business rules, e.g. restrict certain models to specific regions, apply token quotas per user, or inject safety prompts. Administrators can write custom policies or use built-in templates for common requirements. |
| Managed Credentials | Uses gateway keys, with or without tokens issued by an identity platform such as Microsoft Entra ID, to abstract backend secrets. Developers no longer handle the AI services’ raw master keys; instead, the gateway issues tokens/keys with scoped access. This reduces the risk of key leakage and allows instant revocation if needed. |
| Content Safety Filters | Automated checks on prompts and responses using Azure AI Content Safety. Flags or removes profanity, hate, sexual or violent content, and can block outputs that violate compliance policies (e.g. privacy or confidential data). |
| AI Registry & Catalog | A registry (via Azure API Center) for discovering and managing AI endpoints and tools (exposed as MCP servers). This catalogue lets teams securely share AI “skills” (agents, APIs, functions) across the enterprise with proper metadata and governance. |
| Multi-cloud Connectors | Built-in support to govern AI services beyond Azure. The gateway can securely proxy requests to open-source model APIs or to other clouds’ AI endpoints (e.g. Bedrock). This ensures consistent security and monitoring even for third-party AI services. |
| Azure Key Vault | Secure store for the secrets and credentials used by AI apps and agents. All API keys, connection strings, etc., used by agents or the gateway are kept in the spoke Key Vault and accessed via managed identities. This eliminates hard-coded secrets and protects sensitive data at rest. |
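
The Policy Engine row describes two kinds of rule: restricting models to specific regions and applying per-user token quotas. In Citadel, these rules are enforced by the gateway itself, which is API Management. The Python below only models the decision logic as a hypothetical sketch: a region allow-list per model, plus a simple fixed-window token counter per user. The class names, model names and region choices are illustrative assumptions. Real APIM policies are configured declaratively and are not shown here.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple


@dataclass(frozen=True)
class GatewayRequest:
    """Hypothetical view of an incoming AI call as seen by the gateway."""
    user: str
    model: str
    region: str
    estimated_tokens: int


@dataclass
class PolicyEngine:
    """Hypothetical gateway policy check (illustrative, not Citadel's code).

    Rule 1: a model may only be served from its approved regions.
    Rule 2: each user gets a fixed-window token quota.
    """
    allowed_regions: Dict[str, Set[str]]
    tokens_per_window: int
    window_seconds: float = 60.0
    clock: Callable[[], float] = time.monotonic
    # user -> (window start time, tokens consumed in that window)
    _usage: Dict[str, Tuple[float, int]] = field(default_factory=dict, init=False)

    def evaluate(self, req: GatewayRequest) -> Tuple[bool, str]:
        # Rule 1: region allow-list per model (unknown models are denied).
        if req.region not in self.allowed_regions.get(req.model, set()):
            return False, f"model {req.model!r} not allowed in region {req.region!r}"

        # Rule 2: fixed-window token quota per user.
        now = self.clock()
        start, used = self._usage.get(req.user, (now, 0))
        if now - start >= self.window_seconds:
            start, used = now, 0  # window expired: reset the counter
        if used + req.estimated_tokens > self.tokens_per_window:
            return False, "token quota exceeded"
        self._usage[req.user] = (start, used + req.estimated_tokens)
        return True, "allowed"


if __name__ == "__main__":
    engine = PolicyEngine(
        allowed_regions={"model-a": {"swedencentral"}},
        tokens_per_window=1000,
    )
    print(engine.evaluate(GatewayRequest("alice", "model-a", "eastus", 10)))
    print(engine.evaluate(GatewayRequest("alice", "model-a", "swedencentral", 600)))
    print(engine.evaluate(GatewayRequest("alice", "model-a", "swedencentral", 500)))
```

A fixed window is the simplest quota scheme to reason about. A sliding window or token bucket would smooth bursts at window boundaries, at the cost of slightly more state.
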

| Observability Feature | Purpose | Layer |
|---|---|---|
| Central APM Monitoring | Infrastructure-level monitoring, resource utilisation, and system health indicators across all AI workloads without requiring agent code changes. | Platform |
| Usage Analytics & Cost Tracking | Granular tracking of token consumption, request patterns, and cost allocation segmented by team, use case, or agent for enterprise resource management. | Platform |
| Centralised AI Evaluations | Automated quality, safety, and compliance evaluations applied consistently across all agents without requiring code modifications from development teams. | Platform |
| Enterprise Alerting & Remediation | Sophisticated alerting with automated responses for sensitive use cases, including agent disabling, human escalation, and compliance notifications. | Platform |
| End-to-End Tracing | Captures every step of an AI agent's reasoning and interactions (prompts, tool calls, responses), enabling transparent debugging and post-mortem analysis. | Agent |
| Agent Performance Monitoring | Detailed real-time metrics including response latency breakdown, token usage patterns, tool efficiency, and resource consumption per agent. | Agent |
| Agent-Specific Evaluations | Tailored quality metrics for individual agent behaviours including intent fulfilment, tool use correctness, and reasoning efficiency. | Agent |
| Advanced Debugging Tools | Powerful querying, filtering, and trace replay capabilities for root-cause analysis and issue reproduction at the agent level. | Agent |
| Unified Dashboards | Integrated visual dashboards providing both platform-wide overviews and agent-specific drill-downs for comprehensive operational visibility. | Both |
| Continuous Improvement Loop | Connects operational data back to development with CI/CD integration, A/B testing, and regression detection for ongoing AI system enhancement. | Both |

| Development Accelerator | Role in Citadel |
|---|---|
| Deployment Templates | Pre-built, one-click templates (e.g. Bicep) to provision a secure, governed cloud environment for single or multiple agent types, accelerating time-to-production. |
| Flexible Agent Runtimes (Copilot Studio, Managed Runtime, BYO) | Supports a spectrum of development models, from low-code (Copilot Studio) to managed services (AI Foundry Agent Service) and fully custom "Bring-Your-Own" orchestrators (Semantic Kernel, LangChain), allowing teams to choose the best fit for their use case. |
| Citadel AI Registry | A central, governed catalog for discovering, managing, and reusing AI assets. It provides managed access to LLMs and tools and allows teams to publish their own, fostering collaboration and preventing redundant work. |
| Reusable Blueprints (Gold Standard Solutions) | End-to-end solution examples that demonstrate common AI patterns. They serve as accelerators for new projects, embodying proven architectures and best practices. |
| DevOps Integration | Integrates with GitHub and Azure DevOps for CI/CD, automated testing, and lifecycle management of AI solutions. Supports A/B testing and canary releases to bring modern software engineering speed to AI development. |
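
The canary releases mentioned in the DevOps Integration row generally depend on splitting traffic deterministically between an agent’s stable and candidate versions. The following Python sketch shows one common way to do this: hash-based bucketing. The function name, the salt and the percentages are illustrative assumptions. In a real deployment the split would more likely be configured in the gateway or deployment tooling than hand-written in application code.

```python
import hashlib


def assign_variant(user_id: str, canary_percent: float, salt: str = "agent-rollout-1") -> str:
    """Deterministically route a user to the 'stable' or 'canary' agent version.

    Hashing salt + user_id maps each user to a fixed bucket in [0, 10000), so a
    user keeps seeing the same version for the whole rollout, and raising
    canary_percent only ever moves users from stable to canary.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(1000)]
    canary = sum(assign_variant(u, 10) == "canary" for u in users)
    print(f"{canary} of {len(users)} users routed to the canary version")
```

Because the bucket depends only on the salt and the user ID, a rollout can be widened step by step (for example 5%, then 25%, then 100%) without users flipping back and forth between versions. Changing the salt reshuffles everyone for the next experiment.
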