├── assets
    ├── citadel.png
    ├── AIAC-1.1.0.png
    └── AIAC-Governance-1.1.0.png
├── CHANGELOG.md
├── .github
    ├── CODE_OF_CONDUCT.md
    ├── ISSUE_TEMPLATE.md
    └── PULL_REQUEST_TEMPLATE.md
├── LICENSE.md
├── CONTRIBUTING.md
├── .gitignore
├── README.md
├── Citadel-WAF-Alignment.md
└── CITADEL-TECHNICAL-GUIDE.md


/assets/citadel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/citadel.png


--------------------------------------------------------------------------------
/assets/AIAC-1.1.0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/AIAC-1.1.0.png


--------------------------------------------------------------------------------
/assets/AIAC-Governance-1.1.0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure-Samples/foundry-citadel-platform/HEAD/assets/AIAC-Governance-1.1.0.png


--------------------------------------------------------------------------------
/CHANGELOG.md:
--------------------------------------------------------------------------------
 1 | ## [project-title] Changelog
 2 | 
 3 | <a name="x.y.z"></a>
 4 | # x.y.z (yyyy-mm-dd)
 5 | 
 6 | *Features*
 7 | * ...
 8 | 
 9 | *Bug Fixes*
10 | * ...
11 | 
12 | *Breaking Changes*
13 | * ...
14 | 


--------------------------------------------------------------------------------
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
 1 | # Microsoft Open Source Code of Conduct
 2 | 
 3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
 4 | 
 5 | Resources:
 6 | 
 7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
 8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
 9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 | 


--------------------------------------------------------------------------------
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | <!--
 2 | IF SUFFICIENT INFORMATION IS NOT PROVIDED VIA THE FOLLOWING TEMPLATE THE ISSUE MIGHT BE CLOSED WITHOUT FURTHER CONSIDERATION OR INVESTIGATION
 3 | -->
 4 | > Please provide us with the following information:
 5 | > ---------------------------------------------------------------
 6 | 
 7 | ### This issue is for a: (mark with an `x`)
 8 | ```
 9 | - [ ] bug report -> please search issues before submitting
10 | - [ ] feature request
11 | - [ ] documentation issue or request
12 | - [ ] regression (a behavior that used to work and stopped in a new release)
13 | ```
14 | 
15 | ### Minimal steps to reproduce
16 | >
17 | 
18 | ### Any log messages given by the failure
19 | >
20 | 
21 | ### Expected/desired behavior
22 | >
23 | 
24 | ### OS and Version?
25 | > Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
26 | 
27 | ### Versions
28 | >
29 | 
30 | ### Mention any other details that might be useful
31 | 
32 | > ---------------------------------------------------------------
33 | > Thanks! We'll be in touch soon.
34 | 


--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
 1 | ## Purpose
 2 | <!-- Describe the intention of the changes being proposed. What problem does it solve or functionality does it add? -->
 3 | * ...
 4 | 
 5 | ## Does this introduce a breaking change?
 6 | <!-- Mark one with an "x". -->
 7 | ```
 8 | [ ] Yes
 9 | [ ] No
10 | ```
11 | 
12 | ## Pull Request Type
13 | What kind of change does this Pull Request introduce?
14 | 
15 | <!-- Please check the one that applies to this PR using "x". -->
16 | ```
17 | [ ] Bugfix
18 | [ ] Feature
19 | [ ] Code style update (formatting, local variables)
20 | [ ] Refactoring (no functional changes, no api changes)
21 | [ ] Documentation content changes
22 | [ ] Other... Please describe:
23 | ```
24 | 
25 | ## How to Test
26 | *  Get the code
27 | 
28 | ```
29 | git clone [repo-address]
30 | cd [repo-name]
31 | git checkout [branch-name]
32 | npm install
33 | ```
34 | 
35 | * Test the code
36 | <!-- Add steps to run the tests suite and/or manually test -->
37 | ```
38 | ```
39 | 
40 | ## What to Check
41 | Verify that the following are valid
42 | * ...
43 | 
44 | ## Other Information
45 | <!-- Add any other helpful information that may be needed here. -->


--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
 1 |     MIT License
 2 | 
 3 |     Copyright (c) Microsoft Corporation.
 4 | 
 5 |     Permission is hereby granted, free of charge, to any person obtaining a copy
 6 |     of this software and associated documentation files (the "Software"), to deal
 7 |     in the Software without restriction, including without limitation the rights
 8 |     to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 |     copies of the Software, and to permit persons to whom the Software is
10 |     furnished to do so, subject to the following conditions:
11 | 
12 |     The above copyright notice and this permission notice shall be included in all
13 |     copies or substantial portions of the Software.
14 | 
15 |     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 |     IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 |     FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 |     AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 |     LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 |     OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 21 |     SOFTWARE.


--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
 1 | # Contributing to [project-title]
 2 | 
 3 | This project welcomes contributions and suggestions.  Most contributions require you to agree to a
 4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
 5 | the rights to use your contribution. For details, visit [Contributor License Agreements](https://cla.opensource.microsoft.com).
 6 | 
 7 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
 8 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
 9 | provided by the bot. You will only need to do this once across all repos using our CLA.
10 | 
11 |  - [Code of Conduct](#coc)
12 |  - [Issues and Bugs](#issue)
13 |  - [Feature Requests](#feature)
14 |  - [Submission Guidelines](#submit)
15 | 
16 | ## <a name="coc"></a> Code of Conduct
17 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
18 | 
19 | ## <a name="issue"></a> Found an Issue?
20 | If you find a bug in the source code or a mistake in the documentation, you can help us by
21 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can
22 | [submit a Pull Request](#submit-pr) with a fix.
23 | 
24 | ## <a name="feature"></a> Want a Feature?
25 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub
26 | Repository. If you would like to *implement* a new feature, please submit an issue with
27 | a proposal for your work first, to be sure that we can use it.
28 | 
29 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr).
30 | 
31 | ## <a name="submit"></a> Submission Guidelines
32 | 
33 | ### <a name="submit-issue"></a> Submitting an Issue
 34 | Before you submit an issue, search the archive; your question may already have been answered.
35 | 
36 | If your issue appears to be a bug, and hasn't been reported, open a new issue.
37 | Help us to maximize the effort we can spend fixing issues and adding new
38 | features, by not reporting duplicate issues.  Providing the following information will increase the
39 | chances of your issue being dealt with quickly:
40 | 
41 | * **Overview of the Issue** - if an error is being thrown a non-minified stack trace helps
42 | * **Version** - what version is affected (e.g. 0.1.2)
 43 | * **Motivation for or Use Case** - explain what you are trying to do and why the current behavior is a bug for you
44 | * **Browsers and Operating System** - is this a problem with all browsers?
 45 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps
46 | * **Related Issues** - has a similar issue been reported before?
47 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be
48 |   causing the problem (line of code or commit)
49 | 
 50 | You can file new issues by providing the above information at the corresponding repository's issues link:
 51 | replace `[organization-name]` and `[repository-name]` in
 52 | `https://github.com/[organization-name]/[repository-name]/issues/new`.
53 | 
54 | ### <a name="submit-pr"></a> Submitting a Pull Request (PR)
55 | Before you submit your Pull Request (PR) consider the following guidelines:
56 | 
57 | * Search the repository's [pull requests](https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR
58 |   that relates to your submission. You don't want to duplicate effort.
59 | 
60 | * Make your changes in a new git fork:
61 | 
62 | * Commit your changes using a descriptive commit message
63 | * Push your fork to GitHub:
64 | * In GitHub, create a pull request
65 | * If we suggest changes then:
66 |   * Make the required updates.
67 |   * Rebase your fork and force push to your GitHub repository (this will update your Pull Request):
68 | 
69 |     ```shell
70 |     git rebase main -i
71 |     git push -f
72 |     ```
73 | 
74 | That's it! Thank you for your contribution!
75 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
  1 | ## Ignore Visual Studio temporary files, build results, and
  2 | ## files generated by popular Visual Studio add-ons.
  3 | ##
  4 | ## Get latest from https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
  5 | 
  6 | # User-specific files
  7 | *.rsuser
  8 | *.suo
  9 | *.user
 10 | *.userosscache
 11 | *.sln.docstates
 12 | *.env
 13 | 
 14 | # User-specific files (MonoDevelop/Xamarin Studio)
 15 | *.userprefs
 16 | 
 17 | # Mono auto generated files
 18 | mono_crash.*
 19 | 
 20 | # Build results
 21 | [Dd]ebug/
 22 | [Dd]ebugPublic/
 23 | [Rr]elease/
 24 | [Rr]eleases/
 25 | x64/
 26 | x86/
 27 | [Ww][Ii][Nn]32/
 28 | [Aa][Rr][Mm]/
 29 | [Aa][Rr][Mm]64/
 30 | [Aa][Rr][Mm]64[Ee][Cc]/
 31 | bld/
 32 | [Oo]bj/
 33 | [Oo]ut/
 34 | [Ll]og/
 35 | [Ll]ogs/
 36 | 
 37 | # Build results on 'Bin' directories
 38 | **/[Bb]in/*
 39 | # Uncomment if you have tasks that rely on *.refresh files to move binaries
 40 | # (https://github.com/github/gitignore/pull/3736)
 41 | #!**/[Bb]in/*.refresh
 42 | 
 43 | # Visual Studio 2015/2017 cache/options directory
 44 | .vs/
 45 | # Uncomment if you have tasks that create the project's static files in wwwroot
 46 | #wwwroot/
 47 | 
 48 | # Visual Studio 2017 auto generated files
 49 | Generated\ Files/
 50 | 
 51 | # MSTest test Results
 52 | [Tt]est[Rr]esult*/
 53 | [Bb]uild[Ll]og.*
 54 | *.trx
 55 | 
 56 | # NUnit
 57 | *.VisualState.xml
 58 | TestResult.xml
 59 | nunit-*.xml
 60 | 
 61 | # Approval Tests result files
 62 | *.received.*
 63 | 
 64 | # Build Results of an ATL Project
 65 | [Dd]ebugPS/
 66 | [Rr]eleasePS/
 67 | dlldata.c
 68 | 
 69 | # Benchmark Results
 70 | BenchmarkDotNet.Artifacts/
 71 | 
 72 | # .NET Core
 73 | project.lock.json
 74 | project.fragment.lock.json
 75 | artifacts/
 76 | 
 77 | # ASP.NET Scaffolding
 78 | ScaffoldingReadMe.txt
 79 | 
 80 | # StyleCop
 81 | StyleCopReport.xml
 82 | 
 83 | # Files built by Visual Studio
 84 | *_i.c
 85 | *_p.c
 86 | *_h.h
 87 | *.ilk
 88 | *.meta
 89 | *.obj
 90 | *.idb
 91 | *.iobj
 92 | *.pch
 93 | *.pdb
 94 | *.ipdb
 95 | *.pgc
 96 | *.pgd
 97 | *.rsp
 98 | # but not Directory.Build.rsp, as it configures directory-level build defaults
 99 | !Directory.Build.rsp
100 | *.sbr
101 | *.tlb
102 | *.tli
103 | *.tlh
104 | *.tmp
105 | *.tmp_proj
106 | *_wpftmp.csproj
107 | *.log
108 | *.tlog
109 | *.vspscc
110 | *.vssscc
111 | .builds
112 | *.pidb
113 | *.svclog
114 | *.scc
115 | 
116 | # Chutzpah Test files
117 | _Chutzpah*
118 | 
119 | # Visual C++ cache files
120 | ipch/
121 | *.aps
122 | *.ncb
123 | *.opendb
124 | *.opensdf
125 | *.sdf
126 | *.cachefile
127 | *.VC.db
128 | *.VC.VC.opendb
129 | 
130 | # Visual Studio profiler
131 | *.psess
132 | *.vsp
133 | *.vspx
134 | *.sap
135 | 
136 | # Visual Studio Trace Files
137 | *.e2e
138 | 
139 | # TFS 2012 Local Workspace
140 | $tf/
141 | 
142 | # Guidance Automation Toolkit
143 | *.gpState
144 | 
145 | # ReSharper is a .NET coding add-in
146 | _ReSharper*/
147 | *.[Rr]e[Ss]harper
148 | *.DotSettings.user
149 | 
150 | # TeamCity is a build add-in
151 | _TeamCity*
152 | 
153 | # DotCover is a Code Coverage Tool
154 | *.dotCover
155 | 
156 | # AxoCover is a Code Coverage Tool
157 | .axoCover/*
158 | !.axoCover/settings.json
159 | 
160 | # Coverlet is a free, cross platform Code Coverage Tool
161 | coverage*.json
162 | coverage*.xml
163 | coverage*.info
164 | 
165 | # Visual Studio code coverage results
166 | *.coverage
167 | *.coveragexml
168 | 
169 | # NCrunch
170 | _NCrunch_*
171 | .NCrunch_*
172 | .*crunch*.local.xml
173 | nCrunchTemp_*
174 | 
175 | # MightyMoose
176 | *.mm.*
177 | AutoTest.Net/
178 | 
179 | # Web workbench (sass)
180 | .sass-cache/
181 | 
182 | # Installshield output folder
183 | [Ee]xpress/
184 | 
185 | # DocProject is a documentation generator add-in
186 | DocProject/buildhelp/
187 | DocProject/Help/*.HxT
188 | DocProject/Help/*.HxC
189 | DocProject/Help/*.hhc
190 | DocProject/Help/*.hhk
191 | DocProject/Help/*.hhp
192 | DocProject/Help/Html2
193 | DocProject/Help/html
194 | 
195 | # Click-Once directory
196 | publish/
197 | 
198 | # Publish Web Output
199 | *.[Pp]ublish.xml
200 | *.azurePubxml
201 | # Note: Comment the next line if you want to checkin your web deploy settings,
202 | # but database connection strings (with potential passwords) will be unencrypted
203 | *.pubxml
204 | *.publishproj
205 | 
206 | # Microsoft Azure Web App publish settings. Comment the next line if you want to
207 | # checkin your Azure Web App publish settings, but sensitive information contained
208 | # in these scripts will be unencrypted
209 | PublishScripts/
210 | 
211 | # NuGet Packages
212 | *.nupkg
213 | # NuGet Symbol Packages
214 | *.snupkg
215 | # The packages folder can be ignored because of Package Restore
216 | **/[Pp]ackages/*
217 | # except build/, which is used as an MSBuild target.
218 | !**/[Pp]ackages/build/
219 | # Uncomment if necessary however generally it will be regenerated when needed
220 | #!**/[Pp]ackages/repositories.config
221 | # NuGet v3's project.json files produces more ignorable files
222 | *.nuget.props
223 | *.nuget.targets
224 | 
225 | # Microsoft Azure Build Output
226 | csx/
227 | *.build.csdef
228 | 
229 | # Microsoft Azure Emulator
230 | ecf/
231 | rcf/
232 | 
233 | # Windows Store app package directories and files
234 | AppPackages/
235 | BundleArtifacts/
236 | Package.StoreAssociation.xml
237 | _pkginfo.txt
238 | *.appx
239 | *.appxbundle
240 | *.appxupload
241 | 
242 | # Visual Studio cache files
243 | # files ending in .cache can be ignored
244 | *.[Cc]ache
245 | # but keep track of directories ending in .cache
246 | !?*.[Cc]ache/
247 | 
248 | # Others
249 | ClientBin/
250 | ~$*
251 | *~
252 | *.dbmdl
253 | *.dbproj.schemaview
254 | *.jfm
255 | *.pfx
256 | *.publishsettings
257 | orleans.codegen.cs
258 | 
259 | # Including strong name files can present a security risk
260 | # (https://github.com/github/gitignore/pull/2483#issue-259490424)
261 | #*.snk
262 | 
263 | # Since there are multiple workflows, uncomment next line to ignore bower_components
264 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
265 | #bower_components/
266 | 
267 | # RIA/Silverlight projects
268 | Generated_Code/
269 | 
270 | # Backup & report files from converting an old project file
271 | # to a newer Visual Studio version. Backup files are not needed,
272 | # because we have git ;-)
273 | _UpgradeReport_Files/
274 | Backup*/
275 | UpgradeLog*.XML
276 | UpgradeLog*.htm
277 | ServiceFabricBackup/
278 | *.rptproj.bak
279 | 
280 | # SQL Server files
281 | *.mdf
282 | *.ldf
283 | *.ndf
284 | 
285 | # Business Intelligence projects
286 | *.rdl.data
287 | *.bim.layout
288 | *.bim_*.settings
289 | *.rptproj.rsuser
290 | *- [Bb]ackup.rdl
291 | *- [Bb]ackup ([0-9]).rdl
292 | *- [Bb]ackup ([0-9][0-9]).rdl
293 | 
294 | # Microsoft Fakes
295 | FakesAssemblies/
296 | 
297 | # GhostDoc plugin setting file
298 | *.GhostDoc.xml
299 | 
300 | # Node.js Tools for Visual Studio
301 | .ntvs_analysis.dat
302 | node_modules/
303 | 
304 | # Visual Studio 6 build log
305 | *.plg
306 | 
307 | # Visual Studio 6 workspace options file
308 | *.opt
309 | 
310 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
311 | *.vbw
312 | 
313 | # Visual Studio 6 auto-generated project file (contains which files were open etc.)
314 | *.vbp
315 | 
316 | # Visual Studio 6 workspace and project file (working project files containing files to include in project)
317 | *.dsw
318 | *.dsp
319 | 
320 | # Visual Studio 6 technical files
321 | *.ncb
322 | *.aps
323 | 
324 | # Visual Studio LightSwitch build output
325 | **/*.HTMLClient/GeneratedArtifacts
326 | **/*.DesktopClient/GeneratedArtifacts
327 | **/*.DesktopClient/ModelManifest.xml
328 | **/*.Server/GeneratedArtifacts
329 | **/*.Server/ModelManifest.xml
330 | _Pvt_Extensions
331 | 
332 | # Paket dependency manager
333 | **/.paket/paket.exe
334 | paket-files/
335 | 
336 | # FAKE - F# Make
337 | **/.fake/
338 | 
339 | # CodeRush personal settings
340 | **/.cr/personal
341 | 
342 | # Python Tools for Visual Studio (PTVS)
343 | **/__pycache__/
344 | *.pyc
345 | 
346 | # Cake - Uncomment if you are using it
347 | #tools/**
348 | #!tools/packages.config
349 | 
350 | # Tabs Studio
351 | *.tss
352 | 
353 | # Telerik's JustMock configuration file
354 | *.jmconfig
355 | 
356 | # BizTalk build output
357 | *.btp.cs
358 | *.btm.cs
359 | *.odx.cs
360 | *.xsd.cs
361 | 
362 | # OpenCover UI analysis results
363 | OpenCover/
364 | 
365 | # Azure Stream Analytics local run output
366 | ASALocalRun/
367 | 
368 | # MSBuild Binary and Structured Log
369 | *.binlog
370 | MSBuild_Logs/
371 | 
372 | # AWS SAM Build and Temporary Artifacts folder
373 | .aws-sam
374 | 
375 | # NVidia Nsight GPU debugger configuration file
376 | *.nvuser
377 | 
378 | # MFractors (Xamarin productivity tool) working folder
379 | **/.mfractor/
380 | 
381 | # Local History for Visual Studio
382 | **/.localhistory/
383 | 
384 | # Visual Studio History (VSHistory) files
385 | .vshistory/
386 | 
387 | # BeatPulse healthcheck temp database
388 | healthchecksdb
389 | 
390 | # Backup folder for Package Reference Convert tool in Visual Studio 2017
391 | MigrationBackup/
392 | 
393 | # Ionide (cross platform F# VS Code tools) working folder
394 | **/.ionide/
395 | 
396 | # Fody - auto-generated XML schema
397 | FodyWeavers.xsd
398 | 
399 | # VS Code files for those working on multiple tools
400 | .vscode/*
401 | !.vscode/settings.json
402 | !.vscode/tasks.json
403 | !.vscode/launch.json
404 | !.vscode/extensions.json
405 | !.vscode/*.code-snippets
406 | 
407 | # Local History for Visual Studio Code
408 | .history/
409 | 
410 | # Built Visual Studio Code Extensions
411 | *.vsix
412 | 
413 | # Windows Installer files from build outputs
414 | *.cab
415 | *.msi
416 | *.msix
417 | *.msm
418 | *.msp
419 | 
420 | # Local files
421 | local/**


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Executive Summary: AI Governance at Speed
  2 | 
  3 | ## Bridging Governance Requirements and Developer Velocity with Foundry Citadel Platform
  4 | 
  5 | ---
  6 | 
  7 | ## The AI Governance Imperative
  8 | 
  9 | As AI systems become more powerful and integrated into everyday life, **governance is no longer a "nice-to-have"; it's a must**. Whether you're aligning to emerging regulations like the EU AI Act, meeting internal standards for risk and safety, or ensuring your AI systems meet your enterprise's business goals efficiently and at scale, the ability to govern AI responsibly at speed is a game-changer.
 10 | 
 11 | ---
 12 | 
 13 | ## The Governance-Velocity Paradox
 14 | 
 15 | Yet, **governance and developer velocity often feel fundamentally misaligned**. Organizations face critical bottlenecks:
 16 | 
 17 | - **Manual Risk Assessments**: Frequently time-consuming and lacking standardization
 18 | - **Scattered Evaluation Tools**: Fragmented across different teams and systems
 19 | - **Unclear Governance Requirements**: Ambiguous policies that are difficult to operationalize
 20 | - **Implementation Gaps**: Policies rarely map cleanly to real-world technical implementation
 21 | 
 22 | **The result?** Bottlenecks and delays that frustrate both governance teams and developers, slowing AI adoption and increasing organizational risk.
 23 | 
 24 | ---
 25 | 
 26 | ## The Collaboration Challenge
 27 | 
 28 | Effective AI governance demands a new balance—**one that enforces oversight without impeding innovation**. It also requires multiple stakeholders collaborating effectively with each other:
 29 | 
 30 | ### 👔 **Compliance Officers & Chief AI Officers**
 31 | Must determine **what needs to be assessed** to comply with company policies and regulations
 32 | 
 33 | ### 👨‍💻 **AI Developers & Engineering Teams**
 34 | Need to **operationalize these requirements** by generating the right qualitative and quantitative evidence
 35 | 
 36 | **Unfortunately**, the handshake between these personas is often rough. Traditional methods create friction in the governance process, slowing down deployment or leading to incomplete compliance. **It's a trade-off most organizations can no longer afford.**
 37 | 
 38 | ---
 39 | 
 40 | ## The Foundry Citadel Platform Solution
 41 | 
 42 | **That's where Foundry Citadel Platform steps in.**
 43 | 
 44 | Foundry Citadel Platform is a comprehensive solution accelerator that bridges the gap between governance requirements and technical implementation, enabling organizations to:
 45 | 
 46 | ### 🛡️ **Govern AI Responsibly**
 47 | - **Unified AI Gateway**: Single control point for all AI model access with enterprise-wide policy enforcement
 48 | - **Automated Compliance**: Built-in safety checks, content filtering, and policy validation without manual intervention
 49 | - **Central AI Registry**: Catalog and govern all AI assets—models, agents, and tools—across the enterprise
 50 | 
 51 | ### 📊 **Maintain Complete Visibility**
 52 | - **Platform-Level Observability**: Centralized monitoring across all AI workloads without code changes
 53 | - **Agent-Level Tracing**: Detailed execution paths for debugging and quality assurance
 54 | - **Automated Evaluations**: Continuous quality, safety, and compliance assessments applied consistently
 55 | 
 56 | ### 🚀 **Accelerate Innovation**
 57 | - **Pre-built Templates**: One-click deployment of secure, governed AI environments
 58 | - **Flexible Development Options**: From low-code (Azure Logic Apps Agent Loop) and a managed agent runtime (AI Foundry Agents) to pro-code (Microsoft Agent Framework, LangChain, and others)
 59 | - **DevOps Integration**: CI/CD pipelines with automated testing and evaluation
 60 | 
 61 | ---
 62 | 
 63 | ## Key Business Outcomes
 64 | 
 65 | Organizations adopting Foundry Citadel Platform achieve:
 66 | 
 67 | | Outcome | Impact |
 68 | |---------|--------|
 69 | | **🎯 Faster Time-to-Value** | Deploy AI solutions in days, not months, with pre-configured infrastructure |
 70 | | **🔒 Reduced Risk** | Automated governance ensures compliance from day one |
 71 | | **💰 Cost Control** | Granular usage tracking and quota enforcement per team/project |
 72 | | **📈 Scalable Adoption** | Repeatable patterns that grow with your organization |
 73 | | **🤝 Cross-Functional Alignment** | Clear contracts between governance and development teams |
 74 | | **🌐 Universal Gateway & Registry** | Unified access, governance and discovery of central AI assets |
 75 | 
 76 | ---
 77 | 
 78 | ## Enterprise Statistics Driving Citadel Adoption
 79 | 
 80 | Real-world enterprise challenges that Citadel addresses:
 81 | 
 82 | - **62%** of practitioners cite **security concerns** as the top blocker to wider AI adoption
 83 | - **71%** of enterprises struggle to **track AI usage, enforce quotas, and report costs** per team
 84 | - **47%** of organizations require **explicit guardrails** before deploying autonomous AI agents safely
 85 | - **70%** of customers need an **AI registry** for LLMs, agents, and tools to adopt AI at scale
 86 | 
 87 | ---
 88 | 
 89 | ## The Three Pillars of Foundry Citadel Platform
 90 | 
 91 | ### 1️⃣ **Governance & Security** – *Trustworthy AI Operations at Scale*
 92 | Without centralized AI governance, organizations face unpredictable costs, reliability issues, security risks, and compliance nightmares. Citadel builds guardrails into every AI call through:
 93 | - Unified AI Gateway for centralized control
 94 | - Granular access control and key management
 95 | - Multi-cloud and hybrid support
 96 | - AI content safety and prompt shields
 97 | - Central AI registry for agents and tools
 98 | 
 99 | ### 2️⃣ **Observability & Compliance** – *End-to-End Monitoring, Evaluation & Trust*
100 | Full visibility creates trust and confidence. Citadel provides holistic observability through:
101 | - **Platform-Level**: Centralized APM, usage tracking, automated evaluations, and enterprise alerting
102 | - **Agent-Level**: Detailed execution traces, performance monitoring, and debugging tools
103 | - **Rich Dashboards**: Integrated views for both operational and development teams
104 | 
105 | ### 3️⃣ **AI Development Velocity** – *Accelerating Innovation with Templates & Tools*
106 | Build fast, build right. Citadel empowers teams to innovate quickly within established guardrails:
107 | - Pre-built deployment templates
108 | - Flexible agent development options (low-code to pro-code)
109 | - Citadel AI Registry for asset discovery and reuse
110 | - DevOps integration for continuous delivery
111 | 
112 | ---
113 | 
114 | ## Two-Tier Architecture for Enterprise Scale
115 | 
116 | ### **Citadel Governance Hub (CGH)** – Central Control Plane
117 | The enterprise-wide governance layer providing:
118 | - Unified AI gateway for centralized access to all AI models and MCP tools
119 | - Universal AI registry for discovery and cataloging
120 | - Platform-level evaluations and compliance reporting
121 | - Usage analytics and cost allocation
122 | - Centralized security and safety enforcement
123 | 
124 | ### **Citadel Agent Spoke (CAS)** – Domain-Specific Deployments
125 | Secure, isolated environments for AI agent workloads featuring:
126 | - Azure AI Foundry with agent capabilities
127 | - Comprehensive AI services (Search, Cosmos DB, Storage)
128 | - Zero Trust architecture with private endpoints
129 | - Auto-scaling container infrastructure
130 | - Hub-spoke integration with enterprise networks
131 | 
132 | ---
133 | 
134 | ## From Challenge to Solution: The Citadel Advantage
135 | 
136 | | Traditional Approach | Foundry Citadel Platform |
137 | |---------------------|-------------------------|
138 | | ❌ Manual risk assessments | ✅ Automated compliance checks |
139 | | ❌ Scattered evaluation tools | ✅ Unified observability platform |
140 | | ❌ Unclear requirements | ✅ Codified governance contracts |
141 | | ❌ Implementation gaps | ✅ Pre-built, proven patterns |
142 | | ❌ Friction between teams | ✅ Streamlined collaboration |
143 | | ❌ Slow deployment cycles | ✅ Rapid, repeatable deployments |
144 | 
145 | ---
146 | 
147 | ## Strategic Partnerships & Integrations
148 | 
149 | Foundry Citadel Platform bridges the gap between governance requirements and technical implementation through strategic integrations:
150 | 
151 | - **Azure AI Foundry**: Enterprise AI platform with an advanced model catalog, managed agent services, and AI evaluations/observability
152 | - **Azure API Management**: Unified AI gateway for governance and policy enforcement
153 | - **Azure Monitor & Application Insights**: Comprehensive observability
154 | - **Azure Content Safety**: Automated GenAI safety checks and content filtering
155 | - **Microsoft Entra ID**: Identity and access management
156 | - **Microsoft Defender for AI**: Threat detection and security monitoring
157 | - **Microsoft Purview**: Data governance and sensitivity labeling
158 | 
159 | ---
160 | 
161 | ## The Bottom Line
162 | 
163 | **Effective AI governance no longer means choosing between speed and safety.**
164 | 
165 | Foundry Citadel Platform enables organizations to:
166 | - ✅ **Deploy AI with confidence** – knowing governance is built-in, not bolted-on
167 | - ✅ **Scale AI responsibly** – with consistent policies across all projects
168 | - ✅ **Accelerate innovation** – within secure, compliant guardrails
169 | - ✅ **Bridge organizational silos** – aligning governance, development, and operations
170 | 
171 | ---
172 | 
173 | ## Call to Action
174 | 
175 | The challenge of governing AI at speed is precisely why Foundry Citadel Platform exists. By providing a comprehensive solution that addresses governance, observability, and development velocity in an integrated way, Citadel transforms the traditional trade-off between control and innovation into a **synergistic relationship**.
176 | 
177 | **Organizations can now:**
178 | 1. Establish enterprise-wide AI governance from day one
179 | 2. Empower developers with self-service, governed AI capabilities
180 | 3. Maintain complete visibility and control as AI adoption scales
181 | 4. Meet regulatory requirements with automated compliance evidence
182 | 5. Accelerate time-to-value while reducing organizational risk
183 | 
184 | ---
185 | 
186 | ## Next Steps
187 | 
188 | To learn more about how Foundry Citadel Platform can help your organization govern AI responsibly at speed:
189 | 
190 | - **📘 Review the Full Documentation**: See [CITADEL-TECHNICAL-GUIDE.md](./CITADEL-TECHNICAL-GUIDE.md) for comprehensive technical details
191 | - **🏗️ Explore the AI Hub Gateway (Citadel Governance Hub)**: Visit the [AI Hub Gateway repository](https://aka.ms/ai-hub-gateway)
192 | - **🤖 Deploy Citadel Agent Spoke**: Check out the [AI Landing Zones repository](https://github.com/Azure/AI-Landing-Zones)
193 | - **💬 Engage with Our Team**: Reach out to discuss your specific governance and AI adoption challenges
194 | 
195 | ---
196 | 
197 | *"Build the future, safely"* – Foundry Citadel Platform provides the **speed** that business demands with the **safeguards** that IT requires, all in one comprehensive, evolving platform.
198 | 


--------------------------------------------------------------------------------
/Citadel-WAF-Alignment.md:
--------------------------------------------------------------------------------
  1 | # Foundry Citadel Platform - Azure Well-Architected Framework Alignment
  2 | 
  3 | > **How Citadel Implements Microsoft Well-Architected Framework Principles for AI Workloads**
  4 | 
  5 | **Document Version:** 1.0  
  6 | **Last Updated:** November 10, 2025  
  7 | **Reference:** [Azure Well-Architected Framework - AI Design Principles](https://learn.microsoft.com/en-us/azure/well-architected/ai/design-principles)
  8 | 
  9 | ---
 10 | 
 11 | ## Overview
 12 | 
 13 | The **Foundry Citadel Platform** is architected to align with the Microsoft Well-Architected Framework (WAF) for AI workloads, delivering enterprise-grade AI solutions through three core pillars:
 14 | 
 15 | - **Governance & Security** - Enterprise-grade controls, responsible AI, and data protection
 16 | - **Observability & Compliance** - Comprehensive monitoring, auditing, and regulatory compliance
 17 | - **AI Development Velocity** - Accelerated development with best practices and automation
 18 | 
 19 | This document demonstrates how Citadel's concrete technical implementations address the five WAF pillars: **Reliability**, **Security**, **Cost Optimization**, **Operational Excellence**, and **Performance Efficiency**.
 20 | 
 21 | ---
 22 | 
 23 | ## Architecture Alignment Summary
 24 | 
 25 | | WAF Pillar | Alignment Status | Key Citadel Capabilities |
 26 | |------------|------------------|--------------------------|
 27 | | **Reliability** | 🟢 Strong | Multi-region support, high availability, automated failover, resilient architecture |
 28 | | **Security** | 🟢 Strong | Zero Trust, content safety, RBAC, encryption, network isolation |
 29 | | **Cost Optimization** | 🟢 Strong | Usage tracking, quota management, auto-scaling, cost attribution |
 30 | | **Operational Excellence** | 🟢 Strong | Automated monitoring, CI/CD integration, DevOps/AIOps support |
 31 | | **Performance Efficiency** | 🟢 Strong | Load balancing, auto-scaling, performance monitoring, quality metrics |
 32 | 
 33 | **Legend:** 🟢 Strong | 🟡 Partial | 🔴 Limited
 34 | 
 35 | ---
 36 | 
 37 | ## 1. Reliability - Building Resilient AI Workloads
 38 | 
 39 | ### WAF Principle: Design Reliable AI Systems
 40 | 
 41 | Citadel ensures AI workloads remain available and can recover from failures while maintaining model performance over time.
 42 | 
 43 | ### Citadel Implementation
 44 | 
 45 | #### Multi-Region High Availability
 46 | - **Multi-region LLM deployments** with automated failover for continuous service availability
 47 | - **Reliable state** for conversation history and agent state on Cosmos DB and Azure Monitor
 48 | - **Availability zones support** for critical components in supported regions
 49 | 
 50 | #### Fault Tolerance & Resilience
 51 | - **AI Gateway (Azure API Management)** provides circuit breakers, retry logic, and bulkhead patterns
 52 | - **Distributed architecture** with separate Citadel Governance Hub and Citadel Agent Spoke landing zones following the hub-spoke model
 53 | - **Network isolation** with NSGs and private endpoints preventing cascading failures
 54 | - **Service isolation** through containerization and separate resource boundaries: each agentic deployment runs in its own Citadel Agent Spoke, with central RBAC enforced through the Citadel Governance Hub
 55 | 
 56 | #### Operational Reliability
 57 | - **Automated workflows via Logic Apps** reducing manual intervention and human error
 58 | - **Azure AI Foundry managed runtime** providing a reliable, maintained agent execution environment
 59 | - **Version-controlled infrastructure** using Bicep and source-controlled configurations (Citadel Contracts) for consistent, repeatable deployments of both central components and day-2 configurations
 60 | 
 61 | ### Key Features
 62 | 
 63 | | Feature | Benefit | Implementation |
 64 | |---------|---------|----------------|
 65 | | Multi-Region LLM | Ensures API availability even during regional outages | Automated failover between LLM backends/regions |
 66 | | High-Availability Gateway | 99.95% SLA for API requests | Azure API Management Premium tier deployed across multiple availability zones and/or regions |
 67 | | Distributed Data Stores | Data remains accessible during failures | Cosmos DB and Azure Monitor Log Analytics |
 68 | | Auto-Recovery | Minimizes downtime from transient failures | Circuit breakers and exponential backoff retry policies |
 69 | 
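The auto-recovery and failover rows above can be sketched in a few lines. This is a minimal illustration only, not Citadel's or Azure API Management's implementation: the `CircuitBreaker` class, `backoff_delays`, `pick_backend`, the backend names, and all thresholds are hypothetical and chosen for the example.

```python
import random


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; permits a trial call after `cooldown` seconds."""

    def __init__(self, threshold=3, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # time the breaker opened, or None while closed

    def allows(self, now):
        if self.opened_at is None:
            return True
        # Half-open: let a trial request through once the cooldown has elapsed.
        return now - self.opened_at >= self.cooldown

    def record(self, ok, now):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = now


def backoff_delays(retries, base=0.5, cap=8.0, jitter=0.0):
    """Exponential backoff: base * 2**attempt, capped at `cap`, plus optional random jitter."""
    return [min(cap, base * 2 ** i) + random.uniform(0, jitter) for i in range(retries)]


def pick_backend(backends, breakers, now):
    """Return the first backend, in priority order, whose breaker currently allows traffic."""
    for name in backends:
        if breakers[name].allows(now):
            return name
    return None


# Hypothetical usage: two regional LLM backends in priority order.
order = ["primary-region", "secondary-region"]
breakers = {name: CircuitBreaker(threshold=2, cooldown=30.0) for name in order}
breakers["primary-region"].record(ok=False, now=1.0)
breakers["primary-region"].record(ok=False, now=2.0)
print(pick_backend(order, breakers, now=3.0))  # primary breaker is open, so traffic fails over
```

Keeping the breaker state per backend means a failing region is skipped without affecting healthy ones, and the cooldown gives the primary a chance to take traffic again once it recovers.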
 70 | ---
 71 | 
 72 | ## 2. Security - Protecting AI Workloads and Data
 73 | 
 74 | ### WAF Principle: Secure AI Systems and Earn User Trust
 75 | 
 76 | Citadel implements defense-in-depth security with Zero Trust architecture, content safety, and comprehensive data protection.
 77 | 
 78 | ### Citadel Implementation
 79 | 
 80 | #### Earn User Trust with Responsible AI
 81 | - **Azure AI Content Safety integration** for all incoming requests and outgoing responses
 82 | - **Prompt Shield protection** against jailbreak attempts and prompt injection attacks
 83 | - **Protected content detection** screening for sensitive data at both AI Gateway level and at LLM model level (through Microsoft Purview)
 84 | - **Bidirectional content moderation** ensuring both user inputs and AI outputs are safe
 85 | - **Groundedness detection** through AI Foundry evaluations, validating AI responses against source documents to help prevent hallucinations
 86 | 
 87 | #### Data Protection at All Layers
 88 | - **Encryption at rest** with platform-managed keys
 89 | - **Encryption in transit** enforced via HTTPS/TLS 1.2+ for all communication
 90 | - **Private endpoints** for all AI services eliminating public internet exposure
 91 | - **Network security groups (NSGs)** controlling traffic flow between subnets
 92 | - **Virtual network integration** for all compute and data services
 93 | 
 94 | #### Robust Access Management
 95 | - **Gateway-keys pattern** - No direct API key exposure to users or applications
 96 | - **Managed identities** for service-to-service authentication eliminating stored credentials
 97 | - **Azure RBAC integration** providing granular permissions across all components
 98 | - **Role-based authorization at AI Gateway** enforcing least-privilege access per user/team
 99 | - **Azure Key Vault** for centralized secrets management with audit logging
100 | 
101 | #### Network Segmentation & Zero Trust
102 | - **Zero Trust architecture** with assume breach mentality
103 | - **Dedicated subnets with NSGs** for each service tier (web, app, data, AI)
104 | - **Private networking** for container images, training data, and source code
105 | - **Separate landing zones** (Citadel Governance Hub, CGH, for governance; Citadel Agent Spoke, CAS, for agents) with controlled connectivity
106 | - **Hub-spoke network topology** with centralized security controls
107 | 
108 | #### Security Testing & Compliance
109 | - **CI/CD integration** for automated security scanning in deployment pipelines
110 | - **Security policy enforcement** at gateway level before requests reach AI services
111 | - **Container vulnerability scanning** in Azure Container Registry
112 | - **Microsoft Purview integration** for data classification and governance
113 | - **Audit logging** of all access and operations for compliance reporting
114 | 
115 | #### Minimize Attack Surface
116 | - **Authentication required** for all inferencing endpoints - no anonymous access
117 | - **Constrained API design** through AI Gateway limiting exposed functionality
118 | - **API versioning and deprecation** allowing secure evolution of interfaces
119 | - **Rate/token limiting and throttling** preventing abuse and resource exhaustion
120 | - **Input validation** at multiple layers preventing injection attacks
121 | 
122 | ### Key Features
123 | 
124 | | Feature | Benefit | Implementation |
125 | |---------|---------|----------------|
126 | | Content Safety | Prevents harmful content from entering or leaving the system | Azure AI Content Safety with custom policies |
127 | | Zero Trust Networking | Eliminates implicit trust, reduces breach impact | Private endpoints, NSGs, no public internet access |
128 | | Managed Identities | No credentials in code or config files | Azure Managed Identity for all service-to-service auth |
129 | | Gateway-Keys Pattern | Centralized access control and monitoring | API keys managed at gateway, not exposed to clients |
130 | | Data Encryption | Protects sensitive data at rest and in transit | Platform-managed keys or customer-managed keys (CMK) with Key Vault integration |
131 | 
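As an illustration of the gateway-keys pattern listed above, the sketch below issues per-team keys, stores only their hashes, and checks each call against the backends a key is allowed to reach. It is a simplified, in-memory model under stated assumptions: `GatewayKeyStore`, its methods, and the team and backend names are hypothetical and do not reflect how Azure API Management subscriptions are actually stored or validated.

```python
import hashlib
import secrets


class GatewayKeyStore:
    """In-memory model of gateway-issued keys; backend credentials never leave the gateway."""

    def __init__(self):
        # sha256(gateway key) -> (team, allowed backends); raw keys are not retained.
        self._by_hash = {}

    @staticmethod
    def _digest(key):
        return hashlib.sha256(key.encode()).hexdigest()

    def issue(self, team, allowed_backends):
        """Create a new random key for a team and return it once to the caller."""
        key = secrets.token_urlsafe(32)
        self._by_hash[self._digest(key)] = (team, frozenset(allowed_backends))
        return key

    def revoke(self, key):
        """Invalidate a key centrally; clients need no backend key rotation."""
        self._by_hash.pop(self._digest(key), None)

    def authorize(self, key, backend):
        """Return the owning team if the key may call `backend`, otherwise None."""
        entry = self._by_hash.get(self._digest(key))
        if entry is None:
            return None
        team, allowed = entry
        return team if backend in allowed else None


# Hypothetical usage: the finance team may call only the chat backend.
store = GatewayKeyStore()
finance_key = store.issue("finance", ["chat-model"])
print(store.authorize(finance_key, "chat-model"))       # finance
print(store.authorize(finance_key, "embedding-model"))  # None
```

Because the gateway resolves the team from the key on every call, the same lookup can feed usage attribution and per-team policies.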
132 | ---
133 | 
134 | ## 3. Cost Optimization - Maximizing ROI
135 | 
136 | ### WAF Principle: Optimize Costs Without Sacrificing Quality
137 | 
138 | Citadel provides comprehensive cost visibility, tracking, and optimization to maximize return on AI investments.
139 | 
140 | ### Citadel Implementation
141 | 
142 | #### Determine Cost Drivers
143 | - **Granular usage analytics** tracking consumption by team, use case, and individual agent
144 | - **Token consumption trends** with historical analysis in Cosmos DB
145 | - **Cost attribution dashboard** in Citadel Governance Hub showing spend breakdown
146 | - **Resource tagging strategy** enabling chargeback and showback models
147 | - **Integrated Azure Cost Management** with budget alerts and forecasting
148 | 
149 | #### Pay for What You Intend to Use
150 | - **Auto-scaling Container Apps & Foundry Agents** automatically adjusting compute based on demand
151 | - **Multiple AI service tiers** supporting different performance and cost profiles
152 | - **Serverless options** via Logic Apps and Azure Functions for event-driven workloads
153 | - **Consumption-based pricing** for applicable components (like Azure OpenAI pay-as-you-go)
154 | - **Flexible deployment options** allowing teams to choose cost-performance balance
155 | 
156 | #### Use What You Pay For (Minimize Waste)
157 | - **Token quotas and rate limiting** preventing accidental overspending
158 | - **Auto-scaling with scale-to-zero** deallocating resources during idle periods
159 | - **Centralized monitoring** of utilization metrics identifying underused resources
160 | - **Cost accountability** assigned to operations teams with regular reviews
161 | - **Automated resource cleanup** removing unused deployments and test environments
162 | 
163 | #### Optimize Operational Costs
164 | - **Automated workflows** via Logic Apps reducing manual operational overhead
165 | - **PaaS-first approach** minimizing infrastructure management costs
166 | - **Shared infrastructure** across multiple agents and teams reducing duplication
167 | - **DevOps automation** reducing time-to-market and manual deployment costs
168 | 
169 | ### Key Features
170 | 
171 | | Feature | Benefit | Implementation |
172 | |---------|---------|----------------|
173 | | Usage Analytics | Understand where AI costs are incurred | Real-time dashboards with drill-down by dimension |
174 | | Token Quotas | Prevent runaway costs from misbehaving agents | Configurable limits per user/team/agent |
175 | | Auto-Scaling | Pay only for active workloads | Container Apps/Foundry Agents with scale-to-zero capability |
176 | | Cost Attribution | Chargeback/showback to business units | Tagging and reporting by cost center |
177 | | Monitoring & Alerts | Proactive cost anomaly detection | Azure Monitor alerts on budget thresholds |
178 | 
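The token-quota and cost-attribution rows above amount to a per-team budget check before each request plus a running usage tally. The sketch below shows that logic in isolation, assuming a fixed budget per team for one period; `TokenQuota`, the team names, and the limits are hypothetical, and a real gateway would enforce this in policy rather than application code.

```python
from collections import defaultdict


class TokenQuota:
    """Tracks token consumption per team against a fixed budget for one period."""

    def __init__(self, limits):
        self.limits = dict(limits)    # team -> token budget for the period
        self.used = defaultdict(int)  # team -> tokens consumed so far

    def try_consume(self, team, tokens):
        """Record usage and return True if it fits the budget; otherwise reject it."""
        limit = self.limits.get(team, 0)  # unknown teams get no budget
        if self.used[team] + tokens > limit:
            return False
        self.used[team] += tokens
        return True

    def remaining(self, team):
        return self.limits.get(team, 0) - self.used[team]

    def usage_report(self):
        """Usage by team, highest consumer first, for showback or chargeback."""
        return sorted(self.used.items(), key=lambda item: item[1], reverse=True)


# Hypothetical usage with made-up budgets.
quota = TokenQuota({"finance": 1000, "support": 5000})
print(quota.try_consume("finance", 600))  # True
print(quota.try_consume("finance", 500))  # False: would exceed the 1000-token budget
print(quota.remaining("finance"))         # 400
```

Rejecting the whole request when it would exceed the budget keeps the accounting simple; a softer variant could allow the call and raise an alert instead.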
179 | ---
180 | 
181 | ## 4. Operational Excellence - Automation and Continuous Improvement
182 | 
183 | ### WAF Principle: Streamline Operations and Enable Innovation
184 | 
185 | Citadel enables DevOps and GenAIOps practices with comprehensive automation, monitoring, and safe deployment patterns.
186 | 
187 | ### Citadel Implementation Recommendations
188 | 
189 | #### Minimize Operational Burden
190 | - **PaaS-first architecture** using managed services (AI Foundry, API Management, Container Apps)
191 | - **Managed identities** eliminating credential rotation and secret management overhead
192 | - **Automated workflow orchestration** via Logic Apps for common operational tasks
193 | - **Infrastructure-as-Code (Bicep)** enabling one-click deployments and consistent environments
194 | - **Template-based agent deployment** empowering developers while maintaining governance
195 | 
196 | #### Automated Monitoring with Actionable Alerts
197 | - **Azure Monitor Application Insights** integrated across all components
198 | - **Comprehensive dashboards** at platform and individual agent levels
199 | - **Actionable alerts** with context-specific remediation guidance
200 | - **Enterprise notification integration** (Teams, email, ticketing systems)
201 | - **Automated quality measurements** with trend analysis and anomaly detection
202 | - **End-to-end tracing** from user request through AI processing to response
203 | 
204 | #### Detect and Mitigate Model Performance Issues
205 | - **Automated evaluations** at platform level measuring groundedness, relevance, coherence
206 | - **CI/CD integration** for regression testing before deployment
207 | - **Quality metrics tracking** over time identifying model drift
208 | - **Conversation replay capability** for debugging and quality analysis
209 | - **Feedback collection** from users feeding continuous improvement
210 | 
211 | #### Safe Deployments
212 | - **CI/CD pipelines** with automated testing gates
213 | - **Support for multiple deployment strategies** (blue/green, canary, rolling updates)
214 | - **Pre-production testing environments** mirroring production configuration
215 | - **Automated rollback capabilities** when health checks fail
216 | - **Change tracking and audit logs** for compliance and troubleshooting
217 | 
218 | #### Evaluate and Improve User Experience
219 | - **User feedback mechanisms** integrated into agent interfaces
220 | - **Conversation logging with consent** enabling analysis and improvement
221 | - **Engagement metrics** tracking user satisfaction and agent effectiveness
222 | - **Session analytics** understanding user behavior patterns
223 | - **Continuous improvement loop** from feedback to model refinement
224 | 
225 | ### Key Features
226 | 
227 | | Feature | Benefit | Implementation |
228 | |---------|---------|----------------|
229 | | Comprehensive Monitoring | Full visibility into AI workload health | Application Insights with custom metrics and logs |
230 | | Automated Evaluations | Ensure quality before and after deployment | AI Foundry evaluation pipelines in CI/CD |
231 | | DevOps Integration | Accelerate development while maintaining quality | GitHub/Azure DevOps with automated gates |
232 | | Feedback Loops | Continuous improvement from production insights | User feedback, conversation analytics, quality metrics |
233 | | Infrastructure-as-Code | Consistent, repeatable deployments | Bicep templates with version control |
234 | 
235 | ---
236 | 
237 | ## 5. Performance Efficiency - Optimizing AI Workload Performance
238 | 
239 | ### WAF Principle: Meet Performance Requirements Efficiently
240 | 
241 | Citadel ensures AI workloads meet performance targets through proper resource allocation, monitoring, and continuous optimization.
242 | 
243 | ### Citadel Implementation
244 | 
245 | #### Establish Performance Benchmarks
246 | - **Agent-level performance monitoring** tracking latency, throughput, and token consumption
247 | - **Quality metrics tracking** measuring groundedness, relevance, coherence, fluency
248 | - **Continuous re-evaluation** ensuring performance remains within acceptable ranges
249 | - **Baseline establishment** for each agent type and use case
250 | - **Performance trend analysis** identifying degradation over time
251 | 
252 | #### Evaluate and Right-Size Resources
253 | - **Multiple SKU options** allowing teams to balance performance and cost
254 | - **Load balancing via Application Gateway** distributing traffic for optimal resource utilization
255 | - **Auto-scaling Container Apps** dynamically adjusting resources based on actual demand
256 | - **Container resource quotas** preventing resource contention and ensuring fair allocation
257 | 
258 | #### Collect and Analyze Performance Metrics
259 | - **Telemetry from all layers** - data pipeline, orchestration, model inference, and UI
260 | - **Query latency and throughput tracking** with percentile analysis (p50, p95, p99)
261 | - **End-to-end tracing** of agent execution identifying bottlenecks
262 | - **Token consumption monitoring** optimizing prompt engineering for efficiency
263 | - **Near real-time dashboards** enabling quick performance issue identification
264 | 
265 | #### Continuous Performance Improvement
266 | - **Automated metric collection** feeding analysis and optimization
267 | - **CI/CD integration** for performance regression testing
268 | - **Production feedback loops** informing optimization decisions
269 | - **Performance optimization recommendations** based on observed patterns
270 | - **Caching strategies** reducing redundant processing and API calls
271 | 
272 | #### Load Balancing and Distribution
273 | - **Multi-region load distribution** balancing traffic across Azure LLM deployments
274 | 
275 | ### Key Features
276 | 
277 | | Feature | Benefit | Implementation |
278 | |---------|---------|----------------|
279 | | Performance Monitoring | Near real-time visibility into latency and throughput | Application Insights with custom telemetry |
280 | | Auto-Scaling | Automatically match resources to demand | Container Apps with CPU/memory-based triggers |
281 | | Load Balancing | Distribute traffic for optimal performance | Application Gateway with backend health monitoring |
282 | | Quality Metrics | Ensure AI outputs meet standards efficiently | Automated evaluation of groundedness, relevance |
283 | | Resource Optimization | Right-size compute for cost-performance balance | Monitoring with recommendations engine |
284 | 
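The percentile analysis (p50, p95, p99) mentioned above can be computed from raw latency samples as follows. This is a small sketch using the nearest-rank method, which is one of several common percentile definitions; the function names and the sample data are illustrative assumptions, and monitoring backends such as Application Insights may use a different estimator.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks are not disturbed by float rounding.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_ms):
    """Summarise latency samples (milliseconds) at the usual reporting percentiles."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}


# Hypothetical latencies: 1 ms through 100 ms, one sample each.
print(latency_summary(list(range(1, 101))))  # {'p50': 50, 'p95': 95, 'p99': 99}
```

Tail percentiles such as p95 and p99 surface slow requests that an average would hide, which is why they are the usual basis for latency baselines and alerts.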
285 | ---
286 | 
287 | ## Cross-Cutting Capabilities
288 | 
289 | ### Governance & Control
290 | 
291 | Citadel Governance Hub (CGH) provides centralized governance across all WAF pillars:
292 | 
293 | - **Policy Enforcement** - Centralized security, cost, and quality policies applied consistently
294 | - **Usage Analytics** - Real-time visibility into consumption patterns and costs
295 | - **Compliance Reporting** - Audit trails, access logs, and regulatory compliance dashboards
296 | - **Resource Management** - Centralized control over AI model deployments and configurations
297 | - **Team Isolation** - Multi-tenancy with resource boundaries and access controls
298 | 
299 | ### Observability
300 | 
301 | Comprehensive observability enables all WAF pillars:
302 | 
303 | - **Application Insights Integration** - Full-stack monitoring from UI to AI backend
304 | - **Custom Dashboards** - Role-specific views for developers, operations, security, executives
305 | - **Distributed Tracing** - End-to-end request tracking across service boundaries
306 | - **Log Aggregation** - Centralized logging with advanced query and analysis capabilities
307 | - **Alerting & Notification** - Context-aware alerts with automated remediation
308 | 
309 | ### DevOps & Automation
310 | 
311 | Platform automation accelerates delivery while maintaining quality:
312 | 
313 | - **CI/CD Pipelines** - Automated build, test, deploy for agents and infrastructure
314 | - **Infrastructure-as-Code** - Bicep templates for consistent environment provisioning
315 | - **Automated Testing** - Unit, integration, and quality tests in deployment pipeline
316 | - **Version Control** - Git-based workflow for code, configuration, and policies
317 | - **Self-Service Deployment** - Empowering teams while maintaining governance guardrails
318 | 
319 | ---
320 | 
321 | ## Well-Architected Framework Trade-offs
322 | 
323 | Citadel provides balanced approaches to common WAF trade-offs:
324 | 
325 | ### Security vs. Performance
326 | - **Configurable security levels** - Adjust Content Safety strictness based on use case
327 | - **Private endpoints optional** - Choose network isolation vs. simplified connectivity
328 | 
329 | ### Cost vs. Reliability
330 | - **Multi-region optional** - Deploy single-region for cost, multi-region for high availability
331 | - **Tiered deployment patterns** - Basic, standard, premium configurations with clear trade-offs
332 | - **Auto-scaling boundaries** - Set maximum scale to control costs while ensuring performance
333 | 
334 | ### Performance vs. Cost
335 | - **PTU vs. PAYG** - Choose provisioned throughput units (reserved capacity) vs. pay-as-you-go (consumption) pricing for Azure LLM deployments
336 | - **Caching strategies** - Reduce costs and improve performance for repeated queries
337 | 
338 | ### Developer Velocity vs. Governance
339 | - **Template-based with guardrails** - Teams deploy reusable templates within policy boundaries
340 | - **Automated compliance** - Security scanning and policy enforcement applied centrally
341 | - **Flexible approval gates** - Required for production, optional for development
342 | 
343 | ---
344 | 
345 | ## Getting Started with WAF Alignment
346 | 
347 | ### 1. Assess Your Requirements
348 | 
349 | Determine your priorities across WAF pillars:
350 | 
351 | - **Mission-critical workloads** - Emphasize reliability and security
352 | - **Cost-sensitive projects** - Focus on cost optimization and right-sizing
353 | - **Innovation initiatives** - Prioritize developer velocity and experimentation
354 | - **Regulated industries** - Ensure security and compliance
355 | 
356 | ### 2. Configure Citadel for Your Needs
357 | 
358 | Citadel's modular architecture allows customization:
359 | 
360 | - **Network topology** - Hub-spoke vs. single VNet based on isolation needs
361 | - **Deployment scope** - Single-region vs. multi-region based on availability requirements
362 | - **Compute tier** - Container Apps vs. AI Foundry based on control and cost needs
363 | - **Monitoring depth** - Adjust telemetry collection based on operational requirements
364 | 
365 | ### 3. Implement Best Practices
366 | 
367 | Follow Citadel's reference implementations:
368 | 
369 | - **Use Infrastructure-as-Code** - Deploy via Bicep templates for consistency
370 | - **Enable all security features** - Private endpoints, managed identities, Content Safety
371 | - **Configure monitoring** - Set up dashboards and alerts appropriate for your team
372 | - **Establish governance** - Define policies, quotas, and approval workflows
373 | 
374 | ### 4. Continuous Improvement
375 | 
376 | Leverage Citadel's observability for ongoing optimization:
377 | 
378 | - **Review cost reports** - Monthly analysis of spending patterns and optimization opportunities
379 | - **Monitor performance** - Track latency and quality metrics, adjust resources as needed
380 | - **Security audits** - Regular review of access logs, security alerts, compliance status
381 | - **Model quality** - Continuous evaluation and refinement based on production feedback
382 | 
383 | ---
384 | 
385 | ## Additional Resources
386 | 
387 | ### Documentation
388 | - [Citadel Technical Guide](./CITADEL-TECHNICAL-GUIDE.md) - Complete platform architecture and components
389 | - [Contributing Guide](./CONTRIBUTING.md) - How to extend and customize Citadel
390 | 
391 | ### External References
392 | - [Azure Well-Architected Framework](https://learn.microsoft.com/en-us/azure/well-architected/)
393 | - [Azure Well-Architected Framework for AI](https://learn.microsoft.com/en-us/azure/well-architected/ai/)
394 | - [Azure AI Foundry Documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/)
395 | - [Responsible AI Principles](https://www.microsoft.com/en-us/ai/responsible-ai)
396 | 
397 | ---
398 | 
399 | ## Conclusion
400 | 
401 | The **Foundry Citadel Platform** provides comprehensive alignment with the Microsoft Well-Architected Framework for AI workloads through:
402 | 
403 | ✅ **Strong Security** - Zero Trust, content safety, encryption, and access management  
404 | ✅ **High Reliability** - Multi-region support, fault tolerance, and automated recovery  
405 | ✅ **Cost Efficiency** - Granular tracking, quotas, auto-scaling, and optimization  
406 | ✅ **Operational Excellence** - Comprehensive monitoring, automation, and safe deployments  
407 | ✅ **Performance Optimization** - Load balancing, auto-scaling, and continuous monitoring  
408 | 
409 | By building on Azure's platform services and implementing proven patterns, Citadel enables organizations to deploy enterprise-grade AI solutions that balance governance, security, cost, performance, and innovation velocity—all while maintaining alignment with Microsoft's Well-Architected Framework principles.
410 | 
411 | ---


--------------------------------------------------------------------------------
/CITADEL-TECHNICAL-GUIDE.md:
--------------------------------------------------------------------------------
  1 | # Foundry Citadel Platform
  2 | 
  3 | >*Scalable **AI Landing Zone** with Governance, Observability & Rapid Development*
  4 | 
  5 | Foundry **Citadel** Platform is a solution accelerator designed as a **supplemental AI landing zone** that integrates seamlessly with your Azure environment. It provides a **secure, scalable foundation** for running AI applications and agents in production – with **unified governance**, **end-to-end observability**, and tools to **accelerate development**. Citadel delivers a **pre-configured reference architecture** (aligned to Azure’s Cloud Adoption and Well-Architected Frameworks) that can be deployed with one click and includes ready-made code, templates, and documentation following Microsoft’s best practices. This comprehensive approach helps organisations adopt AI **responsibly and efficiently**, ensuring that advanced AI agents can be developed **quickly** while remaining **well-managed** and **compliant** with enterprise requirements.
  6 | 
  7 | > ### Citadel Adoption Signals  
  8 | > _Enterprise teams highlight these blockers and enablers for scaling AI responsibly._
  9 | 
 10 | | 🛡️ **Security** | 📊 **Consumption** | 🧭 **Guardrails** | 🗂️ **Registry** |
 11 | | --- | --- | --- | --- |
 12 | | **62 %** of practitioners cite security concerns as the top blocker to wider AI or agent adoption. | **71 %** of enterprises struggle to track AI usage, enforce quotas, and report costs per team. | **47 %** of organisations require explicit guardrails before deploying autonomous AI agents safely. | **70 %** of customers need an AI registry for both agents and tools to adopt AI at scale. |
 13 | 
 14 | > 🧩 _Citadel turns these pain points into platform strengths—governed access, transparent consumption, defensible guardrails, and a shared catalog of reusable AI capabilities._
 15 | 
 16 | These challenges highlight why **Citadel’s capabilities** are crucial. **Foundry Citadel Platform** focuses on **three key pillars** – **Governance & Security**, **Observability & Compliance**, and **AI Development Velocity** – to address these concerns end-to-end. Below, we outline each pillar and the core capabilities provided, with architecture components and features that ensure enterprise-grade AI deployments:
 17 | 
 18 | ***
 19 | 
 20 | ## **1. Governance & Security Pillar** – *Trustworthy AI Operations at Scale*
 21 | 
 22 | > ### Why Governance Matters
 23 | > Without centralized AI governance, organisations face **unpredictable costs, reliability issues, security risks, developer friction,** and compliance nightmares. Citadel fixes this by building guardrails into every AI call.
 24 | 
 25 | **Foundry Citadel Platform** implements strong governance and security controls so that enterprises can adopt generative AI **safely and in compliance**. Key capabilities of this pillar include a **unified AI gateway** for all model access, granular policy enforcement, and robust safety mechanisms:
 26 | 
 27 | *   **🔐 Unified AI Gateway:** At the core of Citadel’s security is the **“AI Gateway”** – a central entry point (built on Azure API Management) through which **all AI model requests** are routed. This gateway enforces organisation-wide policies consistently. For example, it implements **universal LLM policies like rate limiting and token quotas** to prevent misuse or cost overrun. **No application calls the model directly**; instead, apps call the gateway, which authenticates and forwards requests to the appropriate model (Azure OpenAI, open-source, or even third-party services like Amazon Bedrock) while applying the required controls. This design **centralises oversight** of all AI consumption.
 28 | 
 29 | *   **🗝️ Granular Access Control & Key Management:** Citadel’s gateway introduces a **gateway-keys model access pattern** for developers. Rather than embedding master API keys for various AI services, teams use **managed credentials issued by the gateway**. 
 30 | The gateway can map these to backend keys or identity tokens, ensuring that **no master keys are directly exposed** in code. Access can be segmented by team or use-case, with **role-based authorisation** (e.g. only approved apps or users can invoke certain AI endpoints) for greater security. This prevents uncontrolled use of AI services and allows rapid **revocation or rotation** of credentials from a single place.
 31 | 
 32 | *   **🔑 Credential Management:** Citadel secures API keys and service credentials by leveraging Azure Key Vault. Secrets are stored securely and accessed at runtime, ensuring that **no raw keys are exposed in code or logs**.
 33 | 
 34 | *   **🛡️ Policy Enforcement and Compliance:** The governance layer allows administrators to define and enforce a range of **custom policies**. These include **traffic mediation rules** (e.g. routing requests to different model endpoints based on content or load) and **usage policies** (per-user or per-app call rate limits and monthly token budgets). It also supports complex **expressions for policies** – for example, automatically choosing an Azure OpenAI instance in a specific region for compliance, or requiring certain **request headers/tags for auditing**. All usage is captured centrally, enabling compliance auditing and making it straightforward to answer the question *“Who is using which model, and how?”*.
 35 | 
 36 | *   **🌐 Multi-Cloud and Hybrid Support:** Citadel’s governance is flexible – it can govern not only Azure OpenAI, but also **open-source model servers or third-party AI APIs**. The AI Gateway speaks **OpenAI-compatible APIs** natively, meaning it can front-end virtually any generative model service. For instance, it can direct certain requests to Azure OpenAI or to an on-premises GPU-VM model, or even to Amazon Bedrock, all under the same policy umbrella. This multi-cloud ability gives organisations a **single control plane** for heterogeneous AI systems. Citadel’s gateway and related services can themselves run on-premises if needed (via APIM self-hosted gateways), supporting scenarios with strict data residency or partially air-gapped networks.
 37 | 
 38 | *   **🛡️ AI Content Safety & Guardrails:** Citadel includes built-in **AI safety** mechanisms to enforce responsible AI usage. Every request and response can be scanned by **Azure AI Content Safety** – which detects **hate speech, violent or sexual content, self-harm indications, and other harmful outputs**. If an application user tries to prompt an agent to produce disallowed content or if a model’s answer contains such content, the system can **block or filter** that response automatically. Citadel’s safety system also includes **“prompt shields”** that detect attempts to jailbreak the agent with malicious instructions hidden in user input or documents. This protects the AI agents from executing unintended commands. Additionally, **“protected content”** checks can recognise if a model’s answer includes large verbatim excerpts of known copyrighted text (lyrics, articles, etc.) and prevent accidental leakage of such content. These guardrails give organisations confidence that AI systems won’t go off-policy or create liability.
 39 | 
 40 | *   **📊 Central Monitoring & Cost Governance:** All AI usage through the gateway is logged centrally (calls, tokens used, timings, outcome). FCP provides **built-in reports and dashboards** to track this usage by application or department. This **solves the cost attribution problem** – e.g. you can see how many tokens the Finance team’s chatbot consumed this week and enforce per-team quotas. It also enables **cost optimisation** – detecting anomalous spikes or inefficient prompt usage. Combined with Azure Monitor, admins can set **alerts** (e.g. if a project exceeds its monthly AI budget, or if a spike in requests suggests a rogue script). By providing this transparency and control, Citadel helps prevent the “blank cheque” scenario of uncontrolled AI API spend. It effectively addresses the **“shadow AI”** governance nightmare by keeping all AI calls within the managed guardrails.
 41 | 
 42 | *   **📘 Central AI Registry for Agents and Tools:** FCP provides a unified **AI Registry** powered by the **Model Context Protocol (MCP)**, enabling organisations to manage and discover both **first-party** and **third-party** AI agents and tools. This registry acts as a central catalog where teams can securely share, document, and govern AI capabilities across the enterprise. By standardising metadata and access policies, the registry ensures that all agents and tools – whether developed in-house or sourced externally – are easily discoverable and can be integrated seamlessly into workflows. This capability fosters collaboration, reduces duplication of effort, and ensures consistent governance for all AI assets.
 43 | 
 44 | *   **🔒 Data Security:** Citadel ensures the protection of sensitive data in AI workflows by integrating with Microsoft Purview. This enables governance through **data sensitivity labels and policies**, ensuring that sensitive information remains within approved boundaries. For example, an AI agent accessing a database will operate under Purview’s oversight, with all usage logged and any policy violations (such as accessing restricted customer data) flagged for review.
 45 | 
 46 | **Governance & Security Features and Components:** The table below summarises some of the key governance components of Citadel and their roles:
 47 | 
 48 | <table>
 49 |   <tr>
 50 |     <th>Governance Feature</th>
 51 |     <th>Description</th>
 52 |   </tr>
 53 |   <tr>
 54 |     <td><b>Unified AI Gateway</b></td>
 55 |     <td>Central gateway that mediates <u>every</u> AI call. It applies global policies (rate limits, authentication, routing) and provides a single secure endpoint for clients. This ensures all AI usage is centrally visible and controlled.</td>
 56 |   </tr>
 57 |   <tr>
 58 |     <td><b>Policy Engine</b></td>
 59 |     <td>Rich rule framework to enforce business rules – e.g. restrict certain models to specific regions, apply <u>token quotas per user</u>, or inject safety prompts. Administrators can write custom policies or use built-in templates for common requirements.</td>
 60 |   </tr>
 61 |   <tr>
 62 |     <td><b>Managed Credentials</b></td>
 63 |     <td>Uses gateway keys, optionally combined with tokens issued by an identity platform (such as Microsoft Entra ID), to abstract backend secrets. Developers no longer handle raw master keys for AI services – the gateway issues tokens/keys with scoped access. This prevents key leakage and allows instant revocation if needed.</td>
 64 |   </tr>
 65 |   <tr>
 66 |     <td><b>Content Safety Filters</b></td>
 67 |     <td>Automated checks on prompts and responses using Azure AI Content Safety. Flags or removes profanity, hate, sexual or violent content, and can block outputs that violate compliance policies (e.g. privacy or confidential data).</td>
 68 |   </tr>
 69 |   <tr>
 70 |     <td><b>AI Registry & Catalog</b></td>
 71 |     <td>A registry (via Azure API Center) for discovering and managing AI endpoints and tools (known as MCP servers). This catalogue lets teams securely share AI “skills” (Agents, APIs, functions) across the enterprise with proper metadata and governance.</td>
 72 |   </tr>
 73 |   <tr>
 74 |     <td><b>Multi-cloud Connectors</b></td>
 75 |     <td>Built-in support to govern AI services beyond Azure. The gateway can proxy requests to <u>open-source model APIs</u> or other clouds’ AI endpoints (e.g. Bedrock) securely. This ensures consistent security and monitoring even for third-party AI services.</td>
 76 |   </tr>
 77 |   <tr>
 78 |     <td><b>Azure Key Vault</b></td>
 79 |     <td>Secure store for secrets and credentials used by AI apps and agents. All API keys, connection strings, etc., used by agents or the gateway are kept in the spoke's Key Vault and accessed via managed identities. This eliminates hard-coded secrets and protects sensitive data at rest.</td>
 80 |   </tr>
 81 | </table>
 82 | 
 83 | **In practice, these governance features mean AI applications can be deployed with confidence.** For example, if you build a GPT-based internal assistant, it will run through Citadel’s gateway – **ensuring it only answers within approved data sources, filters any policy-breaking content, and logs its activity**. Administrators remain in control: they can update a policy to block a newly discovered prompt attack pattern, or quickly see which prompts are costing the most. FCP thus fosters **strong customer and stakeholder trust in AI** by providing the oversight needed beyond just “having a model”. Governance is no longer a roadblock – it’s baked into the platform so that **compliance officers and developers can collaborate effectively** without endless manual reviews. The result is faster deployment of AI solutions **“with the guardrails on”**, avoiding the common pitfalls of unchecked AI experimentation (data leaks, runaway costs, or reputational damage).
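
The usage and routing policies described in this pillar can be pictured with a small sketch. It is not APIM policy syntax and not part of the Citadel codebase: `UsagePolicy`, `Decision`, and `evaluate_request` are hypothetical names, and the region strings are placeholders. It only illustrates the two checks mentioned above, a monthly token budget and region-restricted routing.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class UsagePolicy:
    """Hypothetical per-app usage policy; field names are illustrative."""
    monthly_token_budget: int
    allowed_regions: Tuple[str, ...]


@dataclass
class Decision:
    allowed: bool
    reason: str
    region: Optional[str] = None


def evaluate_request(
    policy: UsagePolicy,
    tokens_used_this_month: int,
    requested_tokens: int,
    available_regions: Sequence[str],
) -> Decision:
    """Return an allow/deny decision for a single request."""
    # Check 1: the request must fit inside the remaining monthly budget.
    if tokens_used_this_month + requested_tokens > policy.monthly_token_budget:
        return Decision(False, "monthly token budget exceeded")
    # Check 2: route to the first available backend in a permitted region.
    for region in available_regions:
        if region in policy.allowed_regions:
            return Decision(True, "ok", region)
    return Decision(False, "no compliant region available")
```

In a real deployment these checks would be expressed as gateway policies rather than application code; the sketch only makes the decision logic explicit.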
 84 | 
 85 | ***
 86 | 
 87 | ## **2. Observability & Compliance Pillar** – *End-to-End Monitoring, Evaluation & Trust*
 88 | 
 89 | > ### Full Visibility = Trust & Confidence
 90 | > Citadel provides **holistic observability** for AI systems through a **dual-layer approach**: centralised monitoring at the platform level and detailed tracing at the agent level. This ensures teams can debug issues, assure quality, and govern compliance in real time. *"You need a dashboard, not a crystal ball"* to manage AI.
 91 | 
 92 | The **Observability & Compliance** pillar of **Foundry Citadel Platform** equips organisations with the tools to **monitor, trace, and evaluate** the behaviour of AI agents and LLMs continuously through a structured **layered observability approach**. This ensures that AI applications are not a "black box" – instead, they are transparent and auditable at both platform and agent levels, which is essential for maintaining reliability and trust in their outputs.
 93 | 
 94 | ### **🏗️ Platform-Level Observability**
 95 | 
 96 | Platform observability provides **centralised monitoring and governance** across all AI workflows, offering enterprise-grade visibility without requiring any agent code changes:
 97 | 
 98 | *   **📊 Central Application Performance Monitoring (APM):** Citadel integrates seamlessly with **Azure Monitor Application Insights** to provide comprehensive platform-wide APM capabilities. This centralised monitoring captures infrastructure-level metrics, performance data, and system health indicators across all AI workloads. Teams gain visibility into resource utilisation, system bottlenecks, and overall platform performance without needing to instrument individual agents.
 99 | 
100 | *   **📈 Detailed Usage Tracking per Team/Use Case/Agent:** The platform provides **granular usage analytics** that can be segmented by team, use case, or individual agent. This includes tracking metrics such as:
101 |     *   **Token consumption trends** broken down by team, project, or agent
102 |     *   **Request volumes and patterns** across different use cases
103 |     *   **Cost allocation and budgeting** with detailed spend visibility per organisational unit
104 |     *   **User adoption patterns** and engagement metrics across different AI applications
105 |     
106 |     For example, operations teams can see that *"the Sales team's Q&A Bot consumed 1.2M tokens (cost ~£60) today across 5,000 requests, while the Legal team's document analysis agent used 800K tokens across 200 complex queries."* This granular tracking enables accurate **cost management, capacity planning, and resource allocation** across the organisation.
107 | 
108 | *   **🔍 Centralised AI Evaluation (No Code Changes Required):** One of FCP's key strengths is its ability to run **comprehensive AI evaluations** without requiring any modifications to agent code. The platform can:
109 |     *   **Automatically intercept and evaluate** AI outputs using predefined and custom metrics
110 |     *   Run **periodic batch evaluations** on historical data (e.g., evaluate 10% of conversations overnight)
111 |     *   Provide **comparative analysis** between different time periods, agent versions, or teams
112 |     *   Support **custom business-specific evaluators** that can be deployed centrally and applied across multiple agents
113 |     
114 |     The evaluation framework includes a comprehensive suite of **pre-defined metrics**:
115 |     *   *Response Quality Metrics:* **groundedness** (did the answer stick to the provided data sources?), **relevance** (did it address the user's query?), **coherence and fluency** of the language, and **completeness** (did it follow all instructions and provide all parts of the answer?)
116 |     *   *Retrieval Accuracy:* for agents using knowledge bases, Citadel checks whether facts in answers occur in retrieved documents (measuring **truthfulness** to sources)
117 |     *   *Safety Metrics:* evaluation for **potential harms** – offensive language, biased content, and **"jailbreak" susceptibility** (was the agent tricked into breaking rules?)
118 |     
119 |     This centralised approach means quality assurance and safety evaluations are **consistent across all AI applications** and can be managed by a central AI governance team without requiring development resources from individual agent teams.
120 | 
121 | *   **🚨 Enterprise Alerts and Automated Remediation:** For sensitive AI use cases, the platform provides **sophisticated alerting and automated response capabilities**:
122 |     *   **Configurable alert rules** on critical metrics (e.g., groundedness scores below thresholds, token usage spikes, error rate increases)
123 |     *   **Automated remediation actions** such as temporarily disabling agents, switching to backup models, or escalating to human oversight
124 |     *   **Integration with enterprise notification systems** (Teams, email, ITSM platforms) for immediate response
125 |     *   **Compliance monitoring** with automated reporting for regulatory requirements
126 |     
127 |     For high-stakes scenarios, teams can configure cascading responses – for instance, if safety scores drop below acceptable levels, the system can automatically route queries to human reviewers while alerting the responsible teams. Early warning of anomalies (maybe a new version of the model started "hallucinating" more, or an API that agents rely on is down) is critical for maintaining **high uptime and trust**.
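
As a rough illustration of the per-team usage tracking and budget alerting described above, the sketch below aggregates gateway log records into per-team totals. The record schema (`team`, `tokens`), the function names, and the blended price are assumptions made for the example; the price is chosen only so that 1.2M tokens comes out at about £60, matching the figure quoted earlier.

```python
from collections import defaultdict

# Hypothetical blended price; real pricing depends on model and contract.
PRICE_PER_1K_TOKENS_GBP = 0.05


def summarise_usage(records, price_per_1k=PRICE_PER_1K_TOKENS_GBP):
    """Aggregate log records into per-team request, token, and cost totals.

    Each record is a dict with 'team' and 'tokens' keys (assumed schema).
    """
    totals = defaultdict(lambda: {"requests": 0, "tokens": 0})
    for rec in records:
        entry = totals[rec["team"]]
        entry["requests"] += 1
        entry["tokens"] += rec["tokens"]
    return {
        team: {**t, "cost_gbp": round(t["tokens"] / 1000 * price_per_1k, 2)}
        for team, t in totals.items()
    }


def over_budget(summary, budgets_gbp):
    """Return the teams whose cost exceeds their budget, sorted by name.

    Teams without a configured budget are skipped.
    """
    return sorted(
        team
        for team, s in summary.items()
        if team in budgets_gbp and s["cost_gbp"] > budgets_gbp[team]
    )
```

In practice the same aggregation would more likely be a query over the centrally collected logs, with the budget check wired to an alert rule.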
128 | 
129 | ### **🤖 Agent-Level Observability**
130 | 
131 | Agent observability provides **detailed, granular insights** into individual AI agent behaviour, enabling deep debugging and optimisation:
132 | 
133 | *   **📋 Detailed Execution Traces:** Citadel's guidance for agent deployments covers recording comprehensive **execution traces** for each AI query or conversation. These traces capture **every step an agent takes** – from the initial user prompt, to system and tool prompts, all intermediate reasoning or chain-of-thought messages, calls to external tools or knowledge bases, and the final response. Along with the content, traces log **parameters, model identities, and timing** (latency and token counts) for each step. These traces are visualised in a **structured timeline** for developers and engineers.
134 | 
135 |     For example, if an AI agent uses a calculator API as part of its reasoning, you will see the exact API call and result in the trace. This level of insight makes it far easier to **debug issues** – such as figuring out *why* an agent gave a wrong answer (maybe it chose a flawed chain of actions), or why latency spiked (perhaps one tool took too long). Traces are stored durably (via Azure Application Insights/Log Analytics), allowing comparative analysis between runs and even between different versions of an agent. In short, **every conversation or action path is observable**; nothing is truly "hidden" behind an AI magic curtain.
136 | 
137 | *   **⚡ Performance Monitoring:** Agent-level monitoring captures detailed performance metrics including:
138 |     *   **Response latency** broken down by reasoning steps, tool calls, and model inference
139 |     *   **Token usage patterns** for each component of the agent's workflow
140 |     *   **Tool utilisation efficiency** and success rates
141 |     *   **Memory and resource consumption** during agent execution
142 |     
143 |     This granular performance data enables developers to identify bottlenecks, optimise agent workflows, and ensure consistent performance across different scenarios.
144 | 
145 | *   **🎯 Agent-Specific Quality Evaluations:** Beyond platform-wide evaluations, agent-level observability includes metrics tailored to specific agent behaviours:
146 |     *   **Intent fulfilment** (did the agent actually achieve what the user asked?)
147 |     *   **Tool use correctness** (did it call the right tool with correct parameters?)
148 |     *   **Reasoning efficiency** (were there unnecessary steps or redundant operations?)
149 |     *   **Multi-step coordination** (for complex agents with multiple reasoning phases)
150 |     
151 |     These agent-specific metrics provide developers with actionable insights for improving agent design and prompt engineering.
152 | 
153 | *   **🔧 Advanced Debugging & Diagnostics:** FCP's guidance provides rich tools to **search and inspect logs and traces** for any session or conversation, with powerful filtering capabilities that help in **root cause analysis**. Moreover, developers can **replay traces**: taking a stored conversation and running it step-by-step (either on the same or an updated version of the agent) to reproduce issues or test fixes.
154 | 
155 | *   **🔄 Continuous Improvement Integration:** Agent observability feeds directly into the **development and deployment lifecycle**:
156 |     *   **CI/CD integration** with automated testing using historical prompt datasets
157 |     *   **A/B testing capabilities** for comparing different agent versions
158 |     *   **Performance regression detection** when deploying new agent versions
159 |     *   **Feedback loop integration** allowing insights from production to inform development
160 |     
161 |     For example, AI Evaluation tests can be integrated with your CI/CD pipeline: whenever a new version of an agent is deployed, a battery of **automated tests and evaluations** can run (using stored prompt datasets) to compare its performance versus the previous version. If any metric regresses or new safety issues appear, the deployment can be halted or flagged. This DevOps-style approach – sometimes called **"AIOps"** – ensures that quality is maintained even as the AI system evolves.
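
The regression check in that CI/CD flow can be sketched as a simple gate. This assumes evaluation results arrive as dictionaries of metric name to score, where higher is better; the function names and the default tolerance are illustrative, not part of any Citadel tooling.

```python
def find_regressions(baseline, candidate, tolerance=0.02):
    """Compare evaluation scores between two agent versions.

    Returns {metric: (baseline_score, candidate_score)} for every metric
    that dropped by more than `tolerance` or is missing from the candidate.
    """
    regressions = {}
    for metric, base_score in baseline.items():
        new_score = candidate.get(metric)
        if new_score is None or base_score - new_score > tolerance:
            regressions[metric] = (base_score, new_score)
    return regressions


def gate_deployment(baseline, candidate, tolerance=0.02):
    """Return True when the candidate version may be promoted."""
    return not find_regressions(baseline, candidate, tolerance)
```

A pipeline step could call `gate_deployment` after the evaluation run and fail the job when it returns `False`, which halts or flags the deployment as described above.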
162 | 
163 | ### **🔗 Unified Platform & Agent Observability**
164 | 
165 | The true power of FCP's observability recommendations lies in the **seamless integration** with the platform layer and **clear guidance** for the agent layer:
166 | 
167 | *   **🎛️ Unified Dashboards:** Ready-to-use dashboards provide both **platform-wide overviews** and **agent-specific drill-downs**. Operations teams can monitor overall system health while developers can dive deep into individual agent performance. These dashboards give a **bird's-eye view** of the system including:
168 |     *   **Platform metrics:** Overall usage, cost trends, system performance, and compliance status
169 |     *   **Agent metrics:** Individual performance, quality scores, and usage patterns
170 |     *   **Comparative analytics:** Performance trends across teams, use cases, and time periods
171 |     
172 |     For example, an operations engineer can see that *"today, across all teams, we served 15,000 AI requests, consumed 4.2M tokens (cost ~£180), with average platform latency 1.5s and 99.2% uptime, while the Sales Q&A Bot specifically had 1.8s average response time with 2 minor safety flags."* Having this in one place allows both **technical and business stakeholders to stay informed**. The dashboard is dynamic – teams can drill down into specific time windows or filter by scenario/agent.
173 | 
174 | *   **🚨 Coordinated Alerting:** Alerts can be configured at both platform and agent levels, with **intelligent escalation paths** that consider both individual agent issues and platform-wide concerns. For instance, if multiple agents start showing performance degradation simultaneously, this might indicate a platform-level issue rather than individual agent problems.
175 | 
176 | *   **📊 Cross-Layer Analytics:** The platform provides **correlation analysis** between platform metrics and agent performance (when Azure Monitor is used end-to-end), helping teams understand how infrastructure changes, model updates, or usage patterns affect individual agent behaviour and overall system performance.
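
The coordinated alerting idea above, where simultaneous degradation across several agents points to a platform issue, can be sketched as follows. The thresholds (`factor`, `platform_share`) and the function name are assumptions for illustration; real escalation rules would be configured in the alerting system.

```python
def classify_degradation(latency_by_agent, baseline_by_agent,
                         factor=1.5, platform_share=0.5):
    """Label a latency incident as 'platform', 'agent', or 'none'.

    An agent counts as degraded when its current latency exceeds
    `factor` times its baseline. If more than one agent is degraded and
    they make up at least `platform_share` of all reporting agents, the
    incident is treated as platform-level.
    """
    degraded = sorted(
        agent
        for agent, latency in latency_by_agent.items()
        if agent in baseline_by_agent
        and latency > factor * baseline_by_agent[agent]
    )
    if not degraded:
        return "none", degraded
    share = len(degraded) / len(latency_by_agent)
    if len(degraded) > 1 and share >= platform_share:
        return "platform", degraded
    return "agent", degraded
```

The returned scope could then select the escalation path: platform incidents go to the operations team, while single-agent incidents go to the owning team.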
177 | 
178 | **Key Observability Tools in Citadel:**
179 | 
180 | <table>
181 |   <tr>
182 |     <th>Observability Feature</th>
183 |     <th>Purpose</th>
184 |     <th>Layer</th>
185 |   </tr>
186 |   <tr>
187 |     <td><b>Central APM Monitoring</b></td>
188 |     <td>Infrastructure-level monitoring, resource utilisation, and system health indicators across all AI workloads without requiring agent code changes.</td>
189 |     <td>Platform</td>
190 |   </tr>
191 |   <tr>
192 |     <td><b>Usage Analytics & Cost Tracking</b></td>
193 |     <td>Granular tracking of token consumption, request patterns, and cost allocation segmented by team, use case, or agent for enterprise resource management.</td>
194 |     <td>Platform</td>
195 |   </tr>
196 |   <tr>
197 |     <td><b>Centralised AI Evaluations</b></td>
198 |     <td>Automated quality, safety, and compliance evaluations applied consistently across all agents without requiring code modifications from development teams.</td>
199 |     <td>Platform</td>
200 |   </tr>
201 |   <tr>
202 |     <td><b>Enterprise Alerting & Remediation</b></td>
203 |     <td>Sophisticated alerting with automated responses for sensitive use cases, including agent disabling, human escalation, and compliance notifications.</td>
204 |     <td>Platform</td>
205 |   </tr>
206 |   <tr>
207 |     <td><b>End-to-End Tracing</b></td>
208 |     <td>Captures every step of an AI agent's reasoning and interactions (prompts, tool calls, responses), enabling transparent debugging and post-mortem analysis.</td>
209 |     <td>Agent</td>
210 |   </tr>
211 |   <tr>
212 |     <td><b>Agent Performance Monitoring</b></td>
213 |     <td>Detailed real-time metrics including response latency breakdown, token usage patterns, tool efficiency, and resource consumption per agent.</td>
214 |     <td>Agent</td>
215 |   </tr>
216 |   <tr>
217 |     <td><b>Agent-Specific Evaluations</b></td>
218 |     <td>Tailored quality metrics for individual agent behaviours including intent fulfilment, tool use correctness, and reasoning efficiency.</td>
219 |     <td>Agent</td>
220 |   </tr>
221 |   <tr>
222 |     <td><b>Advanced Debugging Tools</b></td>
223 |     <td>Powerful querying, filtering, and trace replay capabilities for root-cause analysis and issue reproduction at the agent level.</td>
224 |     <td>Agent</td>
225 |   </tr>
226 |   <tr>
227 |     <td><b>Unified Dashboards</b></td>
228 |     <td>Integrated visual dashboards providing both platform-wide overviews and agent-specific drill-downs for comprehensive operational visibility.</td>
229 |     <td>Both</td>
230 |   </tr>
231 |   <tr>
232 |     <td><b>Continuous Improvement Loop</b></td>
233 |     <td>Connects operational data back to development with CI/CD integration, A/B testing, and regression detection for ongoing AI system enhancement.</td>
234 |     <td>Both</td>
235 |   </tr>
236 | </table>
237 | 
238 | All these observability measures ensure AI systems are **reliable and accountable**. Teams using Citadel principles can confidently answer **"What is my AI doing and why?"** at any time – a question that's otherwise hard to address. This pillar thus mitigates one of the biggest barriers to enterprise AI adoption: the fear of not knowing what the AI might do. With Citadel, **governance and observability go hand-in-hand**: if the Governance pillar is about setting the rules and guardrails, the Observability pillar is about **watching and verifying** adherence to those rules, and catching anything that falls outside. Together, they create a closed-loop system for responsible AI management, where issues are not only prevented but also detected and learned from in an ongoing cycle.
239 | 
240 | ***
241 | 
242 | ## **3. AI Development Velocity Pillar** – *Accelerating Innovation with Templates & Tools*
243 | 
244 | > ### Build Fast, Build Right
245 | > Citadel provides both **low-code and pro-code** pathways to build AI agentic solutions, so teams can experiment and innovate quickly. Pre-built templates, DevOps integration guidance, and flexible model choices enable rapid iteration *without* sacrificing governance or quality.
246 | 
247 | Governance and oversight are crucial, but they are only part of the picture. **Foundry Citadel Platform** is not a single tool but an **AI Landing Zone**: a pre-configured set of Azure resources designed to help organizations **move quickly** and capitalize on AI opportunities. The **AI Development Velocity** pillar ensures that the platform **empowers AI developers and data scientists** with a spectrum of agentic platform choices, frameworks, and reusable assets. FCP strikes a critical balance: it enables rapid development **within established guardrails** through a template-based approach, so speed doesn’t come at the cost of security or oversight. Key aspects of this pillar include:
248 | 
249 | *   **🚀 Pre-built Deployment Templates:** FCP accelerates the provisioning of cloud environments with predefined deployment templates that can target single or multiple types of agents. These templates are integrated with the platform's central governance and security, allowing teams to quickly establish a production-ready environment for building and operating agents without manual configuration.
250 | 
251 | *   **🤖 Flexible Agent Development Models:** FCP supports a variety of agent types, allowing teams to choose the right approach for their needs. Customers may use one or a mix of these agent types, even within a single multi-agent system. The built-in types include:
252 | 
253 |     *   **Copilot Studio Agents:** For a low-code approach, FCP integrates with **Copilot Studio**, a fully managed graphical interface for building and deploying AI agents. This drag-and-drop environment allows developers and power-users to design AI workflows visually. Within FCP, Copilot Studio agents are enhanced by integrating with the **Citadel AI Registry**, enabling them to securely discover and reuse existing agents and tools (like Model Context Protocol servers), ensuring governance even in a low-code context.
254 | 
255 |     *   **Managed Runtime Agents:** For developers who want more control without managing the underlying infrastructure, FCP offers managed runtimes. Options include the **AI Foundry Agent Service**, which provides a scalable environment for hosting agent logic, and **Logic Apps Agent Loop**, which allows for the creation of serverless agent workflows. These runtimes provide a balance of flexibility and operational simplicity.
256 | 
257 |     *   **Bring-Your-Own (BYO) Agents:** For maximum flexibility, FCP allows teams to bring their own agent architectures. Developers can leverage Microsoft's first-party AI orchestrators like **Semantic Kernel**, **Agent Framework**, and **AutoGen**, or third-party orchestrators such as **LangChain**. These agents can be containerized and deployed into Citadel’s environment, inheriting the platform's governance, security, and observability benefits.
258 | 
259 | *   **📚 The Citadel AI Registry:** At the heart of the platform's governance is the **Unified AI Gateway** in the Citadel Governance Hub (CGH) and its native capability to expose an **AI Registry**, which serves as a central catalog for all AI assets. Integration with the CGH AI Gateway happens in two primary ways:
260 |     *   **Managed AI Access:** Agents get secure access to LLMs, AI services, and published AI tools from the registry. This ensures that only approved models and tools are used, within a predefined capacity and security context.
261 |     *   **Publishing:** Service and team owners can publish their own tools and agents into the central AI Registry, making them discoverable and reusable by other teams across the organization. This fosters a secure collaborative environment and prevents duplication of effort.
262 | 
263 | *   **📦 One-Click Deployment & Reusable Blueprints:** To truly accelerate time-to-value, FCP provides automation for **environment setup and deployment**. The entire FCP reference architecture can be deployed via an **automated script or template** (e.g., Bicep), essentially a **“one-click deploy”** of the agent AI landing zone. This drastically reduces the initial setup time. On top of this, Microsoft offers **Gold Standard** assets—ready-made AI solution blueprints for common patterns like "Chat with your data" or "Conversation summarization." These blueprints come with code, configuration, and deployment scripts, serving as accelerators that allow teams to adapt proven solutions rather than starting from scratch.
264 | 
265 | *   **🔄 DevOps Integration & Lifecycle Management:** FCP treats AI solutions with the same rigor as any software project, embedding them into the DevOps toolchain. It provides seamless integration with **GitHub and Azure DevOps**, allowing developers to use pre-configured environments like **GitHub Codespaces**. Automated CI/CD pipelines can run evaluation suites on pull requests to ensure quality and deploy updated agents to staging or production environments. With **APIs and CLI tools**, teams can programmatically manage AI Gateway policies and safety settings as part of a release. FCP also supports **A/B testing and shadow deployments**, enabling teams to run multiple versions of an agent in parallel, compare their performance using observability data, and bring Agile principles to AI development.
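
The publish and discover flow of the AI Registry described in this list can be pictured with a minimal in-memory sketch. In Citadel the registry is backed by Azure API Center; the `RegistryEntry` and `AIRegistry` classes, their fields, and the example endpoint below are hypothetical and exist only to show the shape of the interaction.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegistryEntry:
    """Hypothetical registry record for a published agent or tool."""
    name: str
    kind: str          # e.g. "agent" or "tool"
    owner_team: str
    endpoint: str
    tags: frozenset = field(default_factory=frozenset)


class AIRegistry:
    """In-memory stand-in for a central, governed catalog."""

    def __init__(self):
        self._entries = {}

    def publish(self, entry):
        """Register an asset; names must be unique across the catalog."""
        if entry.name in self._entries:
            raise ValueError(f"{entry.name!r} is already published")
        self._entries[entry.name] = entry

    def discover(self, kind=None, tag=None):
        """Return matching entries sorted by name; filters are optional."""
        return sorted(
            (
                e for e in self._entries.values()
                if (kind is None or e.kind == kind)
                and (tag is None or tag in e.tags)
            ),
            key=lambda e: e.name,
        )


registry = AIRegistry()
registry.publish(RegistryEntry(
    name="invoice-lookup",
    kind="tool",
    owner_team="finance",
    endpoint="https://example.invalid/mcp/invoice-lookup",  # placeholder URL
    tags=frozenset({"mcp", "finance"}),
))
```

A consuming team would call `registry.discover(kind="tool", tag="finance")` to find reusable assets instead of building its own.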
266 | 
267 | **Key Development Tools & Components:**
268 | 
269 | <table>
270 |   <tr>
271 |     <th>Development Accelerator</th>
272 |     <th>Role in Citadel</th>
273 |   </tr>
274 |   <tr>
275 |     <td><b>Deployment Templates</b></td>
276 |     <td>Pre-built, one-click templates (e.g. Bicep) to provision a secure, governed cloud environment for single or multiple agent types, accelerating time-to-production.</td>
277 |   </tr>
278 |   <tr>
279 |     <td><b>Flexible Agent Runtimes</b> <br><em>(Copilot Studio, Managed Runtime, BYO)</em></td>
280 |     <td>Supports a spectrum of development models, from low-code (Copilot Studio) to managed services (AI Foundry Agent Service) and fully custom "Bring-Your-Own" orchestrators (Semantic Kernel, LangChain), allowing teams to choose the best fit for their use case.</td>
281 |   </tr>
282 |   <tr>
283 |     <td><b>Citadel AI Registry</b></td>
284 |     <td>A central, governed catalog for discovering, managing, and reusing AI assets. It provides managed access to LLMs and tools and allows teams to publish their own, fostering collaboration and preventing redundant work.</td>
285 |   </tr>
286 |   <tr>
287 |     <td><b>Reusable Blueprints</b> <br><em>(Gold Standard Solutions)</em></td>
288 |     <td>End-to-end solution examples that demonstrate common AI patterns. They serve as accelerators for new projects, embodying proven architectures and best practices.</td>
289 |   </tr>
290 |   <tr>
291 |     <td><b>DevOps Integration</b></td>
292 |     <td>Integrates with GitHub and Azure DevOps for CI/CD, automated testing, and lifecycle management of AI solutions. Supports A/B testing and shadow deployments to bring modern software engineering speed to AI development.</td>
293 |   </tr>
294 | </table>
295 | 
296 | All these capabilities mean that teams can innovate **rapidly** with AI. They can start with an idea, quickly assemble an MVP agent using existing building blocks, test it with real data (with governance in place), and iterate to improve it – all in a matter of days or weeks rather than months. Citadel’s approach of providing both low-code and pro-code options also ensures that **different personas can collaborate**: a business analyst could craft an initial agent behavior in Copilot Studio, then a software engineer could refine it using the code SDK for more complex logic – all deploying to the same managed environment.
297 | 
298 | Crucially, **development speed does not mean throwing caution to the wind**. Every agent built on Citadel pillars and guidance, no matter how fast it was created, **runs within the secure, monitored framework** described in the previous sections. This means organisations can encourage experimentation and pilot projects without fear: if something grows in importance, the governance and reliability scaffolding is already there for it. Citadel effectively frees teams from the heavy lifting of creating a safe AI infrastructure from scratch, so they can focus on applying AI in ways that differentiate their business (be it new customer experiences, process automation, or decision support).
299 | 
300 | Finally, this pillar embodies the idea of **scaling AI responsibly**. Once you’ve built one successful solution, Citadel makes it easier to roll out others (since the platform is already in place) and to templatise your approach. Over time, the catalogue of internal tools and connectors will grow – a “**network effect**” where each new AI project potentially adds reusable pieces for future projects. This accelerates AI adoption across the organisation in a governed way, helping build an **“AI factory”** capability. In summary, Citadel turns the typically slow, risky journey of AI solution development into a **fast, repeatable, and governed process**, accelerating innovation while maintaining **enterprise-grade standards**.
301 | 
302 | ***
303 | 
304 | ## **Architecture Overview:** *Inside the Foundry Citadel Platform Landing Zone*
305 | 
306 | To support the three pillars above, **Foundry Citadel Platform (FCP)** implements a **reference architecture** that covers all necessary layers – from networking and compute to integration and data. It is essentially an **extension of your Azure Landing Zone** tailored for AI workloads, meant to run alongside your existing cloud setup (reusing things like your networking, identity, and governance foundations). Here is a high-level look at the key components of the FCP architecture and how they relate to the pillars:
307 | 
308 | ![AI-Agents-Citadel-Architecture](/assets/AIAC-1.1.0.png)
309 | 
310 | The above architecture ensures that FCP is not a monolithic product but a **collection of Azure services** wired together in a reference design. This modularity means it can be adapted – e.g., if an organisation has an existing logging solution, that can be integrated, or if they prefer AKS over Container Apps for specific compliance reasons, the design supports that swap.
311 | 
312 | Critically, the **governance, observability, and dev velocity features are achieved by these components working in harmony**. 
313 | 
314 | For instance, the **Unified AI Gateway (Governance)** is a single point of entry for LLMs, published agents and tools. This position allows it to mediate all traffic, enforce governance policies, and provide platform-level observability across all AI agents and apps.
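
The mediation role described above can be illustrated with a minimal, in-memory sketch. It is not how Azure API Management is configured; the `Subscription` and `Gateway` classes, the key `key-123`, and the model names are hypothetical and exist only to show the idea of one entry point that authorises, meters, and logs every call.

```python
from dataclasses import dataclass, field


@dataclass
class Subscription:
    """Hypothetical per-team gateway subscription (not an APIM type)."""
    team: str
    allowed_models: set
    token_quota: int
    tokens_used: int = 0


@dataclass
class Gateway:
    """Toy single entry point: every call is authorised, metered and logged."""
    subscriptions: dict  # gateway key -> Subscription
    usage_log: list = field(default_factory=list)

    def handle(self, key: str, model: str, tokens: int) -> str:
        sub = self.subscriptions.get(key)
        if sub is None:
            decision = "denied: unknown key"
        elif model not in sub.allowed_models:
            decision = "denied: model not allowed"
        elif sub.tokens_used + tokens > sub.token_quota:
            decision = "denied: quota exceeded"
        else:
            sub.tokens_used += tokens
            decision = "allowed"
        # Denied calls are logged too, so the platform sees all traffic.
        self.usage_log.append({
            "team": sub.team if sub else None,
            "model": model,
            "tokens": tokens,
            "decision": decision,
        })
        return decision


gateway = Gateway({"key-123": Subscription("retail-bot", {"model-a"}, token_quota=1000)})
print(gateway.handle("key-123", "model-a", 800))  # allowed
print(gateway.handle("key-123", "model-a", 300))  # denied: quota exceeded
print(gateway.handle("key-123", "model-b", 10))   # denied: model not allowed
```

Because policy checks and logging live in one place, agents need no code changes to be governed or observed; in the real platform that role is played by gateway policies rather than application code.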
315 | 
316 | Also, because the **landing zone is separate but connected** to the main enterprise landing zone, it can be introduced without disrupting existing applications – it’s an **add-on landing zone for AI** that still connects back to the core (network peering to the company’s Hub VNet, adhering to central governance via Azure Policy, etc.).
317 | 
318 | To summarise the architecture in simpler terms: **Citadel’s landing zone is like a secure factory for AI agents**. The **Unified AI Gateway** is the fortified front door and guard for LLMs, tools and agents; the **Agent hosting** layer (AI Foundry, containers, apps, and so on) is the assembly line where work gets done; and the **observability layer** is the set of instruments and dials that supervisors use to monitor the process and outcomes at both the platform level and the agent level.
319 | 
320 | All of this is delivered with a blueprint so that organisations can set it up quickly and be confident that nothing was left out in the design (security, networking, ops, all accounted for). It gives you the **peace of mind** that as you scale up AI usage, you have an architecture that can handle **growth in users, agents, and integrations** without sacrificing control or performance.
321 | 
322 | **Foundry Citadel Platform** is divided into two deployments: the central **Citadel Governance Hub (CGH)** landing zone representing the **governance and security** pillar, and **Citadel Agent Spoke (CAS)** landing zones for single agents or multi-agent systems serving specific use cases or business units, representing the **AI development velocity** pillar. 
323 | 
324 | The **Observability and compliance** pillar spans both CGH and CAS landing zones, providing unified monitoring and evaluation capabilities.
325 | 
326 | ### **Citadel Governance Hub (CGH)**: Central governance & security
327 | 
328 | The **Citadel Governance Hub (CGH)** is an enterprise-grade solution accelerator that establishes a centralized, governable, and observable control plane for all AI service consumption across multiple teams, use cases, and environments. Often referred to as the **AI Hub Gateway**, CGH replaces fragmented, unmonitored, key-based model access with a **unified AI gateway** pattern built on Azure API Management (APIM), adding intelligent routing, security enforcement, compliance guardrails, usage analytics, AI registry and automated onboarding. 
329 | 
330 | This elevates AI consumption from ad hoc experimentation to a scalable, auditable, and cost-attributable platform capability.
331 | 
332 | > **🔗 Explore the AI Hub Gateway Repo:**  
333 | > For detailed guidance on deploying and operating the AI Hub Gateway—including architecture, templates, and best practices—visit the official [**AI Hub Gateway repository**](https://aka.ms/ai-hub-gateway).  
334 | > <br>
335 | > This resource is your starting point for hands-on instructions, reference implementations, and operational insights to accelerate secure, governed AI adoption in your enterprise.
336 | 
337 | #### 🏗️ What Gets Deployed
338 | 
339 | ![Azure components](./assets/AIAC-Governance-1.1.0.png)
340 | 
341 | | Component | Purpose | Enterprise Features |
342 | |-----------|---------|-------------------|
343 | | **🚪 API Management** | Unified AI gateway | LLM governance, AI resiliency, AI registry gateway |
344 | | **📘 API Center** | Universal AI Registry | Discovery of available AI tools, agents and AI services for 1st and 3rd party |
345 | | **🔍 AI Foundry** | Platform Observability and Compliance | Platform AI Evaluations & Compliance reports |
346 | | **📊 Log Analytics Workspace** | LLM Logs, metrics & audits | scalable enterprise telemetry ingestion and storage |
347 | | **📊 Application Insights** | Platform monitoring & analytics | performance dashboards, automated alerts |
348 | | **📨 Event Hub** | Usage data streaming & processing | Usage streaming, custom logging |
349 | | **🛡️ Azure Content Safety** | Centralized LLM protection | Prompt Shield and Content Safety protections |
350 | | **💳 Azure Language Service** | PII entity detection | Natural language based PII entity detection, anonymization |
351 | | **🗄️ Cosmos DB** | Usage analytics & cost allocation | Long term storage of usage, automatic scaling |
352 | | **⚡ Logic App** | Event processing & data transformation | Workflow-based processing of ingested usage/logs & AI Eval workflow |
353 | | **🔐 Managed Identity** | Zero-credential authentication | Secure service-to-service communication |
354 | | **🔗 Virtual Network** | Private connectivity & isolation | BYO-VNET support, private endpoints |
355 | | **🤖 Azure OpenAI (OPTIONAL)** | Multi-region OpenAI deployments (3 regions) |  GPT-models, Realtime API, fully private |
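
To make the usage-analytics and cost-allocation role in the table more concrete, here is a small sketch of the kind of aggregation that could run over streamed usage events. The event shape (`team`, `model`, `total_tokens`), the model names, and the prices are assumptions for illustration, not the schema or rates used by the accelerator.

```python
from collections import defaultdict

# Hypothetical prices per 1,000 tokens in arbitrary currency units.
PRICE_PER_1K_TOKENS = {"model-a": 0.5, "model-b": 0.25}


def allocate_costs(events):
    """Sum token usage and cost per team from a list of usage events."""
    report = defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    for event in events:
        price = PRICE_PER_1K_TOKENS[event["model"]]
        entry = report[event["team"]]
        entry["tokens"] += event["total_tokens"]
        entry["cost"] += event["total_tokens"] / 1000 * price
    return dict(report)


# Example events, as they might arrive from the usage stream.
events = [
    {"team": "claims", "model": "model-a", "total_tokens": 2000},
    {"team": "claims", "model": "model-b", "total_tokens": 4000},
    {"team": "retail", "model": "model-a", "total_tokens": 1000},
]
print(allocate_costs(events))
```

In the deployed hub, the equivalent steps would be spread across Event Hub (ingestion), Logic App (transformation), and Cosmos DB (long-term storage), as listed in the table.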
356 | 
357 | ***
358 | 
359 | ### **Citadel Agent Spoke (CAS)**: Local AI development velocity
360 | 
361 | The **Citadel Agent Spoke (CAS)** provides a comprehensive, enterprise-ready infrastructure foundation for deploying and scaling AI agent workloads on Azure. Built on Azure Verified Modules (AVM), CAS delivers a secure, network-isolated environment optimized for generative AI applications and agent services per domain or workload. The architecture centers around Azure AI Foundry with integrated agent capabilities, supported by a full suite of AI services, data stores, and enterprise-grade security controls.
362 | 
363 | > **🔗 Explore the Citadel Agent Spoke Repo:**  
364 | > For comprehensive guidance on deploying and operating Citadel Agent Spokes (CAS)—including architecture, deployment templates, and enterprise best practices—visit the official [**Citadel Agent Spoke repository**](https://github.com/Azure/AI-Landing-Zones).  
365 | > <br>
366 | > This resource provides step-by-step instructions, reference implementations, and operational insights to help you rapidly build, scale, and manage AI agent solutions in a secure, governed Azure environment.
367 | 
368 | #### 🏗️ What Gets Deployed
369 | 
370 | | Component | Purpose | Enterprise Features |
371 | |-----------|---------|-------------------|
372 | | **🤖 Azure AI Foundry** | AI agent development platform with Standard Agent Services | Agent capability hosts, project management, private networking, managed identities |
373 | | **🚪 API Management** | Unified AI gateway and service orchestration | LLM governance, API versioning, traffic control, usage analytics, security policies |
374 | | **🔍 Azure AI Search** | Vector and hybrid search for RAG patterns | Private endpoints, semantic search, vector indexing, enterprise security |
375 | | **🗄️ Azure Cosmos DB** | Distributed database for agent state and conversations | Global distribution, multi-region failover, private connectivity |
376 | | **💾 Azure Storage Account** | Blob storage for documents and model artifacts | Private endpoints, hierarchical namespace, lifecycle management |
377 | | **🔐 Azure Key Vault** | Secrets and certificate management | Private endpoints, RBAC integration, HSM-backed keys |
378 | | **📊 Azure Container Apps** | Containerized AI applications and microservices | Auto-scaling, managed environments, private networking |
379 | | **📦 Azure Container Registry** | Container image registry for AI workloads | Private endpoints, vulnerability scanning, geo-replication |
380 | | **📈 Application Insights** | Telemetry and performance monitoring | Custom metrics, distributed tracing, alerting |
381 | | **🌐 Virtual Network** | Network isolation and security | Private endpoints, NSGs, subnet segmentation |
382 | | **🌍 Application Gateway** | Web application firewall and load balancing | WAF protection, SSL termination, path-based routing |
383 | | **💻 Jump VM** | Secure access to private resources | Bastion integration, managed maintenance, RBAC |
384 | | **🏗️ Build VM** | DevOps and CI/CD operations | Automated deployments, secure build environment |
385 | | **🔒 Network Security Groups** | Subnet-level security controls | Fine-grained traffic rules, security logging |
386 | | **🌐 Private DNS Zones** | Name resolution for private endpoints | Automated DNS management, secure resolution |
387 | 
388 | > Note: Many of the CAS components listed above, such as the virtual network and Application Gateway, are optional. Each has a toggle to provision a new resource, skip provisioning, or reuse an existing one.
389 | 
390 | #### Key Enterprise Capabilities
391 | 
392 | ##### 🤖 **AI Agent Infrastructure**
393 | - **Agent Capability Hosts**: Dedicated infrastructure for AI agent services with Azure AI Foundry
394 | - **Project Management**: Multi-tenant project isolation and management capabilities
395 | - **Standard Agent Services**: Pre-configured agent runtime environment with networking integration/isolation
396 | 
397 | ##### 🔐 **Security & Compliance**
398 | - **Zero Trust Architecture**: All services communicate through private endpoints
399 | - **Network Isolation**: Dedicated subnets with network security groups for each service tier
400 | - **Identity & Access Management**: Managed identities and RBAC integration across all components
401 | - **Secrets Management**: Centralized key and certificate management with Azure Key Vault
402 | 
403 | ##### 🚀 **Scalability & Performance**
404 | - **Auto-scaling Container Apps**: Elastic compute for variable AI workloads
405 | - **Global Distribution**: Multi-region capabilities with Cosmos DB and geo-replication
406 | - **Load Balancing**: Application Gateway with WAF for high availability and security
407 | - **Caching & CDN**: Built-in caching strategies for optimal performance
408 | 
409 | ##### 🔧 **DevOps & Operations**
410 | - **Infrastructure as Code**: Complete Bicep templates with Azure Verified Modules
411 | - **Monitoring & Observability**: Comprehensive telemetry with AI Foundry observability powered by Application Insights and Log Analytics
412 | - **Automated Deployments**: CI/CD integration with build agents and maintenance windows
413 | - **Configuration Management**: Centralized app configuration with Azure App Configuration
414 | 
415 | ##### 🌍 **Networking & Connectivity**
416 | - **Hub-Spoke Architecture**: VNet peering capabilities for enterprise network integration
417 | - **Private Connectivity**: All AI services accessible only through private endpoints
418 | - **DNS Management**: Automated private DNS zone configuration and management
419 | - **Firewall Protection**: Azure Firewall with threat intelligence and custom rules
420 | 
421 | ##### 📈 **Data & Analytics**
422 | - **Vector Search**: Advanced AI Search capabilities for RAG and semantic search scenarios
423 | - **Document Storage**: Hierarchical blob storage with lifecycle management policies
424 | - **State Management**: Distributed database for conversation history and agent state
425 | - **Configuration Store**: Centralized configuration management with feature flags
426 | 
427 | #### Deployment Flexibility
428 | 
429 | The blueprint supports both standalone **greenfield deployments** and **integration with an existing Foundry Citadel Platform Citadel Governance Hub (CGH)**:
430 | 
431 | - **Create New**: Deploy all components as new resources with optimized defaults
432 | - **Reuse Existing**: Integrate with existing virtual networks, DNS zones, and shared services
433 | - **Hybrid Approach**: Mix of new and existing resources based on organizational requirements
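
The three options above amount to a per-component choice between creating, reusing, and skipping a resource. The sketch below models that choice in plain Python; the `Mode` values, the `Toggle` class, the component names, and the resource ID placeholder are hypothetical and do not reflect the actual parameter names in the Bicep templates.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    NEW = "new"            # provision a fresh resource
    EXISTING = "existing"  # reuse a resource that is already deployed
    NONE = "none"          # do not provision this component


@dataclass(frozen=True)
class Toggle:
    component: str
    mode: Mode
    existing_resource_id: Optional[str] = None


def plan(toggles):
    """Group components by action and reject inconsistent toggles."""
    result = {"create": [], "reuse": [], "skip": []}
    for t in toggles:
        if t.mode is Mode.EXISTING:
            if not t.existing_resource_id:
                raise ValueError(f"{t.component}: 'existing' needs a resource ID")
            result["reuse"].append(t.component)
        elif t.mode is Mode.NEW:
            result["create"].append(t.component)
        else:
            result["skip"].append(t.component)
    return result


# A hybrid deployment: reuse the network, create AI Foundry, skip the gateway.
hybrid = [
    Toggle("virtualNetwork", Mode.EXISTING, "<existing-vnet-resource-id>"),
    Toggle("aiFoundry", Mode.NEW),
    Toggle("applicationGateway", Mode.NONE),
]
print(plan(hybrid))
```

Validating the toggles before deployment catches the most common misconfiguration in a hybrid setup: asking to reuse a resource without saying which one.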
434 | 
435 | This enterprise-ready blueprint provides the foundation for building, deploying, and scaling AI agent solutions while maintaining the highest standards of security, compliance, and operational excellence.
436 | 
437 | ### **Citadel Governance Hub Integration** – *Automated Alignment Between Agents & Guardrails*
438 | 
439 | Citadel streamlines the handshake between each **Citadel Agent Spoke** and the central **Citadel Governance Hub**, ensuring that every agent inherits the platform’s security, policy, and observability standards from day one. Through a fully automated onboarding flow, teams can codify their integration in source control and wire it directly into CI/CD pipelines—enabling repeatable deployments, rapid environment cloning, and verifiable governance drift checks.
440 | 
441 | *   **AI Access Contract:** Declares the governed dependencies an agent needs—LLMs, AI services, tools (MCP), and reusable agents—along with the precise access policies (model selection, capacity, regions, safety requirements). When automated, this contract guarantees consistent consumption guardrails across environments and simplifies approvals by making entitlements explicit.
442 | *   **AI Publish Contract:** Describes the tools and agents a spoke exposes back to the hub, including the publishing rules, ownership metadata, and security posture. Automation turns this into a predictable cataloging workflow, accelerating time-to-discovery, enforcing compliance gates, and keeping the enterprise AI registry continuously in sync.
443 | 
444 | By treating governance onboarding as code, organisations gain **audit-ready traceability**, **faster release cycles**, and **reduced manual effort**, while ensuring every agent remains within Citadel’s unified policy perimeter.
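
As an illustration of what "governance onboarding as code" could look like, the sketch below models an AI Access Contract as data and checks it against what a hub offers. The field names, the `validate` function, and the example models, regions, and limits are all assumptions made for this example; the real contract schema is defined by the Citadel repositories, not here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelAccess:
    model: str
    region: str
    tokens_per_minute: int
    content_safety: bool = True


@dataclass(frozen=True)
class AccessContract:
    agent: str
    environment: str
    models: tuple = ()
    tools: tuple = ()


def validate(contract, hub_models, hub_tools, max_tpm):
    """Return a list of violations; an empty list means the contract passes."""
    problems = []
    for m in contract.models:
        if m.region not in hub_models.get(m.model, set()):
            problems.append(f"{m.model} is not offered in {m.region}")
        if m.tokens_per_minute > max_tpm:
            problems.append(f"{m.model} requests {m.tokens_per_minute} TPM (limit {max_tpm})")
        if not m.content_safety:
            problems.append(f"{m.model} must keep content safety enabled")
    for tool in contract.tools:
        if tool not in hub_tools:
            problems.append(f"tool {tool} is not in the registry")
    return problems


contract = AccessContract(
    agent="claims-assistant",
    environment="dev",
    models=(
        ModelAccess("model-a", "swedencentral", 20000),
        ModelAccess("model-b", "eastus", 90000),
    ),
    tools=("crm-lookup",),
)
hub_models = {"model-a": {"swedencentral", "eastus"}, "model-b": {"swedencentral"}}
hub_tools = {"crm-lookup"}

for problem in validate(contract, hub_models, hub_tools, max_tpm=50000):
    print(problem)
```

A check like this could run as a CI/CD gate, so that a contract requesting an unavailable region or excessive capacity fails the pipeline before any resources are touched.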
445 | 
446 | 
447 | 
448 | ## **Conclusion & Next Steps**
449 | 
450 | **Foundry Citadel Platform (FCP)** brings together everything an enterprise needs to **build, run, and scale AI-powered solutions responsibly**. By focusing on the three pillars of **Governance & Security**, **Observability & Compliance**, and **Development Velocity**, it ensures that AI projects can move fast from idea to production **with the right safety nets in place** at every stage. Organisations adopting FCP can accelerate their AI journey: teams are empowered to create powerful AI agents (from simple chatbots to complex multi-agent systems) using a rich set of tools and templates, while central IT can rest assured that **proper controls and insights** are enforced globally through the **Citadel Governance Hub (CGH)** and delivered securely through **Citadel Agent Spokes (CAS)**.
451 | 
452 | In practice, Citadel’s impact is significant: it can **accelerate time-to-value** for AI initiatives (by providing out-of-the-box infrastructure and best practices), and at the same time **reduce the risks** that typically accompany AI experiments (thanks to its rigorous governance and monitoring). It helps answer common executive concerns in AI projects – *“How do we prevent sensitive data leaks? How do we ensure the AI stays reliable and fair? How do we integrate these new AI apps into our existing systems and culture?”* – by providing a proven solution. This platform has already been leveraged in various industries, from finance (where auditability and security are paramount) to retail and manufacturing (where rapid innovation and cost control are key). Early adopters have reported increased confidence in deploying generative AI for critical use cases, knowing they can track usage, attribute costs, and meet compliance requirements.
453 | 
454 | In essence, **Foundry Citadel Platform** enables enterprises to **innovate with AI at scale – safely, efficiently, and transparently**. It represents a move from ad-hoc AI experiments to a **disciplined AI engineering approach**: akin to going from crafting one-off artisan pieces to running a well-oiled factory that can produce reliable, high-quality products repeatedly. With FCP, organisations can unlock the tremendous potential of generative AI and autonomous agents *“with the confidence that comes from having a Citadel around your AI operations.”*
455 | 
458 | In summary, **Foundry Citadel Platform** helps you **“build the future, safely”** – delivering the **speed** that business demands, with the **safeguards** that IT requires, all in one comprehensive, evolving platform. It is your organisation’s Citadel in the new world of AI – providing **protection, structure, and strength** as you scale new heights with enterprise AI.
459 | 


--------------------------------------------------------------------------------
| Governance Feature | Description |
|--------------------|-------------|
| Unified AI Gateway | Central gateway that mediates every AI call. It applies global policies (rate limits, authentication, routing) and provides a single secure endpoint for clients. This ensures all AI usage is centrally visible and controlled. |
| Policy Engine | Rich rule framework to enforce business rules – e.g. restrict certain models to specific regions, apply token quotas per user, or inject safety prompts. Administrators can write custom policies or use built-in templates for common requirements. |
| Managed Credentials | Uses gateway keys, optionally combined with tokens issued by an identity platform (such as Microsoft Entra ID), to abstract backend secrets. Developers no longer handle raw AI service master keys – the gateway issues tokens/keys with scoped access. This prevents key leakage and allows instant revocation if needed. |
| Content Safety Filters | Automated checks on prompts and responses using Azure AI Content Safety. Flags or removes profanity, hate, sexual or violent content, and can block outputs that violate compliance policies (e.g. privacy or confidential data). |
| AI Registry & Catalog | A registry (via Azure API Center) for discovering and managing AI endpoints and tools (known as MCP servers). This catalogue lets teams securely share AI “skills” (Agents, APIs, functions) across the enterprise with proper metadata and governance. |
| Multi-cloud Connectors | Built-in support to govern AI services beyond Azure. The gateway can proxy requests to open-source model APIs or other clouds’ AI endpoints (e.g. Bedrock) securely. This ensures consistent security and monitoring even for third-party AI services. |
| Azure Key Vault | Secure store for secrets and credentials used by AI apps and agents. All API keys, connection strings, etc., used by agents or the gateway are kept in the spoke Key Vault and accessed via managed identities. This eliminates hard-coded secrets and protects sensitive data at rest. |

| Observability Feature | Purpose | Layer |
|-----------------------|---------|-------|
| Central APM Monitoring | Infrastructure-level monitoring, resource utilisation, and system health indicators across all AI workloads without requiring agent code changes. | Platform |
| Usage Analytics & Cost Tracking | Granular tracking of token consumption, request patterns, and cost allocation segmented by team, use case, or agent for enterprise resource management. | Platform |
| Centralised AI Evaluations | Automated quality, safety, and compliance evaluations applied consistently across all agents without requiring code modifications from development teams. | Platform |
| Enterprise Alerting & Remediation | Sophisticated alerting with automated responses for sensitive use cases, including agent disabling, human escalation, and compliance notifications. | Platform |
| End-to-End Tracing | Captures every step of an AI agent's reasoning and interactions (prompts, tool calls, responses), enabling transparent debugging and post-mortem analysis. | Agent |
| Agent Performance Monitoring | Detailed real-time metrics including response latency breakdown, token usage patterns, tool efficiency, and resource consumption per agent. | Agent |
| Agent-Specific Evaluations | Tailored quality metrics for individual agent behaviours including intent fulfilment, tool use correctness, and reasoning efficiency. | Agent |
| Advanced Debugging Tools | Powerful querying, filtering, and trace replay capabilities for root-cause analysis and issue reproduction at the agent level. | Agent |
| Unified Dashboards | Integrated visual dashboards providing both platform-wide overviews and agent-specific drill-downs for comprehensive operational visibility. | Both |
| Continuous Improvement Loop | Connects operational data back to development with CI/CD integration, A/B testing, and regression detection for ongoing AI system enhancement. | Both |

| Development Accelerator | Role in Citadel |
|-------------------------|-----------------|
| Deployment Templates | Pre-built, one-click templates (e.g. Bicep) to provision a secure, governed cloud environment for single or multiple agent types, accelerating time-to-production. |
| Flexible Agent Runtimes (Copilot Studio, Managed Runtime, BYO) | Supports a spectrum of development models, from low-code (Copilot Studio) to managed services (AI Foundry Agent Service) and fully custom "Bring-Your-Own" orchestrators (Semantic Kernel, LangChain), allowing teams to choose the best fit for their use case. |
| Citadel AI Registry | A central, governed catalog for discovering, managing, and reusing AI assets. It provides managed access to LLMs and tools and allows teams to publish their own, fostering collaboration and preventing redundant work. |
| Reusable Blueprints (Gold Standard Solutions) | End-to-end solution examples that demonstrate common AI patterns. They serve as accelerators for new projects, embodying proven architectures and best practices. |
| DevOps Integration | Integrates with GitHub and Azure DevOps for CI/CD, automated testing, and lifecycle management of AI solutions. Supports A/B testing and canary releases to bring modern software engineering speed to AI development. |