├── .github └── workflows │ ├── deadlinkcheck.yml │ └── powershell.yml ├── .gitignore ├── Application ├── Analyzing_Application_Issues.md └── Remote_Debugging_Application_Code.md ├── Cluster ├── Changing DataPath.md ├── Cluster Not Reachable UpgradeServiceNotreachable.md ├── Collecting logs for failed VMs.md ├── Common issues customers experience when using Auto-scale with Service Fabric clusters.md ├── Connecting to secure clusters with PowerShell.md ├── Determine Process Listening on Port.md ├── FUS Stream Architecture.md ├── Fabric Upgrade Policy - Define a Custom Fabric Upgrade Policy.md ├── Get Cluster Upgrade history.md ├── How To Turn Off Resource Orchestrator Service.md ├── How to Fix one missing seed node.md ├── How to Fix two missing seed node.md ├── How to Query EventStore from PowerShell.md ├── How to Rotate Access Keys of Storage Account for Service Fabric logs.md ├── How to change Storage Account for Service Fabric logs.md ├── How to fix missing seednodes with Automated script.md ├── How to setup Azure Alerts for Service Fabric Linux node performance counters.md ├── How to setup Azure Alerts for Service Fabric Windows node performance counters.md ├── Incoming TCP Traffic Troubleshooting.md ├── Issues caused by Deallocating a VMSS.md ├── Out of Diskspace.md ├── Phantom nodes.md ├── README.md ├── SF Node Open Life Cycle.md ├── SF collect node info.md ├── SF process list with Security Context info.md ├── Service Fabric Managed Cluster monitoring with Azure Monitor.md ├── Service Fabric Red Hat File Locations.md ├── Service Fabric Standalone Cluster Data Collection.md ├── Service Fabric Ubuntu File Locations.md ├── Service Fabric Windows File Locations.md ├── Shared Log writes throttled.md ├── Troubleshooting failed Fabric Upgrade.md ├── Upgrade to Service Fabric 7.1 fails with certificate configuration errors.md ├── Why did my node Reboot.md ├── Why do cluster upgrades take so long.md ├── Why is my cluster reporting an unresponsive neighbor.md ├── Why is 
my node degraded.md ├── Why is my node unhealthy.md ├── how-to-configure-log-analytics-for-service-fabric-clusters.md ├── how-to-configure-service-fabric-placement-loadbalancing-constraints.md └── unsupported-ipprovider.md ├── Deployment ├── Container-Support-Decision-Graph.md ├── Deployments to upgrade existing applications from Azure Dev Ops or using sfpkg time out or fail.md ├── Example ASP.Net Core Kestrel Https Windows Container.md ├── How to Configure Service Fabric Cluster Automatic OS Image Upgrade.md ├── How to Configure Service Fabric Managed Cluster Automatic OS Image Upgrade.md ├── How to Migrate from XStore to the Native Image Store Service.md ├── How-do-I-know-if-I-am-using-Containers-in-my-SF-service.md ├── Image Store - Copy application fails with Invalid Argument (Value does not fall within the expected range.).md ├── Image Store Copy Application package fails with ACCESS_DENIED.md ├── Installing dependencies on virtual machine scaleset.md ├── Minimal-Cluster-Rebuild.md ├── Mirantis-Guidance.md ├── Mirantis-Installation.md ├── README.md ├── SFRP-VMSS-Validations.md ├── Schedule task on virtual machine scale set.md ├── Unmanaged-Disk-Deprecation-Guidance.md ├── Upgrade Service Fabric cluster basic load balancer (automated).md ├── Upgrade Service Fabric cluster basic load balancer (manual).md ├── Upgrade Service Fabric cluster basic load balancer.md ├── how-to-configure-azure-devops-for-service-fabric-arm-deployments.md ├── how-to-configure-azure-devops-for-service-fabric-cluster.md ├── how-to-configure-azure-devops-for-service-fabric-managed-cluster.md └── how-to-export-service-fabric-managed-cluster-configuration.md ├── Known_Issues ├── All VMSS operations are blocked on Silver or higher durability.md ├── Application crashes after upgrade on Fabric 6.4.md ├── BRS-stops-taking-backup-after-upgrading-to-latest-runtime.md ├── Backup-enabled-partition-stuck-in-reconfiguration.md ├── BackupRestoreService(BRS) stops taking periodic backup after upgrade 
to latest runtime.md ├── Breaking change in .NET 5+ causes Reliable Collections with string keys to crash.md ├── Cluster upgrade gets stuck.md ├── Cluster upgrades stuck.md ├── Container Service gets into an error state after upgrading the Cluster to 7.1.417-cu1.md ├── Container based services stop responding after upgrade to Service Fabric 7.1.417.9590.md ├── Fabric 6.4 Upgrade fails.md ├── FabricDCA Process High Memory and CPU.md ├── Microsoft.ServiceFabricMesh registration is stuck.md ├── Nodes FabricDCA DataCollectionAgent.DiskSpaceAvailable.md ├── Nodes FabricNode Certificate Expiration.md ├── Nodes unhealthy due to a FabricDCA exception.md ├── Service Fabric 7.1 High CPU Fabric.exe One Node.md ├── Service Fabric 8.2 Upgrade or Certificate Rotation Failure due to ImageStoreService Error.md ├── Service Fabric 9.x Repair Job Stuck.md ├── Service Fabric Common Name Digicert Multiple Issuer Thumbprints.md ├── Upgrade to Service Fabric 7.1 fails with certificate configuration errors.md └── service-fabric-10.0-sdk-7.0.1816-installation-failure.md ├── LICENSE ├── LICENSE-CODE ├── MigrationGuides ├── CloudServices_To_ServiceFabric_Migration_Guide.md ├── Migration_CloudServices_To_ServiceFabric.md ├── StateManagement_Migration_Example.md ├── WebRole_Migration_Example.md └── WorkerRole_Migration_Example.md ├── README.md ├── SECURITY.md ├── Scripts ├── Add_New_Cert_To_VMSS.ps1 ├── CheckIPProviderEnabled.ps1 ├── CreateKeyVaultAndCertificateForServiceFabric.ps1 ├── DisableIPProvider.ps1 ├── FixExpiredCert-AEPCC.ps1 ├── FixExpiredCert.ps1 ├── FixMissingSeednode.ps1 ├── Install-Mirantis.ps1 ├── PatchIssuerThumbprints.ps1 ├── Readme.md ├── Remove-Unreferenced-Replica-Files │ ├── README.md │ └── Remove-UnreferencedReplicaFiles.ps1 ├── SetupAnonymousShare.ps1 ├── UpdatePrivateKeyPermissionsOnCertificate.ps1 ├── directory-treesize.ps1 ├── enumerate-vmss-image-sku.ps1 ├── event-log-manager.ps1 ├── install-dotnet-48.ps1 ├── install-dotnet-60.ps1 ├── install-dotnet-80.ps1 ├── 
schedule-task.ps1 ├── sf-collect-node-info.ps1 ├── sfmc-connect.ps1 └── vmss-cse-tls.ps1 ├── Security ├── Add-AzureRmServiceFabricClusterCertificate throws error TwoCertificatesToTwoCertificatesNotAllowed.md ├── Authentication Issue with AAD.md ├── Change the RDP password for VMSS.md ├── Configure Azure Active Directory Authentication for Existing Cluster.md ├── Create a New Self Signed Certificate.md ├── DSC - ACL a certificate using Desired State Configuration.md ├── Determine Cert bound to a specific port.md ├── Download certificate from Keyvault in PFX or PEM or CER format.md ├── Failed to get the Certificates private key.md ├── Fix Expired Cluster Certificate Automated Script.md ├── Fix Expired Cluster Certificate Manual Steps.md ├── Fix-Untrusted-Cert-due-to-Untrusted-Issuer-Thumbprints.md ├── How to ACL application certificate from ApplicationManifest.md ├── How to Configure a Service Fabric Managed Cluster with Common Name Certificate.md ├── How to clean up Fabric firewall rules.md ├── How to recover from an Expired Cluster Certificate.md ├── Install intermediate certificates.md ├── Intermediate Certificate.md ├── NSG configuration for Service Fabric clusters Applied at VNET level.md ├── PowerShell ARM Template Deployment - Swap certificates.md ├── README.md ├── Recommended AV (Antivirus) software exclusions for Service Fabric clusters.md ├── Removing a Secondary certificate with expiry date later than Primary certificate expiry date.md ├── SF unable to authenticate with primary certificate.md ├── Securing Application Endpoint (ie. 
DoS DDoS prevention).md ├── SecurityApi_CertGetCertificateChain - CTL accessibility - CRL slow warnings.md ├── Set ACL for a SF certificate.md ├── StorageFirewall.md ├── Swap Reverse Proxy certificate.md ├── TLS Configuration.md ├── Use Azure Resource Explorer to add the Secondary Certificate.md └── View Cluster Certificate.md └── media ├── Autoscale001.PNG ├── BRS ├── BackupCallbackStuck.png └── ReconfigStuck.png ├── ClusterNodeUnhealthy01.PNG ├── ClusterNodeUnhealthy02.PNG ├── ClusterUpgradeTimerStuck.png ├── FabricBRS_001.png ├── FabricDCA001.png ├── FabricDCA002.png ├── FabricDCA003.png ├── Installing-dependencies-on-virtual-machine-scaleset ├── api-playground-get-response.png ├── api-playground-get.png ├── api-playground-patch-response.png ├── api-playground-patch.png ├── resource-explorer-1.png └── resource-explorer-copy-resource-uri.png ├── IntermediateCerts001.png ├── NodeDeactivationInfo1.png ├── NodeReboot001.jpg ├── ROSExperimentalFeature.png ├── SharedLogWriteThrottled.png ├── SharedLogWriteUnthrottled.png ├── azure-export-template.png ├── certlm-certificate-acl.png ├── certlm-manage-private-keys.png ├── certlm1.png ├── certlm2.png ├── certswap_image1.png ├── create-alert-signal-lx.png ├── create-alert-signal.png ├── create-alert.png ├── create-notification-action-group.png ├── dsc_image001.jpg ├── dsc_image002.jpg ├── dsc_image003.png ├── eventvwr-microsoft-service-fabric.png ├── eventvwr1.png ├── git-aspnetcore-sample-1.png ├── how-to-configure-azure-devops-for-service-fabric-arm-deployments ├── ado-new-pipeline-assistant-arm-connection.png ├── ado-new-pipeline-assistant-arm-service-connection.png ├── ado-new-pipeline-assistant-arm-template-settings.png ├── ado-new-pipeline-assistant-arm.png ├── ado-new-pipeline-assistant.png ├── ado-new-pipeline-repo.png ├── ado-new-pipeline-yaml-review.png ├── ado-new-pipeline-yaml.png ├── ado-new-pipeline.png ├── ado-run-pipeline-debug-download.png ├── ado-run-pipeline-debug-variable-ui.png ├── 
ado-run-pipeline-debug.png ├── ado-run-pipeline-jobs.png ├── ado-run-pipeline-permissions-warn.png ├── arm-portal-new-cluster-download-template.png └── arm-portal-new-cluster-save-template.png ├── how-to-configure-azure-devops-for-service-fabric-cluster ├── ado-aad-common-connection.png ├── ado-aad-thumprint-connection.png ├── ado-certificate-common-connection.png ├── ado-certificate-thumbprint-connection.png ├── ado-nsg-service-tag.png ├── portal-cluster-app-api-permissions.png ├── portal-cluster-app-registration-users.png ├── portal-cluster-app-registration.png ├── portal-cluster-user-applications.png ├── portal-cluster-user-overview.png └── portal-sfc-security.png ├── how-to-configure-azure-devops-for-service-fabric-managed-cluster ├── sfmc-ado-pool-type.png ├── sfmc-ado-service-connection.png ├── sfmc-cluster-id.png └── sfmc-enable-aad.png ├── how-to-configure-log-analytics-for-service-fabric-clusters ├── azure-portal-log-analytics-add-storage-account-event-type.png ├── azure-portal-log-analytics-add-storage-account-sf-event-type.png ├── azure-portal-log-analytics-add-storage-account.png ├── azure-portal-log-analytics-search.png ├── azure-portal-storage-wad-tables.png └── azure-storage-explorer.png ├── how-to-configure-service-fabric-cluster-automatic-os-image-upgrade ├── sfx-infrastructure-task-autoosupgrade.png └── sfx-repair-task-autoosupgrade.png ├── how-to-configure-service-fabric-managed-cluster-automatic-os-image-upgrade ├── sfx-repair-task-infra-autoosupgrade.png └── sfx-repair-task-sfrp-autoosupgrade.png ├── how-to-rotate-access-keys-of-storage-account-for-service-fabric-logs ├── sfx-eventstore-bad.png └── sfx-eventstore-good.png ├── knownissue_container_dns_image001.png ├── metric-explorer-virtual-machine-add-filter-lx.png ├── metric-explorer-virtual-machine-add-filter.png ├── metric-explorer-virtual-machine-apply-splitting-lx.png ├── metric-explorer-virtual-machine-apply-splitting.png ├── metric-explorer-virtual-machine-guest1-lx.png ├── 
metric-explorer-virtual-machine-guest1.png ├── metric-explorer-virtual-machine-guest2-lx.png ├── metric-explorer-virtual-machine-guest2.png ├── monitor-explorer-lx.png ├── monitor-explorer.png ├── mstsc-1.png ├── mstsc-2.png ├── mstsc-3.png ├── mstsc-4.png ├── mstsc-5.png ├── nsg01.png ├── oneseednode001.PNG ├── oneseednode002.PNG ├── oneseednode003.PNG ├── oneseednode004.PNG ├── outofdiskspace001.jpg ├── outofdiskspace002.jpg ├── outofdiskspace003.jpg ├── outofdiskspace004.jpg ├── outofdiskspace005.jpg ├── outofdiskspace006.jpg ├── outofdiskspace007.jpg ├── outofdiskspace008.png ├── perfmon-view1.png ├── perfmon-view2.png ├── phantomNode001.jpg ├── portal-upgrade-policy1.png ├── resource-explorer-1.png ├── resourcemgr1.png ├── resourcemgr10.png ├── resourcemgr11.png ├── resourcemgr12.png ├── resourcemgr13.png ├── resourcemgr16.png ├── resourcemgr2.png ├── resourcemgr3.png ├── resourcemgr4.png ├── resourcemgr5.png ├── resourcemgr6.png ├── resourcemgr7.png ├── resourcemgr8.png ├── resourcemgr9.png ├── resources-azure-wadcfg.png ├── rpcertswap_image001.png ├── rpcertswap_image002.PNG ├── rpcertswap_image003.PNG ├── rpcertswap_image004.PNG ├── seednodeauto03.PNG ├── service-fabric-10.0-sdk-7.0.1816-installation-failure └── installation-error.png ├── service-fabric-9x-repair-job-stuck ├── sfx-9x-stateful-known-issue.png └── sfx-9x-stateful-known-issue2.png ├── service-fabric-managed-cluster-monitoring-with-azure-monitor ├── azure-monitor-dcr-create-add-source.png ├── azure-monitor-dcr-create-counters.png ├── azure-monitor-dcr-create-custom-events.png ├── azure-monitor-dcr-create-dcr-review.png ├── azure-monitor-dcr-create-review.2.png ├── azure-monitor-dcr-create-select-scope.png ├── azure-monitor-dcr-create.2.png ├── azure-monitor-dcr-create.png ├── azure-monitor-dcr-created.log.png ├── azure-monitor-dcr-created.png ├── azure-monitor-dcr-custom-events-destination.png ├── log-analytics-sf-counter.png └── log-analytics-sf-event-logs.png ├── sfx-container-logs-2.png ├── 
sfx-imagestore-quorum-loss.png ├── sfx-node-restart.png ├── storage-account-access-keys.png ├── storagefirewall.jpg ├── task-manager-filestoreservice-terminate.png ├── task-manager-user-context.png ├── taskscheduler1.png ├── template-cse-extension.png ├── template-extension.png ├── template-wadcfg.png ├── tffu001.jpg ├── tffu0010.jpg ├── tffu002.jpg ├── tffu003.jpg ├── tffu004.jpg ├── tffu005.jpg ├── tffu006.jpg ├── tffu007.jpg ├── tffu008.jpg ├── tffu009.jpg ├── twoseednode001.PNG ├── twoseednode002.PNG ├── twoseednode003.PNG ├── twoseednode004.PNG ├── twoseednode005.PNG ├── twoseednode006.PNG ├── unmanaged-disk-deprecation-guidance └── azure-portal-managed-disk-disabled.png ├── upgrade-service-fabric-cluster-basic-load-balancer ├── sfx-cluster-events.png └── sfx-green.png ├── upgradehistory001.jpg ├── upgradehistory002.jpg ├── viewcert_image001.png ├── vs-build-output-1.png ├── vs-program-cs-1.png ├── vs-sf-container-solution-1.png ├── vs-sfx-container-log-1.png ├── vs-solution-add-orchestrator-1.png ├── vs-solution-publish-1.png ├── vs-solution-publish-2.png ├── vs-solution-run-1.png ├── vs-solution-save-1.png └── vs-web-add-orchestrator-1.png /.github/workflows/deadlinkcheck.yml: -------------------------------------------------------------------------------- 1 | # This workflow will do a clean install of node dependencies, build the source code and run tests across different versions of node 2 | # For more information see: https://help.github.com/actions/language-and-framework-guides/using-nodejs-with-github-actions 3 | 4 | name: Node.js CI 5 | 6 | on: 7 | push: 8 | branches: [ master ] 9 | pull_request: 10 | branches: [ master ] 11 | 12 | jobs: 13 | build: 14 | 15 | runs-on: ubuntu-latest 16 | 17 | strategy: 18 | matrix: 19 | node-version: [20.x] 20 | 21 | steps: 22 | - uses: actions/checkout@v2 23 | - name: Use Node.js ${{ matrix.node-version }} 24 | uses: actions/setup-node@v1 25 | with: 26 | node-version: ${{ matrix.node-version }} 27 | - run: npm install 
--global remark-cli remark-validate-links 28 | - run: remark -u validate-links . --frail 29 | 30 | -------------------------------------------------------------------------------- /.github/workflows/powershell.yml: -------------------------------------------------------------------------------- 1 | # This workflow uses actions that are not certified by GitHub. 2 | # They are provided by a third-party and are governed by 3 | # separate terms of service, privacy policy, and support 4 | # documentation. 5 | # 6 | # https://github.com/microsoft/action-psscriptanalyzer 7 | # For more information on PSScriptAnalyzer in general, see 8 | # https://github.com/PowerShell/PSScriptAnalyzer 9 | 10 | name: PSScriptAnalyzer 11 | 12 | on: 13 | push: 14 | branches: [ "master" ] 15 | pull_request: 16 | branches: [ "master" ] 17 | schedule: 18 | - cron: '21 18 * * 4' 19 | 20 | permissions: 21 | contents: read 22 | 23 | jobs: 24 | build: 25 | permissions: 26 | contents: read # for actions/checkout to fetch code 27 | security-events: write # for github/codeql-action/upload-sarif to upload SARIF results 28 | name: PSScriptAnalyzer 29 | runs-on: ubuntu-latest 30 | steps: 31 | - uses: actions/checkout@v3 32 | 33 | - name: Run PSScriptAnalyzer 34 | uses: microsoft/psscriptanalyzer-action@2044ae068e37d0161fa2127de04c19633882f061 35 | with: 36 | # Check https://github.com/microsoft/action-psscriptanalyzer for more info about the options. 37 | # The below set up runs PSScriptAnalyzer to your entire repository and runs some basic security rules. 38 | path: .\ 39 | recurse: true 40 | # Include your own basic security rules. 
Removing this option will run all the rules 41 | includeRule: '"PSAvoidGlobalAliases", "PSAvoidUsingConvertToSecureStringWithPlainText"' 42 | output: results.sarif 43 | 44 | # Upload the SARIF file generated in the previous step 45 | - name: Upload SARIF results file 46 | uses: github/codeql-action/upload-sarif@v2 47 | with: 48 | sarif_file: results.sarif 49 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Binaries for programs and plugins 2 | *.exe 3 | *.exe~ 4 | *.dll 5 | *.so 6 | *.dylib 7 | *.docx 8 | 9 | # Test binary, build with `go test -c` 10 | *.test 11 | 12 | # Output of the go coverage tool, specifically when used with LiteIDE 13 | *.out 14 | *.json 15 | *.sqlite 16 | *.vsidx 17 | /.vs/Service-Fabric-Troubleshooting-Guides/v17/DocumentLayout.json 18 | /.vs/Service-Fabric-Troubleshooting-Guides/v17/.wsuo 19 | /.vs/Service-Fabric-Troubleshooting-Guides/copilot-chat/ 20 | /.vs/Service-Fabric-Troubleshooting-Guides/CopilotIndices/ 21 | -------------------------------------------------------------------------------- /Application/Remote_Debugging_Application_Code.md: -------------------------------------------------------------------------------- 1 | # Remote debugging for services deployed in an Azure Service Fabric Cluster 2 | 3 | - Install AZ Cli from https://aka.ms/installazurecliwindowsx64 4 | - Publish the debug build of your application to the cluster 5 | - Az account set --subscription 6 | - Execute the following 2 commands ```(Get the values for , , from the portal)``` 7 | 8 | # Create an inbound-nat-pool and assign front end port range and backend ports. 
9 | ```az network lb inbound-nat-pool create -g --lb-name -n VSDebugPool --protocol Tcp --frontend-port-range-start 40026 --frontend-port-range-end 40126 --backend-port 4026 ``` 10 | 11 | If successful, should return 12 | 13 | { "backendPort": 4026, "enableFloatingIP": false, "enableTcpReset": false, "etag": "W/"467fe3f7-8ed2-4e94-9da3-18068a6b45c8"", "frontendIPConfiguration": { "id": "/subscriptions//resourceGroups/djbsfcluster-rg/providers/Microsoft.Network/loadBalancers/LB-djbsfcluster-7kfjjnciu/frontendIPConfigurations/LoadBalancerIPConfig", "resourceGroup": "djbsfcluster-rg" }, "frontendPortRangeEnd": 40126, "frontendPortRangeStart": 40026, "id": "/subscriptions//resourceGroups/djbsfcluster-rg/providers/Microsoft.Network/loadBalancers/LB-djbsfcluster-7kfjjnciu/inboundNatPools/VSDebugPool", "idleTimeoutInMinutes": 4, "name": "VSDebugPool", "protocol": "Tcp", "provisioningState": "Succeeded", "resourceGroup": "djbsfcluster-rg", "type": "Microsoft.Network/loadBalancers/inboundNatPools" } 14 | 15 | # Adds the vmss instances to the inbound-nat-pool created above 16 | ```az vmss update -g -n --add virtualMachineProfile.networkProfile.networkInterfaceConfigurations[0].ipConfigurations[0].loadBalancerInboundNatPools id=``` 17 | 18 | - Log into the portal, then Remote into an instance (node) on VMSS of the SF cluster either using RDP (3389) or Bastion. 19 | - Install the Remote Tools from https://aka.ms/vs/17/release/RemoteTools.amd64ret.enu.exe 20 | - Launch “Remote Debugger” from Windows Search bar (this should launch msvsmon.exe) after a dialog asking to configure remote debugging 21 | - Get the IP address from the Load Balancer Frontend IP configuration in the portal 22 | - Attach to Process in Visual Studio Debug menu, specify “Remote (Windows)” as connection type and enter :40026 for the connection target. 23 | 24 | You should get a prompt for username/password that was used when setting up the cluster. Same account that you log into the VM with. 
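Before attaching the debugger, it can help to confirm that the frontend port from the NAT pool is reachable from your workstation. The following is a minimal sketch, not part of the original guide: `can_connect` is a hypothetical helper name, and the host you pass in is whatever frontend IP you read from the Load Balancer in the portal. It only checks that a TCP connection can be opened; it does not verify that msvsmon.exe is the process listening.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("<frontend-ip>", 40026)` returning `False` suggests checking the NAT pool, any NSG rules, and whether the Remote Debugger is running on the node before troubleshooting Visual Studio itself.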
25 | 26 | NOTE: MSVSMon.exe can be started with /noauth and can be found at "C:\Program Files\Microsoft Visual Studio 17.0\Common7\IDE\Remote Debugger\x64". If doing that, choose "Remote (Windows - No Authentication)" as the Connection Type in Visual Studio. 27 | -------------------------------------------------------------------------------- /Cluster/Changing DataPath.md: -------------------------------------------------------------------------------- 1 | # Changing DataPath 2 | 3 | Customers sometimes change the DataPath for their cluster to avoid using the temporary D: drive, which is wiped if the VM or VMSS is deallocated. Deallocation can happen if the subscription spending limits are reached or if they are trying to save on resource expenses by deallocating on the weekends or between QA milestones. 4 | 5 | This article from Matt Schneider discusses this and why it can cause performance issues. 6 | 7 | You can change Setup/FabricDataRoot to move the Service Fabric local installation and all of the local application working directories, and/or TransactionalReplicator/SharedLogPath to move the reliable collections shared log. 8 | 9 | ## **Some things to consider** 10 | 11 | Service Fabric Services (and Service Fabric itself) are built to work on local disks and generally should not be hosted on XStore backed disks (premium or not): 12 | 13 | - Reliable Collections are definitely built to operate against local drives. There's no internal testing that I'm aware of that runs them in this configuration. 14 | 15 | - Waste of I/O: Assuming LRS replicates changes 3 times and you set TargetReplicaSetSize to 3, this configuration will generate 9 copies of the state. Do you need 9 copies of your state? 16 | 17 | - Impact on Latency and Performance: What should be a local disk IO will turn into network + disk IO, which has a chance to hurt your performance. 18 | 19 | - Impact on Availability: At a minimum you're adding another dependency, which usually reduces overall availability. 
If storage ever has an issue you're now more coupled to that other service. Today you're pretty coupled already since the VMSS drives are backed by blobs, so VM provisioning would fail, but that's different than the read/write/activation path for your services. 20 | 21 | ## **References** 22 | 23 | https://stackoverflow.com/questions/42379769/utilize-managed-disk-for-service-fabric-temporary-storage/42520824#42520824 24 | -------------------------------------------------------------------------------- /Cluster/Cluster Not Reachable UpgradeServiceNotreachable.md: -------------------------------------------------------------------------------- 1 | # Cluster Not Reachable / UpgradeServiceNotreachable 2 | 3 | ## Symptoms 4 | 5 | - Cluster State \'UpgradeServiceNotReachable\' in Azure Portal 6 | - Application/Node details are not displayed in Azure Portal 7 | - Unable to connect to the cluster through SFX/PowerShell 8 | - Service Fabric Explorer (SFX) warnings on WrpStreamChannel 9 | 10 | ## Possible Causes 11 | 12 | - Cluster is down due to seed node Quorum Loss / ring collapsed - too many seed nodes failed or were brought down at the same time 13 | - fabric:/System/UpgradeService (Upgrade service) is down 14 | - fabric:/System/UpgradeService is unable to reach regional Service Fabric Resource Provider (SFRP) 15 | - TLS 1.0/1.2 was disabled 16 | - Expired Certificate 17 | 18 | ## Mitigation 19 | 20 | ### Cluster down / ring collapse 21 | 22 | Sometimes the cluster is not recoverable and the worst case is a full rebuild of the cluster. 
23 | 24 | #### VMs in VMSS associated with SF nodes deallocated 25 | 26 | - If the VMs are deallocated / stopped, start the VMs 27 | - See [Issues caused by Deallocating a VMSS](./Issues%20caused%20by%20Deallocating%20a%20VMSS.md) 28 | - See [Common issues caused by AutoScale](./Common%20issues%20customers%20experience%20when%20using%20Auto-scale%20with%20Service%20Fabric%20clusters.md) 29 | 30 | #### VMs in VMSS associated with SF nodes are in failed state, check status of extensions on these VMs 31 | 32 | - Validate that the configuration of failed extensions is correct 33 | - If the Service Fabric extension is in error, note down the error message, if any 34 | - See [Collecting logs for failed VMs](./Collecting%20logs%20for%20failed%20VMs.md) for obtaining more details about the failure and mitigation 35 | 36 | #### VMs healthy 37 | 38 | - Service Fabric may be failing to start up on the VM due to various issues 39 | - RDP to one or more VMs 40 | - Identify the list of running processes that match Fabric*.exe (in Task Manager) 41 | - Are any of them restarting (changing PID)? 42 | - Fabric.exe running and not restarting may indicate the VM is fine. Collect the rest of the information listed below from the VM and repeat the process for other VMs. 43 | - If Fabric.exe is not running or is restarting often, collect the rest of the information listed below. It is not necessary to gather information from the rest of the VMs at this point. 44 | - Check for errors in Event Viewer under the following 45 | - Applications and Services Logs \ Microsoft Service Fabric 46 | - Windows Logs \ System 47 | - Search for Service Fabric Node Bootstrap 48 | - Copy the latest three trace files matching each prefix from the VM 49 | - Normally under D:\SvcFab\Log\Traces 50 | - Open a support ticket 51 | - Include the information collected above, including any warnings / errors in Event Viewer 52 | - Indicate that the trace files from the VM(s) are available. 
A support engineer will reach out to get these uploaded 53 | 54 | ### Cluster is healthy but fabric:/System/UpgradeService (Upgrade service) is down 55 | 56 | - May be a transient error, wait for a while to see if the issue self-corrects 57 | - A cluster config upgrade may be stuck while copying the cluster package. This can block the Upgrade Service from connecting to SFRP for the region 58 | - Restart the primary replica of fabric:/System/UpgradeService 59 | - Restart the VM hosting the primary replica for the service 60 | 61 | ### fabric:/System/UpgradeService is unable to reach SFRP 62 | 63 | - Investigate why Stream Channel is broken, see [FUS Stream Architecture](./FUS%20Stream%20Architecture.md) 64 | - NSG may be preventing the connection, see [Check for a Network Security Group](../Security/NSG%20configuration%20for%20Service%20Fabric%20clusters%20Applied%20at%20VNET%20level.md) 65 | 66 | ### TLS disabled 67 | 68 | - See [TLS Configuration](../Security/TLS%20Configuration.md) 69 | -------------------------------------------------------------------------------- /Cluster/Collecting logs for failed VMs.md: -------------------------------------------------------------------------------- 1 | # Collecting logs for failed VMs 2 | 3 | ## Symptoms 4 | 5 | VMs in VMSS are in Failed(Running) state in Azure Portal and are preventing Service Fabric nodes from becoming healthy. In this state the VMs themselves are running but there is an error in one of the VMSS extensions. VMs will be accessible through RDP. 6 | 7 | ## Cause 8 | 9 | There can be a number of reasons for a VMSS extension to fail 10 | 11 | - Incorrect configuration 12 | - A resource referenced by the VMSS extension is not available 13 | - Timeouts or transient issues causing the extensions to fail 14 | 15 | ## Gathering more information 16 | 17 | 1. RDP to one or more of the VMs in Failed state 18 | 2. 
Open a PowerShell prompt and then run the following commands 19 | 20 | ```powershell 21 | md C:\guestlogs 22 | cd C:\guestlogs 23 | C:\WindowsAzure\GuestAgent_VERSION\CollectGuestLogs.exe # VERSION varies per VM; use the folder with the highest version value 24 | ``` 25 | 26 | 3. This will create a compressed file with various logs 27 | 4. Copy the file out of the VM 28 | 29 | ## Mitigation 30 | 31 | ### Config issues 32 | 33 | - Check failures in the logs to eliminate any config issues 34 | - For example, a reference to a resource could be wrong or no longer valid 35 | - Fix the config issue in the ARM template and redeploy it 36 | 37 | ### Other issues 38 | 39 | - Open Support Incident through the Azure Portal 40 | - Include the logs collected from the VM(s) or the error message found in the logs 41 | -------------------------------------------------------------------------------- /Cluster/FUS Stream Architecture.md: -------------------------------------------------------------------------------- 1 | # FUS Stream Architecture - (fabric:/system/UpgradeService) 2 | 3 | ``` 4 | |------------- Service Fabric Cluster ------------| 5 | | | 6 | | | 7 | [portal] <------ > [ SFRP ] <======= STREAM ====== [NSG] ===== [FabricUS.exe] ----> [Gateway] ---> [FM/CM/HM] | 8 | | | 9 | | | 10 | |-------------------------------------------------| 11 | ``` 12 | 13 | ## **Notes** 14 | - The stream channel is always an outbound connection on port 443 15 | - The NSG (if present) should allow outbound traffic to the SFRP IP address, which can be determined by calling nslookup on the BaseUrl for UpgradeService, listed in the Cluster Manifest. 16 | 17 | ```xml 18 | 
19 | ==> 13.91.252.58 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 |
29 | ``` 30 | - If the connection is blocked by an NSG, the evidence usually appears in the Cluster Traces 31 | - FabricUS.exe communication with SFRP will be in the SFRP Traces 32 | 33 | - For full details on NSG configuration for Service Fabric clusters, see [Check for a Network Security Group](../Security/NSG%20configuration%20for%20Service%20Fabric%20clusters%20Applied%20at%20VNET%20level.md) 34 | 35 | -------------------------------------------------------------------------------- /Cluster/Get Cluster Upgrade history.md: -------------------------------------------------------------------------------- 1 | # Get Cluster Upgrade history 2 | 3 | ## **Steps** 4 | 5 | 1. RDP to one node in your primary node type. 6 | 7 | 2. Change to the directory, D:\\SvcFab\\**node_name**\\Fabric 8 | 9 | ```command 10 | cd D:\SvcFab\_sys_0\Fabric 11 | ``` 12 | 3. You will see the full upgrade history: 13 | 14 | ![Folder contents for D:\\SvcFab\\_sys_0\\Fabric](../media/upgradehistory001.jpg) 15 | 16 | 17 | 4. Go through each version and timestamp folder one by one 18 | 19 | - Note file/folder timestamps 20 | - Open the Settings.xml 21 | - NodeVersion parameter contains the SF Runtime version information 22 | 23 | ![contents of settings.xml](../media/upgradehistory002.jpg) 24 | 25 | 26 | ## Get Cluster Upgrade history and other events using the EventStore 27 | 28 | - See example in [How to Query EventStore from PowerShell](./How%20to%20Query%20EventStore%20from%20PowerShell.md) -------------------------------------------------------------------------------- /Cluster/How To Turn Off Resource Orchestrator Service.md: -------------------------------------------------------------------------------- 1 | ## **Experimental Feature - How To Turn Off Resource Orchestrator Service** 2 | This TSG only applies to Service Fabric Release 9.0. 3 | 4 | The Resource Orchestrator Service is an experimental feature that is currently in development. 
It is currently not meant for production use and is by default set to off. However, if your cluster finds itself in a situation where the Resource Orchestrator Service is turned on, this document describes how to turn it off. 5 | 6 | ## **Verify Resource Orchestrator Service Has Been Turned On** 7 | You can verify that the Resource Orchestrator Service is turned on in your cluster by looking at the health events in Service Fabric Explorer. There should be a health event that indicates the experimental feature "Resource Orchestrator Service" has been turned on. It should look similar to the image below. 8 | 9 | ![ROSExperimentalFeature.png](../media/ROSExperimentalFeature.png) 10 | 11 | ## **Turn Off Resource Orchestrator Service** 12 | To turn off the Resource Orchestrator Service, the EnableResourceOrchestrator configuration in the FailoverManager section of the cluster's configuration needs to be set to False. 13 | 14 | | **Parameter** | **Allowed Values** | **Upgrade Policy** | **Guidance or Short Description** | 15 | | --- | --- | --- | --- | 16 | |EnableResourceOrchestrator|Bool, default is FALSE |Static|Flag that controls if the Resource Orchestrator Service is enabled. | 17 | 18 | For more information on how to modify cluster configuration settings for your cluster, see this page: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-fabric-settings 19 | 20 | **Note** - the EnableResourceOrchestrator configuration has a static upgrade policy, which means the nodes will need to be restarted to pick up the change. 
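As an illustration of the setting described above, the sketch below edits an in-memory copy of a cluster's `fabricSettings` list so that `EnableResourceOrchestrator` is `"false"` in the `FailoverManager` section. It assumes the cluster is deployed from an ARM template where `fabricSettings` is a list of `{"name", "parameters": [{"name", "value"}]}` entries with string values; `set_fabric_setting` is a hypothetical helper, not part of any Service Fabric tooling, and redeploying the edited template is still up to you.

```python
import json


def set_fabric_setting(fabric_settings, section, parameter, value):
    """Set section/parameter to value in an ARM-style fabricSettings list (modified in place)."""
    for entry in fabric_settings:
        if entry.get("name") == section:
            break
    else:
        # Section not present yet: create it.
        entry = {"name": section, "parameters": []}
        fabric_settings.append(entry)

    params = entry.setdefault("parameters", [])
    for param in params:
        if param.get("name") == parameter:
            param["value"] = value
            return fabric_settings

    # Parameter not present yet: add it.
    params.append({"name": parameter, "value": value})
    return fabric_settings


# Example input: a cluster where the experimental feature was switched on.
settings = [
    {
        "name": "FailoverManager",
        "parameters": [{"name": "EnableResourceOrchestrator", "value": "true"}],
    }
]
set_fabric_setting(settings, "FailoverManager", "EnableResourceOrchestrator", "false")
print(json.dumps(settings, indent=2))
```

The helper leaves other sections and parameters untouched, so it can be run against the full `fabricSettings` array exported from an existing template.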
21 | -------------------------------------------------------------------------------- /Cluster/How to Query EventStore from PowerShell.md: -------------------------------------------------------------------------------- 1 | # How to Query Eventstore from PowerShell 2 | 3 | MSDN Reference: https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-diagnostics-eventstore-query 4 | 5 | ## Example: get Cluster Events including Fabric Upgrade history using the EventStore rest endpoint 6 | 7 | - If your cluster is using a CA signed certificate you can simply make the Rest call 8 | 9 | ```PowerShell 10 | Invoke-RestMethod -Uri 'https://mycluster.westus.cloudapp.azure.com:19080/EventsStore/Cluster/Events?api-version=6.2-preview&StartTimeUtc=2018-08-01T00:00:00Z&EndTimeUtc=2018-08-14T18:00:00Z' -CertificateThumbprint '677244db4c0add5770904a4269c81a0269aba2f5' -Method Get 11 | ``` 12 | - If you are using a self-signed certificate for your cluster cert, you will need to disable the certificate validation check. **Security Note:** this will disable certificate validation for the entire PowerShell session. 
13 | 14 | ```PowerShell 15 | $source = @" 16 | using System.Net; 17 | using System.Net.Security; 18 | using System.Security.Cryptography.X509Certificates; 19 | 20 | public class SSLValidator 21 | { 22 | public SSLValidator() {} 23 | private bool OnValidateCertificate(object sender, X509Certificate certificate, X509Chain chain, 24 | SslPolicyErrors sslPolicyErrors) 25 | { 26 | return true; 27 | } 28 | public void OverrideValidation() 29 | { 30 | ServicePointManager.ServerCertificateValidationCallback = 31 | OnValidateCertificate; 32 | ServicePointManager.Expect100Continue = true; 33 | } 34 | } 35 | "@ 36 | 37 | Add-Type -TypeDefinition $source 38 | 39 | $validation = new-object SSLValidator 40 | $validation.OverrideValidation() 41 | 42 | Invoke-RestMethod -Uri 'https://mycluster.westus.cloudapp.azure.com:19080/EventsStore/Cluster/Events?api-version=6.2-preview&StartTimeUtc=2018-08-01T00:00:00Z&EndTimeUtc=2018-08-14T18:00:00Z' -CertificateThumbprint '677244db4c0add5770904a4269c81a0269aba2f5' -Method Get 43 | ``` 44 | -------------------------------------------------------------------------------- /Cluster/Incoming TCP Traffic Troubleshooting.md: -------------------------------------------------------------------------------- 1 | # Incoming TCP Traffic Troubleshooting 2 | 3 | 1. Verify the port mapping from the front end load balancer VIP to the back end VM DIP. This will help determine which port the client is accessing externally, and which port that maps to on the VM itself. 
4 | 5 | - Verify the configuration of the Load Balancer Rules 6 | ```PowerShell 7 | $slb = Get-AzureRmLoadBalancer -Name "MyLoadBalancer" -ResourceGroupName "MyResourceGroup" 8 | Get-AzureRmLoadBalancerRuleConfig -LoadBalancer $slb 9 | ``` 10 | 11 | - Validate that the LB Health Probe is configured correctly for your [Application Ports](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-upgrade#application-ports) 12 | ```PowerShell 13 | $slb = Get-AzureRmLoadBalancer -Name "MyLoadBalancer" -ResourceGroupName "MyResourceGroup" 14 | Get-AzureRmLoadBalancerProbeConfig -LoadBalancer $slb 15 | ``` 16 | 17 | 2. Verify which ports Service Fabric applications are configured to listen on. Check the application's service manifest to see which port is configured. It is also possible for a service to be configured for dynamic port binding, in which case the port number is not assigned in the service manifest file, but will be assigned at runtime from the Application Port range which is defined in the Cluster Manifest. 18 | 19 | 3. [Check for a Network Security Group](../Security/NSG%20configuration%20for%20Service%20Fabric%20clusters%20Applied%20at%20VNET%20level.md) which might be blocking external traffic. 20 | 21 | 4. RDP to the VM to determine which EXE is listening on the internal port. 22 | 23 | - [Determine Processes Listening on Port](../Cluster/Determine%20Process%20Listening%20on%20Port.md) 24 | 25 | 5. RDP to the VM and try to connect locally 26 | 27 | - e.g. http://localhost:8892/api/values 28 | 29 | 6. At this point you know how traffic should flow from the client to the server process and you can do basic network troubleshooting (e.g. netmon, attach debugger, etc) 30 | 31 | 7. For a service where the external port is load balanced to all of the backend VMs it is often not possible to determine which VM the client will be connecting to. 
In this scenario you have a few options to troubleshoot: 32 | 33 | - If long running TCP connections are used, wait for the client to connect and then try to determine which VM the client is connected to (netmon on the Azure VMs). This can be problematic if there are a large number of VMs. 34 | 35 | - Try to run the client from one of the VMs and connect using the DIP. 36 | 37 | - Add a new external port to the load balancer, mapped to only a single backend VM, and then modify the client to connect to that specific port. 38 | 39 | -------------------------------------------------------------------------------- /Cluster/Issues caused by Deallocating a VMSS.md: -------------------------------------------------------------------------------- 1 | # Issues caused by Deallocating a VMSS 2 | 3 | ## **Scenario** 4 | To save cost, some customers want to put their Service Fabric cluster to sleep when it is not in use and start it only when they need it. They assume they can deallocate the scale set to achieve this. However, once the scale set is deallocated, restarting it does not always work, and the deployed services often fail and need to be redeployed. 5 | 6 | ## **Recommendation** 7 | We do not recommend deallocating a VMSS for Service Fabric clusters. This is essentially the same as scaling to 0 nodes in the Primary node type and will cause cluster instability or data loss. 8 | 9 | See https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-scale-up-down 10 | - Scaling down the primary node type to less than the minimum number makes the cluster unstable or brings it down. This could result in data loss for your applications and for the system services. 11 | - The Service Fabric system services run in the Primary node type in your cluster. You should therefore never shut down that node type or scale its number of instances below what the reliability tier warrants. 
Refer to the details on [reliability tiers](https://learn.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-scale-in-out) here. 12 | 13 | Two issues with Stop (Deallocate): 14 | - When you resize or Stop (Deallocate) a virtual machine, this action destroys the contents of the D: (temporary disk) and may even trigger placement of the virtual machine to a new hypervisor. A planned or unplanned maintenance event may also trigger this placement. This can cause data loss on any Stateful services running on the nodes, including System services. 15 | 16 | - On Start (Reallocate), the VMSS can come up with a new IP address, in which case the Service Fabric Resource Provider will no longer recognize the node(s) and the cluster will be down. 17 | 18 | ## **Other FAQ** 19 | Q. Will upgrading the durability of the cluster make Deallocation safer? (https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-capacity#the-durability-characteristics-of-the-cluster) 20 | 21 | A. **No**. Regardless of the durability level, a Deallocation operation on the VM Scale Set will destroy the cluster. 22 | -------------------------------------------------------------------------------- /Cluster/Phantom nodes.md: -------------------------------------------------------------------------------- 1 | # Phantom nodes show in Service Fabric Explorer 2 | 3 | ## **Symptoms** 4 | 5 | You may notice down nodes with a name similar to "nodeid:". See the screenshot below for more details: 6 | 7 | ![phantomNode001.jpg](../media/phantomNode001.jpg) 8 | 9 | ## **Cause** 10 | 11 | This was a known issue with the Federation layer which intermittently added back the previously removed nodes. 
This was fixed in 6.2 12 | 13 | ## **Mitigation** 14 | 15 | ```PowerShell 16 | Remove-ServiceFabricNodeState -NodeName nodeid:108ae715efc61b45..26e -Force 17 | ``` 18 | -------------------------------------------------------------------------------- /Cluster/README.md: -------------------------------------------------------------------------------- 1 | This contains the Cluster related TSG's surfaced in the Azure Portal during support ticket creation: 2 | 3 | ## **Category** 4 | 5 | ### **Architecture** 6 | * [FabricUpgradeService Stream Architecture](./FUS%20Stream%20Architecture.md) 7 | * [Service Fabric process list with Security Context info](./SF%20process%20list%20with%20Security%20Context%20info.md) 8 | 9 | ### **Configuration** 10 | * [Changing the default DataPath](./Changing%20DataPath.md) 11 | * [Cluster Not Reachable / UpgradeServiceNotreachable](./Cluster%20Not%20Reachable%20%20UpgradeServiceNotreachable.md) 12 | * [Determine Process Listening on Port](./Determine%20Process%20Listening%20on%20Port.md) 13 | * [How to fix missing seed nodes from a cluster using the Automated script FixMissingSeednode.ps1](./How%20to%20fix%20missing%20seednodes%20with%20Automated%20script.md) 14 | * [How to Fix one missing seed node: Manual Steps](./How%20to%20Fix%20one%20missing%20seed%20node.md) 15 | * [How to Fix two missing seed node: Manual Steps](./How%20to%20Fix%20two%20missing%20seed%20node.md) 16 | 17 | ### **How to** 18 | * [Connecting to secure clusters with PowerShell](./Connecting%20to%20secure%20clusters%20with%20PowerShell.md) 19 | * [Get Cluster Upgrade history](./Get%20Cluster%20Upgrade%20history.md) 20 | * [How to Query Eventstore from PowerShell](./How%20to%20Query%20EventStore%20from%20PowerShell.md) 21 | * [Incoming TCP Traffic Troubleshooting](./Incoming%20TCP%20Traffic%20Troubleshooting.md) 22 | * [Determine Process Listening on Port](./Determine%20Process%20Listening%20on%20Port.md) 23 | * [How to troubleshoot failed Fabric 
upgrades](./Troubleshooting%20failed%20Fabric%20Upgrade.md) 24 | 25 | ### **Scaling** 26 | * [Common issues customers experience when using Auto-scale with Service Fabric clusters](./Common%20issues%20customers%20experience%20when%20using%20Auto-scale%20with%20Service%20Fabric%20clusters.md) 27 | 28 | ### **Deployments** 29 | * [Why do cluster upgrades take so long](./Why%20do%20cluster%20upgrades%20take%20so%20long.md) 30 | 31 | ### **Nodes** 32 | * [How to Fix one missing seed node](./How%20to%20Fix%20one%20missing%20seed%20node.md) 33 | * [Issues caused by Deallocating a VMSS](./Issues%20caused%20by%20Deallocating%20a%20VMSS.md) 34 | * [Phantom Nodes show in Service Fabric Explorer](./Phantom%20nodes.md) 35 | * [SF Node Open Life Cycle](./SF%20Node%20Open%20Life%20Cycle.md) 36 | * [Why did my node Reboot?](./Why%20did%20my%20node%20Reboot.md) 37 | * [dataDisk (D:\) out of disk space](./Out%20of%20Diskspace.md) 38 | -------------------------------------------------------------------------------- /Cluster/SF Node Open Life Cycle.md: -------------------------------------------------------------------------------- 1 | # SF Node Open Life Cycle 2 | 3 | ```code 4 | [ServiceFabricBootStrapper] 5 | --> [ FabricInstaller ] 6 | --> FabricHost 7 | --> FabricSetup->FabricDeployer 8 | --> Fabric 9 | ... 
10 | ``` 11 | 12 | See more about [SF process list with Security Context info](./SF%20process%20list%20with%20Security%20Context%20info.md) -------------------------------------------------------------------------------- /Cluster/Service Fabric Red Hat File Locations.md: -------------------------------------------------------------------------------- 1 | # Service Fabric Red Hat File Locations (Preview) 2 | 3 | [Service Fabric Core File Locations](#Service-Fabric-Core-File-Locations) 4 | [Service Fabric Event Logs](#Service-Fabric-Event-Logs) 5 | [Guest Agent Logs](#Guest-Agent-Logs) 6 | [Service Fabric Extension Plugin](#Service-Fabric-Extension-Plugin) 7 | [Docker Daemon Logs](#Docker-Daemon-Logs) 8 | 9 | **The tables below list file locations for Service Fabric on Red Hat which is currently in Preview.** 10 | 11 | ## Service Fabric Core File Locations 12 | 13 | File Path | Content 14 | ----------|---------- 15 | /opt/microsoft/servicefabric/bin/Fabric/Fabric.Code/Fabric | default service fabric core code installation path 16 | /mnt/resource/sfroot | default application code, data, and log location 17 | /mnt/resource/sfroot/_App | default application code and data location 18 | /mnt/resource/sfroot/log | default service fabric diagnostic log path 19 | /mnt/resource/sfroot/log/CrashDumps | service fabric (fabric*.exe) crash dump location 20 | /mnt/resource/sfroot/log/Traces | service fabric diagnostic trace temporary storage 21 | /mnt/resource/sfroot/<_node_name\_#> | service fabric node configuration data path 22 | 23 | ## Service Fabric Event Logs 24 | 25 | File Path | Content 26 | ----------|---------- 27 | var/log/messages | linux system event log 28 | var/log/sfnode/sfnodelog | service fabric application event log 29 | 30 | ## Guest Agent Logs 31 | 32 | File Path | Content 33 | ----------|---------- 34 | /var/log/waagent.log | azure agent log 35 | 36 | ## Service Fabric Extension Plugin 37 | 38 | File Path | Content 39 | ----------|---------- 40 | 
var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.# | service fabric extension download, configuration, and status 41 | /var/log/azure/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode TempClusterManifest.xml | service fabric cluster configuration 42 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/config/0.settings | service fabric extension configuration 43 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/config/0.status | service fabric extension installation status 44 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/heartbeat.log | service fabric node status 45 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/ServiceFabricLinuxExtension_install.log | service fabric extension installation log 46 | 47 | ## Docker Daemon Logs 48 | 49 | File Path | Content 50 | ----------|---------- 51 | /var/lib/docker/containers | docker daemon logs -------------------------------------------------------------------------------- /Cluster/Service Fabric Ubuntu File Locations.md: -------------------------------------------------------------------------------- 1 | # Service Fabric Ubuntu File Locations 2 | 3 | [Service Fabric Core File Locations](#Service-Fabric-Core-File-Locations) 4 | [Service Fabric Event Logs](#Service-Fabric-Event-Logs) 5 | [Guest Agent Logs](#Guest-Agent-Logs) 6 | [Service Fabric Extension Plugin](#Service-Fabric-Extension-Plugin) 7 | [Docker Daemon Logs](#Docker-Daemon-Logs) 8 | 9 | **The tables below list file locations for Service Fabric on Ubuntu.** 10 | 11 | ## Service Fabric Core File Locations 12 | 13 | File Path | Content 14 | ----------|---------- 15 | /opt/microsoft/servicefabric/bin/Fabric/Fabric.Code/Fabric | default service fabric core code installation path 16 | /mnt/sfroot | default application code, data, and log location 17 | /mnt/sfroot/_App | default application code and data location 18 | 
/mnt/sfroot/log | default service fabric diagnostic log path 19 | /mnt/sfroot/log/CrashDumps | service fabric (fabric*.exe) crash dump location 20 | /mnt/sfroot/log/Traces | service fabric diagnostic trace temporary storage 21 | /mnt/sfroot/<_node_name\_#> | service fabric node configuration data path 22 | 23 | ## Service Fabric Event Logs 24 | 25 | File Path | Content 26 | ----------|---------- 27 | var/log/syslog | linux system event log 28 | var/log/sfnode/sfnodelog | service fabric application event log 29 | 30 | ## Guest Agent Logs 31 | 32 | File Path | Content 33 | ----------|---------- 34 | /var/log/waagent.log | azure agent log 35 | 36 | ## Service Fabric Extension Plugin 37 | 38 | File Path | Content 39 | ----------|---------- 40 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.# | service fabric extension download, configuration, and status 41 | /var/log/azure/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode TempClusterManifest.xml | service fabric cluster configuration 42 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/config/0.settings | service fabric extension configuration 43 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/config/0.status | service fabric extension installation status 44 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/heartbeat.log | service fabric node status 45 | var/lib/waagent/Microsoft.Azure.ServiceFabric.ServiceFabricLinuxNode-#.#.#.#/ServiceFabricLinuxExtension_install.log | service fabric extension installation log 46 | 47 | ## Docker Daemon Logs 48 | 49 | File Path | Content 50 | ----------|---------- 51 | /var/lib/docker/containers | docker daemon logs -------------------------------------------------------------------------------- /Cluster/Upgrade to Service Fabric 7.1 fails with certificate configuration errors.md: -------------------------------------------------------------------------------- 1 | 
# Upgrade to Service Fabric 7.1 fails with certificate configuration errors 2 | 3 | Upgrade to Service Fabric 7.1 fails and rolls back with application health errors trying to configure application certificates. 4 | 5 | ## Symptom 6 | 7 | Certificate configuration error is highlighted in a sample output below. The cause of the failure can be found in SFX under the details tab for the cluster. 8 | 9 | ### UNHEALTHY EVALUATIONS (UPGRADE) 10 | 11 | Kind | Health State | Description 12 | -----|--------------|------------ 13 | Applications | Error | 1% (1/100) applications are unhealthy. The evaluation tolerates 0% unhealthy applications. 14 |  Application | Error | Application 'fabric:/Application' is in Error. | 15 | |   DeployedApplications | Error | 100% (1/1) deployed applications are unhealthy. The evaluation tolerates 0% unhealthy deployed applications. 16 |    DeployedApplication | Error | Deployed application on node 'NodeName' is in Error. 17 |     **Event** | **Error** | **'System.Hosting' reported Error for property 'Activation:1.0'. There was an error during activation.Failed to configure certificate permissions. Error E_FAIL.** 18 |     DeployedServicePackages | Error | 100% (1/1) deployed service packages are unhealthy. 19 |      DeployedServicePackage | Error | Service package for manifest 'ServicePkg' and service package activation ID '' is in Error. 20 |       Event | Error | 'System.Hosting' reported Error for property 'ServiceTypeRegistration:ServiceType'. The ServiceType was disabled on the node. 21 | 22 | ## Cause 23 | 24 | Permissions for all certificates specified for endpoints in the Application manifest are configured to give services access to these certificates. If service fabric does not find a certificate on the node, the configuration will fail resulting in activation failures. 25 | 26 | We are investigating why activation succeeded in versions prior to 7.1, when the certificate was missing from the node. 
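
To confirm this cause on an affected node, one option is to check whether each certificate referenced by the application manifest is actually present. The sketch below assumes the certificates are referenced by thumbprint and are expected in the `LocalMachine\My` store; both are assumptions to verify against your manifest, and the thumbprint shown is a placeholder.

```PowerShell
# Thumbprints referenced by the application manifest (placeholder value).
$thumbprints = @('0000000000000000000000000000000000000000')

foreach ($thumbprint in $thumbprints) {
    # Assumption: certificates are expected in the LocalMachine\My store.
    $cert = Get-ChildItem -Path Cert:\LocalMachine\My |
        Where-Object { $_.Thumbprint -eq $thumbprint }

    if ($cert) { Write-Output "Found   $thumbprint" }
    else       { Write-Output "Missing $thumbprint" }
}
```

Any thumbprint reported as missing is a candidate for the activation failure described above.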
27 | 28 | ## Mitigation 29 | 30 | One of the following mitigations can be applied: 31 | 32 | 1. Install all certificates referenced in Application manifests on the nodes where the Application package can be deployed. 33 | 2. Provision any certificates required on the nodes. Modify the Application manifest to remove references to certificates not required by application services and redeploy the application package. 34 | 35 | After applying one of the above mitigations, retry the upgrade to 7.1. 36 | -------------------------------------------------------------------------------- /Cluster/Why did my node Reboot.md: -------------------------------------------------------------------------------- 1 | # Why did my node Reboot? 2 | 3 | ## **Check Eventlogs** 4 | Event Viewer -> Windows Logs -> System -> select "Filter Current Logs" -> and include just 6013,1074 5 | 6 | ![NodeReboot001.jpg](../media/NodeReboot001.jpg) 7 | 8 | You should see events similar to this if Windows Update (WU) was the reason the node was restarted: 9 | 10 | ```EventInfo 11 | Information 7/15/2016 11:00:00 AM User32 1074 None 12 | Reason Code: 0x80020002 13 | The process C:\Windows\system32\wuauclt.exe (FRONTEND01) has initiated the restart of computer FRONTEND01 on behalf of user FRONTEND01\adminuser for the following reason: Operating System: Recovery (Planned) 14 | Shutdown Type: restart 15 | 16 | Comment: 17 | Information 7/15/2016 7:00:01 AM EventLog 6013 None 18 | The system uptime is 305578 seconds. 19 | ``` 20 | 21 | ## Other events you can filter on to find additional info indicating reboot events 22 | - [Event ID 6005](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=6005&EvtSrc=EventLog&LCID=1033): "The event log service was started." You will see many of these at a system startup. 
23 | - [Event ID 6006](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=6006&EvtSrc=EventLog&LCID=1033): "The event log service was stopped." You will see many of these at a system shutdown. 24 | - [Event ID 6008](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.0&EvtID=6008&EvtSrc=User32&LCID=1033/): \"The previous system shutdown was unexpected.\" Records that the system started after it was not shut down properly. 25 | - [Event ID 6009](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=6009&EvtSrc=EventLog&LCID=1033): Indicates the Windows product name, version, build number, service pack number, and operating system type detected at time. 26 | - Event ID 6013: Displays the uptime of the computer. There is no TechNet page for this id. 27 | - [Event ID 1074](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=1074&EvtSrc=User32&LCID=1033): \"The process X has initiated the restart / shutdown of computer on behalf of user Y for the following reason: Z.\" Indicates that an application or a user initiated a restart or shutdown. 28 | - [Event ID 1076](http://www.microsoft.com/technet/support/ee/transform.aspx?ProdName=Windows%20Operating%20System&ProdVer=5.2&EvtID=1076&EvtSrc=USER32&LCID=1033): \"The reason supplied by user X for the last unexpected shutdown of this computer is: Y.\" Records when the first user with shutdown privileges logs on to the computer after an unexpected restart or shutdown and supplies a reason for the occurrence. 
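
The event IDs listed above can also be pulled from the System log with PowerShell instead of the Event Viewer filter. This is a minimal sketch using the built-in `Get-WinEvent` cmdlet; the selection of IDs and the limit of 50 events are illustrative choices, not values from this guide.

```PowerShell
# Query the System log for the restart-related event IDs discussed above.
# -MaxEvents 50 is an arbitrary cap for readability; adjust as needed.
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 1074, 6005, 6006, 6008, 6013 } -MaxEvents 50 |
    Sort-Object TimeCreated |
    Format-Table TimeCreated, Id, ProviderName, Message -Wrap
```

Run it in an elevated session on the node and compare the timestamps with the reported reboot time.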
29 | 30 | ## **Additional tip** 31 | - On Windows Server 2012 R2, check c:\windows\windowsupdate.log to match up the time with the above event log entries 32 | - On Windows Server 2016, run the PowerShell cmdlet Get-WindowsUpdateLog 33 | - This cmdlet consolidates the logs into C:\Users\username\Desktop\WindowsUpdate.log 34 | -------------------------------------------------------------------------------- /Cluster/Why do cluster upgrades take so long.md: -------------------------------------------------------------------------------- 1 | ## Why do cluster upgrades take so long 2 | 3 | Cluster upgrade performance questions come up somewhat frequently. The following explains in some detail how the Service Fabric cluster upgrade process works and what effect it has on the end-to-end performance. 4 | 5 | 6 | The default settings can be found in the Advanced Upgrade settings for the cluster. Azure Portal -> Resource group -> Service Fabric Cluster -> Fabric Upgrades (check Advanced Settings) 7 | 8 | 9 | - Service Fabric handles cluster-wide settings changes such as Security changes, Placement Settings, custom fabric settings, etc. as a cluster upgrade. As such, it triggers a two-phase full UD (Upgrade Domain) walk to apply these changes to the cluster one upgrade domain at a time. After the changes are applied, it waits for a period of time based on the configured health and stability settings to ensure the change does not destabilize your cluster. 10 | 11 | 12 | - By default the Health Check policy is assigned the settings below, which translate to roughly 26 minutes per UD. Each UD is walked in two phases, so the calculation is ((wait + stable + 3 min runtime upgrade) * 2) = total time per UD. If any errors are detected, it waits 45 minutes (the default health_check_retry_timeout) before retrying. 
You can change the health check wait and stable durations in Advanced Upgrade settings to make the operations complete faster, but it’s a tradeoff between speed and safety, as SF monitors the upgrade and attempts to roll back if the changes cause any errors. These defaults are pretty conservative, but we do not recommend lowering these durations below 30 seconds. If you have many stateful services deployed, keep in mind that partitions fail over from the node being updated to another running node, and it can take some time for reconfiguration to complete and for the replicas to be back up and running. Changing the duration to a very short timespan could therefore allow deployment of a breaking change that would not manifest any errors until the upgrade has already advanced to the next UD. 13 | 14 | | Setting | Description | 15 | |---|---| 16 | | health_check_retry_timeout | The length of time between attempts to perform health checks if the application or cluster is not healthy. Default value: "PT0H0M0S". | 17 | | health_check_wait_duration_in_seconds | The length of time to wait after completing an upgrade domain before starting the health check process. Default value: "PT0H0M0S". | 18 | | health_check_stable_duration_in_seconds | The length of time that the application or cluster must remain healthy. Default value: "PT0H0M0S". 
| 19 | 20 | 21 | ## More info here: 22 | 23 | [https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-upgrade#fabric-upgrade-settings---health-polices](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-upgrade#fabric-upgrade-settings---health-polices "Fabric Upgrade Settings") 24 | 25 | [https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-v62-model-clusterconfigurationupgradedescription](https://docs.microsoft.com/en-us/rest/api/servicefabric/sfclient-v62-model-clusterconfigurationupgradedescription "ClusterConfigurationUpgradeDescription") 26 | 27 | -------------------------------------------------------------------------------- /Cluster/Why is my cluster reporting an unresponsive neighbor.md: -------------------------------------------------------------------------------- 1 | # Unresponsive Neighbor Detected 2 | You might see a cluster health warning being reported in SFX/Events indicating 'UnresponsiveNeighborDetected'. If that is the case, you can follow this article. 3 | > This warning will be triggered in case the lease and/or fabric ports are not reachable from a neighbor node. Make sure that the reported node isn't in warning state due to another report. If that is the case, the health report could indicate an issue that might be related to this failure. 4 | 5 | ## Diagnose: 6 | 7 | 1. RDP to the node reported as the destination 8 | > If the node is not responding to the RDP session, check why the host is not responsive. A VM restart is suggested. 9 | 2. Once you have connected to the VM, Check that Fabric is working by local ping. Check the manifest and use the port defined as clusterConnectionEndpointPort
10 | run `Test-NetConnection localhost -Port <clusterConnectionEndpointPort>` 11 | 12 | 3. Now, check that the Service Fabric Lease driver is working by local ping. Check the manifest and use the port defined as leaseDriverEndpointPort
13 | run `Test-NetConnection localhost -Port <leaseDriverEndpointPort>` 14 | 4. **If both local pings succeed**, RDP into the node that reported the event. Run the same steps as in 2 and 3, but replace localhost with the IP address of the node reported as destination. 15 | You might discover the node is not able to reach the remote end. Check for any firewall policy or network rule that might prevent the connection from succeeding. If the policies look good, you will need to investigate why these nodes can't talk to each other. 16 | 5. **If one or both of the local pings failed**, check the [mitigations](#mitigations) section. 17 | 18 | ## Mitigations 19 | 20 | ### If the ping to clusterConnectionEndpointPort failed, stop Fabric.exe process 21 | Go to the Windows Task Manager, right click on Fabric.exe and click on end task. You can also stop the process using [taskkill](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/taskkill).
22 | `taskkill /f /pid <pid>` 23 | 24 | 25 | ### If the ping to leaseDriverEndpointPort failed, Restart the node 26 | **WARNING:** Make sure restarting the Service Fabric Node won't cause problems to Service Fabric. If the node to restart is a seed node, make sure the remaining count of alive seed nodes is **greater** than the total count of seed nodes divided by two. 27 | - Try to restart the node by using Restart-ServiceFabricNode; however, this command could fail to restart the node since it is already in a degraded state. 28 | - RDP into the node and manually restart it. 29 | - If RDP is not an option, you might need to manually restart the node using the compute provider, such as restarting the node from the Azure Portal. 30 | 31 | 32 | 33 | 34 | 35 | -------------------------------------------------------------------------------- /Cluster/unsupported-ipprovider.md: -------------------------------------------------------------------------------- 1 | # Recover an unsupported cluster that is using Open Networking after Jan 19th 2 | 3 | This article describes how to try to recover a cluster that is down, runs an unsupported version, and uses Open Networking, so that the cluster can then be upgraded. 4 | 5 | ## [Applies to] 6 | 7 | **All** Service Fabric clusters running 6.3 or higher that use the Open Network Container feature and that are not upgraded to a version as detailed in [LINK to unsupported]. 
8 | 9 | ## [Symptoms] 10 | 11 | * Cluster is using Open Network, is not upgraded, and is down 12 | * Cluster State 'UpgradeServiceNotReachable' in Azure Portal 13 | * Application/Node details are not displayed in Azure Portal 14 | * Unable to connect to the cluster through SFX/PowerShell 15 | * Node(s) go down for any reason and cannot restart (stuck down) 16 | 17 | ## [Remediation] 18 | 19 | * RDP into the node in question 20 | * Find Fabric data root directory: 21 | C:\WFRoot on a PaaS V1 VM 22 | D:\SvcFab on a VMSS VM (mostly) 23 | 24 | * Find FabricHostSettings.xml 25 | * Make a backup of the FabricHostSettings.xml file 26 | * Open the FabricHostSettings.xml file 27 | Look for the Hosting section and the parameter "IPProviderEnabled": 28 | 29 | ```xml 30 |
31 | ... 32 | 33 | ``` 34 | 35 | Replace that with Value="false" 36 | 37 | 38 | ```xml 39 |
40 | ... 41 | 42 | ``` 43 | 44 | The node would pick up the change and should start normally. All service fabric processes (Fabric, FabricHost, etc) should start normally. 45 | 46 | * Need to do the above steps on every node on the cluster 47 | 48 | When all the nodes have the settings disabled, the clusters should come back online. The applications that are using open networking won't work since the setting is disabled. 49 | * Need to open a Support Ticket with Microsoft to disable the same IPProviderEnabled setting in the backend configuration (so upgrade will be able to proceed) 50 | * Upgrade the cluster to a supported version 51 | 52 | ## [Additional References] 53 | 54 | -------------------------------------------------------------------------------- /Deployment/Container-Support-Decision-Graph.md: -------------------------------------------------------------------------------- 1 | # Decision Graph companion to Container Support Guidance 2 | 3 | [Container Support Guidance](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Deployment/Mirantis-Guidance.md) 4 | 5 | ```mermaid 6 | graph TD; 7 | A[Start Here] --> Decision1[VMSS running on deprecated image] 8 | Decision1 --> HostingContainerDecision{Hosting Containerized Applications} 9 | HostingContainerDecision -->|Yes| UpgradeRuntime[Upgrade SF Cluster runtime 9.0 CU2 or later] --> PickReplacementOption{Choose a replacement image} 10 | HostingContainerDecision -->|No| NodeTypeDecision2{Which NodeType is affected} 11 | PickReplacementOption --> CustomOSImage[Custom OS Image] 12 | PickReplacementOption --> ManualInstallImage[Install Container Runtime manually] 13 | PickReplacementOption --> GalleryImage[OS Image from Azure Gallery] 14 | CustomOSImage --> PrepNewImageStep1[Get started: Prep Windows for containers] --> NewImageIsPrepped1[Automatic OS image upgrade for custom images] 15 | ManualInstallImage --> ManualInstallStep1[Install Container Runtime via Custom Script VM Extension] --> 
NewImageIsPrepped2[Sequence extension provisioning in virtual machine scale sets] 16 | GalleryImage --> NewImageIsPrepped3[Find and use Azure Marketplace VM images with Azure PowerShell] 17 | NewImageIsPrepped1 --> NodeTypeDecision1{Which NodeType is affected} 18 | NewImageIsPrepped2 --> NodeTypeDecision1{Which NodeType is affected} 19 | NewImageIsPrepped3 --> NodeTypeDecision1{Which NodeType is affected} 20 | NodeTypeDecision1 --> YesHostingContainersPrimary[Primary Node Type] 21 | NodeTypeDecision1 --> YesHostingContainersSecondary[Secondary Node Type] 22 | NodeTypeDecision2 --> NoHostingContainersPrimary[Primary Node Type] 23 | NodeTypeDecision2 --> NoHostingContainersSecondary[Secondary Node Type] 24 | NoHostingContainersPrimary --> IsProductionDecision{Production System} 25 | IsProductionDecision --> K[Scenario 1/Option 2] 26 | IsProductionDecision --> L[Scenario 1/Option 3] 27 | NoHostingContainersSecondary --> AddNodeType1[Add new Node Type] 28 | AddNodeType1 --> MigrateWorkloads1[Migrate Workloads] 29 | YesHostingContainersPrimary --> HostingPrepImagePrimary[Scenario 2/Option 2] 30 | HostingPrepImagePrimary --> HostingPrepImagePrimaryStep1[Create new VMSS based on prepped Image] --> HostingPrepImagePrimaryStep2[Install container runtime] --> HostingPrepImagePrimaryStep3[OS SKU upgrade] 31 | YesHostingContainersSecondary --> HostingPrepImageSecondaryStep1[Create new VMSS based on prepped Image] --> AddNodeType2[Add New Node Type] 32 | AddNodeType2 --> MigrateWorkloads2[Migrate workloads] 33 | ``` 34 | -------------------------------------------------------------------------------- /Deployment/Deployments to upgrade existing applications from Azure Dev Ops or using sfpkg time out or fail.md: -------------------------------------------------------------------------------- 1 | # Deployments to upgrade existing applications from Azure Dev Ops or using sfpkg time out or fail 2 | 3 | ## Symptom 4 | 5 | Deployment from Azure Dev Ops times out for application that 
were successfully deployed previously. Further deployments fail. 6 | 7 | In SFX, one of the Application types shows a message similar to the following 8 | 9 | Name | Version / Message | Status 10 | -----|-------------------|------- 11 | ApplicationType | ApplicationType Version | Provisioning 12 | | | Downloading : 5304320 bytes received, 9140238 expected (58.0%). 13 | 14 | ## Cause 15 | 16 | Known bug in 7.0 version of SF. Fixed in SF version 7.0.472.9590 and later. 17 | 18 | ## Mitigation 19 | 20 | * In SFX, locate ClusterManagerService under System. 21 | * Expand the replicas for ClusterManagerService and note down the node with the primary replica 22 | * RDP to the VM with the ClusterManagerService primary replica (If RDP access is not available, restart VM) 23 | * Stop Imagebuilder.exe service 24 | * Redeploy the failed deployment 25 | -------------------------------------------------------------------------------- /Deployment/How-do-I-know-if-I-am-using-Containers-in-my-SF-service.md: -------------------------------------------------------------------------------- 1 | # How do I know if I am using Containers in my SF service? 2 | 3 | You would need to look at all of the ServiceManifest.xml files for your applications. Note that you might have multiple application manifests. One application manifest can reference multiple service manifests. One service manifest can contain multiple code packages. Please take a look at ``. If you are using Docker/container based services, your service manifest would look like below: 4 | 5 | ```xml 6 | 7 | 12 | ... 13 | 14 | 15 | 16 | 17 | 18 | mycr.azurecr.io/folder/image:latest 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | ... 27 | 28 | ``` 29 | 30 | By contrast, if you are using Process based services, your service manifest would look like below: 31 | 32 | ```xml 33 | 34 | 39 | ... 40 | 41 | 42 | 43 | 44 | MISampleWeb.exe 45 | CodePackage 46 | 47 | 48 | 49 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | ... 
57 | 58 | ``` 59 | -------------------------------------------------------------------------------- /Deployment/Image Store - Copy application fails with Invalid Argument (Value does not fall within the expected range.).md: -------------------------------------------------------------------------------- 1 | Customers using 3.1 SDK targeting v6.3 RTO cluster, sometimes might hit the upload issue where copy application package fails with Invalid argument 2 | 3 | > Copy-ServiceFabricApplicationPackage -ApplicationPackagePath C:\temp\package\package\MyApplicationPkg 4 | Using ImageStoreConnectionString='fabric:imageStore' 5 | Copy-ServiceFabricApplicationPackage : Value does not fall within the expected range. 6 | At line:1 char:1 7 | + Copy-ServiceFabricApplicationPackage -ApplicationPackagePath C:\temp... 8 | + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 9 |     + CategoryInfo          : InvalidArgument: (:) [Copy-ServiceFabricApplicationPackage], ArgumentException 10 |     + FullyQualifiedErrorId : CopyApplicationPackageErrorId,Microsoft.ServiceFabric.Powershell.CopyApplicationPackage 11 | 12 | 13 | 14 | Mitigation: 15 | 16 | 1. Check the value of NamingService::MaxMessageSize on your cluster. If the value is anything other than default 4MB then it might cause this this issue. 17 | To mitigate, update the Naming:MessageSize to 4194304 (or delete that config entry) on your cluster. 18 | 19 | You can use resource.azure.com and navigate to your cluster and edit the settings as below and push it (PUT). 20 | 21 | "fabricSettings": [ 22 | { 23 | "name": "NamingService", 24 | "parameters": [ 25 | { 26 | "name": "MaxMessageSize", 27 | "value": "4194304" 28 | } 29 | ] 30 | } 31 | ], 32 | 33 | NamingService::MaxMessageSize describes max message size (packet size) that can be used to communicate from client to cluster. Default in local is set to 4MB. 
So if your cluster has different value than default 4MB, there is no agreement between cluster and client resulting in InvalidArgument. 34 | 35 | This has been fixed in 6.3 CU1 (3.2.176.9494 SDK). So it can also be mitigated by moving to SDK version above 3.2.176.9494. 36 | 37 | 2. Use 3.2.176.9494 SDK which has this issue fixed 38 | 39 | 40 | -------------------------------------------------------------------------------- /Deployment/Minimal-Cluster-Rebuild.md: -------------------------------------------------------------------------------- 1 | # Azure Service Fabric cluster rebuild (minimal approach) 2 | 3 | If your resource group contains resources which cannot easily redeployed by applying an ARM template, then a selectively deletion might help you to achieve a quicker deployment. 4 | 5 | A fast rebuild of an Azure Service Fabric cluster might be needed in the cases for example where the latest template deployment contained wrong data or unsupported configurations. In those cases the fastest mitigation can be to rebuild the cluster. 6 | 7 | Quickly explained, this approach recommends to delete specific Azure resources manually and reapply the ARM template with old or corrected configuration. The Azure Service Fabric (ASF) cluster resource including all associated Azure Virtual Machine Scale Sets (VMSS) must be removed. 8 | 9 | ## Pre-requisites 10 | 11 | Please have a well maintained and tested ARM template handy which can create the resources in the desired form. 12 | 13 | The PowerShell CMDlet [New-AzResourceGroupDeployment](https://docs.microsoft.com/powershell/module/az.resources/new-azresourcegroupdeployment) needs to be executed with the parameter DeploymentMode=Incremental. 14 | 15 | > :warning: 16 | > Exporting the ARM template from the Azure portal via "Export template" function might be not sufficient as it cannot contain secrets and dynamically changed values. 17 | 18 | ## Step by Step 19 | 20 | 1. 
Remove ASF cluster including all associated VMSS 21 | 22 | ```powershell 23 | Connect-AzAccount 24 | Set-AzContext -SubscriptionId 25 | $resourceGroupName = "" 26 | Get-AzResource -ResourceGroupName $resourceGroupName | ft 27 | 28 | Remove-AzResource -ResourceName "" -ResourceType "Microsoft.ServiceFabric/clusters" -ResourceGroupName $resourceGroupName -Force 29 | 30 | Remove-AzVmss -ResourceGroupName $resourceGroupName -VMScaleSetName "" 31 | ``` 32 | 33 | Documentation: 34 | https://docs.microsoft.com/azure/service-fabric/service-fabric-tutorial-delete-cluster#selectively-delete-the-cluster-resource-and-the-associated-resources 35 | 36 | 2. Deploy the ARM template to create ASF cluster and VMSS 37 | 38 | Apply the latest version of your ARM template with [New-AzResourceGroupDeployment](https://docs.microsoft.com/powershell/module/az.resources/new-azresourcegroupdeployment). 39 | 40 | -------------------------------------------------------------------------------- /Deployment/README.md: -------------------------------------------------------------------------------- 1 | This contains the Deployment related TSG's surfaced in the Azure Portal during support ticket creation: 2 | 3 | ## **Category** 4 | 5 | ### **Image Store** 6 | [Image Store: Copy Application package fails with ACCESS_DENIED.md](./Image%20Store%20Copy%20Application%20package%20fails%20with%20ACCESS_DENIED.md) 7 | 8 | ### **SFRP VMSS Validations** 9 | [SFRP VMSS Validations Errors](./SFRP-VMSS-Validations.md) -------------------------------------------------------------------------------- /Deployment/Upgrade Service Fabric cluster basic load balancer.md: -------------------------------------------------------------------------------- 1 | # Upgrade from Basic to Standard SKU on Azure Load Balancer for Azure Service Fabric clusters 2 | 3 | ## Abstract 4 | 5 | Azure Load Balancers with Basic SKU will be retired on September 30th, 2025. 
To ensure that your Azure Load Balancer continues to function properly, we recommend that you migrate to a Azure Load Balancer with Standard SKU before the deprecation date. Read more in the [official retirement announcement](https://azure.microsoft.com/updates/azure-basic-load-balancer-will-be-retired-on-30-september-2025-upgrade-to-standard-load-balancer/). If you have an Azure Load Balancer with Basic SKU associated with a Azure Service Fabric cluster, please follow this migration guide to keep your cluster safe. Plan accordingly the migration path you will take based on your current load balancer configuration, number of node types, and workloads in your cluster. 6 | 7 | To check the SKU of your existing load balancers, please navigate to the [Load Balancers](https://portal.azure.com/#view/Microsoft_Azure_Network/LoadBalancingHubMenuBlade/~/loadBalancers) resources in the Azure Portal. On the overview page, you will find the SKU information. 8 | 9 | ## Document overview 10 | 11 | This document specifies the options available to upgrade a Azure Load Balancer with Basic SKU to a Standard SKU with Azure IP Address and Azure Load Balancer for a Azure Service Fabric cluster. Choose one of the options below based on availability requirements. 12 | 13 | > [!NOTE] 14 | > This does not apply to [Azure Service Fabric Managed Clusters](https://learn.microsoft.com/azure/service-fabric/overview-managed-cluster). Service Fabric Managed Clusters with Basic SKU are provisioned with a Azure Load Balancer on Basic SKU but cannot be upgraded and must be redeployed. Service Fabric Managed Clusters with Standard SKU have are provisioned with a Azure Load Balancer on Standard SKU and are not impacted. 
15 | 16 | 17 | ## Migration decision guide 18 | 19 | The following table captures the risk and effort evaluation of the various migration options 20 | | Scenario | Effort | Risk | Process | 21 | | --- | --- | --- | --- | 22 | | Manual upgrade with no downtime | High | Low | [Manual process](./Upgrade%20Service%20Fabric%20cluster%20basic%20load%20balancer%20(manual).md) | 23 | | Automatic upgrade with downtime | Low | High | [Automatic process](./Upgrade%20Service%20Fabric%20cluster%20basic%20load%20balancer%20(automated).md) | 24 | | Cluster recreation | Medium | Low | - | 25 | 26 | 27 | -------------------------------------------------------------------------------- /Known_Issues/Application crashes after upgrade on Fabric 6.4.md: -------------------------------------------------------------------------------- 1 | # Application with dependency on wastorage.dll crashes on Service Fabric runtime 6.4.617.9590 2 | 3 | An known issue on Windows clusters with 6.4.617.9590 runtime has been identified which causes applications with a dependency on wastorage.dll to crash. 4 | 5 | ## Symptoms 6 | - Application crash due to dependency load failure for wastorage.dll. 7 | - This may also cause application upgrades to fail if this component is in the initialization path. 8 | 9 | **Conditions for this to happen:** 10 | - Application has **wastorage.dll** in application package 11 | - Clusters is currently running version 6.4.617.9590 12 | - Application is upgraded to a new app/code package version 13 | 14 | ## Root Cause Analysis 15 | - Service Fabric Clusters running on the 6.4.617.9590 runtime have wastorage.dll added in our runtime dependencies, therefore we will automatically strip wastorage.dll from applications packages being deployed. 16 | - If the version of wastorage.dll in the customers application package doesn’t match the Service Fabric runtime's version of wastorage.dll(3.2.2) the customers application will crash continuously. 
17 | - Analysis of the exception (crash dump analysis) will show that the application is failing to load wastorage.dll. 18 | 19 | ## Possible Mitigations 20 | - Downgrade the cluster to the latest 6.3 version; this prevents wastorage.dll from being stripped out. 21 | - If the previous version of the application is still provisioned, the customer may be able to downgrade to that version (deployed before the cluster was upgraded to the 6.4 runtime), since it should still have wastorage.dll present. The application should be able to load it from the app folder, though there may still be load order issues. 22 | 23 | ## Additional information 24 | The Service Fabric team is planning to fix this in 6.4 CU1. 25 | 26 | **Update:** A fix is being rolled out in 6.4.622.9590: https://blogs.msdn.microsoft.com/azureservicefabric/2018/12/12/azure-service-fabric-6-4-refresh-for-windows-clusters/ 27 | -------------------------------------------------------------------------------- /Known_Issues/BRS-stops-taking-backup-after-upgrading-to-latest-runtime.md: -------------------------------------------------------------------------------- 1 | # BackupRestoreService(BRS) stops taking periodic backup after upgrade to latest runtime 2 | 3 | ## Issue 4 | Periodic backups stop for configured backup policies 5 | 6 | ## Cluster versions impacted 7 | Clusters upgraded to 8.2.1686.9590 / 9.0.1107.9590 / 9.1.1387.9590 which have existing backup policies enabled on any stateful partition/service/app. 8 | 9 | ## Impact 10 | If an SF cluster with existing backup policies is upgraded to 8.2.1686.9590 / 9.0.1107.9590 / 9.1.1387.9590, BRS fails after the upgrade to deserialize the old metadata because of data model changes in the new release. It stops taking backups and performing restores on the affected partition/service/app, although the cluster and BRS remain healthy. 11 | 12 | ## Symptoms 13 | There are two ways to identify and confirm the issue: 14 | 15 | 1. 
If periodic backups were happening on any partition, they should be visible in SFX under Cluster->Application->Service->Partition->Backup, which lists all backups taken together with their creation times. Using this information and the upgrade time, the customer can identify whether a backup policy was enabled, whether backups were happening before the upgrade, and whether backups are still happening after the upgrade. 16 | 17 | 2. Another way to check and enumerate backups is to call the [Get partition backup list](https://learn.microsoft.com/en-us/rest/api/servicefabric/sfclient-api-getpartitionbackuplist) API. 18 | 19 | 20 | ## Mitigation 21 | 22 | To mitigate, update the existing policy after upgrading to runtime 8.2.1686.9590 / 9.0.1107.9590 / 9.1.1387.9590. Call [UpdateBackupPolicy](https://learn.microsoft.com/en-us/rest/api/servicefabric/sfclient-api-updatebackuppolicy) with the existing policy values. This updates the policy model inside BRS to the new data model, and BRS starts taking periodic backups again. 23 | 24 | **Steps**: 25 | 26 | 1. As described in the "Symptoms" section above, check whether the cluster is hitting the issue. If the issue is confirmed, follow the next steps. 27 | 2. Update the backup policy with the same values as before by calling the UpdateBackupPolicy API. 
Below is one sample - 28 | 29 | ```powershell 30 | $BackupPolicy=@{ 31 | Name = "DailyAzureBackupPolicy" 32 | AutoRestoreOnDataLoss = "false" 33 | MaxIncrementalBackups = "3" 34 | Schedule = @{ 35 | ScheduleKind = "FrequencyBased" 36 | Interval = "PT3M" 37 | } 38 | Storage = @{ 39 | StorageKind = "AzureBlobStore" 40 | FriendlyName = "Azure_storagesample" 41 | ConnectionString = "" 42 | ContainerName = "" 43 | } 44 | RetentionPolicy = @{ 45 | RetentionPolicyType = "Basic" 46 | MinimumNumberOfBackups = "20" 47 | RetentionDuration = "P3M" 48 | } 49 | } 50 | $body = (ConvertTo-Json $BackupPolicy) 51 | $url = 'https://:19080/BackupRestore/BackupPolicies/DailyAzureBackupPolicy/$/Update?api-version=6.4' 52 | Invoke-WebRequest -Uri $url -Method Post -Body $body -ContentType 'application/json' -CertificateThumbprint '' 53 | # User should update the name of backup policy [DailyAzureBackupPolicy being used here and other possible values accordingly]. 54 | ``` 55 | 56 | 3. Wait for 1-2 mins and policy should get updated across all entities. 57 | 4. Periodic backups will start happening as per backup policy. 58 | -------------------------------------------------------------------------------- /Known_Issues/Cluster upgrade gets stuck.md: -------------------------------------------------------------------------------- 1 | # Ongoing cluster upgrade get stuck and UD upgrade duration has stopped updating 2 | 3 | ## Problem 4 | 5 | - Ongoing cluster upgrade get stuck in a random UD and will not progress 6 | 7 | ## Symptoms 8 | 9 | - Ongoing cluster upgrade get stuck in a random UD and will not progress 10 | - The **Upgrade Domain Duration** timer is a very small value and/or has stopped updating. 
This value should update approximately once per minute. 11 | 12 | ![Cluster upgrade timers](../media/ClusterUpgradeTimerStuck.png) 13 | 14 | ## Cause 15 | 16 | - A race condition: after the CM primary is moved and the new CM primary recovers the pending contexts, it starts the pending cluster/fabric upgrade, which fails and aborts. 17 | 18 | ## Mitigation 19 | 20 | - To unblock the cluster upgrade, the primary replica for CM (ClusterManagerService) needs to be restarted or moved. 21 | 22 | [Move-ServiceFabricPrimaryReplica](https://docs.microsoft.com/en-us/powershell/module/servicefabric/move-servicefabricprimaryreplica?msclkid=47ef7b40cfa711ec9442ebda21d7a8f2&view=azureservicefabricps) 23 | 24 | ```powershell 25 | Move-ServiceFabricPrimaryReplica -ServiceName fabric:/System/ClusterManagerService -PartitionId 00000000-0000-0000-0000-000000002000 26 | ``` 27 | 28 | ## Additional information 29 | 30 | The Service Fabric team is planning to fix this in an upcoming 9.0 Cumulative Update. 31 | -------------------------------------------------------------------------------- /Known_Issues/Cluster upgrades stuck.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Cluster upgrades stuck 3 | --- 4 | 5 | ## Problem Description/Impact 6 | 7 | Service Fabric clusters configured with automatic or manual runtime upgrades may get stuck in an upgrade domain without impacting customer workloads. Affected Service Fabric runtime versions include:
8 | • 8.2.1235.9590
9 | • 8.2.1363.9590
10 | • 8.2.1486.9590
11 | • 8.2.1571.9590
12 | • 8.2.1620.9590
13 | • 9.0.1017.9590
14 | • 9.0.1028.9590
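
A quick way to compare a cluster's reported code version against the list above is a simple membership check. The sketch below is only illustrative: the helper name `is_affected` is an assumption introduced here, and the version strings are copied from the list above.

```python
# Runtime versions listed above as affected by the stuck-upgrade issue.
AFFECTED_VERSIONS = frozenset({
    "8.2.1235.9590",
    "8.2.1363.9590",
    "8.2.1486.9590",
    "8.2.1571.9590",
    "8.2.1620.9590",
    "9.0.1017.9590",
    "9.0.1028.9590",
})


def is_affected(code_version: str) -> bool:
    """Return True if the runtime version is on the affected list (hypothetical helper)."""
    return code_version.strip() in AFFECTED_VERSIONS


if __name__ == "__main__":
    # The second value is simply a version that is not on the list above.
    for version in ("8.2.1571.9590", "9.1.1387.9590"):
        print(version, "affected" if is_affected(version) else "not listed")
```

The code version itself still has to be read from Service Fabric Explorer or PowerShell as described in the next section.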
15 | 16 | ## How to identify Service Fabric runtime version
17 | The runtime version can be verified based on the type of Service Fabric cluster using the following:
18 | • Azure Service Fabric cluster 19 | [Visualize your cluster using Azure Service Fabric Explorer](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-visualizing-your-cluster#connect-to-a-service-fabric-cluster)

20 | • Standalone Service Fabric cluster 21 | [Get-ServiceFabricClusterConfiguration](https://docs.microsoft.com/en-us/powershell/module/servicefabric/get-servicefabricclusterupgrade?view=azureservicefabricps) 22 | 23 | ## How to identify a cluster upgrade is stuck in Service Fabric 24 | Check whether your cluster runtime upgrade has stopped making progress across upgrade domains, using either of the following:
25 | • Service Fabric Explorer:
26 | In the Details tab at the cluster level, check the Start Timestamp, the Upgrade state, and the Code Version the cluster is upgrading to. If the Start Timestamp is more than 24 hours old and the Upgrade state is still “Upgrade in Progress”, follow the mitigation steps listed under “Required Action from customer”.

27 | • PowerShell:
28 | 1. Connect to the Service Fabric cluster using the command
29 | [Connect-ServiceFabricCluster](https://docs.microsoft.com/en-us/powershell/module/servicefabric/connect-servicefabriccluster?view=azureservicefabricps)
30 | 2. Execute the below command to retrieve the current progress of the upgrade.
31 | [Get-ServiceFabricClusterUpgrade](https://docs.microsoft.com/en-us/powershell/module/servicefabric/get-servicefabricclusterupgrade?view=azureservicefabricps)
32 | If StartTimestampUtc is more than 24 hours old and the upgrade state is still “Upgrade in Progress”, follow the mitigation steps listed under “Required Action from customer”. 33 | 34 | ## Required Action from customer 35 | Take the following steps to mitigate the issue:
36 | 1. From PowerShell, connect to the Service Fabric cluster using the command
37 | [Connect-ServiceFabricCluster](https://docs.microsoft.com/en-us/powershell/module/servicefabric/connect-servicefabriccluster?view=azureservicefabricps)

38 | 2. Get the PartitionId of ClusterManagerService using the command
39 | [Get-ServiceFabricPartition](https://docs.microsoft.com/en-us/powershell/module/servicefabric/get-servicefabricpartition?view=azureservicefabricps)
40 | Eg. Get-ServiceFabricPartition -ServiceName fabric:/System/ClusterManagerService

41 | 3. Restart ClusterManagerService primary replica using the command
42 | [Restart-ServiceFabricReplica](https://docs.microsoft.com/en-us/powershell/module/servicefabric/restart-servicefabricreplica?view=azureservicefabricps)
43 | Eg. Restart-ServiceFabricReplica -PartitionId 00000000-0000-0000-0000-000000002000 -ReplicaKindPrimary -ServiceName fabric:/System/ClusterManagerService
44 | 45 | ## When will the Fix for this issue be rolled out? 46 | Service Fabric is rolling out a fix as part of 9.0 CU2 and 8.2 CU4 in July that resolves the stuck upgrade problem once the cluster is upgraded to these versions. 47 | -------------------------------------------------------------------------------- /Known_Issues/Container Service gets into an error state after upgrading the Cluster to 7.1.417-cu1.md: -------------------------------------------------------------------------------- 1 | # DNS resolution within a container running on 7.1.417 cu1 or later (Linux based clusters only) requires Service Fabric DNS 2 | 3 | After upgrading to Service Fabric 7.1.417 (cu1), containerized applications start failing with DNS related errors such as "The remote name could not be resolved". 4 | 5 | ![knownissue_container_dns_image001.png](../media/knownissue_container_dns_image001.png) 6 | 7 | 8 | ## Symptoms 9 | - The SF cluster was running version 7.1.409 and an application running on top of it, a container app in hyper-v isolation mode, was working fine. After upgrading the SF cluster to 7.1.417 (cu1), the application wouldn't start and failed with DNS related errors. 10 | - The app works fine again after downgrading to 7.1.409. After upgrading to 7.1.417 again, the app starts failing again. 11 | - Application logs show DNS network related errors, possibly visible in the container log from Service Fabric Explorer 12 | - DNS resolution works from a command prompt directly on the Service Fabric node (DNS is working as expected) 13 | 14 | ## Root Cause Analysis 15 | - Starting in 7.1.417, the optional DNS Service needs to be enabled so that containerized applications can resolve DNS queries 16 | 17 | ## Possible Mitigations 18 | - Enabling the DNS Service before upgrading to 7.1 CU1 should mitigate the issue. 
19 | 20 | ## Additional information 21 | 22 | -------------------------------------------------------------------------------- /Known_Issues/Fabric 6.4 Upgrade fails.md: -------------------------------------------------------------------------------- 1 | # 6.4 Upgrade fails for 6.3 Clusters with fabric:/System/BackupRestoreService enabled 2 | 3 | An issue has been identified which is known to cause the Fabric 6.4 runtime upgrade to fail for clusters with the fabric:/System/BackupRestoreService enabled. 4 | 5 | ## Symptoms 6 | During the upgrade you may see some warning/error messages in Service Fabric explorer similar to the following: 7 | ```code 8 | Assert or Coding error with message 00000000-0000-0000-0000-000000007000@131873117199500233@fabric:/StateManager: Below type used in Reliable Collection 9 | urn:RetentionStore/dataStore could not be loaded. This commonly indicates that the user application is not backwards/forwards compatible. Common 10 | compatibility bugs that lead to this error are adding a new type or changing an assembly name without two phase upgrade, or removing a type. If this 11 | was caused by user's backwards/forwards compatibility bug, one way to mitigate the issue is to force the upgrade through without safety checks. 12 | ``` 13 | 14 | ## Mitigation 15 | 16 | - Move the fabric:/System/BackupRestoreService primary replica to the node from last Upgrade Domain and then trigger cluster upgrade. 17 | 18 | ### Steps: 19 | 1. Identify the node from the highest upgrade domain where BackupRestoreService's replica can be placed as per constraints, if any, and move BackupRestoreService's primary replica to this node. Assuming the identified node name as node_4, execute following PowerShell command to move the primary. 20 | ```PowerShell 21 | Move-ServiceFabricPrimaryReplica -ServiceName fabric:/System/BackupRestoreService -PartitionId 00000000-0000-0000-0000-000000007000 -NodeName node_4 22 | ``` 23 | 24 | 2. 
Increase cost of replica movement for BackupRestoreService, to reduce chances of primary replica movement. Execute following PowerShell command to do this. 25 | ```PowerShell 26 | Update-ServiceFabricService -Stateful -ServiceName fabric:/System/BackupRestoreService -DefaultMoveCost High 27 | ``` 28 | 29 | 3. Initiate upgrade of your Service Fabric Cluster to version 6.4.621.9590 or later. 30 | 31 | 4. Restore replica movement cost for BackupRestoreService. 32 | ```PowerShell 33 | Update-ServiceFabricService -Stateful -ServiceName fabric:/System/BackupRestoreService -DefaultMoveCost Low 34 | ``` 35 | 36 | 37 | 38 | ## Reference 39 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-azurecluster 40 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-backuprestoreservice-quickstart-standalonecluster 41 | -------------------------------------------------------------------------------- /Known_Issues/FabricDCA Process High Memory and CPU.md: -------------------------------------------------------------------------------- 1 | # FabricDCA.exe Process High Memory and CPU usage 2 | 3 | ## Background information: 4 | 5 | With the introduction of containers in Service fabric a feature was added to collect logs from these containers through FabricDCA (Data Collection Agent). 6 | 7 | The initial feature was found to be leaking handles and other objects related to the containers on the DCA side. This has been fixed with release of 6.3 CU2 (6.3.187.9494). 8 | 9 | ## Symptoms 10 | - Nodes performance degration due to high memory and CPU usage 11 | 12 | ## Diagnosing the problem: 13 | 1. Look for FabricDCA.exe process memory usage. The normal memory usage would depend on the number of applications in the cluster, but usually is around 100 to 200MB. 14 | 15 | 2. 
If memory usage mentioned above is abnormally high look at: 16 | D:\SvcFab\Log\Containers 17 | 18 | ![FabricDCA_003](../media/FabricDCA003.png) 19 | 20 | 3. The **Containers** folder should only have a few folders related to the active containers in the node. If there are hundreds or thousands of folders the bug was probably hit, so move to the Mitigation steps. 21 | 22 | ## Mitigation 23 | 24 | 1. Select all folders in D:\SvcFab\Log\Containers and delete. **Note:** that some may not get deleted because they are in use by an active container. That is okay, leave it there. 25 | 26 | 2. Kill FabricDCA.exe process. The process should automatically be restarted and the issue mitigated **. 27 | 28 | ** Note: If the cluster has containers repeatedly failing, the problem will resurface quickly. This is because every container failure creates a new folder and FabricDCA.exe will start to bloat again. If this is the case try to fix the errors with the application causing the container failures. 29 | -------------------------------------------------------------------------------- /Known_Issues/Microsoft.ServiceFabricMesh registration is stuck.md: -------------------------------------------------------------------------------- 1 | # Microsoft.ServiceFabricMesh registration is stuck 2 | 3 | ## Issue 4 | Customers using terraform see Microsoft.ServiceFabricMesh provider get stuck during registration. This is because ServiceFabricMesh Resource Provider is still within the list of available providers but it has been deprecated, read more about this [here](https://azure.microsoft.com/en-us/updates/azure-service-fabric-mesh-preview-retirement/). ServiceFabricMesh was part of the list of default providers to register from Terraform. 5 | 6 | ## Impact 7 | Customers that use terraform usually are stuck from deployments because of this error. They should not be trying to manually register this RP either. 
8 | 9 | ## Symptoms 10 | - When deploying using terraform customers will get the following error: 11 | 12 | ``` 13 | Cannot register provider Microsoft.ServiceFabricMesh with Azure Resource Manager: resources.ProvidersClient 14 | ``` 15 | 16 | - In the Azure Portal the process of registering/unregistering Microsoft.ServiceFabricMesh gets stuck in "registering/unregistering" status: 17 | 18 | ![image](https://github.com/dbucce/Service-Fabric-Troubleshooting-Guides/assets/50681801/8a20f940-e9ba-404c-9909-c8fd1796e374) 19 | 20 | - Timeout errors from portal when trying to register/unregister RP: 21 | 22 | ``` 23 | 'Unregister' operation check timed out on Resource Provider 'microsoft.servicefabricmesh', please refresh resource providers list to check for registration status 24 | ``` 25 | 26 | ## Mitigation 27 | 28 | To mitigate, customers should use azurerm provider versions v3.41.0 or later. Terraform has taken out the ServiceFabricMesh provider from the providers list for these newer versions. 29 | 30 | **Steps**: 31 | 32 | Update the azurerm provider version in the terraform template 33 | 34 | ``` 35 | terraform { 36 | required_providers { 37 | azurerm = { 38 | ... 39 | version = "=3.41.0" 40 | } 41 | } 42 | } 43 | ``` 44 | -------------------------------------------------------------------------------- /Known_Issues/Nodes FabricDCA DataCollectionAgent.DiskSpaceAvailable.md: -------------------------------------------------------------------------------- 1 | # 'FabricDCA' reported Warning for property 'DataCollectionAgent.DiskSpaceAvailable' 2 | 3 | [Issue](#Issue) 4 | [Health State](#Health-State) 5 | [Description](#Description) 6 | [Cause](#Cause) 7 | [Mitigation](#Mitigation) 8 | [Resolution](#Resolution) 9 | [Reference](#Reference) 10 | 11 | ## Issue 12 | 13 | 'FabricDCA' reported Warning for property 'DataCollectionAgent.DiskSpaceAvailable'. 
14 | 15 | ## Health State 16 | 17 | Warning 18 | 19 | ## Description 20 | 21 | ```text 22 | 'FabricDCA' reported Warning for property 'DataCollectionAgent.DiskSpaceAvailable'. 23 | The Data Collection Agent (DCA) does not have enough disk space to operate. Diagnostics information will be left uncollected if this continues to happen. 24 | ``` 25 | 26 | ## Cause 27 | 28 | There is not enough free disk space on drive where %FabricDataRoot% (typically d:\ (temp) drive in Azure) is located. 29 | This issue can have multiple causes: 30 | 31 | - nodetype sku too small 32 | - application design 33 | - application deployment 34 | - application data 35 | - application versions 36 | - application logging 37 | - fabric logging 38 | - fabric exceptions 39 | - container logging 40 | - code issues 41 | 42 | **NOTE: Service Fabric will always have an 8GB file named replicatorshared.log in %FabricDataRoot% ("D:\SvcFab\ReplicatorLog\replicatorshared.log").** 43 | 44 | ## Mitigation 45 | 46 | Depending on cause, there are different actions to mitigate issue. 47 | These steps may be necessary before resolving issue if cluster is not functioning. 48 | To Determine cause of issue, use [Out Of Diskspace](../Cluster/Out of Diskspace.md) to troubleshoot. 49 | 50 | Depending on cause, to temporarily resolve: 51 | 52 | - multiple application versions - delete any unneeded applications and application versions from imagestore. 53 | 54 | - logging - delete any .etl, .trace, .blg, .log , .err, .out, or .zip files from %FabricLogRoot% (typically d:\SvcFab\Log) and subdirectories. NOTE: this may limit RCA from Microsoft Support. 55 | 56 | - exceptions - delete any .dmp files from %FabricLogRoot% (typically d:\SvcFab\Log) and subdirectories. NOTE: this may limit RCA from Microsoft Support. 57 | 58 | ## Resolution 59 | 60 | Ensure you are on a supported version of Service Fabric. 61 | Ensure nodetype is sized correctly for load and application types. 
62 | For production workloads a minimum size of 50 GB temp drive is recommended. 63 | See [Reference](#Reference). 64 | 65 | Depending on cause, there are different actions to resolve issue. 66 | To Determine cause of issue, use [Out Of Diskspace](../Cluster/Out of Diskspace.md) to troubleshoot. 67 | 68 | ## Reference 69 | 70 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-versions 71 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-capacity 72 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-capacity-planning 73 | 74 | -------------------------------------------------------------------------------- /Known_Issues/Nodes FabricNode Certificate Expiration.md: -------------------------------------------------------------------------------- 1 | # FabricNode Certificate Expiration 2 | 3 | [Issue](#Issue) 4 | [Health State](#Health-State) 5 | [Description](#Description) 6 | [Cause](#Cause) 7 | [Mitigation](#Mitigation) 8 | [Resolution](#Resolution) 9 | [Reference](#Reference) 10 | 11 | ## Issue 12 | 13 | Service Fabric Cluster Nodes Health Events Warnings for cluster, server, and client certificates. 14 | 15 | ## Health State 16 | 17 | Warning 18 | 19 | ## Description 20 | 21 | ```text 22 | System.FabricNode Certificate_cluster Mon, 11 May 2020 08:43:09 GMT Infinity 132336601893388932 false false 23 | Certificate expiration: thumbprint = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, 24 | expiration = 2020-06-07 18:22:33.000, 25 | remaining lifetime is 27:9:39:23.661, 26 | please refresh ahead of time to avoid catastrophic failure. 27 | Warning threshold Security/CertificateExpirySafetyMargin is configured at 30:0:00:00.000, 28 | if needed, you can adjust it to fit your refresh process. 29 | ``` 30 | 31 | ## Cause 32 | 33 | A warning will be displayed when certificate expiration time is below threshold. Default threshold is 30 days. 
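The threshold comparison described in the Cause section can be sketched as follows. This is an illustrative Python snippet, not Service Fabric code: the function names `parse_sf_timespan` and `is_expiry_warning` are hypothetical, and the snippet assumes the `days:hours:minutes:seconds` format shown in the sample health event above.

```python
from datetime import timedelta


def parse_sf_timespan(text: str) -> timedelta:
    """Parse a 'days:hours:minutes:seconds' string such as '27:9:39:23.661'."""
    days, hours, minutes, seconds = text.strip().split(":")
    return timedelta(
        days=int(days),
        hours=int(hours),
        minutes=int(minutes),
        seconds=float(seconds),
    )


def is_expiry_warning(remaining: str, safety_margin: str = "30:0:00:00.000") -> bool:
    """Return True when the remaining lifetime is below the safety margin."""
    return parse_sf_timespan(remaining) < parse_sf_timespan(safety_margin)


if __name__ == "__main__":
    # Values copied from the sample health event in the Description section.
    print(is_expiry_warning("27:9:39:23.661"))
```

The default `safety_margin` mirrors the `Security/CertificateExpirySafetyMargin` value quoted in the health event; pass a different string if your cluster configures another margin.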
34 | 35 | **NOTE: It is critical to update certificate before expiration and to allow a time buffer in case update fails.** 36 | 37 | ## Mitigation 38 | 39 | If the certificate has not yet expired, [renew certificate](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-rollover-cert-cn) before expiration. 40 | 41 | If the certificate is self-signed and near expiration and due to health issues the certificate rollover cannot be completed, you may be able to enable the security setting [AcceptExpiredPinnedClusterCertificate](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Security/How%20to%20recover%20from%20an%20Expired%20Cluster%20Certificate.md) to allow continued access to cluster after expiration. 42 | 43 | ## Resolution 44 | 45 | Renew certificate. 46 | If certificate is expired, see: 47 | 48 | [Fix Expired Cluster Certificate Automated Script](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Security/Fix%20Expired%20Cluster%20Certificate%20Automated%20Script.md) 49 | [Fix Expired Cluster Certificate Manual Steps](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Security/Fix%20Expired%20Cluster%20Certificate%20Manual%20Steps.md) 50 | [How to recover from an Expired Cluster Certificate](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Security/How%20to%20recover%20from%20an%20Expired%20Cluster%20Certificate.md) 51 | 52 | ## Reference 53 | 54 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-security-update-certs-azure 55 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-security 56 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-rollover-cert-cn 57 | https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-change-cert-thumbprint-to-cn 58 | 
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-connect-to-secure-cluster 59 | -------------------------------------------------------------------------------- /Known_Issues/Nodes unhealthy due to a FabricDCA exception.md: -------------------------------------------------------------------------------- 1 | # Node(s) unhealthy due to FabricDCA exception in SourceId='FabricDCA', Property='DataCollectionAgent' 2 | 3 | A known issue with the Data Collection Agent (DCA) has been identified which can potentially degrade node health in a Service Fabric cluster. During initialization of the Service Fabric agents, DCA may fail to create the agents for legitimate reasons. If this happens, DCA raises the health warning shown below and continues with the rest of its processing. The Service Fabric team is working to improve this experience in a future version; currently the only workaround is to manually restart the Fabric services on the affected nodes. 4 | 5 | ![FabricDCA_001](../media/FabricDCA001.png) 6 | 7 | ## Symptoms 8 | - Unhealthy Nodes 9 | - Can cause upgrades to fail 10 | 11 | Error event: SourceId='FabricDCA', Property='DataCollectionAgent'. 12 | Exception occured while creating an object of type FabricDCA.AzureBlobEtwCsvUploader (assembly AzureFileUploader, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35) for creating consumer AzureBlobServiceFabricEtw. 13 | Exception occured while creating an object of type FabricDCA.AzureTableQueryableEventUploader (assembly AzureTableUploader, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35) for creating consumer AzureTableServiceFabricEtwQueryable. 14 | Exception occured while creating an object of type FabricDCA.AzureTableOperationalEventUploader (assembly AzureTableUploader, Version=6.0.0.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35) for creating consumer AzureTableServiceFabricEtwOperational. 
15 | 16 | ## Mitigation 17 | 18 | - Issue a rolling restart for each affected node (in SFX) 19 | 20 | ![FabricDCA_002](../media/FabricDCA002.png) -------------------------------------------------------------------------------- /Known_Issues/Service Fabric 7.1 High CPU Fabric.exe One Node.md: -------------------------------------------------------------------------------- 1 | # Service Fabric 7.1 High CPU Fabric.exe One Node 2 | 3 | [Issue](#Issue) 4 | [Symptoms](#Symptoms) 5 | [Cause](#Cause) 6 | [Impact](#Impact) 7 | [Mitigation](#Mitigation) 8 | [Resolution](#Resolution) 9 | 10 | ## Issue 11 | 12 | Starting in Service Fabric version 7.1, you may experience high CPU usage by the fabric.exe process on one node in the cluster. 13 | This applies to Service Fabric Runtime 7.1 versions prior to CU5; you can review the version number in the [Service Fabric 7.1 CU5 Release Notes](https://github.com/microsoft/service-fabric/blob/master/release_notes/Service-Fabric-71CU5-releasenotes.md). 14 | 15 | If you had previously applied the original mitigation, please move to the [Mitigation](#Mitigation) section. 16 | 17 | 18 | ## Symptoms 19 | 20 | The node with high CPU will be the 'primary' for the 'fabric:/System/FailoverManagerService' service. 21 | In Service Fabric support logs, the following trace may indicate this issue by showing high iteration and transition counts. 22 | Example trace message: 23 | 24 | ```json 25 | "Level": Informational, 26 | "Type": PLB.Searcher, 27 | "Text": Search of balancing completed with 1243000 total iterations and 538471 total transitions and 0 positive transitions, no better solution found, 28 | "NodeName": _NT_0, 29 | "FileType": fabric, 30 | ``` 31 | 32 | ## Cause 33 | 34 | A change in Placement and Load Balancing calculations in earlier versions of Service Fabric release 7.1 introduced this issue. 35 | 36 | ## Impact 37 | 38 | This issue should not have any impact on the cluster environment other than high CPU for fabric.exe on one node. 
39 | 40 | ## Mitigation 41 | 42 | This issue was fixed in Service Fabric 7.1 CU5, and the mitigation settings should be removed if you have upgraded your cluster to 7.1 CU5 (or higher). 43 | 44 | Please remove the PlacementAndLoadBalancing setting and parameters from the fabricSettings section of the Service Fabric resource and patch the deployment using PowerShell or resources.azure.com. Refer to [Service Fabric Cluster Config Upgrade](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-config-upgrade-azure) for modifying and deploying settings. 45 | 46 | ```json 47 | // "fabricSettings": [ 48 | // { 49 | // ... 50 | // }, 51 | { 52 | "name": "PlacementAndLoadBalancing", 53 | "parameters": [ 54 | { 55 | "name": "MovementPerPartitionPerRunLimit", 56 | "value": "0" 57 | }, 58 | { 59 | "name": "MovementPerPartitionPerRunLimitFallbackThreshold", 60 | "value": "-1" 61 | } 62 | ] 63 | } 64 | // ], 65 | 66 | ``` 67 | 68 | ## Resolution 69 | 70 | Upgrade the Service Fabric version of the cluster to 7.1 CU5 or later, as listed in the [Service Fabric 7.1 CU5 Release Notes](https://github.com/microsoft/service-fabric/blob/master/release_notes/Service-Fabric-71CU5-releasenotes.md). 
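As a rough illustration of the Mitigation step, the sketch below removes the `PlacementAndLoadBalancing` section from a `fabricSettings` list taken from a local copy of the cluster resource. The function name `remove_fabric_setting` and the sample data are hypothetical; only the section and parameter names come from the JSON shown in the Mitigation section. It edits data in memory only, so deploying the change still follows the config upgrade documentation linked above.

```python
import json


def remove_fabric_setting(cluster_properties: dict, section_name: str) -> dict:
    """Return a copy of the properties without the named fabricSettings section."""
    updated = dict(cluster_properties)
    updated["fabricSettings"] = [
        section
        for section in cluster_properties.get("fabricSettings", [])
        if section.get("name") != section_name
    ]
    return updated


if __name__ == "__main__":
    # Hypothetical, trimmed-down example of the cluster resource "properties" object.
    properties = {
        "fabricSettings": [
            {"name": "Security", "parameters": []},
            {
                "name": "PlacementAndLoadBalancing",
                "parameters": [
                    {"name": "MovementPerPartitionPerRunLimit", "value": "0"},
                    {"name": "MovementPerPartitionPerRunLimitFallbackThreshold", "value": "-1"},
                ],
            },
        ]
    }
    cleaned = remove_fabric_setting(properties, "PlacementAndLoadBalancing")
    print(json.dumps(cleaned, indent=2))
```

If the section also carries parameters you set deliberately for other reasons, remove only the two mitigation parameters rather than the whole section.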
-------------------------------------------------------------------------------- /Known_Issues/Service Fabric 8.2 Upgrade or Certificate Rotation Failure due to ImageStoreService Error.md: -------------------------------------------------------------------------------- 1 | # Service Fabric 8.2, Cluster certificate changes or Certificate Rollover May Fail due to ImageStoreService Error or Repair task(s) stuck in a restoring state 2 | 3 | ## Applies to 4 | - Clusters on [8.2 CU2](https://github.com/microsoft/service-fabric/blob/master/release_notes/Service_Fabric_ReleaseNotes_82CU2.md) (version 8.2.1486.9590) 5 | 6 | ## Symptoms 7 | 8 | - Cluster security - Adding a new secondary certificate or modifying the existing cluster certificate configuration may cause the upgrade to get stuck on Upgrade Domain (UD) 0. 9 | - Connection authentication failures with error: FABRIC_E_SERVER_AUTHENTICATION_FAILED: CertificateNotMatched 10 | - 'fabric:/System/ImageStoreService' is in a 'Warning' or 'Error' state. 11 | - Some or all secondary replicas in ImageStoreService are down. 12 | - Service Fabric Explorer (SFX) Warning Event: 00000000-0000-0000-0000-000000003000 SafetyCheck: EnsurePartitionQuorum 13 | - SFX Error Event: 00000000-0000-0000-0000-000000003000 Partition is in quorum loss 14 | - Repair Task(s) are stuck in Restoring state after cert swap. 15 | 16 | ![](../media/sfx-imagestore-quorum-loss.png) 17 | 18 | ## Possible Mitigations 19 | 20 | One of the following mitigations can be applied: 21 | 22 | - Option 1 - more complex, less impactful - [RDP](https://docs.microsoft.com/azure/service-fabric/service-fabric-cluster-remote-connect-to-azure-cluster-node) to nodes with 'Down' ImageStoreService partitions. Open Task Manager, right-click FileStoreService.exe, and terminate the process. 
23 | 24 | ![](../media/task-manager-filestoreservice-terminate.png) 25 | - Option 2 - less complex, more impactful - From SFX, restart each node with a down partition *one at a time*, ensuring the prior node restart is complete. 26 | 27 | ![](../media/sfx-node-restart.png) 28 | 29 | - Option 3 - If the matching symptom was Repair Task(s) stuck in Restoring state, [RDP](https://docs.microsoft.com/azure/service-fabric/service-fabric-cluster-remote-connect-to-azure-cluster-node) to the node hosting the Primary replica of the RepairManager service. Open Task Manager, right-click RepairManagerService.exe, and terminate the process. 30 | 31 | ## Resolution 32 | 33 | - The fix for this issue was released in Service Fabric version 8.2 CU2.1 (8.2.1571); see the [Microsoft Azure Service Fabric 8.2 Cumulative Update 2.1 Release Notes](https://github.com/microsoft/service-fabric/blob/master/release_notes/Service_Fabric_ReleaseNotes_82CU21.md). 34 | - This version is now available in all Azure regions. 35 | -------------------------------------------------------------------------------- /Known_Issues/Upgrade to Service Fabric 7.1 fails with certificate configuration errors.md: -------------------------------------------------------------------------------- 1 | # Upgrade to Service Fabric 7.1 fails with certificate configuration errors 2 | 3 | Upgrade to Service Fabric 7.1 fails and rolls back with application health errors trying to configure application certificates. 4 | 5 | ## Symptoms 6 | - The certificate configuration error is highlighted in the sample output below. The cause of the failure can be found in SFX under the details tab for the cluster. 7 | 8 | | Kind | Health State | Description | 9 | |---|---|---| 10 | | Applications | Error | 1% (1/100) applications are unhealthy. The evaluation tolerates 0% unhealthy applications. | 11 | | Application | Error | Application 'fabric:/Application' is in Error. | 12 | | Deployed Applications | Error | 100% (1/1) deployed applications are unhealthy. 
The evaluation tolerates 0% unhealthy deployed applications. | 13 | | Deployed Application | Error | Deployed application on node 'NodeName' is in Error. | 14 | | **Event** | **Error** | **'System.Hosting' reported Error for property 'Activation:1.0'. There was an error during activation.Failed to configure certificate permissions. Error E_FAIL.** | 15 | | DeployedServicePackages | Error | 100% (1/1) deployed service packages are unhealthy. | 16 | | DeployedServicePackage | Error | Service package for manifest 'ServicePkg' and service package activation ID '' is in Error. | 17 | | Event | Error | | 18 | | | | | 19 | 20 | 21 | ## Root Cause Analysis 22 | - Service Fabric configures all certificates specified for endpoints in the Application manifest so that services can access them. If Service Fabric does not find a certificate on the node, the activation process fails. 23 | - Prior to 7.1, if a certificate was not found, the error was ignored and caused failures later in the activation process with a cryptic error. Starting from 7.1, all certificates mentioned in the Application manifest are required to be on all the nodes where the application is deployed, regardless of whether the certificate is used by the services. 24 | 25 | ## Possible Mitigations 26 | 27 | One of the following mitigations can be applied: 28 | 29 | 1. Install all certificates referenced in Application manifests on all of the nodetypes where the Application package can be deployed. 30 | 2. Provision any certificates required on the nodes. Modify the Application manifest to remove references to certificates not required by application services and redeploy the application package. 
31 | 32 | After applying one of the above mitigations, retry the upgrade to 7.1. 33 | 34 | 35 | ## Additional information 36 | 37 | -------------------------------------------------------------------------------- /LICENSE-CODE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | Copyright (c) Microsoft Corporation 3 | 4 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and 5 | associated documentation files (the "Software"), to deal in the Software without restriction, 6 | including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, 7 | and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, 8 | subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all copies or substantial 11 | portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT 14 | NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 15 | IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, 16 | WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 17 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Service Fabric Troubleshooting Guides 3 | This is a public repository for all of Service Fabric's Troubleshooting guides, and is intended to provide a central location for community-driven troubleshooting content. 
This is the material that is referenced by Customer Support Services when a ticket is created, by Service Fabric Site Reliability Engineers responding to an incident, and by users discovering resolutions to active system issues on their own. 4 | 5 | ## Table of Contents 6 | Troubleshooting guides are grouped by category and stored in correspondingly named subdirectories. Each directory contains a README that lists the guides that are commonly used and exposed through the portal as recommendations during the ticket creation process. The following categories of guides are stored in correspondingly named directories: 7 | 8 | * [Security](./Security/README.md) - Certificates, KeyVault, Azure Active Directory, Permissions. 9 | * [Cluster](./Cluster/README.md) - Scaling, Deployments, Nodes, Patch Orchestration. 10 | * [Deployment](./Deployment/README.md) - SF Internal Components During Deployment 11 | 12 | 13 | ## Contributing 14 | 15 | This project welcomes contributions and suggestions. Most contributions require you to agree to a 16 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us 17 | the rights to use your contribution. For details, visit https://cla.microsoft.com. 18 | 19 | When you submit a pull request, a CLA-bot will automatically determine whether you need to provide 20 | a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions 21 | provided by the bot. You will only need to do this once across all repos using our CLA. 22 | 23 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). 24 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or 25 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments. 
26 | 27 | ## Legal Notices 28 | 29 | Microsoft and any contributors grant you a license to the Microsoft documentation and other content 30 | in this repository under the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode), 31 | see the [LICENSE](LICENSE) file, and grant you a license to any code in the repository under the [MIT License](https://opensource.org/licenses/MIT), see the 32 | [LICENSE-CODE](LICENSE-CODE) file. 33 | 34 | Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation 35 | may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. 36 | The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. 37 | Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653. 38 | 39 | Privacy information can be found at https://privacy.microsoft.com/en-us/ 40 | 41 | Microsoft and any contributors reserve all others rights, whether under their respective copyrights, patents, 42 | or trademarks, whether by implication, estoppel or otherwise. 43 | -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | ## Security 4 | 5 | Microsoft takes the security of our software products and services seriously, which includes all source code repositories managed through our GitHub organizations, which include [Microsoft](https://github.com/Microsoft), [Azure](https://github.com/Azure), [DotNet](https://github.com/dotnet), [AspNet](https://github.com/aspnet), [Xamarin](https://github.com/xamarin), and [our GitHub organizations](https://opensource.microsoft.com/). 
6 | 7 | If you believe you have found a security vulnerability in any Microsoft-owned repository that meets [Microsoft's definition of a security vulnerability](https://docs.microsoft.com/en-us/previous-versions/tn-archive/cc751383(v=technet.10)), please report it to us as described below. 8 | 9 | ## Reporting Security Issues 10 | 11 | **Please do not report security vulnerabilities through public GitHub issues.** 12 | 13 | Instead, please report them to the Microsoft Security Response Center (MSRC) at [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report). 14 | 15 | If you prefer to submit without logging in, send email to [secure@microsoft.com](mailto:secure@microsoft.com). If possible, encrypt your message with our PGP key; please download it from the [Microsoft Security Response Center PGP Key page](https://www.microsoft.com/en-us/msrc/pgp-key-msrc). 16 | 17 | You should receive a response within 24 hours. If for some reason you do not, please follow up via email to ensure we received your original message. Additional information can be found at [microsoft.com/msrc](https://www.microsoft.com/msrc). 18 | 19 | Please include the requested information listed below (as much as you can provide) to help us better understand the nature and scope of the possible issue: 20 | 21 | * Type of issue (e.g. buffer overflow, SQL injection, cross-site scripting, etc.) 22 | * Full paths of source file(s) related to the manifestation of the issue 23 | * The location of the affected source code (tag/branch/commit or direct URL) 24 | * Any special configuration required to reproduce the issue 25 | * Step-by-step instructions to reproduce the issue 26 | * Proof-of-concept or exploit code (if possible) 27 | * Impact of the issue, including how an attacker might exploit the issue 28 | 29 | This information will help us triage your report more quickly. 
30 | 31 | If you are reporting for a bug bounty, more complete reports can contribute to a higher bounty award. Please visit our [Microsoft Bug Bounty Program](https://microsoft.com/msrc/bounty) page for more details about our active programs. 32 | 33 | ## Preferred Languages 34 | 35 | We prefer all communications to be in English. 36 | 37 | ## Policy 38 | 39 | Microsoft follows the principle of [Coordinated Vulnerability Disclosure](https://www.microsoft.com/en-us/msrc/cvd). 40 | 41 | -------------------------------------------------------------------------------- /Scripts/Add_New_Cert_To_VMSS.ps1: -------------------------------------------------------------------------------- 1 | # [AzureRM.ServiceFabric module], latest available @ https://www.powershellgallery.com/packages/AzureRM.ServiceFabric/0.3.8 2 | # 3 | #These new PowerShell commands are the preferred method to add/remove or manage certificates in the cluster 4 | # Cmdlet          Add-AzureRmServiceFabricApplicationCertificate     0.2.0      AzureRM.ServiceFabric 5 | # Cmdlet          Add-AzureRmServiceFabricClientCertificate          0.2.0      AzureRM.ServiceFabric 6 | # Cmdlet          Add-AzureRmServiceFabricClusterCertificate         0.2.0      AzureRM.ServiceFabric 7 | # Cmdlet          Remove-AzureRmServiceFabricClientCertificate       0.2.0      AzureRM.ServiceFabric 8 | # Cmdlet          Remove-AzureRmServiceFabricClusterCertificate      0.2.0      AzureRM.ServiceFabric 9 | # 10 | # 11 | #The following is a PowerShell Script to Achieve this: 12 | # 13 | # For Windows Cluster this script should run as-is 14 | # For Linux Clusters, remove -CertificateStore "My" parameter from New-AzureRmVmssVaultCertificateConfig function 15 | # 16 | # Certificate Configuration 17 | # Couldn't add or renew certificate 18 | 19 | Param( 20 | [string] [Parameter(Mandatory=$true)] $KeyVaultResourceGroupName, 21 | [string] [Parameter(Mandatory=$true)] $VmssResourceGroupName, 22 | [string] [Parameter(Mandatory=$true)] 
$VaultName, 23 | [string] [Parameter(Mandatory=$true)] $VmssName, 24 | [string] [Parameter(Mandatory=$true)] $SubscriptionId 25 | ,[string] [Parameter(Mandatory=$true)] $CertificateUrl 26 | ) 27 | 28 | Set-StrictMode -Version 3 29 | 30 | $ErrorActionPreference = "Stop" 31 | 32 | # Login 33 | Login-AzureRmAccount -SubscriptionId $SubscriptionId 34 | $sourceVaultId = "/subscriptions/$SubscriptionId/resourceGroups/$KeyVaultResourceGroupName/providers/Microsoft.KeyVault/vaults/$VaultName" 35 | $sourceVaultId 36 | $certConfig = New-AzureRmVmssVaultCertificateConfig -CertificateUrl $CertificateUrl -CertificateStore "My" 37 | $certConfig 38 | # Get current vmss 39 | $vmss = Get-AzureRmVmss -ResourceGroupName $VmssResourceGroupName -VMScaleSetName $VmssName 40 | $vmss 41 | # add new secret 42 | $vmss = Add-AzureRmVmssSecret -VirtualMachineScaleSet $vmss -SourceVaultId $sourceVaultId -VaultCertificate $certConfig 43 | $vmss 44 | # update VMSS 45 | Update-AzureRmVmss -ResourceGroupName $VmssResourceGroupName -Name $VmssName -VirtualMachineScaleSet $vmss -Verbose 46 | -------------------------------------------------------------------------------- /Scripts/CreateKeyVaultAndCertificateForServiceFabric.ps1: -------------------------------------------------------------------------------- 1 | Param( 2 | [string] [Parameter(Mandatory=$true)] $SubscriptionId, 3 | [string] [Parameter(Mandatory=$true)] $Location, 4 | [string] [Parameter(Mandatory=$true)] $ResourceGroup, 5 | [string] [Parameter(Mandatory=$true)] $VaultName, 6 | [string] [Parameter(Mandatory=$true)] $CertificateName, 7 | [string] [Parameter(Mandatory=$true)] $CommonName 8 | ) 9 | 10 | Set-StrictMode -Version 3 11 | 12 | function Check-Session () { 13 | $Error.Clear() 14 | 15 | #if context already exist 16 | Get-AzureRmContext -ErrorAction Continue 17 | foreach ($eacherror in $Error) { 18 | if ($eacherror.Exception.ToString() -like "*Run Login-AzureRmAccount to login.*") { 19 | Login-AzureRmAccount 20 | } 21 | } 22 | 23 
| $Error.Clear(); 24 | } 25 | 26 | $ErrorActionPreference = "Stop" 27 | 28 | Check-Session 29 | Select-AzureRmSubscription -SubscriptionId $subscriptionId -ErrorAction Stop 30 | 31 | New-AzureRmResourceGroup -Name $ResourceGroup -Location $location -Force 32 | 33 | if(!(Get-AzureRmResource -ResourceName $VaultName -ResourceGroupName $ResourceGroup)) { 34 | New-AzureRmKeyVault -VaultName $VaultName -ResourceGroupName $ResourceGroup -Location $Location -EnabledForDeployment 35 | } 36 | 37 | $policy = New-AzureKeyVaultCertificatePolicy -DnsName $CommonName -IssuerName Self -ValidityInMonths 12 38 | Add-AzureKeyVaultCertificate -VaultName $VaultName -Name $CertificateName -CertificatePolicy $policy 39 | 40 | Write-Host "operation complete" -------------------------------------------------------------------------------- /Scripts/Readme.md: -------------------------------------------------------------------------------- 1 | ## **Service Fabric Support Scripts** 2 | This repository contains scripts that are used by Microsoft Customer Service and Support (CSS) to aid in supporting and troubleshooting Service Fabric clusters. A goal of this repository is to provide transparency to Azure customers who want to better understand the scripts being used and to make these scripts accessible for self-help scenarios outside of Microsoft support. 3 | 4 | These scripts are generally intended to address a particular scenario and are not published as samples of best practices for inclusion in applications. 5 | 6 | - For code and scripting examples of best practice see [OneCode](http://aka.ms/onecodesamples), [OneScript](http://aka.ms/onescriptsamples), and the [Microsoft Azure Script Center](https://azure.microsoft.com/en-us/documentation/scripts/). 7 | - For documentation on how to build and deploy applications to Microsoft Azure, please see the [Microsoft Azure Documentation Center](https://azure.microsoft.com/en-us/documentation/). 
8 | - For more information about support in Microsoft Azure see http://azure.microsoft.com/support 9 | 10 | ### **Requirements** 11 | Each script will describe its own dependencies for execution. Generally, you will need an [Azure subscription](https://azure.microsoft.com/en-us/pricing/) as well as the script environments and any tools used by the script you wish to execute. This may include: 12 | 13 | - Azure PowerShell: [How to install and configure Azure PowerShell](https://azure.microsoft.com/en-us/documentation/articles/powershell-install-configure/) 14 | - Azure CLI: [How to install and configure Azure Command-Line Interface (CLI)](https://azure.microsoft.com/en-us/documentation/articles/xplat-cli-install/) 15 | - Python: [Azure Python Developer Center](https://azure.microsoft.com/en-us/develop/python/) 16 | 17 | A great place to find such an environment is the [Azure Cloud Shell](https://azure.microsoft.com/en-us/features/cloud-shell/). 18 | 19 | ### **Find Your Way** 20 | The repo is organized by support scenario and each support topic contains a description of the articles in a [readme.md](../README.md). The scripts in this folder act as resources for various articles in this repo, which provide details on how to use them effectively. 21 | 22 | ### **Liability** 23 | As described in the [MIT license](../LICENSE-CODE), these scripts are provided as-is with no warranty or liability associated with their use. 24 | 25 | ### **Provide Feedback** 26 | We value your input. If you encounter problems with the scripts or have ideas on how they can be improved, please file an issue in the [Issues](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/issues) section of the project. 
27 | 28 | ### **Known Issues** 29 | -------------------------------------------------------------------------------- /Scripts/Remove-Unreferenced-Replica-Files/README.md: -------------------------------------------------------------------------------- 1 | # Remove-Unreferenced-Replica-Files 2 | 3 | When a replica is force removed, corresponding checkpoint files are not removed from the file system. This tool removes leaked files (state manager, state provider and dedicated log files) given a node name. 4 | 5 | **Parameters:** 6 | 1. NodeName: Name of the node from where the leaked files are being removed. 7 | 2. Verbose: Displays unreferenced files corresponding to leaked replicas. 8 | 3. WhatIf: Shows what would happen if the cmdlet runs. 9 | 10 | **Examples:** 11 | - Deletes all leaked files. 12 | 13 | .\Remove-UnreferencedReplicaFiles.ps1 14 | -NodeName 15 | 16 | - Shows leaked replicas indicating files corresponding to it would be deleted if the cmdlet runs. 17 | 18 | .\Remove-UnreferencedReplicaFiles.ps1 19 | -NodeName 20 | -WhatIf 21 | 22 | - Deletes and displays leaked files corresponding to the removed replicas. 23 | 24 | .\Remove-UnreferencedReplicaFiles.ps1 25 | -NodeName 26 | -Verbose 27 | -------------------------------------------------------------------------------- /Scripts/SetupAnonymousShare.ps1: -------------------------------------------------------------------------------- 1 | # 2 | # Example configuring a Standalone cluster Diagnostics Share (anonymous share) on non-domain joined nodes \\node1\DiagnosticsStore 3 | # 4 | # "diagnosticsStore": 5 | # { 6 | # "metadata": "Please replace the diagnostics file share with an actual file share accessible from all cluster machines. For example, \\\\machine1\\DiagnosticsStore.", 7 | # "dataDeletionAgeInDays": "21", 8 | # "storeType": "FileShare", 9 | # "connectionstring": "\\\\node1\\DiagnosticsStore" 10 | # }, 11 | # 12 | # Instructions: 13 | # 1. 
Execute this script on node1 to create and configure the share 14 | # 2. Update cluster diagnostics configuration (this step is required even if the connection string configured already) 15 | # a. edit ClusterConfig.X509.MultiMachine.json and increment the config version, e.g. "clusterConfigurationVersion": "1.0.1", 16 | # b. edit ClusterConfig.X509.MultiMachine.json and configure the diagnosticsStore connectionstring property, 17 | # c. start a configuration upgrade: Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath .\ClusterConfig.X509.MultiMachine.json 18 | # 19 | 20 | # enable Guest account 21 | net user guest /active:yes 22 | 23 | # Create our Shared Folder and Share Name 24 | $FolderPath = "c:\DiagnosticsShare" 25 | $ShareName = "DiagnosticsStore" 26 | 27 | If (!(TEST-PATH $FolderPath)) { 28 | New-Item -type directory -Path $FolderPath 29 | } 30 | 31 | # Configure ACL's to allow all anonymous users to have full control of the Share path 32 | $DiagShareAcl = Get-Acl -Path $FolderPath 33 | 34 | $colRightsEveryone = [System.Security.AccessControl.FileSystemRights]"FullControl" 35 | $permissionEveryone = "Everyone",$colRightsEveryone,"ContainerInherit,ObjectInherit","None","Allow" 36 | $accessRuleEveryone = New-Object System.Security.AccessControl.FileSystemAccessRule $permissionEveryone 37 | $DiagShareAcl.AddAccessRule($accessRuleEveryone) 38 | 39 | $colRightsEveryone = [System.Security.AccessControl.FileSystemRights]"FullControl" 40 | $permissionEveryone = "ANONYMOUS LOGON",$colRightsEveryone,"ContainerInherit,ObjectInherit","None","Allow" 41 | $accessRuleEveryone = New-Object System.Security.AccessControl.FileSystemAccessRule $permissionEveryone 42 | $DiagShareAcl.AddAccessRule($accessRuleEveryone) 43 | 44 | $colRightsEveryone = [System.Security.AccessControl.FileSystemRights]"FullControl" 45 | $permissionEveryone = "Guest",$colRightsEveryone,"ContainerInherit,ObjectInherit","None","Allow" 46 | $accessRuleEveryone = New-Object 
System.Security.AccessControl.FileSystemAccessRule $permissionEveryone 47 | $DiagShareAcl.AddAccessRule($accessRuleEveryone) 48 | 49 | $DiagShareAcl | Set-Acl $FolderPath 50 | 51 | # Share the folder with these specific users 52 | net share $ShareName=$FolderPath /grant:Administrators`,FULL /grant:Everyone`,FULL /grant:"Anonymous Logon"`,FULL /grant:Guest`,FULL 53 | 54 | # update local policy to enable anonymous access 55 | Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Control\LSA -Name EveryoneIncludesAnonymous -Value 1 56 | Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Services\LanManServer\Parameters -Name RestrictNullSessAccess -Value 0 57 | Set-ItemProperty -Path HKLM:\SYSTEM\CurrentControlSet\Services\LanManServer\Parameters -Name NullSessionShares -Value $ShareName 58 | -------------------------------------------------------------------------------- /Scripts/install-dotnet-48.ps1: -------------------------------------------------------------------------------- 1 | <# 2 | .SYNOPSIS 3 | example script to install dotnet 4.8 on virtual machine scaleset using custom script extension 4 | use custom script extension in ARM template 5 | save file to url that vmss nodes have access to during provisioning 6 | 7 | Microsoft Privacy Statement: https://privacy.microsoft.com/en-US/privacystatement 8 | 9 | MIT License 10 | 11 | Copyright (c) Microsoft Corporation. All rights reserved. 
12 | 13 | Permission is hereby granted, free of charge, to any person obtaining a copy 14 | of this software and associated documentation files (the "Software"), to deal 15 | in the Software without restriction, including without limitation the rights 16 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 17 | copies of the Software, and to permit persons to whom the Software is 18 | furnished to do so, subject to the following conditions: 19 | 20 | The above copyright notice and this permission notice shall be included in all 21 | copies or substantial portions of the Software. 22 | 23 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 24 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 25 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 26 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 27 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 28 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 29 | SOFTWARE 30 | 31 | 32 | .NOTES 33 | v 1.0 34 | use: https://dotnet.microsoft.com/download to get download links 35 | 36 | .LINK 37 | [net.servicePointManager]::Expect100Continue = $true;[net.servicePointManager]::SecurityProtocol = [net.securityProtocolType]::Tls12; 38 | invoke-webRequest "https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/master/Scripts/install-dotnet-48.ps1" -outFile "$pwd\install-dotnet-48.ps1"; 39 | #> 40 | 41 | param( 42 | [switch]$restart 43 | ) 44 | 45 | $url = "https://go.microsoft.com/fwlink/?linkid=2088631" 46 | $registryPath = "HKLM:\SOFTWARE\Microsoft\NET Framework Setup\NDP\v4\Full" 47 | $installedVersion = [version]((Get-ItemProperty -Path $registryPath -Name Version).Version) 48 | $installedVersion 49 | 50 | if($installedVersion -ge [version]("4.8")) { 51 | write-host "dotnet 4.8 already installed" 52 | return 53 | } 54 | 55 | 
$path = "$psscriptroot\ndp48-x86-x64-allos-enu.exe" 56 | $path 57 | 58 | if(!(test-path $path)) { 59 | "Downloading [$url]`nSaving at [$path]" 60 | (new-object net.webClient).DownloadFile($url, $path) 61 | } 62 | 63 | $argumentList = "/q /log $psscriptroot\install.log" 64 | if (!$restart) { $argumentList += " /norestart" } 65 | 66 | Invoke-Command -ScriptBlock { Start-Process -FilePath $path -ArgumentList $argumentList -Wait -PassThru } 67 | Write-Host (Get-ItemProperty -Path $registryPath -Name Version).Version -------------------------------------------------------------------------------- /Security/Authentication Issue with AAD.md: -------------------------------------------------------------------------------- 1 | ## Symptom 2 | AAD Authentication fails on SFX (Service Fabric Explorer). 3 | 4 | According to https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-cluster-creation-via-arm#assign-users-to-roles, users must be assigned to either the 'ReadOnly' or the 'Admin' role; any other role is invalid. 5 | 6 | In this case, CSS reviewed the traces and found that the user was assigned to the 'ReadWriteUser' role, so authentication failed because of the invalid role assignment. Mitigate by changing the user's role assignment to either ReadOnly or Admin. 
7 | 8 | | Date | Time | Type | Process | Thread | Text | 9 | |---|---|---|---|---|---| 10 | | 2018-5-11 | 00:58:11.740 | SystemFabric.AAD.Server | 980 | 3580 | Claim: name: xxxx xxxx | 11 | | 2018-5-11 | 00:58:11.740 | SystemFabric.AAD.Server | 980 | 3580 | Claim: nonce: 90b47fd2-eea2-45c7-9973-96bbb73e2f83 | 12 | | 2018-5-11 | 00:58:11.740 | SystemFabric.AAD.Server | 980 | 3580 | Claim: http://schemas.microsoft.com/identity/claims/objectidentifier: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | 13 | | 2018-5-11 | 00:58:11.740 | SystemFabric.AAD.Server | 980 | 3580 | Claim: onprem_sid: S-1-5-21-xxxxxxxxx-xxxxxxxxx-xxxxxxxxxx-xxxxxxxx | 14 | | 2018-5-11 | 00:58:11.740 | SystemFabric.AAD.Server | 980 | 3580 | Claim: http://schemas.microsoft.com/ws/2008/06/identity/claims/role: ReadWriteUser | 15 | | 2018-5-11 | 00:58:11.740 | General.Aad::ServerWrapper | 980 | 3580 | IsAdminRole failed: issuer=https://sts.windows.net/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/ audience=3c1beb77-e0ed-43d8-be02-569450b84d2f roleClaim=http://schemas.microsoft.com/ws/2008/06/identity/claims/role cert=https://login.microsoftonline.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx/federationmetadata/2007-06/federationmetadata.xml error=System.IdentityModel.Tokens.SecurityTokenValidationException: Invalid role: http://schemas.microsoft.com/ws/2008/06/identity/claims/role=ReadWriteUser at System.Fabric.AzureActiveDirectory.Server.ServerUtility.Validate(String expectedIssuer, String expectedAudience, String expectedRoleClaimKey, String expectedAdminRoleValue, String expectedUserRoleValue, String certEndpoint, Int64 certRolloverIntervalTicks, String jwt) at IsAdminRole(Char* expectedIssuer, Char* expectedAudience, Char* expectedRoleClaimKey, Char* expectedAdminRoleValue, Char* expectedUserRoleValue, Char* certEndpoint, Int64 certRolloverCheckIntervalTicks, Char* jwt, Boolean* isAdmin, Int32* expirationSeconds, Char* errorMessageBuffer, Int32 errorMessageBufferSize) | 16 | 
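
When many trace lines have to be reviewed, the role check described above can be scripted. The following is a minimal sketch that assumes the trace text has already been collected into strings shaped like the `Claim:` rows in the table; the sample lines, the variable names and the list of allowed roles are taken from this article for illustration and are not read from the cluster.

```PowerShell
# Sample trace text, shaped like the "Claim:" rows in the table above (illustrative only).
$traceLines = @(
    'Claim: name: xxxx xxxx',
    'Claim: http://schemas.microsoft.com/ws/2008/06/identity/claims/role: ReadWriteUser'
)

# Roles this article names as valid. This list is an assumption taken from the text above.
$allowedRoles = @('Admin', 'ReadOnly')

# Pull the value that follows the role claim key out of each matching line.
$roles = @(
    foreach ($line in $traceLines) {
        if ($line -match 'identity/claims/role: (\S+)') { $Matches[1] }
    }
)

# Any role outside the allowed list explains an "Invalid role" failure in the traces.
$invalidRoles = @($roles | Where-Object { $allowedRoles -notcontains $_ })
$invalidRoles
```

An empty result means every role claim found in the sample matches one of the two role names listed in this article.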
-------------------------------------------------------------------------------- /Security/Change the RDP password for VMSS.md: -------------------------------------------------------------------------------- 1 | ## Change the RDP password for your nodetype (VMSS) 2 | 3 | [How do I reset the password for VMs in my virtual machine scale set](https://docs.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-faq#how-do-i-reset-the-password-for-vms-in-my-virtual-machine-scale-set) 4 | -------------------------------------------------------------------------------- /Security/Create a New Self Signed Certificate.md: -------------------------------------------------------------------------------- 1 | ## Create a New Self Signed Certificate 2 | 3 | You can create self-signed certificates easily using the following PowerShell cmdlet 4 | 5 | ```PowerShell 6 | New-SelfSignedCertificate -NotBefore '2018-05-09' -NotAfter '2018-06-01' -DnsName www.domain-name.eastus.cloudapp.azure.com -CertStoreLocation Cert:\LocalMachine\My -Provider "Microsoft Strong Cryptographic Provider" -KeyExportPolicy ExportableEncrypted 7 | ``` 8 | 9 | Verify cert: 10 | ```Batch 11 | Certutil -v -store my 12 | ``` 13 | 14 | SF will not be able to extract and parse the certificate's private key if the certificate was created with an unsuitable CSP 15 | example: 16 | Provider = Microsoft Software Key Storage Provider 17 | 18 | To dump cert details: 19 | ```Batch 20 | Certutil -v -dump -store my 21 | Certutil -v -dump certfilename.pfx > output.txt 22 | ``` -------------------------------------------------------------------------------- /Security/DSC - ACL a certificate using Desired State Configuration.md: -------------------------------------------------------------------------------- 1 | ## DSC Example - ACL a certificate using Desired State Configuration 2 | 3 | 1. 
Create a new Self Signed cert -[CreateKeyVaultAndCertificateForServiceFabric.ps1](../Scripts/CreateKeyVaultAndCertificateForServiceFabric.ps1) -- or get one from CA 4 | 5 | 2. Add the cert to the VMSS - [Add new cert to VMSS](../Scripts/Add_New_Cert_To_VMSS.ps1) 6 | 7 | 3. Edit the attached DSC script, **change the Thumbprint** you want to ACL 8 | 9 | 4. Archive the file into a .ZIP format 10 | 11 | 5. Create a storage account or use an existing storage account to upload the DSC script, I used one of the storage accounts containing the VHD for the primary nodetype 12 | 13 | a. create a new container called 'scripts' 14 | 15 | ![Containers, scripts](../media/dsc_image001.jpg) 16 | 17 | 18 | 6. Upload the .zip file to the container 19 | 20 | > ![Upload zip to container](../media/dsc_image002.jpg) 21 | 22 | 23 | 7. [https://resources.azure.com ](https://resources.azure.com) 24 | 25 | a. Edit your VMSS and add a new VM Extension, update URL to your container and DSC script 26 | 27 | ```json 28 | { 29 | "properties": { 30 | "publisher": "Microsoft.Powershell", 31 | "type": "DSC", 32 | "typeHandlerVersion": "2.9", 33 | "autoUpgradeMinorVersion": true, 34 | "settings": { 35 | "configuration": { 36 | "url": "", 37 | "script": "SetCertificateACL_DSC.ps1", 38 | "function": "SetCertificatePermissions" 39 | } 40 | } 41 | }, 42 | "name": "Microsoft.Powershell.DSC" 43 | } 44 | ``` 45 | 46 | 8. Click PUT, and it should apply the NETWORK\_SERVICE ACL to the certs on all the nodes in the VMSS 47 | 48 | a. RDP to a node and review logs @ C:\\WindowsAzure\\Logs\\Plugins\\Microsoft.Powershell.DSC\\2.25.0.0 49 | 50 | b. 
Verify private-key permissions in certmgr for LocalMachine 51 | 52 | > ![Machine generated alternative text: 53 | > Permissions for ssl.westus.cloudapp.azure.com 54 | > Group or user names: 55 | > SYSTEM 56 | > NETWORK SERVICE 57 | > Administrators (sysOOOOOO\administrators 58 | > Permissions for NETWORK SERVICE 59 | > Special permissions 60 | > Full Control = Allow 61 | > Read = Allow 62 | > For special permissions or advanced settings.](../media/dsc_image003.png) 63 | -------------------------------------------------------------------------------- /Security/Determine Cert bound to a specific port.md: -------------------------------------------------------------------------------- 1 | ## Determine Cert bound to a specific port 2 | 3 | 4 | From command prompt: 5 | 6 | netsh http show sslcert 7 | 8 | 9 | Will show which certificate thumbprint is bound to the port 10 | 11 | 12 | ### SSL Certificate bindings: 13 | --- 14 | 15 | IP:port : 0.0.0.0:19080 16 | Certificate Hash : b24ac3bbf58709de716dbdce4ff31cb5f08aca19 17 | Application ID : {7f7f579c-89a9-412e-b4ef-1ac59cdf2f25} 18 | Certificate Store Name : My 19 | Verify Client Certificate Revocation : Disabled 20 | Verify Revocation Using Cached Client Certificate Only : Disabled 21 | Usage Check : Enabled 22 | Revocation Freshness Time : 0 23 | URL Retrieval Timeout : 0 24 | Ctl Identifier : (null) 25 | Ctl Store Name : (null) 26 | DS Mapper Usage : Disabled 27 | Negotiate Client Certificate : Disabled 28 | 29 | 30 | IP:port : 0.0.0.0:9001 31 | Certificate Hash : 85fb1cb077cd7c78789f5bb61a7a4f3b282008e3 32 | Application ID : {ba9bcb9f-58ac-4f6d-8e53-95f20f6811cd} 33 | Certificate Store Name : My 34 | Verify Client Certificate Revocation : Disabled 35 | Verify Revocation Using Cached Client Certificate Only : Disabled 36 | Usage Check : Enabled 37 | Revocation Freshness Time : 0 38 | URL Retrieval Timeout : 0 39 | Ctl Identifier : (null) 40 | Ctl Store Name : (null) 41 | DS Mapper Usage : Disabled 42 | Negotiate Client 
Certificate : Disabled 43 | -------------------------------------------------------------------------------- /Security/Download certificate from Keyvault in PFX or PEM or CER format.md: -------------------------------------------------------------------------------- 1 | ## Steps to download the pfx file from key vault on http://portal.azure.com 2 | 3 | 1. In the Azure portal, go to the key vault associated with the cluster. 4 | 2. Select Certificates and click on the certificate being used. 5 | 3. You should see a screen with two options: 'Download in CER format' and 'Download in PFX/PEM format'. 6 | 4. Download the certificate in pfx/pem or cer format using the links above. 7 | -------------------------------------------------------------------------------- /Security/Failed to get the Certificates private key.md: -------------------------------------------------------------------------------- 1 | ## [Symptoms] 2 | Entries in the SF Traces and in the Microsoft-ServiceFabric Admin event logs: 3 | 4 | CryptAcquireCertificatePrivateKey failed. Error:0x80090014 5 | ---------------- 6 | Can't get private key filename for certificate. Error: 0x80090014 7 | ---------------- 8 | All tries to get private key filename failed. 9 | ---------------- 10 | Failed to get the Certificate's private key. Thumbprint:AzureServiceFabric-AnonymousClient. Error: E_FAIL 11 | ---------------- 12 | Can't find anonymous certificate. ErrorCode: E_FAIL 13 | ---------------- 14 | Error at AclAnonymousCertificate, ErrorCode E_FAIL 15 | 16 | ## [Analysis] 17 | 18 | * Checking the nodes, we can see that the certificate is present on all nodes and that the NetworkService account has Read rights on the private key. 19 | 20 | * The PID for the errors in the traces is for FabricFAS.exe 21 | 22 | Explanation according to PG: 23 | This AnonymousClient certificate is generated for unsecured clusters. This is a known issue; it should not affect the functionality of any SF component and only shows up as a warning in traces. 
24 | 25 | In other words, you can safely ignore those warnings. 26 | -------------------------------------------------------------------------------- /Security/How to clean up Fabric firewall rules.md: -------------------------------------------------------------------------------- 1 | # How to clean up Fabric firewall rules 2 | 3 | ## **Steps** 4 | 5 | 1. Open Registry 6 | 7 | ```cmd 8 | regedit 9 | ``` 10 | 11 | 2. Export the current firewall rules from the path below into a file named firewallrules.reg 12 | 13 | ```regedit 14 | HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\SharedAccess\Parameters\FirewallPolicy\FirewallRules 15 | ``` 16 | 17 | 3. Filter out the Fabric firewall rules (findstr /V keeps only the lines that do not contain "Fabric") 18 | 19 | ```cmd 20 | type firewallrules.reg | findstr /V Fabric > newfirewallrules.reg 21 | ``` 22 | 23 | 4. Open Registry and import the filtered firewall rules back 24 | 25 | - Regedit 26 | - Rename the current firewall reg key to firewallrulesold 27 | - Import the newfirewallrules.reg file 28 | 29 | 5. Restart Windows Firewall 30 | 31 | - Open PowerShell as administrator 32 | 33 | 34 | ```PowerShell 35 | restart-service MpsSvc 36 | ``` 37 | 38 | - If a reboot is required, run the following command from an elevated command prompt (killing lsass.exe terminates all processes and forces a shutdown; use it only when a normal shutdown request is not going through) 39 | 40 | ```cmd 41 | taskkill /f /im lsass.exe & shutdown /r /t 1 42 | ``` 43 | 44 | 6. Check number of rules 45 | 46 | ```PowerShell 47 | (Get-NetFirewallRule).count 48 | ``` 49 | 50 | 51 | 52 | 53 | 54 | -------------------------------------------------------------------------------- /Security/Install intermediate certificates.md: -------------------------------------------------------------------------------- 1 | # Install intermediate certificates in a Service Fabric cluster 2 | 3 | Currently SF does not accept .CER or P7B certificates uploaded to keyvault. 
4 | Missing intermediate can be installed to each nodetype using a [customscriptextension](https://blogs.technet.microsoft.com/stefan_stranger/2017/07/31/using-azure-custom-script-extension-to-execute-scripts-on-azure-vms/ "Examples of CustomScriptExtensions and DSC") to avoid SSL chain errors 5 | 6 | Upload the script and the intermediates to a storage account and add a CustomScriptExtension extension 7 | 8 | ```json 9 | "virtualMachineProfile": { 10 | "extensionProfile": { 11 | "extensions": [ 12 | { 13 | "type": "Microsoft.Compute/virtualMachines/extensions", 14 | "name": "InstallCertificates", 15 | 16 | "properties": { 17 | "publisher": "Microsoft.Compute", 18 | "type": "CustomScriptExtension", 19 | "typeHandlerVersion": "1.8", 20 | "autoUpgradeMinorVersion": true, 21 | "settings": { 22 | "fileUris": [ 23 | "https://examplestorage1.blob.core.windows.net/sfdeploy/certinst.ps1" 24 | ], 25 | "commandToExecute": "powershell.exe -ExecutionPolicy Unrestricted -File certinst.ps1" 26 | } 27 | } 28 | }, 29 | ``` 30 | 31 | Example contents of the certinst.ps1 PowerShell script 32 | 33 | ```PowerShell 34 | function Install-IntermediateCertificateFromUrl ($certurl) 35 | { 36 | $bytes = (Invoke-WebRequest $certurl -UseBasicParsing).Content 37 | 38 | $store = new-object System.Security.Cryptography.X509Certificates.X509Store "CA", "LocalMachine" 39 | $store.Open([System.Security.Cryptography.X509Certificates.OpenFlags]::ReadWrite) 40 | 41 | $cert = [System.Security.Cryptography.X509Certificates.X509Certificate2]$bytes 42 | $store.Add($cert) 43 | $store.Close() 44 | } 45 | 46 | #intermediate cert urls 47 | $cert1url = "https://examplestorage1.blob.core.windows.net/sfdeploy/abccag2.crt" 48 | $cert2url = "https://examplestorage1.blob.core.windows.net/sfdeploy/defcag2.crt" 49 | ## or download directly from the distribution points provided in Authority Info Access Extensions 50 | 51 | #$cert1url = "http://trust.certglobal.com/abccag2.crt" 52 | #$cert2url = 
"https://www.sample.nl/fileadmin/PKI/PKI_certifcaten/defcag2.crt" 53 | 54 | Install-IntermediateCertificateFromUrl $cert1url 55 | Install-IntermediateCertificateFromUrl $cert2url 56 | ``` -------------------------------------------------------------------------------- /Security/Intermediate Certificate.md: -------------------------------------------------------------------------------- 1 | # Intermediate Certificate 2 | 3 | ## **Symptom** 4 | 5 | During the Certificate Swap, customer noticed that intermediate certificate thumbprint is populating as secondary certificate, which cause the cluster upgrade operation to fail. 6 | 7 | ## **Resolution** 8 | 9 | This is a known issue with the AzureRM ServiceFabric powershell CmdLet, which fixed in latest [AzureRM.ServiceFabric 0.3.3](https://www.powershellgallery.com/packages/AzureRM.ServiceFabric/0.3.3)(To upgrade, please run **Install-Module -Name AzureRM.ServiceFabric -Force**). However if the rolling is completed, please suggest the customer to update the certificate manually from resource explorer. 10 | 11 | ![Machine generated alternative text: Add-AzureRmClusterCertificate command line and error message](../media/IntermediateCerts001.png) 12 | -------------------------------------------------------------------------------- /Security/PowerShell ARM Template Deployment - Swap certificates.md: -------------------------------------------------------------------------------- 1 | ## PowerShell ARM Template Deployment - How to Swap certificates using ARM deployment 2 | 3 | >> Note: This article is only showing how to SWAP certificates already deployed to the cluster, it does not detail how to create or deploy a new secondary which requires multiple deployments. 
4 | >> 5 | >> **Full Steps include** 6 | >>* Create a new certificate and add to Key Vault 7 | >>* Deploy the new certificate to VMSS 8 | >>* Update ServiceFabric cluster resource with new Secondary certificate 9 | >>* Swap the certificate (this article) 10 | >>* Delete the old certificate 11 | >> 12 | >>Please see [Use Azure Resource Explorer to add the Secondary Certificate](./Use%20Azure%20Resource%20Explorer%20to%20add%20the%20Secondary%20Certificate.md) for details on those steps, which could be easily adapted to ARM template deployment. 13 | 14 | ## Swap the certificate 15 | 16 | 1. Export "Automation Scripts" for SF Cluster **Resource Group** from portal 17 | 18 | a. Resource Group --> Automation Script --> Download 19 | 20 | 2. Edit template.json and swap the values of the "thumbprint" and "thumbprintSecondary" properties in the VMSS resource 21 | 22 | ```json 23 | "virtualMachineProfile": { 24 | "osProfile": { 25 | ... 26 | "extensionProfile": { 27 | "extensions": [ 28 | { 29 | "properties": { 30 | "autoUpgradeMinorVersion": true, 31 | "settings": { 32 | ... swap thumbprints in the two certificate properties below 33 | "certificate": { 34 | "thumbprint": "8934E0494979684F2627EE382B5AD84A8FAD6823", 35 | "thumbprintSecondary": "16A2561C8C691B9C683DB1CA06842E7FA85F6726", 36 | "x509StoreName": "My" 37 | } 38 | }, 39 | "publisher": "Microsoft.Azure.ServiceFabric", 40 | "type": "ServiceFabricNode", 41 | "typeHandlerVersion": "1.0" 42 | }, 43 | "name": "wordcount_ServiceFabricNode" 44 | }, 45 | ``` 46 | 47 | 3. Swap the "thumbprint" property value in "certificate" and "certificateSecondary" for the ServiceFabric Cluster resource 48 | 49 | ```json 50 | "type": "Microsoft.ServiceFabric/clusters", 51 | ... 
52 | "properties": { 53 | "provisioningState": "Succeeded", 54 | "clusterId": "d4556f3b-e496-4a46-9f20-3db88fecdf11", 55 | "clusterCodeVersion": "6.3.162.9494", 56 | "clusterState": "Ready", 57 | "managementEndpoint": "https://hughsftest.westus.cloudapp.azure.com:19080", 58 | "clusterEndpoint": "https://westus.servicefabric.azure.com/runtime/clusters/d4556f3b-e496-4a46-9f20-3db88fecdf11", 59 | "certificate": { 60 | "thumbprint": "8934E0494979684F2627EE382B5AD84A8FAD6823", 61 | "x509StoreName": "My" 62 | }, 63 | "certificateSecondary": { 64 | "thumbprint": "16A2561C8C691B9C683DB1CA06842E7FA85F6726", 65 | "x509StoreName": "my" 66 | } 67 | ``` 68 | * save the file 69 | 70 | 4. Edit parameters.json file and delete everything in the "parameters" property which default to null when exported 71 | 72 | ```json 73 | { 74 | "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#", 75 | "contentVersion": "1.0.0.0", 76 | "parameters": { 77 | } 78 | } 79 | ``` 80 | 81 | * save the file 82 | 83 | 84 | 5. Run .\\deploy.ps1 and deploy the template. If everything is correct it should work and swap certificates. 85 | 86 | -------------------------------------------------------------------------------- /Security/Removing a Secondary certificate with expiry date later than Primary certificate expiry date.md: -------------------------------------------------------------------------------- 1 | # Removing a Secondary certificate with expiry date later than Primary certificate expiry date 2 | 3 | *NOTE*: This article applies to ARM/Portal deployed environments and does not apply to other environments. 4 | 5 | There may be situations where it is necessary to remove the secondary certificate (or even the primary), and which is valid longer than the remaining certificate. SF will by default pick the longest valid certificate of all existing matches. The normal procedure to remove the Secondary certificate will not work in this situation. 
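
Before deciding which certificate to remove, it helps to see which certificate on a node is valid the longest. The sketch below lists the certificates in the `LocalMachine\My` store by expiry date and picks the one that expires last. `Get-LatestExpiringCertificate` is a hypothetical helper written for this illustration, not a Service Fabric cmdlet, and it looks at every certificate in the store rather than only those the cluster is configured to match.

```PowerShell
# Hypothetical helper, not part of Service Fabric: returns the unexpired certificate
# with the latest NotAfter date from the objects passed in.
function Get-LatestExpiringCertificate {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyCollection()]
        [object[]]$Certificates
    )
    $now = Get-Date
    $Certificates |
        Where-Object { $_.NotAfter -gt $now } |          # skip certificates that already expired
        Sort-Object -Property NotAfter -Descending |     # latest expiry first
        Select-Object -First 1
}

# On a cluster node (Windows only): list LocalMachine\My, latest expiry first,
# then show the certificate that expires last.
if (Test-Path -Path Cert:\LocalMachine\My) {
    $certs = @(Get-ChildItem -Path Cert:\LocalMachine\My)
    $certs | Sort-Object -Property NotAfter -Descending | Select-Object Thumbprint, Subject, NotAfter
    Get-LatestExpiringCertificate -Certificates $certs
}
```

If the thumbprint returned is the one planned for removal, the situation described above applies and the normal removal procedure is not expected to work.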
6 | 7 | ## Prerequisites 8 | 9 | 1. Cluster is deployed through ARM or Azure Portal 10 | 11 | 2. ARM template is available for ARM deployed clusters 12 | 13 | 3. For portal deployed clusters, access is available to the configuration for the cluster through [resources.azure.com](https://resources.azure.com) 14 | 15 | 4. Certificate is provisioned on the VMs through OsProfile.Secrets in VMSS configuration 16 | 17 | 5. Primary certificate is not expired and is valid for at least 15-30 days 18 | 19 | 6. Primary certificate is installed on every node of the cluster 20 | 21 | **DO NOT** proceed with the steps if any of the above prerequisites is not met. 22 | 23 | ## Tasks to remove the secondary certificate 24 | 25 | **NOTE:** The following assumes an ARM template. When using [resources.azure.com](https://resources.azure.com), make the changes in the corresponding sections of the resource configurations. 26 | 27 | **DO NOT** take these steps if the primary certificate is expired, or if it is not installed on every node of the cluster. 28 | 29 | "_Certificate_" in the rest of the steps below refers to the certificate that is being removed. 30 | 31 | ### Remove provisioning of certificate on VMs 32 | 33 | 1. Find and remove all references to the certificate in each VMSS resource deployed through the ARM template 34 | 35 | a. **WARNING**: At this point, do not make any change related to the certificate in descriptions for other resource types in the ARM template. 36 | 37 | 2. Deploy the ARM template with the change 38 | 39 | 3. Wait for the update to complete 40 | 41 | ### Make sure that SF has stopped using the removed certificate 42 | 43 | The following steps are required for any version up to and including 7.1.410.9590 44 | 45 | 1. In Azure portal, locate the VMSS associated with the cluster 46 | 47 | 2. Restart all VM instances one by one across all VMSS 48 | 49 | 3. Wait for each VM to restart and become healthy in SF before restarting the next VM 50 | 51 | 4. 
If the VMSS and corresponding SF node have Silver or higher durability, then you can select all instances and issue the command to restart. Silver or higher durability ensures that VMs are restarted by Update Domain 52 | 53 | 5. Wait for all VMs to reboot and become healthy in SF 54 | 55 | ### Remove reference to certificate from Service Fabric resource description 56 | 57 | 1. In the ARM template locate and remove reference to the certificate 58 | 59 | 2. Deploy the ARM template 60 | 61 | 3. Once upgrade is complete for the cluster, SF cluster should no longer have a dependency on the certificate 62 | -------------------------------------------------------------------------------- /Security/SF unable to authenticate with primary certificate.md: -------------------------------------------------------------------------------- 1 | # Service Fabric unable to authenticate with primary certificate 2 | 3 | ## **Symptom** 4 | - After adding a new secondary certificate you are now unable to authenticate with the original Primary certificate 5 | - This happens regardless of whether the older certificate is in the primary or secondary cert position 6 | - Event logs may show FABRIC_E_SERVER_AUTHENTICATION_FAILED: 0x80092012 7 | 8 | ## **Cause** 9 | - By design 10 | - Starting with Service Fabric 5.7 a design change was made to help simplify the certificate rollover process. This changed the default behavior to automatically use the certificate with the furthest expiration (in the future) for authentication. 11 | 12 | ## **Resolution** 13 | - Use the certificate with the furthest expiration date for authentication 14 | - Or you can revert to the old algorithm by using the following command, which will cause the cluster to only use the primary certificate. To use the newly added certificate you will need to manually swap the secondary cert and primary cert. 
15 | 16 | ```PowerShell 17 | Set-AzureRmServiceFabricSetting -ResourceGroupName rgname -Name clustername -Section "Security" -Parameter "UseSecondaryIfNewer" -Value "false" 18 | ``` 19 | 20 | ## **References** 21 | - 22 | 23 | ## **See Also** 24 | [Use Azure Resource Explorer to add the Secondary Certificate](https://github.com/Azure/Service-Fabric-Troubleshooting-Guides/blob/master/Security/Use%20Azure%20Resource%20Explorer%20to%20add%20the%20Secondary%20Certificate.md) 25 | 26 | 27 | Repro: 28 | 29 | ```PowerShell 30 | Connect-ServiceFabricCluster -ConnectionEndpoint sampleCluster.northeurope.cloudapp.azure.com:19000 -FindType FindByThumbprint -FindValue 967d398e239f79464b9a012345678901234567890 -X509Credential -ServerCertThumbprint 967d398e239f79464b9a012345678901234567890 -StoreLocation CurrentUser -StoreName My 31 | 32 | WARNING: Failed to contact Naming Service. Attempting to contact Failover Manager Service... 33 | WARNING: Failed to contact Failover Manager Service, Attempting to contact FMM... 34 | False 35 | Connect-ServiceFabricCluster : FABRIC_E_SERVER_AUTHENTICATION_FAILED: 0x80092012 36 | At line:1 char:1 37 | + Connect-ServiceFabricCluster -ConnectionEndpoint sampleCluster.northeu ... 38 | ``` 39 | -------------------------------------------------------------------------------- /Security/Securing Application Endpoint (ie. DoS DDoS prevention).md: -------------------------------------------------------------------------------- 1 | ## Securing Application Endpoint (ie. DoS DDoS prevention) 2 | 3 | The first thing to clarify is what you are trying to protect. Are you trying to protect your application endpoints, or are you trying to protect the Service Fabric Management endpoint (ie. Service Fabric Explorer)? Below assumes you are talking about your application endpoints since the management endpoint is secured and potential attackers couldn’t complete the initial handshake. 


For protecting your application, this really is not a Service Fabric question per se. Service Fabric manages the deployment, monitoring, failover, etc. of your applications running in the cluster, but it has very little to do with HTTP communication coming into your application (other than doing the initial HTTP.sys binding). SF cannot help mitigate HTTP-based attacks because it is not injected anywhere in the communication path between the client and your application.

The protection you are looking for comes in a few layers:

1. Azure platform. We have various DoS protections built into the Azure platform itself. You can find information at https://azure.microsoft.com/en-us/blog/microsoft-azure-network-security-whitepaper-version-3-is-now-available/ and https://azure.microsoft.com/en-us/documentation/videos/azurecon-2015-building-secure-virtual-networks-in-azure/.

2. Web Application Firewall (WAF). Application Gateway provides an optional Web Application Firewall layer which adds automatic protection for the most common types of vulnerabilities. You can read more at https://azure.microsoft.com/en-us/documentation/articles/application-gateway-webapplicationfirewall-overview/. There are also third-party web application firewalls such as Barracuda or Kemp (e.g., https://azure.microsoft.com/en-us/marketplace/partners/barracudanetworks/waf/). This type of protection can be built into the deployment template for your cluster, and it adds a lot of extra functionality, but keep in mind that it also adds extra cost and complexity.

3. Application layer. With Web Roles you are hosted in IIS, which makes it easy to configure the various request limits that mitigate slow HTTP POST attacks, but with Service Fabric your web server is probably self-hosted OWIN. You will get a lot of good default values to protect against most attacks from http.sys and the HttpListener, but you can check out https://blogs.msdn.microsoft.com/tilovell/2015/03/11/request-and-connection-throttling-when-self-hosting-with-owinhttplistener/ for more information and additional tweaks you can make. This type of protection is built into your application itself rather than into the template during cluster deployment.

--------------------------------------------------------------------------------
/Security/SecurityApi_CertGetCertificateChain - CTL accessibility - CRL slow warnings.md:
--------------------------------------------------------------------------------
## How to mitigate SecurityApi_CertGetCertificateChain health warning (CTL accessibility issue or CRL slow/offline)

## Assessment
You can raise the threshold to suppress warnings about slow certificate chain validation or CRL lookup by setting the SlowApiThreshold value. However, this only removes the warnings from SFX; the underlying performance issue may still persist.

```json
{
    "name": "Security",
    "parameters": [
        {
            "name": "SlowApiThreshold",
            "value": "some larger value"
        }
    ]
}
```

Reference: [Modify configuration setting for Security/SlowApiThreshold](https://github.com/Microsoft/service-fabric/issues/48)

This warning may also be caused when an NSG has blocked certificate CRL and CTL access/download.

## Mitigation
* If the machine does not have access to the public Internet, the slowdown is most probably caused by downloading the Windows CTL and disallowed certificates. Add the following to fabricSettings (FabricSettings in the cluster manifest XML) and do a configuration upgrade. If a Security or Federation section already exists, add the settings to the existing section instead of creating a new one.

* JSON changes:
```json
{
    "name": "Security",
    "parameters": [
        {
            "name": "CrlCheckingFlag",
            "value": "4"
        }
    ]
},
{
    "name": "Federation",
    "parameters": [
        {
            "name": "X509CertChainFlags",
            "value": "4"
        }
    ]
}
```

* ClusterManifest changes for customers that still use cluster manifest:

```xml
<Section Name="Security">
  <Parameter Name="CrlCheckingFlag" Value="4" />
</Section>
<Section Name="Federation">
  <Parameter Name="X509CertChainFlags" Value="4" />
</Section>
```

* If the above does not solve the issue, check CRL downloading. If CRL downloading is the problem, add the following to disable CRL downloading:
JSON changes:

```json
{
    "name": "Security",
    "parameters": [
        {
            "name": "CrlCheckingFlag",
            "value": "0x80000000"
        }
    ]
},
{
    "name": "Federation",
    "parameters": [
        {
            "name": "X509CertChainFlags",
            "value": "0x80000000"
        }
    ]
}
```

* ClusterManifest changes for customers that still use cluster manifest:

```xml
<Section Name="Security">
  <Parameter Name="CrlCheckingFlag" Value="0x80000000" />
</Section>
<Section Name="Federation">
  <Parameter Name="X509CertChainFlags" Value="0x80000000" />
</Section>
```

* If both CTL and CRL downloading are slow or unavailable, make the above config change with the flag value set to 0x80000004 (the bitwise OR of 4 and 0x80000000).

--------------------------------------------------------------------------------
/Security/Set ACL for a SF certificate.md:
--------------------------------------------------------------------------------
Applies to Windows OS

## How to set ACL for a SF certificate

For a cluster running on a local dev box, find the certificate using certmgr.msc or the relevant MMC snap-in, right-click it, select All Tasks > Manage Private Keys, and give read permissions to NETWORK SERVICE.

For remote clusters in Azure, you can do that using a custom script extension on the virtual machine scale set (VMSS) that runs a PowerShell script to set up the permissions you want. For example, it could do something like the following:

```PowerShell
# $certificateThumbprint must be set to the thumbprint of the certificate before running
$certificate = Get-ChildItem -Path Cert:\LocalMachine\My | Where-Object {$_.Thumbprint -eq $certificateThumbprint}

# Get file path
$certificateFilePath = "C:\Documents and Settings\All Users\Application Data\Microsoft\Crypto\RSA\MachineKeys\" + $certificate.PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName

# Take ownership of the file so that permissions can be set
takeown /F $certificateFilePath

# Give the NETWORK SERVICE read permissions
$acl = (Get-Item $certificateFilePath).GetAccessControl('Access')
$rule = new-object System.Security.AccessControl.FileSystemAccessRule "NETWORK SERVICE","Read","Allow"
$acl.SetAccessRule($rule)
Set-Acl -Path $certificateFilePath -AclObject $acl
```

**Refer to this for on-premises ACL settings**

See the "Install the certificates" section:
[https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-windows-cluster-x509-security](https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-windows-cluster-x509-security)

Now set the access control on this certificate so that the Service Fabric process, which runs under the Network Service account, can use it by running the following script. Provide the thumbprint of the certificate and "NETWORK SERVICE" for the service account. You can check that the ACLs on the certificate are correct by opening the certificate in Start -> Manage computer certificates, and looking at All Tasks -> Manage Private Keys.

```PowerShell
param
(
    [Parameter(Position=1, Mandatory=$true)]
    [ValidateNotNullOrEmpty()]
    [string]$pfxThumbPrint,

    [Parameter(Position=2, Mandatory=$true)]
    [ValidateNotNullOrEmpty()]
    [string]$serviceAccount
)

$cert = Get-ChildItem -Path cert:\LocalMachine\My | Where-Object -FilterScript { $PSItem.ThumbPrint -eq $pfxThumbPrint; }

# Specify the user, the permissions and the permission type
$permission = "$($serviceAccount)","FullControl","Allow"
$accessRule = New-Object -TypeName System.Security.AccessControl.FileSystemAccessRule -ArgumentList $permission

# Location of the machine related keys
$keyPath = Join-Path -Path $env:ProgramData -ChildPath "\Microsoft\Crypto\RSA\MachineKeys"
$keyName = $cert.PrivateKey.CspKeyContainerInfo.UniqueKeyContainerName
$keyFullPath = Join-Path -Path $keyPath -ChildPath $keyName

# Get the current acl of the private key
$acl = (Get-Item $keyFullPath).GetAccessControl('Access')

# Add the new ace to the acl of the private key
$acl.SetAccessRule($accessRule)

# Write back the new acl
Set-Acl -Path $keyFullPath -AclObject $acl -ErrorAction Stop

# Observe the access rights currently assigned to this certificate.
Get-Acl $keyFullPath | Format-List
```

--------------------------------------------------------------------------------
/Security/StorageFirewall.md:
--------------------------------------------------------------------------------
# How to provide access to diagnostic logs for Microsoft Customer Support?

## Why does customer support need access?
To troubleshoot most issues related to Service Fabric, Service Fabric engineering and customer support can, with your permission, download diagnostic logs that are uploaded to your Service Fabric diagnostic storage account. Microsoft may access (including making temporary copies of) the data in this diagnostic storage account to assist with resolving your support incident.

## Which files are captured and uploaded to the storage account?

Learn more about the diagnostic files.

Documentation: [Microsoft Azure Service Fabric Logs](https://learn.microsoft.com/en-us/troubleshoot/azure/general/fabric-logs)


## How to allow access to customer support?

To secure access to the Azure Storage account, a customer might lock down public access to it completely.

To allow customer support to download from the storage account, the customer needs to add a client IP address to the list of allowed addresses. This IP address can be one owned by Microsoft, taken from the IP range that is restricted to users of Secure Admin Workstation. Please contact customer support to get more information.
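
As a rough sketch of what such a firewall configuration can look like when the storage account is deployed from an ARM template, the fragment below shows the `networkAcls` setting of a `Microsoft.Storage/storageAccounts` resource with public access denied by default and a single allowed client IP. The address `203.0.113.10` is a documentation-range placeholder, not a real Microsoft address, and the `bypass` value is an assumption; use the address provided by customer support and the settings that match your environment.

```json
{
    "properties": {
        "networkAcls": {
            "bypass": "AzureServices",
            "defaultAction": "Deny",
            "ipRules": [
                {
                    "value": "203.0.113.10",
                    "action": "Allow"
                }
            ]
        }
    }
}
```

Removing the entry from `ipRules` once the support incident is closed restores the fully locked-down state.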

Documentation: [Grant access from an internet IP range](https://learn.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal#grant-access-from-an-internet-ip-range)

![Screenshot from Azure portal about where to configure the IP address for client access.](../media/storagefirewall.jpg)

--------------------------------------------------------------------------------
/Security/View Cluster Certificate.md:
--------------------------------------------------------------------------------
## How to view the Cluster certificate from the Browser

When working with customers using secure clusters, it can often be useful to see the certificate they are using to secure their clusters (e.g., to make sure it has Server+Client authentication, or to check the subject name). There are two ways to do this.

Normally you can view the cert in whatever web browser you are using. Click the security icon next to the URL and then click Details and view the certificate:
![View Certificate in Browser](../media/viewcert_image001.png)



If the web browser does not allow you to view the certificate, you can run the following PowerShell to save the .cer file:

```PowerShell
$URL = "https://sedwest.westus.cloudapp.azure.com:19080/Explorer"
$output = (New-TemporaryFile).FullName + ".cer"

$webRequest = [Net.WebRequest]::Create($URL)
# The request may fail (for example, on certificate validation); only the server certificate is needed
try { $webRequest.GetResponse() } catch {}
$cert = $webRequest.ServicePoint.Certificate
$bytes = $cert.Export([Security.Cryptography.X509Certificates.X509ContentType]::Cert)
# WriteAllBytes works in both Windows PowerShell and PowerShell 7 (Set-Content -Encoding Byte does not exist in the latter)
[System.IO.File]::WriteAllBytes($output, $bytes)
Write-Output "Saved file to $output"
Invoke-Item $output
```
--------------------------------------------------------------------------------
/media/Autoscale001.PNG:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Autoscale001.PNG -------------------------------------------------------------------------------- /media/BRS/BackupCallbackStuck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/BRS/BackupCallbackStuck.png -------------------------------------------------------------------------------- /media/BRS/ReconfigStuck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/BRS/ReconfigStuck.png -------------------------------------------------------------------------------- /media/ClusterNodeUnhealthy01.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/ClusterNodeUnhealthy01.PNG -------------------------------------------------------------------------------- /media/ClusterNodeUnhealthy02.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/ClusterNodeUnhealthy02.PNG -------------------------------------------------------------------------------- /media/ClusterUpgradeTimerStuck.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/ClusterUpgradeTimerStuck.png -------------------------------------------------------------------------------- /media/FabricBRS_001.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/FabricBRS_001.png -------------------------------------------------------------------------------- /media/FabricDCA001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/FabricDCA001.png -------------------------------------------------------------------------------- /media/FabricDCA002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/FabricDCA002.png -------------------------------------------------------------------------------- /media/FabricDCA003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/FabricDCA003.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-get-response.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-get-response.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-get.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-get.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-patch-response.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-patch-response.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-patch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/api-playground-patch.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/resource-explorer-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/resource-explorer-1.png -------------------------------------------------------------------------------- /media/Installing-dependencies-on-virtual-machine-scaleset/resource-explorer-copy-resource-uri.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/Installing-dependencies-on-virtual-machine-scaleset/resource-explorer-copy-resource-uri.png -------------------------------------------------------------------------------- /media/IntermediateCerts001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/IntermediateCerts001.png -------------------------------------------------------------------------------- /media/NodeDeactivationInfo1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/NodeDeactivationInfo1.png -------------------------------------------------------------------------------- /media/NodeReboot001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/NodeReboot001.jpg -------------------------------------------------------------------------------- /media/ROSExperimentalFeature.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/ROSExperimentalFeature.png -------------------------------------------------------------------------------- /media/SharedLogWriteThrottled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/SharedLogWriteThrottled.png 
-------------------------------------------------------------------------------- /media/SharedLogWriteUnthrottled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/SharedLogWriteUnthrottled.png -------------------------------------------------------------------------------- /media/azure-export-template.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/azure-export-template.png -------------------------------------------------------------------------------- /media/certlm-certificate-acl.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/certlm-certificate-acl.png -------------------------------------------------------------------------------- /media/certlm-manage-private-keys.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/certlm-manage-private-keys.png -------------------------------------------------------------------------------- /media/certlm1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/certlm1.png -------------------------------------------------------------------------------- /media/certlm2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/certlm2.png -------------------------------------------------------------------------------- /media/certswap_image1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/certswap_image1.png -------------------------------------------------------------------------------- /media/create-alert-signal-lx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/create-alert-signal-lx.png -------------------------------------------------------------------------------- /media/create-alert-signal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/create-alert-signal.png -------------------------------------------------------------------------------- /media/create-alert.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/create-alert.png -------------------------------------------------------------------------------- /media/create-notification-action-group.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/create-notification-action-group.png -------------------------------------------------------------------------------- /media/dsc_image001.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/dsc_image001.jpg -------------------------------------------------------------------------------- /media/dsc_image002.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/dsc_image002.jpg -------------------------------------------------------------------------------- /media/dsc_image003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/dsc_image003.png -------------------------------------------------------------------------------- /media/eventvwr-microsoft-service-fabric.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/eventvwr-microsoft-service-fabric.png -------------------------------------------------------------------------------- /media/eventvwr1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/eventvwr1.png -------------------------------------------------------------------------------- /media/git-aspnetcore-sample-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/git-aspnetcore-sample-1.png 
-------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-service-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-service-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-template-settings.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm-template-settings.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant-arm.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-assistant.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-repo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-repo.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-yaml-review.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-yaml-review.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-yaml.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline-yaml.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-new-pipeline.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug-download.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug-download.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug-variable-ui.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug-variable-ui.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-debug.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-jobs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-jobs.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-permissions-warn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/ado-run-pipeline-permissions-warn.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/arm-portal-new-cluster-download-template.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/arm-portal-new-cluster-download-template.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/arm-portal-new-cluster-save-template.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-arm-deployments/arm-portal-new-cluster-save-template.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-aad-common-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-aad-common-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-aad-thumprint-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-aad-thumprint-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-certificate-common-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-certificate-common-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-certificate-thumbprint-connection.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-certificate-thumbprint-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-nsg-service-tag.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/ado-nsg-service-tag.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-api-permissions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-api-permissions.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-registration-users.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-registration-users.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-registration.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-app-registration.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-user-applications.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-user-applications.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-user-overview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-cluster-user-overview.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-sfc-security.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-cluster/portal-sfc-security.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-ado-pool-type.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-ado-pool-type.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-ado-service-connection.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-ado-service-connection.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-cluster-id.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-cluster-id.png -------------------------------------------------------------------------------- /media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-enable-aad.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-azure-devops-for-service-fabric-managed-cluster/sfmc-enable-aad.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account-event-type.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account-event-type.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account-sf-event-type.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account-sf-event-type.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-add-storage-account.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-search.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-log-analytics-search.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-storage-wad-tables.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-portal-storage-wad-tables.png -------------------------------------------------------------------------------- /media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-storage-explorer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-log-analytics-for-service-fabric-clusters/azure-storage-explorer.png -------------------------------------------------------------------------------- /media/how-to-configure-service-fabric-cluster-automatic-os-image-upgrade/sfx-infrastructure-task-autoosupgrade.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-service-fabric-cluster-automatic-os-image-upgrade/sfx-infrastructure-task-autoosupgrade.png -------------------------------------------------------------------------------- /media/how-to-configure-service-fabric-cluster-automatic-os-image-upgrade/sfx-repair-task-autoosupgrade.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-service-fabric-cluster-automatic-os-image-upgrade/sfx-repair-task-autoosupgrade.png -------------------------------------------------------------------------------- /media/how-to-configure-service-fabric-managed-cluster-automatic-os-image-upgrade/sfx-repair-task-infra-autoosupgrade.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-service-fabric-managed-cluster-automatic-os-image-upgrade/sfx-repair-task-infra-autoosupgrade.png -------------------------------------------------------------------------------- /media/how-to-configure-service-fabric-managed-cluster-automatic-os-image-upgrade/sfx-repair-task-sfrp-autoosupgrade.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-configure-service-fabric-managed-cluster-automatic-os-image-upgrade/sfx-repair-task-sfrp-autoosupgrade.png -------------------------------------------------------------------------------- /media/how-to-rotate-access-keys-of-storage-account-for-service-fabric-logs/sfx-eventstore-bad.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-rotate-access-keys-of-storage-account-for-service-fabric-logs/sfx-eventstore-bad.png -------------------------------------------------------------------------------- /media/how-to-rotate-access-keys-of-storage-account-for-service-fabric-logs/sfx-eventstore-good.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/how-to-rotate-access-keys-of-storage-account-for-service-fabric-logs/sfx-eventstore-good.png -------------------------------------------------------------------------------- /media/knownissue_container_dns_image001.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/knownissue_container_dns_image001.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-add-filter-lx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-add-filter-lx.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-add-filter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-add-filter.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-apply-splitting-lx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-apply-splitting-lx.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-apply-splitting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-apply-splitting.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-guest1-lx.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-guest1-lx.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-guest1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-guest1.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-guest2-lx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-guest2-lx.png -------------------------------------------------------------------------------- /media/metric-explorer-virtual-machine-guest2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/metric-explorer-virtual-machine-guest2.png -------------------------------------------------------------------------------- /media/monitor-explorer-lx.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/monitor-explorer-lx.png -------------------------------------------------------------------------------- /media/monitor-explorer.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/monitor-explorer.png -------------------------------------------------------------------------------- /media/mstsc-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/mstsc-1.png -------------------------------------------------------------------------------- /media/mstsc-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/mstsc-2.png -------------------------------------------------------------------------------- /media/mstsc-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/mstsc-3.png -------------------------------------------------------------------------------- /media/mstsc-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/mstsc-4.png -------------------------------------------------------------------------------- /media/mstsc-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/mstsc-5.png -------------------------------------------------------------------------------- /media/nsg01.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/nsg01.png -------------------------------------------------------------------------------- /media/oneseednode001.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/oneseednode001.PNG -------------------------------------------------------------------------------- /media/oneseednode002.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/oneseednode002.PNG -------------------------------------------------------------------------------- /media/oneseednode003.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/oneseednode003.PNG -------------------------------------------------------------------------------- /media/oneseednode004.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/oneseednode004.PNG -------------------------------------------------------------------------------- /media/outofdiskspace001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace001.jpg -------------------------------------------------------------------------------- /media/outofdiskspace002.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace002.jpg -------------------------------------------------------------------------------- /media/outofdiskspace003.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace003.jpg -------------------------------------------------------------------------------- /media/outofdiskspace004.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace004.jpg -------------------------------------------------------------------------------- /media/outofdiskspace005.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace005.jpg -------------------------------------------------------------------------------- /media/outofdiskspace006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace006.jpg -------------------------------------------------------------------------------- /media/outofdiskspace007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace007.jpg -------------------------------------------------------------------------------- /media/outofdiskspace008.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/outofdiskspace008.png -------------------------------------------------------------------------------- /media/perfmon-view1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/perfmon-view1.png -------------------------------------------------------------------------------- /media/perfmon-view2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/perfmon-view2.png -------------------------------------------------------------------------------- /media/phantomNode001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/phantomNode001.jpg -------------------------------------------------------------------------------- /media/portal-upgrade-policy1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/portal-upgrade-policy1.png -------------------------------------------------------------------------------- /media/resource-explorer-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resource-explorer-1.png -------------------------------------------------------------------------------- 
/media/resourcemgr1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr1.png -------------------------------------------------------------------------------- /media/resourcemgr10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr10.png -------------------------------------------------------------------------------- /media/resourcemgr11.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr11.png -------------------------------------------------------------------------------- /media/resourcemgr12.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr12.png -------------------------------------------------------------------------------- /media/resourcemgr13.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr13.png -------------------------------------------------------------------------------- /media/resourcemgr16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr16.png -------------------------------------------------------------------------------- 
/media/resourcemgr2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr2.png -------------------------------------------------------------------------------- /media/resourcemgr3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr3.png -------------------------------------------------------------------------------- /media/resourcemgr4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr4.png -------------------------------------------------------------------------------- /media/resourcemgr5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr5.png -------------------------------------------------------------------------------- /media/resourcemgr6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr6.png -------------------------------------------------------------------------------- /media/resourcemgr7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr7.png -------------------------------------------------------------------------------- /media/resourcemgr8.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr8.png -------------------------------------------------------------------------------- /media/resourcemgr9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resourcemgr9.png -------------------------------------------------------------------------------- /media/resources-azure-wadcfg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/resources-azure-wadcfg.png -------------------------------------------------------------------------------- /media/rpcertswap_image001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/rpcertswap_image001.png -------------------------------------------------------------------------------- /media/rpcertswap_image002.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/rpcertswap_image002.PNG -------------------------------------------------------------------------------- /media/rpcertswap_image003.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/rpcertswap_image003.PNG 
-------------------------------------------------------------------------------- /media/rpcertswap_image004.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/rpcertswap_image004.PNG -------------------------------------------------------------------------------- /media/seednodeauto03.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/seednodeauto03.PNG -------------------------------------------------------------------------------- /media/service-fabric-10.0-sdk-7.0.1816-installation-failure/installation-error.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-10.0-sdk-7.0.1816-installation-failure/installation-error.png -------------------------------------------------------------------------------- /media/service-fabric-9x-repair-job-stuck/sfx-9x-stateful-known-issue.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-9x-repair-job-stuck/sfx-9x-stateful-known-issue.png -------------------------------------------------------------------------------- /media/service-fabric-9x-repair-job-stuck/sfx-9x-stateful-known-issue2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-9x-repair-job-stuck/sfx-9x-stateful-known-issue2.png 
-------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-add-source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-add-source.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-counters.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-counters.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-custom-events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-custom-events.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-dcr-review.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-dcr-review.png 
-------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-review.2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-review.2.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-select-scope.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create-select-scope.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create.2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create.2.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-create.png 
-------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-created.log.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-created.log.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-created.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-created.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-custom-events-destination.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/azure-monitor-dcr-custom-events-destination.png -------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/log-analytics-sf-counter.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/log-analytics-sf-counter.png 
-------------------------------------------------------------------------------- /media/service-fabric-managed-cluster-monitoring-with-azure-monitor/log-analytics-sf-event-logs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/service-fabric-managed-cluster-monitoring-with-azure-monitor/log-analytics-sf-event-logs.png -------------------------------------------------------------------------------- /media/sfx-container-logs-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/sfx-container-logs-2.png -------------------------------------------------------------------------------- /media/sfx-imagestore-quorum-loss.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/sfx-imagestore-quorum-loss.png -------------------------------------------------------------------------------- /media/sfx-node-restart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/sfx-node-restart.png -------------------------------------------------------------------------------- /media/storage-account-access-keys.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/storage-account-access-keys.png -------------------------------------------------------------------------------- /media/storagefirewall.jpg: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/storagefirewall.jpg -------------------------------------------------------------------------------- /media/task-manager-filestoreservice-terminate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/task-manager-filestoreservice-terminate.png -------------------------------------------------------------------------------- /media/task-manager-user-context.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/task-manager-user-context.png -------------------------------------------------------------------------------- /media/taskscheduler1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/taskscheduler1.png -------------------------------------------------------------------------------- /media/template-cse-extension.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/template-cse-extension.png -------------------------------------------------------------------------------- /media/template-extension.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/template-extension.png 
-------------------------------------------------------------------------------- /media/template-wadcfg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/template-wadcfg.png -------------------------------------------------------------------------------- /media/tffu001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu001.jpg -------------------------------------------------------------------------------- /media/tffu0010.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu0010.jpg -------------------------------------------------------------------------------- /media/tffu002.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu002.jpg -------------------------------------------------------------------------------- /media/tffu003.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu003.jpg -------------------------------------------------------------------------------- /media/tffu004.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu004.jpg 
-------------------------------------------------------------------------------- /media/tffu005.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu005.jpg -------------------------------------------------------------------------------- /media/tffu006.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu006.jpg -------------------------------------------------------------------------------- /media/tffu007.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu007.jpg -------------------------------------------------------------------------------- /media/tffu008.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu008.jpg -------------------------------------------------------------------------------- /media/tffu009.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/tffu009.jpg -------------------------------------------------------------------------------- /media/twoseednode001.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode001.PNG 
-------------------------------------------------------------------------------- /media/twoseednode002.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode002.PNG -------------------------------------------------------------------------------- /media/twoseednode003.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode003.PNG -------------------------------------------------------------------------------- /media/twoseednode004.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode004.PNG -------------------------------------------------------------------------------- /media/twoseednode005.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode005.PNG -------------------------------------------------------------------------------- /media/twoseednode006.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/twoseednode006.PNG -------------------------------------------------------------------------------- /media/unmanaged-disk-deprecation-guidance/azure-portal-managed-disk-disabled.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/unmanaged-disk-deprecation-guidance/azure-portal-managed-disk-disabled.png -------------------------------------------------------------------------------- /media/upgrade-service-fabric-cluster-basic-load-balancer/sfx-cluster-events.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/upgrade-service-fabric-cluster-basic-load-balancer/sfx-cluster-events.png -------------------------------------------------------------------------------- /media/upgrade-service-fabric-cluster-basic-load-balancer/sfx-green.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/upgrade-service-fabric-cluster-basic-load-balancer/sfx-green.png -------------------------------------------------------------------------------- /media/upgradehistory001.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/upgradehistory001.jpg -------------------------------------------------------------------------------- /media/upgradehistory002.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/upgradehistory002.jpg -------------------------------------------------------------------------------- /media/viewcert_image001.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/viewcert_image001.png -------------------------------------------------------------------------------- /media/vs-build-output-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-build-output-1.png -------------------------------------------------------------------------------- /media/vs-program-cs-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-program-cs-1.png -------------------------------------------------------------------------------- /media/vs-sf-container-solution-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-sf-container-solution-1.png -------------------------------------------------------------------------------- /media/vs-sfx-container-log-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-sfx-container-log-1.png -------------------------------------------------------------------------------- /media/vs-solution-add-orchestrator-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-solution-add-orchestrator-1.png -------------------------------------------------------------------------------- 
/media/vs-solution-publish-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-solution-publish-1.png -------------------------------------------------------------------------------- /media/vs-solution-publish-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-solution-publish-2.png -------------------------------------------------------------------------------- /media/vs-solution-run-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-solution-run-1.png -------------------------------------------------------------------------------- /media/vs-solution-save-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-solution-save-1.png -------------------------------------------------------------------------------- /media/vs-web-add-orchestrator-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Azure/Service-Fabric-Troubleshooting-Guides/64c26454cfe3dd7d002b94df855459960286842a/media/vs-web-add-orchestrator-1.png --------------------------------------------------------------------------------