├── .github
│   └── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE.md
├── README.md
├── check-storage-usage
│   ├── .gitignore
│   ├── CheckStorageUsage.cs
│   ├── Properties
│   │   ├── serviceDependencies.json
│   │   └── serviceDependencies.local.json
│   ├── README.md
│   ├── check-storage-usage.csproj
│   ├── check-storage-usage.sln
│   ├── host.json
│   └── local.settings.json
├── data-lake-gen2-acl-indexing
│   ├── DataLakeGen2ACLIndexing.csproj
│   ├── Program.cs
│   ├── README.md
│   ├── SampleData
│   │   ├── Files for Organization.txt
│   │   ├── Private
│   │   │   └── confidential.txt
│   │   ├── Shared Documents
│   │   │   └── public.txt
│   │   └── User Documents
│   │       ├── Alice
│   │       │   ├── alice-secret.txt
│   │       │   └── alice.txt
│   │       ├── Bob
│   │       │   ├── Reports
│   │       │   │   ├── a.txt
│   │       │   │   ├── b.txt
│   │       │   │   └── c.txt
│   │       │   ├── Sales
│   │       │   │   ├── d.txt
│   │       │   │   └── e.txt
│   │       │   └── bob.txt
│   │       └── John
│   │           ├── Documents
│   │           │   ├── a.txt
│   │           │   └── b.txt
│   │           └── john.txt
│   └── appsettings.json
├── export-data
│   ├── .gitignore
│   ├── README.md
│   ├── Sample
│   │   ├── Configuration.cs
│   │   ├── Document.cs
│   │   ├── Program.cs
│   │   ├── Sample.csproj
│   │   └── local.settings-example.json
│   ├── export-data.sln
│   ├── export-data
│   │   ├── Bound.cs
│   │   ├── ContinuousExporter.cs
│   │   ├── Exporter.cs
│   │   ├── FilePartitionWriter.cs
│   │   ├── IPartitionWriter.cs
│   │   ├── Partition.cs
│   │   ├── PartitionExporter.cs
│   │   ├── PartitionFile.cs
│   │   ├── PartitionGenerator.cs
│   │   ├── Program.cs
│   │   ├── Util.cs
│   │   └── export-data.csproj
│   └── tests
│       ├── MockPartitionWriter.cs
│       ├── PartitionExporterTests.cs
│       ├── Usings.cs
│       ├── config.example.json
│       └── tests.csproj
├── index-backup-restore
│   ├── .gitignore
│   ├── README.md
│   ├── v10
│   │   ├── AzureSearchBackupRestoreIndex.sln
│   │   └── AzureSearchBackupRestoreIndex
│   │       ├── AzureSearchBackupRestoreIndex.csproj
│   │       ├── AzureSearchHelper.cs
│   │       ├── Program.cs
│   │       └── appsettings.json
│   └── v11
│       ├── AzureSearchBackupRestoreIndex.sln
│       └── AzureSearchBackupRestoreIndex
│           ├── AzureSearchBackupRestoreIndex.csproj
│           ├── AzureSearchHelper.cs
│           ├── Program.cs
│           └── appsettings.json
└── search-aggregations
    ├── Program.cs
    ├── README.md
    ├── appsettings.json
    └── search-aggregations.csproj
/.github/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Microsoft Open Source Code of Conduct
2 |
3 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
4 |
5 | Resources:
6 |
7 | - [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
8 | - [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
9 | - Contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with questions or concerns
10 |
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing to azure-search-dotnet-utilities
2 |
3 | This project welcomes contributions and suggestions. Most contributions require you to agree to a
4 | Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
5 | the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
6 |
7 | When you submit a pull request, a CLA bot will automatically determine whether you need to provide
8 | a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
9 | provided by the bot. You will only need to do this once across all repos using our CLA.
10 |
11 | This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
12 | For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
13 | contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
14 |
15 | - [Code of Conduct](#coc)
16 | - [Issues and Bugs](#issue)
17 | - [Feature Requests](#feature)
18 | - [Submission Guidelines](#submit)
19 |
20 | ## Code of Conduct
21 | Help us keep this project open and inclusive. Please read and follow our [Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
22 |
23 | ## Found an Issue?
24 | If you find a bug in the source code or a mistake in the documentation, you can help us by
25 | [submitting an issue](#submit-issue) to the GitHub Repository. Even better, you can
26 | [submit a Pull Request](#submit-pr) with a fix.
27 |
28 | ## Want a Feature?
29 | You can *request* a new feature by [submitting an issue](#submit-issue) to the GitHub
30 | Repository. If you would like to *implement* a new feature, please submit an issue with
31 | a proposal for your work first, to be sure that we can use it.
32 |
33 | * **Small Features** can be crafted and directly [submitted as a Pull Request](#submit-pr).
34 |
35 | ## Submission Guidelines
36 |
37 | ### Submitting an Issue
38 | Before you submit an issue, search the archive; your question may already have been answered.
39 |
40 | If your issue appears to be a bug, and hasn't been reported, open a new issue.
41 | Help us to maximize the effort we can spend fixing issues and adding new
42 | features, by not reporting duplicate issues. Providing the following information will increase the
43 | chances of your issue being dealt with quickly:
44 |
45 | * **Overview of the Issue** - if an error is being thrown a non-minified stack trace helps
46 | * **Version** - what version is affected (e.g. 0.1.2)
47 | * **Motivation for or Use Case** - explain what are you trying to do and why the current behavior is a bug for you
48 | * **Browsers and Operating System** - is this a problem with all browsers?
49 | * **Reproduce the Error** - provide a live example or an unambiguous set of steps
50 | * **Related Issues** - has a similar issue been reported before?
51 | * **Suggest a Fix** - if you can't fix the bug yourself, perhaps you can point to what might be
52 | causing the problem (line of code or commit)
53 |
54 | You can file new issues by providing the above information at the corresponding repository's issues link: https://github.com/[organization-name]/[repository-name]/issues/new.
55 |
56 | ### Submitting a Pull Request (PR)
57 | Before you submit your Pull Request (PR) consider the following guidelines:
58 |
59 | * Search the repository (https://github.com/[organization-name]/[repository-name]/pulls) for an open or closed PR
60 | that relates to your submission. You don't want to duplicate effort.
61 |
62 | * Make your changes in a new git fork:
63 |
64 |   * Commit your changes using a descriptive commit message
65 |   * Push your fork to GitHub
66 |   * In GitHub, create a pull request
67 | * If we suggest changes then:
68 |   * Make the required updates.
69 |   * Rebase your fork and force push to your GitHub repository (this will update your Pull Request):
70 |
71 | ```shell
72 | git rebase master -i
73 | git push -f
74 | ```
75 |
76 | That's it! Thank you for your contribution!
77 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) Microsoft Corporation.
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # C# utility code samples for Azure AI Search
2 |
3 | This repository contains C# code samples that help you perform specific tasks, such as checking storage or exporting content from an index.
4 |
5 | ## In this repository
6 |
7 | | Sample | Description |
8 | |--------|-------------|
9 | | check-storage-usage | Checks storage usage of an Azure AI Search service on a schedule. You can modify this sample to [adjust the service's capacity](https://docs.microsoft.com/azure/search/search-capacity-planning) or send an alert when the storage usage exceeds a predefined threshold. |
10 | | data-lake-gen2-acl-indexing | Proof-of-concept console app that demonstrates how to index a subset of your Azure Data Lake Gen2 data by using access control lists to allow certain files and directories to be accessed by an indexer in Azure AI Search. The indexer connection to Azure Data Lake Gen2 uses a managed identity and role assignments for selective data access. The sample loads data and sets up permissions programmatically, and then runs the indexer to create and load a search index. |
11 | | export-data | A console application that exports data from an Azure AI Search service. |
12 | | index-backup-restore | A console app that backs up an index (schema and documents) to your local computer and then uses the stored backup to recreate the index in a target search service that you specify.|
13 | | search-aggregations | Proof-of-concept console app that demonstrates how aggregations can be computed from randomly generated data, and how that data can be filtered using a query. |
14 |
15 | ## More resources
16 |
17 | + See [.NET samples in Azure AI Search](https://learn.microsoft.com/azure/search/samples-dotnet) for a comprehensive list of all Azure AI Search code samples that run on .NET.
18 |
19 | + See [Azure AI Search documentation](https://learn.microsoft.com/azure/search) for product documentation.
20 |
--------------------------------------------------------------------------------
/check-storage-usage/.gitignore:
--------------------------------------------------------------------------------
1 | ## Ignore Visual Studio temporary files, build results, and
2 | ## files generated by popular Visual Studio add-ons.
3 |
4 | # User-specific files
5 | *.suo
6 | *.user
7 | *.userosscache
8 | *.sln.docstates
9 |
10 | # User-specific files (MonoDevelop/Xamarin Studio)
11 | *.userprefs
12 |
13 | # Build results
14 | [Dd]ebug/
15 | [Dd]ebugPublic/
16 | [Rr]elease/
17 | [Rr]eleases/
18 | x64/
19 | x86/
20 | bld/
21 | [Bb]in/
22 | [Oo]bj/
23 | [Ll]og/
24 |
25 | # Visual Studio 2015 cache/options directory
26 | .vs/
27 | # Uncomment if you have tasks that create the project's static files in wwwroot
28 | #wwwroot/
29 |
30 | # MSTest test Results
31 | [Tt]est[Rr]esult*/
32 | [Bb]uild[Ll]og.*
33 |
34 | # NUNIT
35 | *.VisualState.xml
36 | TestResult.xml
37 |
38 | # Build Results of an ATL Project
39 | [Dd]ebugPS/
40 | [Rr]eleasePS/
41 | dlldata.c
42 |
43 | # DNX
44 | project.lock.json
45 | project.fragment.lock.json
46 | artifacts/
47 |
48 | *_i.c
49 | *_p.c
50 | *_i.h
51 | *.ilk
52 | *.meta
53 | *.obj
54 | *.pch
55 | *.pdb
56 | *.pgc
57 | *.pgd
58 | *.rsp
59 | *.sbr
60 | *.tlb
61 | *.tli
62 | *.tlh
63 | *.tmp
64 | *.tmp_proj
65 | *.log
66 | *.vspscc
67 | *.vssscc
68 | .builds
69 | *.pidb
70 | *.svclog
71 | *.scc
72 |
73 | # Chutzpah Test files
74 | _Chutzpah*
75 |
76 | # Visual C++ cache files
77 | ipch/
78 | *.aps
79 | *.ncb
80 | *.opendb
81 | *.opensdf
82 | *.sdf
83 | *.cachefile
84 | *.VC.db
85 | *.VC.VC.opendb
86 |
87 | # Visual Studio profiler
88 | *.psess
89 | *.vsp
90 | *.vspx
91 | *.sap
92 |
93 | # TFS 2012 Local Workspace
94 | $tf/
95 |
96 | # Guidance Automation Toolkit
97 | *.gpState
98 |
99 | # ReSharper is a .NET coding add-in
100 | _ReSharper*/
101 | *.[Rr]e[Ss]harper
102 | *.DotSettings.user
103 |
104 | # JustCode is a .NET coding add-in
105 | .JustCode
106 |
107 | # TeamCity is a build add-in
108 | _TeamCity*
109 |
110 | # DotCover is a Code Coverage Tool
111 | *.dotCover
112 |
113 | # NCrunch
114 | _NCrunch_*
115 | .*crunch*.local.xml
116 | nCrunchTemp_*
117 |
118 | # MightyMoose
119 | *.mm.*
120 | AutoTest.Net/
121 |
122 | # Web workbench (sass)
123 | .sass-cache/
124 |
125 | # Installshield output folder
126 | [Ee]xpress/
127 |
128 | # DocProject is a documentation generator add-in
129 | DocProject/buildhelp/
130 | DocProject/Help/*.HxT
131 | DocProject/Help/*.HxC
132 | DocProject/Help/*.hhc
133 | DocProject/Help/*.hhk
134 | DocProject/Help/*.hhp
135 | DocProject/Help/Html2
136 | DocProject/Help/html
137 |
138 | # Click-Once directory
139 | publish/
140 |
141 | # Publish Web Output
142 | *.[Pp]ublish.xml
143 | *.azurePubxml
144 | # TODO: Comment the next line if you want to checkin your web deploy settings
145 | # but database connection strings (with potential passwords) will be unencrypted
146 | #*.pubxml
147 | *.publishproj
148 |
149 | # Microsoft Azure Web App publish settings. Comment the next line if you want to
150 | # checkin your Azure Web App publish settings, but sensitive information contained
151 | # in these scripts will be unencrypted
152 | PublishScripts/
153 |
154 | # NuGet Packages
155 | *.nupkg
156 | # The packages folder can be ignored because of Package Restore
157 | **/packages/*
158 | # except build/, which is used as an MSBuild target.
159 | !**/packages/build/
160 | # Uncomment if necessary however generally it will be regenerated when needed
161 | #!**/packages/repositories.config
162 | # NuGet v3's project.json files produce more ignorable files
163 | *.nuget.props
164 | *.nuget.targets
165 |
166 | # Microsoft Azure Build Output
167 | csx/
168 | *.build.csdef
169 |
170 | # Microsoft Azure Emulator
171 | ecf/
172 | rcf/
173 |
174 | # Windows Store app package directories and files
175 | AppPackages/
176 | BundleArtifacts/
177 | Package.StoreAssociation.xml
178 | _pkginfo.txt
179 |
180 | # Visual Studio cache files
181 | # files ending in .cache can be ignored
182 | *.[Cc]ache
183 | # but keep track of directories ending in .cache
184 | !*.[Cc]ache/
185 |
186 | # Others
187 | ClientBin/
188 | ~$*
189 | *~
190 | *.dbmdl
191 | *.dbproj.schemaview
192 | *.jfm
193 | *.pfx
194 | *.publishsettings
195 | node_modules/
196 | orleans.codegen.cs
197 |
198 | # Since there are multiple workflows, uncomment next line to ignore bower_components
199 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
200 | #bower_components/
201 |
202 | # RIA/Silverlight projects
203 | Generated_Code/
204 |
205 | # Backup & report files from converting an old project file
206 | # to a newer Visual Studio version. Backup files are not needed,
207 | # because we have git ;-)
208 | _UpgradeReport_Files/
209 | Backup*/
210 | UpgradeLog*.XML
211 | UpgradeLog*.htm
212 |
213 | # SQL Server files
214 | *.mdf
215 | *.ldf
216 |
217 | # Business Intelligence projects
218 | *.rdl.data
219 | *.bim.layout
220 | *.bim_*.settings
221 |
222 | # Microsoft Fakes
223 | FakesAssemblies/
224 |
225 | # GhostDoc plugin setting file
226 | *.GhostDoc.xml
227 |
228 | # Node.js Tools for Visual Studio
229 | .ntvs_analysis.dat
230 |
231 | # Visual Studio 6 build log
232 | *.plg
233 |
234 | # Visual Studio 6 workspace options file
235 | *.opt
236 |
237 | # Visual Studio LightSwitch build output
238 | **/*.HTMLClient/GeneratedArtifacts
239 | **/*.DesktopClient/GeneratedArtifacts
240 | **/*.DesktopClient/ModelManifest.xml
241 | **/*.Server/GeneratedArtifacts
242 | **/*.Server/ModelManifest.xml
243 | _Pvt_Extensions
244 |
245 | # Paket dependency manager
246 | .paket/paket.exe
247 | paket-files/
248 |
249 | # FAKE - F# Make
250 | .fake/
251 |
252 | # JetBrains Rider
253 | .idea/
254 | *.sln.iml
255 |
256 | # CodeRush
257 | .cr/
258 |
259 | # Python Tools for Visual Studio (PTVS)
260 | __pycache__/
261 | *.pyc
--------------------------------------------------------------------------------
/check-storage-usage/CheckStorageUsage.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.Collections.Generic;
3 | using System.Threading.Tasks;
4 | using Azure;
5 | using Azure.Communication.Email;
6 | using Azure.Communication.Email.Models;
7 | using Azure.Search.Documents.Indexes;
8 | using Azure.Search.Documents.Indexes.Models;
9 | using Microsoft.Azure.WebJobs;
10 | using Microsoft.Extensions.Logging;
11 |
12 | namespace check_storage_usage
13 | {
14 | public class CheckStorageUsage
15 | {
16 | // Run on a timer every 30 minutes
17 | // https://docs.microsoft.com/azure/azure-functions/functions-bindings-timer
18 | [FunctionName("CheckStorageUsage")]
19 | public async Task Run([TimerTrigger("0 */30 * * * *")]TimerInfo timer, ILogger log)
20 | {
21 | string serviceName = Environment.GetEnvironmentVariable("ServiceName");
22 | log.LogInformation($"Checking search storage usage for {serviceName}: {DateTime.Now}");
23 |
24 | string serviceAdminApiKey = Environment.GetEnvironmentVariable("ServiceAdminApiKey");
25 | // Storage used percentage threshold is a number between 0 and 1 representing how much storage should be
26 | // used before alerting
27 | // Example: 0.8 = 80%
28 | float storageUsedPercentThreshold = float.Parse(Environment.GetEnvironmentVariable("StorageUsedPercentThreshold"));
29 |
30 | var searchIndexClient = new SearchIndexClient(new Uri($"https://{serviceName}.search.windows.net"), new AzureKeyCredential(serviceAdminApiKey));
31 | SearchServiceStatistics statistics = await searchIndexClient.GetServiceStatisticsAsync();
32 | float storageUsedPercent = (float)statistics.Counters.StorageSizeCounter.Usage / (float)statistics.Counters.StorageSizeCounter.Quota;
33 |
34 | if (storageUsedPercent > storageUsedPercentThreshold)
35 | {
36 | string connectionString = Environment.GetEnvironmentVariable("CommunicationServicesConnectionString");
37 | var emailClient = new EmailClient(connectionString);
38 |
39 | string subject = string.Format("Low storage space on search service {0}", serviceName);
40 | string body = string.Format("Search service {0} is using {1:P2} of its storage, which exceeds the alerting threshold of {2:P2}", serviceName, storageUsedPercent, storageUsedPercentThreshold);
41 | EmailContent emailContent = new EmailContent(subject);
42 | emailContent.PlainText = body;
43 | string toEmailAddress = Environment.GetEnvironmentVariable("ToEmailAddress");
44 | string fromEmailAddress = Environment.GetEnvironmentVariable("FromEmailAddress");
45 | List<EmailAddress> emailAddresses = new List<EmailAddress> { new EmailAddress(toEmailAddress) };
46 | EmailRecipients emailRecipients = new EmailRecipients(emailAddresses);
47 | EmailMessage emailMessage = new EmailMessage(fromEmailAddress, emailContent, emailRecipients);
48 | Response<SendEmailResult> response = emailClient.Send(emailMessage);
49 | log.LogInformation("Sent email about low storage, status code {0}", response.GetRawResponse().Status);
50 | }
50 | }
51 | }
52 | }
53 | }
54 |
--------------------------------------------------------------------------------
/check-storage-usage/Properties/serviceDependencies.json:
--------------------------------------------------------------------------------
1 | {
2 | "dependencies": {
3 | "appInsights1": {
4 | "type": "appInsights"
5 | },
6 | "storage1": {
7 | "type": "storage",
8 | "connectionId": "AzureWebJobsStorage"
9 | }
10 | }
11 | }
--------------------------------------------------------------------------------
/check-storage-usage/Properties/serviceDependencies.local.json:
--------------------------------------------------------------------------------
1 | {
2 | "dependencies": {
3 | "appInsights1": {
4 | "type": "appInsights.sdk"
5 | },
6 | "storage1": {
7 | "type": "storage.emulator",
8 | "connectionId": "AzureWebJobsStorage"
9 | }
10 | }
11 | }
--------------------------------------------------------------------------------
/check-storage-usage/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages:
4 | - csharp
5 | name: Check storage usage of Azure AI Search
6 | description: "Demonstrates checking storage usage of an Azure AI Search service. This example builds a C# Function App using the Azure AI Search .NET SDK."
7 | products:
8 | - azure
9 | - azure-cognitive-search
10 | - azure-functions
11 | urlFragment: check-storage-usage
12 | ---
13 |
14 | # Check Azure AI Search service storage usage
15 |
16 | 
17 |
18 | Demonstrates checking storage usage of an Azure AI Search service on a schedule. This sample may be modified to [adjust the service's capacity](https://docs.microsoft.com/azure/search/search-capacity-planning) or send an alert when the storage usage exceeds a predefined threshold.
19 |
20 | This .NET Core application runs as an [Azure Function](https://docs.microsoft.com/azure/azure-functions/functions-overview). The program [is deployed to Azure](https://docs.microsoft.com/azure/azure-functions/functions-create-your-first-function-visual-studio?tabs=in-process) using [Visual Studio](https://visualstudio.microsoft.com/downloads/) and [runs automatically on a predefined schedule](https://docs.microsoft.com/azure/azure-functions/functions-create-scheduled-function).
21 |
22 | ## Prerequisites
23 |
24 | - [Visual Studio](https://visualstudio.microsoft.com/downloads/)
25 | - [Azure AI Search service](https://docs.microsoft.com/azure/search/search-create-service-portal)
26 | - [Azure Functions](https://docs.microsoft.com/azure/azure-functions/functions-overview)
27 | - [Azure Communication Services](https://docs.microsoft.com/azure/communication-services/overview)
28 |
29 | ## Setup
30 |
31 | 1. Configure a [Communication Services](https://docs.microsoft.com/azure/communication-services/quickstarts/create-communication-resource) resource [to send email](https://docs.microsoft.com/azure/communication-services/quickstarts/email/create-email-communication-resource).
32 |
33 | 1. Clone or download this sample repository.
34 |
35 | 1. Extract contents if the download is a zip file. Make sure the files are read-write.
36 |
37 | ## Run the sample
38 |
39 | 1. Run the function locally [using Visual Studio](https://docs.microsoft.com/azure/azure-functions/functions-develop-local).
40 |
41 | 1. Deploy the sample to Azure [using Visual Studio](https://docs.microsoft.com/azure/azure-functions/functions-create-your-first-function-visual-studio?tabs=in-process#publish-the-project-to-azure).
42 |
43 | 1. Navigate to the deployed Function App in the Azure portal.
44 |
45 | 1. [Update the application settings of the Function App](https://docs.microsoft.com/azure/azure-functions/functions-how-to-use-azure-function-app-settings?tabs=portal). In the Azure portal, navigate to **Configuration** section under **Settings**. Add the following **Application Settings**:
46 |
47 | + `ServiceName` is the name of your search service.
48 | + `ServiceAdminApiKey` is the [Admin API Key to access your search service](https://docs.microsoft.com/azure/search/search-security-api-keys#find-existing-keys).
49 | + `StorageUsedPercentThreshold` is the threshold used for determining if a search service is using too much storage. This should be a decimal number between 0 and 1 which translates to a percentage of used storage. For example, 0.8 is 80% of used storage.
50 | + `CommunicationServicesConnectionString` is a connection string for your [Communication Services resource](https://docs.microsoft.com/azure/communication-services/concepts/authentication#access-key).
51 | + `ToEmailAddress` is the email address that will be notified of low storage in the search service.
52 | + `FromEmailAddress` is the email address that the notification email will be sent from. It must be in the [domain associated with your Communication Services email resource](https://docs.microsoft.com/azure/communication-services/concepts/email/email-domain-and-sender-authentication).
53 |
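For orientation, the check that drives these settings is simple arithmetic: storage used divided by quota, compared against `StorageUsedPercentThreshold`. The sketch below is illustrative Python with made-up numbers, not part of the sample:

```python
# Hypothetical sketch of the threshold check in CheckStorageUsage.cs.
# usage_bytes / quota_bytes mirrors StorageSizeCounter.Usage / .Quota.
def storage_alert_needed(usage_bytes: int, quota_bytes: int, threshold: float) -> bool:
    """True when the used fraction of storage strictly exceeds the threshold (0..1)."""
    return usage_bytes / quota_bytes > threshold

# 20 GiB used of a 25 GiB quota is exactly 80%, so a 0.8 threshold does not fire:
print(storage_alert_needed(20 * 1024**3, 25 * 1024**3, 0.8))  # False
```

Note that the comparison is strict, so a service sitting exactly on the threshold does not trigger an email.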
54 | ## Verify results
55 |
56 | [An email is sent](https://docs.microsoft.com/azure/communication-services/quickstarts/email/send-email) to the provided email address that the search service has low storage available.
57 |
58 | ## Next steps
59 |
60 | You can learn more about Azure AI Search on the [official documentation site](https://docs.microsoft.com/azure/search).
61 |
--------------------------------------------------------------------------------
/check-storage-usage/check-storage-usage.csproj:
--------------------------------------------------------------------------------
 1 | <Project Sdk="Microsoft.NET.Sdk">
 2 |   <PropertyGroup>
 3 |     <TargetFramework>net6.0</TargetFramework>
 4 |     <AzureFunctionsVersion>v4</AzureFunctionsVersion>
 5 |     <RootNamespace>check_storage_usage</RootNamespace>
 6 |   </PropertyGroup>
 7 |   <ItemGroup>
 8 |     <!-- PackageReference entries elided in this extract -->
 9 |   </ItemGroup>
10 |   <ItemGroup>
11 |     <None Update="host.json">
12 |       <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
13 |     </None>
14 |     <None Update="local.settings.json">
15 |       <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
16 |       <CopyToPublishDirectory>Never</CopyToPublishDirectory>
17 |     </None>
18 |   </ItemGroup>
19 | </Project>
--------------------------------------------------------------------------------
/check-storage-usage/check-storage-usage.sln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # Visual Studio Version 17
4 | VisualStudioVersion = 17.3.32804.467
5 | MinimumVisualStudioVersion = 10.0.40219.1
6 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "check-storage-usage", "check-storage-usage.csproj", "{6FB7258C-3EC6-4EA4-8E75-3D1189D9351C}"
7 | EndProject
8 | Global
9 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
10 | Debug|Any CPU = Debug|Any CPU
11 | Release|Any CPU = Release|Any CPU
12 | EndGlobalSection
13 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
14 | {6FB7258C-3EC6-4EA4-8E75-3D1189D9351C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
15 | {6FB7258C-3EC6-4EA4-8E75-3D1189D9351C}.Debug|Any CPU.Build.0 = Debug|Any CPU
16 | {6FB7258C-3EC6-4EA4-8E75-3D1189D9351C}.Release|Any CPU.ActiveCfg = Release|Any CPU
17 | {6FB7258C-3EC6-4EA4-8E75-3D1189D9351C}.Release|Any CPU.Build.0 = Release|Any CPU
18 | EndGlobalSection
19 | GlobalSection(SolutionProperties) = preSolution
20 | HideSolutionNode = FALSE
21 | EndGlobalSection
22 | GlobalSection(ExtensibilityGlobals) = postSolution
23 | SolutionGuid = {1204B124-F064-4D4D-9DA5-3BAC2EC09490}
24 | EndGlobalSection
25 | EndGlobal
26 |
--------------------------------------------------------------------------------
/check-storage-usage/host.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "2.0",
3 | "logging": {
4 | "applicationInsights": {
5 | "samplingSettings": {
6 | "isEnabled": true,
7 | "excludedTypes": "Request"
8 | }
9 | }
10 | }
11 | }
--------------------------------------------------------------------------------
/check-storage-usage/local.settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "IsEncrypted": false,
3 | "Values": {
4 | "AzureWebJobsStorage": "UseDevelopmentStorage=true",
5 | "FUNCTIONS_WORKER_RUNTIME": "dotnet",
6 | "ServiceName": "",
7 | "ServiceAdminApiKey": "",
8 | "StorageUsedPercentThreshold": "0.8",
9 | "CommunicationServicesConnectionString": "",
10 | "ToEmailAddress": "a@example.org",
11 | "FromEmailAddress": "donotreply@"
12 | }
13 | }
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/DataLakeGen2ACLIndexing.csproj:
--------------------------------------------------------------------------------
 1 | <Project Sdk="Microsoft.NET.Sdk">
 2 |   <PropertyGroup>
 3 |     <OutputType>Exe</OutputType>
 4 |     <TargetFramework>net5.0</TargetFramework>
 5 |     <RootNamespace>DataLakeGen2ACLIndexing</RootNamespace>
 6 |   </PropertyGroup>
 7 |   <ItemGroup>
 8 |     <!-- PackageReference entries elided in this extract -->
 9 |   </ItemGroup>
10 |   <ItemGroup>
11 |     <None Update="appsettings.json">
12 |       <CopyToOutputDirectory>Always</CopyToOutputDirectory>
13 |     </None>
14 |     <!-- a second item with CopyToOutputDirectory=Always was elided in this extract -->
15 |   </ItemGroup>
16 | </Project>
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/Program.cs:
--------------------------------------------------------------------------------
1 | using System;
2 | using System.Collections.Generic;
3 | using System.IO;
4 | using System.Linq;
5 | using System.Threading.Tasks;
6 | using Azure;
7 | using Azure.Identity;
8 | using Azure.Search.Documents.Indexes;
9 | using Azure.Search.Documents.Indexes.Models;
10 | using Azure.Storage.Files.DataLake;
11 | using Azure.Storage.Files.DataLake.Models;
12 | using Microsoft.Extensions.Configuration;
13 |
14 |
15 | namespace DataLakeGen2ACLIndexing
16 | {
17 | class Program
18 | {
19 | // Name of Container / ADLS Gen2 filesystem for sample data
20 | const string DATA_LAKE_FILESYSTEM_NAME = "acldemo";
21 | // Directory sample data is stored in locally
22 | const string SAMPLE_DATA_DIRECTORY = "SampleData";
23 | // Search index name for data indexed from ADLS Gen2
24 | const string SEARCH_ACL_INDEX_NAME = "acltestindex";
25 | // Search data source name for connection to ADLS Gen2
26 | const string SEARCH_ACL_DATASOURCE_NAME = "acltestdatasource";
27 | // Search indexer name for connection to ADLS Gen2
28 | const string SEARCH_ACL_INDEXER_NAME = "acltestindexer";
29 |
30 | async static Task Main(string[] args)
31 | {
32 | // Read settings from appsettings.json
33 | IConfigurationRoot configuration = new ConfigurationBuilder()
34 | .AddJsonFile("appsettings.json", optional: true)
35 | .Build();
36 | var settings = new AppSettings
37 | {
38 | SearchManagedIdentityID = configuration["searchManagedIdentityID"],
39 | SearchAdminKey = configuration["searchAdminKey"],
40 | SearchEndpoint = configuration["searchEndpoint"],
41 | DataLakeResourceID = configuration["dataLakeResourceID"],
42 | DataLakeEndpoint = configuration["dataLakeEndpoint"]
43 | };
44 |
45 | // Login to Azure using the default credentials on your local machine
46 | var credential = new DefaultAzureCredential();
47 | var dfsClient = new DataLakeServiceClient(new Uri(settings.DataLakeEndpoint), credential);
48 |
49 | var fileSystemClient = dfsClient.GetFileSystemClient(DATA_LAKE_FILESYSTEM_NAME);
50 | Console.WriteLine("Create {0} if not exists...", DATA_LAKE_FILESYSTEM_NAME);
51 | await fileSystemClient.CreateIfNotExistsAsync();
52 |
53 | var rootDirectoryClient = fileSystemClient.GetDirectoryClient(String.Empty);
54 | Console.WriteLine("Uploading sample data if not exists...");
55 | await UploadSampleDataIfNotExistsAsync(SAMPLE_DATA_DIRECTORY, rootDirectoryClient);
56 |
57 | Console.WriteLine("Applying ACLs to sample data...");
58 | await ApplyACLsToSampleData(rootDirectoryClient, settings);
59 |
60 | Console.WriteLine("Creating search index, data source, and indexer...");
61 | await CreateSearchResources(settings);
62 |
63 | Console.WriteLine("Polling for search indexer completion...");
64 | await PollSearchIndexer(settings);
65 | }
66 |
67 | static async Task UploadSampleDataIfNotExistsAsync(string localDirectory, DataLakeDirectoryClient directoryClient)
68 | {
69 | // Upload all sample data files in this directory
70 | foreach (string filePath in Directory.GetFiles(localDirectory))
71 | {
72 | string fileName = Path.GetFileName(filePath);
73 | DataLakeFileClient fileClient = directoryClient.GetFileClient(fileName);
74 | if (!await fileClient.ExistsAsync())
75 | {
76 | await fileClient.UploadAsync(filePath);
77 | }
78 | }
79 |
80 | // Recursively create subdirectories, and upload all sample data files in those subdirectories
81 | foreach (string directory in Directory.GetDirectories(localDirectory))
82 | {
83 | string directoryName = Path.GetFileName(directory); // GetFileNameWithoutExtension would truncate directory names containing a dot
84 | DataLakeDirectoryClient subDirectoryClient = directoryClient.GetSubDirectoryClient(directoryName);
85 | await subDirectoryClient.CreateIfNotExistsAsync();
86 | await UploadSampleDataIfNotExistsAsync(directory, subDirectoryClient);
87 | }
88 | }
89 |
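// The method above mirrors a local tree into the filesystem client the same way a
// plain local copy would. As an editorial aside, here is a rough stdlib-only Python
// analogue (local-to-local copying stands in for the DataLake upload calls):
//
//     # Stdlib-only analogue of UploadSampleDataIfNotExistsAsync: walk a local
//     # source tree and copy each file to the destination only if it is missing.
//     import os
//     import shutil
//
//     def upload_if_not_exists(local_dir: str, dest_dir: str) -> None:
//         os.makedirs(dest_dir, exist_ok=True)          # CreateIfNotExistsAsync analogue
//         for name in os.listdir(local_dir):
//             src = os.path.join(local_dir, name)
//             dst = os.path.join(dest_dir, name)
//             if os.path.isfile(src):
//                 if not os.path.exists(dst):           # fileClient.ExistsAsync analogue
//                     shutil.copyfile(src, dst)         # UploadAsync analogue
//             elif os.path.isdir(src):
//                 upload_if_not_exists(src, dst)        # recurse into subdirectories

```python
# Stdlib-only analogue of UploadSampleDataIfNotExistsAsync: walk a local
# source tree and copy each file to the destination only if it is missing.
import os
import shutil

def upload_if_not_exists(local_dir: str, dest_dir: str) -> None:
    os.makedirs(dest_dir, exist_ok=True)          # CreateIfNotExistsAsync analogue
    for name in os.listdir(local_dir):
        src = os.path.join(local_dir, name)
        dst = os.path.join(dest_dir, name)
        if os.path.isfile(src):
            if not os.path.exists(dst):           # fileClient.ExistsAsync analogue
                shutil.copyfile(src, dst)         # UploadAsync analogue
        elif os.path.isdir(src):
            upload_if_not_exists(src, dst)        # recurse into subdirectories
```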
90 | static async Task ApplyACLsToSampleData(DataLakeDirectoryClient rootDirectoryClient, AppSettings settings)
91 | {
92 | Console.WriteLine("Applying Execute and Read ACLs to root directory...");
93 | await ApplyACLsForDirectory(rootDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings);
94 |
95 | Console.WriteLine(@"Applying Execute and Read ACLs to root ""Files For Organization""...");
96 | var filesForOrganizationClient = rootDirectoryClient.GetFileClient("Files for Organization.txt");
97 | await ApplyACLsForFile(filesForOrganizationClient, RolePermissions.Execute | RolePermissions.Read, settings);
98 |
99 | Console.WriteLine("Applying Execute And Read ACLs to Shared Documents directory recursively...");
100 | var sharedDocumentsDirectoryClient = rootDirectoryClient.GetSubDirectoryClient("Shared Documents");
101 | await ApplyACLsForDirectory(sharedDocumentsDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings, recursive: true);
102 |
103 | Console.WriteLine("Applying Execute and Read ACLs to User Documents directory...");
104 | var userDocumentsDirectoryClient = rootDirectoryClient.GetSubDirectoryClient("User Documents");
105 | await ApplyACLsForDirectory(userDocumentsDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings);
106 |
107 | Console.WriteLine("Applying Execute and Read ACLs to Alice's document directory...");
108 | var aliceDirectoryClient = userDocumentsDirectoryClient.GetSubDirectoryClient("Alice");
109 | await ApplyACLsForDirectory(aliceDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings);
110 |
111 | Console.WriteLine(@"Applying Execute and Read ACLs to ""Alice.txt""...");
112 | var aliceTxtFile = aliceDirectoryClient.GetFileClient("alice.txt");
113 | await ApplyACLsForFile(aliceTxtFile, RolePermissions.Execute | RolePermissions.Read, settings);
114 |
115 | Console.WriteLine("Applying Execute and Read ACLs to John's document directory recursively...");
116 | var johnDirectoryClient = userDocumentsDirectoryClient.GetSubDirectoryClient("John");
117 | await ApplyACLsForDirectory(johnDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings, recursive: true);
118 |
119 | Console.WriteLine("Applying Execute and Read ACLs to Bob's document directory recursively...");
120 | var bobDirectoryClient = userDocumentsDirectoryClient.GetSubDirectoryClient("Bob");
121 | await ApplyACLsForDirectory(bobDirectoryClient, RolePermissions.Execute | RolePermissions.Read, settings, recursive: true);
122 |
123 | Console.WriteLine(@"Removing Execute and Read ACLs from ""c.txt""");
124 | var cClient = bobDirectoryClient.GetSubDirectoryClient("Reports").GetFileClient("c.txt");
125 | await RemoveACLsForFile(cClient, settings);
126 |
127 | Console.WriteLine(@"Removing Execute and Read ACLs from Bob's Sales directory recursively...");
128 | var salesClient = bobDirectoryClient.GetSubDirectoryClient("Sales");
129 | await RemoveACLsForDirectory(salesClient, settings, recursive: true);
130 | }
131 |
132 | // If recursive is false, apply ACLs to a directory. None of the sub-directory or sub-path ACLs are updated
133 | // If recursive is true, apply ACLs to the directory and all sub-directories and sub-paths
134 | // When applying ACLs recursively, the ACLs on all sub-directories and sub-paths are replaced with this directory's ACL
135 | static async Task ApplyACLsForDirectory(DataLakeDirectoryClient directoryClient, RolePermissions newACLs, AppSettings settings, bool recursive = false)
136 | {
137 | PathAccessControl directoryAccessControl =
138 | await directoryClient.GetAccessControlAsync();
139 |
140 | List<PathAccessControlItem> accessControlList = UpdateACLs(directoryAccessControl.AccessControlList, newACLs, settings);
141 |
142 | if (recursive)
143 | {
144 | await directoryClient.SetAccessControlRecursiveAsync(accessControlList);
145 | }
146 | else
147 | {
148 | await directoryClient.SetAccessControlListAsync(accessControlList);
149 | }
150 | }
151 |
152 | // If recursive is false, remove the ACL from a directory. None of the sub-directory or sub-path ACLs are updated
153 | // If recursive is true, remove ACLs from the directory and all sub-directories and sub-paths
154 | // When removing ACLs recursively, the ACLs on all sub-directories and sub-paths are replaced with this directory's ACL
155 | static async Task RemoveACLsForDirectory(DataLakeDirectoryClient directoryClient, AppSettings settings, bool recursive = false)
156 | {
157 | PathAccessControl directoryAccessControl =
158 | await directoryClient.GetAccessControlAsync();
159 |
160 | List<PathAccessControlItem> accessControlList = RemoveACLs(directoryAccessControl.AccessControlList, settings);
161 |
162 | if (recursive)
163 | {
164 | await directoryClient.SetAccessControlRecursiveAsync(accessControlList);
165 | }
166 | else
167 | {
168 | await directoryClient.SetAccessControlListAsync(accessControlList);
169 | }
170 | }
171 |
172 | static async Task ApplyACLsForFile(DataLakeFileClient fileClient, RolePermissions newACLs, AppSettings settings)
173 | {
174 | PathAccessControl fileAccessControl =
175 | await fileClient.GetAccessControlAsync();
176 |
177 | List<PathAccessControlItem> accessControlList = UpdateACLs(fileAccessControl.AccessControlList, newACLs, settings);
178 |
179 | await fileClient.SetAccessControlListAsync(accessControlList);
180 | }
181 |
182 | static async Task RemoveACLsForFile(DataLakeFileClient fileClient, AppSettings settings)
183 | {
184 | PathAccessControl fileAccessControl =
185 | await fileClient.GetAccessControlAsync();
186 |
187 | List<PathAccessControlItem> accessControlList = RemoveACLs(fileAccessControl.AccessControlList, settings);
188 |
189 | await fileClient.SetAccessControlListAsync(accessControlList);
190 | }
191 |
192 | static List<PathAccessControlItem> UpdateACLs(IEnumerable<PathAccessControlItem> existingACLs, RolePermissions newPermissionsForManagedIdentity, AppSettings settings)
193 | {
194 | // Either add an ACL for the search identity if it doesn't exist,
195 | // or update it if it exists
196 | // To learn more please visit https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-acl-dotnet#update-acls
197 | List<PathAccessControlItem> accessControlList = existingACLs.ToList();
198 | PathAccessControlItem managedIdentityAcl = accessControlList.FirstOrDefault(
199 | accessControlItem => accessControlItem.AccessControlType == AccessControlType.User && accessControlItem.EntityId == settings.SearchManagedIdentityID);
200 | if (managedIdentityAcl == null)
201 | {
202 | managedIdentityAcl = new PathAccessControlItem(
203 | accessControlType: AccessControlType.User,
204 | permissions: newPermissionsForManagedIdentity,
205 | entityId: settings.SearchManagedIdentityID);
206 | accessControlList.Add(managedIdentityAcl);
207 | }
208 | else
209 | {
210 | managedIdentityAcl.Permissions = newPermissionsForManagedIdentity;
211 | }
212 |
213 | return accessControlList;
214 | }
215 |
216 | static List<PathAccessControlItem> RemoveACLs(IEnumerable<PathAccessControlItem> existingACLs, AppSettings settings)
217 | {
218 | // Remove the ACL for the search identity if exists
219 | // To learn more please visit https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-acl-dotnet#remove-acl-entries
220 | List<PathAccessControlItem> accessControlList = existingACLs.ToList();
221 | accessControlList.RemoveAll(
222 | accessControlItem => accessControlItem.AccessControlType == AccessControlType.User && accessControlItem.EntityId == settings.SearchManagedIdentityID);
223 |
224 | return accessControlList;
225 | }
226 |
227 | static async Task CreateSearchResources(AppSettings settings)
228 | {
229 | SearchIndexClient indexClient = new SearchIndexClient(settings.SearchEndpointUri, settings.SearchKeyCredential);
230 |
231 | Console.WriteLine("Deleting search index {0} if exists...", SEARCH_ACL_INDEX_NAME);
232 | try
233 | {
234 | await indexClient.GetIndexAsync(SEARCH_ACL_INDEX_NAME);
235 | await indexClient.DeleteIndexAsync(SEARCH_ACL_INDEX_NAME);
236 | }
237 | catch (RequestFailedException)
238 | {
239 | // Index didn't exist - continue
240 | }
241 |
242 | Console.WriteLine("Creating search index {0}...", SEARCH_ACL_INDEX_NAME);
243 | await indexClient.CreateOrUpdateIndexAsync(
244 | new SearchIndex(SEARCH_ACL_INDEX_NAME, fields: new[]
245 | {
246 | new SearchField("key", SearchFieldDataType.String) { IsKey = true },
247 | new SearchField("metadata_storage_path", SearchFieldDataType.String),
248 | new SearchField("content", SearchFieldDataType.String)
249 | }));
250 |
251 | Console.WriteLine("Creating search data source {0}...", SEARCH_ACL_DATASOURCE_NAME);
252 | SearchIndexerClient indexerClient = new SearchIndexerClient(settings.SearchEndpointUri, settings.SearchKeyCredential);
253 | await indexerClient.CreateOrUpdateDataSourceConnectionAsync(
254 | new SearchIndexerDataSourceConnection(
255 | name: SEARCH_ACL_DATASOURCE_NAME,
256 | type: SearchIndexerDataSourceType.AzureBlob,
257 | connectionString: "ResourceId=" + settings.DataLakeResourceID,
258 | container: new SearchIndexerDataContainer(name: DATA_LAKE_FILESYSTEM_NAME)));
259 |
260 | Console.WriteLine("Deleting search indexer {0} if exists...", SEARCH_ACL_INDEXER_NAME);
261 | try
262 | {
263 | await indexerClient.GetIndexerAsync(SEARCH_ACL_INDEXER_NAME);
264 | await indexerClient.DeleteIndexerAsync(SEARCH_ACL_INDEXER_NAME);
265 | }
266 | catch (RequestFailedException)
267 | {
268 | // Indexer didn't exist - continue
269 | }
270 |
271 | Console.WriteLine("Creating search indexer {0}...", SEARCH_ACL_INDEXER_NAME);
272 | await indexerClient.CreateIndexerAsync(
273 | new SearchIndexer(
274 | name: SEARCH_ACL_INDEXER_NAME,
275 | dataSourceName: SEARCH_ACL_DATASOURCE_NAME,
276 | targetIndexName: SEARCH_ACL_INDEX_NAME)
277 | {
278 | Parameters = new IndexingParameters
279 | {
280 | MaxFailedItems = -1,
281 | IndexingParametersConfiguration = new IndexingParametersConfiguration
282 | {
283 | ParsingMode = BlobIndexerParsingMode.Text
284 | }
285 | }
286 | });
287 | }
288 |
289 | static async Task PollSearchIndexer(AppSettings settings)
290 | {
291 | await Task.Delay(TimeSpan.FromSeconds(5));
292 |
293 | SearchIndexerClient indexerClient = new SearchIndexerClient(settings.SearchEndpointUri, settings.SearchKeyCredential);
294 | while (true)
295 | {
296 | SearchIndexerStatus status = await indexerClient.GetIndexerStatusAsync(SEARCH_ACL_INDEXER_NAME);
297 | if (status.LastResult != null &&
298 | status.LastResult.Status != IndexerExecutionStatus.InProgress)
299 | {
300 | Console.WriteLine("Completed indexing sample data");
301 | break;
302 | }
303 |
304 | Console.WriteLine("Indexing has not finished. Waiting 5 seconds and polling again...");
305 | await Task.Delay(TimeSpan.FromSeconds(5));
306 | }
307 | }
308 |
309 | class AppSettings
310 | {
311 | public string SearchManagedIdentityID { get; set; }
312 | public string SearchAdminKey { get; set; }
313 | public string SearchEndpoint { get; set; }
314 | public string DataLakeEndpoint { get; set;}
315 | public string DataLakeResourceID { get; set; }
316 |
317 | public Uri SearchEndpointUri => new Uri(SearchEndpoint);
318 | public AzureKeyCredential SearchKeyCredential => new AzureKeyCredential(SearchAdminKey);
319 | }
320 | }
321 | }
322 |
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages:
4 | - csharp
5 | name: Index Azure Data Lake Gen2 using a managed identity
6 | description: "Index a subset of your Azure Data Lake Gen2 data by using access control lists to allow certain files and directories to be accessed by an indexer in Azure AI Search."
7 | products:
8 | - azure
9 | - azure-cognitive-search
10 | urlFragment: data-lake-gen2-acl-indexing
11 | ---
12 |
13 | # Index Data Lake Gen2 using Azure AD
14 |
15 | This Azure AI Search sample shows you how to configure an indexer connection to Azure Data Lake Gen2 that uses a managed identity and role assignments for selective data access. The sample loads data and sets up permissions for data access, and then runs the indexer to create and load a search index.
16 |
17 | Normally, when setting up [managed identity with Azure Blob Storage or Data Lake Storage](https://docs.microsoft.com/azure/search/search-howto-managed-identities-storage#2---add-a-role-assignment), the [Storage Blob Data Reader role](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#storage-blob-data-reader) is used. However, this role grants full access to all files in the storage account, which may be undesirable if you are using [Access Control Lists](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control) for more selective access. This sample shows you how to constrain data access to specific files and users.
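
Each file and directory in Data Lake Storage Gen2 carries a POSIX-style ACL. As a rough sketch (the GUID below is a made-up placeholder for the search service's managed identity object ID), an ACL that exposes a file to the indexer looks like this:

```
user::rwx
group::r-x
other::---
user:00000000-0000-0000-0000-000000000000:r-x
```

The named `user:<object-id>` entry is what this sample adds with Read and Execute permissions; removing that entry hides the file or directory from the indexer.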
18 |
19 | ## Prerequisites
20 |
21 | + [.NET 5](https://dotnet.microsoft.com/download/dotnet/5.0)
22 | + [Git](https://git-scm.com/downloads)
23 | + [Azure AI Search service](https://docs.microsoft.com/azure/search/search-create-service-portal) on a billable tier (free tier is not supported)
24 | + [Azure Storage](https://docs.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal) with the "Enable hierarchical namespace" option
25 | + Client app: [Visual Studio](https://visualstudio.microsoft.com/downloads/), PowerShell, or [Visual Studio Code](https://code.visualstudio.com/download) with the [Azure Tools](https://docs.microsoft.com/dotnet/azure/configure-vs-code#install-the-azure-tools-extension-pack) extension pack
26 |
27 | ## Clone the search sample with git
28 |
29 | At a terminal, download the sample application to your local computer.
30 |
31 | ```bash
32 | git clone https://github.com/Azure-Samples/azure-search-dotnet-samples
33 | ```
34 |
35 | ## Set up Azure resources
36 |
37 | 1. [Sign in to the Azure portal](https://portal.azure.com).
38 |
39 | 1. [Create a resource group if one doesn't already exist](https://docs.microsoft.com/azure/azure-resource-manager/management/manage-resource-groups-portal#create-resource-groups).
40 |
41 | 1. [Create an Azure AI Search service if one doesn't already exist](https://docs.microsoft.com/azure/search/search-create-service-portal), at [Basic tier](https://azure.microsoft.com/pricing/details/search/) or above.
42 |
43 | 1. Enable a managed identity for your search service using either of the following approaches:
44 |
45 | + [System-managed identity](https://docs.microsoft.com/azure/search/search-howto-managed-identities-storage#option-1---turn-on-system-assigned-managed-identity)
46 |
47 | + [User-managed identity](https://docs.microsoft.com/azure/search/search-howto-managed-identities-storage#option-2---assign-a-user-assigned-managed-identity-to-the-search-service-preview)
48 |
49 | 1. [Create an Azure Storage account if one doesn't already exist](https://docs.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal). Make sure that **Enable hierarchical namespace** is checked to enable Data Lake Storage Gen 2 on the storage account.
50 |
51 | ## Grant permissions in Azure Storage
52 |
53 | Search must be able to connect to Azure Storage, and the user who runs the app must be able to load and then secure that data. In this step, create role assignments in Azure Storage to support both tasks.
54 |
55 | 1. In your storage account page in the portal, [create a role assignment](https://docs.microsoft.com/azure/role-based-access-control/role-assignments-portal?tabs=current) that allows the search service's managed identity access to the storage account:
56 |
57 | + Choose [**Reader**](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#reader) (do not use **Storage Blob Data Reader**)
58 |
59 | 1. Repeat the previous step, this time [creating a role assignment](https://docs.microsoft.com/azure/role-based-access-control/role-assignments-portal?tabs=current) for the user running the sample application. The role must allow uploading sample data and setting ACLs in Data Lake Gen2 storage:
60 |
61 | + Choose [**Storage Blob Data Contributor**](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#storage-blob-data-contributor) or [**Storage Blob Data Owner**](https://docs.microsoft.com/azure/role-based-access-control/built-in-roles#storage-blob-data-owner)
62 |
63 | ## Edit appsettings.json
64 |
65 | Open the **appsettings.json** file in your local copy of the sample application and change the following values.
66 |
67 | 1. "searchManagedIdentityId": "Object (principal) ID for User-assigned or System Managed Identity for Search Service":
68 |
69 | + For a system-assigned managed identity, go to the search service's dashboard in the portal. In the left navigation pane, select Identity and then [copy the ID for the system managed identity](https://docs.microsoft.com/azure/search/search-howto-managed-identities-storage#option-1---turn-on-system-assigned-managed-identity).
70 |
71 | + For a user-assigned managed identity, [list the user-managed identities for your subscription](https://docs.microsoft.com/azure/active-directory/managed-identities-azure-resources/how-manage-user-assigned-managed-identities?pivots=identity-mi-methods-azp#list-user-assigned-managed-identities) and then copy the object ID.
72 |
73 | 1. "searchAdminKey": "Admin key for Search Service":
74 |
75 | + Find the Admin API key in the [Keys tab](https://docs.microsoft.com/azure/search/search-security-api-keys#find-existing-keys) on the search service's portal page.
76 |
77 | 1. "searchEndpoint": `https://.search.windows.net`:
78 |
79 | + Find the URI in the [search service's Overview portal page](https://docs.microsoft.com/azure/search/search-manage#overview-home-page).
80 |
81 | 1. "dataLakeResourceID": `/subscriptions//resourceGroups//providers/Microsoft.Storage/storageAccounts/`:
82 |
83 | + Find the resource ID in the storage account's service dashboard in the portal. Go to Settings > Endpoint > Data Lake Storage, and then copy the resource ID.
84 |
85 | 1. "dataLakeEndpoint": `https://.dfs.core.windows.net`:
86 |
87 | + Find the endpoint in the storage account's service dashboard in the portal. Go to Settings > Endpoint > Data Lake Storage, and then copy the primary endpoint.
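
With all five values filled in, the file might look like the following (every name and ID here is a fabricated placeholder; substitute your own resources):

```json
{
  "searchManagedIdentityId": "00000000-0000-0000-0000-000000000000",
  "searchAdminKey": "<your-search-admin-key>",
  "searchEndpoint": "https://my-search-service.search.windows.net",
  "dataLakeResourceID": "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/my-resource-group/providers/Microsoft.Storage/storageAccounts/mystorageaccount",
  "dataLakeEndpoint": "https://mystorageaccount.dfs.core.windows.net"
}
```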
88 |
89 | ## Run sample code and verify sample data
90 |
91 | Use a client application that can connect to Azure and build a .NET project.
92 |
93 | 1. Using Visual Studio Code with the Azure Tools Extension:
94 |
95 | 1. On the side bar, select the Azure Tools extension and then sign in to your Azure account.
96 |
97 | 1. On the side bar, open Explorer, and then open the local folder containing the sample code.
98 |
99 | 1. Right-click the folder name and open an integrated terminal.
100 |
101 | 1. Run the following command to execute the sample code: `dotnet run`
102 |
103 | 1. Using PowerShell on a computer that has .NET:
104 |
105 | 1. With Administrator permissions in PowerShell, load the Az module: `Import-Module -Name Az`
106 |
107 | 1. Connect to Azure: `Connect-AzAccount`
108 |
109 | 1. Run the following command to execute the sample code: `dotnet run`
110 |
111 | 1. When the sample data has finished indexing, the sample exits with the message "Completed indexing sample data".
112 |
113 | 1. Return to the Azure portal and your search service. Use [Search Explorer](https://docs.microsoft.com/azure/search/search-explorer) to query the "acltestindex" index and view the indexed sample data. Only data with an [Access Control List](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control) entry allowing the indexer's identity appears in the index.
114 |
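If you prefer the REST API over the portal, an equivalent check is a wildcard query against the index (the endpoint and key are placeholders, and the `api-version` shown is a generally available version that may differ from the latest):

```http
POST https://[search-service-name].search.windows.net/indexes/acltestindex/docs/search?api-version=2023-11-01
Content-Type: application/json
api-key: <your-search-admin-key>

{ "search": "*" }
```
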
115 | ## Clean up resources
116 |
117 | To clean up resources created in this tutorial, [delete the resource group](https://docs.microsoft.com/azure/azure-resource-manager/management/delete-resource-group) that contains the resources.
118 |
119 | ## Next steps
120 |
121 | Learn more about how Azure Data Lake Storage Gen2 works with access control lists:
122 |
123 | + [Access control lists (ACLs) in Azure Data Lake Storage Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control)
124 |
125 | + [Permissions table: Combining Azure RBAC and ACL](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control-model#permissions-table-combining-azure-rbac-and-acl)
126 |
127 | + [Use .NET to manage ACLs in Azure Data Lake Storage Gen2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-acl-dotnet)
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/Files for Organization.txt:
--------------------------------------------------------------------------------
1 | Files for Organization in this directory
2 | Shared Documents - Accessible to all
3 | Private Documents - Confidential documents, not accessible
4 | User Documents - Mix of accessible to all and confidential documents
5 |
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/Private/confidential.txt:
--------------------------------------------------------------------------------
1 | Confidential data. If this got out, that would be bad....
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/Shared Documents/public.txt:
--------------------------------------------------------------------------------
1 | Public data. OK to be shared to everyone
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Alice/alice-secret.txt:
--------------------------------------------------------------------------------
1 | You can't read this
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Alice/alice.txt:
--------------------------------------------------------------------------------
1 | Alice declined to make most of her documents public. This one is public though.
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/Reports/a.txt:
--------------------------------------------------------------------------------
1 | Big report on the A organization. You can read this
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/Reports/b.txt:
--------------------------------------------------------------------------------
1 | Big report on the B organization. You can read this.
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/Reports/c.txt:
--------------------------------------------------------------------------------
1 | Secret report on the C organization. You can't read this.
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/Sales/d.txt:
--------------------------------------------------------------------------------
1 | Some sales Bob made to the D organization. This is private so you can't read this
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/Sales/e.txt:
--------------------------------------------------------------------------------
1 | Some sales Bob made to the E organization. This is private so you can't read this
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/Bob/bob.txt:
--------------------------------------------------------------------------------
1 | Bob has a few documents he wants made public. Use recursive acls to quickly set this up
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/John/Documents/a.txt:
--------------------------------------------------------------------------------
1 | Data for John on the A organization. This is public
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/John/Documents/b.txt:
--------------------------------------------------------------------------------
1 | Data for John on the B organization. This is public
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/SampleData/User Documents/John/john.txt:
--------------------------------------------------------------------------------
1 | John's documents, they are public
--------------------------------------------------------------------------------
/data-lake-gen2-acl-indexing/appsettings.json:
--------------------------------------------------------------------------------
1 | {
2 | "searchManagedIdentityId": "Object (principal) ID for User Assigned or System Managed Identity for Search Service",
3 | "searchAdminKey": "Admin key for Search Service",
4 | "searchEndpoint": "https://[search-service-name].search.windows.net",
5 | "dataLakeResourceID": "/subscriptions/[subscription-id]/resourceGroups/[resource-group-name]/providers/Microsoft.Storage/storageAccounts/[storageaccountname]",
6 | "dataLakeEndpoint": "https://[storageaccountname].dfs.core.windows.net"
7 | }
--------------------------------------------------------------------------------
/export-data/.gitignore:
--------------------------------------------------------------------------------
1 | tests/config.json
2 | Sample/local.settings.json
3 |
4 | ## Ignore Visual Studio temporary files, build results, and
5 | ## files generated by popular Visual Studio add-ons.
6 | ##
7 | ## Get latest from https://github.com/github/gitignore/blob/main/VisualStudio.gitignore
8 |
9 | # User-specific files
10 | *.rsuser
11 | *.suo
12 | *.user
13 | *.userosscache
14 | *.sln.docstates
15 |
16 | # User-specific files (MonoDevelop/Xamarin Studio)
17 | *.userprefs
18 |
19 | # Mono auto generated files
20 | mono_crash.*
21 |
22 | # Build results
23 | [Dd]ebug/
24 | [Dd]ebugPublic/
25 | [Rr]elease/
26 | [Rr]eleases/
27 | x64/
28 | x86/
29 | [Ww][Ii][Nn]32/
30 | [Aa][Rr][Mm]/
31 | [Aa][Rr][Mm]64/
32 | bld/
33 | [Bb]in/
34 | [Oo]bj/
35 | [Ll]og/
36 | [Ll]ogs/
37 |
38 | # Visual Studio 2015/2017 cache/options directory
39 | .vs/
40 | # Uncomment if you have tasks that create the project's static files in wwwroot
41 | #wwwroot/
42 |
43 | # Visual Studio 2017 auto generated files
44 | Generated\ Files/
45 |
46 | # MSTest test Results
47 | [Tt]est[Rr]esult*/
48 | [Bb]uild[Ll]og.*
49 |
50 | # NUnit
51 | *.VisualState.xml
52 | TestResult.xml
53 | nunit-*.xml
54 |
55 | # Build Results of an ATL Project
56 | [Dd]ebugPS/
57 | [Rr]eleasePS/
58 | dlldata.c
59 |
60 | # Benchmark Results
61 | BenchmarkDotNet.Artifacts/
62 |
63 | # .NET Core
64 | project.lock.json
65 | project.fragment.lock.json
66 | artifacts/
67 |
68 | # ASP.NET Scaffolding
69 | ScaffoldingReadMe.txt
70 |
71 | # StyleCop
72 | StyleCopReport.xml
73 |
74 | # Files built by Visual Studio
75 | *_i.c
76 | *_p.c
77 | *_h.h
78 | *.ilk
79 | *.meta
80 | *.obj
81 | *.iobj
82 | *.pch
83 | *.pdb
84 | *.ipdb
85 | *.pgc
86 | *.pgd
87 | *.rsp
88 | *.sbr
89 | *.tlb
90 | *.tli
91 | *.tlh
92 | *.tmp
93 | *.tmp_proj
94 | *_wpftmp.csproj
95 | *.log
96 | *.tlog
97 | *.vspscc
98 | *.vssscc
99 | .builds
100 | *.pidb
101 | *.svclog
102 | *.scc
103 |
104 | # Chutzpah Test files
105 | _Chutzpah*
106 |
107 | # Visual C++ cache files
108 | ipch/
109 | *.aps
110 | *.ncb
111 | *.opendb
112 | *.opensdf
113 | *.sdf
114 | *.cachefile
115 | *.VC.db
116 | *.VC.VC.opendb
117 |
118 | # Visual Studio profiler
119 | *.psess
120 | *.vsp
121 | *.vspx
122 | *.sap
123 |
124 | # Visual Studio Trace Files
125 | *.e2e
126 |
127 | # TFS 2012 Local Workspace
128 | $tf/
129 |
130 | # Guidance Automation Toolkit
131 | *.gpState
132 |
133 | # ReSharper is a .NET coding add-in
134 | _ReSharper*/
135 | *.[Rr]e[Ss]harper
136 | *.DotSettings.user
137 |
138 | # TeamCity is a build add-in
139 | _TeamCity*
140 |
141 | # DotCover is a Code Coverage Tool
142 | *.dotCover
143 |
144 | # AxoCover is a Code Coverage Tool
145 | .axoCover/*
146 | !.axoCover/settings.json
147 |
148 | # Coverlet is a free, cross platform Code Coverage Tool
149 | coverage*.json
150 | coverage*.xml
151 | coverage*.info
152 |
153 | # Visual Studio code coverage results
154 | *.coverage
155 | *.coveragexml
156 |
157 | # NCrunch
158 | _NCrunch_*
159 | .*crunch*.local.xml
160 | nCrunchTemp_*
161 |
162 | # MightyMoose
163 | *.mm.*
164 | AutoTest.Net/
165 |
166 | # Web workbench (sass)
167 | .sass-cache/
168 |
169 | # Installshield output folder
170 | [Ee]xpress/
171 |
172 | # DocProject is a documentation generator add-in
173 | DocProject/buildhelp/
174 | DocProject/Help/*.HxT
175 | DocProject/Help/*.HxC
176 | DocProject/Help/*.hhc
177 | DocProject/Help/*.hhk
178 | DocProject/Help/*.hhp
179 | DocProject/Help/Html2
180 | DocProject/Help/html
181 |
182 | # Click-Once directory
183 | publish/
184 |
185 | # Publish Web Output
186 | *.[Pp]ublish.xml
187 | *.azurePubxml
188 | # Note: Comment the next line if you want to checkin your web deploy settings,
189 | # but database connection strings (with potential passwords) will be unencrypted
190 | *.pubxml
191 | *.publishproj
192 |
193 | # Microsoft Azure Web App publish settings. Comment the next line if you want to
194 | # checkin your Azure Web App publish settings, but sensitive information contained
195 | # in these scripts will be unencrypted
196 | PublishScripts/
197 |
198 | # NuGet Packages
199 | *.nupkg
200 | # NuGet Symbol Packages
201 | *.snupkg
202 | # The packages folder can be ignored because of Package Restore
203 | **/[Pp]ackages/*
204 | # except build/, which is used as an MSBuild target.
205 | !**/[Pp]ackages/build/
206 | # Uncomment if necessary however generally it will be regenerated when needed
207 | #!**/[Pp]ackages/repositories.config
208 | # NuGet v3's project.json files produces more ignorable files
209 | *.nuget.props
210 | *.nuget.targets
211 |
212 | # Microsoft Azure Build Output
213 | csx/
214 | *.build.csdef
215 |
216 | # Microsoft Azure Emulator
217 | ecf/
218 | rcf/
219 |
220 | # Windows Store app package directories and files
221 | AppPackages/
222 | BundleArtifacts/
223 | Package.StoreAssociation.xml
224 | _pkginfo.txt
225 | *.appx
226 | *.appxbundle
227 | *.appxupload
228 |
229 | # Visual Studio cache files
230 | # files ending in .cache can be ignored
231 | *.[Cc]ache
232 | # but keep track of directories ending in .cache
233 | !?*.[Cc]ache/
234 |
235 | # Others
236 | ClientBin/
237 | ~$*
238 | *~
239 | *.dbmdl
240 | *.dbproj.schemaview
241 | *.jfm
242 | *.pfx
243 | *.publishsettings
244 | orleans.codegen.cs
245 |
246 | # Including strong name files can present a security risk
247 | # (https://github.com/github/gitignore/pull/2483#issue-259490424)
248 | #*.snk
249 |
250 | # Since there are multiple workflows, uncomment next line to ignore bower_components
251 | # (https://github.com/github/gitignore/pull/1529#issuecomment-104372622)
252 | #bower_components/
253 |
254 | # RIA/Silverlight projects
255 | Generated_Code/
256 |
257 | # Backup & report files from converting an old project file
258 | # to a newer Visual Studio version. Backup files are not needed,
259 | # because we have git ;-)
260 | _UpgradeReport_Files/
261 | Backup*/
262 | UpgradeLog*.XML
263 | UpgradeLog*.htm
264 | ServiceFabricBackup/
265 | *.rptproj.bak
266 |
267 | # SQL Server files
268 | *.mdf
269 | *.ldf
270 | *.ndf
271 |
272 | # Business Intelligence projects
273 | *.rdl.data
274 | *.bim.layout
275 | *.bim_*.settings
276 | *.rptproj.rsuser
277 | *- [Bb]ackup.rdl
278 | *- [Bb]ackup ([0-9]).rdl
279 | *- [Bb]ackup ([0-9][0-9]).rdl
280 |
281 | # Microsoft Fakes
282 | FakesAssemblies/
283 |
284 | # GhostDoc plugin setting file
285 | *.GhostDoc.xml
286 |
287 | # Node.js Tools for Visual Studio
288 | .ntvs_analysis.dat
289 | node_modules/
290 |
291 | # Visual Studio 6 build log
292 | *.plg
293 |
294 | # Visual Studio 6 workspace options file
295 | *.opt
296 |
297 | # Visual Studio 6 auto-generated workspace file (contains which files were open etc.)
298 | *.vbw
299 |
300 | # Visual Studio 6 auto-generated project file (contains which files were open etc.)
301 | *.vbp
302 |
303 | # Visual Studio 6 workspace and project file (working project files containing files to include in project)
304 | *.dsw
305 | *.dsp
306 |
307 | # Visual Studio 6 technical files
308 | *.ncb
309 | *.aps
310 |
311 | # Visual Studio LightSwitch build output
312 | **/*.HTMLClient/GeneratedArtifacts
313 | **/*.DesktopClient/GeneratedArtifacts
314 | **/*.DesktopClient/ModelManifest.xml
315 | **/*.Server/GeneratedArtifacts
316 | **/*.Server/ModelManifest.xml
317 | _Pvt_Extensions
318 |
319 | # Paket dependency manager
320 | .paket/paket.exe
321 | paket-files/
322 |
323 | # FAKE - F# Make
324 | .fake/
325 |
326 | # CodeRush personal settings
327 | .cr/personal
328 |
329 | # Python Tools for Visual Studio (PTVS)
330 | __pycache__/
331 | *.pyc
332 |
333 | # Cake - Uncomment if you are using it
334 | # tools/**
335 | # !tools/packages.config
336 |
337 | # Tabs Studio
338 | *.tss
339 |
340 | # Telerik's JustMock configuration file
341 | *.jmconfig
342 |
343 | # BizTalk build output
344 | *.btp.cs
345 | *.btm.cs
346 | *.odx.cs
347 | *.xsd.cs
348 |
349 | # OpenCover UI analysis results
350 | OpenCover/
351 |
352 | # Azure Stream Analytics local run output
353 | ASALocalRun/
354 |
355 | # MSBuild Binary and Structured Log
356 | *.binlog
357 |
358 | # NVidia Nsight GPU debugger configuration file
359 | *.nvuser
360 |
361 | # MFractors (Xamarin productivity tool) working folder
362 | .mfractor/
363 |
364 | # Local History for Visual Studio
365 | .localhistory/
366 |
367 | # Visual Studio History (VSHistory) files
368 | .vshistory/
369 |
370 | # BeatPulse healthcheck temp database
371 | healthchecksdb
372 |
373 | # Backup folder for Package Reference Convert tool in Visual Studio 2017
374 | MigrationBackup/
375 |
376 | # Ionide (cross platform F# VS Code tools) working folder
377 | .ionide/
378 |
379 | # Fody - auto-generated XML schema
380 | FodyWeavers.xsd
381 |
382 | # VS Code files for those working on multiple tools
383 | .vscode/*
384 | !.vscode/settings.json
385 | !.vscode/tasks.json
386 | !.vscode/launch.json
387 | !.vscode/extensions.json
388 | *.code-workspace
389 |
390 | # Local History for Visual Studio Code
391 | .history/
392 |
393 | # Windows Installer files from build outputs
394 | *.cab
395 | *.msi
396 | *.msix
397 | *.msm
398 | *.msp
399 |
400 | # JetBrains Rider
401 | *.sln.iml
--------------------------------------------------------------------------------
/export-data/README.md:
--------------------------------------------------------------------------------
1 | ---
2 | page_type: sample
3 | languages:
4 | - csharp
5 | name: Export data from an Azure AI Search index
6 | description: "Export data from an Azure AI Search service. This example builds a C# Console Application using the Azure AI Search .NET SDK."
7 | products:
8 | - azure
9 | - azure-cognitive-search
10 | urlFragment: export-data
11 | ---
12 |
13 | # Export Azure AI Search service index data
14 |
15 | 
16 |
17 | Export data from an Azure AI Search service. This .NET application runs on the command line.
18 |
19 | ## Prerequisites
20 |
21 | - [Visual Studio](https://visualstudio.microsoft.com/downloads/)
22 | - [Azure AI Search service](https://docs.microsoft.com/azure/search/search-create-service-portal)
23 |
24 | ## Setup
25 |
26 | 1. Clone or download this sample repository.
27 |
28 | 1. Extract contents if the download is a zip file. Make sure the files are read-write.
29 |
30 | ## Run the sample
31 |
32 | 1. Run the app locally [using Visual Studio](https://docs.microsoft.com/azure/azure-functions/functions-develop-local) or with [dotnet run](https://learn.microsoft.com/dotnet/core/tools/dotnet-run).
33 |
34 | 1. The app supports four commands:
35 | 1. `get-bounds`
36 | 1. `partition-index`
37 | 1. `export-partitions`
38 | 1. `export-continuous`
39 |
40 | These commands support two different strategies for exporting data from the index:
41 |
42 | 1. Partitioned export. Documents in the index are split into smaller partitions that can be concurrently exported into JSON files.
43 | 1. Continuous export. An additional field is added to your index to track export progress, and is continually updated as more documents are exported.
44 |
45 | These strategies have different tradeoffs. You should use partitioned export when:
46 |
47 | - You have a [sortable](https://learn.microsoft.com/azure/search/search-pagination-page-layout#ordering-with-orderby) and [filterable](https://learn.microsoft.com/azure/search/search-filters) field that can be used to partition the documents in the index.
48 | - You are not updating documents in the index, or at least not the documents you want to export.
49 | - You have a large number of documents. Partitioned export supports exporting more than 1000 documents concurrently. Export speed depends on [how your search service is provisioned](https://learn.microsoft.com/azure/search/search-capacity-planning).
50 |
51 | You should use continuous export when:
52 |
53 | - You do not have a [sortable](https://learn.microsoft.com/azure/search/search-pagination-page-layout#ordering-with-orderby) and [filterable](https://learn.microsoft.com/azure/search/search-filters) field. This field is required for partitioned export.
54 | - You are actively updating the documents in the index you want to export.
55 | - You have [storage space remaining on your search service](https://learn.microsoft.com/azure/search/search-limits-quotas-capacity#storage-limits), and are OK with the export process updating documents in the index. Continuous export adds an additional field to track export progress, which requires some storage be available.
56 | - You can accept that duplicate documents may be included in the exported data. If the search service has multiple replicas, a [best-effort attempt is made to use the same replica](https://learn.microsoft.com/azure/search/index-similarity-and-scoring#scoring-statistics-and-sticky-sessions) to ensure consistent export results. There may also [be a delay in updating already exported documents](https://learn.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents#response), so documents may be exported more than once.
57 |
58 | ### Partitioned export commands
59 |
60 | #### get-bounds
61 |
62 | The `get-bounds` command finds the smallest and largest values of a sortable and filterable field in the index. These bounds are used to determine how to split the documents in the index into smaller partitions.
63 |
64 | ```
65 | Description:
66 | Find and display the largest and lowest value for the specified field. Used to determine how to partition index data for export
67 |
68 | Usage:
69 | export-data get-bounds [options]
70 |
71 | Options:
72 | --endpoint (REQUIRED) Endpoint of the search service to export data from. Example: https://example.search.windows.net
73 | --admin-key Admin key to the search service to export data from. If not specified - uses your Entra identity
74 | --index-name (REQUIRED) Name of the index to export data from
75 | --field-name (REQUIRED) Name of field used to partition the index data. This field must be filterable and sortable.
76 | -?, -h, --help Show help and usage information
77 | ```
78 |
79 | Sample usage:
80 |
81 | ```
82 | dotnet run get-bounds --endpoint https://example.search.windows.net --admin-key AAAAAAA --index-name my-index --field-name date
83 |
84 | Lower Bound 1969-12-31T16:11:38.0000000+00:00
85 | Upper Bound 2022-11-06T12:14:21.0000000+00:00
86 | ```
87 |
88 | In this example, `date` is an [Edm.DateTimeOffset](https://learn.microsoft.com/rest/api/searchservice/supported-data-types) field with the [sortable](https://learn.microsoft.com/azure/search/search-pagination-page-layout#ordering-with-orderby) and [filterable](https://learn.microsoft.com/azure/search/search-filters) attributes applied. The lowest value in the index for this field is 1969-12-31, and the highest is 2022-11-06.
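
The bounds drive partitioning: the range between the lower and upper bound is recursively split until each sub-range holds at most the partition size. A minimal Python sketch of this idea (an illustration of the approach, not the tool's exact C# algorithm; `count_docs` stands in for a filtered `$count` query against the index):

```python
def generate_partitions(lower, upper, count_docs, max_size=100_000):
    """Recursively split [lower, upper) at its midpoint until each
    sub-range holds at most max_size documents."""
    total = count_docs(lower, upper)
    if total <= max_size:
        return [(lower, upper, total)]
    mid = lower + (upper - lower) / 2
    return (generate_partitions(lower, mid, count_docs, max_size) +
            generate_partitions(mid, upper, count_docs, max_size))

# Toy stand-in for the count query: 500,000 documents spread
# uniformly over the numeric range [0, 500000)
def count_docs(lo, hi):
    return int(max(0, min(hi, 500_000) - max(lo, 0)))

partitions = generate_partitions(0, 500_000, count_docs)
```

With this uniform toy data, the 500,000 documents split into 8 ranges of 62,500 documents each, every one under the 100,000-document cap.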
89 |
90 | #### partition-index
91 |
92 | The `partition-index` command is used to divide the index into smaller partitions.
93 |
94 | ```
95 | Description:
96 | Partitions the data in the index between the upper and lower bound values into partitions with at most 100,000 documents.
97 |
98 | Usage:
99 | export-data partition-index [options]
100 |
101 | Options:
102 | --endpoint (REQUIRED) Endpoint of the search service to export data from. Example: https://example.search.windows.net
103 | --admin-key Admin key to the search service to export data from. If not specified - uses your Entra identity
104 | --index-name (REQUIRED) Name of the index to export data from
105 | --field-name (REQUIRED) Name of field used to partition the index data. This field must be filterable and sortable.
106 | --lower-bound Smallest value to use to partition the index data. Defaults to the smallest value in the index. []
107 | --upper-bound Largest value to use to partition the index data. Defaults to the largest value in the index. []
108 | --partition-size Maximum size of a partition. Defaults to 100,000. Cannot exceed 100,000 [default: 100000]
109 | --partition-path Path of the file with JSON description of partitions. Should end in .json. Default is <index name>-partitions.json []
110 | -?, -h, --help Show help and usage information
111 | ```
112 |
113 | Sample usage:
114 |
115 | ```
116 | dotnet run partition-index --endpoint https://example.search.windows.net --admin-key AAAAAAA --index-name my-index --field-name date
117 |
118 | Wrote partitions to my-index-partitions.json
119 | ```
120 |
121 | In this case, `my-index-partitions.json` contains a JSON description of the partitions in the index:
122 |
123 | ```json
124 | {
125 | "endpoint": "https://example.search.windows.net",
126 | "indexName": "my-index",
127 | "fieldName": "date",
128 | "totalDocumentCount": 500000,
129 | "partitions": [
130 | {
131 | "upperBound": "1976-08-09T12:41:58.375+00:00",
132 | "lowerBound": "1969-12-31T16:11:38+00:00",
133 | "documentCount": 62382,
134 | "filter": "date ge 1969-12-31T16:11:38.0000000+00:00 and date le 1976-08-09T12:41:58.3750000+00:00"
135 | },
136 | // more partitions in the same format as above
137 | ]
138 | }
139 | ```
139 |
140 | The JSON file contains metadata about the index and the partitions it created, such as total document count and partition field name. The `partitions` field lists all the [filters](https://learn.microsoft.com/azure/search/search-filters) used to retrieve the partitions using [pagination](https://learn.microsoft.com/azure/search/search-pagination-page-layout#paging-results).
141 |
142 | #### export-partitions
143 |
144 | The `export-partitions` command is used to export the partitions created by `partition-index` into JSON files.
145 |
146 | ```
147 | Description:
148 | Exports data from a search index using a pre-generated partition file from partition-index
149 |
150 | Usage:
151 | export-data export-partitions [options]
152 |
153 | Options:
154 | --partition-path (REQUIRED) Path of the file with JSON description of partitions. Should end in .json.
155 | --admin-key Admin key to the search service to export data from. If not specified - uses your Entra identity
156 | --export-path Directory to write JSON Lines partition files to. Every line in the partition file contains a JSON object with the contents of the Search document. Format of file names is <index name>-<partition index>-documents.json [default: .]
157 | --concurrent-partitions Number of partitions to concurrently export. Default is 2 [default: 2]
158 | --page-size Page size to use when running export queries. Default is 1000 [default: 1000]
159 | --include-partition List of partitions by index to include in the export. Example: --include-partition 0 --include-partition 1 only runs the export on first 2 partitions []
160 | --exclude-partition List of partitions by index to exclude from the export. Example: --exclude-partition 0 --exclude-partition 1 runs the export on every partition except the first 2 []
161 | --include-field List of fields to include in the export. Example: --include-field field1 --include-field field2. []
162 | --exclude-field List of fields to exclude in the export. Example: --exclude-field field1 --exclude-field field2. []
163 | -?, -h, --help Show help and usage information
164 | ```
165 |
166 | Sample usage:
167 |
168 | ```cmd
169 | dotnet run export-partitions --partition-path my-index-partitions.json --admin-key AAAAAAA --export-path C:\Users\MyAccount\output --concurrent-partitions 8
170 | Starting partition 2
171 | Starting partition 1
172 | Starting partition 0
173 | Starting partition 3
174 | Starting partition 7
175 | Starting partition 4
176 | Starting partition 5
177 | Starting partition 6
178 | Ended partition 4
179 | Ended partition 6
180 | Ended partition 3
181 | Ended partition 0
182 | Ended partition 7
183 | Ended partition 2
184 | Ended partition 1
185 | Ended partition 5
186 | ```
187 |
188 | The `export-partitions` command was run on the partitions in `my-index-partitions.json`, the file output by the previous `partition-index` command. `--concurrent-partitions` was set to 8, so 8 partitions were exported to JSON files concurrently. Adjust this number to tune parallelization: higher values increase load on the search service but finish the export sooner; lower values use fewer resources but take longer to complete the export.
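
The effect of `--concurrent-partitions` can be sketched with a semaphore that caps the number of in-flight partition exports (a simplified Python illustration; the actual tool is written in C#, and `export_all`/`export_one` here are hypothetical names):

```python
import asyncio

async def export_all(partitions, export_one, concurrent_partitions=2):
    # Cap how many partitions are being exported at any one time
    sem = asyncio.Semaphore(concurrent_partitions)

    async def run(index, partition):
        async with sem:
            print(f"Starting partition {index}")
            await export_one(partition)
            print(f"Ended partition {index}")

    await asyncio.gather(*(run(i, p) for i, p in enumerate(partitions)))

# Toy run: "exporting" a partition is just a yield to the event loop
done = []
async def export_one(partition):
    await asyncio.sleep(0)
    done.append(partition)

asyncio.run(export_all(list(range(8)), export_one, concurrent_partitions=8))
```

As in the sample output above, start and end messages interleave because partitions run concurrently, but every partition completes before the command returns.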
189 |
190 | One JSON file per partition is output, named `<index name>-<partition index>-documents.json`. The output [JSONL files](https://jsonlines.org/) have one JSON object per line, corresponding to a single search document. All fields marked as [retrievable](https://learn.microsoft.com/azure/search/search-query-simple-examples) are exported by default. Fields can be explicitly included using `--include-field`, or explicitly excluded using `--exclude-field`.
191 |
192 | Example output in `index-0-documents.json`:
193 |
194 | ```json
195 | {"id":"document-1", "text": "first document", "date":"1969-12-31T16:11:38Z"}
196 | {"id":"document-2","text": "second document", "date":"1969-12-31T17:05:39Z"}
197 | ...
198 | ```
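
Because each output file is JSON Lines, downstream processing can parse it one line at a time without loading the whole file into memory. A small Python sketch (the file name is taken from the example above):

```python
import json

def read_documents(path):
    # Yield one search document per non-empty line of a JSONL export file
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

# e.g. for doc in read_documents("index-0-documents.json"): print(doc["id"])
```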
199 |
200 | ### Continuous export commands
201 |
202 | #### export-continuous
203 |
204 | The `export-continuous` command finds documents that have not yet been exported and writes them to a JSON file.
205 |
206 | ```
207 | Description:
208 | Exports data from a search service by adding a column to track which documents have been exported and continually updating it
209 |
210 | Usage:
211 | export-data export-continuous [options]
212 |
213 | Options:
214 | --endpoint (REQUIRED) Endpoint of the search service to export data from. Example: https://example.search.windows.net
215 | --admin-key Admin key to the search service to export data from. If not specified - uses your Entra identity
216 | --index-name (REQUIRED) Name of the index to export data from
217 | --export-field-name Name of the Edm.Boolean field the continuous export process will update to track which documents have been exported. Default is 'exported' [default: exported]
218 | --page-size Page size to use when running export queries. Default is 1000 [default: 1000]
219 | --export-path Path to write JSON Lines file to. Every line in the file contains a JSON object with the contents of the Search document. Format of file is <index name>-documents.json []
220 | --include-field List of fields to include in the export. Example: --include-field field1 --include-field field2. []
221 | --exclude-field List of fields to exclude in the export. Example: --exclude-field field1 --exclude-field field2. []
222 | -?, -h, --help Show help and usage information
223 | ```
224 |
225 | Sample usage:
226 |
227 | ```
228 | dotnet run export-continuous --endpoint https://example.search.windows.net --admin-key AAAA --index-name my-index
229 | ```
230 |
231 | One JSON file is output, named `my-index-documents.json`. The output [JSONL file](https://jsonlines.org/) has one JSON object per line, corresponding to a single search document. All fields marked as [retrievable](https://learn.microsoft.com/azure/search/search-query-simple-examples) are exported by default, except the field used to track whether the document was exported. Fields can be explicitly included using `--include-field`, or explicitly excluded using `--exclude-field`. If the export is cancelled, it resumes where it left off.
232 |
233 | Duplicate documents may be included in the exported data. If the search service has multiple replicas, a [best-effort attempt is made to use the same replica](https://learn.microsoft.com/azure/search/index-similarity-and-scoring#scoring-statistics-and-sticky-sessions) to ensure consistent export results. There may also [be a delay in updating already exported documents](https://learn.microsoft.com/rest/api/searchservice/addupdate-or-delete-documents#response), so documents may be exported more than once. Storage usage also increases as additional data is added to the index. If duplicate documents or storage limits are an issue, partitioned export is recommended.
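
If duplicates do appear, they can be removed after the export by keying on the document key, keeping the last occurrence since later lines reflect more recently exported versions. A hedged Python sketch (the key field name `id` is an assumption; substitute your index's key field):

```python
import json

def deduplicate_jsonl(in_path, out_path, key="id"):
    # Keep the last occurrence of each key: later lines in the export
    # file reflect more recently exported versions of the document
    latest = {}
    with open(in_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                doc = json.loads(line)
                latest[doc[key]] = doc
    with open(out_path, "w", encoding="utf-8") as f:
        for doc in latest.values():
            f.write(json.dumps(doc) + "\n")
```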
234 |
235 | Example output in `my-index-documents.json`:
236 |
237 | ```json
238 | {"id":"document-1", "text": "first document"}
239 | {"id":"document-2","text": "second document"}
240 | ```
241 |
242 | ## Next steps
243 |
244 | You can learn more about Azure AI Search on the [official documentation site](https://docs.microsoft.com/azure/search).
245 |
--------------------------------------------------------------------------------
/export-data/Sample/Configuration.cs:
--------------------------------------------------------------------------------
1 | using Microsoft.Extensions.Configuration;
2 |
3 | namespace Sample
4 | {
5 | public class Configuration
6 | {
7 | /// <summary>
8 | /// Service endpoint for the search service
9 | /// e.g. "https://your-search-service.search.windows.net"
10 | /// </summary>
11 | [ConfigurationKeyName("AZURE_SEARCH_SERVICE_ENDPOINT")]
12 | public string ServiceEndpoint { get; set; }
13 |
14 | /// <summary>
15 | /// Index name in the search service
16 | /// e.g. sample-index
17 | /// </summary>
18 | [ConfigurationKeyName("AZURE_SEARCH_INDEX_NAME")]
19 | public string IndexName { get; set; }
20 |
21 | /// <summary>
22 | /// Admin API key for search service
23 | /// Optional, if not specified attempt to use DefaultAzureCredential
24 | /// </summary>
25 | [ConfigurationKeyName("AZURE_SEARCH_ADMIN_KEY")]
26 | public string AdminKey { get; set; }
27 |
28 | /// <summary>
29 | /// Directory to save the exported files in
30 | /// </summary>
31 | [ConfigurationKeyName("EXPORT_DIRECTORY")]
32 | public string ExportDirectory { get; set; }
33 |
34 | /// <summary>
35 | /// Validate the configuration
36 | /// </summary>
37 | /// <exception cref="ArgumentException">If any parameters are invalid</exception>
38 | public void Validate()
39 | {
40 | if (!Uri.TryCreate(ServiceEndpoint, UriKind.Absolute, out _))
41 | {
42 | throw new ArgumentException("Must specify service endpoint", nameof(ServiceEndpoint));
43 | }
44 |
45 | if (string.IsNullOrEmpty(IndexName))
46 | {
47 | throw new ArgumentException("Must specify index name", nameof(IndexName));
48 | }
49 | }
50 | }
51 | }
52 |
--------------------------------------------------------------------------------
/export-data/Sample/Document.cs:
--------------------------------------------------------------------------------
1 | using Azure.Search.Documents.Indexes;
2 |
3 | namespace Sample
4 | {
5 | public class Document
6 | {
7 | [SimpleField(IsKey = true, IsFilterable = true)]
8 | public string Id { get; set; }
9 |
10 | [SimpleField(IsFilterable = true, IsSortable = true)]
11 | public DateTimeOffset Timestamp { get; set; }
12 | }
13 | }
14 |
--------------------------------------------------------------------------------
/export-data/Sample/Program.cs:
--------------------------------------------------------------------------------
1 | using Azure.Search.Documents;
2 | using Azure.Identity;
3 | using Microsoft.Extensions.Configuration;
4 | using Sample;
5 | using Azure.Search.Documents.Indexes;
6 | using Azure;
7 | using Azure.Search.Documents.Indexes.Models;
8 | using export_data;
9 |
10 | // Before running this sample
11 | // 1. Copy local.settings-example.json to local.settings.json
12 | // 2. Fill in the sample values with actual values
13 | var configuration = new Configuration();
14 | new ConfigurationBuilder()
15 | .SetBasePath(Directory.GetCurrentDirectory())
16 | .AddJsonFile("local.settings.json")
17 | .Build()
18 | .Bind(configuration);
19 | configuration.Validate();
20 |
21 | var endpoint = new Uri(configuration.ServiceEndpoint);
22 | var defaultCredential = new DefaultAzureCredential();
23 | var adminKey = !string.IsNullOrEmpty(configuration.AdminKey) ? new AzureKeyCredential(configuration.AdminKey) : null;
24 | var searchIndexClient = adminKey != null ? new SearchIndexClient(endpoint, adminKey) : new SearchIndexClient(endpoint, defaultCredential);
25 |
26 | var fieldBuilder = new FieldBuilder();
27 | var searchFields = fieldBuilder.Build(typeof(Document));
28 | var indexDefinition = new SearchIndex(configuration.IndexName, searchFields);
29 | await searchIndexClient.CreateOrUpdateIndexAsync(indexDefinition);
30 |
31 | var searchClient = searchIndexClient.GetSearchClient(indexDefinition.Name);
32 |
33 | // Upload randomly generated documents
34 | using (var bufferedSender = new SearchIndexingBufferedSender<Document>(searchClient))
35 | {
36 | const int DocumentCount = 500000;
37 | DateTimeOffset start = new DateTimeOffset(2023, 01, 01, 0, 0, 0, TimeSpan.Zero);
38 | DateTimeOffset end = new DateTimeOffset(2024, 01, 01, 0, 0, 0, TimeSpan.Zero);
39 | var random = new Random();
40 | for (int i = 0; i < DocumentCount; i++)
41 | {
42 | bufferedSender.UploadDocuments(
43 | new[] {
44 | new Document {
45 | Id = Convert.ToString(i),
46 | // Pick a random timestamp uniformly between the start and end dates
47 | Timestamp = start + (random.NextDouble() * (end-start))
48 | }
49 | });
50 | }
51 | }
52 |
53 | // Demonstrate how to use partition export
54 | SearchField timestampField = searchFields
55 | .Where(field => field.Type == SearchFieldDataType.DateTimeOffset)
56 | .Single();
57 | object lowerBound = await Bound.FindLowerBoundAsync(timestampField, searchClient);
58 | object upperBound = await Bound.FindUpperBoundAsync(timestampField, searchClient);
59 | List<Partition> partitions = await new PartitionGenerator(searchClient, timestampField, lowerBound, upperBound).GeneratePartitions();
60 |
61 | var partitionFile = new PartitionFile
62 | {
63 | Endpoint = endpoint.AbsoluteUri,
64 | IndexName = indexDefinition.Name,
65 | FieldName = timestampField.Name,
66 | TotalDocumentCount = partitions.Sum(partition => partition.DocumentCount),
67 | Partitions = partitions
68 | };
69 |
70 | if (!Directory.Exists(configuration.ExportDirectory))
71 | {
72 | Directory.CreateDirectory(configuration.ExportDirectory);
73 | }
74 | var partitionFilePath = Path.Combine(configuration.ExportDirectory, $"{indexDefinition.Name}-partitions.json");
75 | partitionFile.SerializeToFile(partitionFilePath);
76 |
77 | var partitionWriter = new FilePartitionWriter(configuration.ExportDirectory, indexDefinition.Name);
78 | await new PartitionExporter(
79 | partitionFile,
80 | partitionWriter,
81 | searchClient,
82 | indexDefinition,
83 | concurrentPartitions: 2,
84 | pageSize: 1000).ExportAsync();
--------------------------------------------------------------------------------
/export-data/Sample/Sample.csproj:
--------------------------------------------------------------------------------
1 | <Project Sdk="Microsoft.NET.Sdk">
2 |
3 |   <PropertyGroup>
4 |     <OutputType>Exe</OutputType>
5 |     <TargetFramework>net6.0</TargetFramework>
6 |     <ImplicitUsings>enable</ImplicitUsings>
7 |     <Nullable>disable</Nullable>
8 |   </PropertyGroup>
9 |
10 |   <ItemGroup>
11 |     <ProjectReference Include="..\export-data\export-data.csproj" />
12 |   </ItemGroup>
13 |
14 |   <ItemGroup>
15 |     <None Update="local.settings.json">
16 |       <CopyToOutputDirectory>Always</CopyToOutputDirectory>
17 |     </None>
18 |   </ItemGroup>
19 |
20 | </Project>
--------------------------------------------------------------------------------
/export-data/Sample/local.settings-example.json:
--------------------------------------------------------------------------------
1 | {
2 | "AZURE_SEARCH_SERVICE_ENDPOINT": "https://my-service-endpoint.search.windows.net",
3 | "AZURE_SEARCH_INDEX_NAME": "my-example-index-name",
4 | "AZURE_SEARCH_ADMIN_KEY": "my-admin-key",
5 | "EXPORT_DIRECTORY": "sample-export"
6 | }
7 |
--------------------------------------------------------------------------------
/export-data/export-data.sln:
--------------------------------------------------------------------------------
1 |
2 | Microsoft Visual Studio Solution File, Format Version 12.00
3 | # Visual Studio Version 17
4 | VisualStudioVersion = 17.3.32929.385
5 | MinimumVisualStudioVersion = 10.0.40219.1
6 | Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "export-data", "export-data\export-data.csproj", "{DDF5C0C5-8499-453A-AA06-368A9E218262}"
7 | EndProject
8 | Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "tests", "tests\tests.csproj", "{EBF0AE26-F3AB-4FF7-8197-75F17D7F9402}"
9 | EndProject
10 | Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "Sample", "Sample\Sample.csproj", "{CBE2C21F-50C2-4CC2-82E0-263EE265612C}"
11 | EndProject
12 | Global
13 | GlobalSection(SolutionConfigurationPlatforms) = preSolution
14 | Debug|Any CPU = Debug|Any CPU
15 | Release|Any CPU = Release|Any CPU
16 | EndGlobalSection
17 | GlobalSection(ProjectConfigurationPlatforms) = postSolution
18 | {DDF5C0C5-8499-453A-AA06-368A9E218262}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
19 | {DDF5C0C5-8499-453A-AA06-368A9E218262}.Debug|Any CPU.Build.0 = Debug|Any CPU
20 | {DDF5C0C5-8499-453A-AA06-368A9E218262}.Release|Any CPU.ActiveCfg = Release|Any CPU
21 | {DDF5C0C5-8499-453A-AA06-368A9E218262}.Release|Any CPU.Build.0 = Release|Any CPU
22 | {EBF0AE26-F3AB-4FF7-8197-75F17D7F9402}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
23 | {EBF0AE26-F3AB-4FF7-8197-75F17D7F9402}.Debug|Any CPU.Build.0 = Debug|Any CPU
24 | {EBF0AE26-F3AB-4FF7-8197-75F17D7F9402}.Release|Any CPU.ActiveCfg = Release|Any CPU
25 | {EBF0AE26-F3AB-4FF7-8197-75F17D7F9402}.Release|Any CPU.Build.0 = Release|Any CPU
26 | {CBE2C21F-50C2-4CC2-82E0-263EE265612C}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
27 | {CBE2C21F-50C2-4CC2-82E0-263EE265612C}.Debug|Any CPU.Build.0 = Debug|Any CPU
28 | {CBE2C21F-50C2-4CC2-82E0-263EE265612C}.Release|Any CPU.ActiveCfg = Release|Any CPU
29 | {CBE2C21F-50C2-4CC2-82E0-263EE265612C}.Release|Any CPU.Build.0 = Release|Any CPU
30 | EndGlobalSection
31 | GlobalSection(SolutionProperties) = preSolution
32 | HideSolutionNode = FALSE
33 | EndGlobalSection
34 | GlobalSection(ExtensibilityGlobals) = postSolution
35 | SolutionGuid = {4B25BA6A-936B-47CB-9896-89F6A884EAC5}
36 | EndGlobalSection
37 | EndGlobal
38 |
--------------------------------------------------------------------------------
/export-data/export-data/Bound.cs:
--------------------------------------------------------------------------------
1 | using Azure.Search.Documents.Indexes.Models;
2 | using Azure.Search.Documents.Models;
3 | using Azure.Search.Documents;
4 |
5 | namespace export_data
6 | {
7 | /// <summary>
8 | /// Potential value for the sortable and filterable field, used as a bound between different potential partitions
9 | /// </summary>
10 | public static class Bound
11 | {
12 | public static async Task