├── .github
│   └── workflows
│       └── main.yml
├── Concepts
│   ├── adf-data-flow-column-pattern.md
│   ├── adf-data-flow-datasets.md
│   ├── adf-data-flow-debug-mode.md
│   ├── adf-data-flow-execute-data-flow.md
│   ├── adf-data-flow-inspect-pane.md
│   ├── adf-data-flow-json.md
│   ├── adf-data-flow-move-nodes.md
│   ├── adf-data-flow-parameter-file.md
│   ├── adf-data-flow-reference-node.md
│   ├── adf-data-flow-schema-drift.md
│   ├── adf-data-flow-ssis.md
│   ├── archive
│   │   └── adf-data-flow-linked-service.md
│   └── data-flow-error-rows.md
├── Cosmos Pipeline.zip
├── Data Flows Tutorial Q2 CY21.docx
├── README.md
├── SkipLines.zip
├── TOC.yml
├── Transformations
│   ├── adf-data-flow-aggregate-transform.md
│   ├── adf-data-flow-conditional-split-transform.md
│   ├── adf-data-flow-derived-column-transform.md
│   ├── adf-data-flow-exists-transformation.md
│   ├── adf-data-flow-expression-builder.md
│   ├── adf-data-flow-filter-transform.md
│   ├── adf-data-flow-join-transformation.md
│   ├── adf-data-flow-lookup-transformation.md
│   ├── adf-data-flow-new-branch.md
│   ├── adf-data-flow-pivot-transformation.md
│   ├── adf-data-flow-select-transformation.md
│   ├── adf-data-flow-sink-transformation.md
│   ├── adf-data-flow-sort-transform.md
│   ├── adf-data-flow-source-transformation.md
│   ├── adf-data-flow-surrogate-key-transformation.md
│   ├── adf-data-flow-transformations-optimize-tab.md
│   ├── adf-data-flow-union-transformation.md
│   ├── adf-data-flow-unpivot-transformation.md
│   ├── adf-data-flow-window-transformation.md
│   └── readme.md
├── adf-data-flow-expression-functions.md
├── adf-data-flow-faq.md
├── adf_dataflow_overview.md
├── archives
│   ├── ADF Data Flow Get Started in 12 Steps.md
│   ├── ADFDataFlowGettingStarted.md
│   ├── AzureDatabricksModeForDataFlow.md
│   ├── Help-Databricks-NoMoreCores.md
│   └── readme.md
├── data-flow-cluster-monitor.md
├── data-flow-expression-samples.md
├── data-flow-monitoring.md
├── data-flow-reserved-capacity-overview.md
├── data-flow-samples-template.md
├── data-flow-script.md
├── data-flow-understand-reservation-charges.md
├── data-flow-versioning.md
├── findDelimiter.zip
├── images
│   ├── AddSubcolumn.png
│   ├── ComplexColumn.png
│   ├── accesstoken.png
│   ├── adb.png
│   ├── adb50.png
│   ├── adw1.png
│   ├── adw2.png
│   ├── adw3.png
│   ├── agg.png
│   ├── agg2.png
│   ├── agg3.png
│   ├── agghead.png
│   ├── agghead1.png
│   ├── automap.png
│   ├── azureir1.png
│   ├── bb_debug1.png
│   ├── bb_inspect1.png
│   ├── bb_ssms1.png
│   ├── bb_ssms2.png
│   ├── branch2.png
│   ├── cd1.png
│   ├── ce1.png
│   ├── ce10.png
│   ├── ce2.png
│   ├── ce3.png
│   ├── ce4.png
│   ├── ce5.png
│   ├── ce6.png
│   ├── ce7.png
│   ├── ce8.png
│   ├── ce9.png
│   ├── cfdf001.png
│   ├── chart.png
│   ├── columnpattern.png
│   ├── columnpattern2.png
│   ├── comments.png
│   ├── dafl1.png
│   ├── dafl2.png
│   ├── dafl3.png
│   ├── dafl4.png
│   ├── databricks.png
│   ├── dataflowls.png
│   ├── datapreview.png
│   ├── dataset1.png
│   ├── dbls001.png
│   ├── dc.png
│   ├── dc1.png
│   ├── debug1.png
│   ├── debug2.png
│   ├── debugbutton.png
│   ├── defaultformat.png
│   ├── dfls2.png
│   ├── errorow1.png
│   ├── errors1.png
│   ├── existingcluster.png
│   ├── exp1.png
│   ├── exp2.png
│   ├── exp3.png
│   ├── exp4.png
│   ├── exp4b.png
│   ├── exp5.png
│   ├── expb1.png
│   ├── expb2.png
│   ├── expression.png
│   ├── exsits.png
│   ├── extdep.png
│   ├── extend.png
│   ├── folderpath.png
│   ├── gentoken.png
│   ├── join.png
│   ├── joinoptimize.png
│   ├── keycols.png
│   ├── lookup1.png
│   ├── lsconnections.png
│   ├── maxcon.png
│   ├── menu.png
│   ├── mon001.png
│   ├── mon002.png
│   ├── mon003.png
│   ├── mon004.png
│   ├── mon005.png
│   ├── mon1.png
│   ├── multi1.png
│   ├── newdataflowactivity.png
│   ├── newresource.png
│   ├── nullchart.png
│   ├── opt001.png
│   ├── opt002.png
│   ├── params.png
│   ├── pipe1.png
│   ├── pivot1.png
│   ├── pivot2.png
│   ├── pivot3.png
│   ├── pivot4.png
│   ├── pivot5.png
│   ├── portal.png
│   ├── redf001.png
│   ├── referencenode.png
│   ├── resource1.png
│   ├── scd7.png
│   ├── schemadrift001.png
│   ├── select001.png
│   ├── selfjoin.png
│   ├── sink1.png
│   ├── sink2.png
│   ├── soccer1.png
│   ├── soccer2.png
│   ├── soccer3.png
│   ├── soccer4.png
│   ├── sort.png
│   ├── source.png
│   ├── source003.png
│   ├── source1.png
│   ├── source2.png
│   ├── source3.png
│   ├── sourcepart.png
│   ├── sourceparts.png
│   ├── sources5.png
│   ├── sql001.png
│   ├── stats.png
│   ├── storeage.png
│   ├── surrogate.png
│   ├── tags.png
│   ├── tags1.png
│   ├── taxi1.png
│   ├── taxidrift1.png
│   ├── taxidrift2.png
│   ├── template.png
│   ├── template1.png
│   ├── template2.png
│   ├── union.png
│   ├── unpivot1.png
│   ├── unpivot3.png
│   ├── unpivot4.png
│   ├── unpivot5.png
│   ├── unpivot6.png
│   ├── unpivot7.png
│   ├── upload.png
│   ├── v2dataflowportal.png
│   ├── vers1.png
│   ├── vers2.png
│   ├── windows1.png
│   ├── windows2.png
│   ├── windows3.png
│   ├── windows4.png
│   ├── windows5.png
│   ├── windows6.png
│   ├── windows7.png
│   ├── windows8.png
│   └── windows9.png
├── mapping-data-flow-overview.md
├── media
│   ├── 09b0f0e02aaede3d38acf46a6dcb8644.png
│   ├── 2da2e7aee362dad223964fd741982505.png
│   ├── 3bd6be2be665dff8a97d48df4fd1326b.png
│   ├── 7f882e627a336827d8890f3fa71110df.png
│   ├── 9904436a39ba0b54a6e030eb7ad0540f.png
│   ├── 9e9d9af6ef098ed75b62f96211dcf313.png
│   ├── a0b2dbe0b01e1a3f4a9a6b17ab57bbe2.png
│   ├── a35981c95cd51d0b13ecd090b3b97cfe.png
│   ├── a51be05cea35390eb8052f25fb152eef.png
│   ├── adb1.png
│   ├── adf-data-flows.png
│   ├── af068303e7906e297c666307bf12d39b.png
│   ├── azureir1.png
│   ├── b682cdfd3971c23f096c21f194defe81.png
│   ├── c6511f8763cfc590a0e2262cdc960442.png
│   ├── comments2.png
│   ├── d242a4c1928463417119ab08248e1e37.png
│   ├── dataset1.png
│   ├── defaultformat.png
│   ├── e117d32a02042ba72df328a372931772.png
│   ├── eba63d158e958a245b6686819ba0d5ac.png
│   ├── f3a2eff81e3af2a1775407d2c410b71f.png
│   ├── fb18c5028f040939979273b045c5ca5a.png
│   ├── readme
│   ├── sink1.png
│   ├── sink2.png
│   ├── source1.png
│   ├── source2.png
│   ├── source3.png
│   └── sql001.png
├── patterns
│   ├── adf-data-flow-patterns.md
│   └── adfdataflowlinks.md
├── sampledata
│   ├── Address.csv
│   ├── Adventure Works SQL.zip
│   ├── AdventureWorks Data.zip
│   ├── AdventureWorksSchemas.xlsx
│   ├── Currency_ARS.txt
│   ├── Currency_AUD.txt
│   ├── Currency_BRL.txt
│   ├── Currency_CAD.txt
│   ├── Currency_CNY.txt
│   ├── Currency_DEM.txt
│   ├── Currency_EUR.txt
│   ├── Currency_FRF.txt
│   ├── Currency_GBP.txt
│   ├── Currency_JPY.txt
│   ├── Currency_MXN.txt
│   ├── Currency_SAR.txt
│   ├── Currency_USD.txt
│   ├── Currency_VEB.txt
│   ├── DeltaPipeline.zip
│   ├── DimProducts.csv
│   ├── Distinct Rows All.zip
│   ├── DynaColsPipe.zip
│   ├── Flatten Orders.zip
│   ├── Generic SCD Type1.zip
│   ├── Kromer-ADF-Mapping-Data-Flows-ReadMe.docx
│   ├── Load Multiple Tables.zip
│   ├── MovieAnalytics.zip
│   ├── Moving Average.zip
│   ├── Partition Output by Size.zip
│   ├── Product2.csv
│   ├── Products.csv
│   ├── SCD Pipeline2.zip
│   ├── SQL Orders to CosmosDB.zip
│   ├── SampleCurrencyData.txt
│   ├── SearchLog.tsv
│   ├── loans.csv
│   ├── metadatapipeline.zip
│   ├── movies.csv
│   ├── moviesDB.csv
│   ├── moviesDB2.csv
│   ├── names
│   │   ├── 90_name_records.csv
│   │   └── readme.md
│   ├── names2.csv
│   ├── orders.json
│   ├── powerquery
│   │   ├── employeedemo
│   │   │   ├── EmployeeInfo.csv
│   │   │   └── EmployeeSalary.csv
│   │   ├── moviedemo
│   │   │   └── moviesDB.csv
│   │   └── taxidemo
│   │       ├── trip_data_1.csv
│   │       └── trip_fare_1.csv
│   ├── readme
│   ├── scdtype2
│   │   ├── Employee1.csv
│   │   ├── employee2.csv
│   │   └── readme.md
│   ├── sinkIfMoreThanNRows.zip
│   ├── sinkIfMoreThanNRows2.zip
│   ├── small_radio_json.json
│   ├── summaryStats2.zip
│   ├── synapse-dataflows-tutorial-001.docx
│   ├── synapse-dataflows-tutorial-004.docx
│   ├── trip_data_1.csv
│   └── trip_fare_1.csv
└── videos
    └── readme.md

--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
name: My Workflow
# Rewrites "Data Flow" to "DataFlow" across the repo on every push and pull request.
on: [push, pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - name: Find and Replace
        uses: jacobtomlinson/gha-find-replace@master
        with:
          find: "Data Flow"
          replace: "DataFlow"

--------------------------------------------------------------------------------
/Concepts/adf-data-flow-column-pattern.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Concepts

## Column Patterns

Several ADF Data Flow transformations support the idea of "Column Patterns" so that you can create template columns based on patterns instead of hard-coded column names. You can use this feature within the Expression Builder to define patterns that match columns for transformation instead of requiring you to pick exact, specific field names. This is very useful when your incoming source fields change often, particularly in the case of changing columns in text files or NoSQL databases. This is sometimes referred to as "Schema Drift".

![column patterns](../images/columnpattern2.png "Column Patterns")

Column patterns are well suited to Schema Drift scenarios, or any general scenario where you cannot always fully know or assume each column name. You can pattern match on column name and column data type, then build an expression for transformation that will perform that operation against any field in the data stream that matches your `name` & `type` patterns.

When adding an expression to a transform that accepts patterns instead of only exact field name matches, pick "Add Column Pattern" in order to utilize schema drift column matching patterns.

When building template column patterns, use `$$` in the expression to represent a reference to each matched field from the input data stream.

If you utilize one of the Expression Builder regex functions, you can then subsequently use $1, $2, $3 ... to reference the sub-patterns matched from your regex expression.

An example of using Column Patterns is to SUM a series of incoming fields for an aggregate calculation in the Aggregate transformation. You can match on every field whose type is "integer" and then use $$ to reference each match in your expression.
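As a minimal sketch of that example — the matching condition and naming expression below are illustrative, not prescribed — the Aggregate transformation's column pattern could match on the field's type and emit one summed column per match, with `$$` standing in for each matched field:

```
Matching condition:  type == 'integer'
Column name:         concat($$, '_sum')
Expression:          sum($$)
```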
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-datasets.md:
--------------------------------------------------------------------------------
# Datasets are an ADF construct that define the data that you are working with in your pipeline.

In Data Flow, row & column level data requires a much more finely-grained definition than is required by Datasets within ADF pipeline control flow.

Therefore, you use ADF Datasets in Source & Sink transforms to define the basic data schema (if you have schema in your data; otherwise you do not load a schema and use Schema Drift instead), data types, data formats, location and connection (as part of the associated Linked Service).

## ADF Preview Service for Data Flows

There are currently 5 direct connectors available to you in Data Flow:

1. ADLS Gen 1 (MSI & AKV authentication not currently supported)
2. ADLS Gen 2
3. Blob Storage
4. Azure SQL DB
5. Azure SQL DW

![Source Transformation options](../images/sources5.png "source 5")

Data Flow in ADF is built upon the premise that you will stage your data for transformation in Spark. Therefore, if your source data is not located in one of the stores listed above, or you are required to sink your data into a destination not listed there, use the Copy Activity after your Data Flow Activity in an ADF Pipeline to move your data into your final sink, or stage the data from the other 70+ ADF connectors via Copy into one of the supported staging stores above.

**NOTE: Not all properties from the Dataset definitions in ADF are currently honored by Data Flow. The UI will make an attempt to notify you of configuration differences between the Copy & Data Flow interpretation of generic Dataset properties.**

## ADF V2 GA Service with Data Flows

Here you will find 4 dataset types that you can use:

1. Azure SQL DB
2. Azure SQL DW
3. Parquet
4. Delimited Text

This separates the source *type* from the Linked Service connection type. Previously, you chose the connection type (Blob, ADLS) and then the type of file. Now, with Data Flow, you pick the source type, which can be associated with different Linked Service connection types. A sample dataset definition appears at the end of this article.

![Source Transformation options](../images/dataset1.png "sources")

There is a new "Data Flow Compatible" checkbox on the top right of the create Dataset panel. Checking it will filter to only the datasets that can be used with Data Flows.

## Import schemas

When importing the schema of Data Flow datasets, you will see an Import Schema button. Clicking that button will present you with 2 options: import from the source or import from a local file. In most cases, you will import the schema directly from the source. However, if you have a richly-defined schema file, you can point to that local file and ADF will define the schema based upon that schema file.
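For reference, a Delimited Text dataset used by a Data Flow is defined like any other ADF dataset. The following is a hedged, minimal sketch — the dataset name, linked service name, container and file are placeholder values, not taken from this repo:

```
{
    "name": "MoviesCsv",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "MyBlobStorage",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "sample-data",
                "fileName": "moviesDB.csv"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        },
        "schema": []
    }
}
```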
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-debug-mode.md:
--------------------------------------------------------------------------------
# ADF Data Flow Debug Mode

* [ADF Data Flow Debug Session, Pt 1](https://www.youtube.com/watch?v=k0YHmJc14FM)
* [ADF Data Flow Debug Session: Data Prep, Pt 2](https://www.youtube.com/watch?v=6ezKRDgK3rE)

Azure Data Factory Data Flow has a debug mode which can be switched on from the Debug button at the top of the design surface.

## Overview
When Debug mode is on, you will interactively build your data flow with a running Azure Databricks interactive cluster. The session will close once you turn debug off in ADF. You should be aware of the hourly charges incurred by Azure Databricks during the time that you have the debug session turned on.

In most cases, it is a good practice to build your Data Flows in debug mode so that you can validate your business logic and view your data transformations before publishing your work in ADF.

Use an Azure Databricks interactive cluster for debug workloads. It is recommended not to use auto-scaling of your cluster and to use limited/sampled datasets for testing in ADF Data Flow debug sessions. If you need a larger cluster when debugging, manually resize the Databricks cluster to a larger instance.

## Debug Mode On
When you switch on debug mode, you will be prompted with a side-panel form that asks you to point to your interactive Azure Databricks cluster and select options for the source sampling. You must use an interactive cluster from Azure Databricks and select either a sampling size from each of your Source transforms, or pick a text file to use for your test data.

**NOTE: When running in Debug Mode in Data Flow, your data will not be written to the Sink transform. A Debug session is intended to serve as a test harness for your transformations. Sinks are not required during debug and are ignored in your data flow. If you wish to test writing the data in your Sink, execute the Data Flow from an ADF Pipeline and use the Debug execution from a pipeline.**

## Debug Settings
Debug settings can be edited from the side panel: each Source from your Data Flow will appear there, and they can also be edited by selecting "Source Settings" on the Data Flow design toolbar. You can select the row limits and/or file source to use for each of your Source transformations here. You can also select which Databricks cluster you'd like to use for debug.

## Cluster status
There is a cluster status indicator at the top of the design surface that will turn green when the cluster is ready for debug. If your cluster is already warm, then the green indicator will appear almost instantly. If your cluster was not already running when you entered debug mode, then you will have to wait 5-7 minutes for the cluster to spin up. The indicator light will be yellow until it is ready. Once your cluster is ready for Data Flow debug, the indicator light will turn green.

When you are finished with your debugging, turn the Debug switch off so that your Azure Databricks cluster can terminate.

## Data Preview
With debug on, the Data Preview tab will light up on the bottom panel. Without debug mode on, Data Flow will show you only the current metadata in and out of each of your transformations in the Inspect tab. The data preview will only query the number of rows that you have set as your limit in your source settings. You may need to click "Fetch data" to refresh the data preview.

## Column Stats
Selecting individual columns in your data preview tab will pop up a chart on the far right of your data grid with details about each field. ADF will make a determination, based upon the data sampling, of which type of chart to display. High-cardinality fields will default to NULL / NOT NULL charts, while categorical and numeric data that has low cardinality will display bar charts showing data value frequency. You will also see the max length of string fields, min / max values in numeric fields, standard deviation, percentiles, counts and averages.
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-execute-data-flow.md:
--------------------------------------------------------------------------------
To build a logical data flow, you will use the Resource Explorer on the pipeline builder screen by clicking the plus (+) symbol to create a new Data Flow resource. Data Flows that you author will appear in the Resource Explorer in the Data Flows folder.

In order to schedule, manage, monitor and operationalize a Data Flow, you will first build a pipeline. Go to the Resource Explorer and select the plus sign again, this time to create a new pipeline resource. Within that pipeline, add a Data Flow activity, then select the Data Flow that you have already built and choose it for execution. When the pipeline executes, that Data Flow will execute within the pipeline sequence when the Data Flow activity is reached.

## Execute Data Flow Activity in Preview Service

When you add a Data Flow activity to your pipeline, you will be prompted to either select an existing Data Flow definition or to create a new one. Then, within that activity's properties, you must set the Linked Service to point to your Azure Databricks account.

Use the "Existing Cluster" option when debugging / designing data flows for a quick and interactive experience. Use the "New Job Cluster" choice for operationalized pipelines that are executed on a schedule.

You will also need to point to an Azure Blob Storage account that can be used as a staging area for data transformation. Use a Linked Service for Azure Blob and then enter your container/folder name for a staging location.

### Execute Data Flow Activity in ADF V2

If you are using the public ADF V2 version of Data Flows, you will not require a Linked Service because ADF will execute the Data Flow on internal ADF-managed Azure Databricks clusters.

1. Select the Azure Integration Runtime to define the region location of the ADF compute you'd like to use for the data flow execution. Currently, only the auto-resolve IR is supported. This means that ADF will utilize the Databricks cluster in the same region as your Factory.

2. Choose the compute type. Choose Compute Optimized for large datasets, General Purpose for general workloads, or Memory Optimized for data flows that rely heavily on aggregations and computations.

3. Select the Core Count to determine how many scale-out cores of Spark ADF should use to execute your Data Flow. A JSON sketch of these settings appears at the end of this article.

### Parameterized Datasets

In the Execute Data Flow activity, make sure to set a value for any parameters that you have in your dataset.
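In pipeline JSON, the resulting activity looks roughly like the sketch below. The activity and data flow names are placeholders, and the `compute` block corresponds to the compute type and core count selections described above:

```
{
    "name": "RunMyDataFlow",
    "type": "ExecuteDataFlow",
    "typeProperties": {
        "dataflow": {
            "referenceName": "MyDataFlow",
            "type": "DataFlowReference"
        },
        "compute": {
            "computeType": "General",
            "coreCount": 8
        }
    }
}
```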
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-inspect-pane.md:
--------------------------------------------------------------------------------
![Inspect Pane](../images/agg3.png "Inspect Pane")

The Inspect Pane is enabled on most data transformations, with the exception of Source, Sink and Union.

This widget provides a view into the metadata of the data stream that you are transforming. You will be able to see the column counts, columns changed, columns added, data types, column ordering and column references. This is a read-only view of your metadata.

--------------------------------------------------------------------------------
/Concepts/adf-data-flow-json.md:
--------------------------------------------------------------------------------
# JSON
## Creating JSONs in UI
### Derived Column Transformation
Adding a complex column to your data flow is easier through the derived column expression editor. After adding a new column and opening the editor, there are two options: enter the JSON structure manually or use the UI to add subcolumns interactively.

#### Interactive UI JSON Design
From the output schema side pane, new subcolumns can be added using the `+` menu:

![](../images/AddSubcolumn.png "Add subcolumn")

From there, new columns and subcolumns can be added in the same way. For each non-complex field, an expression can be added in the expression editor to the right.

![](../images/ComplexColumn.png "Complex column")

#### Manual JSON Design
To manually add a JSON structure, add a new column and enter the expression in the editor. The expression follows this general format:
```
@(
    field1=0,
    field2=@(
        field1=0
    )
)
```
If this expression were entered for a column named "complexColumn", then it would be written to the sink as the following JSON:
```
{
    "complexColumn": {
        "field1": 0,
        "field2": {
            "field1": 0
        }
    }
}
```

#### Sample Manual DSL
```
@(
    title=Title,
    firstName=FirstName,
    middleName=MiddleName,
    lastName=LastName,
    suffix=Suffix,
    contactDetails=@(
        email=EmailAddress,
        phone=Phone
    ),
    address=@(
        line1=AddressLine1,
        line2=AddressLine2,
        city=City,
        state=StateProvince,
        country=CountryRegion,
        postCode=PostalCode
    ),
    ids=[
        toString(CustomerID), toString(AddressID), rowguid
    ]
)
```

## Source Format Options
### Default
```
{ "json": "record 1" }
{ "json": "record 2" }
{ "json": "record 3" }
```

### Single document
#### Option one
```
[
    {
        "json": "record 1"
    },
    {
        "json": "record 2"
    },
    {
        "json": "record 3"
    }
]
```

#### Option two
```
File1.json
{
    "json": "record 1"
}
File2.json
{
    "json": "record 2"
}
File3.json
{
    "json": "record 3"
}
```

### Unquoted column names
```
{ json: "record 1" }
{ json: "record 2" }
{ json: "record 3" }
```

### Has comments
```
{ "json": /** comment **/ "record 1" }
{ "json": "record 2" }
{ /** comment **/ "json": "record 3" }
```

### Single quoted
```
{ 'json': 'record 1' }
{ 'json': 'record 2' }
{ 'json': 'record 3' }
```

### Backslash escaped
```
{ "json": "record 1" }
{ "json": "\} \" \' \\ \n \\n record 2" }
{ "json": "record 3" }
```

# Higher order functions
## filter
### Description
Filters elements out of the array that do not meet the provided predicate. Filter expects a reference to one element in the predicate function as #item.

### Examples
```
filter([1, 2, 3, 4], #item > 2) => [3, 4]
filter(['a', 'b', 'c', 'd'], #item == 'a' || #item == 'b') => ['a', 'b']
```

## map
### Description
Maps each element of the array to a new element using the provided expression. Map expects a reference to one element in the expression function as #item.

### Examples
```
map([1, 2, 3, 4], #item + 2) => [3, 4, 5, 6]
map(['a', 'b', 'c', 'd'], #item + '_processed') => ['a_processed', 'b_processed', 'c_processed', 'd_processed']
```

## reduce
### Description
Accumulates elements in an array. Reduce expects a reference to an accumulator and one element in the first expression function as #acc and #item, and it expects the resulting value as #result to be used in the second expression function.

### Examples
```
reduce([1, 2, 3, 4], 0, #acc + #item, #result) => 10
reduce(['1', '2', '3', '4'], '0', #acc + #item, #result) => '01234'
reduce([1, 2, 3, 4], 0, #acc + #item, #result + 15) => 25
```

## sort
### Description
Sorts the array using the provided predicate function. Sort expects a reference to two consecutive elements in the expression function as #item1 and #item2.

### Examples
```
sort([4, 8, 2, 3], compare(#item1, #item2)) => [2, 3, 4, 8]
sort(['a3', 'b2', 'c1'],
     iif(right(#item1, 1) >= right(#item2, 1), 1, -1)) => ['c1', 'b2', 'a3']
sort(['a3', 'b2', 'c1'],
     iif(#item1 >= #item2, 1, -1)) => ['a3', 'b2', 'c1']
```

## contains
### Description
Returns true if any element in the provided array evaluates as true in the provided predicate. Contains expects a reference to one element in the predicate function as #item.

### Examples
```
contains([1, 2, 3, 4], #item == 3) => true
contains([1, 2, 3, 4], #item > 5) => false
```

--------------------------------------------------------------------------------
/Concepts/adf-data-flow-move-nodes.md:
--------------------------------------------------------------------------------
![Agg Transformation options](../images/agghead1.png "aggregator header")

On each transformation you will see an "Incoming Stream" field on the transformation's settings pane.

This tells you which incoming data stream is feeding that transformation. You can change the physical location of your transform node on the graph by clicking the Incoming Stream name and selecting another data stream. The current transformation, along with all subsequent transforms on that stream, will then move to the new location.

If you are moving a transformation with 1 or more transformations after it, then the new location in the data flow will be joined via a new branch.

If you have no subsequent transformations after the node you've selected, then only that transform will move to the new location.

Since ADF Data Flow is a "construction" design paradigm, this is the mechanism to use to reorder your data flow in lieu of drag & drop actions.
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-parameter-file.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Concepts

## Parameter Files

* [ADF Data Flow: Parameter File](https://www.youtube.com/watch?v=Is7yTyvidMU)

You can use parameter files in ADF Data Flows in order to make your data flow more flexible and reusable. Use a text file or database table as the parameters container, look up the parameter values with a Lookup transformation, and then use a Filter transformation to keep the rows that match the incoming value in the parameters file.

1. Add a 2nd source to your data flow, which will point to your file with parameter values

![parameter file](../images/ce1.png "Parameter File")

2. Use a Lookup transformation to join the main data stream with the lookup parameters file

![lookup parameter file](../images/ce2.png "Lookup Parameter File")

3. Filter the rows using the Filter transformation. Match the incoming parameter value to the corresponding column in the data source (see the example expression below)

![filter parameter file](../images/ce4.png "Filter Parameter File")

4. Your data flow will now work against whichever values are found at the time of execution in your parameters file

![final parameter file](../images/ce10.png "Final Parameter File")
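For example — assuming the Lookup contributed a column named `param_region` from the parameters file and the source has a `region` column (both hypothetical names) — the Filter expression can be as simple as:

```
region == param_region
```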
Click on "Add Column Pattern" in the Derived Column or Aggregate transformation if you wish to create a transformation that understands "Schema Drift". 30 | 31 | 32 | 33 | *NOTE* You need to make an architectural decision in your data flow to accept schema drift throughout your flow. When you do this, you can protect against schema changes from the sources. However, you will lose early-binding of your columns and types throughout your data flow. ADF treats schema drift flows as late-binding flows, so when you build your transformations, the column names will not be avaiable to you in the schema views throuhgout the flow. 34 | 35 | ## Example 36 | 37 | 38 | 39 | In the Taxi Demo sample Data Flow, there is a sample Schema Drift in the bottom data flow with the TripFare source. In the Aggregate transformation, notice that we are using the "column pattern" design for the aggregation fields. Instead of naming specific columns, or looking for columns by position, we assume that the data can change and may not appear in the same order between runs. 40 | 41 | In this example of ADF Data Flow schema drift handling, we've built and aggregation that scans for columns of type 'double', knowing that the data domain contains prices for each trip. We can then perform an aggregrate math calculation across all double fields in the source, regardless of where the column lands and regardless of the column's naming. 42 | 43 | The ADF Data Flow syntax uses $$ to represent each matched column from your matching pattern. You can also match on column names using complex string search and regular expression functions. In this case, we are going to create a new aggregated field name based on each match of a 'double' type of column and append the text ```_total``` to each of those matched names: 44 | 45 | ```concat($$, '_total')``` 46 | 47 | Then, we will round and sum the values for each of those matched columns: 48 | 49 | ```round(sum ($$))``` 50 | 51 | You can test this out with the ADF Data Flow sample "Taxi Demo". Switch on the Debug session using the Debug toggle at the top of the Data Flow design surface so that you can see your results interactively: 52 | 53 | 54 | -------------------------------------------------------------------------------- /Concepts/adf-data-flow-ssis.md: -------------------------------------------------------------------------------- 1 | # ADF Data Flow for SSIS Developers 2 | 3 | | SSIS | ADF | 4 | |----------------------|------------------| 5 | | Destination | Sink | 6 | | Toolbox | Plus Sign | 7 | | Multicast | New Branch | 8 | | Data Flow DTS Engine | Azure Databricks | 9 | | Connection Manager | Connections | 10 | | SSDT | UI | 11 | | Connections | Linked Service | 12 | | Joins (Merge Join) | Join (Hash or replicated / broadcast join) | 13 | | Column Selectivity | As opposed to dropping columns in transformation steps, ADF Data Flow passes all columns through the entire Data Flow. This way, we can better handle schema drift conditions. To remove columns from your flow, use the Select transform | 14 | | Auto-create destination table | Sink Transformation SQL DB & SQL DW option in Dataset. Use the "Edit" checkbox and enter the name of the new destination table to be auto-created. 
--------------------------------------------------------------------------------
/Concepts/adf-data-flow-ssis.md:
--------------------------------------------------------------------------------
# ADF Data Flow for SSIS Developers

| SSIS                 | ADF              |
|----------------------|------------------|
| Destination          | Sink             |
| Toolbox              | Plus Sign        |
| Multicast            | New Branch       |
| Data Flow DTS Engine | Azure Databricks |
| Connection Manager   | Connections      |
| SSDT                 | UI               |
| Connections          | Linked Service   |
| Joins (Merge Join)   | Join (Hash or replicated / broadcast join) |
| Column Selectivity   | As opposed to dropping columns in transformation steps, ADF Data Flow passes all columns through the entire Data Flow. This way, it can better handle schema drift conditions. To remove columns from your flow, use the Select transform |
| Auto-create destination table | Sink Transformation SQL DB & SQL DW option in Dataset. Use the "Edit" checkbox and enter the name of the new destination table to be auto-created. |

For a detailed look at the transformation comparisons between SSIS & ADF Data Flow, I highly recommend [this blog post from Kamil Nowinski.](https://sqlplayer.net/2018/12/azure-data-factory-v2-and-its-available-components-in-data-flows/)
--------------------------------------------------------------------------------
/Concepts/archive/adf-data-flow-linked-service.md:
--------------------------------------------------------------------------------
# ADF Data Flow Linked Services

Before you can begin debugging or executing your Data Flows, you must link to your Azure Databricks account. You can create new Databricks and Data Flow Linked Services from the Connections section in the ADF UI or when you add a Data Flow activity to the Pipeline canvas. Choose New > Linked Service.

There are 2 ways to connect Data Flow to your Azure Databricks account. You must first ensure that you have an Azure Databricks Workspace: https://docs.microsoft.com/en-us/azure/azure-databricks/quickstart-create-databricks-workspace-portal

## Data Flow Linked Service

The ADF Data Flow Linked Service can take care of standing up your Databricks cluster for you, based on debug session (interactive cluster) or pipeline execution (job cluster). This type of Linked Service removes the need for you, as a Data Engineer, to choose Databricks configurations. Simply select a Small, Medium or Large size for your execution environment.

In the Linked Service definition, either pick the Workspace from your Azure Subscription in the drop-down or manually enter the URL for the workspace.

You will need to provide the access token key for your Azure Databricks account so that we can connect to your clusters for execution of your Data Flows. [Click here for more details on obtaining Azure Databricks access key tokens](https://docs.databricks.com/api/latest/authentication.html#generate-token)

1. Small is appropriate for debug and testing
2. Medium is best for normal operationalized workloads
3. Large should be used to execute very large big data workloads with auto-scaling

You can create separate Linked Services for each environment and then choose the appropriately-sized Linked Service based on your current workload.

There is also a custom size where you can go through the size configurations manually, while still allowing ADF to pick the right type of cluster to execute based on the debug (interactive cluster) or pipeline (job cluster) execution environment.

## Azure Databricks Linked Service

You can manually configure a Linked Service to Databricks as opposed to using the Data Flow linked service above. In this case, you are responsible for configuring the cluster settings in the Linked Service and determining when to use an Interactive Cluster vs. a Job Cluster.

**For designing and debugging Data Flows interactively, you must use the "Existing Cluster" option so that ADF can use an Azure Databricks interactive cluster. This way, the cluster is already warm and Data Flows can execute immediately. For operationalized, scheduled pipelines with Data Flows, the "New Cluster" option will work fine because you won't be impacted by the 5-7 minute spin-up time for a new job cluster when you schedule your Data Flow jobs nightly.**

In the Linked Service definition, either pick the Workspace from your Azure Subscription in the drop-down or manually enter the URL for the workspace.

You will need to provide the access token key for your Azure Databricks account so that we can connect to your clusters for execution of your Data Flows.

[Click here for more details on obtaining Azure Databricks access key tokens](https://docs.databricks.com/api/latest/authentication.html#generate-token)

You must then select whether you wish to have ADF Data Flow automatically start up a job cluster on every Data Flow execution, or to re-use an existing cluster.

If you select "New job cluster", Data Flow will start up a new Databricks job cluster on each execution of your data flow. You will incur a spin-up wait time in the neighborhood of 5-7 minutes before your data flow will execute.

If you select "Existing Cluster", Data Flow will utilize your existing cluster. For this option, you will choose the Cluster Name from the list. This is the name you gave to your interactive cluster. In this case, you control the start/stop of your cluster to make it available for data flow.
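As a hedged sketch of this manually configured option in JSON — the linked service name, workspace URL and cluster id below are placeholders, and in practice the token should come from a secure store rather than being pasted in — an Azure Databricks Linked Service pointing at an existing interactive cluster resembles:

```
{
    "name": "AzureDatabricksLS",
    "properties": {
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://eastus.azuredatabricks.net",
            "accessToken": {
                "type": "SecureString",
                "value": "<access token>"
            },
            "existingClusterId": "<interactive cluster id>"
        }
    }
}
```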
--------------------------------------------------------------------------------
/Concepts/data-flow-error-rows.md:
--------------------------------------------------------------------------------
# Enable error row handling on Azure SQL DB sink types

Add this string to the end of your ADF URL: ```&feature.errorrowhandling=true```

# Sink settings

In the sink settings for an Azure SQL DB sink type, the default behavior is to fail on the first error. You can modify this behavior by switching to "Continue on error" and setting the location where ADF will store the logs of the errored row details.

![Sink error settings](../images/errors1.png "Sink error settings")

# Data flow monitoring output

![Data flow monitoring output](../images/errorow1.png "Data flow monitoring output")

When viewing the output of your data flow execution in the ADF monitoring view, the counts of rows that succeeded and rows that failed are shown in the sink output panel.

To view the rejected row details, use the Azure Storage Explorer to navigate to the folder container where you chose to store the log files.

--------------------------------------------------------------------------------
/Cosmos Pipeline.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/Cosmos Pipeline.zip

--------------------------------------------------------------------------------
/Data Flows Tutorial Q2 CY21.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/Data Flows Tutorial Q2 CY21.docx

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ADF Mapping Data Flows are now GENERALLY AVAILABLE
# The docs in this repo are archived and no longer updated

Please see Data Flow documentation under Azure Data Factory docs: https://docs.microsoft.com/en-us/azure/data-factory/data-flow-create

--------------------------------------------------------------------------------
/SkipLines.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/SkipLines.zip

--------------------------------------------------------------------------------
/TOC.yml:
--------------------------------------------------------------------------------
- name: Data flow reserved capacity overview
  href: data-flow-reserved-capacity-overview.md
- name: Data flow understand reservation charges
  href: data-flow-understand-reservation-charges.md
--------------------------------------------------------------------------------
/Transformations/adf-data-flow-aggregate-transform.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

## Aggregate

The Aggregate transformation is where you will define aggregations of columns in your data streams. In the Expression Builder, you can define different types of aggregations (e.g. SUM, MIN, MAX, COUNT ...) and create a new field in your output that includes these aggregations, with optional group-by fields.

![Agg Transformation options](../images/agg.png "agg 1")

*NOTE: Aggregate transforms will only output the columns used in the aggregation, i.e. only the fields used in group-by and the aggregated fields will be passed on to the next transformation in your data flow. If you wish to include the previous columns in your flow, use a New Branch from the previous step and use the self-join pattern to connect the flow with the original metadata.*

### Group By
(Optional) Choose a Group-by clause for your aggregation and use either the name of an existing column or a new name. Use "Add Column" to add more group-by clauses, and click on the text box next to the column name to launch the Expression Builder, where you can select an existing column, a combination of columns, or an expression for your grouping.

### The Aggregate Column tab
(Required) Choose the Aggregate Column tab to build the aggregation expressions. You can either choose an existing column to overwrite the value with the aggregation, or create a new field with a new name for the aggregation. The expression that you wish to use for the aggregation is entered in the right-hand box next to the column name selector. Clicking on that text box will open up the Expression Builder. A small example of group-by and aggregate expressions appears at the end of this article.

Use the ADF Data Flow Expression Language to describe the column transformations in the Expression Builder: https://aka.ms/dataflowexpressions.

![Agg Transformation options](../images/agg2.png "aggregator")

### Data Preview in Expression Builder

Note that in Debug mode, the expression builder cannot produce data previews with Aggregate functions. You must view data previews for the Aggregate transformation outside of the expression builder.

### Use Aggregates for Data Deduping

You can use the grouping function in the Aggregate Transformation to group data by an attribute that can help identify and remove duplicates from your data sources: [ADF Data Flow: Deduplication of your Data](https://www.youtube.com/watch?v=OLenvYwg__I).
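Here is that small example, with hypothetical column names: grouping by `year` and defining two aggregate columns with expressions such as the following would produce one output row per year:

```
Group by:      year
total_sales:   sum(sales)
avg_rating:    avg(toInteger(rating))
```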
--------------------------------------------------------------------------------
/Transformations/adf-data-flow-conditional-split-transform.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

## Conditional Split

The Conditional Split transformation can route data rows to different streams depending on the content of the data. The implementation of the Conditional Split transformation is similar to a CASE decision structure in a programming language. The transformation evaluates expressions, and based on the results, directs the data row to the specified stream. This transformation also provides a default output, so that if a row matches no expression, it is directed to the default output.

![conditional split](../images/cd1.png "conditional split")

To add additional conditions, select "Add Stream" in the bottom configuration pane and click in the Expression Builder text box to build your expression.
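For instance, assuming a hypothetical `year` column, a split condition such as the following would route older rows to one stream, while everything else falls through to the default output:

```
year < 1960
```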
--------------------------------------------------------------------------------
/Transformations/adf-data-flow-derived-column-transform.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

* [Video: Clean Addresses with Derived Columns](https://www.youtube.com/watch?v=axEYbuU3lmw)

## Derived Column

Use the Derived Column transformation to generate new columns in your data flow or to modify existing incoming data columns. Use the ADF Data Flow Expression Language to describe the column transformations in the Expression Builder: https://aka.ms/dataflowexpressions.

![derive column](../images/dc1.png "Derived Column")

You can perform multiple Derived Column actions in a single Derived Column transformation. Click "Add Column" to transform more than 1 column in a single transformation step.

In the Column field, either select an existing column to overwrite with a new derived value, or click "Create New Column" to generate a new column with the newly derived value.

The Expression text box will open the Expression Builder, where you can build the expression for the derived columns using expression functions.
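Two small, hedged examples of derived column expressions, using hypothetical input columns — one builds a full name, the other normalizes an address field:

```
fullName:   concat(firstName, ' ', lastName)
address1:   upper(trim(Address1))
```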
--------------------------------------------------------------------------------
/Transformations/adf-data-flow-exists-transformation.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

## Exists Transformation

The Exists transformation is a row filtering transformation that lets you keep rows in your data that exist (SQL WHERE EXISTS) or do not exist (SQL WHERE NOT EXISTS) in a second source. The resulting rows from your data stream after this transformation will either include all rows where column values from source 1 exist in source 2, or all rows where they do NOT exist in source 2.

![Exists settings](../images/exsits.png "exists 1")

### Settings

#### Fed by
Choose the 2nd source for your Exists so that Data Flow can compare values from Stream 1 against Stream 2.

Select the column from Source 1 and from Source 2 whose values you wish to check against for Exists or Not Exists.

--------------------------------------------------------------------------------
/Transformations/adf-data-flow-expression-builder.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

## Expression Builder

In ADF Data Flow, you will find expression boxes where you can enter expressions for data transformation that utilize columns, fields, variables, parameters and functions from your data flow. To build the expression, you use the Expression Builder, which can be launched by clicking in the expression text box within each transformation. You will also sometimes see "Computed Column" options when selecting columns for transformation. When you click that, you will also see the Expression Builder launched.

![Expression Builder](../images/exp1.png "Expression Builder")

The Expression Builder tool defaults to the text editor option, with auto-complete from the entire ADF Data Flow object model plus syntax checking and highlighting.

![Expression Builder auto-complete](../images/expb1.png "Expression Builder auto-complete")

## Currently Working on Field

![Expression Builder](../images/exp3.png "Currently Working On")

At the top left of the Expression Builder UI, you will see a field called "Currently Working On" with the name of the field that you are currently working on. The expression that you build in the UI will be applied just to that current field. If you wish to transform another field, save your current work and use this drop-down to select the other field and build an expression for it.

## Data Preview in Debug mode

![Expression Builder](../images/exp4b.png "Expression Data Preview")

When you are working on your expressions, you can optionally switch on Debug mode from the ADF Data Flow design surface, enabling a live in-progress preview of your data results from the expression that you are building. This enables real-time debugging of your expression code.

![Debug Mode](../images/debugbutton.png "Debug Button")

Click the Refresh button when you are ready to update the results and test your expressions.

![Expression Builder](../images/exp5.png "Expression Data Preview")

## Comments

You can add comments to your expressions using single-line and multi-line comment syntax:

![Comments](../images/comments.png "Comments")

## Regular Expressions

The ADF Data Flow expression language, [full reference documentation here](http://aka.ms/), includes functions that accept regular expression syntax. When using regular expression functions, the Expression Builder will try to interpret a backslash (\) as an escape character sequence. So, when using backslashes in your regular expression, either enclose the entire regex in ticks ` ` or use a double backslash.

For example, using ticks:

```
regex_replace('100 and 200', `(\d+)`, 'digits')
```
or using a double backslash:
```
regex_replace('100 and 200', '(\\d+)', 'digits')
```

## Addressing Array Indexes

When utilizing expression functions that return arrays, use square brackets [] to address specific indexes inside that returned array object. Note that the array is 1-based.

![Expression Builder array](../images/expb2.png "Expression Data Preview")

--------------------------------------------------------------------------------
/Transformations/adf-data-flow-filter-transform.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

## Filter

The Filter transform is a row filtering transform that takes an expression as its parameter. Click in the text box to launch the Expression Builder. This is where you can build a filter expression which allows you to control which rows from the current data stream are allowed to pass through (filter) to the next transformation.

![Filter Transformation](../images/scd7.png "Filter")

E.g., filter on the loan_status column:

```
in(['Default', 'Charged Off', 'Fully Paid'], loan_status)
```

Filter on the year column in the Movies demo:

```
year > 1980
```

## Tips for using the Filter transform

The Filter transform can be very useful when used directly after a Source. You can use the Filter transform as the query predicate to a full table query from the source. ADF Data Flow is smart enough to take your end-to-end flows and optimize the execution utilizing pushdown techniques when available. So, using a Filter transform against what appears like a complete table scan in the design view may not actually execute as such when you attach your Data Flow to a pipeline. It is best to experiment with different techniques and use the Monitoring in ADF to gather timings and partition counts on your Data Flow activity executions.

--------------------------------------------------------------------------------
/Transformations/adf-data-flow-join-transformation.md:
--------------------------------------------------------------------------------
# Azure Data Factory Data Flow Transformations

* [Video: ADF Data Flow Joins](https://www.youtube.com/watch?v=zukwayEXRtg)

## Join

Use Join to combine data from 2 tables in your Data Flow. Add the Join transform with the Plus icon on the data stream where you would like to join, and then inside the transform select the 2nd Join stream.

![Join Transformation](../images/join.png "Join")

### Join types

Selecting a Join Type is required for the Join transformation.

#### Inner Join

Inner join will pass through only rows that match the column conditions from both tables.

#### Left Outer

All rows from the left stream not meeting the join condition are passed through, and output columns from the other table are set to NULL, in addition to all rows returned by the inner join.

#### Right Outer

All rows from the right stream not meeting the join condition are passed through, and output columns that correspond to the other table are set to NULL, in addition to all rows returned by the inner join.

#### Full Outer

Full Outer produces all columns and rows from both sides, with NULL values for columns that are not present in the other table.

#### Cross Join

Specify the cross product of the 2 streams with an expression. You can also use this option to write a free-form expression for other JOIN types.

### Specify Join Conditions

The Left Join condition is from the data stream connected to the left of your Join. The Right Join condition is the second data stream connected to your Join on the bottom, which will either be a direct connector to another stream or a reference to another stream.

You are required to enter at least 1 (1..n) join conditions. They can be either directly-referenced fields selected from the drop-down menu, or expressions.
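Two hedged examples of join condition expressions (the stream and column names are placeholders): `true()` entered as a Cross Join expression yields the full cross product, while a free-form range condition implements a non-equi join:

```
true()

orders@orderDate >= promos@startDate && orders@orderDate <= promos@endDate
```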
6 | 7 | ![Lookup Transformation](../images/lookup1.png "Lookup") 8 | 9 | Select the key fields that you wish to match on between the incoming stream fields and the fields from the reference source. You must first have created a new source on the Data Flow design canvas to use as the right side of the lookup. 10 | 11 | When matches are found, the resulting rows and columns from the reference source will be added to your data flow. You can choose which fields of interest you wish to include in your Sink at the end of your Data Flow, or use the Select transformation for column selectivity. 12 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-new-branch.md: -------------------------------------------------------------------------------- 1 | 2 | 3 | Branching will take the current data stream in your data flow and replicate it to another stream. This allows you to perform multiple sets of different operations and transformations against the same data stream. 4 | 5 | Example: You have a Source Transform that includes a selected set of columns with data type conversions, and then you place a Derived Column immediately following that Source. In the Derived Column, you've created a new field that combines first name and last name to make a new "full name" field. 6 | 7 | You can treat that new stream with a set of transformations and a sink on one row and use New Branch to create a copy of that stream to perform a completely different set of transformations and sink on another row. 8 | 9 | ** NOTE: "New Branch" will only show as an action on the + Transformation menu when there is a subsequent transformation following the current location where you are attempting to branch. i.e. You will not see a "New Branch" option at the end here until you add another transformation after the Select: 10 | 11 | ![Branch](../images/branch2.png "Branch 2") 12 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-pivot-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Pivot 4 | 5 | * [ADF Data Flow Pivot Transformation](https://www.youtube.com/watch?v=Tua14ZQA3F8&t=34s) 6 | 7 | Use Pivot in ADF Data Flow as an aggregation where one or more grouping columns have their distinct row values transformed into individual columns. Essentially, you can Pivot row values into new columns (turn data into metadata). 8 | 9 | 10 | 11 | ### Group By 12 | 13 | 14 | 15 | First, set the columns that you wish to group by for your pivot aggregation. You can set more than 1 column here with the + sign next to the column list. 16 | 17 | ### Pivot Key 18 | 19 | 20 | 21 | The Pivot Key is the column that ADF will pivot from row to column. By default, each unique value in the dataset for this field will pivot to a column. However, you can optionally enter the values from the dataset that you wish to pivot to column values. 22 | 23 | ### Pivoted Columns 24 | 25 | 26 | 27 | Lastly, you will choose the aggregation that you wish to use for the pivoted values and how you would like the columns to be displayed in the new output projection from the transformation. 28 | 29 | (Optional) You can set a naming pattern with a prefix, middle, and suffix to be added to each new column name from the row values. 30 | 31 | For instance, pivoting "Sales" by "Region" would simply give you new column values from each sales value, i.e.
"25", "50", "1000", etc. However, if you set a prefix value of "Sales " 32 | 33 | 34 | 35 | Setting the Column Arrangement to "Normal" will group together all of the pivoted columns with their aggregated values. Setting the columns arrangment to "Lateral" will alternate between column and value. 36 | 37 | #### Aggregation 38 | 39 | To set the aggregation you wish to use for the pivot values, click on the field at the bottom of the Pivoted Columns pane. You will enter into the ADF Data Flow expression builder where you can build an aggregation expression and provide a descriptive alias name for your new aggregated values. 40 | 41 | Use the ADF Data Flow Expression Language to describe the pivoted column transformations in the Expression Builder: https://aka.ms/dataflowexpressions. 42 | 43 | #### How to rejoin original fields 44 | *NOTE: The Pivot transformation will only project the columns used in the aggregation, grouping, and pivot action. If you wish to include the other columns from the previous step in your flow, use a New Branch from the previous step and use the self-join pattern to connect the flow with the original metadata* 45 | 46 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-select-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Select Transformation 4 | 5 | Use this transformation for column selectivity (reducing number of columns) or to alias columns and stream names. 6 | 7 | In ADF Data Flow, all of your columns will automatically propagate throughout your streams. As your columns accumulate, you can use the Select transform to "select" the columns that you wish to keep. You will see stream names appended to the column names to indicate the origins and lineage of those columns to help you make the proper determination. 8 | 9 | While you can always choose column selectivity in the Sink transform at the end of your Data Flow, maintaining column hygiene may help you to prune your column lists. The downside to this approach is that you will lose that metadata downstream and will not be able to access it once you've dropped it from your metadata with a Select. 10 | 11 | The Select transform allows you to alias an entire stream, or columns in that stream, assign different names (aliases) and then reference those new names later in your data flow. This is very useful for self-join scenarios. The way to implement a self-join in ADF Data Flow is to take a stream, branch it with "New Branch", then immediately afterward, add a "Select" transform. That stream will now have a new name that you can use to join back to the original stream, creating a self-join: 12 | 13 | ![Self-join](../images/selfjoin.png "Self-join") 14 | 15 | In the above diagram, the Select transform is at the top. All it's doing is aliasing the original stream to "OrigSourceBatting". In the higlighted Join transform below it you can see that we use this Select alias stream as the right-hand join, allowing us to reference the same key in both the Left & Right side of the Inner Join. 16 | 17 | Select can also be used as a way de-select columns from your data flow. For example, if you have 6 columns defined in your sink, but you only wish to pick a specific 3 to transform and then flow to the sink, you can select just those 3 by using the select transform. 
18 | 19 | ### NOTE: You must switch off Select All to pick only specific columns ### 20 | 21 | ## Options 22 | 23 | The default setting for "Select" is to include all incoming columns and keep those original names. You can alias the stream by setting the name of the Select transform. 24 | 25 | To alias individual columns, deselect "Select All" and use the column mapping at the bottom. 26 | 27 | ![Select Transformation](../images/select001.png "Select Alias") 28 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-sink-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Sink Transformation 4 | 5 | **NOTE: If you are using Data Flows on the ADF V2 GA Version, please go to the ADF V2 GA Version section at the bottom of this article for updates** 6 | 7 | 8 | 9 | At the completion of your data flow transformation, you can sink your transformed data into a destination dataset. In the Sink transformation, you can choose the dataset definition that you wish to use for the destination output data. Data Flow debug mode does not require a sink. No data is written and no files are moved or deleted in Data Flow debug mode. You must execute your data flow using the Execute Data Flow activity for the Sink to execute. You are required to have at least 1 Sink in order to publish your data flow for pipeline execution. 10 | 11 | A common practice to account for changing incoming data and to account for schema drift is to sink the output data to a folder without a defined schema in the output dataset. You can additionally account for all column changes in your sources by selecting "Allow Schema Drift" at the Source and then auto-map all fields in the Sink. 12 | 13 | You can choose to overwrite, append, or fail the data flow when sinking to a dataset. 14 | 15 | You can also choose "automap" to simply sink all incoming fields. If you wish to choose the fields that you want to sink to the destination, or if you would like to change the names of the fields at the destination, choose "Off" for "automap" and then click on the Mapping tab to map output fields: 16 | 17 | 18 | 19 | ### Output to a Single File 20 | For Azure Storage Blob or Data Lake sink types, you will output the transformed data into a folder. Spark will generate partitioned output data files based on the partitioning scheme being used in the Sink transform. You can set the partitioning scheme by clicking on the "Optimize" tab. If you would like ADF to merge your output into a single file, click on the "Single Partition" radio button. 21 | 22 | 23 | 24 | ### Data Lake Folders 25 | When Sinking your data transformations to Azure Blob Store or ADLS, choose a data lake *folder* as your destination folder path, not a file. ADF Data Flow will generate the output files for you in that folder. 26 | 27 | 28 | 29 | ### Azure Blob Folders 30 | When sinking your data to Azure Blob Store datasets, make sure to choose a blob *folder* inside of a container, i.e.: container/folder. Do not land your data directly in a container; create an output folder inside your container. 31 | 32 | **PLEASE NOTE: Not all Dataset properties in Blob and ADW are configured for use within Data Flow during the preview period. Currently, ADF supports both a straight-forward Copy Activity as well as data transformation-based Data Flow capability, both of which utilize Datasets.
All of the Dataset properties present today work with Copy Activity. The UI will try to notify you interactively of which properties are not recognized by Data Flow. We will update these properties during each subsequent iteration of Data Flow** 33 | 34 | ### Azure SQL Data Warehouse and SQL Database Sink Datasets 35 | 36 | If you prefer to sink your transformed data directly into Azure SQL DW or Azure SQL DB instead of the Lake approach of landing transformed data into Blob or ADLS first, you can use Sink Datasets for Data Flow that are Azure SQL DB or DW. This will allow you to land your transformed data directly into Azure SQL DW within Data Flow without the need to add a Copy Activity in your pipeline. 37 | 38 | Start by creating an ADW dataset, just as you would for any other ADF pipeline, with a Linked Service that includes your ADW credentials, and choose the database that you wish to connect to. In the table name, either select an existing table or type in the name of the table that you would like Data Flow to auto-create for you. A new table will be generated in the target database using the incoming metadata schema. 39 | 40 | **NOTE: At this time, we are not supporting SQL Server square brackets " [ ] ", so please use the "Edit" link on the table name field and remove the brackets** 41 | 42 | 43 | 44 | Azure SQL DW Datasets require staging locations to be specified because ADF uses PolyBase behind the scenes. You'll select the Storage account you wish to use for staging the data for the PolyBase load into ADW. The path field is of the format: "containername/foldername". 45 | 46 | 47 | 48 | #### Save Policy 49 | 50 | Overwrite will truncate the table if it exists, then recreate it and load the data. Append will simply insert the new rows. If the table from the Dataset table name does not exist at all in the target ADW, Data Flow will create the table, then load the data. 51 | 52 | #### Field Mapping 53 | 54 | On the Mapping tab of your Sink transformation, you can map the incoming (left side) columns to the destination (right side). When you sink data flows to files, ADF will always write new files to a folder. When you map to a database dataset, you can choose to either generate a new table with this schema (set Save Policy to "overwrite") or insert new rows to an existing table and map the fields to the existing schema. 55 | 56 | You can use multi-select in the mapping table to Link multiple columns with one click, Delink multiple columns, or map multiple rows to the same column name. 57 | 58 | 59 | 60 | If you'd like to reset your column mappings, press the "Remap" button to reset the mappings. 61 | 62 | #### Max Concurrent Connections 63 | 64 | You can set the maximum concurrent connections in the Sink transformation when writing your data to an Azure database connection. 65 | 66 | 67 | 68 | ### Updates to Sink Transformation for ADF V2 GA Version 69 | 70 | 71 | 72 | 73 | 74 | 1. Allow Schema Drift and Validate Schema options are now available in Sink. This will allow you to instruct ADF to either fully accept flexible schema definitions (Schema Drift) or fail the Sink if the schema changes (Validate Schema). 75 | 76 | 2. Clear the Folder. ADF will truncate the sink folder contents before writing the destination files in that target folder. 77 | 78 | 3.
File name options 79 | 80 | * Default: Allow Spark to name files based on PART defaults 81 | * Pattern: Enter a name for your output files 82 | * Per partition: Enter a file name per partition 83 | * As data in column: Set the output file to the value of a column 84 | 85 | **NOTE: File operations will only execute when you are running the Execute Data Flow activity, not while in Data Flow Debug mode** 86 | 87 | With the SQL sink types, you can set: 88 | 89 | * Truncate table 90 | * Recreate table (performs drop/create) 91 | * Batch size for large data loads. Enter a number to bucket writes into chunks. 92 | 93 | 94 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-sort-transform.md: -------------------------------------------------------------------------------- 1 | 2 | # Azure Data Factory Data Flow Transformations 3 | 4 | ## Sort 5 | 6 | ![Sort settings](../images/sort.png "Sort") 7 | 8 | The Sort transformation allows you to sort the incoming rows on the current data stream. The outgoing rows from the Sort Transformation will subsequently follow the ordering rules that you set. You can choose individual columns and sort them ASC or DESC, using the arrow indicator next to each field. If you need to modify the column before applying the sort, click on "Computed Columns" to launch the expression editor. This will provide you with an opportunity to build an expression for the sort operation instead of simply applying a column for the sort. 9 | 10 | You can turn on "Case insensitive" if you wish to ignore case when sorting string or text fields. 11 | 12 | "Sort Only Within Partitions" leverages the Spark data partitioning capability to sort incoming data only within each partition as opposed to the entire data stream. 13 | 14 | Each of the sort conditions in the Sort Transformation can be re-arranged. So if you need to move a column higher in the sort precedence, grab that row with your mouse and move it higher or lower in the sorting list. 15 | 16 | ## Partitioning effects on Sort 17 | 18 | ADF Data Flow executes on big data Spark clusters in the backend that distribute data across multiple nodes and multiple data partitions. It is important to keep this in mind when architecting your data flow in ADF where you are depending on the Sort transform to keep data in that same order. If you choose to repartition your data in a subsequent transformation, you may lose your sorting due to that reshuffling of data. 19 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-source-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Source on Preview Version 4 | 5 | The Source transformation configures a data source that you wish to use to bring data into your data flow. You may have more than 1 Source transform in a single Data Flow. This is where you will begin designing your Data Flows. 6 | 7 | **NOTE: Every Data Flow requires at least 1 Source Transformation (i.e. 1 to n)** 8 | 9 | ![Source Transformation options](../images/source.png "source 1") 10 | 11 | Your Data Flow source must be associated with exactly 1 ADF Dataset, which defines the shape and location of your data to write to or read from.
12 | 13 | ### Data Flow Staging Areas 14 | 15 | ADF Data Flow has line-of-sight to 5 primary "staging" areas within Azure to perform your data transformations: Azure Blob, ADLS Gen 1, ADLS Gen 2, Azure SQL DB and Azure SQL DW. ADF has access to nearly 80 different native connectors, so to include those other sources of data into your Data Flow, first stage that data into one of those 5 primary Data Flow staging areas by using the Copy Activity: 16 | 17 | * [ADF Data Flow: Staging Data](https://youtu.be/mZLKdyoL3Mo) 18 | 19 | ### Options 20 | 21 | #### Allow schema drift 22 | Select Allow Schema Drift if the source columns will change often. This setting will allow all incoming fields from your source to flow through the transformations to the Sink. 23 | 24 | #### Fail if columns in the dataset are not found 25 | Choose this option to enforce a Source schema validation that will fail your Data Flow if columns that are expected from your source are not present. 26 | 27 | #### Sampling 28 | Use Sampling to limit the number of rows from your Source. This is useful when you need just a sample of your source data for testing and debugging purposes. 29 | 30 | #### Define Schema 31 | 32 | ![Source Transformation](../images/source2.png "source 2") 33 | 34 | #### You can modify the name of the source columns and their associated data types 35 | 36 | For source file types that are not strongly typed (i.e. flat files as opposed to Parquet files), you should define the data types for each field here in the Source transformation as opposed to in the Dataset. 37 | 38 | If you do not see the column names and types in your Data Flow, it is likely because you did not define them in the Define Schema section of the Source. You will only need to do this if you are not using Data Flow's Schema Drift handling. 39 | 40 | Here in the "Define Schema" tab on the Source transformation is where you can set the data types and formats: 41 | 42 | ![Source Transformation](../images/source003.png "data types") 43 | 44 | ### Optimize 45 | 46 | ![Source Partitions](../images/sourcepart.png "partitioning") 47 | 48 | On the Optimize tab for the Source Transformation, you will see an additional partitioning type called "Source". This will only light up when you have selected Azure SQL DB as your source. This is because ADF will parallelize connections to execute large queries against your Azure SQL DB source. 49 | 50 | Partitioning data on your SQL DB source is optional. You should use this for large queries. You have 2 options: 51 | 52 | #### Column 53 | 54 | Select a column to partition on from your source table. You must also set the max number of connections. 55 | 56 | #### Query Condition 57 | 58 | You can optionally choose to partition the connections based on a query. For this option, simply put in the contents of a WHERE predicate, i.e. year > 1980. 59 | 60 | ## Source on ADF V2 Data Flow 61 | 62 | ![Public Source](../images/source1.png "public source 1") 63 | 64 | ### New "Validate Schema" option 65 | 66 | Use this to enforce the defined schema from your source dataset. If the incoming version of the source data does not match the defined schema, then that execution of the data flow will fail. 67 | 68 | ![New Source Settings](../images/source2.png "New settings") 69 | 70 | ### New settings are available on the "Settings" tab: 71 | 72 | * Wildcard paths 73 | * List of Files (This is a file set.
Point to a text file that you create with a list of relative path files to process) 74 | * Column to store file name (This will store the name of the file from the source in a column in your data. Enter a new name here to store the file name string) 75 | * After Completion (You can choose to do nothing with the source file after the data flow executes, delete the source file(s), or move them. The paths for move are relative paths.) 76 | 77 | ### New Settings for SQL Datasets 78 | 79 | When you are using Azure SQL DB or Azure SQL DW as your source, you will have additional options for: 80 | 81 | 1. Query: Enter a SQL query for your source 82 | 83 | 2. Batch size: Enter a batch size to chunk large data into batch-sized reads 84 | 85 | **NOTE: The file operation settings will only execute when the Data Flow is executed from a pipeline run (pipeline debug or execution run) using the Execute Data Flow activity in a pipeline. File operations do NOT execute in Data Flow debug mode** 86 | 87 | ### Projection replaces schema 88 | 89 | ![Projection](../images/source3.png "Projection") 90 | 91 | Similar to schemas in datasets, use Projection in Source to define the data types and formats from the source data. If you have a text file with no defined schema, click "Detect Data Type" to ask ADF to attempt to sample and infer the data types. You can set the default data formats for auto-detect using the "Define Default Format" button. 92 | 93 | ![Default formats](../images/source2.png "Default formats") 94 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-surrogate-key-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Surrogate Key Transformation 4 | 5 | Use the Surrogate Key Transformation to add an incrementing, non-business, arbitrary key value to your data flow rowset. This is useful when designing dimension tables in a star schema analytical data model, where each member in your dimension tables needs to have a unique non-business key, as part of the Kimball DW methodology. 6 | 7 | ![Surrogate Key Transform](../images/surrogate.png "Surrogate Key Transformation") 8 | 9 | "Key Column" is the name that you will give to your new surrogate key column. 10 | 11 | "Start Value" is the beginning point of the incremental value. 12 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-transformations-optimize-tab.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow 2 | 3 | ## Transformation Optimize Tab 4 | 5 | * [ADF Data Flow: Optimize Data Flows](https://www.youtube.com/watch?v=a2KtwUJngHo) 6 | 7 | Each Data Flow transformation will have an "Optimize" tab which contains optional settings to configure your partitioning schemes for your data flow. 8 | 9 | 10 | 11 | The default setting is "use current partitioning", which will instruct ADF to use the partitioning scheme native to Data Flows running on Spark in Azure Databricks. Generally, this is the recommended approach. 12 | 13 | However, there are instances where you may wish to adjust the partitioning. For instance, if you want to output your transformations to a single file in the lake, then choose "single partition" on the Optimize tab for partitioning in the Sink Transformation.
14 | 15 | Another case where you may wish to exercise control over the partitioning schemes being used for your data transformations is in terms of performance. Adjusting the partitioning of data provides a level of control over the distribution of your data across compute nodes and data locality optimizations that can have both positive as well as negative effects on your overall data flow performance. 16 | 17 | If you wish to change partitioning on any transformation, simply click the Optimize tab and select the "Set Partitioning" radio button. You will then be presented with a series of options for partitioning. The best method of partitioning to implement will differ based on your data volumes, candidate keys, null values, and cardinality. Best practice is to start with default partitioning and then try the different partitioning options. You can test using the Debug run in Pipeline and then view the time spent in each transformation grouping as well as partition usage from the Monitoring view. 18 | 19 | 20 | 21 | ### Round Robin 22 | 23 | This is a simple partitioning scheme that automatically distributes data equally across partitions. This should only be used when you do not have good key candidates to implement a solid, smart partitioning strategy. You can set the number of physical partitions. 24 | 25 | ### Hash 26 | 27 | ADF will produce a hash of columns to produce uniform partitions such that rows with similar values will fall in the same partition. When using this option, test for possible partition skew. You can set the number of physical partitions. 28 | 29 | ### Dynamic Range 30 | 31 | This will use Spark dynamic ranges based on the columns or expressions that you provide. You can set the number of physical partitions. 32 | 33 | ### Fixed Range 34 | 35 | You must build an expression that provides a fixed range for values within your partitioned data columns. You should have a good understanding of your data before using this option in order to avoid partition skew. The value that you enter for the expression will be used as part of a partition function. You can set the number of physical partitions. 36 | 37 | ### Key 38 | 39 | If you have a good understanding of the cardinality of your data, key partitioning may be a good partition strategy. This will create partitions for each unique value in your column. You cannot set the number of partitions because the number will be based on unique values in the data. 40 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-union-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Union 4 | 5 | Union will combine multiple data streams into one, with the SQL Union of those streams as the new output from the Union transformation. 6 | 7 | ![Union Transformation](../images/union.png "Union") 8 | 9 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-unpivot-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | ## Unpivot 4 | 5 | Use Unpivot as a way to turn a non-normalized dataset into a more normalized version by expanding values from multiple columns in a single record into multiple records with the same values in a single column.
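As a hypothetical before-and-after illustration of that reshaping (the column and value names are invented for this example):

```
// Input: one record with a column per region
Region_NA: 25, Region_EU: 50, Region_ASIA: 1000

// After Unpivot: one record per region, values in a single column
Region: NA,   Sales: 25
Region: EU,   Sales: 50
Region: ASIA, Sales: 1000
```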
6 | 7 | 8 | 9 | ### Ungroup By 10 | 11 | 12 | 13 | First, set the columns that you wish to ungroup by for your unpivot operation. Set more than one column to ungroup by using the + sign next to the column list. 14 | 15 | ### Unpivot Key 16 | 17 | 18 | 19 | The Unpivot Key is the column that ADF will pivot from column to row. By default, each unique value in the dataset for this field will pivot to a row. However, you can optionally enter the values from the dataset that you wish to pivot to row values. 20 | 21 | ### Unpivoted Columns 22 | 23 | 24 | 25 | Lastly, you will choose the aggregation that you wish to use for the unpivoted values and how you would like the columns to be displayed in the new output projection from the transformation. 26 | 27 | (Optional) You can set a naming pattern with a prefix, middle, and suffix to be added to each new column name from the row values. 28 | 29 | For instance, pivoting "Sales" by "Region" would simply give you new column values from each sales value. For example: "25", "50", "1000", ... However, if you set a prefix value of "Sales", then each new column name will include the "Sales" prefix. 30 | 31 | 32 | 33 | Setting the Column Arrangement to "Normal" will group together all of the pivoted columns with their aggregated values. Setting the column arrangement to "Lateral" will alternate between column and value. 34 | 35 | 36 | 37 | The final unpivoted data result set shows the column totals now unpivoted into separate row values. 38 | -------------------------------------------------------------------------------- /Transformations/adf-data-flow-window-transformation.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow Transformations 2 | 3 | * [ADF Data Flow: Window Transformation](https://www.youtube.com/watch?v=m6zgbtY5AYQ) 4 | 5 | ## Window Functions 6 | 7 | The Window transformation is where you will define window-based aggregations of columns in your data streams. In the Expression Builder, you can define different types of aggregations that are based on data or time windows (aka the SQL OVER clause), such as LEAD, LAG, NTILE, CUMEDIST, and RANK, and create a new field in your output that includes these aggregations with optional group-by fields. 8 | 9 | ![Window Transformation](../images/windows1.png "windows 1") 10 | 11 | ### Over 12 | This is where you will set the partitioning of column data for your window transformation. This is equivalent to the Partition By in the Over clause in SQL. If you wish to create a calculation or create an expression to use for the partitioning, you can do that by hovering over the column name and selecting "computed column". 13 | 14 | 15 | 16 | ### Sort 17 | Also part of the Over clause is setting the Order By, or sort order. As is the case with the "Over" column selector, you can also create an expression for a calculated value in this column field for sorting. 18 | 19 | 20 | 21 | ### Range By 22 | Next, set the window frame as Unbounded or Bounded. To set an unbounded window frame, set the slider to Unbounded on both ends. If you choose a setting between Unbounded and Current Row, then you must set the Offset start and end values. Both values will be positive integers. You can use either relative numbers or values from your data. 23 | 24 | The window slider has 2 values to set: the values before the current row and the values after the current row. The Start and End offsets match the 2 selectors on the slider.
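Putting the Over, Sort, and Range By settings together, here is a sketch of the underlying script for a bounded three-row moving average; the stream, column, and transformation names are hypothetical, and the aggregate shown is the kind of Window Column described next:

```
SalesStream window(over(region),
    asc(saleDate, true),
    startRowOffset: -1L,
    endRowOffset: 1L,
    movingAvgSales = avg(sales)) ~> MovingAverage
```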
25 | 26 | 27 | 28 | ### Window Columns 29 | Lastly, use the Expression Builder to define the aggregations you wish to use with the data windows such as RANK, COUNT, MIN, MAX, DENSE RANK, LEAD, LAG, etc. 30 | 31 | 32 | 33 | The full list of aggregation and analytical functions available for you to use in the ADF Data Flow Expression Language via the Expression Builder is listed here: https://aka.ms/dataflowexpressions. 34 | 35 | -------------------------------------------------------------------------------- /Transformations/readme.md: -------------------------------------------------------------------------------- 1 | THIS IS AN OLD ARCHIVE OF ORIGINAL PRIVATE PREVIEW DOCUMENTATION FOR ADF DATA FLOWS. THESE DOCS ARE OUT OF DATE AND NO LONGER MAINTAINED. PLEASE VISIT THE CURRENT ONLINE AZURE DATA FACTORY DOCUMENTATION FOR MAPPING DATA FLOWS: https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview 2 | -------------------------------------------------------------------------------- /adf-data-flow-faq.md: -------------------------------------------------------------------------------- 1 | Q: Which ADF version do I use to create Data Flows? 2 | 3 | Use the ADF V2 version to create Data Flows. 4 | 5 | Q: I was a previous private preview customer using Data Flows and I used the ADF V2 w/Data Flows preview version. What should I use now? 6 | 7 | This version is now obsolete. Use ADF V2 for Data Flows. 8 | 9 | Q: What has changed from private preview to limited public preview in Data Flows? 10 | 11 | You will no longer have to bring your own Databricks clusters. ADF will manage cluster creation and tear-down. Blob datasets and ADLS datasets are separated into Delimited Text and Parquet datasets. You can still use ADLS & Blob Store to store those files. Use the appropriate Linked Service for those storage engines. 12 | 13 | Q: Can I migrate my private preview factories to ADF V2? 14 | 15 | [Yes, follow the instructions here](https://www.slideshare.net/kromerm/adf-mapping-data-flow-private-preview-migration) 16 | 17 | Q: I need help troubleshooting my data flow logic. What do you need? 18 | 19 | When Microsoft provides help or troubleshooting with Data Flows, please provide the "DSL Code Plan". To do this, follow these steps: 20 | 21 | * From the Data Flow Designer, click "Code" in the top-right corner. This will display the editable JSON code for the data flow. 22 | * From the code view, click "Plan" in the top-right corner. The Plan switch changes the view from JSON to the formatted DSL script plan. 23 | * Copy & paste this script or save it in a text file. 24 | 25 | Q: How do I access data using the other 80 dataset types in ADF? 26 | 27 | Data Flow currently allows Azure SQL DB, Azure SQL DW, Delimited Text files from Blob or ADLS, and Parquet files from Blob or ADLS natively for Source and Sink. Use the Copy Activity to stage data from any of the other connectors and then execute a Data Flow activity to transform data after it's been staged. For example, your pipeline will first Copy into Blob and then a Data Flow activity will use a dataset in Source to transform that data. 28 | -------------------------------------------------------------------------------- /adf_dataflow_overview.md: -------------------------------------------------------------------------------- 1 | # What is Data Flow in Azure Data Factory?
2 | 3 | Data Flow is a new feature, currently in limited preview, that sits inside of ADF and allows you to develop graphical data transformation logic that can be executed as activities within ADF Pipelines at scale using Spark. 4 | 5 | The intent of ADF Data Flow is to provide a fully visual experience with no coding required. Your Data Flows will execute on your own Azure Databricks cluster for scaled-out data processing using Spark. ADF handles all of the code translation, Spark optimization and execution of your data flow jobs. 6 | 7 | Start by creating data flows, then add a Data Flow activity to your pipeline to execute and test your data flow in Debug mode in the pipeline, or use "Trigger Now" in the pipeline to test your Data Flow from a pipeline Activity. 8 | 9 | You will then operationalize your Data Flow by scheduling and monitoring your ADF pipeline that is executing the Data Flow activity. 10 | 11 | There is also a Debug Mode toggle switch on the Data Flow design surface to allow you to interactively build your data transformations in an interactive data prep environment. 12 | 13 | ADF Data Flow has line-of-sight to 5 primary "staging" areas within Azure to perform your data transformations: Azure Blob, ADLS Gen 1, ADLS Gen 2, Azure SQL DB and Azure SQL DW. ADF has access to nearly 80 different native connectors, so to include those other sources of data into your Data Flow, first stage that data into one of those 5 primary Data Flow staging areas using the Copy Activity: 14 | 15 | * [ADF Data Flow: Staging Data](https://www.youtube.com/watch?v=zukwayEXRtg) 16 | -------------------------------------------------------------------------------- /archives/ADF Data Flow Get Started in 12 Steps.md: -------------------------------------------------------------------------------- 1 | by balakreshnan 2 | 3 | Here's how to get started using Azure Data Factory Data Flow 4 | ============================================================ 5 | 6 | This guide is for limited preview customers to get started with Azure Data Factory V2 with data flow using the preview version of ADF with Data Flows 7 | ---------------------------------------------------------------------------------------------------- 8 | 9 | 1. To build your first Data Flow Data Factory, use the Azure Portal to create "Azure Data Factory" and select "V2 with data flow (preview)". Be sure to include the sample Data Flows by checking the "sample" checkbox: 10 | 11 | 12 | 13 | 2. That will create your new Data Factory with Data Flows in the SE Asia region 14 | 15 | 3. Click on Author & Monitor 16 | 17 | ![](media/09b0f0e02aaede3d38acf46a6dcb8644.png) 18 | 19 | That should take you to a new window, which is the Azure Data Factory authoring main page 20 | 21 | 4. On the main page, click the pencil icon in the left-hand side menu 22 | 23 | ![](media/f3a2eff81e3af2a1775407d2c410b71f.png) 24 | 25 | 5. Click on Connections at the bottom left of the page to edit the Azure Databricks Linked Service 26 | 27 | ![](media/d242a4c1928463417119ab08248e1e37.png) 28 | 29 | ADF Data Flow currently only supports Azure Databricks version 5.0 30 | 31 | 32 | 33 | 6. Click on the Edit button as highlighted with the black arrow 34 | 35 | ![](media/af068303e7906e297c666307bf12d39b.png) 36 | 37 | 7. Now you need to change the Azure Databricks cluster setting to "existing" if you 38 | want to use an existing running cluster 39 | 40 | ![](media/adb1.png) 41 | 42 | 8. You will need to provide the access token key for your Azure Databricks account.
[Here is more information on how to obtain that.](https://docs.databricks.com/api/latest/authentication.html#generate-token) 43 | 44 | 9. Make sure to select Existing Cluster 45 | 46 | 10. Then get the cluster ID from the Azure Databricks URL 47 | 48 | 11. It will be numbers and letters combined. If you go to the Azure Databricks 49 | workbench and select the cluster, you will see the Cluster ID in the Tags 50 | section at the bottom of the cluster page: 51 | 52 | ![](media/c6511f8763cfc590a0e2262cdc960442.png) 53 | 54 | 12. Copy and paste that into the configuration page. 55 | 56 | You're done! Now the Data Factory is ready for submitting jobs with data transformations via Data Flow. 57 | -------------------------------------------------------------------------------- /archives/ADFDataFlowGettingStarted.md: -------------------------------------------------------------------------------- 1 | # Getting Started with ADF Data Flows for Limited Preview Customers 2 | 3 | * [Getting Started with ADF Data Flow ARM Template and Samples](https://www.youtube.com/watch?v=YhpHlyYWCyI) 4 | * [Build your first ADF Data Flow](https://youtu.be/WQ1KqsRL9Bg) 5 | 6 | Start by building a new Data Factory with Data Flows from the Azure Portal. Select V2 (data flow preview). Be sure to include the sample Data Flows by checking the "sample" checkbox: 7 | 8 | 9 | 10 | Once you've gone through building a new Azure Data Factory with the new version that enables Data Flow (use the ARM Template in the Samples directory), you can begin by experimenting with the 3 samples that will be loaded in your Factory from the ARM Template: Taxi Demo, Currency Demo and Movies Demo. 11 | 12 | To begin building your first Data Flow from scratch, begin by building a new ADF Pipeline in the Factory. From the Pipeline Builder UI, go to the left-hand resource explorer and click the + sign to build a new Data Flow. 13 | 14 | 15 | 16 | In the Data Flow canvas, give your Data Flow a name and start with a Source transform. Every Data Flow must have at least 1 Source to be a valid flow. 17 | 18 | Follow the samples or experiment with the different data transformations in the Data Flow canvas. You can add transforms by clicking the + sign that is next to each transform. 19 | 20 | End your Data Flow with a Sink to land your transformed data either back in Blob Storage or Azure Data Warehouse. 21 | 22 | Click "Validate" to see if you have any configuration errors. If it all checks out clean, then you can either Save your Data Flow, if it is being designed in Git mode, or Publish the changes, if you are designing your work directly against the ADF Service. 23 | 24 | Now you're ready to test your Data Flow. From the Pipeline view, add a new Data Flow activity. Select the name of the Data Flow that you just created and use your Azure Databricks account Linked Service. Instructions for how to configure Databricks can be found in the doc "First Data Flow" under Samples. 25 | 26 | ADF Data Flow currently only supports Azure Databricks version 5.0 27 | 28 | 29 | 30 | 31 | 32 | You can test your Data Flow from this pipeline with either a Debug run or a Trigger-Now run from the pipeline. Once you are satisfied with your results, save or publish the latest changes and you're ready to operationalize your Data Flow. 33 | 34 | Scheduling, monitoring and source control all use the same processes that ADF already supports. To view the results of Data Flow runs in the ADF Monitor view, be sure to click into the Data Flow activity from your pipeline run view.
35 | 36 | 37 | -------------------------------------------------------------------------------- /archives/AzureDatabricksModeForDataFlow.md: -------------------------------------------------------------------------------- 1 | # ADF Data Flow currently only supports Azure Databricks version 5.0 2 | 3 | 4 | 5 | ## There are 2 modes that ADF Data Flow will use to execute your Data Flows on Azure Databricks: New Job Cluster and Existing Cluster. 6 | 7 | ### New Job Cluster 8 | 9 | Use this mode when you have completed building and testing your Data Flow. This mode directs ADF to spin up a new Azure Databricks job cluster on every execution of your Data Flow. It works well in operationalized ADF pipelines where Data Flow is an activity within that pipeline which can be scheduled to run on a calendar or event. This way, you do not incur the costs of the Azure Databricks cluster running without activity, since ADF only spins up the job cluster on demand. However, you must allow the cluster to warm up, which can take 5-7 minutes in most cases. 10 | 11 | 12 | 13 | ### Existing Cluster 14 | 15 | The spin-up time of the "New Job Cluster" mode above does not work well with debugging due to the spin-up delay. Therefore, we recommend using the "Existing Cluster" option when debugging, building, and designing Data Flows. The concept behind Debug sessions in ADF Data Flow is to work on limited / sampled data. So keep that in mind when you size your Azure Databricks interactive cluster for debug. We recommend that you do not use auto-scaling with your cluster for debug. Instead, manually resize the cluster when you are debugging large datasets. It is preferable to debug with smaller datasets. 16 | 17 | #### Cluster ID 18 | 19 | There is an additional option in the Azure Databricks Linked Service that you must set when using an existing cluster. You will need to point Data Flow to the cluster that you wish to use. Each Azure Databricks workspace account can have many Databricks clusters, so you'll need to point the Linked Service to the interactive cluster you wish to use. For this, go to your Azure Databricks workspace in Azure and find the "Tags" section at the bottom of the page. There you will see a field called "Cluster ID". That cluster name will appear in the Linked Service. 20 | 21 | 22 | 23 | **NOTE: If your Azure Databricks cluster is failing when you run your Data Flow, check the cluster error logs to see whether you have enough cores / vCPUs enabled on your subscription. [Click here for details on the Azure Databricks cores limitation](https://github.com/kromerm/adfdataflowdocs/blob/master/Help-Databricks-NoMoreCores.md)** 24 | -------------------------------------------------------------------------------- /archives/Help-Databricks-NoMoreCores.md: -------------------------------------------------------------------------------- 1 | If you are using an Azure Subscription that is a trial subscription or has low quotas on cores / vCPUs, you may run into a situation where you do not have enough cores available in your subscription to run Azure Databricks. In that case, you must request additional cores. 2 | 3 | You can do this from the Azure Portal. Find the "subscriptions" resource and select your subscription. You will then see a category on the subscriptions blade for "Usage & quotas". Select that and you will then open a blade with a button at the top right for "Request Increase".
4 | 5 | Select vCPUs as the quota type and the VM types that you are using with Azure Databricks, most likely the DS3 series. 6 | 7 | From there, select the region where your Azure Databricks is stood up and increase the number of cores. After submitting your request, you should receive an email confirmation that your subscription has had the vCPU / cores limit increased based on your request. 8 | 9 | [The detailed documentation for requesting these increases is here](https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request) 10 | 11 | -------------------------------------------------------------------------------- /archives/readme.md: -------------------------------------------------------------------------------- 1 | THIS IS AN OLD ARCHIVE OF ORIGINAL PRIVATE PREVIEW DOCUMENTATION FOR ADF DATA FLOWS. THESE DOCS ARE OUT OF DATE AND NO LONGER MAINTAINED. PLEASE VISIT THE CURRENT ONLINE AZURE DATA FACTORY DOCUMENTATION FOR MAPPING DATA FLOWS: https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview 2 | -------------------------------------------------------------------------------- /data-flow-cluster-monitor.md: -------------------------------------------------------------------------------- 1 | # Data flows cluster report 2 | 3 | The ADF Data Flows cluster report is a by-request feature that enables users to self-serve analysis of cluster utilization for their ADF data flow pipelines by presenting resource utilization via Azure Log Analytics. Analyzing the results of your data flow executions can help to right-size your Azure Integration Runtime data flow configurations. 4 | 5 | ## Set up a Log Analytics Workspace 6 | 7 | * Via Portal: https://docs.microsoft.com/en-us/azure/azure-monitor/logs/quick-create-workspace 8 | * ARM Template: https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/dashboards#deploy-the-azure-log-analytics-workspace 9 | * PS: Deploying the Log Analytics workspace using the ARM template will include useful queries specific to spark-monitoring. These queries could be customized for specific business requirements. 10 | 11 | ## Link Log Analytics Workspace with Data Factory 12 | Under the "Diagnostic settings" property for the Data Factory, link the Log Analytics workspace 13 | 14 | * PS: Spark monitoring expects only a single Log Analytics workspace associated with the factory. If 0 or > 1 workspaces are linked to the factory, spark monitoring will not work. 15 | 16 | Assign the Factory managed identity permissions on the Log Analytics workspace using a built-in or custom RBAC role: 17 | 18 | Option 1: Assign the built-in Monitoring Contributor role on the workspace, or on the resource group or subscription containing the workspace.
https://docs.microsoft.com/en-us/azure/azure-monitor/roles-permissions-security#monitoring-contributor 20 | 21 | Option 2: Create a custom role with the following permissions on the workspace, or on the resource group or subscription containing the workspace: 22 | https://docs.microsoft.com/en-us/azure/role-based-access-control/custom-roles 23 | 24 | ``` 25 | "actions": [ 26 | "*/read", 27 | "Microsoft.Insights/DiagnosticSettings/*", 28 | "Microsoft.OperationalInsights/workspaces/search/action", 29 | "Microsoft.OperationalInsights/workspaces/sharedKeys/action" 30 | ] 31 | ``` 32 | 33 | Sample query to correlate the Activity Run ID with the cluster ID to analyze cpuUsage: 34 | ``` 35 | ADFActivityRun 36 | | where ActivityType contains 'ExecuteDataflow' 37 | | where Status !in ('Queued', 'InProgress') 38 | | where Status == 'Succeeded' 39 | | extend output=parse_json(Output) 40 | | extend clusterId=tostring(output["runStatus"]["ClusterId"]) 41 | | extend IRName=substring(EffectiveIntegrationRuntime, 0, indexof(EffectiveIntegrationRuntime, "(") - 1) 42 | | project Status, ActivityRunId, IRName, output, clusterId 43 | | join ( 44 | SparkMetric_CL 45 | | where name_s contains "executor.cpuTime" 46 | | extend sname=split(name_s, ".") 47 | | extend executor=strcat(sname[0],".",sname[1]) 48 | | project TimeGenerated, cpuTime=count_d/1000000, executor, name_s, clusterId_s 49 | | join kind=inner ( 50 | SparkMetric_CL 51 | | where name_s contains "executor.RunTime" 52 | | extend sname=split(name_s, ".") 53 | | extend executor=strcat(sname[0],".",sname[1]) 54 | | project TimeGenerated, runTime=count_d, executor, name_s, clusterId_s 55 | ) on executor, TimeGenerated 56 | | extend cpuUsage=(cpuTime/runTime)*100 57 | //| summarize cpuUsage by bin(TimeGenerated, 1ms), clusterId_s 58 | ) on $left.clusterId == $right.clusterId_s 59 | | summarize max(cpuUsage) by bin(TimeGenerated, 1ms), strcat(ActivityRunId, '-', clusterId) 60 | | render timechart 61 | ``` 62 | 63 | This query can be customized to visualize cpu/memory usage per Activity/Integration Runtime/etc. 64 | 65 | Further References: 66 | * https://spark.apache.org/docs/latest/monitoring.html#metrics 67 | * https://docs.microsoft.com/en-us/azure/architecture/databricks-monitoring/dashboards 68 | 69 | -------------------------------------------------------------------------------- /data-flow-expression-samples.md: -------------------------------------------------------------------------------- 1 | ## Keyboard shortcuts 2 | 3 | * ```Ctrl-K Ctrl-C```: Comments entire line 4 | * ```Ctrl-K Ctrl-U```: Uncomment 5 | * ```F1```: Provide editor help commands 6 | * ```Alt-Down Arrow```: Move current line down 7 | * ```Alt-Up Arrow```: Move current line up 8 | * ```Ctrl-Space```: Show context help 9 | 10 | ## Manual Comments 11 | 12 | ``` 13 | /* This is my comment */ 14 | /* This is a 15 | multi-line comment */ 16 | 17 | // This is a single line comment 18 | ``` 19 | 20 | TIP: If you put a comment at the top of your expression, it will appear in the transformation text box to document your transformation expressions: 21 | 22 | ![Comments](media/comments2.png "Comments") 23 | 24 | ## Convert Date to String format 25 | 26 | `toString(toDate('28/03/2010', 'dd/MM/yyyy'), 'ddMMMyyyy')` 27 | = 28Mar2010 28 | 29 | 30 | ## Concat Strings Shortcut 31 | 32 | 'This is my string.' + ' This is my new string.' 33 | = This is my string.
This is my new string 34 | 35 | 36 | ## Regexp 37 | 38 | https://kromerbigdata.com/2019/01/02/azure-data-factory-data-flow-transform-data-with-regular-expressions/ 39 | 40 | * regexReplace(Address1,`[ ]{2}|\.`,' ') 41 | * regex_extract(Address1, `^(\d+)`, 1) 42 | * rlike(City,'^[A-G]') 43 | 44 | ## Good for Alter Row: 45 | 46 | `true()` 47 | 48 | Using it in your Alter Row filter will allow all rows to match that condition. Good for Upsert. No need to use 1==1. 49 | 50 | Or, if you want inequality (1==0): 51 | 52 | `false()` 53 | 54 | ## Use byName() to access "hidden fields" 55 | 56 | When you are working in the ADF Data Flow UI, you can see the metadata as you construct your transformations. The metadata is based on the projection of the source plus the columns defined in transformations. However, in some instances you do not get the metadata due to schema drift, column patterns, or dynamic transformations like Pivot that create column names on the fly. In that case, use byName(): 57 | 58 | `toString(byName('mynewcol'))` 59 | 60 | ## Fuzzy matching 61 | 62 | `soundex(columnname)` 63 | 64 | ## isNull / coalesce 65 | 66 | `isNull(col1, 'somevalue')` 67 | 68 | or 69 | 70 | `coalesce(expression)` 71 | 72 | ## Lookup Match / No match 73 | 74 | After your Lookup transformation, you can use subsequent transformations to inspect the results of each match row by using the expression function `isMatch()` to make further choices in your logic based on whether or not the Lookup resulted in a row match. 75 | 76 | ## Regex to remove non-alphanumeric chars 77 | 78 | ``` 79 | regexReplace(mystring,`[^a-zA-Z\d\s:]`,'') 80 | ``` 81 | 82 | ## Convert to Timestamp 83 | 84 | `toString(toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss'), 'MM/dd/yyyy\'T\'HH:mm:ss')` 85 | 86 | Note that to include string literals in your timestamp output, you need to wrap your conversion inside of a toString(). 87 | 88 | Here is how to convert seconds from Epoch to a date or timestamp: 89 | 90 | `toTimestamp(seconds(1575250977))` 91 | 92 | ## How can I create a derived column that is a nullable timestamp, like C# DateTime or SSIS NULL(DT_DATE)? 93 | 94 | DateReported2 = 95 | CASE 96 | WHEN DateReported is null THEN DateReported 97 | WHEN YEAR(DateReported) = 1899 THEN NULL 98 | ELSE DateReported 99 | End 100 | ... 101 | 102 | Solution: 103 | 104 | case(year(DateReported) != 1899, DateReported) 105 | 106 | ## Row Counts 107 | 108 | To get Row Counts in Data Flows, add an Aggregate transformation, leave the Group By empty, then use `count(1)` as your aggregate function. 109 | 110 | ## Distinct Rows 111 | 112 | To get distinct rows in your Data Flows, use the Aggregate transformation, set the key(s) to use for distinct in your group by, then choose `first($$)` or `last($$)` as your aggregate function using column patterns. 113 | 114 | ## Handling names with special characters 115 | 116 | When you have column names that include special characters or spaces, surround the name with curly braces.
117 | 118 | ```{[dbo].this_is my complex name$$$}``` 119 | 120 | -------------------------------------------------------------------------------- /data-flow-monitoring.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Data Flow 2 | 3 | * [ADF Data Flow Monitoring UX](https://www.youtube.com/watch?v=AYkwX6J9sII&t=4s) 4 | * [ADF Data Flow Data Lineage](https://www.youtube.com/watch?v=5KvqYF-y93s) 5 | 6 | ## Monitoring Data Flows 7 | 8 | After you have completed building and debugging your data flow, you will want to schedule your data flow to execute on a schedule within the context of a pipeline. You can schedule the pipeline from ADF using Triggers. Or you can use the Trigger Now option from the ADF Pipeline Builder to execute a single-run execution to test your data flow within the pipeline context. 9 | 10 | When you execute your pipeline, you will be able to monitor the pipeline and all of the activities contained in the pipeline including the Data Flow activity. Click on the monitor icon in the left-hand ADF UI panel. You will see a screen similar to the one below. The highlighted icons will allow you to drill into the activities in the pipeline, including the Data Flow activity. 11 | 12 | 13 | 14 | You will see stats at this level as well, including the run times and status. The Run ID at the activity level is different from the Run ID at the pipeline level. The Run ID at the previous level is for the pipeline. Clicking the eyeglasses will give you deep details on your data flow execution. 15 | 16 | 17 | 18 | When you are in the graphical node monitoring view, you will see a simplified view-only version of your data flow graph. 19 | 20 | 21 | 22 | ### View Data Flow Execution Plans 23 | 24 | When your Data Flow is executed in Databricks, ADF determines optimal code paths based on the entirety of your data flow. Additionally, the execution paths may occur on different scale-out nodes and data partitions. Therefore, the monitoring graph represents the design of your flow, taking into account the execution path of your transformations. When you click on individual nodes, you will see "groupings" that represent code that was executed together on the cluster. The timings and counts that you see represent those groups as opposed to the individual steps in your design. 25 | 26 | 27 | 28 | 1. When you click on the open space in the monitoring window, the stats in the bottom pane will display timing and row counts for each Sink and the Transformations that led to the sink data for transformation lineage. 29 | 30 | 2. When you select individual transformations, you will receive additional feedback on the right-hand panel that shows partition stats, column counts, skewness (how evenly the data is distributed across partitions) and kurtosis (how spiky the data is). 31 | 32 | 3. When you click on the Sink in the node view, you will see column lineage. There are 3 different methods by which columns are accumulated throughout your data flow to land in the Sink. They are: 33 | 34 | a. Computed: You use the column for conditional processing or within an expression in your data flow, but do not land it in the Sink 35 | b. Derived: The column is a new column that you generated in your flow, i.e. it was not present in the Source 36 | c.
Mapped: The column originated from the source and you are mapping it to a sink field 37 | 38 | ### Monitoring Icons 39 | 40 | This icon means that the transformation data was already cached on the cluster, so the timings and execution path have taken that into account: 41 | 42 | 43 | 44 | You will also see green circle icons in the transformation. They represent a count of the number of sinks that data is flowing into. 45 | 46 | 47 | -------------------------------------------------------------------------------- /data-flow-reserved-capacity-overview.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Save compute costs with reserved capacity 3 | description: Learn how to buy Azure Data Factory data flow reserved capacity to save on your compute costs. 4 | ms.topic: conceptual 5 | author: kromerm 6 | ms.author: makromer 7 | ms.service: data-factory 8 | ms.date: 01/25/2021 9 | --- 10 | # Save costs for resources with reserved capacity - Azure Data Factory data flows 11 | 12 | Save money on Azure Data Factory data flow costs by committing to a reservation for compute resources instead of paying pay-as-you-go prices. With reserved capacity, you make a commitment for ADF data flow usage for a period of one or three years to get a significant discount on the compute costs. To purchase reserved capacity, you need to specify the Azure region, compute type, core count, and term. 13 | 14 | You do not need to assign the reservation to a specific factory or integration runtime. Existing factories or newly deployed factories automatically get the benefit. By purchasing a reservation, you commit to usage for the data flow compute costs for a period of one or three years. As soon as you buy a reservation, the compute charges that match the reservation attributes are no longer charged at the pay-as-you-go rates. 15 | 16 | You can buy reserved capacity in the [Azure portal](https://portal.azure.com). Pay for the reservation [up front or with monthly payments](https://docs.microsoft.com/azure/cost-management-billing/reservations/prepare-buy-reservation.md). To buy reserved capacity: 17 | 18 | - You must be in the owner role for at least one Enterprise or individual subscription with pay-as-you-go rates. 19 | - For Enterprise subscriptions, **Add Reserved Instances** must be enabled in the [EA portal](https://ea.azure.com). Or, if that setting is disabled, you must be an EA Admin on the subscription. 20 | 21 | For more information about how enterprise customers and Pay-As-You-Go customers are charged for reservation purchases, see [Understand Azure reservation usage for your Enterprise enrollment](https://docs.microsoft.com/azure/cost-management-billing/reservations/understand-reserved-instance-usage-ea) and [Understand Azure reservation usage for your Pay-As-You-Go subscription](https://docs.microsoft.com/azure/cost-management-billing/reservations/understand-reserved-instance-usage.md). 22 | 23 | > [!NOTE] 24 | > Purchasing reserved capacity does not pre-allocate or reserve specific infrastructure resources (virtual machines or clusters) for your use. 25 | 26 | ## Determine proper Azure IR sizes needed before purchase 27 | 28 | The size of the reservation should be based on the total amount of compute used by the existing or soon-to-be-deployed data flows using the same compute tier. 29 | 30 | For example, let's suppose that you are executing a pipeline hourly using memory optimized with 32 cores.
Further, let's suppose that you plan to deploy within the next month an additional pipeline that uses general purpose 64 cores. Also, let's suppose that you know that you will need these resources for at least 1 year. In this case, you should purchase a 32-core 1-year reservation for memory optimized data flows and a 64-core 1-year reservation for general purpose data flows. 31 | 32 | ## Buy reserved capacity 33 | 34 | 1. Sign in to the [Azure portal](https://portal.azure.com). 35 | 2. Select **All services** > **Reservations**. 36 | 3. Select **Add** and then in the **Purchase Reservations** pane, select **ADF Data Flows** to purchase a new reservation for ADF data flows. 37 | 4. Fill in the required fields. Data flows that match the attributes you select qualify to get the reserved capacity discount. The actual number of data flows that get the discount depends on the scope and quantity selected. 38 | 5. Review the cost of the capacity reservation in the **Costs** section. 39 | 6. Select **Purchase**. 40 | 7. Select **View this Reservation** to see the status of your purchase. 41 | 42 | ## Cancel, exchange, or refund reservations 43 | 44 | You can cancel, exchange, or refund reservations with certain limitations. For more information, see [Self-service exchanges and refunds for Azure Reservations](https://docs.microsoft.com/azure/cost-management-billing/reservations/exchange-and-refund-azure-reservations.md). 45 | 46 | ## Need help? Contact us 47 | 48 | If you have questions or need help, [create a support request](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest). 49 | 50 | ## Next steps 51 | 52 | To learn more about Azure Reservations, see the following articles: 53 | 54 | - [Understand Azure Reservations discount](data-flow-understand-reservation-charges.md) 55 | -------------------------------------------------------------------------------- /data-flow-samples-template.md: -------------------------------------------------------------------------------- 1 | samples 2 | -------------------------------------------------------------------------------- /data-flow-script.md: -------------------------------------------------------------------------------- 1 | # Data Flow script (DFS) 2 | ## What is the DFS? 3 | The data flow script (DFS) is the underlying text, similar to a coding language, that is used to execute the transformations that are included in a mapping data flow. Every transformation is represented by a series of properties that provide the necessary information to run the job properly. 4 | 5 | For instance, `allowSchemaDrift: true,` in a source transformation tells the service to include all columns from the source dataset in the data flow even if they are not included in the schema projection. 6 | 7 | ## Use cases 8 | The DFS is usually hidden from users and is automatically produced by the user interface. As a result, most of the time reading or editing the DFS directly is unnecessary. There are some cases, though, where it can be helpful or necessary to have an understanding of the script while debugging and producing data flows.
9 | 10 | Here are a few examples: 11 | - Programmatically producing many data flows that are fairly similar 12 | - Complex expressions that are difficult to manage in the UI or are resulting in validation issues 13 | - Debugging and better understanding various errors returned during execution 14 | 15 | ## How to add transforms 16 | Adding transformations requires three basic steps: adding the core transformation data, rerouting the input stream, and then rerouting the output stream. This is easiest to see in an example. 17 | Let's say we start with a simple source to sink data flow like the following: 18 | 19 |<pre>
 20 | source(output(
 21 | 		movieId as string,
 22 | 		title as string,
 23 | 		genres as string
 24 | 	),
 25 | 	allowSchemaDrift: true,
 26 | 	validateSchema: false) ~> source1
 27 | source1 sink(allowSchemaDrift: true,
 28 | 	validateSchema: false) ~> sink1
 29 | 
30 | 31 | If we decide to add a derive transformation, first we need to create the core transformation text, which has a simple expression to add a new uppercase column called `upperCaseTitle`: 32 |
 33 | derive(upperCaseTitle = upper(title)) ~> deriveTransformationName
 34 | 
35 | 36 | Then, we take the existing DFS and add the transformation: 37 |
 38 | source(output(
 39 | 		movieId as string,
 40 | 		title as string,
 41 | 		genres as string
 42 | 	),
 43 | 	allowSchemaDrift: true,
 44 | 	validateSchema: false) ~> source1
 45 | derive(upperCaseTitle = upper(title)) ~> deriveTransformationName
 46 | source1 sink(allowSchemaDrift: true,
 47 | 	validateSchema: false) ~> sink1
 48 | 
49 | 50 | And now we reroute the incoming stream by identifying which transformation we want the new transformation to come after (in this case, `source1`) and copying the name of the stream to the new transformation: 51 |
 52 | source(output(
 53 | 		movieId as string,
 54 | 		title as string,
 55 | 		genres as string
 56 | 	),
 57 | 	allowSchemaDrift: true,
 58 | 	validateSchema: false) ~> source1
 59 | source1 derive(upperCaseTitle = upper(title)) ~> deriveTransformationName
 60 | source1 sink(allowSchemaDrift: true,
 61 | 	validateSchema: false) ~> sink1
 62 | 
63 | 64 | Finally, we identify the transformation we want to come after this new transformation (in this case, `sink1`), and replace its input stream with the output stream name of our new transformation: 65 |<pre>
 66 | source(output(
 67 | 		movieId as string,
 68 | 		title as string,
 69 | 		genres as string
 70 | 	),
 71 | 	allowSchemaDrift: true,
 72 | 	validateSchema: false) ~> source1
 73 | source1 derive(upperCaseTitle = upper(title)) ~> deriveTransformationName
 74 | deriveTransformationName sink(allowSchemaDrift: true,
 75 | 	validateSchema: false) ~> sink1
 76 | 
77 | 78 | ## DFS fundamentals 79 | The DFS is composed of a series of connected transformations, including sources, sinks, and various others which can add new columns, filter data, join data, and much more. 80 | Usually, the script will start with one or more sources, followed by many transformations, and end with one or more sinks. 81 | 82 | Sources all have the same basic construction: 83 |<pre>
 84 | source(
 85 |   source properties
 86 | ) ~> source_name
 87 | 
88 | 89 | For instance, a simple source with three columns (movieId, title, genres) would be: 90 |
 91 | source(output(
 92 | 		movieId as string,
 93 | 		title as string,
 94 | 		genres as string
 95 | 	),
 96 | 	allowSchemaDrift: true,
 97 | 	validateSchema: false) ~> source1
 98 | 
99 | 100 | All transformations other than sources have the same basic construction: 101 |
102 | name_of_incoming_stream transformation_type(
103 |   properties
104 | ) ~> new_stream_name
105 | 
106 | 107 | For example, a simple derive transformation that takes a column (title) and overwrites it with an uppercase version would be as follows: 108 |
109 | source1 derive(
110 |   title = upper(title)
111 | ) ~> derive1
112 | 
113 | 114 | And a sink with no schema would simply be: 115 |
116 | derive1 sink(allowSchemaDrift: true,
117 | 	validateSchema: false) ~> sink1
118 | 
119 | -------------------------------------------------------------------------------- /data-flow-understand-reservation-charges.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Understand reservations discount for Azure Data Factory data flows | Microsoft Docs 3 | description: Learn how a reservation discount is applied to running ADF data flows. The discount is applied to these data flows on an hourly basis. 4 | author: kromerm 5 | ms.service: data-factory 6 | ms.topic: conceptual 7 | ms.date: 01/25/2021 8 | ms.author: makromer 9 | --- 10 | 11 | # How a reservation discount is applied to Azure Data Factory data flows 12 | 13 | After you buy ADF data flow reserved capacity, the reservation discount is automatically applied to data flows using an Azure integration runtime that match the compute type and core count of the reservation. 14 | 15 | ## How reservation discount is applied 16 | 17 | A reservation discount is "*use-it-or-lose-it*". So, if you don't have matching Azure integration resources used for any hour, then you lose a reservation quantity for that hour. You can't carry forward unused reserved hours. 18 | 19 | When you stop using the integration runtime for data flows, the reservation discount automatically applies to another matching resource in the specified scope. If no matching resources are found in the specified scope, then the reserved hours are *lost*. 20 | 21 | ## Discount applied to ADF data flows 22 | 23 | The ADF data flow reserved capacity discount is applied to executing integration runtimes on an hourly basis. The reservation that you buy is matched to the compute usage emitted by the integration runtime being utilized. For data flows that don't run the full hour, the reservation is automatically applied to other data flows matching the reservation attributes. The discount can also apply to data flows that are running concurrently. If you don't have data flows that run for the full hour that match the reservation attributes, you don't get the full benefit of the reservation discount for that hour. 24 | 25 | The following examples show how the ADF data flow reserved capacity discount applies depending on the number of cores you bought, and when they're running. 26 | 27 | - Scenario 1: You buy ADF data flow reserved capacity for 1 hour of 80 cores of memory optimized compute. You run a data flow with an Azure integration runtime set to 144 cores of memory optimized for one hour. You're charged the pay-as-you-go price for 64 cores of data flow usage for one hour. You get the reservation discount for one hour of 80 cores of memory optimized usage. 28 | - Scenario 2: You buy ADF data flow reserved capacity for 32 cores of general purpose compute. You debug your data flows for 1 hour using 32 cores of general purpose compute on an Azure integration runtime. You get the reservation discount for that entire hour of usage. 29 | 30 | To understand and view the application of your Azure Reservations in billing usage reports, see [Understand Azure reservation usage](https://docs.microsoft.com/azure/cost-management-billing/reservations/understand-reserved-instance-usage-ea). 31 | 32 | ## Need help? Contact us 33 | 34 | If you have questions or need help, [create a support request](https://go.microsoft.com/fwlink/?linkid=2083458).
35 | 36 | ## Next steps 37 | 38 | To learn more about Azure Reservations, see the following article: 39 | 40 | - [What are Azure Reservations?](https://docs.microsoft.com/azure/cost-management-billing/reservations/save-compute-costs-reservations) 41 | -------------------------------------------------------------------------------- /data-flow-versioning.md: -------------------------------------------------------------------------------- 1 | # ADF Data Flows Versioning 2 | 3 | ### This is a feature that will allow you to test your data factory data flows by executing your factory on a version that only updates on a monthly basis, as opposed to the live service which is updated weekly. 4 | 5 | #### JSON ARM Template config 6 | 7 | "properties": { 8 | "globalConfigurations": { 9 | "dataFlowRuntimeVersion": "Candidate" 10 | } 11 | } 12 | 13 | #### Valid values for Runtime Version are: 14 | * Stable (this build is updated on the 15th of every month and is the previous Candidate build) 15 | * Candidate (this build is updated on the 1st of every month and becomes "Stable" on the 15th of the next month) 16 | * Live (this is the current live service version of ADF Data Flows) 17 | * If you leave the property blank, the default value will be "Live" 18 | 19 | #### Recommended use of Data Flow versioning 20 | 21 | If you wish to switch to versioned Data Flows, we recommend that you use "Stable" for the live production version of your factories and "Candidate" for your dev/test environment. You can also optionally maintain an "experimental" factory that uses the "Live" version, which is the version of ADF data flows used by all general ADF customers. This would give you an opportunity to experiment with the features in ADF that are deployed on a weekly cadence. 22 | 23 | #### Include data flow runtime version in ARM template 24 | 25 | Including the runtime version in the ARM template is very similar to including global parameters, which have a CI/CD section [here](https://docs.microsoft.com/azure/data-factory/author-global-parameters#cicd). Check the "Include in ARM template" checkbox when in git mode. 26 | 27 | Once it is included in the ARM template, it will be parametrizable like any other factory detail - it will look something like this: 28 | 29 | ![Versioning settings](images/vers1.png "Versioning 1") 30 | 31 | The options are Live, Candidate, Stable.
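For reference, a minimal sketch of how the parameterized property might appear in the generated ARM template; the parameter name `dataFlowRuntimeVersion` is an illustrative assumption, and your template may generate a different name:

    "properties": {
        "globalConfigurations": {
            "dataFlowRuntimeVersion": "[parameters('dataFlowRuntimeVersion')]"
        }
    }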
If it is not showing up in the ARM template as parametrizable, you may have a custom parameter definition file, or you may have added git to your factory before this change, in which case you should make sure the parametrization template includes a global configurations section like this: 32 | 33 | ![Versioning settings](images/vers2.png "Versioning 2") 34 | -------------------------------------------------------------------------------- /findDelimiter.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/findDelimiter.zip -------------------------------------------------------------------------------- /images/AddSubcolumn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/AddSubcolumn.png -------------------------------------------------------------------------------- /images/ComplexColumn.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ComplexColumn.png -------------------------------------------------------------------------------- /images/accesstoken.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/accesstoken.png -------------------------------------------------------------------------------- /images/adb.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/adb.png -------------------------------------------------------------------------------- /images/adb50.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/adb50.png -------------------------------------------------------------------------------- /images/adw1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/adw1.png -------------------------------------------------------------------------------- /images/adw2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/adw2.png -------------------------------------------------------------------------------- /images/adw3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/adw3.png -------------------------------------------------------------------------------- /images/agg.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/agg.png -------------------------------------------------------------------------------- /images/agg2.png: --------------------------------------------------------------------------------
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/agg2.png -------------------------------------------------------------------------------- /images/agg3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/agg3.png -------------------------------------------------------------------------------- /images/agghead.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/agghead.png -------------------------------------------------------------------------------- /images/agghead1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/agghead1.png -------------------------------------------------------------------------------- /images/automap.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/automap.png -------------------------------------------------------------------------------- /images/azureir1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/azureir1.png -------------------------------------------------------------------------------- /images/bb_debug1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/bb_debug1.png -------------------------------------------------------------------------------- /images/bb_inspect1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/bb_inspect1.png -------------------------------------------------------------------------------- /images/bb_ssms1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/bb_ssms1.png -------------------------------------------------------------------------------- /images/bb_ssms2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/bb_ssms2.png -------------------------------------------------------------------------------- /images/branch2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/branch2.png -------------------------------------------------------------------------------- /images/cd1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/cd1.png -------------------------------------------------------------------------------- /images/ce1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce1.png -------------------------------------------------------------------------------- /images/ce10.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce10.png -------------------------------------------------------------------------------- /images/ce2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce2.png -------------------------------------------------------------------------------- /images/ce3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce3.png -------------------------------------------------------------------------------- /images/ce4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce4.png -------------------------------------------------------------------------------- /images/ce5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce5.png -------------------------------------------------------------------------------- /images/ce6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce6.png -------------------------------------------------------------------------------- /images/ce7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce7.png -------------------------------------------------------------------------------- /images/ce8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce8.png -------------------------------------------------------------------------------- /images/ce9.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/ce9.png -------------------------------------------------------------------------------- /images/cfdf001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/cfdf001.png -------------------------------------------------------------------------------- /images/chart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/chart.png -------------------------------------------------------------------------------- /images/columnpattern.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/columnpattern.png -------------------------------------------------------------------------------- /images/columnpattern2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/columnpattern2.png -------------------------------------------------------------------------------- /images/comments.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/comments.png -------------------------------------------------------------------------------- /images/dafl1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dafl1.png -------------------------------------------------------------------------------- /images/dafl2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dafl2.png -------------------------------------------------------------------------------- /images/dafl3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dafl3.png -------------------------------------------------------------------------------- /images/dafl4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dafl4.png -------------------------------------------------------------------------------- /images/databricks.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/databricks.png -------------------------------------------------------------------------------- /images/dataflowls.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dataflowls.png -------------------------------------------------------------------------------- /images/datapreview.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/datapreview.png -------------------------------------------------------------------------------- /images/dataset1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dataset1.png -------------------------------------------------------------------------------- /images/dbls001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dbls001.png 
-------------------------------------------------------------------------------- /images/dc.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dc.png -------------------------------------------------------------------------------- /images/dc1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dc1.png -------------------------------------------------------------------------------- /images/debug1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/debug1.png -------------------------------------------------------------------------------- /images/debug2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/debug2.png -------------------------------------------------------------------------------- /images/debugbutton.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/debugbutton.png -------------------------------------------------------------------------------- /images/defaultformat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/defaultformat.png -------------------------------------------------------------------------------- /images/dfls2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/dfls2.png -------------------------------------------------------------------------------- /images/errorow1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/errorow1.png -------------------------------------------------------------------------------- /images/errors1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/errors1.png -------------------------------------------------------------------------------- /images/existingcluster.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/existingcluster.png -------------------------------------------------------------------------------- /images/exp1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp1.png -------------------------------------------------------------------------------- /images/exp2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp2.png -------------------------------------------------------------------------------- /images/exp3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp3.png -------------------------------------------------------------------------------- /images/exp4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp4.png -------------------------------------------------------------------------------- /images/exp4b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp4b.png -------------------------------------------------------------------------------- /images/exp5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exp5.png -------------------------------------------------------------------------------- /images/expb1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/expb1.png -------------------------------------------------------------------------------- /images/expb2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/expb2.png -------------------------------------------------------------------------------- /images/expression.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/expression.png -------------------------------------------------------------------------------- /images/exsits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/exsits.png -------------------------------------------------------------------------------- /images/extdep.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/extdep.png -------------------------------------------------------------------------------- /images/extend.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/extend.png -------------------------------------------------------------------------------- /images/folderpath.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/folderpath.png -------------------------------------------------------------------------------- /images/gentoken.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/gentoken.png -------------------------------------------------------------------------------- /images/join.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/join.png -------------------------------------------------------------------------------- /images/joinoptimize.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/joinoptimize.png -------------------------------------------------------------------------------- /images/keycols.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/keycols.png -------------------------------------------------------------------------------- /images/lookup1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/lookup1.png -------------------------------------------------------------------------------- /images/lsconnections.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/lsconnections.png -------------------------------------------------------------------------------- /images/maxcon.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/maxcon.png -------------------------------------------------------------------------------- /images/menu.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/menu.png -------------------------------------------------------------------------------- /images/mon001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon001.png -------------------------------------------------------------------------------- /images/mon002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon002.png -------------------------------------------------------------------------------- /images/mon003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon003.png -------------------------------------------------------------------------------- /images/mon004.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon004.png 
-------------------------------------------------------------------------------- /images/mon005.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon005.png -------------------------------------------------------------------------------- /images/mon1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/mon1.png -------------------------------------------------------------------------------- /images/multi1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/multi1.png -------------------------------------------------------------------------------- /images/newdataflowactivity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/newdataflowactivity.png -------------------------------------------------------------------------------- /images/newresource.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/newresource.png -------------------------------------------------------------------------------- /images/nullchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/nullchart.png -------------------------------------------------------------------------------- /images/opt001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/opt001.png -------------------------------------------------------------------------------- /images/opt002.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/opt002.png -------------------------------------------------------------------------------- /images/params.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/params.png -------------------------------------------------------------------------------- /images/pipe1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pipe1.png -------------------------------------------------------------------------------- /images/pivot1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pivot1.png -------------------------------------------------------------------------------- /images/pivot2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pivot2.png -------------------------------------------------------------------------------- /images/pivot3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pivot3.png -------------------------------------------------------------------------------- /images/pivot4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pivot4.png -------------------------------------------------------------------------------- /images/pivot5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/pivot5.png -------------------------------------------------------------------------------- /images/portal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/portal.png -------------------------------------------------------------------------------- /images/redf001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/redf001.png -------------------------------------------------------------------------------- /images/referencenode.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/referencenode.png -------------------------------------------------------------------------------- /images/resource1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/resource1.png -------------------------------------------------------------------------------- /images/scd7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/scd7.png -------------------------------------------------------------------------------- /images/schemadrift001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/schemadrift001.png -------------------------------------------------------------------------------- /images/select001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/select001.png -------------------------------------------------------------------------------- /images/selfjoin.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/selfjoin.png -------------------------------------------------------------------------------- /images/sink1.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sink1.png -------------------------------------------------------------------------------- /images/sink2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sink2.png -------------------------------------------------------------------------------- /images/soccer1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/soccer1.png -------------------------------------------------------------------------------- /images/soccer2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/soccer2.png -------------------------------------------------------------------------------- /images/soccer3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/soccer3.png -------------------------------------------------------------------------------- /images/soccer4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/soccer4.png -------------------------------------------------------------------------------- /images/sort.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sort.png -------------------------------------------------------------------------------- /images/source.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/source.png -------------------------------------------------------------------------------- /images/source003.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/source003.png -------------------------------------------------------------------------------- /images/source1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/source1.png -------------------------------------------------------------------------------- /images/source2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/source2.png -------------------------------------------------------------------------------- /images/source3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/source3.png 
-------------------------------------------------------------------------------- /images/sourcepart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sourcepart.png -------------------------------------------------------------------------------- /images/sourceparts.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sourceparts.png -------------------------------------------------------------------------------- /images/sources5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sources5.png -------------------------------------------------------------------------------- /images/sql001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/sql001.png -------------------------------------------------------------------------------- /images/stats.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/stats.png -------------------------------------------------------------------------------- /images/storeage.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/storeage.png -------------------------------------------------------------------------------- /images/surrogate.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/surrogate.png -------------------------------------------------------------------------------- /images/tags.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/tags.png -------------------------------------------------------------------------------- /images/tags1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/tags1.png -------------------------------------------------------------------------------- /images/taxi1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/taxi1.png -------------------------------------------------------------------------------- /images/taxidrift1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/taxidrift1.png -------------------------------------------------------------------------------- /images/taxidrift2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/taxidrift2.png -------------------------------------------------------------------------------- /images/template.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/template.png -------------------------------------------------------------------------------- /images/template1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/template1.png -------------------------------------------------------------------------------- /images/template2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/template2.png -------------------------------------------------------------------------------- /images/union.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/union.png -------------------------------------------------------------------------------- /images/unpivot1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot1.png -------------------------------------------------------------------------------- /images/unpivot3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot3.png -------------------------------------------------------------------------------- /images/unpivot4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot4.png -------------------------------------------------------------------------------- /images/unpivot5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot5.png -------------------------------------------------------------------------------- /images/unpivot6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot6.png -------------------------------------------------------------------------------- /images/unpivot7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/unpivot7.png -------------------------------------------------------------------------------- /images/upload.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/upload.png -------------------------------------------------------------------------------- 
/images/v2dataflowportal.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/v2dataflowportal.png -------------------------------------------------------------------------------- /images/vers1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/vers1.png -------------------------------------------------------------------------------- /images/vers2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/vers2.png -------------------------------------------------------------------------------- /images/windows1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows1.png -------------------------------------------------------------------------------- /images/windows2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows2.png -------------------------------------------------------------------------------- /images/windows3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows3.png -------------------------------------------------------------------------------- /images/windows4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows4.png -------------------------------------------------------------------------------- /images/windows5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows5.png -------------------------------------------------------------------------------- /images/windows6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows6.png -------------------------------------------------------------------------------- /images/windows7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows7.png -------------------------------------------------------------------------------- /images/windows8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows8.png -------------------------------------------------------------------------------- /images/windows9.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/images/windows9.png -------------------------------------------------------------------------------- /mapping-data-flow-overview.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Mapping Data Flow: Overview 2 | 3 | *NOTE: This overview of Mapping Data Flows is specific to the public preview version* 4 | 5 | Mapping Data Flows in ADF provide a way to transform data at scale without writing any code. Design a data transformation job in the ADF Data Flow designer by constructing a series of Source Transformations, followed by data transformation steps, then sink your results in a Sink Transformation. 6 | 7 | ## Getting Started 8 | 9 | Start by creating a new ADF V2 factory from the Azure portal. 10 | 11 | 12 | 13 | Once you are in the ADF UI, sample Data Flows are available from the ADF Template Gallery. In ADF, choose "Pipeline from Template" and select the Data Flow category in the template gallery. 14 | 15 | 16 | 17 | You will be prompted to enter your Azure Blob Storage account information. 18 | 19 | 20 | 21 | [The data used for these samples can be found here](https://github.com/kromerm/adfdataflowdocs/tree/master/sampledata). Download the sample data and store the files in your Azure Blob Storage account so that you can execute the samples. 22 | 23 | Use the Create Resource plus sign button in the ADF UI to create Data Flows. 24 | 25 | 26 | 27 | -------------------------------------------------------------------------------- /media/09b0f0e02aaede3d38acf46a6dcb8644.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/09b0f0e02aaede3d38acf46a6dcb8644.png -------------------------------------------------------------------------------- /media/2da2e7aee362dad223964fd741982505.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/2da2e7aee362dad223964fd741982505.png -------------------------------------------------------------------------------- /media/3bd6be2be665dff8a97d48df4fd1326b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/3bd6be2be665dff8a97d48df4fd1326b.png -------------------------------------------------------------------------------- /media/7f882e627a336827d8890f3fa71110df.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/7f882e627a336827d8890f3fa71110df.png -------------------------------------------------------------------------------- /media/9904436a39ba0b54a6e030eb7ad0540f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/9904436a39ba0b54a6e030eb7ad0540f.png -------------------------------------------------------------------------------- /media/9e9d9af6ef098ed75b62f96211dcf313.png: --------------------------------------------------------------------------------
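--------------------------------------------------------------------------------

Editor's aside on the Getting Started steps in mapping-data-flow-overview.md above: the portal walkthrough (create an ADF V2 factory, then stage the sample data in Blob Storage) can also be scripted. The sketch below is not part of this repo; it is a minimal, hypothetical example using the Azure SDK for Python (`azure-identity`, `azure-mgmt-datafactory`, `azure-storage-blob`), and the subscription ID, resource group, factory name, connection string, and container name are all placeholder assumptions.

```python
# Minimal sketch (assumptions noted above): create an ADF V2 factory and
# upload one sample file to the Blob Storage account that the template
# gallery will prompt you for.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory
from azure.storage.blob import BlobServiceClient

SUBSCRIPTION_ID = "<subscription-id>"    # placeholder
RESOURCE_GROUP = "adf-dataflow-demo-rg"  # assumed to exist already
FACTORY_NAME = "adf-dataflow-demo"       # must be globally unique

# 1. Create the ADF V2 factory (the scripted equivalent of the portal step).
adf_client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
factory = adf_client.factories.create_or_update(
    RESOURCE_GROUP, FACTORY_NAME, Factory(location="eastus")
)
print(f"Factory ready: {factory.name}")

# 2. Stage one of the downloaded sample files in Blob Storage so the
#    template-gallery Data Flows have data to execute against.
blob_service = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_service.get_container_client("sampledata")  # assumed container name
if not container.exists():
    container.create_container()
with open("sampledata/moviesDB.csv", "rb") as data:
    container.upload_blob(name="moviesDB.csv", data=data, overwrite=True)
print("Sample data staged in Blob Storage.")
```

Repeat the upload step for each sample file a given template needs; the dataset the template creates should then point at the same container.

--------------------------------------------------------------------------------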
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/9e9d9af6ef098ed75b62f96211dcf313.png -------------------------------------------------------------------------------- /media/a0b2dbe0b01e1a3f4a9a6b17ab57bbe2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/a0b2dbe0b01e1a3f4a9a6b17ab57bbe2.png -------------------------------------------------------------------------------- /media/a35981c95cd51d0b13ecd090b3b97cfe.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/a35981c95cd51d0b13ecd090b3b97cfe.png -------------------------------------------------------------------------------- /media/a51be05cea35390eb8052f25fb152eef.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/a51be05cea35390eb8052f25fb152eef.png -------------------------------------------------------------------------------- /media/adb1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/adb1.png -------------------------------------------------------------------------------- /media/adf-data-flows.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/adf-data-flows.png -------------------------------------------------------------------------------- /media/af068303e7906e297c666307bf12d39b.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/af068303e7906e297c666307bf12d39b.png -------------------------------------------------------------------------------- /media/azureir1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/azureir1.png -------------------------------------------------------------------------------- /media/b682cdfd3971c23f096c21f194defe81.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/b682cdfd3971c23f096c21f194defe81.png -------------------------------------------------------------------------------- /media/c6511f8763cfc590a0e2262cdc960442.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/c6511f8763cfc590a0e2262cdc960442.png -------------------------------------------------------------------------------- /media/comments2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/comments2.png -------------------------------------------------------------------------------- 
/media/d242a4c1928463417119ab08248e1e37.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/d242a4c1928463417119ab08248e1e37.png -------------------------------------------------------------------------------- /media/dataset1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/dataset1.png -------------------------------------------------------------------------------- /media/defaultformat.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/defaultformat.png -------------------------------------------------------------------------------- /media/e117d32a02042ba72df328a372931772.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/e117d32a02042ba72df328a372931772.png -------------------------------------------------------------------------------- /media/eba63d158e958a245b6686819ba0d5ac.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/eba63d158e958a245b6686819ba0d5ac.png -------------------------------------------------------------------------------- /media/f3a2eff81e3af2a1775407d2c410b71f.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/f3a2eff81e3af2a1775407d2c410b71f.png -------------------------------------------------------------------------------- /media/fb18c5028f040939979273b045c5ca5a.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/fb18c5028f040939979273b045c5ca5a.png -------------------------------------------------------------------------------- /media/readme: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /media/sink1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/sink1.png -------------------------------------------------------------------------------- /media/sink2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/sink2.png -------------------------------------------------------------------------------- /media/source1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/source1.png -------------------------------------------------------------------------------- /media/source2.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/source2.png -------------------------------------------------------------------------------- /media/source3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/source3.png -------------------------------------------------------------------------------- /media/sql001.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/media/sql001.png -------------------------------------------------------------------------------- /patterns/adf-data-flow-patterns.md: -------------------------------------------------------------------------------- 1 | # ADF Mapping Data Flow Patterns 2 | 3 | * [Convert from U-SQL Search Log Analytics to ADF Data Flow](https://kromerbigdata.com/2019/03/03/u-sql-searchlog-aggregations-as-adf-data-flows/) 4 | * [Working with multiple source files](https://kromerbigdata.com/2019/02/22/azure-data-factory-data-flows-working-with-multiple-files/) 5 | * [Transform data with regular expression](https://kromerbigdata.com/2019/01/02/azure-data-factory-data-flow-transform-data-with-regular-expressions/) 6 | * [Build Data Flows with interactive data debug](https://kromerbigdata.com/2018/10/27/adf-data-flow-preview-interactive-debug-now-live/) 7 | * [U-SQL Tweets as an ADF Data Flow](https://mssqldude.wordpress.com/2019/02/04/azure-data-factory-build-u-sql-tweet-analysis-with-adf-data-flows/) 8 | * [Databricks Notebook ETL as an ADF Data Flow](https://mssqldude.wordpress.com/2019/01/23/adf-data-flows-databricks-notebook-etl-vs-adf-visual-data-flow/) 9 | * [ADF Data Flow Self-Join Pattern](https://mssqldude.wordpress.com/2018/12/20/adf-data-flows-self-join/) 10 | * [Slowly Changing Dimension Pattern](https://mssqldude.wordpress.com/2018/12/02/azure-data-factory-data-flow-building-slowly-changing-dimensions/) 11 | * [Slowly Changing Dimension Type 2](https://mssqldude.wordpress.com/2019/04/15/adf-slowly-changing-dimension-type-2-with-mapping-data-flows-complete/) 12 | * [Fuzzy Matching and Dedupe](https://kromerbigdata.com/2019/04/21/use-adf-mapping-data-flows-for-fuzzy-matching-and-dedupe/) 13 | * [Partition Large Files](https://mssqldude.wordpress.com/2019/03/23/partition-large-files-with-adf-using-mapping-data-flows/) 14 | * [Data Flow Partitioning](https://dataflowninja.blogspot.com/2019/06/all-about-partitioning.html) 15 | * [Dynamic File Names](https://kromerbigdata.com/2019/04/05/dynamic-file-names-in-adf-with-mapping-data-flows/) 16 | * [Dynamic field names](https://kromerbigdata.com/2019/05/24/adf-mapping-data-flows-create-rules-to-modify-column-names/) 17 | * [Data Vault data model pattern by Rayis Imayev](http://datanrg.blogspot.com/2019/05/using-azure-data-factory-mapping-data.html) 18 | * [Dynamically create database tables](https://mssqldude.wordpress.com/2019/06/03/dynamic-sql-table-names-with-azure-data-factory-data-flows/) 19 | * [Iterate through multiple files and folders in Source Transformation](https://kromerbigdata.com/2019/07/05/adf-mapping-data-flows-iterate-multiple-files-with-source-transformation/) 20 | * [End-to-end hands-on lab: Movie Analytics](http://aka.ms/moviesanalytics) 21 | * [Migrate Pig ETL to Data 
Flows](https://mssqldude.wordpress.com/2019/08/16/etl-with-adf-convert-pig-to-data-flows/) 22 | * [Process Fixed-length text files](https://kromerbigdata.com/2019/08/20/process-fixed-length-text-files-with-adf-mapping-data-flows/) 23 | * [Dev/test/debug patterns with Data Flows](https://kromerbigdata.com/2019/06/14/adf-mapping-data-flow-debug-and-test-pattern/) 24 | * [Model projections from drifted schemas](https://kromerbigdata.com/2019/09/16/adf-data-flows-generate-models-from-drifted-columns/) 25 | * [Tutorial: Big Data Analytics using Rank and Dense Rank Window Functions by Ron L'Esteve](https://www.mssqltips.com/sqlservertip/6169/azure-data-factory-mapping-data-flows-for-big-data-lake-aggregations-and-transformations/) 26 | * [Tutorial: Slowly changing dimensions and ETL by Ron L'Esteve](https://www.mssqltips.com/sqlservertip/6074/azure-data-factory-mapping-data-flow-for-datawarehouse-etl/) 27 | * [Distinct Rows and Row Counts](https://mssqldude.wordpress.com/2019/09/18/adf-data-flows-distinct-rows/) 28 | * [Reduce total execution time of data flow](https://mssqldude.wordpress.com/2019/10/04/reduce-execution-time-for-data-flow-activities-in-adf-pipelines/) 29 | * [Dynamically skip lines in a text file](https://kromerbigdata.com/2019/09/28/adf-dynamic-skip-lines-find-data-with-variable-headers/) 30 | * [Create reusable patterns: SCD Type 1 example](https://techcommunity.microsoft.com/t5/Azure-Data-Factory/Create-Generic-SCD-Pattern-in-ADF-Mapping-Data-Flows/ba-p/918519) 31 | * [Store data preview summary stats](https://techcommunity.microsoft.com/t5/azure-data-factory/how-to-save-your-data-profiler-summary-stats-in-adf-data-flows/ba-p/1243251) 32 | * [Fill down](https://techcommunity.microsoft.com/t5/azure-data-factory/implement-fill-down-in-adf-and-synapse-data-flows/ba-p/2013406) 33 | * [Transform arrays](https://kromerbigdata.com/2021/01/06/transforming-arrays-in-azure-data-factory-and-azure-synapse-data-flows/) 34 | -------------------------------------------------------------------------------- /patterns/adfdataflowlinks.md: -------------------------------------------------------------------------------- 1 | # Azure Data Factory Mapping Data Flows 2 | ## Helpful links 3 | 4 | * [Get Started with ADF Mapping Data Flows Tutorial](https://docs.microsoft.com/en-us/azure/data-factory/tutorial-data-flow) 5 | * [Short video tutorials](https://aka.ms/dataflowvideos) 6 | * [Common data flow patterns](https://aka.ms/dataflowpatterns) 7 | * [Official online documentation](https://aka.ms/adfdataflowdocs) 8 | * [Data transformation expression language functions guide](https://aka.ms/dataflowexpressions) 9 | * [Expression examples](https://aka.ms/dataflowexpsamples) 10 | * [ADF Data Flow Hands-on Lab: Chicago Crime Stats](https://github.com/kromerm/adfmappingdataflowslab) 11 | * [ADF Data Flow Hands-on Lab: Movies Analytics](https://aka.ms/moviesanalytics) 12 | * [ADF Data Flow Performance Guide](https://aka.ms/dfperf) 13 | * [ADF Data Flow Training Slides](http://aka.ms/adfdataflowtraining) 14 | * [ADF Data Flow NYC Taxi Demo Lab](https://github.com/fabragaMS/ADPE2E/blob/master/Lab/Lab2/Lab2.md) 15 | * [How to migrate from Notebooks ETL to ADF Mapping Data Flows](https://techcommunity.microsoft.com/t5/Azure-Data-Factory/ADF-Mapping-Data-Flows-for-Databricks-Notebook-Developers/ba-p/919214) 16 | -------------------------------------------------------------------------------- /sampledata/Address.csv: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Address.csv -------------------------------------------------------------------------------- /sampledata/Adventure Works SQL.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Adventure Works SQL.zip -------------------------------------------------------------------------------- /sampledata/AdventureWorks Data.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/AdventureWorks Data.zip -------------------------------------------------------------------------------- /sampledata/AdventureWorksSchemas.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/AdventureWorksSchemas.xlsx -------------------------------------------------------------------------------- /sampledata/Currency_DEM.txt: -------------------------------------------------------------------------------- 1 | 0.527231507 DEM 7/1/2001 0:00 0.52698145 2 | 0.528429508 DEM 7/2/2001 0:00 0.528485361 3 | 0.532028091 DEM 7/3/2001 0:00 0.531858313 4 | 0.531603849 DEM 7/4/2001 0:00 0.531886602 5 | 0.526925914 DEM 7/5/2001 0:00 0.526759376 6 | 0.526121955 DEM 7/6/2001 0:00 0.526398905 7 | 0.526676147 DEM 7/7/2001 0:00 0.526592944 8 | 0.527287108 DEM 7/8/2001 0:00 0.527509627 9 | 0.527231507 DEM 7/9/2001 0:00 0.527120342 10 | 0.523094628 DEM 7/10/2001 0:00 0.523313622 11 | 0.521050438 DEM 7/11/2001 0:00 0.520996145 12 | 0.518349575 DEM 7/12/2001 0:00 0.518483953 13 | 0.51999376 DEM 7/13/2001 0:00 0.519858598 14 | 0.519076045 DEM 7/14/2001 0:00 0.519156889 15 | 0.516929439 DEM 7/15/2001 0:00 0.516715755 16 | 0.515649977 DEM 7/16/2001 0:00 0.515703161 17 | 0.516049128 DEM 7/17/2001 0:00 0.516022499 18 | 0.514323921 DEM 7/18/2001 0:00 0.514376832 19 | 0.512006554 DEM 7/19/2001 0:00 0.512006554 20 | 0.512321328 DEM 7/20/2001 0:00 0.512400082 21 | 0.512557663 DEM 7/21/2001 0:00 0.512531393 22 | 0.518914431 DEM 7/22/2001 0:00 0.519049102 23 | 0.52279381 DEM 7/23/2001 0:00 0.522602561 24 | 0.523971706 DEM 7/24/2001 0:00 0.524136485 25 | 0.522193211 DEM 7/25/2001 0:00 0.522111419 26 | 0.518914431 DEM 7/26/2001 0:00 0.519076045 27 | 0.51810787 DEM 7/27/2001 0:00 0.517839573 28 | 0.518000518 DEM 7/28/2001 0:00 0.518188413 29 | 0.517678729 DEM 7/29/2001 0:00 0.517625136 30 | 0.514376832 DEM 7/30/2001 0:00 0.514482688 31 | 0.513083633 DEM 7/31/2001 0:00 0.512978352 32 | 0.518672199 DEM 8/1/2001 0:00 0.518914431 33 | 0.52232959 DEM 8/2/2001 0:00 0.52227503 34 | 0.515118735 DEM 8/3/2001 0:00 0.515384219 35 | 0.515463918 DEM 8/4/2001 0:00 0.515384219 36 | 0.515490489 DEM 8/5/2001 0:00 0.515703161 37 | 0.516155673 DEM 8/6/2001 0:00 0.516129032 38 | 0.515410782 DEM 8/7/2001 0:00 0.515596803 39 | 0.516022499 DEM 8/8/2001 0:00 0.515809563 40 | 0.51810787 DEM 8/9/2001 0:00 0.518268982 41 | 0.518188413 DEM 8/10/2001 0:00 0.518134715 42 | 0.517678729 DEM 8/11/2001 0:00 0.517759138 43 | 0.517839573 DEM 8/12/2001 0:00 0.517625136 44 | 0.517839573 DEM 8/13/2001 0:00 0.517946859 45 | 0.514747516 DEM 8/14/2001 0:00 0.514535632 46 | 0.513373376 DEM 8/15/2001 0:00 0.513610683 47 | 0.513373376 DEM 8/16/2001 0:00 0.513162621 48 | 
0.514906544 DEM 8/17/2001 0:00 0.515145271 49 | 0.514112385 DEM 8/18/2001 0:00 0.513953847 50 | 0.514059528 DEM 8/19/2001 0:00 0.514085955 51 | 0.524356353 DEM 8/20/2001 0:00 0.524246396 52 | 0.526953681 DEM 8/21/2001 0:00 0.527120342 53 | 0.527565286 DEM 8/22/2001 0:00 0.527537455 54 | 0.527287108 DEM 8/23/2001 0:00 0.527537455 55 | 0.526094276 DEM 8/24/2001 0:00 0.525983589 56 | 0.526426616 DEM 8/25/2001 0:00 0.52645433 57 | 0.526121955 DEM 8/26/2001 0:00 0.526011257 58 | 0.524163958 DEM 8/27/2001 0:00 0.524383849 59 | 0.528262018 DEM 8/28/2001 0:00 0.528094635 60 | 0.526925914 DEM 8/29/2001 0:00 0.52698145 61 | 0.524246396 DEM 8/30/2001 0:00 0.524191435 62 | 0.518537724 DEM 8/31/2001 0:00 0.51877983 63 | 0.518242123 DEM 9/1/2001 0:00 0.518054188 64 | 0.518430193 DEM 9/2/2001 0:00 0.518591505 65 | 0.517063082 DEM 9/3/2001 0:00 0.516902719 66 | 0.517946859 DEM 9/4/2001 0:00 0.518054188 67 | 0.517571554 DEM 9/5/2001 0:00 0.517464424 68 | 0.520074891 DEM 9/6/2001 0:00 0.520074891 69 | 0.51583617 DEM 9/7/2001 0:00 0.515756357 70 | 0.51583617 DEM 9/8/2001 0:00 0.51607576 71 | 0.51583617 DEM 9/9/2001 0:00 0.515596803 72 | 0.514535632 DEM 9/10/2001 0:00 0.514588586 73 | 0.511849312 DEM 9/11/2001 0:00 0.511692166 74 | 0.512111435 DEM 9/12/2001 0:00 0.512295082 75 | 0.505101525 DEM 9/13/2001 0:00 0.505050505 76 | 0.499201278 DEM 9/14/2001 0:00 0.499400719 77 | 0.499525451 DEM 9/15/2001 0:00 0.49942566 78 | 0.499575361 DEM 9/16/2001 0:00 0.499575361 79 | 0.496031746 DEM 9/17/2001 0:00 0.495933347 80 | 0.49689441 DEM 9/18/2001 0:00 0.497067303 81 | 0.499001996 DEM 9/19/2001 0:00 0.498827755 82 | 0.506277845 DEM 9/20/2001 0:00 0.506380393 83 | 0.502310629 DEM 9/21/2001 0:00 0.502209723 84 | 0.503018109 DEM 9/22/2001 0:00 0.50311934 85 | 0.503043413 DEM 9/23/2001 0:00 0.502916918 86 | 0.501077316 DEM 9/24/2001 0:00 0.501177768 87 | 0.504235579 DEM 9/25/2001 0:00 0.504184733 88 | 0.507717303 DEM 9/26/2001 0:00 0.507872016 89 | 0.503372596 DEM 9/27/2001 0:00 0.503271263 90 | 0.505101525 DEM 9/28/2001 0:00 0.50530571 91 | 0.505101525 DEM 9/29/2001 0:00 0.505076014 92 | 0.504210155 DEM 9/30/2001 0:00 0.504362738 93 | 0.500876534 DEM 10/1/2001 0:00 0.500826363 94 | 0.5017058 DEM 10/2/2001 0:00 0.50193244 95 | 0.504133898 DEM 10/3/2001 0:00 0.503930659 96 | 0.504999495 DEM 10/4/2001 0:00 0.505229121 97 | 0.503676841 DEM 10/5/2001 0:00 0.503626108 98 | 0.503956055 DEM 10/6/2001 0:00 0.504108484 99 | 0.503448623 DEM 10/7/2001 0:00 0.503448623 100 | 0.504744599 DEM 10/8/2001 0:00 0.504999495 101 | 0.512846813 DEM 10/9/2001 0:00 0.512636489 102 | 0.512794216 DEM 10/10/2001 0:00 0.51289942 103 | 0.507614213 DEM 10/11/2001 0:00 0.507562684 104 | 0.498206457 DEM 10/12/2001 0:00 0.498256104 105 | 0.498206457 DEM 10/13/2001 0:00 0.49815682 106 | 0.498206457 DEM 10/14/2001 0:00 0.498380264 107 | 0.49689441 DEM 10/15/2001 0:00 0.496696965 108 | 0.493291239 DEM 10/16/2001 0:00 0.493388593 109 | 0.497363971 DEM 10/17/2001 0:00 0.497265042 110 | 0.493096647 DEM 10/18/2001 0:00 0.493242577 111 | 0.490484599 DEM 10/19/2001 0:00 0.490388388 112 | 0.491038547 DEM 10/20/2001 0:00 0.491255649 113 | 0.489883898 DEM 10/21/2001 0:00 0.489787922 114 | 0.490196078 DEM 10/22/2001 0:00 0.490436488 115 | 0.490388388 DEM 10/23/2001 0:00 0.490148025 116 | 0.491303921 DEM 10/24/2001 0:00 0.491303921 117 | 0.493997925 DEM 10/25/2001 0:00 0.493900331 118 | 0.4927322 DEM 10/26/2001 0:00 0.492877914 119 | 0.4927322 DEM 10/27/2001 0:00 0.492635105 120 | 0.492029128 DEM 10/28/2001 0:00 0.492174427 121 | 0.492926505 DEM 10/29/2001 0:00 
0.492780762 122 | 0.494682167 DEM 10/30/2001 0:00 0.494878013 123 | 0.494878013 DEM 10/31/2001 0:00 0.49478007 124 | 0.496647629 DEM 11/1/2001 0:00 0.496795668 125 | 0.496795668 DEM 11/2/2001 0:00 0.496647629 126 | 0.496795668 DEM 11/3/2001 0:00 0.496968492 127 | 0.497166153 DEM 11/4/2001 0:00 0.496993191 128 | 0.497537191 DEM 11/5/2001 0:00 0.497735304 129 | 0.493023714 DEM 11/6/2001 0:00 0.492877914 130 | 0.491038547 DEM 11/7/2001 0:00 0.491207388 131 | 0.496647629 DEM 11/8/2001 0:00 0.496425735 132 | 0.499525451 DEM 11/9/2001 0:00 0.499725151 133 | 0.499525451 DEM 11/10/2001 0:00 0.49942566 134 | 0.49942566 DEM 11/11/2001 0:00 0.499675211 135 | 0.493559054 DEM 11/12/2001 0:00 0.493388593 136 | 0.491448791 DEM 11/13/2001 0:00 0.491545419 137 | 0.485979492 DEM 11/14/2001 0:00 0.48588504 138 | 0.491352201 DEM 11/15/2001 0:00 0.491472944 139 | 0.48828125 DEM 11/16/2001 0:00 0.48828125 140 | 0.488376636 DEM 11/17/2001 0:00 0.488615264 141 | 0.48856752 DEM 11/18/2001 0:00 0.488424343 142 | 0.489524182 DEM 11/19/2001 0:00 0.489548147 143 | 0.491811341 DEM 11/20/2001 0:00 0.491617915 144 | 0.492635105 DEM 11/21/2001 0:00 0.492829333 145 | 0.489931899 DEM 11/22/2001 0:00 0.489691984 146 | 0.487923884 DEM 11/23/2001 0:00 0.48801913 147 | 0.487923884 DEM 11/24/2001 0:00 0.487828675 148 | 0.488376636 DEM 11/25/2001 0:00 0.488519785 149 | 0.491908112 DEM 11/26/2001 0:00 0.491714609 150 | 0.490292214 DEM 11/27/2001 0:00 0.490436488 151 | 0.489835905 DEM 11/28/2001 0:00 0.489739948 152 | 0.487971502 DEM 11/29/2001 0:00 0.488185901 153 | 0.491086775 DEM 11/30/2001 0:00 0.49106266 154 | 0.491352201 DEM 12/1/2001 0:00 0.491352201 155 | 0.4914971 DEM 12/2/2001 0:00 0.491352201 156 | 0.486926036 DEM 12/3/2001 0:00 0.487020893 157 | 0.484050535 DEM 12/4/2001 0:00 0.483863163 158 | 0.480954213 DEM 12/5/2001 0:00 0.48111619 159 | 0.479800403 DEM 12/6/2001 0:00 0.479616307 160 | 0.479662318 DEM 12/7/2001 0:00 0.479800403 161 | 0.480145964 DEM 12/8/2001 0:00 0.480030722 162 | 0.480676793 DEM 12/9/2001 0:00 0.480907954 163 | 0.479754366 DEM 12/10/2001 0:00 0.479547307 164 | 0.470809793 DEM 12/11/2001 0:00 0.470987189 165 | 0.472165825 DEM 12/12/2001 0:00 0.472121241 166 | 0.464921661 DEM 12/13/2001 0:00 0.465137913 167 | 0.46587468 DEM 12/14/2001 0:00 0.465679426 168 | 0.466135272 DEM 12/15/2001 0:00 0.46628742 169 | 0.465918092 DEM 12/16/2001 0:00 0.46587468 170 | 0.468120962 DEM 12/17/2001 0:00 0.468208634 171 | 0.464468184 DEM 12/18/2001 0:00 0.464317222 172 | 0.457561199 DEM 12/19/2001 0:00 0.457707799 173 | 0.455166136 DEM 12/20/2001 0:00 0.455021158 174 | 0.458947175 DEM 12/21/2001 0:00 0.459115743 175 | 0.458778731 DEM 12/22/2001 0:00 0.458715596 176 | 0.458589379 DEM 12/23/2001 0:00 0.458673516 177 | 0.45930553 DEM 12/24/2001 0:00 0.459178988 178 | 0.463864923 DEM 12/25/2001 0:00 0.464015591 179 | 0.463542391 DEM 12/26/2001 0:00 0.463499421 180 | 0.460808258 DEM 12/27/2001 0:00 0.46085073 181 | 0.46988065 DEM 12/28/2001 0:00 0.469792352 182 | 0.470079443 DEM 12/29/2001 0:00 0.470212066 183 | 0.469836497 DEM 12/30/2001 0:00 0.469836497 184 | 0.465831276 DEM 12/31/2001 0:00 0.465961512 185 | -------------------------------------------------------------------------------- /sampledata/Currency_FRF.txt: -------------------------------------------------------------------------------- 1 | 0.157205515 FRF 7/1/2001 0:00 0.157200572 2 | 0.157559715 FRF 7/2/2001 0:00 0.157567163 3 | 0.158626925 FRF 7/3/2001 0:00 0.158626925 4 | 0.158503725 FRF 7/4/2001 0:00 0.158528852 5 | 0.157099318 FRF 7/5/2001 0:00 0.157089447 
6 | 0.156872588 FRF 7/6/2001 0:00 0.156882432 7 | 0.157040108 FRF 7/7/2001 0:00 0.15703271 8 | 0.157220344 FRF 7/8/2001 0:00 0.157220344 9 | 0.157205515 FRF 7/9/2001 0:00 0.157200572 10 | 0.155969742 FRF 7/10/2001 0:00 0.155984339 11 | 0.155361526 FRF 7/11/2001 0:00 0.155361526 12 | 0.154554728 FRF 7/12/2001 0:00 0.154571451 13 | 0.155041163 FRF 7/13/2001 0:00 0.155026742 14 | 0.154765221 FRF 7/14/2001 0:00 0.15478199 15 | 0.154125952 FRF 7/15/2001 0:00 0.15411645 16 | 0.15374681 FRF 7/16/2001 0:00 0.153756266 17 | 0.153867459 FRF 7/17/2001 0:00 0.153848521 18 | 0.153350713 FRF 7/18/2001 0:00 0.153355417 19 | 0.152664763 FRF 7/19/2001 0:00 0.15264612 20 | 0.15275338 FRF 7/20/2001 0:00 0.152774383 21 | 0.152832755 FRF 7/21/2001 0:00 0.152825748 22 | 0.154719725 FRF 7/22/2001 0:00 0.154724513 23 | 0.155882215 FRF 7/23/2001 0:00 0.155874926 24 | 0.156230471 FRF 7/24/2001 0:00 0.156245117 25 | 0.15569776 FRF 7/25/2001 0:00 0.155690487 26 | 0.154719725 FRF 7/26/2001 0:00 0.154724513 27 | 0.154478327 FRF 7/27/2001 0:00 0.154473554 28 | 0.154444925 FRF 7/28/2001 0:00 0.154456852 29 | 0.154354336 FRF 7/29/2001 0:00 0.154342424 30 | 0.153364824 FRF 7/30/2001 0:00 0.153376586 31 | 0.152984732 FRF 7/31/2001 0:00 0.152980051 32 | 0.154645552 FRF 8/1/2001 0:00 0.154652727 33 | 0.155743833 FRF 8/2/2001 0:00 0.155736556 34 | 0.153595675 FRF 8/3/2001 0:00 0.15361455 35 | 0.153685375 FRF 8/4/2001 0:00 0.153675928 36 | 0.153701911 FRF 8/5/2001 0:00 0.153701911 37 | 0.153900611 FRF 8/6/2001 0:00 0.153891137 38 | 0.153668844 FRF 8/7/2001 0:00 0.153692461 39 | 0.153853255 FRF 8/8/2001 0:00 0.153829588 40 | 0.154478327 FRF 8/9/2001 0:00 0.154478327 41 | 0.154506968 FRF 8/10/2001 0:00 0.1544831 42 | 0.154354336 FRF 8/11/2001 0:00 0.154361484 43 | 0.154399617 FRF 8/12/2001 0:00 0.154378165 44 | 0.154399617 FRF 8/13/2001 0:00 0.154418691 45 | 0.153480163 FRF 8/14/2001 0:00 0.153468386 46 | 0.15307372 FRF 8/15/2001 0:00 0.153083094 47 | 0.15307372 FRF 8/16/2001 0:00 0.153064348 48 | 0.153515505 FRF 8/17/2001 0:00 0.153524932 49 | 0.153289595 FRF 8/18/2001 0:00 0.153287245 50 | 0.153273148 FRF 8/19/2001 0:00 0.153289595 51 | 0.156342829 FRF 8/20/2001 0:00 0.15633794 52 | 0.157106723 FRF 8/21/2001 0:00 0.15712894 53 | 0.157304431 FRF 8/22/2001 0:00 0.157287112 54 | 0.157220344 FRF 8/23/2001 0:00 0.157230232 55 | 0.156862745 FRF 8/24/2001 0:00 0.156835683 56 | 0.156961231 FRF 8/25/2001 0:00 0.156983407 57 | 0.156872588 FRF 8/26/2001 0:00 0.156862745 58 | 0.156281745 FRF 8/27/2001 0:00 0.156281745 59 | 0.157512562 FRF 8/28/2001 0:00 0.1575076 60 | 0.157106723 FRF 8/29/2001 0:00 0.157119065 61 | 0.156311059 FRF 8/30/2001 0:00 0.156306173 62 | 0.154609688 FRF 8/31/2001 0:00 0.154609688 63 | 0.154523681 FRF 9/1/2001 0:00 0.154509356 64 | 0.154578619 FRF 9/2/2001 0:00 0.154588177 65 | 0.154163969 FRF 9/3/2001 0:00 0.15413783 66 | 0.154435384 FRF 9/4/2001 0:00 0.15444254 67 | 0.154311462 FRF 9/5/2001 0:00 0.154311462 68 | 0.155072419 FRF 9/6/2001 0:00 0.155084443 69 | 0.153808293 FRF 9/7/2001 0:00 0.153801197 70 | 0.153808293 FRF 9/8/2001 0:00 0.153817757 71 | 0.153808293 FRF 9/9/2001 0:00 0.153808293 72 | 0.153414234 FRF 9/10/2001 0:00 0.153423649 73 | 0.152620494 FRF 9/11/2001 0:00 0.152613506 74 | 0.152692736 FRF 9/12/2001 0:00 0.152709059 75 | 0.150606946 FRF 9/13/2001 0:00 0.150581999 76 | 0.148840532 FRF 9/14/2001 0:00 0.148847179 77 | 0.148942508 FRF 9/15/2001 0:00 0.148924763 78 | 0.148946945 FRF 9/16/2001 0:00 0.148971353 79 | 0.147891802 FRF 9/17/2001 0:00 0.147874307 80 | 0.148148148 FRF 9/18/2001 0:00 
0.148165709 81 | 0.148787383 FRF 9/19/2001 0:00 0.148778528 82 | 0.150954789 FRF 9/20/2001 0:00 0.150968463 83 | 0.149763374 FRF 9/21/2001 0:00 0.149756645 84 | 0.149979003 FRF 9/22/2001 0:00 0.149981252 85 | 0.149988001 FRF 9/23/2001 0:00 0.149985751 86 | 0.149405367 FRF 9/24/2001 0:00 0.149405367 87 | 0.150346549 FRF 9/25/2001 0:00 0.150342028 88 | 0.151382119 FRF 9/26/2001 0:00 0.151405039 89 | 0.150080293 FRF 9/27/2001 0:00 0.15006678 90 | 0.150606946 FRF 9/28/2001 0:00 0.15061602 91 | 0.150606946 FRF 9/29/2001 0:00 0.150588802 92 | 0.150339768 FRF 9/30/2001 0:00 0.150355591 93 | 0.14933174 FRF 10/1/2001 0:00 0.149320591 94 | 0.149593107 FRF 10/2/2001 0:00 0.149602059 95 | 0.15031491 FRF 10/3/2001 0:00 0.15031265 96 | 0.150579732 FRF 10/4/2001 0:00 0.15059107 97 | 0.150177209 FRF 10/5/2001 0:00 0.150156914 98 | 0.150253929 FRF 10/6/2001 0:00 0.150269734 99 | 0.150114087 FRF 10/7/2001 0:00 0.15010958 100 | 0.150500414 FRF 10/8/2001 0:00 0.15051174 101 | 0.152914551 FRF 10/9/2001 0:00 0.152905199 102 | 0.152891172 FRF 10/10/2001 0:00 0.152914551 103 | 0.151352333 FRF 10/11/2001 0:00 0.151345461 104 | 0.148546473 FRF 10/12/2001 0:00 0.148557507 105 | 0.148546473 FRF 10/13/2001 0:00 0.148544266 106 | 0.148544266 FRF 10/14/2001 0:00 0.148550886 107 | 0.148150343 FRF 10/15/2001 0:00 0.148141564 108 | 0.147080453 FRF 10/16/2001 0:00 0.147095597 109 | 0.148290947 FRF 10/17/2001 0:00 0.148277754 110 | 0.147019906 FRF 10/18/2001 0:00 0.1470372 111 | 0.146239452 FRF 10/19/2001 0:00 0.146235175 112 | 0.146412884 FRF 10/20/2001 0:00 0.146415028 113 | 0.146064298 FRF 10/21/2001 0:00 0.146060031 114 | 0.146160367 FRF 10/22/2001 0:00 0.146179596 115 | 0.146209518 FRF 10/23/2001 0:00 0.146192418 116 | 0.146481514 FRF 10/24/2001 0:00 0.146492243 117 | 0.147292759 FRF 10/25/2001 0:00 0.147284082 118 | 0.14691407 FRF 10/26/2001 0:00 0.14692918 119 | 0.14691407 FRF 10/27/2001 0:00 0.146896805 120 | 0.146707158 FRF 10/28/2001 0:00 0.14671792 121 | 0.146970209 FRF 10/29/2001 0:00 0.14696157 122 | 0.14749045 FRF 10/30/2001 0:00 0.147501328 123 | 0.147557916 FRF 10/31/2001 0:00 0.1475405 124 | 0.148075756 FRF 11/1/2001 0:00 0.148080141 125 | 0.148121815 FRF 11/2/2001 0:00 0.148099879 126 | 0.148121815 FRF 11/3/2001 0:00 0.148121815 127 | 0.148240387 FRF 11/4/2001 0:00 0.148225006 128 | 0.148348143 FRF 11/5/2001 0:00 0.148359148 129 | 0.147002617 FRF 11/6/2001 0:00 0.146998295 130 | 0.146412884 FRF 11/7/2001 0:00 0.14643218 131 | 0.148080141 FRF 11/8/2001 0:00 0.148064793 132 | 0.148942508 FRF 11/9/2001 0:00 0.148962476 133 | 0.148942508 FRF 11/10/2001 0:00 0.148933635 134 | 0.148913675 FRF 11/11/2001 0:00 0.148931417 135 | 0.147154041 FRF 11/12/2001 0:00 0.14714971 136 | 0.146535176 FRF 11/13/2001 0:00 0.146545913 137 | 0.144893938 FRF 11/14/2001 0:00 0.14488554 138 | 0.146500828 FRF 11/15/2001 0:00 0.14650512 139 | 0.145579479 FRF 11/16/2001 0:00 0.145575241 140 | 0.145619758 FRF 11/17/2001 0:00 0.145638845 141 | 0.145674912 FRF 11/18/2001 0:00 0.145666424 142 | 0.14594918 FRF 11/19/2001 0:00 0.145957701 143 | 0.146642617 FRF 11/20/2001 0:00 0.146623266 144 | 0.146886016 FRF 11/21/2001 0:00 0.146905437 145 | 0.146081367 FRF 11/22/2001 0:00 0.146070698 146 | 0.145482055 FRF 11/23/2001 0:00 0.145494755 147 | 0.145482055 FRF 11/24/2001 0:00 0.145477822 148 | 0.145609156 FRF 11/25/2001 0:00 0.145630361 149 | 0.146672729 FRF 11/26/2001 0:00 0.146657672 150 | 0.14618387 FRF 11/27/2001 0:00 0.146190281 151 | 0.146045099 FRF 11/28/2001 0:00 0.146023773 152 | 0.145494755 FRF 11/29/2001 0:00 0.145511692 153 | 
0.146423603 FRF 11/30/2001 0:00 0.146412884 154 | 0.146502974 FRF 12/1/2001 0:00 0.146509413 155 | 0.146550208 FRF 12/2/2001 0:00 0.146535176 156 | 0.145180023 FRF 12/3/2001 0:00 0.145182131 157 | 0.144327219 FRF 12/4/2001 0:00 0.14432097 158 | 0.143402071 FRF 12/5/2001 0:00 0.14341441 159 | 0.14305333 FRF 12/6/2001 0:00 0.143043099 160 | 0.143020595 FRF 12/7/2001 0:00 0.143036961 161 | 0.143161873 FRF 12/8/2001 0:00 0.143145479 162 | 0.143319861 FRF 12/9/2001 0:00 0.14332397 163 | 0.143039007 FRF 12/10/2001 0:00 0.14302264 164 | 0.140368608 FRF 12/11/2001 0:00 0.140374519 165 | 0.140783602 FRF 12/12/2001 0:00 0.140779638 166 | 0.138625116 FRF 12/13/2001 0:00 0.138636647 167 | 0.138902393 FRF 12/14/2001 0:00 0.138888889 168 | 0.13898927 FRF 12/15/2001 0:00 0.138991202 169 | 0.138912041 FRF 12/16/2001 0:00 0.138894676 170 | 0.139575133 FRF 12/17/2001 0:00 0.13959072 171 | 0.138484974 FRF 12/18/2001 0:00 0.138477304 172 | 0.136425648 FRF 12/19/2001 0:00 0.136442401 173 | 0.135712832 FRF 12/20/2001 0:00 0.135705465 174 | 0.136838216 FRF 12/21/2001 0:00 0.136838216 175 | 0.136795163 FRF 12/22/2001 0:00 0.136787678 176 | 0.136731569 FRF 12/23/2001 0:00 0.136739047 177 | 0.136943155 FRF 12/24/2001 0:00 0.136937529 178 | 0.13830876 FRF 12/25/2001 0:00 0.138322152 179 | 0.13821509 FRF 12/26/2001 0:00 0.138207449 180 | 0.137396609 FRF 12/27/2001 0:00 0.137398497 181 | 0.140103116 FRF 12/28/2001 0:00 0.140089377 182 | 0.140163992 FRF 12/29/2001 0:00 0.140177745 183 | 0.140087415 FRF 12/30/2001 0:00 0.140067793 184 | 0.138894676 FRF 12/31/2001 0:00 0.138894676 185 | -------------------------------------------------------------------------------- /sampledata/DeltaPipeline.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/DeltaPipeline.zip -------------------------------------------------------------------------------- /sampledata/Distinct Rows All.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Distinct Rows All.zip -------------------------------------------------------------------------------- /sampledata/DynaColsPipe.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/DynaColsPipe.zip -------------------------------------------------------------------------------- /sampledata/Flatten Orders.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Flatten Orders.zip -------------------------------------------------------------------------------- /sampledata/Generic SCD Type1.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Generic SCD Type1.zip -------------------------------------------------------------------------------- /sampledata/Kromer-ADF-Mapping-Data-Flows-ReadMe.docx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Kromer-ADF-Mapping-Data-Flows-ReadMe.docx -------------------------------------------------------------------------------- /sampledata/Load Multiple Tables.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Load Multiple Tables.zip -------------------------------------------------------------------------------- /sampledata/MovieAnalytics.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/MovieAnalytics.zip -------------------------------------------------------------------------------- /sampledata/Moving Average.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Moving Average.zip -------------------------------------------------------------------------------- /sampledata/Partition Output by Size.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/Partition Output by Size.zip -------------------------------------------------------------------------------- /sampledata/Products.csv: -------------------------------------------------------------------------------- 1 | Name,ProductNumber,Color,StandardCost,ListPrice,Size 2 | "LL Road Frame - Black, 48",FR-R38B-48,Red,204.6251,337.22,M 3 | My Bike,MY-BIKE-01,Blue,10,25,L 4 | -------------------------------------------------------------------------------- /sampledata/SCD Pipeline2.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/SCD Pipeline2.zip -------------------------------------------------------------------------------- /sampledata/SQL Orders to CosmosDB.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/SQL Orders to CosmosDB.zip -------------------------------------------------------------------------------- /sampledata/SearchLog.tsv: -------------------------------------------------------------------------------- 1 | 399266 2/15/2012 11:53:16 AM en-us how to make nachos 73 www.nachos.com;www.wikipedia.com NULL 2 | 382045 2/15/2012 11:53:18 AM en-gb best ski resorts 614 skiresorts.com;ski-europe.com;www.travelersdigest.com/ski_resorts.htm ski-europe.com;www.travelersdigest.com/ski_resorts.htm 3 | 382045 2/16/2012 11:53:20 AM en-gb broken leg 74 mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture mayoclinic.com/health;webmd.com/a-to-z-guides;mybrokenleg.com;wikipedia.com/Bone_fracture 4 | 106479 2/16/2012 11:53:50 AM en-ca south park episodes 24 southparkstudios.com;wikipedia.org/wiki/Sout_Park;imdb.com/title/tt0121955;simon.com/mall southparkstudios.com 5 | 906441 2/16/2012 11:54:01 AM en-us cosmos 1213 cosmos.com;wikipedia.org/wiki/Cosmos:_A_Personal_Voyage;hulu.com/cosmos NULL 6 | 351530 2/16/2012 11:54:01 AM en-fr microsoft 241 
microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 7 | 640806 2/16/2012 11:54:02 AM en-us wireless headphones 502 www.amazon.com;reviews.cnet.com/wireless-headphones;store.apple.com www.amazon.com;store.apple.com 8 | 304305 2/16/2012 11:54:03 AM en-us dominos pizza 60 dominos.com;wikipedia.org/wiki/Domino's_Pizza;facebook.com/dominos dominos.com 9 | 460748 2/16/2012 11:54:04 AM en-us yelp 1270 yelp.com;apple.com/us/app/yelp;wikipedia.org/wiki/Yelp,_Inc.;facebook.com/yelp yelp.com 10 | 354841 2/16/2012 11:59:01 AM en-us how to run 610 running.about.com;ehow.com;go.com running.about.com;ehow.com 11 | 354068 2/16/2012 12:00:33 PM en-mx what is sql 422 wikipedia.org/wiki/SQL;sqlcourse.com/intro.html;wikipedia.org/wiki/Microsoft_SQL wikipedia.org/wiki/SQL 12 | 674364 2/16/2012 12:00:55 PM en-us mexican food redmond 283 eltoreador.com;yelp.com/c/redmond-wa/mexican;agaverest.com NULL 13 | 347413 2/16/2012 12:11:55 PM en-gr microsoft 305 microsoft.com;wikipedia.org/wiki/Microsoft;xbox.com NULL 14 | 848434 2/16/2012 12:12:35 PM en-ch facebook 10 facebook.com;facebook.com/login;wikipedia.org/wiki/Facebook facebook.com 15 | 604846 2/16/2012 12:13:55 PM en-us wikipedia 612 wikipedia.org;en.wikipedia.org;en.wikipedia.org/wiki/Wikipedia wikipedia.org 16 | 840614 2/16/2012 12:13:56 PM en-us xbox 1220 xbox.com;en.wikipedia.org/wiki/Xbox;xbox.com/xbox360 xbox.com/xbox360 17 | 656666 2/16/2012 12:15:55 PM en-us hotmail 691 hotmail.com;login.live.com;msn.com;en.wikipedia.org/wiki/Hotmail NULL 18 | 951513 2/16/2012 12:17:00 PM en-us pokemon 63 pokemon.com;pokemon.com/us;serebii.net pokemon.com 19 | 350350 2/16/2012 12:18:17 PM en-us wolfram 30 wolframalpha.com;wolfram.com;mathworld.wolfram.com;en.wikipedia.org/wiki/Stephen_Wolfram NULL 20 | 641615 2/16/2012 12:19:55 PM en-us kahn 119 khanacademy.org;en.wikipedia.org/wiki/Khan_(title);answers.com/topic/genghis-khan;en.wikipedia.org/wiki/Khan_(name) khanacademy.org 21 | 321065 2/16/2012 12:20:03 PM en-us clothes 732 gap.com;overstock.com;forever21.com;footballfanatics.com/college_washington_state_cougars footballfanatics.com/college_washington_state_cougars 22 | 651777 2/16/2012 12:20:33 PM en-us food recipes 183 allrecipes.com;foodnetwork.com;simplyrecipes.com foodnetwork.com 23 | 666352 2/16/2012 12:21:03 PM en-us weight loss 630 en.wikipedia.org/wiki/Weight_loss;webmd.com/diet;exercise.about.com webmd.com/diet 24 | -------------------------------------------------------------------------------- /sampledata/loans.csv: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/loans.csv -------------------------------------------------------------------------------- /sampledata/metadatapipeline.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/metadatapipeline.zip -------------------------------------------------------------------------------- /sampledata/names/readme.md: -------------------------------------------------------------------------------- 1 | samples with names 2 | -------------------------------------------------------------------------------- /sampledata/names2.csv: -------------------------------------------------------------------------------- 1 | acctnum,fullname,phone,zip 2 | 1,mike adams,5551212,98900 3 | 2,mike addams,5551212,98900 4 | 3,mick 
adams,4441313,12345 5 | 4,mike addams,4441000,54321 6 | 5,joe schmo,5551212,88990 7 | 6,joey schmo,5551212,88990 8 | 7,joe shmoe,5551212,88990 9 | 8,sally jones,3331010,55442 10 | 9,sallie jones,3331010,55442 11 | 10,sally joness,2223030,55443 12 | 11,mark kromer,6104845151,19888 13 | 12,mark cromer,6104845151,19888 14 | 13,marC cromer,5161234567,34443 15 | 14,Anita Thomas,4545454231,10101 -------------------------------------------------------------------------------- /sampledata/powerquery/employeedemo/EmployeeInfo.csv: -------------------------------------------------------------------------------- 1 | EmployeeId,FirstName,LastName,City,ZIP,Email,State,DateOfJoining,hiphens,mix,defaultsimpleformat 2 | 1," ""Harry"""," ""Potter"""," ""Bellevue"""," ""98004"""," ""harryk@fabrikam.com"""," ""WA""",1/20/2008,1-20-2008,1-20-2008,2021-1-6 3 | 2," ""Hermione"""," ""Granger"""," ""Las Vegas"""," ""19801"""," ""hermione@fabrikam.com"""," ""DE""",11/21/2008,1-20-2008,1/20/2008,2021-1-7 4 | 3," ""Lord"""," ""Voldemort"""," ""Billings"""," ""59115"""," ""lordc@fabrikam.com"""," ""MT""",9/22/2008,1-20-2008,1-21-2008 5 | 4," ""Albus"""," ""Dumbledore"""," ""Newyork"""," ""12345"""," ""albusd@fabrikam.com"""," ""NY""",8/23/2008,1-20-2008,1-22-2008,2021-1-6 6 | 5," ""Severus"""," ""Snape"""," ""Columbus"""," ""56789"""," ""severus@fabrikam.com"""," ""OH""",1/24/2008,1-20-2008,1/20/2008,2021-1-2 7 | 6," ""Draco"""," ""Malfoy"""," ""Bellevue"""," ""91019"""," ""dracoh@fabrikam.com"""," ""TX""",1/25/2008,1-20-2008,1-20-2008,2021-1-5 8 | 7," ""Dobby"""," ""Elf"""," ""Salt Lake City""",," ""dobbyz@fabrikam.com"""," ""UT""",1/26/2008,1-20-2008,1-20-2008 9 | 8," ""Ron"""," ""Weasley"""," ""Las Vegas"""," ""51527"""," ""ronag@fabrikam.com"""," ""NV""",1/27/2008,1-20-2008,1/20/2008 10 | 9," ""Sirius"""," ""Black"""," ""Newyork"""," ""61623"""," ""hcblack@fabrikam.com"""," ""RI""",1/28/2008,1-20-2008,1/21/2008 11 | 10," ""Luna"""," ""Lovegood"""," ""Kansas City"""," ""68692"""," ""lunal@fabrikam.com"""," ""MO""",1/29/2008,1-20-2008,1-20-2008 12 | 11," ""Rubeus"""," ""Hagrid"""," ""Boston"""," ""98052"""," ""gamalfoyl@fabrikam.com"""," ""ID""",1/30/2008,1-20-2008,1-20-2008 13 | 12," ""Bellatrix"""," ""Lestrange"""," ""Bellevue"""," ""78965"""," ""mlestrange@fabrikam.com"""," ""CA""",1/31/2008,1-20-2008 14 | 13," ""Ginny"""," ""Weasley"""," ""Redmond"""," ""98052"""," ""ginnywfabrikam.com"""," ""WA""",2/1/2008,1-20-2008,1-20-2008 15 | 14," ""Neville"""," ""Longbottom"""," ""Las Vegas"""," ""98053"""," ""nevillea@fabrikam.com"""," ""WA""",2/2/2008,1-20-2008,1-20-2008 16 | 15," ""Alastor"""," ""Moody"""," ""Newyork"""," ""98054"""," ""albusd@fabrikamcom"""," ""WA""",2/3/2008,1-20-2008,1-20-2008 17 | 16," ""Lucius"""," ""Malfoy"""," ""Bellevue""",," ""luciusmalfoy@fabrikam.com"""," ""WA""",2/4/2008,1-20-2008,1-20-2008 18 | 17," ""Cedric"""," ""Diggory"""," ""Seattle"""," ""98989"""," ""cedricp@fabrikam.com"""," ""WA""",2/5/2008,1-20-2008,1-20-2008 19 | 18," ""Argus"""," ""Filch"""," ""Salt Lake City"""," ""11128"""," ""argusm@fabrikam.com"""," ""UT""",2/6/2008,1-20-2008,1-20-2008 20 | 19," ""Vernon"""," ""Dursley"""," ""Newyork"""," ""87654"""," ""gamalfoyl@fabrikam.com"""," ""OR""",2/17/2008,1-20-2008,1-20-2008 21 | 20,Paras,Kumar," ""Newyork""",98021,parask@fabrikam.com,WA,9/29/2006,1-20-2008,1-20-2008 22 | 21,Gaurav,Malhotra," ""Las Vegas""",,,,12/21/2019,1-20-2008,1-20-2008 23 | -------------------------------------------------------------------------------- /sampledata/powerquery/employeedemo/EmployeeSalary.csv: 
-------------------------------------------------------------------------------- 1 | EmployeeId,BasePay 2 | 1,90000 3 | 2,100000 4 | 3,110000 5 | 4,120000 6 | 5,130000 7 | 6,140000 8 | 7,150000 9 | 8,160000 10 | 9,170000 11 | 10,180000 12 | 11,190000 13 | 12,200000 14 | 13,210000 15 | 14,220000 16 | 15,230000 17 | 16,240000 18 | 17,250000 19 | 18,260000 20 | 19,270000 21 | 20,30000 22 | 21,10000 23 | -------------------------------------------------------------------------------- /sampledata/readme: -------------------------------------------------------------------------------- 1 | If you are here from the ADF Mapping Data Flows book, please download this Word doc here for instructions on accessing the samples: https://github.com/kromerm/adfdataflowdocs/blob/master/sampledata/Kromer-ADF-Mapping-Data-Flows-ReadMe.docx 2 | -------------------------------------------------------------------------------- /sampledata/scdtype2/Employee1.csv: -------------------------------------------------------------------------------- 1 | EmpID,Region,Status,Function,Level,Role,StartDate,EndDate 2 | 1234,SER,A,ADM,A,Finance,1/1/2000, 3 | 1345,SER,A,ADM,A,Finance,4/5/2008, 4 | 1789,PNW,A,ENG,N,Engineer,7/9/2011, 5 | 2349,PNW,I,ENG,N,Engineer,9/8/1999,4/1/2019 6 | 8382,NER,A,RAD,A,Marketing,4/5/1998, 7 | -------------------------------------------------------------------------------- /sampledata/scdtype2/employee2.csv: -------------------------------------------------------------------------------- 1 | EmpID,Region,Status,Function,Level,Role,StartDate,EndDate 2 | 1234,SER,A,ADM,A,Finance,1/1/2000, 3 | 1345,SER,A,ADM,A,Finance,4/5/2008, 4 | 1789,PNW,A,ENG,N,Engineer,7/9/2011, 5 | 2349,PNW,I,ENG,N,Engineer,9/8/1999,4/1/2019 6 | 8382,NER,I,RAD,A,Marketing,4/5/1998,4/9/2019 7 | 7384,MDW,A,SAL,B,Sales,6/5/2010, 8 | 7355,MDW,A,SAL,B,Sales,1/2/2013, 9 | -------------------------------------------------------------------------------- /sampledata/scdtype2/readme.md: -------------------------------------------------------------------------------- 1 | sample data for SCD Type 2 2 | -------------------------------------------------------------------------------- /sampledata/sinkIfMoreThanNRows.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/sinkIfMoreThanNRows.zip -------------------------------------------------------------------------------- /sampledata/sinkIfMoreThanNRows2.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/sinkIfMoreThanNRows2.zip -------------------------------------------------------------------------------- /sampledata/small_radio_json.json: -------------------------------------------------------------------------------- 1 | {"ts":1409318650332,"userId":"309","sessionId":1879,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":2,"location":"Killeen-Temple, TX","lastName":"Montgomery","firstName":"Annalyse","registration":1384448062332,"gender":"F","artist":"El Arrebato","song":"Quiero Quererte Querer","length":234.57914} 2 | {"ts":1409318653332,"userId":"11","sessionId":10,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":9,"location":"Anchorage, 
AK","lastName":"Thomas","firstName":"Dylann","registration":1400723739332,"gender":"M","artist":"Creedence Clearwater Revival","song":"Born To Move","length":340.87138} 3 | {"ts":1409318685332,"userId":"201","sessionId":2047,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":11,"location":"New York-Newark-Jersey City, NY-NJ-PA","lastName":"Watts","firstName":"Liam","registration":1406279422332,"gender":"M","artist":"Gorillaz","song":"DARE","length":246.17751} 4 | {"ts":1409318686332,"userId":"779","sessionId":2136,"page":"Home","auth":"Logged In","method":"GET","status":200,"level":"free","itemInSession":0,"location":"Nashville-Davidson--Murfreesboro--Franklin, TN","lastName":"Townsend","firstName":"Tess","registration":1406970190332,"gender":"F"} 5 | {"ts":1409318697332,"userId":"401","sessionId":400,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":2,"location":"Atlanta-Sandy Springs-Roswell, GA","lastName":"Smith","firstName":"Margaux","registration":1406191211332,"gender":"F","artist":"Otis Redding","song":"Send Me Some Lovin'","length":135.57506} 6 | {"ts":1409318714332,"userId":"521","sessionId":520,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":39,"location":"Chicago-Naperville-Elgin, IL-IN-WI","lastName":"Morse","firstName":"Alan","registration":1401760632332,"gender":"M","artist":"Slightly Stoopid","song":"Mellow Mood","length":198.53016} 7 | {"ts":1409318743332,"userId":"244","sessionId":2261,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":1,"location":"San Jose-Sunnyvale-Santa Clara, CA","lastName":"Shelton","firstName":"Gabriella","registration":1389460542332,"gender":"F","artist":"NOFX","song":"Linoleum","length":130.2722} 8 | {"ts":1409318804332,"userId":"969","sessionId":968,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":0,"location":"Detroit-Warren-Dearborn, MI","lastName":"Williams","firstName":"Elijah","registration":1388691347332,"gender":"M","artist":"Nirvana","song":"The Man Who Sold The World","length":260.98893} 9 | {"ts":1409318832332,"userId":"401","sessionId":400,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":3,"location":"Atlanta-Sandy Springs-Roswell, GA","lastName":"Smith","firstName":"Margaux","registration":1406191211332,"gender":"F","artist":"Aventura","song":"La Nina","length":293.56363} 10 | {"ts":1409318891332,"userId":"779","sessionId":2136,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":1,"location":"Nashville-Davidson--Murfreesboro--Franklin, TN","lastName":"Townsend","firstName":"Tess","registration":1406970190332,"gender":"F","artist":"Harmonia","song":"Sehr kosmisch","length":655.77751} 11 | {"ts":1409318912332,"userId":"521","sessionId":520,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":40,"location":"Chicago-Naperville-Elgin, IL-IN-WI","lastName":"Morse","firstName":"Alan","registration":1401760632332,"gender":"M","artist":"Spragga Benz","song":"Backshot","length":122.53995} 12 | {"ts":1409318931332,"userId":"201","sessionId":2047,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":12,"location":"New York-Newark-Jersey City, 
NY-NJ-PA","lastName":"Watts","firstName":"Liam","registration":1406279422332,"gender":"M","artist":"Bananarama","song":"Love In The First Degree","length":208.92689} 13 | {"ts":1409318931332,"userId":"201","sessionId":2047,"page":"Home","auth":"Logged In","method":"GET","status":200,"level":"paid","itemInSession":13,"location":"New York-Newark-Jersey City, NY-NJ-PA","lastName":"Watts","firstName":"Liam","registration":1406279422332,"gender":"M"} 14 | {"ts":1409318993332,"userId":"11","sessionId":10,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":10,"location":"Anchorage, AK","lastName":"Thomas","firstName":"Dylann","registration":1400723739332,"gender":"M","artist":"Alliance Ethnik","song":"Représente","length":252.21179} 15 | {"ts":1409319034332,"userId":"521","sessionId":520,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":41,"location":"Chicago-Naperville-Elgin, IL-IN-WI","lastName":"Morse","firstName":"Alan","registration":1401760632332,"gender":"M","artist":"Sense Field","song":"Am I A Fool","length":181.86404} 16 | {"ts":1409319064332,"userId":"969","sessionId":968,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":1,"location":"Detroit-Warren-Dearborn, MI","lastName":"Williams","firstName":"Elijah","registration":1388691347332,"gender":"M","artist":"Binary Star","song":"Solar Powered","length":268.93016} 17 | {"ts":1409319125332,"userId":"401","sessionId":400,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":4,"location":"Atlanta-Sandy Springs-Roswell, GA","lastName":"Smith","firstName":"Margaux","registration":1406191211332,"gender":"F","artist":"Sarah Borges and the Broken Singles","song":"Do It For Free","length":158.95465} 18 | {"ts":1409319215332,"userId":"521","sessionId":520,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":42,"location":"Chicago-Naperville-Elgin, IL-IN-WI","lastName":"Morse","firstName":"Alan","registration":1401760632332,"gender":"M","artist":"Incubus","song":"Drive","length":232.46322} 19 | {"ts":1409319245332,"userId":"11","sessionId":10,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":11,"location":"Anchorage, AK","lastName":"Thomas","firstName":"Dylann","registration":1400723739332,"gender":"M","artist":"Ella Fitzgerald","song":"On Green Dolphin Street (Medley) (1999 Digital Remaster)","length":427.15383} 20 | {"ts":1409319283332,"userId":"401","sessionId":400,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":5,"location":"Atlanta-Sandy Springs-Roswell, GA","lastName":"Smith","firstName":"Margaux","registration":1406191211332,"gender":"F","artist":"10cc","song":"Silly Love","length":241.34485} 21 | {"ts":1409319293332,"userId":"906","sessionId":1909,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":0,"location":"Toledo, OH","lastName":"Oconnell","firstName":"Aurora","registration":1406406461332,"gender":"F","artist":"Eric Johnson","song":"Trail Of Tears (Album Version)","length":361.37751} 22 | {"ts":1409319332332,"userId":"969","sessionId":968,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":2,"location":"Detroit-Warren-Dearborn, 
MI","lastName":"Williams","firstName":"Elijah","registration":1388691347332,"gender":"M","artist":"Phoenix","song":"Holdin' On Together","length":207.15057} 23 | {"ts":1409319365332,"userId":"750","sessionId":749,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"free","itemInSession":0,"location":"Grants Pass, OR","lastName":"Coleman","firstName":"Alex","registration":1404326435332,"gender":"M","artist":"Ween","song":"The Stallion","length":276.13995} 24 | {"ts":1409319447332,"userId":"521","sessionId":520,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":43,"location":"Chicago-Naperville-Elgin, IL-IN-WI","lastName":"Morse","firstName":"Alan","registration":1401760632332,"gender":"M","artist":"dEUS","song":"Secret Hell","length":299.83302} 25 | {"ts":1409319539332,"userId":"969","sessionId":968,"page":"NextSong","auth":"Logged In","method":"PUT","status":200,"level":"paid","itemInSession":3,"location":"Detroit-Warren-Dearborn, MI","lastName":"Williams","firstName":"Elijah","registration":1388691347332,"gender":"M","artist":"Holly Cole","song":"Cry (If You Want To)","length":158.98077} 26 | -------------------------------------------------------------------------------- /sampledata/summaryStats2.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/summaryStats2.zip -------------------------------------------------------------------------------- /sampledata/synapse-dataflows-tutorial-001.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/synapse-dataflows-tutorial-001.docx -------------------------------------------------------------------------------- /sampledata/synapse-dataflows-tutorial-004.docx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/kromerm/adfdataflowdocs/a27cfb3e20135679e9840100e004ff33fbd5d1d7/sampledata/synapse-dataflows-tutorial-004.docx -------------------------------------------------------------------------------- /videos/readme.md: -------------------------------------------------------------------------------- 1 | Start Here: 2 | * [ADF Data Flows: Getting started](http://youtu.be/MFw7t_8tuV4) 3 | * [Getting started with ADF Data Flows from Altius](https://www.youtube.com/watch?v=CQ1JfgZKH9s&t=981s) 4 | * [Getting started with Mapping Data Flows by Adam from Azure 4 Everyone](https://www.youtube.com/watch?v=AUpMCRggjIM) 5 | 6 | Debug and Prep: 7 | * [ADF Data Flow: Debug Session, Pt 1](https://www.youtube.com/watch?v=k0YHmJc14FM) 8 | * [ADF Data Flow: Debug Session, Pt 2 Data Prep](https://www.youtube.com/watch?v=6ezKRDgK3rE) 9 | * [ADF Data Flow: Debug and Test Lifecycle](https://youtu.be/fktIWdJiqTk) 10 | * [Mapping and Wrangling: Data Exploration](http://youtu.be/En1ztyh5GaA) 11 | * [Debug and testing End-to-End in Mapping Data Flows](http://youtu.be/3ANxyvDGfjA) 12 | * [Data Masking for Sensitive Data](https://www.youtube.com/watch?v=OFd4LeiTmfs) 13 | * [Benchmark Timings](http://youtu.be/6CSbWm4lRhw?hd=1) 14 | * [Dynamically optimize data flow cluster size at runtime](https://www.youtube.com/watch?v=jWSkJdtiJNM) 15 | 16 | Transformations: 17 | * [ADF Data Flows: Joins](https://www.youtube.com/watch?v=zukwayEXRtg) 18 | * [ADF Data Flows: Pivot 
Transformation](https://www.youtube.com/watch?v=Tua14ZQA3F8&t=34s) 19 | * [ADF Data Flows: Window Transformation](https://www.youtube.com/watch?v=m6zgbtY5AYQ) 20 | * [ADF Data Flows: Alter Row Transformation](https://www.youtube.com/watch?v=4ktoohwptmQ) 21 | * [ADF Data Flows: Surrogate Key Transformation](https://www.youtube.com/watch?v=ISpegL9CbTM) 22 | * [ADF Data Flows: Lookup Transformation](https://www.youtube.com/watch?v=9U-0VPU2ZPU) 23 | * [ADF Data Flows: Lookup Transformation Updates & Tips](https://youtu.be/MBskWoeuTLw) 24 | * [ADF Data Flows: Aggregate Transformation](http://youtu.be/jdL75xIr98I) 25 | * [ADF Data Flows: Derived Column Transformation](https://www.youtube.com/watch?v=FFCbU4ujCiY) 26 | * [ADF Data Flows: Union Transformation](http://youtu.be/_Et6mg1tEr8?hd=1) 27 | * [ADF Data Flows: Unpivot Transformation](http://youtu.be/KFYkxcpB8b0?hd=1) 28 | * [ADF Data Flows: Exists Transformation](http://youtu.be/GS8JVgNBMfs?hd=1) 29 | * [ADF Data Flows: Filter Transformation](https://youtu.be/OhbKDOXSfeE) 30 | * [ADF Data Flows: Conditional Split Transformation](http://youtu.be/W1lQHojhKZw?hd=1) 31 | * [ADF Data Flows: Select and New Branch](http://youtu.be/F9VjQ_YyRyU?hd=1) 32 | * [ADF Data Flows: Select transformation: Large Datasets](https://www.youtube.com/watch?v=R5ea2_R0ouc) 33 | * [ADF Data Flows: Flatten transformation](https://youtu.be/VY2tFQJoAXE) 34 | * [ADF Data Flows: Flowlet](https://www.youtube.com/watch?v=bVdeBFiiJNQ) 35 | * [ADF Data Flows: Stringify transformation](https://www.youtube.com/watch?v=1X4sRHf5W2U) 36 | * [ADF Data Flows: External Call transformation](https://www.youtube.com/watch?v=dIMfbwX8r0A) 37 | * [ADF Data Flows: Assert transformation](https://www.youtube.com/watch?v=8K7flL7JWMo) 38 | * [ADF Data Flow Sources: CosmosDB](http://youtu.be/plp1etT2ftY?hd=1) 39 | * [Fuzzy Lookups](http://youtu.be/7gdwExjHBbw) 40 | * [Quick Transformations](https://www.youtube.com/watch?v=CP0TnNmaLA0) 41 | * [Drifted Columns using Pivot](https://youtu.be/5MygzCX0wnM) 42 | * [Source & Sink Transformations: Partitioned Files](https://www.youtube.com/watch?v=7Q-db4Qgc4M) 43 | * [ADF Data Flows: New Datasets Parquet and Delimited Text](https://youtu.be/V_2a60j2Kjo) 44 | * [JSON Transformations](https://www.youtube.com/watch?v=yY5aB7Kdhjg) 45 | * [Infer Data Types](https://www.youtube.com/watch?v=nJjRzlFktlA) 46 | * [Dynamic Joins and Lookups](https://youtu.be/CMOPPie9bXM) 47 | * [Transform Hierarchical Data](https://youtu.be/oAEh21NFgWQ) 48 | * [Rank transformation](https://youtu.be/6XvgkbMtws0) 49 | * [Row context via Window transformation](http://youtu.be/jqt1gmX2XUg) 50 | * [Parse transformation](https://www.youtube.com/watch?v=r7O7AJcuqoY) 51 | * [Transform complex data types](https://youtu.be/Wk0C76wnSDE) 52 | * [Output to next activity](http://youtu.be/r1m3Ya14qpE?hd=1) 53 | 54 | Optimize: 55 | * [ADF Data Flow: Monitoring UX](https://www.youtube.com/watch?v=AYkwX6J9sII&t=4s) 56 | * [ADF Data Flow: Data Lineage](https://www.youtube.com/watch?v=5KvqYF-y93s) 57 | * [ADF Data Flow: Optimize Data Flows](https://www.youtube.com/watch?v=a2KtwUJngHo) 58 | * [ADF Data Flow: Iterate files with parameters](http://youtu.be/uEgz0ptYRDM?hd=1) 59 | * [ADF Data Flow: Decrease start-up times](https://youtu.be/FFCbU4ujCiY?t=528) 60 | * [ADF Data Flow Perf: SQL DB](https://youtu.be/iyZT5CY3V_4) 61 | * [Optimize compute environment](https://www.youtube.com/watch?v=VT_2ZV3a7Fc&feature=youtu.be&hd=1) 62 | * [Quick cluster start-up time with Azure 
IR](https://www.youtube.com/watch?v=mxzsOZX6WVY) 63 | * [Updated monitoring view](https://www.youtube.com/watch?v=FWCBslsk6KE) 64 | 65 | Patterns: 66 | * [ADF Data Flow: Staging Data Pattern](https://youtu.be/mZLKdyoL3Mo) 67 | * [ADF Data Flow: Clean Addresses Pattern](https://youtu.be/axEYbuU3lmw) 68 | * [ADF Data Flow: ETL DW Load Pattern](https://www.youtube.com/watch?v=7mLqwtmeQFg) 69 | * [ADF Data Flow: Deduplication of your Data](https://youtu.be/QOi26ETtPTw) 70 | * [ADF Data Flow: Merge Files](http://youtu.be/WbDTBAyYte8) 71 | * [ADF Data Flow: SCD Type 1 Overwrite](http://youtu.be/Rz2zx5GRbrA) 72 | * [ADF Data Flow: SCD Type 2 History](http://youtu.be/123CptslKvU) 73 | * [ADF Data Flow: Fact Table Loading](http://youtu.be/ABG3X9pgFPQ) 74 | * [ADF Data Flow: Logical Models](http://youtu.be/K5tgzLjEE9Q) 75 | * [ADF Data Flow: Detect source data changes](http://youtu.be/CaxIlI7oXfI?hd=1) 76 | * [ADF Data Flow: Schema Drift Handling](https://www.youtube.com/watch?v=vSTn_aGq3C8) 77 | * [Flexible Schema Handling with Schema Drift](https://www.youtube.com/watch?v=1vvCM29JSAs) 78 | * [ADF Data Flow: Parameters](https://www.youtube.com/watch?v=vpuuQcFojt8) 79 | * [Transform SQL Server on-prem with delta data loading pattern](https://youtu.be/IN-4v0e7UIs) 80 | * [SCD Type 1 and Type 2 demo by Bob Rubocki](https://www.youtube.com/watch?v=ps12o93VAo0) 81 | * [Rule-based Mapping](https://youtu.be/5lf1lh1qMwU) 82 | * [Distinct Row & Row Counts](https://youtu.be/ryYo8UFUgTI) 83 | * [Handling truncation errors](http://youtu.be/sPpcSiKQz34) 84 | * [Partition your files in the data lake](https://youtu.be/VNWv-MvLQ_0) 85 | * [Intelligent data routing](https://youtu.be/PIGw-Z-0upw) 86 | * [Transform and Create Multiple Tables](https://www.youtube.com/watch?v=Sj15Yjwai1A) 87 | * [Data Quality Patterns](https://www.youtube.com/watch?v=wjsi9g3ffco) 88 | * [Self-join patterns](https://www.youtube.com/watch?v=Dx1kANfnvmk&feature=youtu.be&hd=1) 89 | * [Data lake file output options](https://www.youtube.com/watch?v=NAPSbjvSQA8) 90 | * [Evolving database schema handling](http://youtu.be/urzLAb83IjU?hd=1) 91 | * [Change data capture in ADF](https://www.youtube.com/watch?v=Y9J5J2SRt5k) 92 | * [Incremental data loading with Azure Data Factory and Azure SQL DB](https://youtu.be/6tNWFErnGGU) 93 | * [Transform Avro data from Event Hubs using Parse and Flatten](https://youtu.be/F2x7Eg-635o) 94 | 95 | Expressions: 96 | * [Date/Time expressions](https://www.youtube.com/watch?v=uboyCZ25r_E&feature=youtu.be&hd=1) 97 | * [Split Arrays and Case Statement](http://youtu.be/DHNH8ZO7YjI?hd=1) 98 | * [Fun with string interpolation and parameters](https://youtu.be/hb3-cn2CMgM) 99 | * [Data Flow Script Intro: Copy, Paste, Snippets](https://www.youtube.com/watch?v=3_1I4XdoBKQ) 100 | * [Data Quality Expressions](https://www.youtube.com/watch?v=O8gmv5-lXhs) 101 | * [Collect aggregate function](https://www.youtube.com/watch?v=zneE18EHJSE) 102 | 103 | Metadata: 104 | * [Metadata validation rules](https://www.youtube.com/watch?v=E_UD3R-VpYE) 105 | --------------------------------------------------------------------------------