Azure SSIS and ADF v2

Introducing SSIS on Azure and ADFv2

A Unified Approach to Modern ETL/ELT Development.

This week Microsoft announced their plan to bring Modern ETL Development into Azure at their annual Microsoft Ignite event. This includes SSIS as a Service and a new release of Azure Data Factory(Version 2).

SSIS on Azure is the unification and modernization of SSIS as a Service and an upgrade to Azure Data Factory. This provides us the streamlined functionality when creating Traditional ETL Pipelines along with our Big Data Pipelines(Modern ETL).

What's New in Azure Data Factory v2?

alt

1. Azure SSIS

2. Use Cases

3. Considerations


Azure SSIS

Azure SSIS will provide us many new capabilities in how we design and approach our Data Pipeline Development(ETL/ELT). Let's look at the new capabilities as well as some traditional workflows and how that will work in Azure SSIS.

Azure SSIS is essentially SSIS as a Service. Providing us with the capability to run SSIS packages on a cluster of nodes within Azure and providing that rich scale-out capability that is offered in many of the Azure services. It is also incorporated into Azure Data Factory(Version 2) which in itself provides us the Big Data Pipelines across the cloud and On-Prem.

SSIS Data Factory Pipeline Example

alt

Azure Data Factory provides the concept of using Pipelines which group together Linked Services(essentially connections and compute), Data Sets(similar to a external table), and Activities(SSIS Package, Machine Learning job, etc.).

Provisioning

alt

Referencing the image above, the SSISDB can live On-Prem or in Azure. The nodes that we can scale out to utilizes Azure Data Factory Computes to run the dtsx jobs. The nodes are also specifically provisioned for executing SSIS packages(.dtsx).

*Currently provisioning is only supported through Powershell which can be referenced here: SSIS Integration RunTime

Big Data Components Internal to SSIS

alt Not only can we utilize Azure Data Factory for our Big Data Integrations, we can also leverage the newest SSIS Components alongside our ADF pipelines. Essentially, this functionality already existed with running ADF pipelines across Azure, but previously only for HDInsight and other Azure services jobs such as Azure Machine Learning. Now we have the capability for SSIS as well.

Execution

Execution of SSIS packages will support many of the existing methods such as execution from:

  • SSMS - SQL Server Management Studio

  • Command Line - dtexec.exe

  • .NET SDK

  • SSISDB Stored Procedures

Though you will be able to execute these packages from SSMS, the new concept allows you to execute them from Azure Data Factory or other custom code throughout Microsoft Azure.

Monitoring

You will still be able to monitor your SSIS packages through SSMS, but look for future additional features to be added the encompasses the entire ETL/ELT ecosystem.

Monitoring Methods:

  • SSMS

  • Powershell

  • .NET SDK

Additional Monitoring Requirements:
Not only will we need to account for SSIS executions when monitoring but also provisioning and allocation of ADF nodes.

Scheduling

Scheduling can be accomplished in a number of ways, some of which are listed here:

  • ADFv2 App - User Interface

  • Azure SQL DB - Elastic Jobs

  • On-Prem SQL Server - SQL Server Agent


Use Cases

There are many reasons you may want to shift your SSIS processing to Microsoft's Cloud Platform.

Migration to Microsoft Azure -You want to completely migrate to Azure but previously did not want to pursue SSIS Scale-Out on VM's or previous Azure Data Factory offerings.

Modern ETL Development Capabilities - You need a robust offering to better handle both your structured and un-structured data in Azure and On-Prem as you expand and modernize your Data Estate.

Reduce ETL Cost - You want to utilize Azure to reduce ETL costs and increase performance.


Considerations

Pattern Discovery

Consider a pattern discovery initiative when planning to implement Big Data pipelines across Azure. Understand that there are many options available to solve Modern Data Integration problems. Avoid creating multiple one off Pipelines that could soon become unmanageable.

  • Ingestion
  • Transformation
  • Conversion

Patterned Development will help speed development times, ensure manageability, and reduce Azure consumption costs.

It is important to a have developed a sound Data Integration plan that takes into account your traditional and modern data sources, and identifying patterns that can fit both scenarios.

Cost Analysis

It is important to understand the cost savings included with lifting existing SSIS ETL to Azure. Moving from a SSIS/SQL Server licensing model to Azure's pay-per-hour model, can quickly reduce your organization's overall ETL costs. Development best practices can also change as performance can quickly and directly effect ETL cost. Understand how you will be executing packages and jobs and have a strong monitoring solution to ensure the lowest cost possible.

Strategic Capabilities

The last thing to consider is the long term capabilities you are looking to pursue. It is important to identify where you are within the maturity of your business intelligence and analytic's growth model. One thing to steer away from is utilizing multiple tools for the same purpose, and with Azure SSIS and ADFv2, Microsoft is helping to close that gap.




Resources:
  1. Microsoft: Azure Data Factory: V2 and SSIS