FreshBI


Follow Along with an Azure Data Factory Migration Process


An Azure Data Factory Example 

Hello again! If you read our last blog, you will remember that we outlined three of the most common scenarios in which our partners need Azure Data Factory. We also showed you the architecture required to bring their solutions to life. Today our goal is to work all the way through the process of using Azure Data Factory to migrate an existing Amazon S3 data store into the Azure cloud and consume it with Power BI.

Data Factory is Microsoft's cloud Extract, Transform, and Load (ETL) tool, available on the Azure cloud. It allows us to extract data from cloud or on-premises data sources, transform it, and then load it into other cloud or on-premises locations. Azure Data Factory is an orchestration tool that enables easy composition of a modern analytics solution. Here is Microsoft's definition:

Azure Data Factory is a cloud-based data integration service that allows you to create data driven workflows in the cloud that orchestrate and automate data movement and data transformation.

Below is an illustration that helps to solidify Azure Data Factory's place in the process. Data travels through its pipelines, which can include Databricks, Blob storage, and more.

There are two versions of Azure Data Factory available.

Version 1

  • Time-series based

  • Schedules driven by dataset availability

  • Developed using Visual Studio

  • Good Cloud Data Integration Tool

Version 2

  • General purpose

  • Explicit and Tumbling-window scheduling

  • Awesome Cloud Data Integration Tool

If you are going to use or explore Azure Data Factory, we suggest you start with Version 2. Version 2 allows us to use Azure Data Factory in the same fashion as a traditional ETL tool. It can fulfill organizational data warehouse needs through drag-and-drop activities.

Azure QuickStart Templates

Azure QuickStart Templates allow you to deploy Azure resources, including Data Factory, via Azure Resource Manager. These templates are built by Microsoft and its community partners and will help you build your solution faster. You can browse the available templates in the Azure QuickStart Templates repository on GitHub; a sketch of deploying one of these templates programmatically follows the list below.

Here are some Data Factory templates available to you:

  • Copy data from Azure Blob Storage to Azure SQL Db with Stored

  • Copy data from Azure Blob Storage to Azure SQL

  • Two-Activity Chained Data Factory

  • Create a pipeline to transform data by running Hive

  • Azure Data Factory to copy data from Salesforce to Azure

  • Copy data from one folder to another folder in an Azure Blob

  • Copy data from on-premises SQL Server to Azure Blob

  • Provision Azure SSIS integration

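To show how one of these QuickStart templates can be deployed without leaving a script, here is a minimal sketch using the azure-mgmt-resource Python SDK. The template URL, deployment name, and parameters are illustrative assumptions, not values from this walkthrough.

```python
# Hedged sketch: deploy an Azure QuickStart template via Azure Resource Manager.
# The template URI, resource group, and parameters below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import (
    Deployment, DeploymentProperties, TemplateLink
)

subscription_id = "<your-subscription-id>"
resource_group = "MyFirstDataLoad"   # assumed to exist already

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# Raw URL of the template you chose from the list above.
template_uri = ("https://raw.githubusercontent.com/Azure/azure-quickstart-templates/"
                "master/<path-to-template>/azuredeploy.json")

deployment = Deployment(
    properties=DeploymentProperties(
        mode="Incremental",
        template_link=TemplateLink(uri=template_uri),
        parameters={},               # template-specific parameters go here
    )
)

poller = client.deployments.begin_create_or_update(
    resource_group, "quickstart-adf-deployment", deployment
)
print(poller.result().properties.provisioning_state)
```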

Creating Data Factory Pipeline

As you can see, there are plenty of applications for Data Factory. So let's see one!

We are going to go through an example of moving our data from Amazon S3 to an Azure Database using Azure Data Factory. This will demonstrate the power and convenience of Data Factory. The process we will go through is very common and we are hoping that it will help you in your next data migration.

The architecture we are aiming to accomplish will look like the following:

It's time to get started with Azure Data Factory.

Let's walk through the steps below to see how we can create our desired architecture.

Step 1: Create a Data Factory

The first step is to log in to the Azure portal at https://portal.azure.com and then navigate to the Azure Data Factory service under Analytics:

Click Add to create a new Data Factory. Once you are prompted, fill in the form and click Create.

Azure will immediately start deploying Microsoft.DataFactory-MyfirstDataLoad to a resource group named MyFirstDataLoad. You can learn more about Azure resource groups here.
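The same Data Factory can also be created from code. Below is a minimal sketch using the azure-mgmt-datafactory Python SDK; the subscription ID, resource group, factory name, and region mirror the portal example but are assumptions on our side.

```python
# Hedged sketch: create a Data Factory with the azure-mgmt-datafactory SDK.
# Names and region are illustrative; swap in your own values.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import Factory

subscription_id = "<your-subscription-id>"
resource_group = "MyFirstDataLoad"
factory_name = "MyfirstDataLoad"

adf_client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) the Data Factory in the chosen region.
factory = adf_client.factories.create_or_update(
    resource_group, factory_name, Factory(location="eastus")
)
print(factory.name, factory.provisioning_state)
```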

Step 2: Create Pipelines

Now that we have created a Data Factory, we will begin to edit it. Navigate to our Data Factory resource group, titled MyFirstDataLoad, and click Author & Monitor:

This will take us to the Azure Data Factory page, where we can create data pipelines. This page can teach us a lot about Data Factory.

On the Azure Data Factory page, click Author.

Step 3: Extract

Our next step is to use the Copy Data activity to load data from Amazon S3 into the Azure SQL Database. Expand "+" and choose Copy Data:

It will ask us to specify a connection for the source and for the destination. Moreover, it will read the files and identify the schema.

I am going to specify Amazon S3 as a source, because it has files that I would like to upload into the Azure SQL database.

Here we are going to give our task a name and a description. Click Next.

Step 4: Choose Data Source

Next we will choose Amazon S3 as a source. (If you don't have an AWS account, you can create one for free, then create an S3 bucket and an IAM user with access to that bucket. Your IAM user will provide an Access Key ID and Secret Access Key for AWS access.)

Amazon S3 is very similar to Azure Blob storage. We will use Data Factory to copy files into our Azure Blob storage. This approach provides many advantages, in large part because we can then run commands from the SQL console that insert data straight from our Blob into a table, without Data Factory. From an engineering perspective, it is far better to load data into our SQL database from our own Blob storage than from external sources. There is more information about this approach here.
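The "commands from the SQL console" alluded to above are typically T-SQL BULK INSERT statements that read from an external data source pointing at Blob storage. The sketch below runs such a statement from Python with pyodbc; the server, credential, container, and table names are all illustrative assumptions.

```python
# Hedged sketch: load a pipe-delimited file from Azure Blob storage straight
# into an Azure SQL table with BULK INSERT. All names here are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()

# One-time setup: an external data source pointing at the Blob container
# (assumes a database-scoped credential named BlobCredential already exists).
cursor.execute("""
    CREATE EXTERNAL DATA SOURCE MyBlobStorage
    WITH (TYPE = BLOB_STORAGE,
          LOCATION = 'https://<storageaccount>.blob.core.windows.net/<container>',
          CREDENTIAL = BlobCredential);
""")

# Insert the file contents straight into the destination table.
cursor.execute("""
    BULK INSERT dbo.event
    FROM 'allevents_pipe.txt'
    WITH (DATA_SOURCE = 'MyBlobStorage',
          FIELDTERMINATOR = '|',
          ROWTERMINATOR = '0x0A');
""")
conn.commit()
```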

As we proceed we will be prompted to choose the files that we want from S3.

I chose allevents_pipe.txt and clicked Next. Data Factory will read the sample data and define the schema:
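The wizard creates the same kind of linked services you could define yourself in code. Here is a hedged sketch of registering the Amazon S3 source and Azure SQL sink connections with the azure-mgmt-datafactory SDK; the keys, connection string, and linked-service names are placeholders, and adf_client, resource_group, and factory_name come from the earlier sketch.

```python
# Hedged sketch: register the source (Amazon S3) and sink (Azure SQL) connections
# that the Copy Data wizard would otherwise create for you. Secrets are placeholders.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource, AmazonS3LinkedService,
    AzureSqlDatabaseLinkedService, SecureString,
)

# Amazon S3 source connection (uses the IAM user's access key).
s3_source = LinkedServiceResource(
    properties=AmazonS3LinkedService(
        access_key_id="<aws-access-key-id>",
        secret_access_key=SecureString(value="<aws-secret-access-key>"),
    )
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AmazonS3Source", s3_source
)

# Azure SQL Database sink connection.
sql_sink = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(
        connection_string=SecureString(
            value="Server=tcp:<your-server>.database.windows.net;Database=<your-db>;"
                  "User ID=<user>;Password=<password>;"
        )
    )
)
adf_client.linked_services.create_or_update(
    resource_group, factory_name, "AzureSqlSink", sql_sink
)
```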

Step 5: Define Load Location

Now we will specify the target. Remember, our goal is to load this into the Azure SQL Database. We don't have a database for this yet, so we are going to create a new one. This is one reason why we love the cloud: additional resources can be created on the fly!

We should jump back to the Azure Portal and create a new SQL database:

Step 6: Create SQL Database

After spending a couple of minutes in Azure we have a new (free for 60 days) SQL database. Next we will create our destination tables.

Let's log in to our database and create our tables. Using the Azure portal, we can open the Query Editor and execute the commands required to create our table structure.

In the end, our table structure is ready to load our data into.
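For reference, the kind of DDL we run in the Query Editor looks like the sketch below, executed here from Python with pyodbc so it stays scriptable. The column list is an assumption based on the allevents_pipe.txt sample file; adjust it to match your own data.

```python
# Hedged sketch: create the destination table for allevents_pipe.txt.
# The column list is an assumption about the sample file; adapt it to your data.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<user>;PWD=<password>"
)
conn.cursor().execute("""
    CREATE TABLE dbo.event (
        eventid   INT      NOT NULL,
        venueid   SMALLINT NOT NULL,
        catid     SMALLINT NOT NULL,
        dateid    SMALLINT NOT NULL,
        eventname NVARCHAR(200),
        starttime DATETIME
    );
""")
conn.commit()
```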

Step 7: Back to Data Factory

Now back to our Data Factory job. We will finish assigning our destination. Let's choose Azure SQL Database and then enter the required information for our new database.

Step 8: Column Mapping

The next step is to set our column mapping.

Click next and check your column mapping:

We can leave all other settings at default. Run the job now!
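Under the hood, the mapping you just checked is stored on the copy activity as a "translator" block. The dictionary below is a hedged sketch of roughly what that JSON looks like for a few columns; the column names are illustrative placeholders.

```python
# Hedged sketch of the column-mapping JSON ("translator") carried by a Copy
# activity. Source and sink column names are illustrative placeholders.
column_mapping = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "Prop_0"}, "sink": {"name": "eventid"}},
        {"source": {"name": "Prop_4"}, "sink": {"name": "eventname"}},
        {"source": {"name": "Prop_5"}, "sink": {"name": "starttime"}},
    ],
}
```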

Step 9: Monitor Our Run

We are going to click Monitor in order to verify that our job ran correctly.

Additionally, heading back to the Azure Portal, we can view data regarding our pipeline. We can see usage data and additional logs. You can find more information about these monitoring options here.
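If you prefer to check runs from code rather than the Monitor tab, the SDK exposes the same run history. A hedged sketch, reusing adf_client, resource_group, and factory_name from the earlier sketch:

```python
# Hedged sketch: query recent pipeline runs from code instead of the Monitor tab.
from datetime import datetime, timedelta, timezone
from azure.mgmt.datafactory.models import RunFilterParameters

now = datetime.now(timezone.utc)
filters = RunFilterParameters(last_updated_after=now - timedelta(days=1),
                              last_updated_before=now)

runs = adf_client.pipeline_runs.query_by_factory(resource_group, factory_name, filters)
for run in runs.value:
    print(run.pipeline_name, run.status, run.run_start, run.run_end)
```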

Step 10: Load The Rest Of Our Tables

Let's load the rest of our S3 files into our SQL database. We don't need to create a new ADF job since this is a one-time load. We can clone an existing job and run it for each file-table pair.

When we clone the job, we can fill in the properties in the tab. Moreover, we can put all the copy commands in one tab and execute them all together. Be careful: ADF won't truncate any tables before a load, so if you are loading into staging tables you might get duplicates. You should use a stored procedure and execute it before the copy command, or you can leverage SSIS packages to do the work for you.
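One way to avoid the duplicate problem is to clear the staging table (or call a stored procedure that does) right before each copy. A hedged pyodbc sketch, with placeholder connection details and table names:

```python
# Hedged sketch: clear a staging table before re-running a cloned copy job.
# Table names and connection details are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<user>;PWD=<password>"
)
cursor = conn.cursor()

# Truncate the staging table (or swap in an EXEC of a stored procedure,
# e.g. one that resets several staging tables at once).
cursor.execute("TRUNCATE TABLE dbo.event_staging;")
conn.commit()
```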


Connecting Power BI with Azure SQL Database

Now that we've got our data in our Azure SQL Database, we can dig into it and gain valuable insights. We can either use SQL, or we can use the powerful self-service BI tool, Power BI. You can download the free version of Power BI Desktop here.

We are going to use Power BI Desktop.

Step 11: Allow Access

Before we can actually hit the database from Power BI Desktop, we must open the firewall. In the Azure Portal, add our IP to the Firewall Settings:

We do this by clicking +Add Client IP. Once this is done we can connect using our database credentials:
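The same firewall rule can be added from code with the azure-mgmt-sql SDK. A hedged sketch, with the subscription, server, resource group, and IP address as placeholders:

```python
# Hedged sketch: add a client-IP firewall rule to the Azure SQL server.
# Subscription, server, resource group, and IP address are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import FirewallRule

sql_client = SqlManagementClient(DefaultAzureCredential(), "<your-subscription-id>")

sql_client.firewall_rules.create_or_update(
    resource_group_name="MyFirstDataLoad",
    server_name="<your-server>",
    firewall_rule_name="AllowMyClientIP",
    parameters=FirewallRule(start_ip_address="203.0.113.10",
                            end_ip_address="203.0.113.10"),
)
```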

We do this in Power BI by clicking Get Data and choosing the Azure SQL Database option.

Takeaways from this blog post

Every solution depends on many factors, such as scope, timeline, and compliance. Azure Data Factory is a flexible tool that can conform to many different scenarios. We hope that this demonstration helps to solidify the process for you. Azure Data Factory gives us a huge advantage and enables us to orchestrate and compose organizational data from external sources.

P.S. If you want to get these files, or need help with Azure Data Factory, Azure SQL Database, or Power BI, let us know!



About FreshBI

Based in the USA and Canada, we have helped hundreds of businesses achieve excellence and success through business intelligence.

Power BI runs in our blood, and we are ready to take your business to the next level.
