In today's information-driven world, data is valuable and its volume keeps growing. Our capacity to collect information is increasing, and each new piece of equipment can transmit a variety of data that can be used to control it remotely and to feed a range of analyses.
Through the Azure platform, Microsoft has developed services that can receive and handle such a large amount of data, using the Cloud as a flexible repository, which provides backup capabilities and extremely high availability.
With so many data silos, there is a need to manage and orchestrate the movement of data between them. Azure Data Factory (ADF) is the ideal tool for triggering data movements on a schedule, something most analytical solutions need.
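As a rough illustration of scheduling, the sketch below builds a Python dictionary shaped like the JSON definition of an ADF schedule trigger. The trigger name, the pipeline name, and the start time are hypothetical values chosen for the example, and the helper function is not part of any SDK; treat it as a sketch under those assumptions rather than a complete definition.

```python
import json
from datetime import datetime, timezone


def build_schedule_trigger(name, pipeline_name, start, frequency="Day", interval=1):
    """Return a dict shaped like an ADF schedule trigger definition.

    This helper is hypothetical; it only assembles the JSON structure.
    """
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                "recurrence": {
                    "frequency": frequency,  # e.g. "Hour", "Day", "Week"
                    "interval": interval,    # run every `interval` units
                    "startTime": start.strftime("%Y-%m-%dT%H:%M:%SZ"),
                    "timeZone": "UTC",
                }
            },
            # The pipelines this trigger starts on each occurrence.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
            ],
        },
    }


trigger = build_schedule_trigger(
    name="DailySalesTrigger",           # hypothetical trigger name
    pipeline_name="CopySalesPipeline",  # hypothetical pipeline name
    start=datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc),
)
print(json.dumps(trigger, indent=2))
```

Keeping the definition as plain data makes it easy to review or version before it is deployed through whichever tool the team uses.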
Azure Data Factory is, therefore, a cloud-based ETL platform that lets you create data flows originating from different silos and transform and enrich that data into information for later analysis.
To better understand the process, it helps to follow its 4 steps: connect and collect, transform and enrich, publish, and monitor.
Source: Manual DP-200T01 - Implementing an Azure Data Solution
Connect and collect is the first step in creating an ADF process. In this step, we define our data sources, which can be files, databases, web services, among others. The next step will be to move this data to a central location for further processing.
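To make the connect and collect step more concrete, here is a minimal sketch of a pipeline with a single copy activity, written as a Python dictionary in the shape of an ADF pipeline JSON definition. The pipeline, activity, and dataset names are hypothetical, and the blob source and sink types are assumed for illustration; a real pipeline would use the source and sink types that match its datasets.

```python
import json


def build_copy_pipeline(name, source_dataset, sink_dataset):
    """Return a dict shaped like an ADF pipeline with one Copy activity.

    The helper and all names passed to it are hypothetical.
    """
    copy_activity = {
        "name": "CopyToStaging",
        "type": "Copy",
        # Datasets describe where the data comes from and where it lands.
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "BlobSource"},  # assumed source type
            "sink": {"type": "BlobSink"},      # assumed sink type
        },
    }
    return {"name": name, "properties": {"activities": [copy_activity]}}


pipeline = build_copy_pipeline(
    name="CopySalesPipeline",          # hypothetical pipeline name
    source_dataset="RawSalesFiles",    # hypothetical source dataset
    sink_dataset="StagingSalesFiles",  # hypothetical central location
)
print(json.dumps(pipeline, indent=2))
```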
The second step is transformation and enrichment.
Compute services such as Databricks and Machine Learning can transform the data that feeds production environments, enriching it with treated, clean, and transformed information, adding new data for analysis or, for example, consolidating data for the standardization processes used in Machine Learning experiments.
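One way to picture the transformation step is as an activity that runs only after the data has been collected. The sketch below defines a Databricks notebook activity that depends on an earlier copy activity, again as Python dictionaries shaped like ADF JSON. The activity names, the notebook path, the linked service name, and the small validation helper are all hypothetical.

```python
def build_notebook_activity(name, notebook_path, linked_service, depends_on):
    """Return a dict shaped like an ADF Databricks notebook activity."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {"notebookPath": notebook_path},
        # Run only when the upstream activity has succeeded.
        "dependsOn": [
            {"activity": depends_on, "dependencyConditions": ["Succeeded"]}
        ],
    }


def missing_dependencies(activities):
    """List dependencies that point at activities not defined in the pipeline."""
    names = {a["name"] for a in activities}
    return [
        (a["name"], dep["activity"])
        for a in activities
        for dep in a.get("dependsOn", [])
        if dep["activity"] not in names
    ]


# Placeholder for the upstream copy step; only its name matters here.
collect = {"name": "CopyToStaging", "type": "Copy"}
transform = build_notebook_activity(
    name="CleanAndEnrich",                # hypothetical activity name
    notebook_path="/Shared/clean_sales",  # hypothetical notebook path
    linked_service="DatabricksWorkspace", # hypothetical linked service
    depends_on="CopyToStaging",
)
activities = [collect, transform]
print(missing_dependencies(activities))  # empty list when every dependency resolves
```

Checking dependencies locally like this is a design convenience, not something the service requires; it simply catches a mistyped activity name before deployment.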
Publication is the third step. After the data is refined, it is loaded into Azure SQL Data Warehouse, Azure SQL Database, Azure Cosmos DB, or any analytical engine that the business needs to use in its BI tools.
Finally, the last step in the process is monitoring. ADF has built-in support for monitoring pipelines via Azure Monitor, API, PowerShell, Azure Monitor logs, and health panels on the Azure portal. This lets you control the scheduling of activities and pipelines and track their success and failure rates.
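To show what tracking success and failure rates might look like once run records have been retrieved, here is a small sketch that aggregates them per pipeline. The run records are made-up sample data, and the field names `pipelineName` and `status` are assumptions about the shape of those records; no call to Azure is made.

```python
from collections import defaultdict

# Made-up sample run records, for illustration only.
sample_runs = [
    {"pipelineName": "CopySalesPipeline", "status": "Succeeded"},
    {"pipelineName": "CopySalesPipeline", "status": "Failed"},
    {"pipelineName": "CopySalesPipeline", "status": "Succeeded"},
    {"pipelineName": "EnrichSalesPipeline", "status": "Succeeded"},
    {"pipelineName": "EnrichSalesPipeline", "status": "InProgress"},
]


def success_failure_rates(runs):
    """Summarize finished runs per pipeline; unfinished runs are ignored."""
    counts = defaultdict(lambda: {"succeeded": 0, "failed": 0})
    for run in runs:
        if run["status"] == "Succeeded":
            counts[run["pipelineName"]]["succeeded"] += 1
        elif run["status"] == "Failed":
            counts[run["pipelineName"]]["failed"] += 1

    summary = {}
    for pipeline, c in counts.items():
        finished = c["succeeded"] + c["failed"]
        summary[pipeline] = {
            "succeeded": c["succeeded"],
            "failed": c["failed"],
            "success_rate": c["succeeded"] / finished,
        }
    return summary


rates = success_failure_rates(sample_runs)
for pipeline, stats in rates.items():
    print(pipeline, stats)
```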
Azure Data Factory is just one more piece within the immense world of the Azure platform, but it is certainly one of the most important for obtaining and processing data and for helping to create useful new information that answers our business users' questions.
by Gonçalo Ricardo
Business Intelligence Consultant @Passio Consulting