Setting up a new Data Factory

Copper Contributor

Hi all,

I'm not a data expert/novice, but I do love data. I'm looking to up my companies game when it comes to ingesting and storing data from our various sources. We currently have a few on prem databases for applications we run, MSSQL and mySQL. But we have tons of software as a services, all with their own REST APIs. I have a hodgepodge of Power Automate Flows, CDS DataFlows, manual imports, csv dumps, etc etc, and these could point to a number of places. I do like the Common Data Service Dataflows feature which uses Power/M Query to transform the data and put it in a CDS entity, however I don't like CDS entities for use outside of the M365 ecosystem. My end result I want is no matter where the data lives I can get it into a Azure SQL Database. I stumbled upon Data Factory this week and it seems to fit the bill at a high level and I can do a basic copy data pipeline. My question is, how well is it dealing with REST APIs that are not official integrations? I'm wanting to transform (Data Flows?) the data without having to store that data in a intermediate area. Do data flows really only support "live" transformations from the select few sources? For our cloud infrastructure we are a Azure shop, so I'd like to keep our solution within Azure. Any assistance would be appreciate and if I sound like a complete noob that has no idea about what I'm getting myself into, it's because that is the case :)

1 Reply

Hi, @RamonMA1360 ,

 

Your explanation is a bit confuse, but I will try to answer following some patterns. It doesn't mean they will fit your situation, and in my opinion you need to hire a consultant to help solve your problem, either me or other you may find in the data community.

 

From the beginning, the very first question is: What are you trying to build? In what definition does it fit?

 

Production: Regular daily work of the company

Intelligence: Analytics data, never exactly the same, to help with the business plan

 

If you are trying to build something that fits in the definition of "Production", than you have an architecture problem. The microservices you have should be correct planned in a way to cover the production needs without this need of data integration. They can use each other, of course, but they should have a low coupling and be mostly independent. If your need to join the data they produce is to solve a production problem, this means they are not correctly designed and should be fixed in the first place. 

If what you are building is for data intelligence, than starting from production the first thing you need to build is a data warehouse. It's not a simple data transfer that a copy activity may handle. You need to build a data warehouse model, which is different than the relational model.

 

Fact table and dimension, star model and snowflake, de-normalizing, these are just some of the concepts you may need to use to build the data warehouse model. The data will be imported from the production using an ETL process. You don't have the need to call the API's for the ETL, at least you shouldn't. You can use Data Factory to read directly from each microservices database and transform the data for the data warehouse.

 

If any of the microservices is using unstructured data (should they? SQL or No SQL, usually there is a structure) you may need to build a data lake, import the files into the data lake and use a tool such as Synapse SQL On Demand Pool to read the data and integrate as needed with the data warehouse, what may have many different meanings.

 

So, in summary: Your question was a bit confuse, so I'm exposing some usual patterns. You should try to fit on them, but that doesn't mean your environment will fit precisely on them. As I say, you may need a consultant.

 

Kind Regards,

 

Dennes