Forum Discussion
fil_b1
Jun 10, 2022 · Copper Contributor
Data warehouse/Pipeline best practice
We currently have a number of Power BI reports attached to disparate datasets that pull data from APIs, and we are looking to replace this with a cloud-hosted, centralized data warehouse (less than 50 GB) that each report connects to. We also need to be able to query the data programmatically (preferably from Python) for ML. Microsoft seems to offer a few different solutions, and I was hoping to get some advice on which option would suit our needs best.
- Option 1a: Create a Data Factory pipeline that extracts and transforms the API data and loads it into an Azure SQL database, which is then connected to Power BI.
- Option 1b: Create a Data Factory pipeline that extracts the API data and loads it into a staging SQL database, followed by a second Data Factory pipeline that transforms the data and loads it into a second SQL database connected to the Power BI reports.
- Option 2: Use Azure Synapse for both the ETL layer and the storage.
- Option 3: Use a Power BI Datamart for both the ETL layer and the storage. (This is probably our preferred option, as we have a lot of Power BI experience in house and we like the idea of storing measures globally; however, we are a little unsure how easy it will be to query the Datamart's data programmatically for ML.)
We are of course open to other suggestions too!
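For context, here is a minimal sketch of what the "query programmatically from Python" requirement might look like against an Azure SQL database (Options 1a/1b/2); a Power BI Datamart also exposes a T-SQL endpoint that can be queried the same way over ODBC. The server and database names below are placeholders, and the sketch assumes `pyodbc`, `pandas`, and the Microsoft ODBC Driver for SQL Server are installed:

```python
def build_conn_str(server: str, database: str) -> str:
    """Build an ODBC connection string using Azure AD interactive auth
    (other auth modes, e.g. service principal, would change this line)."""
    return (
        "Driver={ODBC Driver 18 for SQL Server};"
        f"Server={server};"
        f"Database={database};"
        "Authentication=ActiveDirectoryInteractive;"
        "Encrypt=yes;"
    )


def read_table(server: str, database: str, query: str):
    """Run a T-SQL query and return the result as a pandas DataFrame.

    Imports are deferred so the sketch loads even where the ODBC
    driver is not installed.
    """
    import pyodbc
    import pandas as pd

    with pyodbc.connect(build_conn_str(server, database)) as conn:
        return pd.read_sql(query, conn)


# Placeholder usage — a Datamart's SQL endpoint string looks similar
# (found under the datamart's settings in the Power BI service):
# df = read_table(
#     "yourserver.database.windows.net",
#     "yourwarehouse",
#     "SELECT * FROM dbo.sales",
# )
```

The resulting DataFrame can then feed straight into scikit-learn or any other ML tooling, which would keep the ML path and the reporting path reading from the same warehouse.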