Forum Discussion

Rubio67
Copper Contributor
Aug 26, 2024

Aggregation of enterprise data and exporting large datasets to third parties

Assume a large organization with multiple applications/systems that may or may not be connected. All systems are currently on-prem. There are requirements to aggregate data from various sources (internal databases such as DB2, MariaDB, and PostgreSQL), export it to large data files (currently mostly XML), and send those files to third parties in a secure fashion (currently over SFTP). The legacy system responsible for this is at the end of its life.

If I wanted to replace the legacy system with a cloud solution:

1. What kind of data store would be best: a data lake (or some other HDFS-based storage), a data warehouse (Stretch Database?), Cosmos DB, or something else?

2. What options are there for transferring data from on-prem OLTP databases to the cloud storage? I would like to avoid hard-to-maintain ETL processes; some kind of change feed would be preferred.

3. What options do I have for sharing the data files with third-party partners from Azure storage? The partners don't necessarily have an Azure subscription, so Azure Data Share isn't always an option.

  • Rubio67

    1) would be a data lake, a data warehouse, or Cosmos DB

    2) would be Azure Database Migration Service, Azure Data Factory, or event streaming

    3) would be Azure Blob Storage, Azure Files, or an SFTP gateway

    • jitendra
      Copper Contributor

      Rubio67 

      Check this:

      1. Data Storage: Use Azure Data Lake Storage Gen2 for storing large datasets (a minimal upload sketch follows at the end of this reply). Optionally, use Azure Synapse Analytics for complex queries and analytics.

      2. Data Transfer: Implement Azure Data Factory with Change Data Capture (CDC) for ongoing data transfer from on-premises databases. Use Azure Database Migration Service for initial migrations.

      3. Secure Data Sharing: Share data with third parties via Azure Blob Storage using Shared Access Signatures (SAS) for secure, temporary access (see the SAS example at the end of this reply). For SFTP requirements, enable the SFTP endpoint on Azure Blob Storage.

      This solution provides a scalable and secure way to replace the legacy system with Azure cloud services.
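      As a rough illustration of point 1 (not a definitive implementation; the account, filesystem, and path names are placeholders), landing an aggregated XML export in Data Lake Storage Gen2 with the Python SDK could look like this:

      ```python
      from azure.identity import DefaultAzureCredential
      from azure.storage.filedatalake import DataLakeServiceClient

      # Placeholder account and paths; authentication assumes an Azure AD identity
      # (service principal or managed identity) resolved by DefaultAzureCredential.
      service = DataLakeServiceClient(
          account_url="https://<storage-account>.dfs.core.windows.net",
          credential=DefaultAzureCredential(),
      )
      filesystem = service.get_file_system_client("raw")
      file_client = filesystem.get_file_client("exports/2024-08/customers.xml")

      with open("customers.xml", "rb") as data:
          file_client.upload_data(data, overwrite=True)  # land the aggregated export
      ```

      And for point 3, sharing a single export file with a partner who has no Azure subscription, a read-only, time-limited SAS URL can be generated. Again a minimal sketch with placeholder names; in practice the account key would come from Key Vault, not source code:

      ```python
      from datetime import datetime, timedelta, timezone
      from azure.storage.blob import BlobSasPermissions, generate_blob_sas

      account_name = "<storage-account>"   # placeholder
      account_key = "<account-key>"        # placeholder; keep in Key Vault in practice
      container = "exports"
      blob_name = "2024-08/customers.xml"

      sas_token = generate_blob_sas(
          account_name=account_name,
          container_name=container,
          blob_name=blob_name,
          account_key=account_key,
          permission=BlobSasPermissions(read=True),               # read-only access
          expiry=datetime.now(timezone.utc) + timedelta(days=7),  # expires after 7 days
      )

      # The partner only needs this URL (sent over a secure channel); no Azure
      # subscription or credentials are required on their side.
      url = f"https://{account_name}.blob.core.windows.net/{container}/{blob_name}?{sas_token}"
      print(url)
      ```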

  • Rubio67
    - Azure Data Lake Storage for aggregating large datasets from various sources.
    - Azure Data Factory with Change Data Capture or incremental loads to transfer data from on-prem systems to the cloud (see the sketch below this list).
    - Azure Blob Storage with SAS tokens, or SFTP on Blob Storage, for securely sharing data with third parties.
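
    ADF's CDC and incremental copy activities are configured in the Data Factory portal (or as pipeline JSON) rather than in application code, but the underlying idea is a watermark-based incremental extract. A minimal Python sketch of that pattern against one of the on-prem PostgreSQL sources; the table, columns, and connection details are hypothetical:

    ```python
    import psycopg2  # assumed client; the same pattern applies to DB2 or MariaDB drivers

    # Watermark from the last successful run; in practice this lives in a control table.
    last_watermark = "2024-08-25T00:00:00"

    conn = psycopg2.connect(host="onprem-db", dbname="sales", user="etl", password="***")
    with conn, conn.cursor() as cur:
        # Pull only rows changed since the previous run -- the "change feed" behaviour
        # that ADF's CDC / incremental copy automates.
        cur.execute(
            "SELECT id, customer, amount, updated_at "
            "FROM orders WHERE updated_at > %s ORDER BY updated_at",
            (last_watermark,),
        )
        rows = cur.fetchall()

    # The rows would then be serialized (e.g. to XML) and landed in Data Lake Storage,
    # and max(updated_at) persisted as the watermark for the next run.
    print(f"{len(rows)} changed rows since {last_watermark}")
    ```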
