Latest Discussions
Data Warehousing using Apache Spark on Azure HDInsight
Hi Team, hope all are safe! This is my first project in Azure, and we are looking at developing a DW using Apache Spark on Azure HDInsight. In simple terms, we are currently trying to pick up files from SharePoint, do transformations using PySpark, and then load the data into an Azure SQL DB. Can someone help me with the queries below: 1) Can we connect Apache Spark or PySpark on Azure HDInsight to SharePoint to pick up files? 2) Can we implement the usual SCD1 or SCD2 logic using PySpark? Thanks in advance!
Posted by Aishwar04, Nov 12, 2024
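On the second question, here is a minimal sketch of SCD2-style handling in plain PySpark DataFrames. The table names, columns, and key are placeholders I made up for illustration, and effective-date columns are omitted for brevity; this is one possible pattern, not the only way to do it on HDInsight.

# Minimal SCD2 sketch: close changed rows and append new current versions.
# "current" and "updates" and their columns are placeholders; effective dates omitted.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()

current = spark.createDataFrame(
    [(1, "Alice", "NYC", True), (2, "Bob", "LA", True)],
    ["id", "name", "city", "is_current"],
)
updates = spark.createDataFrame([(2, "Bob", "Seattle")], ["id", "name", "city"])

# Keys whose tracked attribute changed in the incoming batch.
changed_ids = (current.alias("c")
               .join(updates.alias("u"), "id")
               .where(F.col("c.city") != F.col("u.city"))
               .select("id"))

# Close the old versions of changed keys, keep untouched rows as-is.
closed = current.join(changed_ids, "id", "left_semi").withColumn("is_current", F.lit(False))
unchanged = current.join(changed_ids, "id", "left_anti")

# Append the new versions flagged as current.
new_versions = updates.join(changed_ids, "id", "left_semi").withColumn("is_current", F.lit(True))

scd2 = unchanged.unionByName(closed).unionByName(new_versions)
scd2.show()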
Proposal for a new data structure that extremely reduces data sizes for data in which two item types have many-to-many relations
I propose a new data structure that reduces data sizes for data in which two item types have many-to-many relations. The proposed data structure introduces container variables related to many values of both item types, and these container variables record the many-to-many relations between them. The proposed data structure maintains data normalization and integrity and is independent of the indexing methods conventionally used for relational databases, allowing simultaneous use of both. When one item type has N items, the other item type has M items, and every one of the N items is related to every one of the M items, a conventional RDB requires N×M rows, whereas the proposed data structure requires N+M rows. When N = 100,000 and M = 10,000,000, the conventional RDB requires 1,000,000,000,000 rows, whereas the proposed data structure requires only 10,100,000 rows. For details, please see the journal article at https://www.iaiai.org/journals/index.php/IEE/article/view/589 , or US Patent No. 11294961. In the patent, "upper item or item group index" is used as another name for the container variable.
Posted by KuwabaraT, Sep 20, 2024
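A rough illustration of the claimed saving, sketched in Python under the assumption that a "container" simply groups the related items on each side instead of materializing one row per pair. This is only my reading of the proposal, not code from the article or the patent.

# Rough illustration only: a "container" groups related items from both sides,
# instead of storing one row per (item_a, item_b) pair.
N, M = 100_000, 10_000_000

conventional_rows = N * M   # junction table: 1,000,000,000,000 rows
container_rows = N + M      # one entry per item: 10,100,000 rows

print(f"conventional junction table: {conventional_rows:,} rows")
print(f"container-based structure:   {container_rows:,} rows")

# Tiny concrete example: items a1..a3 all relate to items b1..b2.
container = {"c1": {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}}
pairs = [(a, b) for a in container["c1"]["A"] for b in container["c1"]["B"]]
print(pairs)  # the 3 x 2 = 6 pairs are recoverable from 3 + 2 = 5 stored entries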
Accessing data from an outsourced third party service company
Hi, can someone please throw some light on this? We have some services outsourced to a third-party company, but we realised the data that goes into their process is very valuable and we want to access it for different purposes. We are currently downloading certain reports from their web pages, and going forward they are planning to provide API endpoints that give us the required data in the form of multiple pre-defined reports. But I am wondering whether there is any other secure and feasible method by which we can get the current state of their entire database, filtered for our company, refreshed automatically every few hours? Thank you, KP
Posted by KP_DataArch, Nov 16, 2022
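For the API-endpoint route mentioned in the question, a minimal sketch of a periodic authenticated pull looks like the following. The base URL, report name, token, filter parameter, and refresh interval are all hypothetical placeholders, not anything the vendor has published.

# Hypothetical sketch: poll a vendor's report API every few hours.
# URL, report name, auth token, and filter are placeholders, not a real API.
import time
import requests

BASE_URL = "https://vendor.example.com/api/reports"   # placeholder endpoint
TOKEN = "<api-token-issued-by-vendor>"                 # placeholder credential
REFRESH_SECONDS = 4 * 60 * 60                          # e.g. every 4 hours

def fetch_report(name: str) -> dict:
    resp = requests.get(
        f"{BASE_URL}/{name}",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"company": "our-company-id"},          # server-side filtering
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    while True:
        data = fetch_report("daily-activity")          # placeholder report name
        print(f"pulled {len(data)} records")
        time.sleep(REFRESH_SECONDS)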
Azure Internal Load Balancer for Windows Always On WSFC name on Azure VM
Hi Team, while configuring a SQL Always On setup on Azure VMs, we configure an Azure Internal Load Balancer with a probe port for a virtual IP, and we configure and connect to the SQL listener name by following the standard procedure (adding the listener name through Cluster Manager and running a few PowerShell scripts on the cluster to bind the probe port and virtual IP). As we do not use the Windows WSFC name for connecting to databases, do we need to configure an Azure Internal Load Balancer with a probe port for the Windows Always On WSFC virtual IP as well? What kind of issues might we face during failover if we have configured an internal load balancer for the SQL listener virtual IP but not for the WSFC virtual IP? Thanks. Best Regards, Anshul
Posted by AnshulDBA, Oct 14, 2022
Clarifying false assertions by Oracle sales about Oracle licensing on Azure constrained VMs
Recently, I received the following question from a customer... How much of a challenge would it be to defend against Oracle's claim that, for a constrained Standard_E96-24ds_v5 VM, we owe them licensing for 96 vCPUs instead of 24 vCPUs? I've been receiving questions of this sort more frequently these days, so I wanted to share advice on dealing with it.

Oracle's own documentation on public cloud licensing (HERE) states... For the purposes of licensing Oracle programs in an Authorized Cloud Environment, customers are required to count the maximum available vCPUs of an instance type as follows: Microsoft Azure: count two vCPUs as equivalent to one Oracle Processor license if multi-threading of processor cores is enabled, and one vCPU as equivalent to one Oracle Processor license if multi-threading of processor cores is not enabled. Please note the highlighted word "available", which means "able to be used or obtained; at someone's disposal" according to the Oxford dictionary.

Azure constrained VMs are explained HERE, including the following description... Azure offers certain VM sizes where you can constrain the VM vCPU count to reduce the cost of software licensing, while maintaining the same memory, storage, and I/O bandwidth. The vCPU count can be constrained to one half or one quarter of the original VM size. These new VM sizes have a suffix that specifies the number of active vCPUs to make them easier for you to identify.

So, constrained VMs in Azure offer the memory, storage limits, and I/O bandwidth associated with a VM with the larger number of vCPUs in the name, but the number of vCPUs is the lower number in the name. For example, in the case of the above-mentioned Standard_E96-24ds_v5 VM instance type, the "96" represents the memory, I/O, and network resources normally associated with a 96-vCPU virtual machine, but it does not indicate that 96 vCPUs are available. Only 24 vCPUs are available with this instance type, and that is the count to be used when licensing Oracle. Referring to the Oracle licensing guidance quoted above, these 24 vCPUs, each hyperthreaded by 2, represent 12 CPU cores, so the number of Oracle Processor licenses for this VM is 12.

As an interesting side note, according to the same Oracle documentation on licensing in public clouds (HERE)... When counting Oracle Processor license requirements in Authorized Cloud Environments, the Oracle Processor Core Factor Table is not applicable. Thus, the popular Oracle Processor Core Factor Table discount is available only on-prem and in Oracle Cloud, but not in Azure. This is the basis of another myth from Oracle sales teams suggesting that Oracle Database is half as expensive in Oracle Cloud as in Azure. It has nothing to do with technology, performance, or the cost of resources; it is merely a discount that Oracle has reserved for itself.

Of course, for basic technical questions such as counting CPUs, there must be an empirical way to prove it one way or the other. Oracle is welcome to recommend any Linux or Oracle utility they prefer to count the number of vCPUs presented by a VM, but one good suggestion is the Linux lscpu command. Whatever count is returned by such a utility should determine the licensing count, of course.

In summary, please beware of Oracle sales personnel attempting to freelance with their own perspectives on licensing. Oracle sales personnel are not the most reliable source of such information, due to the obvious conflict of interest.
Oracle's License Management Services (LMS) team provides authoritative decisions on licensing. When anyone spreads misinformation about Oracle licensing, please click the Contact Oracle LMS button on the LMS home page to get the word from the folks who can provide the real answer.
Posted by TimGormanTech (Microsoft), Sep 04, 2022
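As a quick worked check of the arithmetic above, here is a tiny Python sketch of the Azure counting rule as quoted (two vCPUs per Oracle Processor license when multi-threading is enabled). The function name and inputs are mine, and this is only an illustration of the quoted rule, not licensing advice.

# Illustration of the quoted Azure counting rule:
# 2 vCPUs = 1 Oracle Processor license with multi-threading enabled, 1 vCPU = 1 license otherwise.
import math

def oracle_processor_licenses(available_vcpus: int, multithreading: bool = True) -> int:
    divisor = 2 if multithreading else 1
    return math.ceil(available_vcpus / divisor)

# Constrained Standard_E96-24ds_v5: only 24 vCPUs are available, not 96.
print(oracle_processor_licenses(24))   # -> 12 licenses
print(oracle_processor_licenses(96))   # -> 48 licenses (what the sales claim would imply)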
Microsoft Azure: Routing manufacturing IoT Edge data between on-premise PURDUE model levels via MQTT
Microsoft Azure IoT Hub provides out-of-the-box capabilities to send device-to-cloud messages directly into Azure for advanced logging/routing and for generating actions based on events occurring on the edge. However, many customers, for example in the manufacturing domain, adopt the Purdue Enterprise Reference Architecture (PERA) in their plant IoT implementations, and one of the frequent requirements is to allow Azure IoT Hub to send data to their internal MQTT brokers, especially to allow communication between PURDUE's Level 2 (Control Systems) and Level 4 (Business Planning). This scenario is not limited to the manufacturing domain, however. Although Azure IoT Hub itself supports MQTT endpoints for direct communication, it does not provide an out-of-the-box capability to post messages to "customer managed" local MQTT brokers. In fact, the Azure IoT product group is working on BYOMB (Bring Your Own [MQTT] Broker), but it may take some time to fully bake this capability into the out-of-the-box experience.

It is very interesting to note that routing IoT device messages to local ecosystems (on premises) without reaching out to the Azure cloud is becoming an increasingly popular data architecture pattern in manufacturing and many other industries. Most customers want this capability in order to generate actions/alerts locally; for example, manufacturing plants want to send an alert to SCADA (Supervisory Control And Data Acquisition) / HMI (Human Machine Interface) systems for immediate action without making a round trip to the Azure cloud. Provisioning an MQTT broker such as eclipse-mosquitto is very common for fulfilling this kind of need, so that a single alert can be fanned out to many subscribed systems, if necessary, to continuously fulfill the need for an event-driven data architecture with improved decisions and business outcomes.

Recently, one of our manufacturing customers was looking to address this exact gap in Azure IoT data architecture solutions. While designing the solution, the customer wanted to leverage only Azure PaaS (Platform-as-a-Service) offerings available on the edge, which makes a lot of sense. Hence, the solution was developed using the Azure Functions PaaS service, which already supports deployment on the edge, and we chose Python as the language, the most widely adopted scripting language in recent days. However, Azure Functions on the edge also supports C#/.NET, if you are a .NET shop! The step-by-step instructions and some learnings from the solution we created are already documented on this GitHub repository.
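A minimal sketch of the local fan-out step: publishing an alert to an on-premises mosquitto broker with the paho-mqtt client library. The broker host, topic, and payload shape are placeholders of mine; the full Azure Functions-based solution is the one documented in the GitHub repository mentioned above.

# Sketch: publish an edge alert to a local (customer-managed) MQTT broker such as
# eclipse-mosquitto, so SCADA/HMI subscribers can react without a cloud round trip.
# Broker host, topic, and payload shape are placeholders.
import json
import paho.mqtt.publish as publish

BROKER_HOST = "mosquitto.plant.local"   # placeholder on-prem broker
ALERT_TOPIC = "plant/line1/alerts"      # placeholder topic

def publish_alert(device_id: str, message: str) -> None:
    payload = json.dumps({"deviceId": device_id, "alert": message})
    publish.single(ALERT_TOPIC, payload, qos=1, hostname=BROKER_HOST, port=1883)

if __name__ == "__main__":
    publish_alert("edge-sensor-42", "temperature threshold exceeded")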
Containerization and Machine Learning Service
One of my favorite open source tools is Docker. It just makes sense for a lot of the work that I do, whether it's executing CLI commands so I can test experimental features without having to re-install the CLI each and every time and then enable those commands, or working through labs in Jupyter notebooks where I can build and maintain an environment in which I can run experiments against Azure Machine Learning Services (MLS) by simply changing the config.json file in my root folder. So how does all this work? Well, let me show you. First, let's start out by connecting to a base image that I've built for the labs, using VS Code. I can go into the details of building that image at a later date, but for now I just want to share some of the flexibility of the image itself. Here's the command that I run:
docker run -it -p 10000:8888 thejamesherring/labs:latest
This gives me output that looks like the following:
Set username to: jovyan
usermod: no changes
Granting jovyan sudo access and appending /opt/conda/bin to sudo PATH
Executing the command: jupyter lab
[I 14:31:37.465 LabApp] JupyterLab extension loaded from /opt/conda/lib/python3.7/site-packages/jupyterlab
[I 14:31:37.466 LabApp] JupyterLab application directory is /opt/conda/share/jupyter/lab
[I 14:31:37.468 LabApp] Serving notebooks from local directory: /home/jovyan
[I 14:31:37.468 LabApp] The Jupyter Notebook is running at:
[I 14:31:37.468 LabApp] http://a30c0b11acd3:8888/?token=dc3db6e906dcb0d403bad05640cf492981105fb81ba2eb25
[I 14:31:37.468 LabApp] or http://127.0.0.1:8888/?token=dc3db6e906dcb0d403bad05640cf492981105fb81ba2eb25
[I 14:31:37.468 LabApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 14:31:37.471 LabApp] To access the notebook, open this file in a browser: file:///home/jovyan/.local/share/jupyter/runtime/nbserver-18-open.html
Or copy and paste one of these URLs:
http://a30c0b11acd3:8888/?token=dc3db6e906dcb0d403bad05640cf492981105fb81ba2eb25
or http://127.0.0.1:8888/?token=dc3db6e906dcb0d403bad05640cf492981105fb81ba2eb25
You'll notice that I re-routed port 8888 to port 10000, so I'll need to use that when connecting to the Jupyter environment, along with the ?token= value. So when I connect, it will look something like this:
http://localhost:10000/?token=dc3db6e906dcb0d403bad05640cf492981105fb81ba2eb25
This gives me access to my notebooks. Now there are a few things I have to be mindful of: am I connected to the correct MLS environment? Are my lab files updated? To handle the first one, I simply go to the root path of my JupyterLab and modify the config.json file located there:
{
"subscription_id": "<your azure subscription>",
"resource_group": "<your resource group>",
"workspace_name": "<your MLS Workspace name>"
}
After completing this task, I can simply open a terminal, navigate to the path of my lab and instruction files, and issue a git pull, which ensures I have the latest files. All done: I'm able to start writing experiments against my Azure compute targets from a local containerized Docker image, wherever I happen to be. I hope you found this information useful and are able to expand upon it and share your learnings with others. Best, James H
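A small sketch of how a lab notebook might pick up that config.json, assuming the azureml-core SDK these labs appear to target: Workspace.from_config() reads the subscription, resource group, and workspace name from the file, and the experiment name below is a placeholder of mine.

# Sketch: load the Azure ML workspace from the config.json in the notebook root,
# assuming the azureml-core SDK. The experiment name is a placeholder.
from azureml.core import Experiment, Workspace

ws = Workspace.from_config()   # reads subscription_id, resource_group, workspace_name
print(ws.name, ws.resource_group, ws.location, sep="\t")

exp = Experiment(workspace=ws, name="labs-sanity-check")   # placeholder experiment name
run = exp.start_logging()
run.log("hello", 1)
run.complete()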
Permission to write a blog in Data Architecture space
Hi all, I would like to know how to get permission to write a blog in the Tech Community?
Posted by Sagar_Lad, Nov 29, 2021
Exam DP-300: Administering Relational Databases on Microsoft Azure
Hello guys, please, where can I get study materials for the above exam?
Posted by MVPromise, May 06, 2020
Tags
- data architecture
- Manufacturing IoT
- AZURE AMA
- Machine Learning Service
- DP-100
- docker
- learning