Azure Data Lake Storage

Unable to load large delta table in Azure ML Studio
I am writing to report an issue I am experiencing while trying to read a delta table from Azure ML. I have already created data assets to register the delta table, which is located at an ADLS location. However, when loading the data, I have noticed that large data sizes take an exceedingly long time to load. I have confirmed that small data sizes are returned within a few seconds, which leads me to believe there may be a scalability issue in the data loading process. I would greatly appreciate any recommendations or solutions. I can provide additional details such as the size of the data, the steps I am taking to load it, and any error messages if required.

I'm following this document: https://learn.microsoft.com/en-us/python/api/mltable/mltable.mltable?view=azure-ml-py#mltable-mltable-from-delta-lake

I am using this command to read the delta table using the data asset URI:

```python
from mltable import from_delta_lake

mltable_ts = from_delta_lake(
    delta_table_uri=<DATA ASSET URI>,
    timestamp_as_of="2999-08-26T00:00:00Z",
    include_path_column=True,
)
```

Azure Function with Blob Output Binding returning 404 on GetProperties check before writing the Blob
Hi. This question is similar: https://stackoverflow.com/questions/64546302/how-to-disable-blob-existence-check-in-azure-function-output-binding

But I'm wondering if there are other, more recent answers or comments out there. I have an Azure Function with an HTTP Trigger input binding and a Blob Storage output binding. For every execution, the output binding appears to call GetProperties on the target blob first, resulting in a 404. Quite rightly, since the data being written is going to a new blob; but that means the check will always fail, and in this case it is redundant. It takes time to go through these steps (admittedly milliseconds, but still). Presumably it's also being logged somewhere, so that's a storage cost: negligible now, perhaps, but not something to ignore. I'm also not 100% sure where that logging would be stored, so I could go and manage it.

The positive is that the overall function execution is fine. But it's still recording all these failures, and we're seeing tens of thousands of executions a day. Is there a way to use the concise output binding code but skip this prior if-exists/get-properties check? My options seem to be: live with it, or rewrite to use BlobContainerClient, BlobClient and so on instead of the Blob attribute output binding. Anyone got some clever ideas?