Bug in Data Lake Analytics USQL Extractors Text Function


Hello,

 

I believe I have found a bug in the U-SQL Extractors.Text() extractor and wanted to check whether anyone else is facing a similar issue.

FYI: this ran fine for more than 8 months, up to 16th July. The bug appears to have been introduced on or around 17th July.

 

The process is a simple script (to reproduce the issue) with 6 lines of code (attached image).

USQL_Code.JPG

The input file should be a file of around 8.3 GB or larger containing millions of records. My file is 8.3 GB and contains 37.5 million records.

 

When the process is run, the very first step of the Job Graph (which shows how many records are read) reports 38.8 million records read instead of 37.5 million.

 

USQL_Issue_Forum.JPG

 

After checking the output file, I found that the process had read only the first 4 GB of the input file and then duplicated those records in the output (in some cases three times over).
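
For anyone who wants to check their own output for the same symptom, here is a minimal sketch in Python. It is not the script from the screenshot. It assumes the output has been downloaded locally and holds one record per line; the function name `count_duplicates` is made up for illustration. It hashes each line so that memory use stays modest on large files.

```python
import hashlib
from collections import Counter


def count_duplicates(path):
    """Return (total, distinct, max_repeats) for a one-record-per-line file.

    Hypothetical helper: assumes each non-blank line is one record.
    """
    seen = Counter()
    total = 0
    with open(path, "rb") as f:
        for line in f:
            record = line.rstrip(b"\r\n")
            if not record:
                continue  # ignore blank lines
            total += 1
            # Store a 16-byte digest instead of the full line to save memory.
            seen[hashlib.md5(record).digest()] += 1
    max_repeats = max(seen.values(), default=0)
    return total, len(seen), max_repeats
```

Running it on both the input and the output makes the comparison easy: if the input has no repeated records but the output reports fewer distinct records than total records, the duplicates were introduced by the job. Note that genuinely repeated records in the input would also be counted, so the input figures are needed as a baseline.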

 

I have raised this with Microsoft, but they have not acknowledged the problem yet (it's been 5 days). I just wanted to know if others are seeing a similar issue.

 

Regards

Lohith
