SOLVED

Computer Vision API - OCR bounding boxes


I'm building an API for a customer that leverages computer vision to analyse images. I am trying to get it to analyse handwriting on a whiteboard.

When I upload my test image to my API, the JSON response from the computer vision API seems to have the lines all jumbled up. For example, the 8th line of text is coming up as the first line in the array, etc.

When I upload the image into the Microsoft computer vision website, the JavaScript applet returns the lines in the correct order.

I am assuming the line order can be inferred somehow using the bounding box coordinates, but I am struggling to find the pattern.

The documentation seems to indicate the bounding box values are x-y coordinates, but are they XXXX YYYY, or XY XY XY XY?

Any ideas how I can get the correct line order from the computer vision JSON response?

3 Replies
Best Response confirmed by David Enright (New Contributor)
Solution

I worked it out.

The API gives back each bounding box as four X,Y pairs (X,Y, X,Y, X,Y, X,Y), eight numbers in total, but it sorts the lines based on the first X coordinate, not the first Y coordinate.

So for example:

Line 1: 179, 73, 767, 60, 770, 145, 181, 158
Line 2: 214, 257, 1328, 219, 1331, 306, 217, 344
Line 3: 185, 345, 1298, 350, 1297, 444, 184, 438
Line 9: 29, 1099, 1396, 1162, 1391, 1281, 24, 1218

 

However, the vision API is returning line 9 first, because it sorts by the first X coordinate and line 9 has the smallest one (29). In reality we read from top to bottom (Y, not X), so it should be sorting by the first Y.
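
Until the ordering changes on the service side, re-sorting on the client is one possible workaround. Below is a minimal Python sketch, assuming each line in the parsed JSON is a dict with a `boundingBox` list of eight numbers like the ones above. The `text` values and the `top_y` helper are made up for illustration, and the input order is simply what sorting by the first X would produce.

```python
# Example boxes from this post, listed in first-X order (line 9 comes first).
# The "text" values are placeholders, not real OCR output.
lines = [
    {"text": "line 9", "boundingBox": [29, 1099, 1396, 1162, 1391, 1281, 24, 1218]},
    {"text": "line 1", "boundingBox": [179, 73, 767, 60, 770, 145, 181, 158]},
    {"text": "line 3", "boundingBox": [185, 345, 1298, 350, 1297, 444, 184, 438]},
    {"text": "line 2", "boundingBox": [214, 257, 1328, 219, 1331, 306, 217, 344]},
]


def top_y(line):
    """Y of the first corner: index 1 in the flat x, y, x, y, ... list."""
    return line["boundingBox"][1]


# Re-sort top to bottom instead of left to right.
ordered = sorted(lines, key=top_y)

for line in ordered:
    print(line["text"], top_y(line))
```

Sorting on Y alone assumes a single column of roughly horizontal text. A page with several columns or strongly slanted lines would likely need something smarter.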

 

Is there anywhere I can leave feedback for Microsoft to look at this?

You could start by adding it to UserVoice! https://cognitive.uservoice.com/forums/430309-computer-vision

@Jake Dan Attis 

I am using https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/read/core/asyncBatchAnalyze. Does the bounding box in the response give { X top left, Y top left, X top right, Y top right, X bottom right, Y bottom right, X bottom left, Y bottom left }? I need to find x, y, height and width. Please suggest how.
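
The example boxes in the accepted answer look consistent with that corner order (top-left, top-right, bottom-right, bottom-left), although that is an observation from four samples, not a confirmed guarantee. If the eight numbers are X,Y pairs, one way to get x, y, width and height is to take the axis-aligned rectangle that encloses all four corners. A minimal Python sketch, with `to_rect` as a hypothetical helper name:

```python
def to_rect(bounding_box):
    """Convert a flat [x1, y1, ..., x4, y4] polygon to (x, y, width, height).

    The result is the axis-aligned rectangle enclosing all four corners,
    so it does not depend on the order in which the corners are listed.
    """
    xs = bounding_box[0::2]  # every even index is an X value
    ys = bounding_box[1::2]  # every odd index is a Y value
    x, y = min(xs), min(ys)
    return x, y, max(xs) - x, max(ys) - y


# Line 1 from the accepted answer.
rect = to_rect([179, 73, 767, 60, 770, 145, 181, 158])
print(rect)  # (179, 60, 591, 98)
```

Because the handwritten lines are slightly rotated, this rectangle is a little larger than the text itself. Using only the top-left corner plus the top and left edge lengths would be tighter but depends on the corner order being as assumed.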