Computer Vision API - OCR bounding boxes

David Enright · May 21 2017

I'm building an API for a customer that leverages computer vision to analyse images. I am trying to get it to analyse handwriting on a whiteboard.

When I upload my test image to my API, the JSON response from the Computer Vision API seems to have the lines all jumbled up. For example, the 8th line of text comes back as the first line in the array, and so on.

When I upload the same image to the Microsoft Computer Vision website, the JavaScript applet returns the lines in the correct order.

I am assuming the line order can be inferred from the bounding box coordinates somehow, but I am struggling to find the pattern.

The documentation seems to indicate that the bounding boxes are x-y coordinates, but are they laid out as XXXX YYYY or as XY XY XY XY?

Any ideas how I can get the correct line order from the Computer Vision JSON response?

David Enright · May 24 2017

I worked it out.

The API gives back the coordinates as X,Y pairs (X,Y,X,Y,X,Y,X,Y: eight numbers, four corners), but it sorts the lines by the first X coordinate, not the first Y coordinate.

So for example:

Line 1: 179, 73, 767, 60, 770, 145, 181, 158
Line 2: 214, 257, 1328, 219, 1331, 306, 217, 344
Line 3: 185, 345, 1298, 350, 1297, 444, 184, 438
Line 9: 29, 1099, 1396, 1162, 1391, 1281, 24, 1218

The Vision API, however, returns line 9 first because it sorts by the first X coordinate. In reality, though, we read from top to bottom (Y, not X), so it should sort by the first Y coordinate.
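A minimal Python sketch of this workaround, assuming the eight numbers are four (x, y) corner pairs as worked out in this post. It pairs up the coordinates from the example, shows that ordering by the first X puts line 9 first, and then re-sorts by the first Y to get top-to-bottom order. The names `boxes`, `to_points` and the `by_*` lists are hypothetical, and the sketch does not claim to describe the API's own ordering logic.

```python
# Bounding boxes copied from the example above: eight numbers per text line.
boxes = {
    1: [179, 73, 767, 60, 770, 145, 181, 158],
    2: [214, 257, 1328, 219, 1331, 306, 217, 344],
    3: [185, 345, 1298, 350, 1297, 444, 184, 438],
    9: [29, 1099, 1396, 1162, 1391, 1281, 24, 1218],
}


def to_points(flat):
    """Split a flat [x1, y1, x2, y2, ...] list into (x, y) corner tuples."""
    if len(flat) % 2:
        raise ValueError("expected an even number of coordinates")
    return list(zip(flat[0::2], flat[1::2]))


points = {line: to_points(flat) for line, flat in boxes.items()}

# Ordering by the X of the first corner puts line 9 first, as described above.
by_first_x = sorted(points, key=lambda line: points[line][0][0])

# Ordering by the Y of the first corner gives top-to-bottom reading order.
by_first_y = sorted(points, key=lambda line: points[line][0][1])

# Alternative key: the Y of the topmost corner, which does not depend on
# which corner happens to be listed first.
by_min_y = sorted(points, key=lambda line: min(y for _, y in points[line]))

print(by_first_x)  # [9, 1, 3, 2]
print(by_first_y)  # [1, 2, 3, 9]
print(by_min_y)    # [1, 2, 3, 9]
```

Sorting on a single Y value works for this example, but it can mis-order lines on strongly slanted handwriting or multi-column layouts, where lines would first need to be grouped into rows or columns.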

Is there anywhere I can leave feedback for Microsoft to look at this?

Jake Dan Attis · Jul 10 2017

You could start by adding it to UserVoice! https://cognitive.uservoice.com/forums/430309-computer-vision

dakesh · Jun 06 2019

@Jake Dan Attis

I am using https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/read/core/asyncBatchAnalyze. Does the boundingbox in the response give { X top left, Y top left, X top right, Y top right, X bottom right, Y bottom right, X bottom left, Y bottom left }? I need to find x, y, height and width; please suggest.
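A minimal Python sketch for getting x, y, width and height out of an eight-value boundingbox, assuming (as in the accepted answer) that the values are four (x, y) corner pairs. The thread does not confirm the corner order for the Read endpoint, so the sketch takes the min and max of the X and Y values, which gives the smallest axis-aligned rectangle around the line whatever order the corners come in. `Rect` and `to_rect` are hypothetical names for illustration.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    x: int
    y: int
    width: int
    height: int


def to_rect(bounding_box: Sequence[int]) -> Rect:
    """Smallest axis-aligned rectangle enclosing an 8-value bounding box.

    Assumes the values are four (x, y) corner pairs. Because min/max is
    used, the order in which the corners are listed does not matter.
    """
    if len(bounding_box) != 8:
        raise ValueError("expected 8 values: four (x, y) pairs")
    xs = bounding_box[0::2]  # every even index is an X
    ys = bounding_box[1::2]  # every odd index is a Y
    left, top = min(xs), min(ys)
    return Rect(x=left, y=top, width=max(xs) - left, height=max(ys) - top)


print(to_rect([179, 73, 767, 60, 770, 145, 181, 158]))
# Rect(x=179, y=60, width=591, height=98)
```

For slanted text the enclosing rectangle is larger than the text itself; if the rotated box matters, keep the four corner points instead of collapsing them.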
