Forum Discussion

David Enright
Copper Contributor
May 22, 2017

Computer Vision API - OCR bounding boxes

I'm building an API for a customer that leverages computer vision to analyse images. I am trying to get it to analyse handwriting on a whiteboard.

 

When I upload my test image to my API, the JSON response from the Computer Vision API seems to have the lines all jumbled up. For example, the 8th line of text comes back as the first line in the array, and so on.

When I upload the image into the Microsoft computer vision website, the JavaScript applet returns the lines in the correct order.

 

I am assuming the line order can be inferred somehow from the bounding box coordinates, but I am struggling to find the pattern.

 

The documentation seems to indicate that the bounding boxes are x-y coordinates, but are they ordered XXXX YYYY, or XY XY XY XY?

Any ideas on how I can recover the correct line order from the Computer Vision JSON response?

  • David Enright
    Copper Contributor

    I worked it out.

     

    The API gives back each bounding box as four X,Y pairs (X,Y,X,Y,X,Y,X,Y), but it sorts the lines based on the first X coordinate, not the first Y coordinate.

     

    So for example:

     

    Line 1: 179, 73, 767, 60, 770, 145, 181, 158
    Line 2: 214, 257, 1328, 219, 1331, 306, 217, 344
    Line 3: 185, 345, 1298, 350, 1297, 444, 184, 438
    Line 9: 29, 1099, 1396, 1162, 1391, 1281, 24, 1218

     

    The Vision API, however, is returning line 9 first because it's sorting by the first X coordinate. In reality, though, we read from top to bottom (Y, not X), so it should be sorting by the first Y.
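    In the meantime, the lines can be re-sorted on the client side. Below is a minimal Python sketch, assuming each line's bounding box has already been parsed from the JSON into a flat list of eight integers in X,Y,X,Y,X,Y,X,Y order. The helper names `to_points` and `sort_top_to_bottom` are made up for illustration, and the input order of `boxes` is only an example (I only know for certain that line 9 came back first). It sorts by the first Y value, using the first X as a tie-breaker.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def to_points(box: List[int]) -> List[Point]:
    """Group a flat [x1, y1, x2, y2, ...] list into (x, y) pairs."""
    if len(box) % 2 != 0:
        raise ValueError("expected an even number of coordinates")
    return list(zip(box[0::2], box[1::2]))


def sort_top_to_bottom(boxes: List[List[int]]) -> List[List[int]]:
    """Return the boxes ordered by first Y, then first X (hypothetical helper)."""
    def key(box: List[int]) -> Tuple[int, int]:
        first_x, first_y = to_points(box)[0]
        return (first_y, first_x)
    return sorted(boxes, key=key)


# The four boxes quoted above, in an illustrative "sorted by first X" order.
boxes = [
    [29, 1099, 1396, 1162, 1391, 1281, 24, 1218],   # line 9
    [179, 73, 767, 60, 770, 145, 181, 158],         # line 1
    [185, 345, 1298, 350, 1297, 444, 184, 438],     # line 3
    [214, 257, 1328, 219, 1331, 306, 217, 344],     # line 2
]

ordered = sort_top_to_bottom(boxes)
for box in ordered:
    print(box)
```

    Sorting on a single corner is the simplest thing that matches what I described above. If the writing is heavily slanted, the smallest Y across all four corners may be a better key, but I haven't tested that.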

     

    Is there anywhere I can leave feedback for Microsoft to look at this?
