Perspective correction with Qt

I'm developing an OCR app for Sailfish OS called Textractor. There was a clear need for a cropping feature. Cropping by itself doesn't do much for the OCR process, but what if we combine it with perspective correction?

I didn't want to add new dependencies (OpenCV) to the project (there are already Tesseract OCR, Leptonica and libexif) so I had to figure this out in pure Qt. Turns out it was actually pretty easy.

Let's say we have a picture of some target taken at an angle. The text in the target is not straight, and in addition it's distorted because the camera's optical axis was not perpendicular to the target. Running the OCR process with normal settings (some preprocessing which binarizes the image and fixes skew) will probably result in garbage output since the text is distorted.

This distortion can be fixed with some matrix transformations.

Textractor is a mobile app running on touch screen devices, so we can simply show the image to the user in the UI. After this it's just a matter of dragging the four corner points of the new image to the desired quadrilateral arrangement.

[Image: cropping]

The next step is to pass the selected points from the QML side to the C++ side. Let's just assume that we already have the corner points on the C++ side. Now we can calculate the size of the new image:

QImage img(path);
qreal width = 0;
qreal height = 0;

// Get the lines limiting the new image area
QLineF topLine(points.value("topLeft"), points.value("topRight"));
QLineF bottomLine(points.value("bottomLeft"), points.value("bottomRight"));
QLineF leftLine(points.value("topLeft"), points.value("bottomLeft"));
QLineF rightLine(points.value("topRight"), points.value("bottomRight"));

// Select the longest lines
if(topLine.length() > bottomLine.length()) {
    width = topLine.length();
} else {
    width = bottomLine.length();
}

if(leftLine.length() > rightLine.length()) {
    height = leftLine.length();
} else {
    height = rightLine.length();
}

The code above creates the lines limiting the new image area and selects the longest lines by comparing top line with bottom line and left line with right line. The points variable contains the selected corner points in a QMap (QMap<QString, QPointF>).
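The same size calculation can be written more compactly with `std::max`. The following is a sketch in plain C++ without Qt, so that it compiles on its own; `Point`, `Size`, `edgeLength` and `targetSize` are hypothetical names, and `edgeLength` computes the same Euclidean length that `QLineF::length()` would give for the line between two points.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Point { double x, y; };          // stand-in for QPointF
struct Size { double width, height; };  // stand-in for QSizeF

// Euclidean distance between two corner points.
double edgeLength(Point a, Point b)
{
    return std::hypot(b.x - a.x, b.y - a.y);
}

// Width is the longer of the top and bottom edges,
// height is the longer of the left and right edges.
Size targetSize(Point topLeft, Point topRight, Point bottomLeft, Point bottomRight)
{
    const Size size = {
        std::max(edgeLength(topLeft, topRight), edgeLength(bottomLeft, bottomRight)),
        std::max(edgeLength(topLeft, bottomLeft), edgeLength(topRight, bottomRight))
    };
    // A degenerate selection would produce an empty image.
    assert(size.width > 0 && size.height > 0);
    return size;
}
```

In the Qt code the equivalent would be to take the larger of the two `QLineF` lengths for each direction.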

With this information we can create the two polygons for the transformation method to get the matrix:

// Create the QPolygonF containing the corner points
// in user specified quadrilateral arrangement
QPolygonF fromPolygon;
fromPolygon << points.value("topLeft");
fromPolygon << points.value("topRight");
fromPolygon << points.value("bottomRight");
fromPolygon << points.value("bottomLeft");

// target polygon
QPolygonF toPolygon;
toPolygon << QPointF(0, 0);
toPolygon << QPointF(width, 0);
toPolygon << QPointF(width, height);
toPolygon << QPointF(0, height);

QTransform transform;
// create the matrix
bool success = QTransform::quadToQuad(fromPolygon, toPolygon, transform);

if (!success) {
    qDebug() << "Could not create the transformation matrix.";
    return;
}

This piece of code first creates fromPolygon, which is the quadrilateral selected by the user. Then it creates the target polygon from the width and height of the new image calculated earlier. Finally, it creates the transformation matrix by passing both polygons to the QTransform::quadToQuad method.
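For the curious, the math behind a quad-to-quad mapping can be sketched in plain C++. This is not Qt's implementation, just one textbook way to get such a matrix: four point pairs give eight linear equations, which are solved here with Gauss-Jordan elimination. All names (`Point`, `Homography`, `solveHomography`, `apply`) are hypothetical, and the sketch assumes the last matrix element can be normalised to 1.

```cpp
#include <array>
#include <cmath>
#include <utility>

struct Point { double x, y; };   // stand-in for QPointF

// Coefficients {a, b, c, d, e, f, g, h} of the mapping
//   x' = (a*x + b*y + c) / (g*x + h*y + 1)
//   y' = (d*x + e*y + f) / (g*x + h*y + 1)
using Homography = std::array<double, 8>;

// Find the mapping that sends from[i] to to[i] for i = 0..3.
// Returns false if the system is numerically singular,
// for example when the source points are collinear.
bool solveHomography(const std::array<Point, 4> &from,
                     const std::array<Point, 4> &to,
                     Homography &out)
{
    // Augmented 8x9 matrix: two equations per point pair.
    double m[8][9] = {};
    for (int i = 0; i < 4; ++i) {
        const double x = from[i].x, y = from[i].y;
        const double u = to[i].x, v = to[i].y;
        double *r1 = m[2 * i];
        double *r2 = m[2 * i + 1];
        r1[0] = x; r1[1] = y; r1[2] = 1; r1[6] = -x * u; r1[7] = -y * u; r1[8] = u;
        r2[3] = x; r2[4] = y; r2[5] = 1; r2[6] = -x * v; r2[7] = -y * v; r2[8] = v;
    }

    // Gauss-Jordan elimination with partial pivoting.
    for (int col = 0; col < 8; ++col) {
        int pivot = col;
        for (int row = col + 1; row < 8; ++row)
            if (std::fabs(m[row][col]) > std::fabs(m[pivot][col]))
                pivot = row;
        if (std::fabs(m[pivot][col]) < 1e-12)
            return false;
        for (int k = 0; k < 9; ++k)
            std::swap(m[col][k], m[pivot][k]);
        for (int row = 0; row < 8; ++row) {
            if (row == col)
                continue;
            const double factor = m[row][col] / m[col][col];
            for (int k = col; k < 9; ++k)
                m[row][k] -= factor * m[col][k];
        }
    }
    for (int i = 0; i < 8; ++i)
        out[i] = m[i][8] / m[i][i];
    return true;
}

// Apply the mapping to a single point.
Point apply(const Homography &h, Point p)
{
    const double w = h[6] * p.x + h[7] * p.y + 1.0;
    return { (h[0] * p.x + h[1] * p.y + h[2]) / w,
             (h[3] * p.x + h[4] * p.y + h[5]) / w };
}
```

Like quadToQuad, the function reports failure through its return value instead of producing a matrix for a degenerate selection. In the actual app there is no reason to do this by hand, since QTransform already provides it.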

Applying this matrix to the image does not yet give the desired result. The resulting image contains areas that are not part of the area the user selected, so we have to calculate the correct area and crop it from the image.

[Image: transformed]

// The resulting image has to be cropped by transferring the original image 
// top left, top right and bottom left coordinates to the transformed image coordinates.
// After this the crop offset can be calculated.
QPoint tl = transform.map(QPoint(0, 0));
QPoint bl = transform.map(QPoint(0, img.height()));
QPoint tr = transform.map(QPoint(img.width(), 0));

// execute the transform
img = img.transformed(transform);

int x;
int y;

// The points are "outside" the new image area hence the minus sign.
// Select the topLeft point which has largest x and y values.
if(-tl.x() > -bl.x()) {
    x = -tl.x();
} else {
    x = -bl.x();
}

if(-tr.y() > -tl.y()) {
    y = -tr.y();
} else {
    y = -tl.y();
}

// This is the top left coordinate of the crop area.
qDebug() << x << y;

// Finally, copy the area from the transformed image.
img = img.copy(x, y, width, height);
img.save(path, "jpg", 100);

We use the map method of QTransform to map the corner points of the original image into the coordinate space of the new image. The coordinates we are interested in are negative because they lie outside the area of the target image. We get the top left point of the crop area by selecting the largest of the negated x and y values. Then we simply use QImage's copy method to copy the final area, which is defined by that top left point and the width and height calculated earlier.
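The offset calculation can be generalised a little by looking at all four mapped corners instead of three. The QImage::transformed documentation says the matrix is internally adjusted so that the result is the smallest image containing all transformed points, which means the crop offset is the negated top left corner of the bounding box of the mapped corners. The sketch below is plain C++ with hypothetical names (`PointF`, `Point`, `cropOffset`); rounding the bounding box down to whole pixels is my assumption about how the image is aligned.

```cpp
#include <algorithm>
#include <array>
#include <cmath>

struct PointF { double x, y; };   // stand-in for QPointF
struct Point { int x, y; };       // stand-in for QPoint

// Given the four corners of the original image after mapping them through
// the transformation, return the top left point of the crop area in the
// transformed image.
Point cropOffset(const std::array<PointF, 4> &mappedCorners)
{
    double minX = mappedCorners[0].x;
    double minY = mappedCorners[0].y;
    for (const PointF &p : mappedCorners) {
        minX = std::min(minX, p.x);
        minY = std::min(minY, p.y);
    }
    // Assumption: the transformed image starts at the bounding box's top left
    // corner rounded down to whole pixels, so the selection's origin (0, 0)
    // ends up at minus that position.
    return { -static_cast<int>(std::floor(minX)),
             -static_cast<int>(std::floor(minY)) };
}
```

When the left-hand corners have the smallest x and the top corners the smallest y, this gives the same kind of result as the comparison of tl, bl and tr above, and it also covers selections where another corner ends up furthest out.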

This is the final cropped & preprocessed image:

[Image: preprocessed]

It should now be pretty easy to combine these pieces into a function. Note that the polygon points most likely have to be in the correct order (top left, top right, bottom right, bottom left) for this to work properly.
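If the points might arrive in arbitrary order, one possible way to sort them is the heuristic sketched below, again in plain C++ with hypothetical names (`Point`, `orderCorners`). It relies on image coordinates, where y grows downwards, and it is only an assumption that it suits this use case: it works for selections that are roughly upright, but can pick the wrong corners for a quadrilateral rotated close to 45 degrees.

```cpp
#include <algorithm>
#include <array>

struct Point { double x, y; };   // stand-in for QPointF

// Reorder four corner points to: top left, top right, bottom right, bottom left.
// Heuristic: the top left corner has the smallest x + y, the bottom right the
// largest x + y, the top right the largest x - y and the bottom left the
// smallest x - y.
std::array<Point, 4> orderCorners(const std::array<Point, 4> &pts)
{
    auto bySum = [](Point a, Point b) { return a.x + a.y < b.x + b.y; };
    auto byDiff = [](Point a, Point b) { return a.x - a.y < b.x - b.y; };
    std::array<Point, 4> ordered = {{
        *std::min_element(pts.begin(), pts.end(), bySum),   // top left
        *std::max_element(pts.begin(), pts.end(), byDiff),  // top right
        *std::max_element(pts.begin(), pts.end(), bySum),   // bottom right
        *std::min_element(pts.begin(), pts.end(), byDiff)   // bottom left
    }};
    return ordered;
}
```

The ordered points could then be appended to fromPolygon in that sequence, matching the order used for toPolygon.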