Solving a sudoku is all about putting digits between 1 and 9 into a square, 9×9 grid, subdivided into 9 boxes. But, the value of a cell cannot be repeated among any of its peers.

In today’s blog, I’ll explain how to create a sudoku solver using OpenCV. The steps are very simple, take the sudoku image, solve it and project back the solution to the original image, that’s all!!

We will be dividing the whole process into 3 steps:

- Extract the sudoku from the image
- Extract each number present in the image
- Compute the solution of the sudoku using algorithm

# Extracting Sudoku from the image

We’ll first pre-process the image followed by cropping and warping it.

### Step-1: Pre-Processing the image

Firstly, we apply Gaussian blur with a kernel size of 9 to the image. Note that, kernel sizes must be odd, positive and square. Then, we use adaptive threshold using 11 nearest neighbor pixels.

proc = cv2.GaussianBlur(img.copy(), (9, 9), 0) proc = cv2.adaptiveThreshold(proc, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2) # Invert colours, so gridlines have non-zero pixel values. # Necessary to dilate the image, otherwise will look like erosion instead. proc = cv2.bitwise_not(proc, proc) kernel = np.array([[0., 1., 0.], [1., 1., 1.], [0., 1., 0.]],np.uint8) proc = cv2.dilate(proc, kernel)

### Step-2: Find the corners of the largest polygon

Moving ahead, we’ll find the 4 extreme corners of the largest contour in the image. We’ll find all the contours, sort by area in descending order and pick the one with the largest area.

contours, h = cv2.findContours(img.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) contours = sorted(contours, key=cv2.contourArea, reverse=True) polygon = contours[0] bottom_right, _ = max(enumerate([pt[0][0] + pt[0][1] for pt in polygon]), key=operator.itemgetter(1)) top_left, _ = min(enumerate([pt[0][0] + pt[0][1] for pt in polygon]), key=operator.itemgetter(1)) bottom_left, _ = min(enumerate([pt[0][0] - pt[0][1] for pt in polygon]), key=operator.itemgetter(1)) top_right, _ = max(enumerate([pt[0][0] - pt[0][1] for pt in polygon]), key=operator.itemgetter(1)) corners = [polygon[top_left][0], polygon[top_right][0], polygon[bottom_right][0], polygon[bottom_left][0]]

### Step-3: Crop and Warp Image

Next, we will crop and warp a rectangular section from an image into a square of similar size. The rectangle will be described by the top left, top right, bottom right and bottom left points.

We’ll describe a square with side of the calculated length, which is the new perspective we want to warp to. Next thing is to get the transformation matrix for skewing the image to fit a square by comparing the 4 before and after points. In the end, we will perform the transformation on the original image.

top_left, top_right, bottom_right, bottom_left = crop_rect[0], crop_rect[1], crop_rect[2], crop_rect[3] src = np.array([top_left, top_right, bottom_right, bottom_left], dtype='float32') side = max([ distance_between(bottom_right, top_right), distance_between(top_left, bottom_left), distance_between(bottom_right, bottom_left), distance_between(top_left, top_right) ]) dst = np.array([[0, 0], [side - 1, 0], [side - 1, side - 1], [0, side - 1]], dtype='float32') m = cv2.getPerspectiveTransform(src, dst) cropped = cv2.warpPerspective(img, m, (int(side), int(side)))

### Step-4: Infer grid from the square image

squares = [] side = img.shape[:1] side = side[0] / 9 for j in range(9): for i in range(9): p1 = (i * side, j * side) #Top left corner of a box p2 = ((i + 1) * side, (j + 1) * side) #Bottom right corner squares.append((p1, p2)) return squares

Here, we are inferring 81 cell grid from the square image. We swapped j and i, so the rectangles are stored in the list reading left-right instead of top-down.

### Step-5: Extract each digit

def extract_digit(img, rect, size): digit = cut_from_rect(img, rect) h, w = digit.shape[:2] margin = int(np.mean([h, w]) / 2.5) _, bbox, seed = find_largest_feature(digit, [margin, margin], [w - margin, h - margin]) digit = cut_from_rect(digit, bbox) w = bbox[1][0] - bbox[0][0] h = bbox[1][1] - bbox[0][1] if w > 0 and h > 0 and (w * h) > 100 and len(digit) > 0: return scale_and_centre(digit, size, 4) else: return np.zeros((size, size), np.uint8) digits = [] img = pre_process_image(img.copy(), skip_dilate=True) for square in squares: digits.append(extract_digit(img, square, size))

**extract_digit** is a function which extracts a digit (if exists) from a Sudoku square. It gets the digital box from the whole square, use the fill feature finding to get the largest feature in the middle of the box. We would expect to find a pixel belonging to the digit in the margin which is used to define an area in the middle. Next, we will scale and pad the digit so that it fits a square of the digit size we’re using for machine learning. Also, we have to ignore any small bounding boxes.

Now, we have the final pre-processed image of Sudoku. Now, we need to extract each digit from the image, store it in a matrix and calculate the solution of Sudoku by using backtracking algorithm.

## Extracting numbers present in the image

In next step, we have to extract each number from the image, identify the number and save it into a 2D matrix.

For digit recognition, we will be using a trained neural network over MNIST dataset containing 60,000 images of digits from 0 to 9. You can have a glimpse of my blog on classifying digits using neural network here.

We will load the pre-trained model and weights.

json_file = open('model.json', 'r') loaded_model_json = json_file.read() json_file.close() loaded_model = model_from_json(loaded_model_json) loaded_model.load_weights("model.h5")

Now, we’ll resize the image and split the image into 9×9 small images. Each small image will be of a digit from 1- 9. **identify_number **function takes an image of digit and predicts the digit in the image.

def identify_number(image): image_resize = cv2.resize(image, (28,28)) # For plt.imshow image_resize_2 = image_resize.reshape(1,1,28,28) # For input to model.predict_classes # cv2.imshow('number', image_test_1) loaded_model_pred = loaded_model.predict_classes(image_resize_2 , verbose = 0) return loaded_model_pred[0] sudoku = cv2.resize(sudoku, (450,450)) grid = np.zeros([9,9]) for i in range(9): for j in range(9): image = sudoku[i*50:(i+1)*50,j*50:(j+1)*50] if image.sum() > 25000: grid[i][j] = identify_number(image) else: grid[i][j] = 0 grid = grid.astype(int)

## Compute the solution of the Sudoku using Backtracking Algorithm

We will use Backtracking Algorithm to compute the solution of the Sudoku.

We’ll start by searching the grid to find an entry that is still unassigned. If found, the reference parameters (row, col) will be set to the location that is unassigned, and true will be returned. If no unassigned entries remain, false is returned. ‘l’ is a list variable that has been passed from the solve_sudoku function to keep track of incrementation of rows and columns.

def find_empty_location(arr,l): for row in range(9): for col in range(9): if(arr[row][col]==0): l[0]=row l[1]=col return True return False

**used_in_row** function checks whether any assigned entry in the specified row matches the given number.

def used_in_row(arr,row,num): for i in range(9): if(arr[row][i] == num): return True return False

Function **used_in_col** checks whether any assigned entry in the specified column matches the given number.

def used_in_col(arr,col,num): for i in range(9): if(arr[i][col] == num): return True return False

The **used_in_box** function returns a boolean which indicates whether any assigned entry within the specified 3×3 box matches the given number.

def used_in_box(arr,row,col,num): for i in range(3): for j in range(3): if(arr[i+row][j+col] == num): return True return False

The function **check_location_is_safe** checks whether it will be legal to assign num to the given (row, col) location. Check if ‘num’ is not already placed in the current row, current column and current 3×3 box.

def check_location_is_safe(arr,row,col,num): return not used_in_row(arr,row,num) and not used_in_col(arr,col,num) and not used_in_box(arr,row - row%3,col - col%3,num)

The function **solve_sudoku** takes a partially filled-in grid as argument and attempts to assign values to all unassigned locations in a way that meets the requirements for Sudoku solution (non-duplication across rows, columns, and boxes). ‘l’ is a list variable that keeps the record of row and col in find_empty_location function.

The function **print_grid** helps in printing the solved Sudoku grid.

def solve_sudoku(arr): l=[0,0] if(not find_empty_location(arr,l)): return True row=l[0] col=l[1] for num in range(1,10): if(check_location_is_safe(arr,row,col,num)): arr[row][col]=num if(solve_sudoku(arr)): return True # failure, unmake & try again arr[row][col] = 0 return False def print_grid(arr): for i in range(9): for j in range(9): print (arr[i][j]) print ('\n')

Lastly, we will call the below function to solve the sudoku.

def sudoku_solver(grid): if(solve_sudoku(grid)): print('---') else: print ("No solution exists") grid = grid.astype(int) return grid

### Implementation

# Conclusion

So, we finally managed to solve a Sudoku grid from the image input. The goal, of course, was to ramp up on some new technologies and the project has been valuable from that perspective. The source code is available on Github. I have used Aakash Jhawar code for reference.

*Thanks!*

–*Krishna Pal Deora*

## 0 Comments