Draft: Sudoku blog post

2025-01-21 01:56:54 +01:00
parent 1173d1c5c4
commit 0fe5cf0883
3 changed files with 166 additions and 0 deletions
--- a/site/content/posts/exploration-fun-and-process-cycles-of-sudoku.md
+++ b/site/content/posts/exploration-fun-and-process-cycles-of-sudoku.md
@@ -0,0 +1,166 @@
+---
+title: "Exploration, fun, and process cycles of Sudoku"
+date: 2025-01-20T23:33:06+01:00
+draft: true
+---
+
+# The idea
+I like to play games and play with puzzles. Even better: if I can automate solving one-off puzzles like Sudoku, I can write an algorithm once and never have to play them again.
+
+[Sudoku puzzles](https://en.wikipedia.org/wiki/Sudoku) have been around for a while and with them [algorithms to solve them](https://en.wikipedia.org/wiki/Sudoku_solving_algorithms). Most of them revolve around throwing numbers at the grid to see what sticks, applying intelligent guesswork, and eventually arriving at a solution. I wanted to take a different approach: brute-force all possible solutions and store them in a database. When I want a Sudoku puzzle solved, I just query the database and it returns all matching solutions.
+
+This idea has been in the back of my mind for close to a decade and late last year I decided to take a shot at it. I dusted off my trusty Go language skills because a) I wanted to learn the language a bit better, b) I wanted to use goroutines to easily (ab)use all my CPU cores in this new quest of mine, and c) I am terribad at math, so I am working with the tools I have.
+
+# Lay of the land
+Classic Sudoku puzzles have 9 blocks arranged in a 3x3 grid, with each block consisting of 3x3 digits and containing every digit from 1 to 9 exactly once. The puzzle setter provides a partially completed grid and it's up to you to solve it. A puzzle usually has only a single solution.
+
+Example puzzle:
+
+![Example Sudoku Puzzle](/static/Sudoku_Puzzle_by_L2G-20050714_standardized_layout.svg.png)
+
+( _Honestly stolen from Wikipedia._ )
+
+The first step is to come up with all these unique blocks, as these are the puzzle pieces I need to work with. What I did was the following:
+
+1. Iterate from the lowest possible number (`123456789`) to the highest possible number (`987654321`).
+2. Check that every digit from 1 to 9 is present exactly once.
+3. If the block is valid, print or store it.
+
+This resulted in [this file](https://gitea.ligthert.net/golang/sudoku-funpark/src/branch/trunk/blocks.csv). [Load the file](https://gitea.ligthert.net/golang/sudoku-funpark/src/branch/trunk/solver/blocks.go#L11-L34) into Go as a slice of ints and we can work with it from here on out. (This is faster than adding 363k lines to your source code that each append to a slice: after 20 minutes of compiling it still wasn't finished and I stopped it. So loading a CSV was faster.)
+
+Inspecting the file showed that it contained `362880` possible blocks. It was only later that I noticed that this is the same as `9!` (9 factorial, aka `9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1`). It wasn't entirely a surprise that the number 9 turned up in a mathy game like Sudoku, which is about 9 digits in a grid of 9 blocks. As far as I can tell this was the last time I encountered something 9 in the maths further down the line.
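+
+The three enumeration steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the linked repository: the names `isValidBlock` and `countValidBlocks` are made up for this example, and the demo only scans the numbers that start with `123` so that it finishes instantly.
+
+```go
+package main
+
+import "fmt"
+
+// isValidBlock reports whether n is a 9-digit number that uses
+// every digit from 1 to 9 exactly once.
+func isValidBlock(n int) bool {
+	seen := 0 // bit d is set once digit d has been seen
+	for i := 0; i < 9; i++ {
+		d := n % 10
+		if d == 0 || seen&(1<<d) != 0 {
+			return false // a zero or a repeated digit
+		}
+		seen |= 1 << d
+		n /= 10
+	}
+	return n == 0 // no digits left over
+}
+
+// countValidBlocks counts the valid blocks in the inclusive range [lo, hi].
+func countValidBlocks(lo, hi int) int {
+	count := 0
+	for n := lo; n <= hi; n++ {
+		if isValidBlock(n) {
+			count++
+		}
+	}
+	return count
+}
+
+func main() {
+	// Only blocks starting with 123: the other six digits can be
+	// arranged in 6! ways.
+	fmt.Println(countValidBlocks(123456789, 123999999))
+}
+```
+
+Scanning the full range from `123456789` to `987654321` the same way takes noticeably longer, since most of the roughly 864 million numbers in between are rejected.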
+
+# Making it fit
+The next step was writing code that would make a block like this:
+```
+123
+456
+789
+```
+next to a block like:
+```
+912
+345
+678
+```
+This became a mess, as I had to ensure that:
+1. The block was unique (done!)
+2. Every horizontal line was unique
+3. Every vertical line was unique
+
+The code for this became a lengthy headache, as I needed to work with multi-dimensional arrays and make sure elements 3, 4, and 5 did not clash with each other. I ended up with some kind of mapping to ensure that there was no overlap. It was tedious to design, create, and test, and very heavy on the processor to properly analyse everything.
+
+It was at this point I had an epiphany and realized that:
+1. Every block has to contain unique digits
+2. No column should have repeating digits
+3. No row should have repeating digits
+
+So, instead of comparing 3x3 grids, why not use rows to populate the puzzle? 
+
+```
+123456789
+912345678
+...etc...
+```
+
+There are benefits to this approach:
+1. It would not impact the end result, as 3 unique lines that do not violate the constraints of the puzzle will result in 3 unique and valid blocks in a 1x3 row.
+2. It is easier to compare rows than to compare a 3x3 block with its adjacent 3x3 blocks.
+3. It saves on precious process cycles, which ultimately would speed up the entire process.
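+
+To make the three constraints concrete, here is a small sketch of a validator for a completed grid stored as nine row strings. The function name `validGrid` and the bitmask bookkeeping are illustrative and not taken from the linked repository. The demo grid is a simple shifted-row pattern chosen for the example.
+
+```go
+package main
+
+import "fmt"
+
+// validGrid reports whether nine rows of nine digits form a solved Sudoku:
+// no digit repeats within any row, column, or 3x3 block.
+func validGrid(rows [9]string) bool {
+	var rowSeen, colSeen, blockSeen [9]uint16
+	for r, row := range rows {
+		if len(row) != 9 {
+			return false
+		}
+		for c := 0; c < 9; c++ {
+			d := row[c]
+			if d < '1' || d > '9' {
+				return false
+			}
+			bit := uint16(1) << (d - '0')
+			// Index of the 3x3 block this cell belongs to.
+			b := (r/3)*3 + c/3
+			if rowSeen[r]&bit != 0 || colSeen[c]&bit != 0 || blockSeen[b]&bit != 0 {
+				return false
+			}
+			rowSeen[r] |= bit
+			colSeen[c] |= bit
+			blockSeen[b] |= bit
+		}
+	}
+	return true
+}
+
+func main() {
+	grid := [9]string{
+		"123456789", "456789123", "789123456",
+		"234567891", "567891234", "891234567",
+		"345678912", "678912345", "912345678",
+	}
+	fmt.Println(validGrid(grid))
+	// Swap two digits in the first row: the row itself is still fine,
+	// but the first column now contains a repeated digit.
+	grid[0] = "213456789"
+	fmt.Println(validGrid(grid))
+}
+```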
+
+
+# Generate a (costly) solution
+With this piece of the puzzle, it was time to put this into practice and see what I would get out of it.
+
+I found a random, easy Sudoku puzzle and put it into my code:
+```
+	row1 := "769104802"
+	row2 := "154800060"
+	row3 := "002700150"
+	row4 := "600900308"
+	row5 := "045328670"
+	row6 := "328670945"
+	row7 := "597410280"
+	row8 := "006283090"
+	row9 := "200590006"
+```
+I substituted empty entries with a `0`, and used this to find possible substitutions with the remaining numbers.
+
+I will take `row1` as an example for the next bit:
+
+> row1 := "769104802"
+
+I wrote an algorithm that replaces the zeros with the remaining digits and finds all the candidates that could work for row one. With only two digits missing, this leaves me with two compatible entries:
+
+> 769134852
+> 769154832
+
+I put them into a slice and moved on to the next row, repeating this until all 9 rows had a slice of compatible candidates (`row1s`, `row2s`, `row3s`, etc.).
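+
+A rough sketch of that candidate step, assuming a row is a 9-character string with `0` marking the empty cells. The name `rowCandidates` is hypothetical and the linked repository may do this differently. Like the description above, it only looks at the row itself and ignores columns and blocks.
+
+```go
+package main
+
+import "fmt"
+
+// rowCandidates fills the zeros in row with every arrangement of the
+// digits that are still missing from it.
+func rowCandidates(row string) []string {
+	var used [10]bool
+	var holes []int // positions of the empty cells
+	for i := 0; i < len(row); i++ {
+		d := row[i] - '0'
+		if d == 0 {
+			holes = append(holes, i)
+		} else {
+			used[d] = true
+		}
+	}
+	var missing []byte // digits not yet present, as characters
+	for d := byte(1); d <= 9; d++ {
+		if !used[d] {
+			missing = append(missing, '0'+d)
+		}
+	}
+	buf := []byte(row)
+	var out []string
+	var fill func(k int)
+	fill = func(k int) {
+		if k == len(holes) {
+			out = append(out, string(buf))
+			return
+		}
+		for i, d := range missing {
+			if d == 0 {
+				continue // this digit is already placed in an earlier hole
+			}
+			missing[i] = 0
+			buf[holes[k]] = d
+			fill(k + 1)
+			missing[i] = d
+		}
+	}
+	fill(0)
+	return out
+}
+
+func main() {
+	for _, c := range rowCandidates("769104802") {
+		fmt.Println(c)
+	}
+}
+```
+
+A row with `n` empty cells yields `n!` candidates, so the sparser rows of a puzzle dominate the size of the search.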
+
+The next step was to combine all 9 slices, try every entry against every other, and validate every possible solution. This resulted in [a nesting 9 levels deep](https://gitea.ligthert.net/golang/sudoku-funpark/src/commit/16de7dda97747812eb99ef14088656e5f413b090/solver/processing.go#L45-L71):
+* Iterate through row1 and take an element
+* Iterate through row2 and take an element
+* Repeat 7 more times
+* Validate all 9 chosen elements
+
+If it validates, print the solution. If it doesn't, discard it and move on.
+
+This worked great. It took a poor single core on my computer only ~2.5 hours to solve a simple Sudoku puzzle.
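+
+The linked code spells this out as nine nested loops. The sketch below uses recursion instead, purely to keep the example short. `search` and the toy validator `noSharedColumns` are invented for this illustration, and the three-digit rows stand in for real Sudoku rows.
+
+```go
+package main
+
+import "fmt"
+
+// search walks the candidate lists depth-first, picking one entry per row,
+// and calls valid on every complete pick. It returns the picks that pass.
+func search(cands [][]string, valid func([]string) bool) [][]string {
+	var found [][]string
+	pick := make([]string, len(cands))
+	var walk func(level int)
+	walk = func(level int) {
+		if level == len(cands) {
+			if valid(pick) {
+				// Copy the pick, because the slice is reused.
+				found = append(found, append([]string(nil), pick...))
+			}
+			return
+		}
+		for _, c := range cands[level] {
+			pick[level] = c
+			walk(level + 1)
+		}
+	}
+	walk(0)
+	return found
+}
+
+// noSharedColumns is a toy validator: no two rows may hold the same
+// digit in the same column.
+func noSharedColumns(rows []string) bool {
+	for i := range rows {
+		for j := i + 1; j < len(rows); j++ {
+			for c := 0; c < len(rows[i]) && c < len(rows[j]); c++ {
+				if rows[i][c] == rows[j][c] {
+					return false
+				}
+			}
+		}
+	}
+	return true
+}
+
+func main() {
+	cands := [][]string{
+		{"123", "132"},
+		{"231", "213"},
+		{"312"},
+	}
+	for _, s := range search(cands, noSharedColumns) {
+		fmt.Println(s)
+	}
+}
+```
+
+Because validation only happens at the deepest level, every combination is built in full before it can be rejected, which goes a long way towards explaining the runtime.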
+
+# Goroutines and speed bumps
+This was probably the time to use goroutines: a handy way to give a function a task, run it somewhere in the background, and then spawn some more, ensuring that I use all my CPU cores and grind my poor computer to a halt.
+
+This was my first foray into serious goroutine usage, and I learnt something. I was fortunate that I could exploit an interface for sharing memory between goroutines, so I didn't need to resort to channels (which would add complexity and cost speed). I ran the validate step at the end of the 9th level of nesting as a goroutine and had thousands of goroutines running at the same time. This reduced the computation time from ~2.5 hours to ~1.5 hours. All things considered, it wasn't bad.
+
+Wanting to increase the brute-force performance, I tried to start goroutines at the 8th nesting level: one goroutine for every element in the row8 slice, each of which would in turn spawn a goroutine for every element in the row9 slice. With the number of simultaneously running goroutines in the 100s of thousands, all my cores were at 100%, my desktop was rendered useless, processing time went up, and overall this was a detrimental approach.
+
+My top wasn't too happy with this:
+
+![Top output](/static/sudokufunparktop.png)
+
+The lesson of this exercise was that I needed to put a brake on the goroutines and manage how many of them run at once.
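+
+One common way to put such a brake on goroutines is a buffered channel used as a counting semaphore. The sketch below is an assumption about how that could look, not what the linked repository does. `runBounded` is a made-up helper, and the peak counter is only there to show that the limit holds.
+
+```go
+package main
+
+import (
+	"fmt"
+	"runtime"
+	"sync"
+	"sync/atomic"
+)
+
+// runBounded calls work(i) for every i in [0, tasks) with at most limit
+// goroutines in flight, and returns the highest concurrency it observed.
+// limit must be at least 1.
+func runBounded(tasks, limit int, work func(i int)) int64 {
+	sem := make(chan struct{}, limit) // buffered channel as counting semaphore
+	var wg sync.WaitGroup
+	var running, peak int64
+	for i := 0; i < tasks; i++ {
+		sem <- struct{}{} // blocks while limit goroutines are still busy
+		wg.Add(1)
+		go func(i int) {
+			defer wg.Done()
+			defer func() { <-sem }() // free the slot when this task ends
+			n := atomic.AddInt64(&running, 1)
+			for { // record a new peak if this is one
+				p := atomic.LoadInt64(&peak)
+				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
+					break
+				}
+			}
+			work(i)
+			atomic.AddInt64(&running, -1)
+		}(i)
+	}
+	wg.Wait()
+	return atomic.LoadInt64(&peak)
+}
+
+func main() {
+	limit := runtime.NumCPU()
+	squares := make([]int, 100000)
+	peak := runBounded(len(squares), limit, func(i int) {
+		squares[i] = i * i // each task writes only its own slot
+	})
+	fmt.Println("limit:", limit, "peak goroutines in flight:", peak)
+}
+```
+
+Starting one goroutine per tiny task still has overhead. A fixed pool of workers reading jobs from a channel is the usual alternative when the tasks are this small.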
+
+
+# Further (possible) optimisation
+At this stage I had been at this for roughly a week and was happy with the intermediate results, but this could be optimized further. I haven't taken the time yet but intend to work on this in the future, so I would like to explain my thinking and possible solutions.
+
+Comparing rows like the ones below is costly:
+```
+123456789
+912345678
+...etc...
+```
+
+You compare every digit with the digits in the rows below it and do this for every possible solution. This is CPU intensive, which makes it a costly way to validate all possible solutions. To combat this I would like to compare something more abstract: since I already have a slice with blocks, I can use the indexes.
+
+I want to compare the slice of blocks with itself, validating two entries at a time and storing the incompatible pairs.
+
+1. Take blocks[1]
+2. Validate it with blocks[2]
+3. If it is invalid, store the pair
+4. Repeat step two, but with blocks[3]
+
+Once step 2 has been exhausted, replace step 1 with blocks[2], validate it with blocks[3], and work through the rest of the slice. Keep in mind that if blocks[2] and blocks[3] are incompatible, this also means that blocks[3] and blocks[2] are incompatible, so each pair only needs to be checked once. This would hopefully reduce the time required to process all possible combinations. (Otherwise I would need to make 362880 * 362880 = 131681894400 comparisons.)
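+
+A sketch of that pairwise pass, under one loud assumption: each entry is treated as a row, and two entries count as incompatible when they share a digit in the same column. The text above does not spell out the exact check, so `clash`, `incompatiblePairs`, and `isIncompatible` are hypothetical names for illustration.
+
+```go
+package main
+
+import "fmt"
+
+// clash reports whether two rows share a digit in the same column,
+// which rules them out as a pair wherever they sit in the grid.
+func clash(a, b string) bool {
+	for i := 0; i < len(a) && i < len(b); i++ {
+		if a[i] == b[i] {
+			return true
+		}
+	}
+	return false
+}
+
+// incompatiblePairs compares every entry with the entries after it and
+// records the clashing index pairs as [2]int{i, j} with i < j.
+func incompatiblePairs(rows []string) map[[2]int]bool {
+	pairs := make(map[[2]int]bool)
+	for i := 0; i < len(rows); i++ {
+		for j := i + 1; j < len(rows); j++ {
+			if clash(rows[i], rows[j]) {
+				pairs[[2]int{i, j}] = true
+			}
+		}
+	}
+	return pairs
+}
+
+// isIncompatible looks a pair up in either order.
+func isIncompatible(pairs map[[2]int]bool, i, j int) bool {
+	if i > j {
+		i, j = j, i
+	}
+	return pairs[[2]int{i, j}]
+}
+
+func main() {
+	rows := []string{"123456789", "912345678", "123456798"}
+	pairs := incompatiblePairs(rows)
+	fmt.Println(len(pairs), isIncompatible(pairs, 2, 0), isIncompatible(pairs, 0, 1))
+}
+```
+
+Checking each pair once means 362880 * 362879 / 2 = 65840765760 comparisons for the full list. A map like this would not fit in memory at that scale, so the sketch only shows the shape of the idea.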
+
+Once the set of invalid combinations has been established, I can iterate through the `(9!)^9` possible combinations (109110688415571316480344899355894085582848000000000).
+
+Why pairs of incompatible indices?
+
+Because I can render an abstract notation of a possible solution using blocks[] index numbers:
+> 123:345:910:789:684:24:738:182:102
+
+If I know that indices `123` and `910` are not compatible with each other, I can discard this potential solution and move on. It doesn't matter where in the possible solution these indices are placed: we know it will never validate.
+
+What I am not sure about is whether this is more efficient than brute-forcing the `(9!)^9` combinations: checking every possible solution against the set of invalid pairs may be just as costly, if not more.
+
+Although, once the set of incompatible pairs has been generated, it may be easier to run this on separate machines by giving each machine an index for the first row and letting it generate and compare the rest.
+
+🤔 Thinking about this a bit more, the ordering of possible solutions shouldn't really matter, which may speed the process up a bit.
+
+
+# The numbers
+It was somewhere at this stage I started looking into other solutions and numbers:
+* As you have seen earlier, combining 9 rows of 9! candidates each results in 109110688415571316480344899355894085582848000000000 combinations to check.
+* [Looking at Wikipedia again](https://en.wikipedia.org/wiki/Mathematics_of_Sudoku), there are 6670903752021072936960 possible solved Sudoku grids.
+* This is going to consume a lot of a) time, b) energy, and c) storage.
+
+The latter adds up quickly: storing every solution as a plain 81-byte string would require at least 540 zettabytes, let alone the upfront costs and infra required to host such a database.
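+
+The arithmetic behind these numbers is easy to reproduce with `math/big`. This is only a back-of-the-envelope sketch: the helper names are made up, the solution count is the figure quoted from Wikipedia above, and a zettabyte is taken to be 10^21 bytes.
+
+```go
+package main
+
+import (
+	"fmt"
+	"math/big"
+)
+
+// rowCombinations returns (9!)^9: nine rows with 9! candidates each.
+func rowCombinations() *big.Int {
+	fact9 := new(big.Int).MulRange(1, 9) // 9! = 362880
+	return new(big.Int).Exp(fact9, big.NewInt(9), nil)
+}
+
+// knownSolutions returns the number of solved grids quoted from Wikipedia.
+func knownSolutions() *big.Int {
+	n, _ := new(big.Int).SetString("6670903752021072936960", 10)
+	return n
+}
+
+// storageBytes returns the space needed at bytesPer bytes per solution.
+func storageBytes(solutions *big.Int, bytesPer int64) *big.Int {
+	return new(big.Int).Mul(solutions, big.NewInt(bytesPer))
+}
+
+// zettabytes converts bytes to whole zettabytes (10^21 bytes), rounding down.
+func zettabytes(b *big.Int) int64 {
+	zb := new(big.Int).Exp(big.NewInt(10), big.NewInt(21), nil)
+	return new(big.Int).Quo(b, zb).Int64()
+}
+
+func main() {
+	fmt.Println("row combinations:", rowCombinations())
+	total := storageBytes(knownSolutions(), 81)
+	fmt.Println("bytes:", total)
+	fmt.Println("zettabytes:", zettabytes(total))
+}
+```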
+
+# Conclusion
+I tried, I bit off more than I could chew, I learned a lot, and it was fun. I can sleep well knowing I did my best, challenged myself, and can cross something off my todo list that has been living rent-free in the back of my head for the better part of a decade.
--- a/site/content/static/Sudoku_Puzzle_by_L2G-20050714_standardized_layout.svg.png
+++ b/site/content/static/Sudoku_Puzzle_by_L2G-20050714_standardized_layout.svg.png
--- a/site/content/static/sudokufunparktop.png
+++ b/site/content/static/sudokufunparktop.png