# algorithm - Optimizing an R Boggle Solver

Forenote: this is a follow-up question to this one.

I've programmed a Boggle Game Solver in R (see this github page for source code), and find its performance disappointing.

Here's how it works...

# Say we have the following set of letters

bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o",

"l", "e", "r", "o", "c", "f", "i", "e")

# We get the list of paths (permutations) from a pre-existing list

paths <- paths.by.length[[6]] # 6th element corresponds to 8-element "paths"

dim(paths) # [1] 183472 8

# The following function is the key here,

# mapping the 183,472 combinations to the 16 letters

candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))

# The only remaining thing is to intersect the candidate words

# with the actual words from our dictionary

dict.words <- dict.fr\$mot[dict.fr\$taille == 8]

valid.words <- intersect(candidates, dict.words)

Reproducible example for 13-letter words candidates

bog.letters <- c("t", "e", "n", "s", "d", "a", "i", "o", "l", "e", "r", "o", "c", "f", "i", "e")

n.letters <- 13

paths <- structure(list(V1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), V2 = c(2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2), V3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3,

3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3),

V4 = c(4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,

4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), V5 = c(7, 7, 7, 7,

7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7,

7, 7, 7, 7, 7, 7, 7), V6 = c(6, 6, 6, 6, 6, 6, 6, 6, 6, 6,

6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,

6), V7 = c(5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,

5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), V8 = c(9, 9, 9,

9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,

9, 9, 9, 9, 9, 9, 9, 9), V9 = c(10, 10, 10, 10, 10, 10, 10,

10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,

10, 10, 10, 10, 10, 10, 10, 10), V10 = c(11, 11, 11, 11,

11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 13, 13, 13, 13,

13, 13, 13, 14, 14, 14, 14, 14, 14, 14, 14), V11 = c(8, 8,

12, 12, 12, 14, 14, 15, 15, 15, 15, 16, 16, 16, 16, 14, 14,

14, 14, 14, 14, 14, 11, 11, 11, 11, 11, 11, 11, 11), V12 = c(12,

12, 15, 15, 16, 15, 15, 12, 12, 14, 16, 12, 12, 15, 15, 11,

11, 11, 11, 15, 15, 15, 8, 12, 12, 12, 15, 15, 16, 16), V13 = c(15,

16, 14, 16, 15, 12, 16, 8, 16, 13, 12, 8, 15, 12, 14, 8,

12, 15, 16, 11, 12, 16, 12, 8, 15, 16, 12, 16, 12, 15)), .Names = c("V1",

"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11",

"V12", "V13"), row.names = c(NA, 30L), class = "data.frame")

candidates <- apply(X = paths, MARGIN = 1, FUN = function(x) paste(bog.letters[x], collapse=""))

For such a small path list, this is pretty fast. But the actual number of paths for 13-letter words is 2,644,520. So it can take a minute or even more to find all candidates. Using doSNOW, I am able to parrallelize the searches, reducing the total time by a significant amount, but there is a huge drawback to this: when using a normal loop, I can exit/break whenever I reach the point where no more words are found. This is not obvious (impossible?) to do with parrallel processes.

So my question is: can you think of a better function/algorithm for this task? Some websites provide solutions to Boggle game in a matter of seconds... Either they generated all possible letter combinations and stored the results in a database (!), else they clearly use a better algorithm (and probably a compiled language) to achieve those results.

Any ideas?

Using cpp_str_split function from the Rcpp Gallery, running time is now reduced to 3secs for 2644520 paths.

library(stringi)
paths <- data.frame(matrix(sample(1:16, 13*2644520, TRUE), ncol=13))
a1 <- stri_c(bog.letters[t(as.matrix(paths))], collapse="")
candidates <- cpp_str_split(a1, 13)[[1]]

For 2644520 paths, apply approach takes about 80secs on my notebook.