Reduction of HAMILTONIAN CIRCUIT to Perl Regular Expression Matching

Credit: Mike Rosulek

The HAMILTONIAN CIRCUIT Problem

A Hamiltonian Circuit in a graph G=(V,E) is a cycle that visits every vertex in V exactly once. The Hamiltonian Circuit problem is: given a graph as input, decide if it contain a Hamiltonian Circuit.

Transformation to Regex Match

Let's represent the vertices of the graph as numbers between 1 and $V. We also suppose that we have an adjacency list for the graph. The adjacency list contains [u, v] if the graph contains an edge uv. So for example, the following graph:

is represented as follows:

        $V = 5;
        @E = ([1,3],[1,5],[2,3],[2,4],[2,5],[4,5]);

Construct a temporary list of all valid possible edges. This list is used to check that the vertices chosen by the regular expression engine are all pairwise distinct:

@all_edges = map { my $x = $_; map { [$x, $_] } $x+1 .. $V } 1 .. $V-1;

Construct a string as follows:

$string = (join(' ', 1 .. $V) . "\n") x $V
        . "\n"
        . (join(' ', map { join "-", @$_ } @all_edges ) . "\n") x @all_edges
        . "\n"
        . (join(' ', map { join "-", @$_ } @E ) . "\n") x $V;

Now construct a regex as follows:

$regex = "^ "
       . ".* \\b (\\d+) \\b .* \\n\n" x $V 
       . "\\n\n"
       . join("", map { my ($x, $y) = @$_;
                        ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* \\n\n"
                      } @all_edges)
       . "\\n\n" 
       . join("", map { my ($x, $y) = ($_, $_+1);
                        ".* \\b (?: \\$x-\\$y | \\$y-\\$x ) \\b .* \\n\n"
                      } 1 .. ($V-1))
       . ".* \\b (?: \\$V-\\1 | \\1-\\$V ) \\b .* \\n \$\n";

Now the graph has a Hamiltonian Circuit if and only if

        $string =~ /$regex/x;

is true. If so, the backreference variables $1, $2, etc., will contain the vertices of V in the order visited by a Hamiltonian Circuit.

Example

Suppose the graph is as depicted above. Then the value of $string is as follows: (newlines are significant)

        1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
        1 2 3 4 5
        
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5 
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5
        1-2 1-3 1-4 1-5 2-3 2-4 2-5 3-4 3-5 4-5

        1-3 1-5 2-3 2-4 2-5 4-5
        1-3 1-5 2-3 2-4 2-5 4-5 
        1-3 1-5 2-3 2-4 2-5 4-5
        1-3 1-5 2-3 2-4 2-5 4-5
        1-3 1-5 2-3 2-4 2-5 4-5

and the value of $regex is: (whitespace in the regular expression is ignored by using the /x modifier)

      ^ .* \b (\d+) \b .* \n
        .* \b (\d+) \b .* \n
        .* \b (\d+) \b .* \n
        .* \b (\d+) \b .* \n
        .* \b (\d+) \b .* \n
        \n
        .* \b (?: \1-\2 | \2-\1 ) \b .* \n
        .* \b (?: \1-\3 | \3-\1 ) \b .* \n
        .* \b (?: \1-\4 | \4-\1 ) \b .* \n
        .* \b (?: \1-\5 | \5-\1 ) \b .* \n
        .* \b (?: \2-\3 | \3-\2 ) \b .* \n
        .* \b (?: \2-\4 | \4-\2 ) \b .* \n
        .* \b (?: \2-\5 | \5-\2 ) \b .* \n
        .* \b (?: \3-\4 | \4-\3 ) \b .* \n
        .* \b (?: \3-\5 | \5-\3 ) \b .* \n
        .* \b (?: \4-\5 | \5-\4 ) \b .* \n
        \n
        .* \b (?: \1-\2 | \2-\1 ) \b .* \n
        .* \b (?: \2-\3 | \3-\2 ) \b .* \n
        .* \b (?: \3-\4 | \4-\3 ) \b .* \n
        .* \b (?: \4-\5 | \5-\4 ) \b .* \n
        .* \b (?: \5-\1 | \1-\5 ) \b .* \n $

The test $string =~ /$regex/x succeeds, so the graph does contain a Hamiltonian Circuit. Additionaly, the match assigns to the special variables $1 through $5 the values (5, 4, 2, 3, 1), indicating the order in which the cycle visited all the vertices. For an example where the graph does not contain a Hamiltonian Circuit, remove the edge [2,4] from the list.

Conclusion

The Hamiltonian Circuit problem is NP-hard. If there were an efficient (polynomial-time) algorithm for computing whether or not a regex matched a certain string, we could use it to quickly compute solutions to the graph 3-colorability problem, and, by extension, to the knapsack problem, the travelling salesman problem, etc. etc.

For A More Detailed Discussion

See this thread on perlmonks.org. A slightly improved regex which uses lookaheads is also given.