Nickname for "Floyd's Cycle-Finding Algorithm" invented by Robert W. Floyd in 1967; see my update to the wikipedia entry http://en.wikipedia.org/wiki/Floyd's_cycle-finding_algorithm.
A simple technique for detecting infinite loops inside lists, symlinks, state machines, etc.
For example, to detect a loop inside a linked list:
- make a pointer to the start of the list and call it the hare
- make a pointer to the start of the list and call it the tortoise
- advance the hare by two and the tortoise by one for every iteration
- there's a loop if the tortoise and the hare meet
- there's no loop if the hare reaches the end
This algorithm is particularly interesting because it is O(n) time and O(1) space, which appears to be optimal; more obvious algorithms are O(n) space.
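Here is a minimal runnable sketch of the list version in Python; the Node class and field names are just illustrative assumptions, not part of the algorithm:

  class Node:
      def __init__(self, value, next=None):
          self.value = value
          self.next = next                # None marks the end of the list

  def has_loop(head):
      tortoise = hare = head
      while hare is not None and hare.next is not None:
          hare = hare.next.next           # hare advances by two
          tortoise = tortoise.next        # tortoise advances by one
          if hare is tortoise:            # the pointers meet: there's a loop
              return True
      return False                        # hare reached the end: no loop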
There is an alternative version in which the hare races ahead for 2^n steps, then the tortoise teleports to him and n is increased by 1. More efficient (lower multiplicative constant) and only slightly more complex to code.
Algorithm A, the original:
  def algorithm_A(head):
      # returns the length of a finite list, or infinity if there is a loop;
      # step(node) is assumed to follow the node's next link
      tortoise = head
      rabbit = head
      length = 0
      if rabbit is None: return length
      rabbit, length = step(rabbit), length + 1
      if rabbit is None: return length
      while rabbit is not tortoise:
          rabbit, length = step(rabbit), length + 1
          if rabbit is None: return length
          rabbit, length = step(rabbit), length + 1
          if rabbit is None: return length
          tortoise = step(tortoise)
      return float('inf')                 # rabbit caught the tortoise: loop
Algorithm B, the leaping version:
  def algorithm_B(head):
      # rabbit races ahead in runs of doubling length; tortoise only teleports
      tortoise = head
      rabbit = head
      length, gap = 0, 0
      if rabbit is None: return "Finite list of length " + str(length)
      rabbit, length, gap = step(rabbit), length + 1, gap + 1
      if rabbit is None: return "Finite list of length " + str(length)
      limit = 1
      while rabbit is not tortoise:
          if gap == limit: tortoise, gap, limit = rabbit, 0, limit * 2
          rabbit, length, gap = step(rabbit), length + 1, gap + 1
          if rabbit is None: return "Finite list of length " + str(length)
      return "List with loop of size " + str(gap)
In each, length is how long the list is so far, as traversed by the rabbit. In the first, if the rabbit takes 2n steps, the tortoise takes n steps. In the second, the tortoise never takes any steps, only teleports. In linked lists, stepping to the next node isn't expensive, but this loop-finding algorithm can be used in places where the list is implicit (such as the Pollard Rho and Elliptic Curve methods of factoring integers) and stepping is expensive. Algorithm B then wins by a substantial margin.
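For instance, here is Algorithm B run over an implicit list in the style of the Pollard Rho method, where a "node" is just an integer and stepping means evaluating a function; the particular f and modulus are arbitrary illustrative choices, and no end-of-list checks are needed because the implicit list never terminates:

  def brent_cycle_length(f, x0):
      # Algorithm B on the implicit list x0, f(x0), f(f(x0)), ...
      tortoise, rabbit = x0, f(x0)
      gap, limit = 1, 1
      while rabbit != tortoise:
          if gap == limit:                      # teleport: no call to f
              tortoise, gap, limit = rabbit, 0, limit * 2
          rabbit, gap = f(rabbit), gap + 1      # the only expensive operation
      return gap                                # the size of the loop

  # iterating x -> (x*x + 1) mod 1009 from 2 must eventually cycle
  print(brent_cycle_length(lambda x: (x * x + 1) % 1009, 2))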
Here's the complete analysis:
Suppose there is no loop and the list is of length 2n. Then algorithm A takes 3n steps, and algorithm B takes 2n steps. Now suppose there is a loop, and let t be the number of steps taken to reach the loop and let p be the size of the loop.
Algorithm A:
- If t<=p then steps_A = 3p
- If t>=p then let d be such that d+t=k*p. Note that 0<=d<p.
- Then steps_A = 3t + 3d, so that 3t <= steps_A <= 3(t+p-1)
Algorithm B:
- If t<=p then let P=2^n such that P <= p < 2P, and let e=2P-p, so that e<P<=p.
- Then steps_B = 2P + p = 2p+e <= 3p = steps_A
- If t>=p then let Q=2^m such that Q <= t < 2Q, and let f=2Q-t, so that f<Q<=t.
- Then steps_B = 2Q+p = t+p+f <= 3t <= steps_A
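For a concrete instance of the t>=p case, take t=10 and p=4. For algorithm A, d=2 (since 10+2 = 3*4), so steps_A = 3(10+2) = 36. For algorithm B, Q=8 and f = 16-10 = 6, so steps_B = 16+4 = 20 = 10+4+6, comfortably below 3t = 30.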
Therefore algorithm B never takes more steps than algorithm A and almost always takes fewer. Algorithm B is significantly harder to analyse, and should perhaps be introduced second in a teaching context.
A comment about the tortoise in the second version: copying the rabbit's position into the tortoise (aka teleporting the tortoise to the rabbit) is about the same complexity as following a link in a linked list. However, the teleportation is far less expensive than a step in the context of integer factoring algorithms.
It occurs to me that I don't see a rationale for doubling being the most efficient limit change. Assuming, at least, that a multiplicative change is most efficient, a heuristic argument would be that multiplying by e would be best, since infinity is too much and (1 + infinitesimal) is too small, and e is the usual minimum of such things.
- Do you mean e as in the number whose natural logarithm is 1? Of course. One wonders what a graph of depth e would look like. Perhaps one could do it in CeePlusPlus with a few WildPointers......
- The constant e arises as a limit of an infinite process, naturally; any finite subsequence of such a process would involve a rational approximation to e rather than e itself. Thus a discrete data structure with depth 2 or depth 3 is such an approximation. Since we're talking about iteration, more exactly we're talking about approximating powers of e, such as e^2 ~= 7, e^3 ~= 20, e^4 ~= 55, etc, whereas the original algorithm was based on powers of two, 2^2 = 4, 2^3 = 8, 2^4 = 16, etc.
- Another option is to have more than two animals in the race. Imagine a cheetah, a hare, a cat, a penguin, and a tortoise (each traversing at a different speed, in order to maximize the chances of a speedy meeting). Any meeting between any two critters would signify a loop. Analysis of this algorithm is left as an exercise for the reader. :)
- That's similar to what I'm talking about, since the overhead increases as the number of animals increases, yet the chances of a meeting also increase. An infinite number of animals would have infinite overhead, so that's too much...
Do the analysis. I've done the *2 case; surely you can work out what gives the minimum expected time.
That would be fair. But you're probably the better mathematician of us two, and it is typical of me to be lazy like this: to offer a heuristic motivation rather than an actual proof. :-) (This is undoubtedly how I ended up in the commercial world rather than academia. I do appreciate the importance of proofs, but I tend to suffer from the FrameProblem: getting tangled up in possible issues that an expert mathematician knows can simply be ignored.)
Well, if t is really small then we want the multiplier to be really big, and if t is really big then we want the multiplier to be really small. It depends on the assumptions you make about how the cycles have arisen, and therefore what the likely relationship is. For practical purposes a multiplier of 2 would seem a very good choice.
- Sure. The point I was trying to make is, what if you have no idea about the likely relationships? I basically was speculating that the limit tends towards e (yes, 2.718281828459045..., that e) over all random problems of arbitrary tail and cycle lengths. Now that I've said it more explicitly, let me modify that speculation to exp(N) in the number of nodes. This is just because e or exp(N) is very frequently involved in such limits, for instance, in a very precise sense, e is the optimum number base. But I haven't thought about the proof in the intervening months, I suppose I should. Or maybe doing some Monte Carlo runs would be more fun. :-)
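In that spirit, a minimal Monte Carlo sketch in Python; the distributions of tail length t and cycle size p below are arbitrary choices, and the limit is rounded and forced upward so that non-integer multipliers still make progress:

  import math, random

  def leaping_steps(t, p, mult):
      # count step() calls until the rabbit meets the teleporting tortoise
      def step(i):                              # rho-shaped list of integers:
          return i + 1 if i + 1 < t + p else t  # tail 0..t-1, cycle t..t+p-1
      tortoise, rabbit = 0, step(0)
      steps, gap, limit = 1, 1, 1
      while rabbit != tortoise:
          if gap == limit:
              tortoise, gap = rabbit, 0
              limit = max(limit + 1, int(limit * mult))
          rabbit, steps, gap = step(rabbit), steps + 1, gap + 1
      return steps

  random.seed(1)
  trials = [(random.randint(0, 999), random.randint(1, 1000)) for _ in range(10000)]
  for mult in (1.5, 2.0, math.e, 3.0, 4.0):
      avg = sum(leaping_steps(t, p, mult) for t, p in trials) / len(trials)
      print("multiplier %.3f: average %.1f steps" % (mult, avg))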
Is there an analog to this algorithm for detecting cycles in an arbitrary graph?
If you have a deterministic algorithm for iterating through the nodes in your graph then just follow it and you'll find a cycle or a dead-end. If you find a dead-end you don't know that there are no cycles. If your graph can't be iterated in a deterministic manner then I think you're out of luck.
You can always find cycles by taking the n-th power of the adjacency matrix and looking at the diagonal.
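A quick sketch of that in Python with numpy; the example graph is made up, and the check uses the fact that entry (i,i) of the k-th power counts closed walks of length k through node i:

  import numpy as np

  # adjacency matrix of a directed 3-node graph: 0 -> 1 -> 2 -> 0
  A = np.array([[0, 1, 0],
                [0, 0, 1],
                [1, 0, 0]])

  # a cycle exists iff some power A^k, 1 <= k <= number of nodes,
  # has a nonzero entry on the diagonal
  n = A.shape[0]
  print(any(np.linalg.matrix_power(A, k).diagonal().any()
            for k in range(1, n + 1)))          # True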
Are dead-ends a fundamental problem - wouldn't backtracking (e.g. depth-first) iteration avoid them? Are we talking about directed or undirected graphs?
The entire point of the algorithm on this page is that it doesn't use back-tracking. The question was whether this algorithm works. There are other algorithms, including some which back-track, but that wasn't the question. When you ask a question like that I no longer know how to be helpful. You either don't understand the algorithm or you're asking a different question entirely.
Sorry, I interpreted "analog" differently. The depth-first graph version seems analogous to me since it finds cycles in a data structure by "racing" to see if a hare overtakes a tortoise. At O(n) time and space it is still an efficient algorithm, but not quite as good as the original.
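That depth-first version might look like this in Python; the graph representation, a dict of adjacency lists, is just one convenient assumption:

  def dfs_has_cycle(graph):
      # graph: {node: [successor, ...]}, directed
      WHITE, GRAY, BLACK = 0, 1, 2              # unvisited / in progress / done
      color = {}

      def visit(node):
          color[node] = GRAY
          for succ in graph.get(node, ()):
              if color.get(succ, WHITE) == GRAY:  # back edge: a cycle
                  return True
              if color.get(succ, WHITE) == WHITE and visit(succ):
                  return True
          color[node] = BLACK
          return False

      return any(color.get(node, WHITE) == WHITE and visit(node)
                 for node in graph)

  print(dfs_has_cycle({1: [2], 2: [3], 3: [1]}))  # True
  print(dfs_has_cycle({1: [2], 2: [3], 3: []}))   # False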
I can't think of a deterministic O(1) space algorithm to find cycles in graphs. I'd be interested to see one.
- [I'm not the one who asked about an "analog" and I'm not sure where that question was going, FYI] On the other hand, the modification of Floyd's Cycle Finding Algorithm to random graphs is much closer to being deterministic and O(1) space than it is to being nondeterministic and O(lg N) space -- the nondeterminism tends to be sharply limited (effectively bounded by a constant) for all random graphs with average connectivity at or above the "network effect" level. It's true that, below that level, fully- but sparsely- connected graphs may be problematic in this regard...say, a graph consisting of nothing but many, many dead-ending cycle-free subtrees all connected at a single root. However such graphs are rare in most problem domains (and certainly are rare across random graphs). Then again, if what you've got are deep trees and you want to prove there are no cycles, that's a problem, yep. Knuth addresses this in O(1) space, note...it requires pointer arithmetic! Ha. Algorithmic efficiency growing out of manure; purists beware.
- However, let's re-frame the problem for clarity. First there is an issue of traversing the graph, and only second is there the further problem of finding cycles. If it is impossible to traverse the graph in O(1) space, then obviously you can't solve the whole problem in O(1) space. So let's just assume you have a traversal method that is O(1), without regard for its nature. Floyd cycle-finding can then be implemented on top of that (or any) traversal method in O(1) additional space, just by traversing (somehow, anyway) at both half speed and full speed, and comparing the two positions. The cycle-finding is independent of the traversal method. (Unless, of course, traversal requires destroying the graph during the first traversal, such that there can be no second traversal, or at least, not until the first traversal finishes -- which is not hypothetical, again see Knuth). -- dm
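A sketch of that layering in Python, assuming only that make_traversal (a hypothetical stand-in for whatever method the graph supports) returns a fresh deterministic iterator over positions:

  def traversal_has_cycle(make_traversal):
      # run the same deterministic traversal at full and half speed,
      # keeping O(1) extra state: just the two current positions
      slow = make_traversal()
      fast = make_traversal()
      next(fast, None)                  # fast starts one position ahead
      for s, f in zip(slow, fast):
          if s == f:                    # same position reached at two
              return True               # different times: a cycle
          next(fast, None)              # fast takes one extra step per round
      return False                      # the traversal ended without a meeting

  def count_mod_5():                    # a toy traversal that cycles 0..4
      i = 0
      while True:
          yield i
          i = (i + 1) % 5

  print(traversal_has_cycle(count_mod_5))   # True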