Recursion and Code Compression

Overview

Today, a user by the name of Catalin asked in the SuShCodingYT Discord channel:

What is the equivalent recursive code without using the loop?

1
2
3
4
5
6
7
8
9
int N=4,C=2;
void gen(int n,int c){
    if(n<N){
    printf("n:%d,c:%d\n",n,c);
    for(int k=0;k<C;++k){
     gen(n+1,k);
    }
    }
}

While I was certain it was possible to convert any recursive function into a loop, I wasn’t entirely sure if the converse was true: is it possible to convert any loop into a recursive function? While that question is certainly out of the context of this post, I did discover a method that works for code above.

Formatting

The first step to reading someone else’s code is almost always to clean it up. Especially code written by newer programmers asking questions in Discord or on StackOverflow. Proper formatting is extremely important to ensure you understand the code and aren’t missing any important details.

Everyone has their own style, but for me, cleaning up the code resulted in this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
#include <iostream>
#include <string>

int N=4, C=2;
void gen(int n, int c)
{
    if (n < N) {
        printf("  OLD: %d, %d\n", n, c);
        for (int k = 0; k < C; ++k) {
            gen(n + 1, k);
        }
    }
}

Ah, much better. :)

Note: I changed the printf text a bit later for other reasons, but we’ll get to that.

Correctness

The next step to refactoring code is to plan some way to verify that your refactored code is still correct, and exactly matches the behavior of the original code (assuming logical parity is the goal, which it was this case).

Since I had no context about this problem or its importance, I opted for a rather simple test. The original code defined harm limits on N and C at 4 and 2, respectively, so I opted to test all combinations of 0-3 for each parameter. I figured this would keep the printed output tractable while still providing a decent variety of tests.

This is the code I used to call the various implementations as I was solving the problem:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
int main()
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            printf("Test Case %d, %d:\n", i, j);
            gen(i, j);
            printf("\n");
            new_gen(i, j);
            printf("\n");
        }
    }
    return 0;
}

Experimentation - Flail around trying random things blindly

The experimentation phase was about 20 implementations, all of which were terrible and random and didn’t get me anywhere code-wise. Their main purpose was for me to familiarize myself with the code. I gained an intuition for what the function was doing over time, which helped me quite a bit later on when coming up with alternate ways to attack the problem. Any time you’re presented with a new and complex topic, it’s important to spend some time trying to understand how that thing works and writing a bunch of dirty code, to experiment and gain experience with the concepts.

Implementation - Separate the loop into a secondary function

This was the first real breakthrough I had in how I was thinking about the problem. Up until now, I was trying to rush to the end solution of a single function that did everything. That was extremely difficult to have intuition for and thus resulted in the aforementioned flailing. The simplest first step is to replace the loop with a recursive function, totally independent from the original function. After a few tries, I finally ended up with something that felt was a step in the right direction, and which passed all of the test cases.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
void new_gen(int n, int c);
void new_gen_loop(int n, int k)
{
    if (k < C) {
        new_gen(n, k);
        new_gen_loop(n, k + 1);
    }
}

void new_gen(int n, int c)
{
    if (n < N) {
        printf("  NEW: %d, %d\n", n, c);
        new_gen_loop(n + 1, 0);
    }
}

The idea here is fairly straightforward once you make the realization that this is possible. Create a second function, which I cleverly called new_gen_loop, send the initial iterator value from the calling code (0 in this case) as a new parameter (k), surround the entire body of the new loop function with the loop end condition, call the body of the loop, then have the loop call itself, adding 1 to the iterator, as the last call in the loop function.

Now we have successfully converted the loop into a recursive function! However, we now have two functions instead of one, which is not really in line with the spirit of the question. So how can we combine these functions back into one without the for loop reappearing?

Compression #1 - Combining multiple functions into one

Compressing code is something I’m generally quite good at. Once you have a working snippet of code, it’s often not difficult for an experienced programmer to see ways to improve it. Certain code, though, is rather tricky and even simple refactors will cause the behavior to change. Luckily for us, this code is fairly straightforward and easy to compress.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
void new_gen(int n, int c, int k=0, bool loop=false)
{
    if (loop) {
        if (k < C) {
            new_gen(n, k, 999, false);
            new_gen(n, 999, k + 1, true);
        }
    } else {
        if (n < N) {
            printf("  NEW: %d, %d\n", n, c);
            new_gen(n + 1, 999, 0, true);
        }
    }
}

There are two important things we’ve done here. The first is that we’ve combined both parameter lists in a single list. Note that both c and k are parameters in this function. Second, we added a new parameter to pass some important state back to the function, loop. Soon after I wrote and tested this code, I realized that this technique could be extended to compress arbitrarily many functions into one by replacing the boolean with an enum, state, and replacing the if statement with a switch statement. Essentially, this code is a very simple state machine where there are only two states: loop = true and loop = false.

Note that while doing this, I simultaneously had the realization that this code may be able to be compressed further since k and c are the same type and mutually exclusive. That is, c is only used when loop = false and k is only used when loop = true. I didn’t want to rush to optimize too many things at once, so I decided to pass a junk value, 999, to the call to temporarily represent a variable that would never be used.

This code also passed all of the tests. Great! Let’s go a bit further just for fun.

Compression #2 - Overloading mutually exclusive parameters

This step would be questionable in real-world code, as overloading parameters to mean different things in different contexts is a dangerous business to be in. However, this function is simple enough that I wanted to try compressing it as much as possible. Also, getting rid of the junk values may actually help readability in this case.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
void new_gen(int n, int k, bool loop = false)
{
    if (loop) {
        if (k < C) {
            new_gen(n, k, false);
            new_gen(n, k + 1, true);
        }
    } else {
        if (n < N) {
            printf("  NEW: %d, %d\n", n, k);
            new_gen(n + 1, 0, true);
        }
    }
}

As you can see, we’ve simply removed the junk values and reused the k parameter for both k and c. This is the final code I produced and presented to the user who asked the question (along with all of the previous snippets for context).

Closing remarks

In practice, using recursion is almost never a good idea. Calling a function recursively is essentially just a clever way to abuse the program stack in order to maintain state. In mathematics, recursion is a fairly clean and pure way to represent functions, but in code, using a for loop and/or explicit stack data structure almost always makes the code much more readable and performant.

Code compression is also not necessarily always the right answer. Compressing code too far can cause the code to become harder to read and navigate. For instance, if this were more than two functions and you wanted to compress it using a state machine like I mentioned, it may be better for readability to maintain the logic in separate functions and have the switch statement call those functions. At that point, you may as well use a lookup table (array, if state values are contiguous; hash table, if not) that maps the state to a state handler function. Perhaps I’ll write more about that one day.

While problems like these are difficult and frustrating, and sometimes seem pointless to solve, solving or attempting to solve them can give you a much better understanding of concepts you didn’t even expect were related. Exploration and experimentation are a very important part of being a great engineer, so don’t be afraid to jump in and try your best.

Happy coding!

Additional reading

The idea and vocabulary of “compressing” code is not an original idea. This technique is heavily inspired by Casey Muratori’s blog post, “Semantic Compression”, which you can find here:
https://caseymuratori.com/blog_0015

Third-party links

If you’d like to check out the SuShCodingYT Discord server, you can join it via this link:
https://discordapp.com/invite/xtthkcF

This Discord server is the community hub for Suraj Sharma’s YouTube channel, which you can find here:
https://www.youtube.com/channel/UC2i39AOpDSlO1Mrn1jQ8Xkg

Contents