The case for unmanaged programming languages: data allocation

In this article, I aim to discuss the benefits of unmanaged programming languages, such as C++.

As a currently enrolled Computer Science student with at least six years of programming experience, I have come across my fair share of programming languages, from ubiquitous commercial languages like Java to extremely specialized and mostly unheard-of ones like Prolog. The point is that, in six years of programming, the most useful languages I have come across are almost always unmanaged languages, with a couple of exceptions, such as Python, that I will also discuss.

But wait! I'm not even a programmer!

That's okay! (Jeez, stop shouting...) I will be explaining the concepts necessary for understanding my argument in full. So just hold on tight, maybe grab a coffee, and enjoy the ride.

Data allocation: Variables

If you've never programmed before, you first need to understand the concept of variables and, specifically, how the underlying memory they are stored in is allocated.

Programming is mainly a practice of getting data from the user or some other source, manipulating it in some specific way, and often sending the data back to the user. Once data is collected from the user, it has to be stored somewhere. In practice, data can be stored in many places, but programmers, especially those working in managed languages, rarely have to worry about exactly where.

In its simplest form, a variable is just a way to put a name to a piece of data. Data is stored in a variable and later retrieved using the same name. Take, for example, the following short Python program.

	
    name = input("Input your name here: ")  # Gets the user's name
    print("Hello " + name + "!")  # Displays the greeting on the screen
    # Input: John
    # Output: Hello John!

This very simple program illustrates how a programmer would capture some data and then use it during the course of the program.

Data allocation: the stack and the heap

Now that you know what exactly a variable is, I'm sure you are just dying to ask: "Where the heck does the program keep your 'variables'?!" (No? Nobody is wondering that? Darn...)

The answer requires that we delve into how the operating system organizes a program's memory in RAM, specifically the difference between stack and heap allocation.

The stack as a data structure deserves more detail than I'm going to go into in this article, but you can learn more about it from this video.

The stack is a small region of RAM, typically only a few megabytes, where small variables are stored. It also matters that data on the stack does not change size: data whose size can change while the program runs will most likely have to live on the heap. This follows from how the stack works. Data is added to and removed from the top of the stack only, so the most recently added piece of data is always the first to go. Other data can still be read and written, but it is incredibly difficult to change the size of a variable somewhere in the middle of the stack, because all of the variables above it would have to be moved, and the stack pointer adjusted, to compensate.
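To make the stack pointer idea concrete, here is a minimal sketch in Python. It is only a toy model: `ToyStack`, `push`, and `pop` are names made up for this illustration, and a real stack is managed by the compiler and the CPU (and on most machines grows downward rather than upward).

```python
class ToyStack:
    """A toy model of a stack: a fixed block of bytes plus a stack pointer."""

    def __init__(self, size):
        self.memory = bytearray(size)  # fixed-size region, like the real stack
        self.sp = 0                    # stack pointer: index of the first free byte

    def push(self, data):
        """Allocate len(data) bytes by moving the stack pointer; return the offset."""
        if self.sp + len(data) > len(self.memory):
            raise MemoryError("stack overflow")
        offset = self.sp
        self.memory[offset:offset + len(data)] = data
        self.sp += len(data)
        return offset

    def pop(self, size):
        """Free the most recently pushed bytes by moving the pointer back."""
        if size > self.sp:
            raise ValueError("cannot pop more than is on the stack")
        self.sp -= size
        return bytes(self.memory[self.sp:self.sp + size])


stack = ToyStack(16)
offset_a = stack.push(b"AAAA")  # occupies bytes 0 to 3
offset_b = stack.push(b"BB")    # occupies bytes 4 to 5, directly above "AAAA"
# Growing "AAAA" in place would overwrite "BB", so "BB" would have to move first.
top = stack.pop(2)              # releasing the newest item is just a pointer move
```

Notice that neither `push` nor `pop` has to search for anything. Both only adjust `sp`, which is where the speed comes from.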

These might seem like unreasonable restrictions to put on data storage, but they bring some significant advantages.

But first, let me explain how the heap works.

In contrast to the stack, the heap is a large area of RAM given to the program by the operating system. When the program runs out of space on the heap, it can simply request more from the operating system and, given enough free RAM on the system, will receive it. Changing the size of a variable on the heap is much easier than on the stack, as it doesn't require moving any other variables around. To change the size of a variable, you simply allocate some new memory with the required amount of space and copy the old data over.
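The allocate-and-copy step for resizing can be sketched in a few lines of Python. This is an illustration only: `resize` is a hypothetical helper, and a `bytearray` stands in for a raw block of heap memory.

```python
def resize(old_block, new_size):
    """Return a new block of new_size bytes holding a copy of the old contents."""
    new_block = bytearray(new_size)       # "allocate" fresh, zero-filled memory
    keep = min(len(old_block), new_size)  # when shrinking, copy only what fits
    new_block[:keep] = old_block[:keep]   # copy the old data over
    return new_block                      # the old block can now be released


buffer = bytearray(b"John")
bigger = resize(buffer, 8)  # room for four more bytes
bigger[4:8] = b" Doe"       # the new space is usable; nothing else had to move
```

The cost is the copy itself, which grows with the size of the data, but no neighbouring variables are disturbed.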

The big difference, which is responsible for most of the differences in operation between the stack and the heap, is how data is allocated. To allocate data on the stack, the program simply moves the stack pointer by the required number of bytes. Allocating on the heap requires that the program search for an empty space in memory with the required number of bytes, failing which it needs to ask the operating system to provide it with more memory. On top of that, in most unmanaged languages, data needs to be deleted from the heap by the programmer, whereas this is done automatically for the stack.
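The search that heap allocation involves can be sketched with a toy "first-fit" allocator. Everything here is an assumption made for illustration: `ToyHeap`, `allocate`, and `release` are invented names, and real allocators use far more sophisticated strategies than a linear scan.

```python
class ToyHeap:
    """A toy first-fit allocator over a list of free (start, size) ranges."""

    def __init__(self, size):
        self.free = [(0, size)]  # one big free range to begin with
        self.used = {}           # start offset -> size of each live allocation

    def allocate(self, size):
        """Search the free list for the first range that is big enough."""
        for i, (start, length) in enumerate(self.free):
            if length >= size:
                if length == size:
                    del self.free[i]  # the range is used up entirely
                else:
                    self.free[i] = (start + size, length - size)
                self.used[start] = size
                return start
        # A real allocator would ask the operating system for more memory here.
        raise MemoryError("no free range is large enough")

    def release(self, start):
        """Return a block to the free list (neighbours are not merged, for brevity)."""
        size = self.used.pop(start)
        self.free.append((start, size))
        self.free.sort()


heap = ToyHeap(16)
first = heap.allocate(4)   # takes bytes 0 to 3
second = heap.allocate(8)  # takes bytes 4 to 11
heap.release(first)        # the programmer has to hand memory back explicitly
third = heap.allocate(3)   # the search finds the hole left behind by `first`
```

Compare `allocate` here with moving a single pointer on the stack: even this simplified version has to loop over the free ranges, and the program has to remember to call `release`.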

If it wasn't already obvious, the big problem with the heap is that it is slow compared to the stack, where allocating and deleting data is extremely fast. There are other problems with the heap that slow it down even further, such as cache misses, but I won't go into that in this article. I might do another article on this topic some other time if people want that.

So, why then are unmanaged languages better?

The answer is simple: they are unmanaged!

Okay, okay, this is an admittedly crappy answer, but it speaks to the heart of the problem. Most managed languages make use of a garbage collector, a part of the language's runtime that runs alongside your program. Remember how I said that data on the heap needs to be manually deleted by the programmer? Well, this turns out to be a big problem if the programmer doesn't know what they're doing. I won't go into specifics, but suffice it to say that having something that does this for you makes life a lot easier. Unfortunately, it has a nasty downside: it slows your program down.

The garbage collector gets notified when a variable is created, and it keeps track of which parts of the program are using that variable. Once it finds that a variable is no longer being used, it deletes it. This is a great concept in theory, but there are some problems. One big problem is that these types of languages tend to store most of their variables on the heap. The reason for this is that it is much easier to give other parts of the program access to the data, as the data doesn't need to be copied every time. Unfortunately, this also means that programs tend to suffer from slowdown as a result of cache misses and the aforementioned complexity of heap allocations.
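You can watch this bookkeeping happen in Python. The sketch below assumes the standard CPython interpreter, which counts references to each object and reclaims it once the count reaches zero; other managed languages, such as Java, use different collection strategies and make no promise about when the deletion happens. The `Player` class is made up for the example.

```python
import gc
import weakref


class Player:
    """A small object whose lifetime we want to observe."""

    def __init__(self, name):
        self.name = name


player = Player("John")
alias = player                 # a second part of the program uses the same object
watcher = weakref.ref(player)  # observes the object without keeping it alive

del player                           # one user is done with the object...
still_alive = watcher() is not None  # ...but `alias` still refers to it

del alias                      # the last user is done with it
gc.collect()                   # ask the runtime to reclaim unused objects now
reclaimed = watcher() is None  # the weak reference is dead once the object is gone
```

At no point did the program delete the `Player` object itself. It only dropped its names for it, and the runtime did the rest, which is exactly the convenience (and the hidden work) described above.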

Now, to be fair, this isn't always that much of a problem, particularly if the program doesn't need to have millisecond response times. It does, however, make a noticeable difference when dealing with programs that are extremely CPU intensive, such as games or video editing software.

Conclusion

So, it turns out that data allocation is pretty much the biggest problem with managed languages, at least in my opinion. I will be making a second article about some of the other things I don't like about managed languages, but this article is getting quite long. In light of the fact that this isn't the complete story, the second article will be published within the coming month and will not delay the production of next month's article.

P.S. If you liked the video about how the stack works, please consider subscribing to RgMechEx. They make really good videos explaining all kinds of mechanics surrounding old retro gaming consoles. The stack is an old invention and was found in many older systems. It has several advantages and, even though it wasn't meant to do a whole lot, what it does do, it does extremely well and with a great deal of speed.


A Student's Perspective
