ARC and Garbage Collection

I always think that there are quite a lot of things most of us developers know and can explain on a high level, in one or two sentences. But in most cases it is worth diving a little deeper, just enough to answer possible follow-up questions with more confidence and to not sound like you are playing a round of buzzword bingo.

🔄 ARC

What is ARC?

Even though it is usually contrasted with garbage collection, ARC can itself be seen as one form of garbage collection, since reference counting is one of the strategies used for automatic memory management. ARC stands for Automatic Reference Counting. It is the name of the memory management feature of the Clang compiler (the Swift compiler works the same way): the compiler inserts the retain and release calls for us, so that at run time the strong references to each object are counted.

How does it work?

All strong references to an object are counted. When the count of strong references reaches zero, the object is deallocated immediately. There is no background process that scans memory and deallocates objects asynchronously at run time. Therefore, ARC cannot detect or break reference cycles on its own. The basic rule remains: as long as there is at least one strong reference to an object, it will not be deallocated. Two objects that hold strong references to each other keep each other's count above zero, even if nothing else in the program can reach them anymore. This means that we as developers need to be careful not to create reference cycles, and with them memory leaks, in our code. To avoid those cycles and unwanted strong references to objects, we use weak and unowned (at least in Swift) references.
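
To make the cycle problem concrete, here is a minimal sketch. The types Owner, Pet and DeinitLog are made up for this example; the log object only exists so we can observe which deinit calls actually ran.

```swift
// Hypothetical types for illustration only.
final class DeinitLog {
    var entries: [String] = []
}

final class Owner {
    let name: String
    let log: DeinitLog
    var pet: Pet?                  // strong: the owner keeps its pet alive

    init(name: String, log: DeinitLog) {
        self.name = name
        self.log = log
    }
    deinit { log.entries.append("Owner \(name)") }
}

final class Pet {
    let name: String
    let log: DeinitLog
    var strongOwner: Owner?        // strong back-reference: closes a cycle
    weak var weakOwner: Owner?     // weak back-reference: not counted

    init(name: String, log: DeinitLog) {
        self.name = name
        self.log = log
    }
    deinit { log.entries.append("Pet \(name)") }
}

// owner -> pet -> owner: neither strong count can ever reach zero.
func makeCycle(log: DeinitLog) {
    let owner = Owner(name: "A", log: log)
    let pet = Pet(name: "A", log: log)
    owner.pet = pet
    pet.strongOwner = owner
}

// Same shape, but the back-reference is weak, so both objects are freed
// once the local references go away.
func makeWeakLink(log: DeinitLog) {
    let owner = Owner(name: "B", log: log)
    let pet = Pet(name: "B", log: log)
    owner.pet = pet
    pet.weakOwner = owner
}
```

Calling makeCycle(log:) leaves the log empty, because neither deinit ever runs and both objects leak. Calling makeWeakLink(log:) records both the owner and the pet.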


Here is a quick refresher in Swift:

  • weak-reference:

    • always of type Optional
    • the object the reference points to can be deallocated, and the reference is then set to nil
    private weak var delegate: Delegate? // Very common case

  • unowned-reference:

    • Neither Optional nor strong
    • The compiler simply assumes that the object the reference points to has not been deallocated
    • Keep in mind: *If the object has been deallocated the program will crash*
    object.methodWithCallback(callback: { [unowned self] in
    	// Do something here ...
    })

When using a weak reference, we should keep in mind the possible side effects. Suppose we do something like this:

object.methodWithCallback(callback: { [weak self] in 
	self?.foo()
})

We should keep in mind that foo() might never be called: when self is nil, the optional-chaining call is simply skipped. So we should probably have some kind of error handling, or at least think about whether it is "okay" that foo() might never be executed.

Now, some might think, we could just unwrap the optional self here like this:

object.methodWithCallback(callback: { [weak self] in 
	guard let self = self else { return }
	self.foo()
})

But if you look closely, this still does not really handle self being nil. You might not realize that foo() is never, or not always, executed. It fails somewhat "silently". So you could think about using unowned here instead, as in the example above. But, as stated above, that has its own side effect: the app crashes if the closure runs after self has been deallocated.

The point I am making here is that both weak and unowned should be used with clear intention. E.g. if self being nil is a programming error that the programmer should be made aware of quickly, then unowned might be a good solution. If it is more important that the program does not crash at whatever cost, then weak might be the better choice. In that case it might be a good idea to establish some kind of handling for self being nil, like logging it or throwing an error, whatever serves your goals.
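
One possible way to make the weak case less silent is sketched below. Downloader, Screen and the onMissingSelf parameter are hypothetical names I made up for this example; the handler could just as well log a message or report an error.

```swift
// Hypothetical types for illustration only.
final class Downloader {
    // Stored callback, so it can outlive the object that registered it.
    var onFinish: (() -> Void)?
}

final class Screen {
    private(set) var refreshCount = 0

    func refresh() { refreshCount += 1 }

    func observe(_ downloader: Downloader, onMissingSelf: @escaping () -> Void) {
        downloader.onFinish = { [weak self] in
            guard let self = self else {
                onMissingSelf()    // make the otherwise silent failure visible
                return
            }
            self.refresh()
        }
    }
}

func callbackDemo() -> (refreshes: Int, missed: Int) {
    let downloader = Downloader()
    var screen: Screen? = Screen()
    var missed = 0
    screen?.observe(downloader) { missed += 1 }

    downloader.onFinish?()                     // screen is alive: refresh() runs
    let refreshes = screen?.refreshCount ?? 0

    screen = nil                               // last strong reference is gone
    downloader.onFinish?()                     // self is nil: the handler runs instead
    return (refreshes, missed)
}
```

The first callback reaches refresh(), the second one ends up in the handler. Whether that handler logs, throws or asserts is a decision that depends on your goals.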


What happens when we use a weak reference?

What happens when we use weak is quite interesting, and one should have heard about it at least once. ARC comes with a feature called "zeroing weak references". It automatically clears (= sets to nil) weak-referenced variables, instance variables and declared properties right before the object the reference points to is deallocated. A weak pointer therefore always points either to a valid object or to nil.
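
The effect is easy to observe in Swift. This is a small sketch; Token and zeroingDemo are names invented for the example.

```swift
// Hypothetical type for illustration only.
final class Token {}

func zeroingDemo() -> (aliveBefore: Bool, aliveAfter: Bool) {
    var strong: Token? = Token()
    weak var weakRef = strong          // does not increase the strong count

    let aliveBefore = (weakRef != nil)
    strong = nil                       // last strong reference is gone
    let aliveAfter = (weakRef != nil)  // the weak reference now reads as nil

    return (aliveBefore, aliveAfter)
}
```

Before the strong reference is cleared the weak reference still sees the object; afterwards it reads as nil instead of pointing at freed memory.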

On a side note: zeroing weak references are only available on Mac OS X Lion (10.7) or later and iOS 5 or later.


And what happens with value types?

When we talk about memory management, we usually only talk about objects, i.e. reference types. But what happens with value types, the types that are copied when passed around? For some it may seem obvious, but I will state it here anyway: value types are not reference counted themselves. They are deallocated together with whatever holds them, for example the object they are a property of or the scope of the function that created them.
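
A short sketch of the difference, with made-up types Point and Shape. The struct is typically stored inline in the class instance, so there is nothing to count for it separately.

```swift
// Hypothetical types for illustration only.
struct Point {
    var x: Int
    var y: Int
}

final class Shape {
    var origin = Point(x: 0, y: 0)   // lives and dies with the Shape instance
}

func valueTypeDemo() -> (original: Int, copy: Int) {
    let shape = Shape()
    var copy = shape.origin          // an independent copy, not a second reference
    copy.x = 10                      // changes only the copy
    return (shape.origin.x, copy.x)
}
```

Changing the copy leaves the original untouched, and once the Shape is deallocated its Point goes away with it.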


Where is it used?

It is used in the Objective-C and Swift programming languages.

It is also deployed in the macOS and iOS operating systems.



Garbage collection 🚮

As briefly mentioned above, garbage collection is the umbrella term for automatic memory management. In general it describes that the collector "reclaims" the garbage ↔ memory occupied by objects which are no longer used by the program. Its main purpose is to relieve the programmer from handling memory management manually. The programmer does not need to think about which objects need to be deallocated and which need to be held in memory.

👍 Advantages

  • Certain categories of bugs are substantially reduced

    • Dangling pointer bugs

      • Pointers that point to a part of memory that has already been freed. This part could have been reassigned in the meantime, which can lead to unpredictable results.
    • Double free bugs

      • Bugs where the program tries to free a region of memory that already has been freed.
    • Certain kinds of memory leaks

      • Some types of garbage collection can detect objects that are no longer reachable by the program but still occupy memory, which could otherwise lead to memory exhaustion.

👎 Disadvantages

  • Consumes additional resources (logic to decide which regions of memory to free)
  • Performance impacts (objects that are no longer used stay in memory until the collector runs)
  • The moment the garbage collection occurs is unpredictable


How does it work?

The main task is to find data that is no longer accessible in the program and reclaim the resources taken up by this data. To accomplish this task, different strategies are used:

Tracing

It is the most common technique when talking about garbage collection. It basically keeps track of which objects are still reachable through a chain of references from certain root objects and which are not, and hence which objects (garbage) can be collected. There are two basic algorithms used for this:

  • Naïve mark-and-sweep

    This algorithm has two stages. In the first one, all objects that are reachable from a root (i.e. are in use) are marked. In the second one, all memory is scanned and every region that was not marked is freed / swept. The disadvantage of this algorithm is that the system must be suspended during the collection and no mutation of the working set can be allowed, which might lead to the program freezing.

  • Tri-color marking

    This algorithm is used by most modern garbage collectors. All objects are split into three different sets:

    • white ➡️ objects that might have their allocated memory recycled
    • gray ➡️ objects that are reachable from roots but still need to be scanned for references to white objects
    • black ➡️ objects that are reachable from roots and have been fully scanned, so they cannot be collected
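
The two stages and the three colors can be sketched on a toy heap. This is a simplified, stop-the-world model and not how any particular collector is implemented; the heap is just a dictionary from made-up object ids to the ids they reference.

```swift
enum Color {
    case white, gray, black
}

/// Returns the ids of all objects that are unreachable from `roots`.
/// `heap` maps each (hypothetical) object id to the ids it references.
func collectGarbage(heap: [Int: [Int]], roots: [Int]) -> Set<Int> {
    // Every object starts white: a candidate for collection.
    var color: [Int: Color] = [:]
    for id in heap.keys {
        color[id] = .white
    }

    // Roots are reachable but not scanned yet: gray.
    var worklist: [Int] = []
    for root in roots where color[root] == .white {
        color[root] = .gray
        worklist.append(root)
    }

    // Mark: scan gray objects until none are left.
    while let id = worklist.popLast() {
        for child in heap[id] ?? [] where color[child] == .white {
            color[child] = .gray
            worklist.append(child)
        }
        color[id] = .black           // fully scanned, survives this collection
    }

    // Sweep: everything that is still white was never reached.
    return Set(color.filter { $0.value == .white }.keys)
}
```

For a heap like [1: [2], 2: [3], 3: [], 4: [5], 5: [4]] with 1 as the only root, objects 4 and 5 are reported as garbage. Note that they form a cycle, which a tracing collector reclaims without any help from the programmer.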

Another detail you might encounter when looking into tracing garbage collectors is the distinction between moving and non-moving garbage collection. Moving here means that the objects that survive a collection are moved to a new region in memory.

Reference counting

As already explained above, reference counting in general means counting the strong references to each object. Objects are identified as garbage when their reference count reaches zero. The memory of those objects is then reclaimed immediately.
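
As a rough model of the bookkeeping, here is a toy reference-counted heap. CountedHeap and its methods are invented for this sketch; real runtimes store the count with the object itself.

```swift
/// Toy model: object ids with a strong count and outgoing references.
struct CountedHeap {
    private var counts: [Int: Int] = [:]       // id -> strong reference count
    private var children: [Int: [Int]] = [:]   // id -> ids it references
    private(set) var freed: [Int] = []         // ids in the order they were reclaimed

    var liveObjects: Set<Int> { Set(counts.keys) }

    /// Creates an object that is held by one external reference.
    mutating func allocate(_ id: Int) {
        counts[id] = 1
        children[id] = []
    }

    /// Stores a strong reference from `parent` to `child`.
    mutating func link(from parent: Int, to child: Int) {
        guard counts[parent] != nil, let count = counts[child] else { return }
        counts[child] = count + 1
        children[parent]?.append(child)
    }

    /// Drops one strong reference; reclaims the object as soon as the count hits zero.
    mutating func release(_ id: Int) {
        guard let count = counts[id] else { return }
        if count > 1 {
            counts[id] = count - 1
            return
        }
        counts[id] = nil
        freed.append(id)
        for child in children.removeValue(forKey: id) ?? [] {
            release(child)             // the freed object gives up its references
        }
    }
}

func countingDemo() -> (freed: [Int], leaked: Set<Int>) {
    var heap = CountedHeap()

    heap.allocate(1)
    heap.allocate(2)
    heap.link(from: 1, to: 2)
    heap.release(2)                    // still held by object 1
    heap.release(1)                    // frees 1, which in turn frees 2

    heap.allocate(3)
    heap.allocate(4)
    heap.link(from: 3, to: 4)
    heap.link(from: 4, to: 3)          // cycle
    heap.release(3)
    heap.release(4)                    // both counts stay at 1

    return (heap.freed, heap.liveObjects)
}
```

Objects 1 and 2 are reclaimed the moment the last reference disappears, while 3 and 4 keep each other alive. That is the same trade-off described for ARC above.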

Escape Analysis

This analysis is a compile-time technique that can convert heap allocations into stack allocations, which means that there is less garbage collection to be done. It determines whether an object allocated inside of a function is accessible outside of it. If so, the allocation is said to “escape” and cannot be done on the stack. Otherwise, the object may be allocated directly on the stack and released when the function returns, bypassing the heap and the associated memory management costs.

Heartbeat and timestamp

It is a technique to free resources like file handles and network connections. E.g. it is used to close a network connection once the client stops sending a heartbeat signal to the monitor.
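
A minimal sketch of such a monitor could look like this. HeartbeatMonitor, its timeout and the connection ids are assumptions made for the example; time is passed in as plain seconds to keep it deterministic.

```swift
/// Hypothetical monitor that remembers the last heartbeat per connection id.
struct HeartbeatMonitor {
    let timeout: Double                              // seconds without a heartbeat
    private(set) var lastSeen: [String: Double] = [:]

    init(timeout: Double) {
        self.timeout = timeout
    }

    /// Records a heartbeat from the given connection.
    mutating func heartbeat(from id: String, at time: Double) {
        lastSeen[id] = time
    }

    /// Forgets every connection whose last heartbeat is older than `timeout`
    /// and returns their ids, so the caller can close the actual resources.
    mutating func reclaimExpired(now: Double) -> [String] {
        let expired = lastSeen.filter { now - $0.value > timeout }.keys.sorted()
        for id in expired {
            lastSeen[id] = nil
        }
        return expired
    }
}
```

With a timeout of 30 seconds, a client that last reported at second 0 is reclaimed at second 40, while one that reported at second 20 is kept.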


Where is it used?

Garbage collection is used in the following languages:

  • Java
  • C#
  • Go
  • most scripting languages


Where to go from here?