Software Obfuscation

Theoretical Foundation

According to the formalization by Barak et al. [15], an effective obfuscation technique is a program transformation algorithm O, where given any program P, O(P) computes the same function f ∈ F as P does; meanwhile, for any non-trivial function property ϕ : F→ S and any program analyzer Aϕ that tries to efficiently compute ϕ, if ϕ(f) is intractably hard given only black box access to P as an oracle, the result ofAϕ(O(P)) is no better than randomly guessing ϕ(f).

To better understand the definition above, take symmetric encryption as an example. Suppose there is a cipher E that takes a key k and consumes plaintext p to compute the ciphertext E(k,p).Ifk is hard-coded into E, E(k, ·) can be considered as a function taking plaintext as the sole argument, denoted by Ek. Let ϕ be the prop- erty function that decides the hard-coded key of a cipher, namely ϕ(Ek) = k. Assuming E is resilient to chosen-plaintext attacks, computing ϕ will be prohibitively expensive if attackers can only access E as a black box oracle. However, it is possible to recover k in a reasonable amount of time by directly looking at P which is the code of an implementation of Ek. In that case, if a perfect obfuscator O exists and the implementation of Ek is released as O(P), any attempt to efficiently learn k by analyzing O(P) will fail.

It has been proven that a perfect obfuscator does not exist, even if the properties to hide are limited to {0, 1}-valued functions [15]. That is, analyzing the code of a program can always reveal at least 1 bit information about what the program computes without spending too much time, no matter how complicated the program code is rendered. Therefore, some theorists relaxed the security requirement for obfuscation instead of trying to develop a technique that is universally effective [27].

Obfuscation in practice

Since a theoretically perfect solution to the generalize problem of software obfuscation is unfeasible, practitioners usually set limits to problem characteristics so that the problems can be addressed within a reasonable scope. In industry, the goal ofobfuscation is not to make reverse engineering impossible but to increase the cost of it such that attacks can be delayed or diverted to relatively poorly protected targets.

A recent literature review classified obfuscation algorithms into three categories according to how they are implemented [40].

The first kind is data obfuscation that alters the structures in which data are stored in binaries. One typical data obfuscation technique is to statically encrypt the string literals and decode them at run time.

The second kind is static code rewriting that transforms the executable code into a semantically equivalent but syntactically obscuring form. For example, a static code rewriting technique called movfuscator [26] can transform an x86 binary into a form that only contains mov instructions, making it difficult for reverse engineering tools to reconstruct the original control flow.

The third kind is dynamic code rewriting, also known as self-modifying obfuscation. For programs protected by self-modifying obfuscation, the statically observable code is different from what is actually executed at run time. One of the most widely used dynamic rewriting obfuscation techniques is the packing method. A packer encodes the original code of the obfuscated program into data and dynamically decodes the data back into code during execution.

Source: Protecting million-user iOS apps with obfuscation: Motivations, pitfalls, and experience, Proceedings - International Conference on Software Engineering