Code/Markup Injection and Its Prevention
Since the early UNIXes and before, operating systems have represented code and its data as sequences of fixed-size groups of bits called bytes. Starting with Unix, which was designed to run on machines such as the 16-bit PDP-11 and later the 32-bit VAX family of computers, a byte has generally been 8 bits. Today, there are many text encodings built on 8-bit bytes (and many more binary encodings for binary data), and the interested reader is referred to Joel Spolsky’s introduction on the subject and to Juerd Waalboer’s perlunitut or your language’s equivalent document.
In any case, let’s suppose we have a string into which we want to embed a variable that contains another string. In Perl we can do:
my $total = "Hello " . $adjective . " World!";
Or more simply:
my $total = "Hello $adjective World!";
So if we put "beautiful" in $adjective, we’ll get "Hello beautiful World!" in $total, and if we put "cruel" there instead, we’ll get "Hello cruel World!".
So far so good, as long as the result is just plain text. However, what if it’s in a more structured format? Let’s say HTML:
# Untested
my $input = get_input_from_user_somehow();
print <<"EOF";
<p>
$input
</p>
EOF
The alert reader will notice that $input was inserted as is into the HTML output. Since we neither checked whether it contains special characters nor escaped them, a malicious user can insert arbitrary HTML code and even JavaScript code there. This in turn can wreak havoc upon the users of the page.
This form of HTML injection is called a cross-site scripting (XSS) attack. If present in web applications or web sites, it may allow malicious crackers to set traps for the unwary, and possibly gain access to sensitive information on the site, such as the passwords of users or administrators. And you did notice how easy it was to write code that exhibited this problem, right?
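For comparison, here is a minimal sketch of what an escaped version of the snippet above might look like. The escape_html helper and its entity table are my own illustration rather than anything from a particular library, and the hard-coded $input stands in for get_input_from_user_somehow(). In real code you would more likely reach for a CPAN module such as HTML::Entities or your templating system’s auto-escaping.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for this sketch): replace the five
# characters that are special in HTML text and attribute values.
my %HTML_ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
    "'" => '&#39;',
);

sub escape_html {
    my ($text) = @_;
    # A single pass over the string, so an already-produced "&amp;" is
    # never escaped a second time within the same call.
    $text =~ s/([&<>"'])/$HTML_ENTITY{$1}/g;
    return $text;
}

# Stand-in for get_input_from_user_somehow(): a hostile value.
my $input = q{<script>alert("xss")</script>};
my $safe  = escape_html($input);

print <<"EOF";
<p>
$safe
</p>
EOF
```

The browser now displays the script tag as text instead of running it. Note that this only covers HTML text and attribute contexts; input that ends up inside JavaScript or a URL needs escaping suited to that context.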
Here are some other forms of code or markup injection:
Shell Command Injection - I’ve discussed it briefly in a different post about "shell variable injection" in Bash, but it also exists in Perl. Imagine doing
system("ls $dir");
or, as some newcomers are tempted to do, `ls $dir` (although backticks still have some legitimate uses). A malicious user can now put shell code in the $dir variable, and it will wreak havoc on the system of the user who is running the script. One way to avoid this is to properly escape the arguments using a module such as String-ShellQuote. A preferable way is to use the list forms of the relevant functions whenever possible, so that no shell is involved at all.
SQL injection allows a user to inject malevolent SQL code that can do untold damage to the database. It is very common in web applications and in many other applications that use SQL. If you do something like
"SELECT id FROM users WHERE name='$name'"
then by putting single quotes in the name and using SQL syntax, one can insert arbitrary SQL there and do a lot of damage. There was also a very nice xkcd comic about it.
Perl Code injection - let’s suppose we want to construct an optimised anonymous function (sub { ... }) on the fly. We can build its code and then use the string eval, eval "". A lot of Perl programmers think it should be avoided at all costs, but metaprogramming has some legitimate uses. Moreover, this can happen in other cases, like when we construct a Perl program (or a program in any other language) on the fly and execute it. In any case, if we insert into the eval "" a variable that came from user input without being escaped or validated, we might have an arbitrary code execution vulnerability.
Regular expressions’ code injection - imagine you want to see if a string is contained in a list of strings. One naïve way would be to concatenate the strings using a separator that is unlikely to be contained in them (such as \0) and then match this gigantic string using
$haystack =~ m{$needle}
However, if $needle contains special regex characters, then the match might take a long time or, worse, yield an incorrect result. One way to avoid that is to use quotemeta (see perldoc -f quotemeta) or its \Q and \E regular expression escapes:
$haystack =~ m{\Q$needle\E}g
In this particular case, it is also probably better to use a hash, but naturally this was just one example where we’d like to embed some arbitrary (but plain) text inside a regular expression.
Path Injection - if you append one path to another, using something like
my $path = "/home/hello/$myvar";
open my $fh, '<', $path or die …
then if someone puts some .. in $myvar’s components, they can gain access to any file on the system that the script’s user can read.
Perl open argument injection - if you do
open(my $fh, "/some/directory/$filename");
then one may put
my $filename = "|echo h4x0r3d|";
and execute arbitrary code. To avoid such things one should use Perl’s three-argument open. (thanks to Brian Phillips)
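To make the list-form advice from the shell injection item concrete, here is a rough sketch. The run_without_shell helper and the hostile $dir value are assumptions made up for this example. So that it runs anywhere, the child process is perl itself ($^X) rather than ls, and it reports the length of its first argument through its exit code.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command given as a list, never through a shell.
# With the block form "system { $cmd[0] } @cmd" the arguments reach the
# program verbatim instead of being parsed as a shell command line.
sub run_without_shell {
    my (@cmd) = @_;
    system { $cmd[0] } @cmd;
    die "Could not run $cmd[0]: $!" if $? == -1;
    return $? >> 8;    # the child's exit code
}

# A hostile "directory name" that would run a second command if it were
# interpolated into a shell command line such as system("ls $dir").
my $dir = 'docs; echo pwned';

# The child exits with the length of its first argument, which lets the
# parent check that the whole string arrived as one single argument.
my $exit_code = run_without_shell($^X, '-e', 'exit(length $ARGV[0])', $dir);

print "child received one argument of length $exit_code\n";
```

If the reported length equals length($dir), the semicolon and everything after it were treated as data and no second command was run. The same idea applies to the three-argument open from the last item: the mode and the file name travel as separate arguments, so characters in the name cannot change what open does.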
These are the prominent examples I can think of now, but they are not the only ones. Your program is in danger whenever it accepts text input from the user and passes it directly to an output format that has some grammar and syntax that can be influenced by this string.
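The regular expression item above is a compact illustration of that rule, since the same text means different things as a pattern and as plain text. The following is only a sketch: contains_literal is a name I made up, and the sample strings are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: does $needle occur literally inside $haystack?
sub contains_literal {
    my ($haystack, $needle) = @_;
    # \Q...\E quotes every regex metacharacter in the interpolated value.
    return $haystack =~ m{\Q$needle\E} ? 1 : 0;
}

# The strings joined with a separator, as in the naive approach above.
my $haystack = join "\0", 'a+b', 'foo.bar', 'baz';

# Unquoted, 'a+b' means "one or more a, then b", which occurs nowhere in
# the haystack, so the literal text 'a+b' is wrongly reported as missing.
my $needle_plus = 'a+b';
my $naive_plus  = $haystack =~ m{$needle_plus} ? 1 : 0;

# Unquoted, 'f.o' means "f, any character, o", which matches "foo" even
# though the literal text 'f.o' is not in the haystack.
my $needle_dot = 'f.o';
my $naive_dot  = $haystack =~ m{$needle_dot} ? 1 : 0;

printf "a+b: naive=%d quoted=%d\n", $naive_plus, contains_literal($haystack, $needle_plus);
printf "f.o: naive=%d quoted=%d\n", $naive_dot,  contains_literal($haystack, $needle_dot);
```

The naive match gets both cases wrong, in opposite directions, while the quoted one treats the needle as plain text. For a pure substring test the built-in index function would also do, and for membership in a list a hash remains the better tool, as noted above.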
So how can we mitigate such code injection problems? There are many ways, some of which are alternatives to one another and some of which complement each other:
Make sure you have enough discipline to escape the input before it is passed to the output venue. Write automated tests for that.
If you still want to allow some markup in user input, then analyse it to make sure it does not contain any malicious code that can abuse the system. For example, you may wish to restrict input only to certain HTML tags and attributes.
Taint your data using unsafe typing or "kinding" and make sure that it can only be output after either being escaped or being untainted. Joel on Software recommends making the wrong code look wrong, which, while desirable and important, is probably less preferable than making the wrong code behave wrong and abort with a huge "You suck!" error or something. This may not always be possible given certain limitations of the programming language, but it is a better ideal.
Use auto-escaping features of your environment such as SQL place-holders (e.g. "SELECT * FROM mytable WHERE id = ?") and the list-argument form of system, see perldoc -f system (e.g. system { $cmd[0] } @cmd).
Perform frequent code reviews and black-box tests, and encourage hackers to find problems in your code.
Use complementary security measures that make sure that even if a problem occurs, its damage is mitigated. As examples, you can try running the script as an unprivileged operating system user, or as a database user that lacks certain database privileges.
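As a rough illustration of the "kinding" idea from the list above, where wrong code behaves wrong instead of merely looking wrong, one could wrap user input in an object that refuses to be used as a plain string. The UnsafeString class, its as_html method and the error message are all hypothetical names invented for this sketch; the sketch only handles an HTML output context.

```perl
use strict;
use warnings;

package UnsafeString;    # hypothetical class name, for illustration only

# Any attempt to use the object as a plain string (interpolation,
# concatenation, print) aborts loudly instead of leaking raw input.
use overload '""' => sub {
    die 'Refusing to output unescaped user input; call as_html() first';
};

sub new {
    my ($class, $raw) = @_;
    return bless \$raw, $class;
}

# The only sanctioned way out in this sketch: escape for an HTML context.
sub as_html {
    my ($self) = @_;
    my %entity = (
        '&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
        '"' => '&quot;', "'" => '&#39;',
    );
    my $text = $$self;
    $text =~ s/([&<>"'])/$entity{$1}/g;
    return $text;
}

package main;

my $name = UnsafeString->new('<b>Mallory</b>');

# Careless interpolation dies instead of emitting the raw markup.
my $ok = eval { my $html = "<p>$name</p>"; 1 };
print "interpolation blocked: $@" unless $ok;

# Going through the escaping method works.
print '<p>', $name->as_html, "</p>\n";
```

Perl also ships with a built-in taint mode (the -T switch, described in perlsec) that tracks data coming from outside the program and blocks its use in operations such as system or open until it has been untainted. It does not know about HTML output, though, which is why a wrapper along these lines can complement it.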
In any case, be careful when writing code that may cause code or markup injection because the consequences may be dire.