A Warning Before Reading
- The software in this article is for demonstration purposes only. There is probably better software out there that will do a better job using a better algorithm (such as embedding within lines of contrast).
- The pictures in this article are JPGs to reduce bandwidth, so they will not work with the PNG steganography software. Clicking a picture opens the full PNG file (over 10 MB).
The Hypothetical Problem
I have a scenario for you. Let's say I own a Detroit-based robotics company, and I just created the most awesome robot ever created. It's the first robot that is actually alive!
I have an electronics convention coming up where I will demo my new Johnny 5. I need to send the design files to the manufacturer located somewhere outside of Michigan near a large mountain where there is more lightning. However, I have a competitor that doesn't want this to happen: OCP.
OCP will be demoing its new living-robot prototype, Robocop, at the same convention. The firmware engineering department at OCP was understaffed, so Robocop will not be done in time for the convention. OCP has decided to steal my Johnny 5 firmware and adapt it for Robocop.
If OCP steals my firmware, I will not have Johnny 5 in time for the convention. Meanwhile, OCP has its own production plant, so Robocop would be done in time.
OCP is a highly influential corporation with many political connections. My company is a small start-up looking to break into the market, and my manufacturer is in the same position. We both want this contract, and the manufacturer is willing to do what it can to obtain my files. By the end of the day, OCP will be monitoring all phone calls, emails, and web traffic (including FTP and SSH) going into and out of my company. Because it takes a whole day to package my files in a format the manufacturer can use, I will only have enough time to make a drop-off arrangement with the manufacturer, and the manufacturer is too far away for me to hand over the files in person.
Whichever company demos its robot at the convention will be named the first company to have successfully created a living robot, winning a large, multi-year contract producing them for the government. How do I send the files to the manufacturer?
The manufacturer will ship the Johnny 5 prototypes directly to the convention for me, so they don't have to come through Detroit. I only need to get the 3 MB of files to the manufacturer to triumph over OCP.
OCP has a vast amount of resources, including a huge number of data processors and servers. Any encrypted traffic I send, whether it is over SSH, FTPS, or HTTPS, will be logged and decrypted. This requires me to find a way to hide the data in plain sight.
I can't trust anybody with these files. OCP will plant people everywhere, including the post office. My friends will be bribed. Trackers will be installed on my car. Taxis will drive me in the wrong direction. I'm pretty much screwed in the corporate espionage department. My loyal manufacturer is the only one I can trust with this data. (Note: Although I'm taking this to the extreme, corporate espionage and data theft are very real, not just concepts used in movies.)
After talking with the manufacturer on our final untapped phone call, we agree that I will announce the success of the Johnny 5 prototype on my company blog. In the entry, I will post a hi-res picture of my team and me standing around the prototype, as most companies do. This will look like normal traffic. However, the design files will be hidden _inside_ the picture itself. We agree on a passphrase, and we hang up the phone, not speaking to each other again until after the convention.
Whenever the topic of information hiding comes up, people usually think of encryption. An attacker can see the data, but he can't make sense of it without a password, token, etc. However, there are ways of hiding information so that an attacker will not even know the data is there in the first place. Hiding data in this way is called steganography.
Steganography Before Computers
Before computers, steganography was often done with ordinary letters. Anyone who saw the letter could harmlessly read it, while the person the hidden message was intended for could find the real message by laying a mask over the page.
One of the best examples of this is from a letter sent during the Revolutionary War in 1777. Click the link for a transcript.
Of course, the concept of steganography is even older than the Revolutionary War. The earliest example I can find was written by Herodotus in 440 BCE. Histiaeus sent a message to Aristagoras by shaving a slave's head, tattooing a message on the head, and sending the slave once the hair had regrown. Once the slave had arrived, Aristagoras once again shaved the slave's head to read the message. (source)
There are a lot of different places to hide information on a computer. Picture files are simple to post on the Web and are inconspicuous. Let's use the picture of my engineering team in the lab with Johnny 5.
Later, Dr. Bonsai left the company to join OCP. I hear that he is currently a key person on the Robocop project.
One way to hide a message is to store each of its bits in the colors of the picture's pixels. For each bit of the message, a pseudo-randomly chosen (x, y, color channel) location has the least significant bit of its value replaced with the message bit. This changes the channel value by at most 1 out of 255, so the change is too subtle for anyone to notice visually.
The engineer picture is 5500 pixels wide and 3115 pixels high, with three color channels: red, green, and blue. Each (x, y, color) location holds one bit, so if every location is used, the largest message that can be embedded is 5500 × 3115 × 3 bits, or about 6 MB (6,424,687 bytes). The path through the locations is determined by an algorithm based on the password, so the same password always produces the same path.
First, let's build the software.
steg $ ls
Engineers.png  build  steg.c
steg $ cat build
#!/bin/sh
gcc -o pngsteg steg.c -lgd -lssl
steg $ ./build
steg $ ls
Engineers.png  build  pngsteg  steg.c
steg $ ./pngsteg --help
pngsteg - A steganography program for PNG files
Written by firstname.lastname@example.org
Copyright 2012
See http://waronpants.net/png-steganography for more info
Usage:
    Embedding:  pngsteg -e -i INPUT_PIC -d MESSAGE_INPUT -o OUTPUT_PIC -p PASSWORD [-m MAP_OUTPUT_PIC]
    Extracting: pngsteg -x -i INPUT_PIC -d MESSAGE_OUTPUT -p PASSWORD
    Help:       pngsteg -h
    -e, --embed
        Embeds MESSAGE_INPUT into INPUT_PIT png file, creating OUTPUT_PIC png file
    -x, --extract
        Extracts embedded message from INPUT_PIC png file, creating MESSAGE_OUTPUT file
    -i, --in-pic INPUT_PIC
        The png file to be embedded into/extracted from
    -o, --out-pic OUTPUT_PIC
        The png file to be created after embedding the MESSAGE_INPUT into INPUT_PIC
    -d, --data MESSAGE
        The cleartext message to embed/extract
    -m, --map MAP_OUTPUT_PIC
        The optional map output png file. When a bit is embedded in INPUT_PIC, the same pixel location will be modified in map file. The color channel affected will set to value 255.
    -p, --password PASSWORD
        The password on which to base the algorithm
    -v, --verbose
        This switch can be used multiple times. Each time increases the verbosity and number of output messages. (max used: 4)
    -h, --help
        This text
To demonstrate the software with a small amount of data, we'll create a 500 KiB file of random data and embed it.
steg $ dd if=/dev/urandom of=testdata_small bs=1K count=500
500+0 records in
500+0 records out
512000 bytes (512 kB) copied, 0.0507097 s, 10.1 MB/s
steg $ ./pngsteg -e -i Engineers.png -o StegPic_small.png -d testdata_small -m StegMap_small.png -p TestPassword1 -v
Building node list
Preallocation successful.
Shuffling nodes
Embedding data
Data embedded
Saving output picture
Saving map output picture
steg $ ./pngsteg -x -i StegPic_small.png -d testdata_small_extracted -p TestPassword1 -v
Preallocation successful.
Shuffling nodes
Extracting data
Data extracted.
steg $ diff testdata_small testdata_small_extracted
steg $
The resulting picture and map file are below. The map starts as a black picture. As each bit of the message is embedded, the same color channel of the same pixel in the map is turned "on" (set to 0xFF).
How about a larger file? Let's try the same process with a 4MB file.
steg $ dd if=/dev/urandom of=testdata bs=1M count=4
4+0 records in
4+0 records out
4194304 bytes (4.2 MB) copied, 0.454543 s, 9.2 MB/s
steg $ ./pngsteg -e -i Engineers.png -o StegPic.png -d testdata -m StegMap.png -p TestPassword1 -v
Building node list
Preallocation successful.
Shuffling nodes
Embedding data
Data embedded
Saving output picture
Saving map output picture
steg $ ./pngsteg -x -i StegPic.png -d testdata_extracted -p TestPassword1 -v
Preallocation successful.
Shuffling nodes
Extracting data
Data extracted.
steg $ diff testdata testdata_extracted
steg $
The Source Code
The (admittedly unclean) source code can be found here:
Update (March 20, 2012): A few people have asked to see the revision history. Check it out through Subversion. (Repository link)
Why not JPG?
"JPG files are smaller and more common. Why not use them?" Well, fictional interrogator, it's because of why JPGs are smaller: they use lossy compression. Once the data has been embedded, saving the picture as a JPG throws away part of that data. Some JPG steganography tools hide the data in the EXIF metadata instead, which is usually very easy to find and read.
Other lossless picture formats are BMP and TIFF. Both are usually stored uncompressed, so the files are very large. Because of the bandwidth and storage they need, posting them as images on the Web is generally not a good idea, whether or not data is embedded.
The algorithm I used is a buckshot approach: it scatters the data all over the picture across all color channels. It also ignores the alpha (transparency) channel.
There are more intricate methods, such as hiding the data in the anti-aliased lines between contrasting colors. However, because this method restricts the locations where data can be hidden, less data can be embedded.
Encryption works pretty well most of the time, but it should not be the only thing considered whenever security comes up. There are other layers: for example, data can be encrypted and then embedded in a picture.
There are also algorithms and programs for embedding data in sound files. If the data doesn't need to be recovered exactly, pictures can even be hidden in music as an Easter egg, such as in the audio spectrum of Aphex Twin's Windowlicker (source).
There are other creative uses for steganography. Besides secretly hiding data, steganography might be used to hide copyright information in a photograph without requiring a destructive watermark, for example.
Whether the intent is malicious, defensive, or just for fun, remember that not everything may be as it seems, especially on the Internet. Just don't get too paranoid.
Note: I wanted to include a picture of the letter John Nash would see popping out at him in A Beautiful Mind, but I couldn't find the screenshot I wanted. Instead, I included the above picture from a conspiracy theory website. To make up for this, please accept this picture of Jennifer Connelly.
I'm going to end this on a high note with that picture.