Linked on German Wikipedia

December 21, 2007

Hi, I just saw that this blog has been linked on the German Wikipedia. Now I will continue writing about the rest of the files. So, stay tuned, I will be back after Christmas.

Happy Holidays to all of you.


Episode 2 aka Barber.z5

November 12, 2007

Let’s take a look at the second episode. As pointed out here, I was unable to decode the file with the correct md5sum. Nevertheless, I was able to use it.

Barber.z5 is a story file for a Z-machine; see Wikipedia.
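
The .z5 extension conventionally marks a version 5 story file, and the Z-machine header keeps the version number in its very first byte. A minimal sketch for checking that byte, assuming a POSIX od is available; the helper name story_version is made up for this illustration:

```shell
# Print the Z-machine version stored in the first byte of a story file.
# story_version is a hypothetical helper name, not part of any interpreter.
story_version() {
    # read exactly one byte and print it as an unsigned decimal number
    od -An -tu1 -N1 "$1" | tr -d ' \n'
    echo
}
```

Running story_version Barber.z5 should print the version number; if the decoded file is damaged at the very start, this is a cheap first check before loading it into an interpreter.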

We’ve got the files, but what now?

November 9, 2007

Having extracted and decoded the files, we now want to use them. So what are they, and how do we use them?

At this stage, we take a look at (only available in the Google cache)

Edit: I fixed the links


Getting the files

November 8, 2007

After my last post, you will end up with a lot of text files. We will now convert these text files to the actual binary files.

To do that, we first have to remove duplicated files from the set. Transcode will sometimes extract one subtitle twice. Use your favourite image viewer (ideally one with an automatic search for duplicated files) to identify those. Do not delete them, though: it is possible (even if not very probable) that it wasn’t transcode’s fault and that they really are two identical subtitles.
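
If no such image viewer is at hand, checksums can do part of the job. This is only a sketch, assuming GNU coreutils (uniq -w and --all-repeated are GNU options) and that a subtitle extracted twice yields byte-identical images; pictures that merely look alike will not be caught. The helper name list_duplicates is made up for this example:

```shell
# Print groups of byte-identical files, one group per paragraph.
# Nothing is deleted; the files are only listed.
# list_duplicates is a hypothetical helper name.
list_duplicates() {
    # an MD5 hex digest is 32 characters long, so compare only those
    md5sum "$@" | sort | uniq -w32 --all-repeated=separate
}
```

Calling list_duplicates subtitles-*.pgm prints each set of identical images together, so you can look at them and decide what to do.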

The next step is to join all txt-files and to remove accidentally inserted spaces. A simple
sed -e "s/ //g" subtitles-*txt > base64.txt
entered on the console will do.

Now we have one file containing the base64-encoded source, …
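
Decoding the joined text could look roughly like this. It is a sketch, not necessarily the exact command used here: it assumes the base64 tool from GNU coreutils, and decode_base64 is a made-up helper name.

```shell
# Decode a base64 text file into a binary file.
# decode_base64 is a hypothetical helper; the output name is up to you.
decode_base64() {
    # $1 = base64 text file, $2 = binary file to write
    # tr drops any whitespace that is still left before decoding
    tr -d '[:space:]' < "$1" | base64 -d > "$2"
}
```

For example, decode_base64 base64.txt decoded.bin writes the decoded bytes to decoded.bin (an arbitrary name). If base64 complains about invalid input, an OCR error is the most likely cause.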

Converting images to text

November 5, 2007

Now we get to the interesting part:
Converting the images to text.
I’m using pgm2txt for this purpose. Using database-only mode (“-d”) ensures that we have full control. Using gocr’s internal heuristics gave many errors, e.g. “l” was read as “1”.

pgm2txt will ask you to enter the displayed text whenever it cannot identify the characters. In database-only mode, this means you will have to enter every character at least once. That alone would not be a problem, but as the letters aren’t always clearly separated, you will often end up having to enter the text for combinations like “KWY”. Nevertheless, after having trained the db for some time, all you need is patience and CPU power. After some time (1000 pics => several hours) you will end up with a lot of text files, each containing one part of the base64-encoded file.
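
A cheap sanity check on the OCR output is to look for characters that cannot occur in base64 at all. The sketch below assumes a POSIX grep; find_bad_ocr is a made-up helper name. It cannot catch a mix-up like “l” and “1”, since both are valid base64 characters, but it does flag files where something went visibly wrong.

```shell
# List every file that contains characters outside the base64
# alphabet (A-Z, a-z, 0-9, +, /, =). Whitespace is tolerated,
# because stray spaces are removed in a later step anyway.
# find_bad_ocr is a hypothetical helper name.
find_bad_ocr() {
    # -l prints only the names of the files that match
    LC_ALL=C grep -l '[^A-Za-z0-9+/=[:space:]]' "$@"
}
```

Calling find_bad_ocr subtitles-*txt prints the names of the suspicious files, or nothing if every file looks like base64.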

Extracting subtitles

November 4, 2007

Step one is easy.
You will need the following programs:

  • transcode
  • mplayer
  • subtitleripper

On Linux, just use the following script (originally taken from the Gentoo Wiki):

echo "Please type in the stream number"
read DVDSTREAM
# list the subtitle streams (sid) of this title
mplayer -dvd-device /dev/dvd dvd://$DVDSTREAM -vo null -ao null -frames 0 -v 2>&1 | grep sid
# ask the user for the sid
# the correct number is 0x20 + sid
echo "Please type in the subtitle SID, type in hexadecimal and add 0x20. Example: for sid 0, type 0x20"
read SID

# extract the subtitle stream, then convert it to pgm images
tccat -i /dev/dvd -T $DVDSTREAM -L | tcextract -x ps1 -t vob -a $SID > subs
subtitle2pgm -o subtitles-$DVDSTREAM -c 0,255,255,255 < subs
We will end up with a lot of pictures, each containing one piece of the final base64-encoded file. The next step will be to convert these pictures into text using an OCR program.
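
The script asks for the SID as 0x20 plus the sid that mplayer prints. A minimal sketch of that arithmetic, so the value does not have to be worked out by hand; the function name sid_to_hex is made up for this illustration:

```shell
# Convert the decimal sid reported by mplayer into the hexadecimal
# value the script asks for (0x20 + sid).
# sid_to_hex is a hypothetical helper, not part of transcode or mplayer.
sid_to_hex() {
    printf '0x%x\n' $((0x20 + $1))
}
```

For example, sid_to_hex 0 prints 0x20 and sid_to_hex 3 prints 0x23.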

What’s this all about?

November 4, 2007

I recently ordered the “The IT Crowd” DVD sets. The Season 1 DVD contains some normal easter eggs, but the Season 2 DVD goes far beyond that. The leet subtitles are actually base64-encoded files.
So, what to do now?

  1. Extract subtitles from DVD
  2. Convert subtitles to text
  3. Decode text to files