If you followed my last post, you have ended up with a lot of text files. We will now convert these text files back into the actual binary files.
To do that we first have to deal with duplicated files. Transcode will sometimes extract one subtitle twice. Use your favourite image viewer (ideally one with an automatic search for duplicated files) to identify those. Do not delete them outright, though: it is of course possible (even if not very probable) that it wasn’t transcode’s fault and they really are two identical subtitles.
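If your image viewer has no duplicate search, a short script can at least flag byte-identical files. This is only a sketch: the function name `find_duplicates` and the `*.pgm` pattern are assumptions for illustration, and it will not catch two images that look the same but differ by a few bytes. It only reports groups and deletes nothing, so you can still check each pair by eye.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(directory, pattern="*.pgm"):
    """Group files in `directory` matching `pattern` by their content.

    Returns a list of lists; each inner list holds the paths of files
    whose bytes are identical. Nothing is deleted or moved.
    """
    groups = defaultdict(list)
    for path in sorted(Path(directory).glob(pattern)):
        # Hash the whole file; identical bytes give identical digests.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        groups[digest].append(path)
    # Keep only the digests that were seen more than once.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates(".")` in the directory holding the extracted images returns the groups of candidates to look at.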
The next step is to join all txt-files and to remove accidentally inserted spaces. A simple
sed -e "s/ //g" subtitles-*txt > base64.txt
entered on the console will do.
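If you have no sed at hand, roughly the same thing can be done in Python. This is a sketch under assumptions: `join_and_strip` is a made-up helper name, and it joins the files in plain sorted order, which may differ from your shell's glob order in some locales.

```python
import glob


def join_and_strip(pattern, out_path):
    """Concatenate all files matching `pattern` into `out_path`, dropping spaces.

    Files are joined in sorted name order. Only space characters are
    removed; line breaks are kept. Returns the list of joined file names.
    """
    names = sorted(glob.glob(pattern))
    with open(out_path, "w") as out:
        for name in names:
            with open(name) as src:
                out.write(src.read().replace(" ", ""))
    return names
```

With the file names used above, the call would be `join_and_strip("subtitles-*txt", "base64.txt")`.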
Now we have one file containing the base64-encoded source, the md5sum for that file and the original file name. As I didn’t find a program which could handle these files the way I wanted, I wrote the following Python script to extract the file and check the md5sum:
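import sys, base64, hashlib

def hexToString(md5bin):
    # Render the raw digest as hex; bytes.hex() keeps leading zeros,
    # which hex(ord(ch)).replace("0x", "") would drop.
    return md5bin.hex()

if len(sys.argv) < 2:
    print("""Usage: %s in_b64_enc_file
in_b64_enc_file - The Base64 encoded file to be converted""" % sys.argv[0])
    sys.exit(1)

f = open(sys.argv[1], 'r')
s = f.readline()
md5string = ""
# First line: Content-MD5:<base64 of the digest> (spaces were removed by sed)
sarray = s.strip().split(':', 1)
if sarray[0] == "Content-MD5":
    md5base64 = sarray[1]
    md5bin = base64.b64decode(md5base64)
    md5string = hexToString(md5bin)
    print("MD5 should be: " + md5string)
    s = f.readline()
# Next line: assumed to end in filename="..."
sarray = s.strip().split("=", 1)
filename = sarray[1].strip('"')
s = f.readline()  # skip the line between the header and the data
s = f.read()
f.close()
decoded = base64.b64decode(s)
# Write the decoded binary under its original name
fout = open(filename, 'wb')
fout.write(decoded)
fout.close()
m = hashlib.md5(decoded)
print("MD5 is: " + hexToString(m.digest()))
print("MD5 matches: " + str(hexToString(m.digest()) == md5string))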
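One thing to keep in mind when comparing checksums by hand: the Content-MD5 value is the base64 encoding of the raw 16-byte digest, while tools like md5sum print the digest as 32 hex characters. The two helpers below convert between both forms. They are a small sketch, and the names `content_md5_to_hex` and `hex_to_content_md5` are my own invention, not part of any library.

```python
import base64


def content_md5_to_hex(value):
    """Turn a base64 Content-MD5 value into the hex form md5sum prints."""
    return base64.b64decode(value).hex()


def hex_to_content_md5(hexdigest):
    """Turn a hex digest (as printed by md5sum) into a Content-MD5 value."""
    return base64.b64encode(bytes.fromhex(hexdigest)).decode("ascii")
```

This way you can run md5sum on the extracted file and compare its output directly with the value from the header.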
If you were not careful enough (like me) and entered a wrong letter, or entered the right letter in the wrong case, when pgm2txt asked you, you will notice that now. Well, it’s only a question of hours to convert the images to text again :-). If any of you gets the right md5sum for Episode 2 (Barber.z5), please let me know. I always get it wrong even though I am able to use the file.
The next post will talk about the extracted files and what to do with them.