Using Java Zip API to automate your compression chores
K Venu, S V Ramu (2003-07-27)
Motivation
Downloading and browsing a lot of material from the Internet is one part; storing it and retrieving the relevant information from it is another. Retrieval is tough and needs more thought, but storage is simple and hence should be automated. The usual model for storing downloads is to have a separate folder for each topic and to keep these folders inside one 'downloads' folder. (I usually also prefix its name with an underscore '_', to make it appear first when sorted.)
Once, while trying to burn a CD with some 600 MB of downloads (web pages and software), I faced an interesting problem. CD formats are quirky and sometimes incompatible. Come to think of it, that is natural: CDs usually cross OS borders. We would naturally like to copy files from MS Windows and maybe read them in Linux (assuming they are all documents in HTML or PDF format). If so, what should the format of the CD be? For this portability reason, CDs have their own file structure, standardized by ISO. This standard restricts how deeply sub directories can be nested (something like 8 levels) and the maximum length of a file name (usually around 31 characters). Of course there are MS Windows friendly formats like 'Joliet', but these too have restrictions.
The first time I burned a CD, I was oblivious to these nuances. So when I went ahead with burning my 600 MB of downloads, I faced a harangue of insults from my CD burning software. Being in a hurry, I looked around for some tool that could rename all the long file names to short meaningful ones. Obviously this was a doomed search; moreover, how was I to handle my deep folder structure? I looked for a solution anxiously, as I had to leave that place shortly. Also, being conservative, I wanted to comply with the basic CD rules, so that the disc would be readable on most machines.
Zipping each topic folder into one archive would sidestep both restrictions, because the long names and deep paths would then live inside the zip file. But this meant I needed a flexible zipping solution within easy reach. I tried the command line tools of WinZip, but they lacked a way to dynamically specify all the folders inside a folder. I tried Apache Ant, but that too did not have a handy way to iterate over the sub folders inside a folder. In fact, if these tools provided a way to iterate over the files in all the sub folders, we would have a potent tool (maybe this should be the next goal). Anyway, not having a canned tool for iterating over the file system, I was forced to use programming. In Java, this meant I needed a way to zip programmatically.
Java Zip API
The
JAR file is a file format based on the popular ZIP file format and is used
for aggregating many files into one. A JAR file is essentially a zip file
that contains an optional META-INF directory... In many cases, JAR files are
not just simple archives of java classes files and/or resources...
The META-INF directory, if it exists, is used to store package and extension
configuration data, including security, versioning, extension and services.
J2SE 1.4.1 - JAR File Specification
So we just need to concentrate on the zipping, and the other things are easy. The jar package does, however, have some cute methods for associating a compressed file with an entry in the manifest file. So zipping is not everything in the Java world! Anyway, for our archiving example above, zipping and file system recursion are more than enough.
Recursion over sub folders
Recursively iterating over the file system is a pretty simple and useful task. This is possible because Java treats both a file and a folder as a java.io.File.
public static void iterate(File file) { ... }
The recursion logic used in the above method can be used effectively for zipping a given folder or file. If it is a folder, nothing is written for the folder itself; only its contents are visited. If it is a file, you write it to the java.util.zip.ZipOutputStream.
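The recursion described above might be sketched as follows. This is only a minimal illustration in current Java, not the code shipped with the tool: the class name FolderWalker, the extra 'found' parameter and the listFiles helper are assumptions made for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for this example only.
class FolderWalker {

    // Visits 'file'; folders are descended into, regular files are collected.
    static void iterate(File file, List<File> found) {
        if (file.isDirectory()) {
            // listFiles() returns null when the folder cannot be read.
            File[] children = file.listFiles();
            if (children == null) {
                return;
            }
            for (File child : children) {
                iterate(child, found);
            }
        } else if (file.isFile()) {
            found.add(file);
        }
    }

    // Convenience wrapper: every regular file at or below 'root'.
    static List<File> listFiles(File root) {
        List<File> found = new ArrayList<>();
        iterate(root, found);
        return found;
    }
}
```

Note that an empty folder contributes nothing to the result, since only regular files are collected.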
Zipping a file
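As a rough sketch of the two steps this section describes, the following uses java.util.zip.ZipOutputStream and ZipEntry from the standard library. The class name ZipSketch, the method names and the buffer size are assumptions made for the example; it is not the code of the tool itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, named for this example only.
class ZipSketch {

    // Adds one file to an already open zip stream under the given entry name.
    static void addFile(ZipOutputStream zipOut, File source, String entryName) throws IOException {
        // Step one: record the destination path inside the archive.
        zipOut.putNextEntry(new ZipEntry(entryName));
        // Step two: copy the bytes; the stream compresses them as they are written.
        try (InputStream in = new FileInputStream(source)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                zipOut.write(buffer, 0, read);
            }
        }
        zipOut.closeEntry();
    }

    // Zips a single file into 'target', storing it under 'entryName'.
    static void zipFile(File source, File target, String entryName) throws IOException {
        try (ZipOutputStream zipOut = new ZipOutputStream(new FileOutputStream(target))) {
            addFile(zipOut, source, entryName);
        }
    }
}
```

The entry name is independent of where the file sits on disk, so a call such as ZipSketch.zipFile(file, target, "docs/page.html") stores the file under whatever path you choose. Entry names use '/' as the separator on every platform.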
Zipping a file involves two steps. One, put a ZipEntry, which is just the destination file path to be recorded inside the zip file (you can play tricks here!).
//Create the output zip stream
And two, write the content to the zip stream, which also compresses it.
//Write the content of this file.
UnZipping a file
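As a rough sketch of unzipping with java.util.zip.ZipFile, the following walks the entries of an archive and writes each one below a target folder. The class name UnzipSketch, the targetDir parameter and the check against entries that point outside the target folder are assumptions added for the example; the tool itself may work differently.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named for this example only.
class UnzipSketch {

    // Extracts every entry of 'sourceFile' below 'targetDir'.
    static void unzip(File sourceFile, File targetDir) throws IOException {
        String targetRoot = targetDir.getCanonicalPath() + File.separator;
        try (ZipFile zip = new ZipFile(sourceFile)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                File out = new File(targetDir, entry.getName());
                // Refuse entries whose path would escape the target folder.
                if (!out.getCanonicalPath().startsWith(targetRoot)) {
                    throw new IOException("Entry outside target folder: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    // Some archives carry explicit folder entries; just create them.
                    out.mkdirs();
                    continue;
                }
                File parent = out.getParentFile();
                if (parent != null) {
                    parent.mkdirs();
                }
                // Read the entry content; it is decompressed while being read.
                try (InputStream in = zip.getInputStream(entry);
                     OutputStream os = new FileOutputStream(out)) {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        os.write(buffer, 0, read);
                    }
                }
            }
        }
    }
}
```

Parent folders are created on demand, so the sketch does not depend on the archive containing explicit folder entries.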
Unzipping again is the reverse of these two steps: One, get the
ZipFile zip = new ZipFile(sourceFile);
Epilogue
For a long time I procrastinated over writing this tool. But when Venu asked for a neat task that could be useful, I gave him this one. Venu is a newcomer to Java (hardly two months now), but since he is strong in C++, he completed this tool within a few days, working just in his spare time. Thanks to him, this article and the zip tool (Tattvum-20030727-ZipUnZip.zip) are before you. To test this tool, run it in a test folder that contains one or more non-empty folders, and then use it for your regular archiving. Please note that this is a very early release, but it works fine with 300 MB+ of downloads! Known bugs: it doesn't handle empty folders (it stops), and unzip needs some tweaking to work with zips created by WinZip. Comments and bug reports are welcome.