I have a set of multi-GB Windows folders that I need to archive in 7-Zip format each month. I'd prefer not to compress the folders "manually" with the mouse. I also didn't want to drive the 7-Zip command line through the subprocess module, as I have with some other programs. Ideally, I wanted to control 7-Zip programmatically. The 7-Zip-JBinding library offered a means to do this from Jython.
7-Zip-JBinding is written using Java interfaces that are structured fairly specifically, so I did not venture far from the examples in the 7-Zip-JBinding documentation. I smithed two classes for my own purposes, one for compressing and one for uncompressing, and present the Java code below. The decompression class has a separate method for retrieving the paths of the compressed files. This is not efficient, but for what I need to do, and given the limitations of the library and the approach, it works out for the best.
import java.io.IOException;
import java.io.RandomAccessFile;
import net.sf.sevenzipjbinding.IOutCreateArchive7z;
import net.sf.sevenzipjbinding.IOutCreateCallback;
import net.sf.sevenzipjbinding.IOutItem7z;
import net.sf.sevenzipjbinding.ISequentialInStream;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.OutItemFactory;
import net.sf.sevenzipjbinding.impl.RandomAccessFileOutStream;
import net.sf.sevenzipjbinding.util.ByteArrayStream;
/* Off StackOverflow - works for getting
* file content/bytes from path */
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.Path;
public class SevenZipThing {
private static final String RETCHAR = "\n";
private static final String INTFMT = "%,d";
private static final String BYTESTOCOMPRESS = " bytes total to compress\n";
private static final String ERROCCURS = "Error occurs: ";
private static final String COMPRESSFILE = "\nCompressing file ";
private static final String RW = "rw";
private static final int LVL = 5;
private static final String SEVZERR = "7z-Error occurs:";
private static final String ERRCLOSING = "Error closing archive: ";
private static final String ERRCLOSINGFLE = "Error closing file: ";
private static final String SUCCESS = "\nCompression operation succeeded\n";
private String filename;
/* Conversion from a Jython list to String[] is
* implicit and poses no problems (JDK 7) */
private String[] pathsx;
public SevenZipThing(String filename, String[] pathsx) {
this.filename = filename;
this.pathsx = pathsx;
}
/**
* The callback provides information about archive items.
*/
/**
* I copied this straight from the 7-Zip-JBinding author's
* code, but I haven't added much to deal with messaging
* or error handling
* */
private final class MyCreateCallback
implements IOutCreateCallback<IOutItem7z> {
public void setOperationResult(boolean operationResultOk)
throws SevenZipException {
// Track each operation result here
}
public void setTotal(long total) throws SevenZipException {
// Track operation progress here
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOCOMPRESS);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
public IOutItem7z getItemInformation(int index,
OutItemFactory<IOutItem7z> outItemFactory) {
IOutItem7z item = outItemFactory.createOutItem();
Path path = Paths.get(pathsx[index]);
item.setPropertyPath(pathsx[index]);
try {
// Only the size is needed here. Files.size() avoids reading
// the whole file into memory (Java arrays hold at most
// 2^31 - 1 bytes, so readAllBytes() fails on files of 2 GB or more).
item.setDataSize(Files.size(path));
return item;
// XXX - I could do a lot better than this (error handling).
} catch (Exception e) {
System.err.println(ERROCCURS + e);
}
return null;
}
public ISequentialInStream getStream(int i)
throws SevenZipException {
Path path = Paths.get(pathsx[i]);
try {
byte[] data = Files.readAllBytes(path);
System.out.println(COMPRESSFILE + path);
return new ByteArrayStream(data, true);
} catch (Exception e) {
System.err.println(ERROCCURS + e);
}
return null;
}
}
public void compress() {
/* Mostly copied from the 7-Zip-JBinding author's code -
* I made the compress method public to work from jython.
* Also, I deal with all of the file listing in jython
* and just pass a list to this class. */
boolean success = false;
RandomAccessFile raf = null;
IOutCreateArchive7z outArchive = null;
try {
raf = new RandomAccessFile(filename, RW);
// Open out-archive object
outArchive = SevenZip.openOutArchive7z();
// Configure archive
outArchive.setLevel(LVL);
outArchive.setSolid(true);
// All available processors.
outArchive.setThreadCount(0);
// Create archive
outArchive.createArchive(new RandomAccessFileOutStream(raf),
pathsx.length, new MyCreateCallback());
success = true;
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (outArchive != null) {
try {
outArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
success = false;
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
success = false;
}
}
}
if (success) {
System.out.println(SUCCESS);
}
}
}
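SevenZipThing stores each string it is handed verbatim as the item path (item.setPropertyPath(pathsx[index])), so the archive layout is decided entirely on the Jython side. The sketch below shows one way to derive the project folder and the './'-prefixed relative directory from a full path, in the spirit of getbasedir() further down. It is pure Python, so it also runs outside Jython; split_project_path and the PROJECT_DIRS values are hypothetical.

```python
# Hypothetical helper in the spirit of getbasedir(): find the project
# folder in a path and build the './'-prefixed relative directory that
# ends up stored inside the archive.
PROJECT_DIRS = ['Pit-1', 'Pit-2', 'Pit-3']  # example names, like BASEDIRS


def split_project_path(path, project_dirs=PROJECT_DIRS):
    """Return (project, relative_dir) for a directory path.

    Backslashes are normalized to forward slashes first. The first
    path component that matches a known project folder wins. Raises
    ValueError if no known project folder appears in the path.
    """
    parts = [p for p in path.replace('\\', '/').split('/') if p]
    for i, part in enumerate(parts):
        if part in project_dirs:
            relative = './' + '/'.join(parts[i + 1:])
            if not relative.endswith('/'):
                relative += '/'
            return part, relative
    raise ValueError('no project folder in path: ' + path)
```

For a top-level project directory the relative part collapses to './', which matches the empty prefix used for top-level files.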
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.File;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.ArrayList;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.PropID;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.IArchiveExtractCallback;
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.ExtractAskMode;
import net.sf.sevenzipjbinding.ISequentialOutStream;
/* 7z archive format */
/* SEVEN_ZIP is the one I want */
import net.sf.sevenzipjbinding.ArchiveFormat;
public class SevenZipThingExtract {
private String filename;
private String extractdirectory;
private ArrayList<String> foldersx = null;
private boolean subdirectory = false;
private static final String ERROPENINGFLE = "Error opening file: ";
private static final String ERRWRITINGFLE = "Error writing to file: ";
private static final String EXTERR = "Extraction error";
private static final String INFOFMT = "%9X | %10s | %s";
private static final String RETCHAR = "\n";
private static final String INTFMT = "%,d";
private static final String BYTESTOEXTRACT = " bytes total to extract\n";
private static final String RW = "rw";
private static final String BACKSLASH = "\\";
private static final String SEVZERR = "7z-Error occurs:";
private static final String ERROCCURS = "Error occurs: ";
private static final String ERRCLOSING = "Error closing archive: ";
private static final String ERRCLOSINGFLE = "Error closing file: ";
public SevenZipThingExtract(String filename, String extractdirectory,
boolean subdirectory) {
this.filename = filename;
this.foldersx = new ArrayList<String>();
this.extractdirectory = extractdirectory;
this.subdirectory = subdirectory;
}
private final class MyExtractCallback
implements IArchiveExtractCallback {
// Copied mostly from example.
private int hash = 0;
private int size = 0;
private int index;
private boolean skipExtraction;
private IInArchive inArchive;
private OutputStream outputStream;
private File file;
public MyExtractCallback(IInArchive inArchive) {
this.inArchive = inArchive;
}
@Override
public ISequentialOutStream getStream(int index,
ExtractAskMode extractAskMode)
throws SevenZipException {
this.index = index;
// I'm not skipping anything.
skipExtraction = false;
String path = (String) inArchive.getProperty(index, PropID.PATH);
// Prepend extractdirectory. Subfolder entries are stored
// with a leading .\ prefix, which substring(2) strips off.
if (subdirectory) {
path = extractdirectory + BACKSLASH + path.substring(2);
} else {
path = extractdirectory + BACKSLASH + path;
}
file = new File(path);
try {
outputStream = new FileOutputStream(file);
} catch (FileNotFoundException e) {
throw new SevenZipException(ERROPENINGFLE
+ file.getAbsolutePath(), e);
}
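return new ISequentialOutStream() {
public int write(byte[] data) throws SevenZipException {
try {
outputStream.write(data);
} catch (IOException e) {
throw new SevenZipException(ERRWRITINGFLE
+ file.getAbsolutePath(), e);
}
// Track size and a simple hash for the report line.
hash ^= Arrays.hashCode(data);
size += data.length;
return data.length; // Return amount of consumed data
}
};
}
public void prepareOperation(ExtractAskMode extractAskMode)
throws SevenZipException {
}
public void setOperationResult(ExtractOperationResult extractOperationResult)
throws SevenZipException {
// Close the output file for this item; otherwise one
// file handle per extracted item stays open.
if (outputStream != null) {
try {
outputStream.close();
} catch (IOException e) {
throw new SevenZipException(ERRCLOSINGFLE
+ file.getAbsolutePath(), e);
}
outputStream = null;
}
// Track each operation result here
if (extractOperationResult != ExtractOperationResult.OK) {
System.err.println(EXTERR);
} else {
System.out.println(String.format(INFOFMT, hash, size,
inArchive.getProperty(index, PropID.PATH)));
hash = 0;
size = 0;
}
}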
public void setTotal(long total) throws SevenZipException {
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOEXTRACT);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
}
private final class MyGetPathsCallback
implements IArchiveExtractCallback {
// Copied mostly from example.
private int hash = 0;
private int size = 0;
private int index;
private boolean skipExtraction;
private IInArchive inArchive;
public MyGetPathsCallback(IInArchive inArchive) {
this.inArchive = inArchive;
}
public ISequentialOutStream getStream(int index,
ExtractAskMode extractAskMode)
throws SevenZipException {
this.index = index;
// I'm not skipping anything.
skipExtraction = false;
String path = (String) inArchive.getProperty(index,
PropID.PATH);
foldersx.add(path);
return new ISequentialOutStream() {
public int write(byte[] data) throws SevenZipException {
hash ^= Arrays.hashCode(data);
size += data.length;
// Return amount of processed data
return data.length;
}
};
}
public void prepareOperation(ExtractAskMode extractAskMode)
throws SevenZipException {
}
public void setOperationResult(ExtractOperationResult extractOperationResult)
throws SevenZipException {
// Track each operation result here
if (extractOperationResult != ExtractOperationResult.OK) {
System.err.println(EXTERR);
} else {
System.out.println(String.format(INFOFMT, hash, size,
inArchive.getProperty(index, PropID.PATH)));
hash = 0;
size = 0;
}
}
public void setTotal(long total) throws SevenZipException {
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOEXTRACT);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
}
public void extractfiles() {
boolean success = false;
RandomAccessFile raf = null;
IInArchive inArchive = null;
try {
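// Read-only: extraction never writes to the archive, and "rw"
// would silently create an empty file for a mistyped path.
raf = new RandomAccessFile(filename, "r");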
inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP,
new RandomAccessFileInStream(raf));
int itemCount = inArchive.getNumberOfItems();
// From StackOverflow - could use IntStream,
// but that's Java 1.8 (using 1.7).
int[] fileindices = new int[itemCount];
for(int k = 0; k < fileindices.length; k++)
fileindices[k] = k;
inArchive.extract(fileindices, false,
new MyExtractCallback(inArchive));
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (inArchive != null) {
try {
inArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
}
}
}
}
public ArrayList<String> getfolders() {
boolean success = false;
RandomAccessFile raf = null;
IInArchive inArchive = null;
try {
// Read-only: listing paths never writes to the archive.
raf = new RandomAccessFile(filename, "r");
inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP,
new RandomAccessFileInStream(raf));
int itemCount = inArchive.getNumberOfItems();
// From StackOverflow - could use IntStream,
// but that's Java 1.8 (using 1.7).
int[] fileindices = new int[itemCount];
for(int k = 0; k < fileindices.length; k++)
fileindices[k] = k;
inArchive.extract(fileindices, false,
new MyGetPathsCallback(inArchive));
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (inArchive != null) {
try {
inArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
}
}
}
return foldersx;
}
}
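The extraction callback maps each stored path onto the target folder: bare names go straight into the extract directory, and '.\'-prefixed subfolder entries lose the prefix before joining. The folders have to exist before any file is written into them. Here is a pure-Python sketch of both steps, assuming stored paths use backslashes as the Jython code below expects. output_path and directories_to_create are hypothetical names. This version checks the prefix per entry rather than using one flag per archive, and it orders folders by sorting instead of creating them level by level.

```python
import ntpath  # Windows path rules, available on any OS


def output_path(extract_dir, stored_path):
    """Map a path stored in the archive to a target path on disk.

    Subfolder entries look like '.\\sub\\file'; the leading '.\\' is
    dropped before joining. Top-level files are stored as bare names.
    """
    if stored_path.startswith('.\\'):
        stored_path = stored_path[2:]
    return ntpath.join(extract_dir, stored_path)


def directories_to_create(stored_paths):
    """Return every folder prefix the stored paths need, parents first."""
    needed = set()
    for stored in stored_paths:
        parts = stored.split('\\')
        if parts and parts[0] == '.':
            parts = parts[1:]
        folders = parts[:-1]  # drop the file name
        for depth in range(1, len(folders) + 1):
            needed.add(tuple(folders[:depth]))
    # A tuple sorts before any longer tuple it is a prefix of,
    # so each parent comes before its children.
    return [ntpath.join(*d) for d in sorted(needed)]
```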
The getfolders method in the SevenZipThingExtract class is the extra method for getting the list of folders. As noted in the Jython code below, the limits on the number of bytes and files that can be compressed at once make it necessary to split larger files into chunks. Also, for my specific use case, I need to extract files to a specific folder and set of subfolders. My methodology is outlined in the comments in the Jython code. The good news: if I get run over by a bus and the uncompression part of the program gets lost, people will be able to get the files back with some effort. The bad news: they will be cursing my headstone. You do the best you can.
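The split-and-glue idea is independent of 7-Zip. Below is a minimal pure-Python sketch that writes numbered parts in the same path.NNN style as fld.FILEN and concatenates them back in order. split_file and join_parts are hypothetical names, and the tracker CSV is left out.

```python
import shutil

CHUNK = 2 ** 26  # 64 MiB, the same value as fld.CHUNK


def split_file(path, chunk=CHUNK):
    """Split path into path.001, path.002, ... and return the part paths.

    Reads one chunk at a time, so memory use stays at one chunk no
    matter how big the original file is.
    """
    parts = []
    with open(path, 'rb') as src:
        while True:
            data = src.read(chunk)
            # An exact multiple of chunk ends on an empty read; don't
            # write an empty trailing part for it.
            if not data and parts:
                break
            part = '{0}.{1:03d}'.format(path, len(parts) + 1)
            with open(part, 'wb') as dst:
                dst.write(data)
            parts.append(part)
            if len(data) < chunk:
                break
    return parts


def join_parts(parts, target):
    """Concatenate the parts, in the order given, into target."""
    with open(target, 'wb') as dst:
        for part in parts:
            with open(part, 'rb') as src:
                shutil.copyfileobj(src, dst)
```

Because the parts are plain byte ranges, anyone who finds the archives without the program can rebuild a file by concatenating the parts in numeric order.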
The three Jython modules follow. The first one, folderstozip.py, is just constants:
#!java -jar C:\jython-2.7.0\jython.jar
# folderstozip.py
"""
Constants used in compression and
decompression.
"""
FRONTSLASH = '/'
BACKSLASH = '\\'
EMPTY = ''
SAMEFOLDER = './'
SAMEFOLDERWIN = u'.\\'
SPLITFILETRACKER = 'SPLITFILETRACKER.csv'
SPLITFILE = '{0:s}.{1:s}'
UCOMMA = u','
# 3rd party sevenZipJBindings library.
PATH7ZJB = 'C:/MSPROJECTS/EOMReconciliation/2016/03March'
PATH7ZJB += '/Backup/sevenzipjbinding/lib/sevenzipjbinding.jar'
# OS specific 3rd party sevenZipJBindings library.
PATH7ZJBOSSPEC = r'C:/MSPROJECTS/EOMReconciliation/2016/03March'
PATH7ZJBOSSPEC += '/Backup/sevenzipjbinding/lib/sevenzipjbinding-Windows-amd64.jar'
PROGFOLDER = 'C:/MSPROJECTS/EOMReconciliation/2016/03March/Backup'
PROGFOLDER += FRONTSLASH
# Informational messages.
WROTEFILE = 'Wrote file {:s}\n'
SPLITFILEMSG = 'Have now split {0:,d} bytes of file {1:s} into {2:d} chunks of {3:,d} bytes.\n'
DONESPLITTING = '\nDone splitting file'
FILESAFTERSPLIT = '\n{:d} files after split'
COMPRESSING = '\nCompressing file {:s} . . .\n'
DELETING = '\nDeleting file {:s} . . .\n'
DELETINGDIR = '\nNow deleting {:s} . . .\n'
# Five digits: room for 100,000 unique prefixes (00000-99999).
UNIQUEX = '{0:05d}'
# XXX - multiple file archives are limited to
# about 10KB - reason unknown. Larger ones crash
# the JVM with an "IInStream interface class
# not found" error.
# XXX - choked on 8700 bytes, so I dropped
# this from 9500 to 8500.
MULTFILELIMIT = 8500
HALFLIMIT = MULTFILELIMIT/2
# About 50 splits for a 3GB file.
CHUNK = 2 ** 26
# Path plus split number.
FILEN = r'{0:s}.{1:03d}'
# Path plus basefilename.
FILEB = r'{0:s}{1:s}'
# Read/Write constants.
RB = 'rb'
WB = 'wb'
W = 'w'
# Archive path: project folder name plus unique file name.
ARCHIVEX = '{0:s}/{1:s}.7z'
# Multiple file archive: project folder name plus archive counter.
MULTARCHIVEX = '{0:s}/archive{1:03d}.7z'
MULTFILES = '. . . multiple files'
# File categories.
# Size less than HALFLIMIT.
SMALL = 'small'
# Size greater than or equal to HALFLIMIT but
# less than or equal to CHUNK.
MEDIUM = 'medium'
# Larger than CHUNK.
LARGE = 'large'
BASEPATH = 'basepath'
FILES = 'files'
# XXX - this list holds recognizable
# project folder names within your domain
# space - mine are open pit mining
# area names.
BASEDIRS = ['Pit-1', 'Pit-2', 'Pit-3']
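With these thresholds, every file falls into exactly one category: small below HALFLIMIT, medium from HALFLIMIT up to and including CHUNK, and large above CHUNK. The sketch below classifies (size, name) pairs with the same boundaries as segregatefiles() in sevenzipper.py, but in a single pass. categorize is a hypothetical name, and the constants are copied from above.

```python
MULTFILELIMIT = 8500
HALFLIMIT = MULTFILELIMIT // 2
CHUNK = 2 ** 26


def categorize(sizes_and_names):
    """Sort (size, name) pairs into 'small', 'medium' and 'large' lists.

    Each list stays ordered by size because the input is sorted first.
    """
    result = {'small': [], 'medium': [], 'large': []}
    for size, name in sorted(sizes_and_names):
        if size < HALFLIMIT:
            result['small'].append((size, name))
        elif size <= CHUNK:
            result['medium'].append((size, name))
        else:
            result['large'].append((size, name))
    return result
```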
#!java -jar C:/jython-2.7.0/jython.jar
# sevenzipper.py
"""
Use java 3rd party 7-zip compression
library (sevenZipJBindings) from
jython to 7zip up MineSight project
files.
"""
import folderstozip as fld
# Need to adjust path to get necessary jar imports.
import sys
# Need for os.path
import os
# Original path of file plus split number.
SPLITFILERECORD = '{0:s},{1:03d}'
sys.path.append(fld.PATH7ZJB)
sys.path.append(fld.PATH7ZJBOSSPEC)
# java 7zip library
import SevenZipThing as z7thing
# For copying files to program
# directory and deleting the old
# ones where necessary.
import shutil
# For unique archive names.
import itertools
COUNTERX = itertools.count(0, 1)
def splitfile(originalfilepath, splitfilestrackerfile):
"""
Split file at (string) originalfilepath
into fld.CHUNK sized chunks and indicate
sequence by number in new split file
name.
Return generator of relative file paths
inside project folder.
originalfilepath is the path of the
file that needs to be split into parts.
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
"""
sizeoffile = os.path.getsize(originalfilepath)
chunks = sizeoffile/fld.CHUNK + 1
# Counter.
i = 1
with open(originalfilepath, fld.RB) as f:
while i < chunks + 1:
with open(fld.FILEN.format(originalfilepath, i), fld.WB) as f2:
f2.write(f.read(fld.CHUNK))
print(fld.WROTEFILE.format(fld.FILEN.format(originalfilepath, i)))
print(fld.SPLITFILEMSG.format(f.tell(), originalfilepath, i, fld.CHUNK))
print >> splitfilestrackerfile, (SPLITFILERECORD.format(originalfilepath, i))
i += 1
print(fld.DONESPLITTING)
print(fld.FILESAFTERSPLIT.format(i - 1))
return (fld.FILEN.format(originalfilepath, x) for x in xrange(1, i))
def movefiles(movefilesx, intermediatepath):
"""
Move files from MineSight project directory
to program directory.
Return a list of base file names for the
moved files.
movefilesx is a generator of file paths.
intermediatepath is a string relative path
between the program folder and the sub-folder
of the MineSight directory (_msresources/06SOLIDS,
for example).
"""
# Move files to that folder.
movedfiles = []
for pathx in movefilesx:
shutil.move(pathx, fld.PROGFOLDER + intermediatepath +
os.path.basename(pathx))
movedfiles.append(intermediatepath + os.path.basename(pathx))
return movedfiles
def copyfiles(copyfilesx, intermediatepath):
"""
Copy files from MineSight project directory
to program directory.
Return a list of base file names for the
copied files.
copyfilesx is a generator of file paths.
intermediatepath is a string relative path
between the program folder and the sub-folder
of the MineSight directory (_msresources/06SOLIDS,
for example).
"""
# Copy files to that folder.
copiedfiles = []
for pathx in copyfilesx:
shutil.copyfile(pathx, fld.PROGFOLDER + intermediatepath +
os.path.basename(pathx))
copiedfiles.append(intermediatepath + os.path.basename(pathx))
return copiedfiles
def compressfilessingle(filestocompress, prefix, basedir):
"""
Compresses files into an archive.
This is for larger files that take up
an entire archive (7z file).
filestocompress is a list of paths of
files to be compressed. These files
reside inside the program directory.
prefix is a string path addition, usually
'./' that allows the function to deal
with relative paths for files that reside
in subfolders.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
for pathx in filestocompress:
basename = os.path.split(pathx)[1]
# Need unique name for subfolder files with same names.
uniqueid = fld.UNIQUEX.format(COUNTERX.next())
uniquename = uniqueid + basename
print(fld.COMPRESSING.format(prefix + basename))
archx = z7thing(fld.ARCHIVEX.format(basedir, uniquename),
[prefix + basename])
archx.compress()
def compressfilesmultiple(filestocompress, indexx, basedir):
"""
Compresses files into an archive.
filestocompress is a list of paths of
files to be compressed. These files
reside inside the program directory.
indexx is an integer that gives the
archive a unique name.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
print(fld.COMPRESSING.format(fld.MULTFILES))
archx = z7thing(fld.MULTARCHIVEX.format(basedir, indexx),
filestocompress)
archx.compress()
def segregatefiles(directoryx, basefiles):
"""
From a string directory path directoryx
and a list of base file names, returns
a dictionary of lists of files and their
sizes sorted on size and keyed on file
category.
"""
retval = {}
# Add separator to end of directory path.
directoryx += fld.FRONTSLASH
# Get all files in folder and their sizes.
allfiles = [(os.path.getsize(fld.FILEB.format(directoryx, filex)), filex)
for filex in basefiles]
retval[fld.SMALL] = [x for x in allfiles if x[0] < fld.HALFLIMIT]
retval[fld.SMALL].sort()
retval[fld.MEDIUM] = [x for x in allfiles if x[0] >= fld.HALFLIMIT and
x[0] <= fld.CHUNK]
retval[fld.MEDIUM].sort()
retval[fld.LARGE] = [x for x in allfiles if x[0] > fld.CHUNK]
retval[fld.LARGE].sort()
return retval
def deletefiles(movedfiles):
"""
Delete files that have been compressed.
movedfiles is a list of paths of
files that have been moved or copied to
the program directory for compression.
Side effect function.
"""
for pathx in movedfiles:
print(fld.DELETING.format(pathx))
os.remove(pathx)
def getsmallfilegroupings(smallfiles):
"""
Generator function that yields
a list of files whose sum is
less than the program's limit
for bytes to be archived in a
multiple file archive.
smallfiles is a list of two tuples
of (filesize in bytes, file path).
"""
lenx = len(smallfiles)
insidecounter1 = 0
insidecounter2 = 1
sumx = 0
while (insidecounter2 < (lenx + 1)):
sumx = sum(x[0] for x in smallfiles[insidecounter1:insidecounter2])
if sumx > fld.MULTFILELIMIT:
# Back up one.
insidecounter2 -= 1
yield (x[1] for x in smallfiles[insidecounter1:insidecounter2])
# Start the next group at the file that went over the limit.
sumx = 0
insidecounter1 = insidecounter2
insidecounter2 = insidecounter1 + 1
else:
insidecounter2 += 1
# After the loop: yield the files left over at the end.
if insidecounter1 < lenx:
yield (x[1] for x in smallfiles[insidecounter1:lenx])
def compresslargefiles(largefiles, dirx, prefix, basedir, splitfilestrackerfile):
"""
Deal with compression of files that need to
be split prior to compression.
largefiles is a list of two tuples of file
sizes and names.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
Side effect function.
"""
for filex in largefiles:
# Get generator of paths of splits.
splitfiles = splitfile(fld.FILEB.format(dirx, filex[1]),
splitfilestrackerfile)
movedfiles = movefiles(splitfiles, prefix)
compressfilessingle(movedfiles, prefix, basedir)
deletefiles(movedfiles)
def compressmediumfiles(mediumfiles, dirx, prefix, basedir):
"""
Deal with compression of files that need to
be compressed each to its own archive.
mediumfiles is a list of two tuples of file
sizes and paths.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
filestocopy = (dirx + x[1] for x in mediumfiles)
copiedfiles = copyfiles(filestocopy, prefix)
compressfilessingle(copiedfiles, prefix, basedir)
deletefiles(copiedfiles)
def compresssmallfiles(smallfiles, dirx, prefix, indexx, basedir):
"""
Deal with compression of files that can be
compressed in groups.
smallfiles is a list of two tuples of file
sizes and paths.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
indexx is the current index that the 7zip
file counter (ensures unique archive name)
is on.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Returns integer for current archive counter
index.
"""
for grouplittlefiles in getsmallfilegroupings(smallfiles):
littlefiles = (dirx + x for x in grouplittlefiles)
copiedfiles = copyfiles(littlefiles, prefix)
compressfilesmultiple(copiedfiles, indexx, basedir)
indexx += 1
deletefiles(copiedfiles)
return indexx
# XXX - hack
def matchbasedir(folderlist):
"""
Get MineSight project folder name
that matches a folder in the path
in question.
folderlist is a list (in order)
of directories in a path.
Returns string.
"""
for folderx in folderlist:
for projx in fld.BASEDIRS:
if projx == folderx:
return folderx
return None
def getbasedir(pathx):
"""
Returns two tuple of strings for
basedir and basefolder (project
directory name and base path under
project directory copied to program
directory).
pathx is the directory path being
processed (str).
"""
# basedir is project name (Fwaulu, for example).
foldernames = pathx.split(fld.FRONTSLASH)
basedir = matchbasedir(foldernames)
# Get directory under project directory.
# _msresources, for example.
idx = foldernames.index(basedir)
# Directory under program directory ./ for MineSight files.
basefolder = fld.SAMEFOLDER + fld.FRONTSLASH.join(foldernames[idx + 1:])
return basedir, basefolder
def dealwithtoplevel(firstdir):
"""
Compress top level files in the
MineSight project directory.
firstdir is the three tuple returned
from the os.walk() generator function.
Returns a two tuple: multifilecounter, the
integer used for naming multiple file
archives, and splitfilestrackerfile,
an open file object for tracking split
files for later reconstruction.
"""
# Top level files.
dirx = firstdir[0] + fld.FRONTSLASH
basedir, basefolder = getbasedir(dirx)
# File to track split files for later glueing back together.
splitfilestrackerfile = open(fld.SAMEFOLDER + basedir + fld.FRONTSLASH +
fld.SPLITFILETRACKER, fld.W)
firstdirfiles = segregatefiles(firstdir[0], firstdir[2])
compresslargefiles(firstdirfiles[fld.LARGE], dirx, fld.EMPTY, basedir,
splitfilestrackerfile)
compressmediumfiles(firstdirfiles[fld.MEDIUM], dirx, fld.EMPTY, basedir)
# This is for keeping track of
# archives with more than one file.
multifilecounter = 1
multifilecounter = compresssmallfiles(firstdirfiles[fld.SMALL], dirx,
fld.EMPTY, multifilecounter, basedir)
return multifilecounter, splitfilestrackerfile
def dealwithlowerleveldirectories(dirs, multifilecounter, splitfilestrackerfile):
"""
Finishes out compression of lower level
folders under top level MineSight project
directory.
dirs is a partially exhausted (one iteration)
os.walk() generator.
multifilecounter is an integer used for
naming multiple file archives.
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
Returns orphanedfolders, a list of lower level
folders to be deleted at the end of the program
run.
"""
orphanedfolders = []
for dirx in dirs:
# XXX - hack - I hate dealing with Windows paths.
dirn = dirx[0].replace(fld.BACKSLASH, fld.FRONTSLASH)
diry = dirn + fld.FRONTSLASH
basedir, basefolder = getbasedir(diry)
# Create directory in program path.
fauxdir = fld.PROGFOLDER[:-1] + basefolder[1:-1]
os.mkdir(fauxdir)
orphanedfolders.append(fauxdir)
# Skip anything that doesn't have files.
if not dirx[2]:
continue
# Easiest way to do this might be
# to track directories and sort
# files according to size, then
# filter them accordingly.
dirfiles = segregatefiles(dirx[0], dirx[2])
compresslargefiles(dirfiles[fld.LARGE], diry, basefolder,
basedir, splitfilestrackerfile)
compressmediumfiles(dirfiles[fld.MEDIUM], diry, basefolder, basedir)
multifilecounter = compresssmallfiles(dirfiles[fld.SMALL], diry, basefolder,
multifilecounter, basedir)
splitfilestrackerfile.close()
return orphanedfolders
def walkdir(dirx):
"""
Traverse MineSight project directory,
7zipping everything along the way.
dirx is a string for the directory
to traverse.
Side effect function.
"""
dirs = os.walk(dirx)
# OK - os.walk returns generator that
# yields a tuple in the format
# (str path,
# [list of paths for directories under path],
# [list of filenames under path])
# Top level (Fwaulu, for instance).
# These files will not have a path
# prefix of any sort in their respective
# archives.
firstdir = dirs.next()
multifilecounter, splitfilestrackerfile = dealwithtoplevel(firstdir)
# All other files and folders.
orphanedfolders = dealwithlowerleveldirectories(dirs, multifilecounter,
splitfilestrackerfile)
# Delete lower level folders first - this is necessary.
orphanedfolders.reverse()
for orphanx in orphanedfolders:
print(fld.DELETINGDIR.format(orphanx))
os.rmdir(orphanx)
def cyclefolders(folderx):
"""
Wrapper function for compression
of folder folderx (string).
Side effect function.
"""
# 1) Set up empty project directory (ex: Fwaulu)
# in program directory.
# 2) For first set of files, use no prefix for
# 7zip archive storage (filename only).
# 3) Check for size of file.
# 4) If file is bigger than fld.CHUNK, split.
# 5) If file is no bigger than fld.CHUNK, but at least
# fld.HALFLIMIT, compress it to its own archive.
# 6) If file is smaller than fld.HALFLIMIT, check
# subsequent files to determine which files to
# include in the archive (total within MULTFILELIMIT).
# Keep track of the file index that puts the number
# of bytes over the limit.
# 7) Compress multiple files to one archive - index
# archive to ensure unique name.
# 8) For all following sets of files, same process,
# but must prefix paths with SAMEFOLDER and any
# additional folder names.
foldertracker = []
# Make directory folder in program directory
# to hold 7zip files.
zipfolder = getbasedir(folderx)[0]
os.mkdir(zipfolder)
foldertracker.append(zipfolder)
walkdir(folderx)
print('\nDone')
cyclefolders is the top-level wrapper function for the compression module.
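Steps 6 and 7 amount to a greedy pass: keep adding small files to the current archive until the next one would push the total over the limit, then start a new archive. Here is a minimal sketch of that grouping using a running total instead of re-summing slices. group_small_files is a hypothetical name, and each input file is assumed to be smaller than the limit, which holds for the small category.

```python
MULTFILELIMIT = 8500  # copied from folderstozip.py


def group_small_files(small_files, limit=MULTFILELIMIT):
    """Yield lists of names whose total size stays within limit.

    small_files is a list of (size, name) tuples. Every file lands in
    exactly one group, including the files left over at the end.
    """
    group, total = [], 0
    for size, name in small_files:
        if group and total + size > limit:
            yield group
            group, total = [], 0
        group.append(name)
        total += size
    if group:
        yield group
```

Feeding it the size-sorted small list keeps the number of archives low without any lookahead.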
#!java -jar C:\jython-2.7.0\jython.jar
# unsevenzipper.py
"""
Use java 3rd party 7-zip compression
library (sevenZipJBindings) from
jython to un-7zip archives.
"""
# Need to adjust path to get necessary jar imports.
# XXX - it might be cleaner to chain imports by using
# the sevenzipper (s7 alias) below to reference
# double imported modules. For development and
# convenience I reimported everything as though
# sevenzipper.py and unsevenzipper.py were separate
# operations.
import sys
import folderstozip as fld
sys.path.append(fld.PATH7ZJB)
sys.path.append(fld.PATH7ZJBOSSPEC)
import os
import sevenzipper as s7
import SevenZipThingExtract
def subdirectoryornot(pathx):
"""
Boolean function that returns
True if string pathx is a
subdirectory of the MineSight
project folder and False if
the files belong directly to
the MineSight project folder.
"""
pathx = pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH)
pathlist = pathx.split(fld.BACKSLASH)
if len(pathlist) > 1:
return True
return False
def getdirectories(dirx):
"""
Get list of lists of directories
in path under project folder
from 7zip archives in project
folder for archives.
Returns two tuple of list and
dictionary indicating which
7z files are same directory
archives and which are archived
subdirectory files.
dirx is a string for the file
path of the directory to
be walked (./Fwaulu for example).
"""
dirs = os.walk(dirx)
# One level, no subfolders.
files = dirs.next()[2]
# Get directories first.
rawpaths = []
subdirornot = {}
for filex in files:
# Skip uncompressed split file tracker.
if filex == fld.SPLITFILETRACKER:
continue
# I don't know if it's a subdirectory or not, so I'll go with False.
s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex, dirx, False)
folders = list(s7tx.getfolders())
rawpaths.extend(folders)
# All the paths in folders have the same prefix -
# just do one.
subdirornot[filex] = subdirectoryornot(folders[0])
# Get just directories
justdirectories = [pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH).split(fld.BACKSLASH)[1:-1]
for pathx in rawpaths if pathx.split(fld.BACKSLASH)[1:-1]]
justdirectories = set([tuple(x) for x in justdirectories])
justdirectories = list(justdirectories)
justdirectories.sort()
return justdirectories, subdirornot
def makedirectories(dirn):
"""
Create directory paths within archive
project folder to accept uncompressed
files.
Returns subdirornot dictionary.
dirn is a string for the file
path of the directory to
be walked (./Fwaulu for example).
"""
justdirectories, subdirornot = getdirectories(dirn)
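# No subfolders at all means nothing to create (max() of an
# empty sequence would raise ValueError).
maxdepth = max([len(dirx) for dirx in justdirectories] or [0])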
for x in xrange(0, maxdepth):
justdirectoriesii = set([tuple(dirx[0:x + 1]) for dirx in justdirectories
if len(dirx) >= x + 1])
for diry in justdirectoriesii:
dirw = dirn + fld.FRONTSLASH + fld.FRONTSLASH.join(diry)
os.mkdir(dirw)
return subdirornot
def extractfiles(dirx):
    """
    Extract files from the 7z files
    in the project archive folder.
    Side effect function.
    dirx is a string for the file
    path of the directory to
    be walked.
    """
    subdirornot = makedirectories(dirx)
    dirs = os.walk(dirx)
    # One level, no subfolders.
    files = dirs.next()[2]
    for filex in files:
        # Skip uncompressed split file tracker.
        if filex == fld.SPLITFILETRACKER:
            continue
        s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex,
                                    dirx, subdirornot[filex])
        s7tx.extractfiles()
def gluetogethersplitfiles(dirx):
    """
    Make split-up files whole.
    Side effect function.
    dirx is the folder in which the split
    files reside.
    """
    # Glue together big files.
    # Do this in a very controlling,
    # structured way:
    # 1) Read the split file tracker csv file.
    # 2) Determine the number, names, and paths
    #    of files to be reconstructed and the
    #    number of parts in each.
    # 3) Check that everything is there for
    #    each file to be reconstructed.
    # 4) Get the new relative path.
    # 5) Glue back together programmatically.
    splitfiles = []
    # Each line of fld.SPLITFILETRACKER is
    # <original path of split file>,<split number>.
    with open(fld.SAMEFOLDERWIN + dirx +
              fld.FRONTSLASH + fld.SPLITFILETRACKER, 'r') as f:
        for linex in f:
            # Skip blank lines.
            if not linex.strip():
                continue
            # Split on the last comma only, so a comma in the path survives.
            strippedline = [x.strip() for x in linex.rsplit(fld.UCOMMA, 1)]
            splitfiles.append(tuple(strippedline))
    splitoriginals = set(x[0] for x in splitfiles)
    # Make dictionary that is easy to cycle through.
    filesx = {}
    for orig in splitoriginals:
        basedir, basefolder = s7.getbasedir(orig)
        filesx[orig] = {}
        filesx[orig][fld.BASEPATH] = fld.SAMEFOLDER + basedir + basefolder[1:]
        # Build a list now, ordered by split number. A lazy generator
        # would only look up orig when it is finally iterated.
        parts = sorted((filex for filex in splitfiles if filex[0] == orig),
                       key=lambda filex: int(filex[1]))
        filesx[orig][fld.FILES] = [fld.SPLITFILE.format(filesx[orig][fld.BASEPATH], filex[1])
                                   for filex in parts]
    for orig in filesx:
        # Step 3: make sure every part is present before writing anything.
        for filex in filesx[orig][fld.FILES]:
            if not os.path.isfile(filex):
                raise IOError('Missing split file: ' + filex)
        with open(filesx[orig][fld.BASEPATH], fld.WB) as mainfile:
            for filex in filesx[orig][fld.FILES]:
                with open(filex, fld.RB) as splitfile:
                    mainfile.write(splitfile.read())
def restore(dirx):
    """
    Restores MineSight project directory
    inside program path.
    dirx is a string for the directory
    to be restored (./Fwaulu, for example).
    Side effect function.
    """
    extractfiles(dirx)
    gluetogethersplitfiles(dirx)
    print('Done')
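For comparison, here is a minimal sketch of the directory-rebuilding work that getdirectories and makedirectories share. It is not the code above. archive_dirs and make_dirs are hypothetical names. The sketch assumes the archive entry paths have already been collected (for example with getfolders) and look like .\_msresources\06SOLIDS\file.dat. Because os.makedirs creates any missing parent directories, it does not need the depth-by-depth loop. It uses only calls available in both Jython 2.7 and CPython.

```python
import os


def archive_dirs(entrypaths):
    """Return the sorted, unique directory parts of 7z entry paths.

    '.\\sub\\deeper\\file.dat' gives ('sub', 'deeper'). Forward slashes are
    normalized to backslashes first and the leading '.' component is
    dropped. Entries sitting directly in the project folder add nothing.
    """
    dirs = set()
    for entry in entrypaths:
        parts = [p for p in entry.replace('/', '\\').split('\\')
                 if p not in ('', '.')]
        if len(parts) > 1:  # the last component is the file name
            dirs.add(tuple(parts[:-1]))
    return sorted(dirs)


def make_dirs(root, dirtuples):
    """Create root/<a>/<b>/... for each tuple and return the paths created.

    os.makedirs also creates intermediate directories, so parents need not
    be made first. Existing directories are skipped, so a second run does
    not fail.
    """
    created = []
    for dirtuple in dirtuples:
        path = os.path.join(root, *dirtuple)
        if not os.path.isdir(path):
            os.makedirs(path)
            created.append(path)
    return created
```

For example, archive_dirs(['.\\a\\b\\f1.dat', 'top.dat']) returns [('a', 'b')]. The top-level file contributes no directory.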
restore is the main function of the uncompression module (unsevenzipper.py).
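The gluing step could also check for missing parts before writing anything (step 3 in the comments of gluetogethersplitfiles). It could also copy each part in blocks instead of reading a whole fld.CHUNK-sized part (2 ** 26 bytes) into memory. Below is a minimal sketch under those assumptions. read_split_tracker and glue_parts are hypothetical names, and the tracker line format is taken from SPLITFILERECORD in sevenzipper.py. The caller is assumed to supply the restored target path, which gluetogethersplitfiles derives with getbasedir.

```python
import os
import shutil


def read_split_tracker(trackerpath):
    """Map each original file path to its split numbers, in numeric order.

    Each tracker line is expected to look like '<original path>,<number>',
    e.g. '.../Fwaulu/big.dat,001'. rsplit(',', 1) splits on the last
    comma only, so commas inside the path are kept.
    """
    parts = {}
    with open(trackerpath, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            origpath, partnum = line.rsplit(',', 1)
            parts.setdefault(origpath.strip(), []).append(partnum.strip())
    for partnums in parts.values():
        partnums.sort(key=int)  # '010' after '009' whatever the line order
    return parts


def glue_parts(targetpath, partnums, bufsize=2 ** 20):
    """Concatenate targetpath.<n> for each n in partnums into targetpath.

    Every part is checked before targetpath is opened, so a missing part
    leaves nothing half-written. shutil.copyfileobj copies in bufsize
    blocks (1 MiB by default) rather than whole parts.
    """
    partpaths = ['{0}.{1}'.format(targetpath, n) for n in partnums]
    missing = [p for p in partpaths if not os.path.isfile(p)]
    if missing:
        raise IOError('missing split parts: {0}'.format(missing))
    with open(targetpath, 'wb') as out:
        for path in partpaths:
            with open(path, 'rb') as part:
                shutil.copyfileobj(part, out, bufsize)
    return targetpath
```

A caller would read the tracker once and then call glue_parts(target, parts[orig]) for each original path. Here target is computed the same way gluetogethersplitfiles computes fld.BASEPATH.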
Notes:
1) I don't have admin rights at work and did not have javac (the Java compiler) available. javac ships with the JDK (Java Development Kit), which you can download from Oracle; the JRE alone does not include it. Without admin rights you can't install the JDK normally, but you can still use it. My compilation went something like this:
<path to downloaded JDK>/bin/javac -cp <path to downloaded 7-Zip-JBinding>/lib/* <myclassname>.java
2) I've left all the split-up files and 7z archives in the folder where I decompress my files and recombine the split files. Depending on what you're working with, this can take up a lot of space. If space is at a premium, you probably want to write jython code to move or delete the archives after uncompressing them.
3) The most time-consuming parts of a run are compression, uncompression, and the splitting and recombining of large files. Porting some of this to Java (instead of Jython) might speed things up, but I code faster and generally better in Jython, and my objective was control, not speed. YMMV (your mileage may vary) with this approach. There are far better general-purpose ones.
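Following up on note 2, a cleanup pass could look like this minimal sketch. remove_archives and remove_split_parts are hypothetical helpers, not part of the modules above. remove_archives only lists the .7z files unless dry_run=False is passed. remove_split_parts refuses to delete anything unless the glued file exists and its size equals the combined size of its parts. Run it only after checking the restored files.

```python
import os


def remove_archives(folder, extension='.7z', dry_run=True):
    """List, and unless dry_run is True delete, files in folder ending in extension.

    Only the top level of folder is scanned, since the 7z files sit
    directly in the project archive folder (./Fwaulu, for example).
    """
    targets = sorted(
        os.path.join(folder, name) for name in os.listdir(folder)
        if name.lower().endswith(extension)
        and os.path.isfile(os.path.join(folder, name)))
    if not dry_run:
        for path in targets:
            os.remove(path)
    return targets


def remove_split_parts(targetpath, partnums):
    """Delete targetpath.001, targetpath.002, ... once the glued file is in place.

    Raises IOError unless targetpath exists and its size equals the
    combined size of the parts, so a failed or partial glue keeps its
    pieces. A missing part makes os.path.getsize raise an OSError.
    """
    partpaths = ['{0}.{1}'.format(targetpath, n) for n in partnums]
    expected = sum(os.path.getsize(p) for p in partpaths)
    if (not os.path.isfile(targetpath)
            or os.path.getsize(targetpath) != expected):
        raise IOError('not deleting split parts of ' + targetpath)
    for path in partpaths:
        os.remove(path)
    return partpaths
```

The dry-run default is a deliberate choice. A first call shows exactly which archives would go before anything is deleted.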
Thanks for stopping by.
7-Zip-JBinding is written using java Interfaces that are structured pretty specifically. I did not venture too far away from the examples given in the 7-Zip-JBinding documentation. I smithed two modules for my own purposes, compressing and uncompressing, and present them (java code) below. The decompression one has a separate method for retrieving paths of the compressed files. This is not efficient, but for what I need to do, and for the limitations of the library and the approach, it works out for the best.
import java.io.IOException;
import java.io.RandomAccessFile;
import net.sf.sevenzipjbinding.IOutCreateArchive7z;
import net.sf.sevenzipjbinding.IOutCreateCallback;
import net.sf.sevenzipjbinding.IOutItem7z;
import net.sf.sevenzipjbinding.ISequentialInStream;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.OutItemFactory;
import net.sf.sevenzipjbinding.impl.RandomAccessFileOutStream;
import net.sf.sevenzipjbinding.util.ByteArrayStream;
/* Off StackOverflow - works for getting
* file content/bytes from path */
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.Path;
public class SevenZipThing {
private static final String RETCHAR = "\n";
private static final String INTFMT = "%,d";
private static final String BYTESTOCOMPRESS = " bytes total to compress\n";
private static final String ERROCCURS = "Error occurs: ";
private static final String COMPRESSFILE = "\nCompressing file ";
private static final String RW = "rw";
private static final int LVL = 5;
private static final String SEVZERR = "7z-Error occurs:";
private static final String ERRCLOSING = "Error closing archive: ";
private static final String ERRCLOSINGFLE = "Error closing file: ";
private static final String SUCCESS = "\nCompression operation succeeded\n";
private String filename;
/* String[] array conversion from jython list
* implicit and poses no problems (JKD7) */
private String[] pathsx;
public SevenZipThing(String filename, String[] pathsx) {
this.filename = filename;
this.pathsx = pathsx;
}
/**
* The callback provides information about archive items.
*/
/**
* I copied this straight from the sevenZipJBinding's author's
* code - but I haven't put much in to deal with messaging
* or error handling
* */
private final class MyCreateCallback
implements IOutCreateCallback<IOutItem7z> {
public void setOperationResult(boolean operationResultOk)
throws SevenZipException {
// Track each operation result here
}
public void setTotal(long total) throws SevenZipException {
// Track operation progress here
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOCOMPRESS);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
public IOutItem7z getItemInformation(int index,
OutItemFactory<IOutItem7z> outItemFactory) {
IOutItem7z item = outItemFactory.createOutItem();
Path path = Paths.get(pathsx[index]);
item.setPropertyPath(pathsx[index]);
try {
// Java arrays are limited to 2 ** 31 items - small.
byte[] data = Files.readAllBytes(path);
item.setDataSize((long) data.length);
return item;
// XXX - I could do a lot better than this (error handling).
} catch (Exception e) {
System.err.println(ERROCCURS + e);
}
return null;
}
public ISequentialInStream getStream(int i)
throws SevenZipException {
Path path = Paths.get(pathsx[i]);
try {
byte[] data = Files.readAllBytes(path);
System.out.println(COMPRESSFILE + path);
return new ByteArrayStream(data, true);
} catch (Exception e) {
System.err.println(ERROCCURS + e);
}
return null;
}
}
public void compress() {
/* Mostly copied from sevenZipJBinding's author's code -
* I made the compress method public to work from jython.
* Also, I deal with all of the file listing in jython
* and just pass a list to this class. */
boolean success = false;
RandomAccessFile raf = null;
IOutCreateArchive7z outArchive = null;
try {
raf = new RandomAccessFile(filename, RW);
// Open out-archive object
outArchive = SevenZip.openOutArchive7z();
// Configure archive
outArchive.setLevel(LVL);
outArchive.setSolid(true);
// All available processors.
outArchive.setThreadCount(0);
// Create archive
outArchive.createArchive(new RandomAccessFileOutStream(raf),
pathsx.length, new MyCreateCallback());
success = true;
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (outArchive != null) {
try {
outArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
success = false;
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
success = false;
}
}
}
if (success) {
System.out.println(SUCCESS);
}
}
}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.File;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.ArrayList;
import net.sf.sevenzipjbinding.IInArchive;
import net.sf.sevenzipjbinding.PropID;
import net.sf.sevenzipjbinding.SevenZip;
import net.sf.sevenzipjbinding.SevenZipException;
import net.sf.sevenzipjbinding.impl.RandomAccessFileInStream;
import net.sf.sevenzipjbinding.IArchiveExtractCallback;
import net.sf.sevenzipjbinding.ExtractOperationResult;
import net.sf.sevenzipjbinding.ExtractAskMode;
import net.sf.sevenzipjbinding.ISequentialOutStream;
/* 7z archive format */
/* SEVEN_ZIP is the one I want */
import net.sf.sevenzipjbinding.ArchiveFormat;
public class SevenZipThingExtract {
private String filename;
private String extractdirectory;
private ArrayList<String> foldersx = null;
private boolean subdirectory = false;
private static final String ERROPENINGFLE = "Error opening file: ";
private static final String ERRWRITINGFLE = "Error writing to file: ";
private static final String EXTERR = "Extraction error";
private static final String INFOFMT = "%9X | %10s | %s";
private static final String RETCHAR = "\n";
private static final String INTFMT = "%,d";
private static final String BYTESTOEXTRACT = " bytes total to extract\n";
private static final String RW = "rw";
private static final String BACKSLASH = "\\";
private static final String SEVZERR = "7z-Error occurs:";
private static final String ERROCCURS = "Error occurs: ";
private static final String ERRCLOSING = "Error closing archive: ";
private static final String ERRCLOSINGFLE = "Error closing file: ";
public SevenZipThingExtract(String filename, String extractdirectory,
boolean subdirectory) {
this.filename = filename;
foldersx = new ArrayList<String>();
this.foldersx = foldersx;
this.extractdirectory = extractdirectory;
this.subdirectory = subdirectory;
}
private final class MyExtractCallback
implements IArchiveExtractCallback {
// Copied mostly from example.
private int hash = 0;
private int size = 0;
private int index;
private boolean skipExtraction;
private IInArchive inArchive;
private OutputStream outputStream;
private File file;
public MyExtractCallback(IInArchive inArchive) {
this.inArchive = inArchive;
}
@Override
public ISequentialOutStream getStream(int index,
ExtractAskMode extractAskMode)
throws SevenZipException {
this.index = index;
// I'm not skipping anything.
skipExtraction = (Boolean) false;
String path = (String) inArchive.getProperty(index, PropID.PATH);
// Try preprending extractdirectory.
if (subdirectory) {
path = extractdirectory + BACKSLASH + path.substring(2);
} else {
path = extractdirectory + BACKSLASH + path;
}
file = new File(path);
try {
outputStream = new FileOutputStream(file);
} catch (FileNotFoundException e) {
throw new SevenZipException(ERROPENINGFLE
+ file.getAbsolutePath(), e);
}
return new ISequentialOutStream() {
public int write(byte[] data) throws SevenZipException {
try {
outputStream.write(data);
} catch (IOException e) {
throw new SevenZipException(ERRWRITINGFLE
+ file.getAbsolutePath());
}
return data.length; // Return amount of consumed data
}
};
}
public void prepareOperation(ExtractAskMode extractAskMode)
throws SevenZipException {
}
public void setOperationResult(ExtractOperationResult extractOperationResult)
throws SevenZipException {
// Track each operation result here
if (extractOperationResult != ExtractOperationResult.OK) {
System.err.println(EXTERR);
} else {
System.out.println(String.format(INFOFMT, hash, size,//
inArchive.getProperty(index, PropID.PATH)));
hash = 0;
size = 0;
}
}
public void setTotal(long total) throws SevenZipException {
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOEXTRACT);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
}
private final class MyGetPathsCallback
implements IArchiveExtractCallback {
// Copied mostly from example.
private int hash = 0;
private int size = 0;
private int index;
private boolean skipExtraction;
private IInArchive inArchive;
public MyGetPathsCallback(IInArchive inArchive) {
this.inArchive = inArchive;
}
public ISequentialOutStream getStream(int index,
ExtractAskMode extractAskMode)
throws SevenZipException {
this.index = index;
// I'm not skipping anything.
skipExtraction = (Boolean) false;
String path = (String) inArchive.getProperty(index,
PropID.PATH);
foldersx.add(path);
return new ISequentialOutStream() {
public int write(byte[] data) throws SevenZipException {
hash ^= Arrays.hashCode(data);
size += data.length;
// Return amount of processed data
return data.length;
}
};
}
public void prepareOperation(ExtractAskMode extractAskMode)
throws SevenZipException {
}
public void setOperationResult(ExtractOperationResult extractOperationResult)
throws SevenZipException {
// Track each operation result here
if (extractOperationResult != ExtractOperationResult.OK) {
System.err.println(EXTERR);
} else {
System.out.println(String.format(INFOFMT, hash, size,
inArchive.getProperty(index, PropID.PATH)));
hash = 0;
size = 0;
}
}
public void setTotal(long total) throws SevenZipException {
System.out.print(RETCHAR + String.format(INTFMT, total) +
BYTESTOEXTRACT);
}
public void setCompleted(long complete) throws SevenZipException {
// Track operation progress here
}
}
public void extractfiles() {
boolean success = false;
RandomAccessFile raf = null;
IInArchive inArchive = null;
try {
raf = new RandomAccessFile(filename, RW);
inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP,
new RandomAccessFileInStream(raf));
int itemCount = inArchive.getNumberOfItems();
// From StackOverflow - could use IntStream,
// but that's Java 1.8 (using 1.7).
int[] fileindices = new int[itemCount];
for(int k = 0; k < fileindices.length; k++)
fileindices[k] = k;
inArchive.extract(fileindices, false,
new MyExtractCallback(inArchive));
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (inArchive != null) {
try {
inArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
}
}
}
}
public ArrayList<String> getfolders() {
boolean success = false;
RandomAccessFile raf = null;
IInArchive inArchive = null;
try {
raf = new RandomAccessFile(filename, RW);
inArchive = SevenZip.openInArchive(ArchiveFormat.SEVEN_ZIP,
new RandomAccessFileInStream(raf));
int itemCount = inArchive.getNumberOfItems();
// From StackOverflow - could use IntStream,
// but that's Java 1.8 (using 1.7).
int[] fileindices = new int[itemCount];
for(int k = 0; k < fileindices.length; k++)
fileindices[k] = k;
inArchive.extract(fileindices, false,
new MyGetPathsCallback(inArchive));
} catch (SevenZipException e) {
System.err.println(SEVZERR);
// Get more information using extended method
e.printStackTraceExtended();
} catch (Exception e) {
System.err.println(ERROCCURS + e);
} finally {
if (inArchive != null) {
try {
inArchive.close();
} catch (IOException e) {
System.err.println(ERRCLOSING + e);
}
}
if (raf != null) {
try {
raf.close();
} catch (IOException e) {
System.err.println(ERRCLOSINGFLE + e);
}
}
}
return foldersx;
}
}
The method getfolders in the SevenZipThingExtract class is the extra method to get the list of folders. As noted in the jython code below, the limitations on the number of bytes and files to be compressed necessitates splitting larger files into chunks. Also, for my specific use case, I need to extract files to a specific folder and set of subfolders. My methodology is outlined in the comments in the jython code. The good news: if I get run over by a bus and the uncompression part of the program gets lost, people will be able to get the files back with some effort. The bad news: they will be cursing my headstone. You do the best you can.
The three jython modules - the first one, folderstozip.py is just constants:
#!java -jar C:\jython-2.7.0\jython.jar
# folderstozip.py
"""
Constants used in compression and
decompression.
"""
FRONTSLASH = '/'
BACKSLASH = '\\'
EMPTY = ''
SAMEFOLDER = './'
SAMEFOLDERWIN = u'.\\'
SPLITFILETRACKER = 'SPLITFILETRACKER.csv'
SPLITFILE = '{0:s}.{1:s}'
UCOMMA = u','
# 3rd party sevenZipJBindings library.
PATH7ZJB = 'C:/MSPROJECTS/EOMReconciliation/2016/03March'
PATH7ZJB += '/Backup/sevenzipjbinding/lib/sevenzipjbinding.jar'
# OS specific 3rd party sevenZipJBindings library.
PATH7ZJBOSSPEC = r'C:/MSPROJECTS/EOMReconciliation/2016/03March'
PATH7ZJBOSSPEC += '/Backup/sevenzipjbinding/lib/sevenzipjbinding-Windows-amd64.jar'
PROGFOLDER = 'C:/MSPROJECTS/EOMReconciliation/2016/03March/Backup'
PROGFOLDER += FRONTSLASH
# Informational messages.
WROTEFILE = 'Wrote file {:s}\n'
SPLITFILEMSG = 'Have now split {0:,d} bytes of file {1:s} into {2:d} {3:,d} chunks.\n'
DONESPLITTING = '\nDone splitting file'
FILESAFTERSPLIT = '\n{:d} files after split'
COMPRESSING = '\nCompressing file {:s} . . .\n'
DELETING = '\nDeleting file {:s} . . .\n'
DELETINGDIR = '\nNow deleting {:s} . . .\n'
# Room for 9999 file names.
UNIQUEX = '{0:05d}'
# XXX - multiple file archives limited to
# 10KB - reason unknown - crashes jvm
# with IInStream interface class not
# found.
# XXX - choked on 8700 bytes - try dropping
# this from 9500 to 8500.
MULTFILELIMIT = 8500
HALFLIMIT = MULTFILELIMIT/2
# About 50 splits for a 3GB file.
CHUNK = 2 ** 26
# Path plus split number.
FILEN = r'{0:s}.{1:03d}'
# Path plus basefilename.
FILEB = r'{0:s}{1:s}'
# Read/Write constants.
RB = 'rb'
WB = 'wb'
W = 'w'
# Filename plus split number.
ARCHIVEX = '{0:s}/{1:s}.7z'
# multifile archive
MULTARCHIVEX = '{0:s}/archive{1:03d}.7z'
MULTFILES = '. . . multiple files'
# File categories.
# Size less than HALFLIMIT.
SMALL = 'small'
# Size greater than or equal to HALFLIMIT but
# less than or equal to CHUNK.
MEDIUM = 'medium'
# Larger than CHUNK.
LARGE = 'large'
BASEPATH = 'basepath'
FILES = 'files'
# XXX - this folder has recognizable
# folder names within your domain
# space - mine are open pit mining
# area names.
BASEDIRS = ['Pit-1', 'Pit-2', 'Pit-3']
#!java -jar C:/jython-2.7.0/jython.jar
# sevenzipper.py
"""
Use java 3rd party 7-zip compression
library (sevenZipJBindings) from
jython to 7zip up MineSight project
files.
"""
import folderstozip as fld
# Need to adjust path to get necessary jar imports.
import sys
# Need for os.path
import os
# Original path of file plus split number.
SPLITFILERECORD = '{0:s},{1:03d}'
sys.path.append(fld.PATH7ZJB)
sys.path.append(fld.PATH7ZJBOSSPEC)
# java 7zip library
import SevenZipThing as z7thing
# For copying files to program
# directory and deleting the old
# ones where necessary.
import shutil
# For unique archive names.
import itertools
COUNTERX = itertools.count(0, 1)
def splitfile(originalfilepath, splitfilestrackerfile):
"""
Split file at (string) originalfilepath
into fld.CHUNK sized chunks and indicate
sequence by number in new split file
name.
Return generator of relative file paths
inside project folder.
originalfilepath is the path of the
file that needs to be split into parts.
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
"""
sizeoffile = os.path.getsize(originalfilepath)
chunks = sizeoffile/fld.CHUNK + 1
# Counter.
i = 1
with open(originalfilepath, fld.RB) as f:
while i < chunks + 1:
with open(fld.FILEN.format(originalfilepath, i), fld.WB) as f2:
f2.write(f.read(fld.CHUNK))
print(fld.WROTEFILE.format(fld.FILEN.format(originalfilepath, i)))
print(fld.SPLITFILEMSG.format(f.tell(), originalfilepath, i, fld.CHUNK))
print >> splitfilestrackerfile, (SPLITFILERECORD.format(originalfilepath, i))
i += 1
print(fld.DONESPLITTING)
print(fld.FILESAFTERSPLIT.format(i - 1))
return (fld.FILEN.format(originalfilepath, x) for x in xrange(1, i))
def movefiles(movefilesx, intermediatepath):
"""
Move files from MineSight project directory
to program directory.
Return a list of base file names for the
moved files.
movefilesx is a generator of file paths.
intermediatepath is a string relative path
between the program folder and the sub-folder
of the MineSight directory (_msresources/06SOLIDS,
for example).
"""
# Move files to that folder.
movedfiles = []
for pathx in movefilesx:
shutil.move(pathx, fld.PROGFOLDER + intermediatepath +
os.path.basename(pathx))
movedfiles.append(intermediatepath + os.path.basename(pathx))
return movedfiles
def copyfiles(copyfilesx, intermediatepath):
"""
Copy files from MineSight project directory
to program directory.
Return a list of base file names for the
copied files.
copyfilesx is a generator of file paths.
intermediatepath is a string relative path
between the program folder and the sub-folder
of the MineSight directory (_msresources/06SOLIDS,
for example).
"""
# Copy files to that folder.
copiedfiles = []
for pathx in copyfilesx:
shutil.copyfile(pathx, fld.PROGFOLDER + intermediatepath +
os.path.basename(pathx))
copiedfiles.append(intermediatepath + os.path.basename(pathx))
return copiedfiles
def compressfilessingle(filestocompress, prefix, basedir):
"""
Compresses files into an archive.
This is for larger files that take up
an entire archive (7z file).
filestocompress is a list of paths of
files to be compressed. These files
reside inside the program directory.
prefix is a string path addition, usually
'./' that allows the function to deal
with relative paths for files that reside
in subfolders.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
for pathx in filestocompress:
basename = os.path.split(pathx)[1]
# Need unique name for subfolder files with same names.
uniqueid = fld.UNIQUEX.format(COUNTERX.next())
uniquename = uniqueid + basename
print(fld.COMPRESSING.format(prefix + basename))
archx = z7thing(fld.ARCHIVEX.format(basedir, uniquename),
[prefix + basename])
archx.compress()
def compressfilesmultiple(filestocompress, indexx, basedir):
"""
Compresses files into an archive.
filestocompress is a list of paths of
files to be compressed. These files
reside inside the program directory.
indexx is an integer that gives the
archive a unique name.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
print(fld.COMPRESSING.format(fld.MULTFILES))
archx = z7thing(fld.MULTARCHIVEX.format(basedir, indexx),
filestocompress)
archx.compress()
def segregatefiles(directoryx, basefiles):
"""
From a string directory path directoryx
and a list of base file names, returns
a dictionary of lists of files and their
sizes sorted on size and keyed on file
category.
"""
retval = {}
# Add separator to end of directory path.
directoryx += fld.FRONTSLASH
# Get all files in folder and their sizes.
allfiles = [(os.path.getsize(fld.FILEB.format(directoryx, filex)), filex)
for filex in basefiles]
retval[fld.SMALL] = [x for x in allfiles if x[0] < fld.HALFLIMIT]
retval[fld.SMALL].sort()
retval[fld.MEDIUM] = [x for x in allfiles if x[0] >= fld.HALFLIMIT and
x[0] <= fld.CHUNK]
retval[fld.MEDIUM].sort()
retval[fld.LARGE] = [x for x in allfiles if x[0] > fld.CHUNK]
retval[fld.LARGE].sort()
return retval
def deletefiles(movedfiles):
"""
Delete files that have been compressed.
movedfiles is a list of paths of
files that have been moved or copied to
the program directory for compression.
Side effect function.
"""
for pathx in movedfiles:
print(fld.DELETING.format(pathx))
os.remove(pathx)
def getsmallfilegroupings(smallfiles):
"""
Generator function that yields
a list of files whose sum is
less than the program's limit
for bytes to be archived in a
multiple file archive.
smallfiles is a list of two tuples
of (filesize in bytes, file path).
"""
lenx = len(smallfiles)
insidecounter1 = 0
insidecounter2 = 1
sumx = 0
while (insidecounter2 < (lenx + 1)):
sumx = sum(x[0] for x in smallfiles[insidecounter1:insidecounter2])
if sumx > fld.MULTFILELIMIT:
# Back up one.
insidecounter2 -= 1
yield (x[1] for x in smallfiles[insidecounter1:insidecounter2])
# Reset and advance counters.
sumx = 0
insidecounter1 = insidecounter2 + 1
insidecounter2 = insidecounter1 + 1
else:
insidecounter2 += 1
def compresslargefiles(largefiles, dirx, prefix, basedir, splitfilestrackerfile):
"""
Deal with compression of files that need to
be split prior to compression.
largefiles is a list of two tuples of file
sizes and names.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
Side effect function.
"""
for filex in largefiles:
# Get generator of paths of splits.
splitfiles = splitfile(fld.FILEB.format(dirx, filex[1]),
splitfilestrackerfile)
movedfiles = movefiles(splitfiles, prefix)
compressfilessingle(movedfiles, prefix, basedir)
deletefiles(movedfiles)
def compressmediumfiles(mediumfiles, dirx, prefix, basedir):
"""
Deal with compression of files that need to
be compressed each to its own archive.
mediumfiles is a list of two tuples of file
sizes and paths.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Side effect function.
"""
filestocopy = (dirx + x[1] for x in mediumfiles)
copiedfiles = copyfiles(filestocopy, prefix)
compressfilessingle(copiedfiles, prefix, basedir)
deletefiles(copiedfiles)
def compresssmallfiles(smallfiles, dirx, prefix, indexx, basedir):
"""
Deal with compression of files that can be
compressed in groups.
mediumfiles is a list of two tuples of file
sizes and paths.
dirx is the directory (str) in which the files
are located.
prefix is a string prefix to augment path
identification for compression.
indexx is the current index that the 7zip
file counter (ensures unique archive name)
is on.
basedir is the name of the main MineSight
project directory (Fwaulu, for example).
Returns integer for current archive counter
index.
"""
smallgroupings = getsmallfilegroupings(smallfiles)
while True:
try:
grouplittlefiles = smallgroupings.next()
littlefiles = (dirx + x for x in grouplittlefiles)
copiedfiles = copyfiles(littlefiles, prefix)
compressfilesmultiple(copiedfiles, indexx, basedir)
indexx += 1
deletefiles(copiedfiles)
except StopIteration:
break
return index
# XXX - hack
def matchbasedir(folderlist):
"""
Get MineSight project folder name
that matches a folder in the path
in question.
folderlist is a list (in order)
of directories in a path.
Returns string.
"""
for folderx in folderlist:
for projx in fld.BASEDIRS:
if projx == folderx:
return folderx
return None
def getbasedir(pathx):
"""
Returns two tuple of strings for
basedir and basefolder (project
directory name and base path under
project directory copied to program
directory).
pathx is the directory path being
processed (str).
"""
# basedir is project name (Fwaulu, for example).
foldernames = pathx.split(fld.FRONTSLASH)
basedir = matchbasedir(foldernames)
# Get directory under project directory.
# _msresources, for example.
idx = foldernames.index(basedir)
# Directory under program directory ./ for MineSight files.
basefolder = fld.SAMEFOLDER + fld.FRONTSLASH.join(foldernames[idx + 1:])
return basedir, basefolder
def dealwithtoplevel(firstdir):
"""
Compress top level files in the
MineSight project directory.
firstdir is the three tuple returned
from the os.walk() generator function.
Returns two tuple of integer smallfile
multifilecounter used for naming
multiple file archives and splitfilestrackerfile,
an open file object for tracking split
files for later reconstruction.
"""
# Top level files.
dirx = firstdir[0] + fld.FRONTSLASH
basedir, basefolder = getbasedir(dirx)
# File to track split files for later glueing back together.
splitfilestrackerfile = open(fld.SAMEFOLDER + basedir + fld.FRONTSLASH +
fld.SPLITFILETRACKER, fld.W)
firstdirfiles = segregatefiles(firstdir[0], firstdir[2])
compresslargefiles(firstdirfiles[fld.LARGE], dirx, fld.EMPTY, basedir,
splitfilestrackerfile)
compressmediumfiles(firstdirfiles[fld.MEDIUM], dirx, fld.EMPTY, basedir)
# This is for keeping track of
# archives with more than one file.
multifilecounter = 1
mulitfilecounter = compresssmallfiles(firstdirfiles[fld.SMALL], dirx,
fld.EMPTY, multifilecounter, basedir)
return multifilecounter, splitfilestrackerfile
def dealwithlowerleveldirectories(dirs, multifilecounter, splitfilestrackerfile):
"""
Finishes out compression of lower level
folders under top level MineSight project
directory.
dirs is a partially exhausted (one iteration)
os.walk() generator.
multifilecounter is an integer used for
naming multiple file archives.
splitfilestrackerfile is an open file
object used for tracking file splits
for later retrieval.
Returns orphanedfolders, a list of lower level
folders to be deleted at the end of the program
run.
"""
orphanedfolders = []
for dirx in dirs:
# XXX - hack - I hate dealing with Windows paths.
dirn = dirx[0].replace(fld.BACKSLASH, fld.FRONTSLASH)
diry = dirn + fld.FRONTSLASH
basedir, basefolder = getbasedir(diry)
# Create directory in program path.
fauxdir = fld.PROGFOLDER[:-1] + basefolder[1:-1]
os.mkdir(fauxdir)
orphanedfolders.append(fauxdir)
# Skip anything that doesn't have files.
if not dirx[2]:
continue
# Easiest way to do this might be
# to track directories and sort
# files according to size, then
# filter them accordingly.
dirfiles = segregatefiles(dirx[0], dirx[2])
compresslargefiles(dirfiles[fld.LARGE], diry, basefolder,
basedir, splitfilestrackerfile)
compressmediumfiles(dirfiles[fld.MEDIUM], diry, basefolder, basedir)
multifilecounter = compresssmallfiles(dirfiles[fld.SMALL], diry, basefolder,
multifilecounter, basedir)
splitfilestrackerfile.close()
return orphanedfolders
def walkdir(dirx):
"""
Traverse MineSight project directory,
7zipping everything along the way.
dirx is a string for the directory
to traverse.
Side effect function.
"""
dirs = os.walk(dirx)
# OK - os.walk returns generator that
# yields a tuple in the format
# (str path,
# [list of paths for directories under path],
# [list of filenames under path])
# Top level (Fwaulu, for instance).
# These files will not have a path
# prefix of any sort in their respective
# archives.
firstdir = dirs.next()
multifilecounter, splitfilestrackerfile = dealwithtoplevel(firstdir)
# All other files and folders.
orphanedfolders = dealwithlowerleveldirectories(dirs, multifilecounter,
splitfilestrackerfile)
# Delete lower level folders first - this is necessary.
orphanedfolders.reverse()
for orphanx in orphanedfolders:
print(fld.DELETINGDIR.format(orphanx))
os.rmdir(orphanx)
def cyclefolders(folderx):
    """
    Wrapper function for compression
    of folder folderx (string).
    Side effect function.
    """
    # 1) Set up empty project directory (ex: Fwaulu)
    #    in program directory.
    # 2) For first set of files, use no prefix for
    #    7zip archive storage (filename only).
    # 3) Check for size of file.
    # 4) If file is bigger than fld.CHUNK, split.
    # 5) If file is smaller than fld.CHUNK, but bigger than
    #    MULTFILELIMIT, compress to one archive.
    # 6) If file is smaller than fld.CHUNK, and smaller than
    #    MULTFILELIMIT, check subsequent files to determine
    #    files to include in archive. Keep track of file
    #    index that puts number of bytes over limit.
    # 7) Compress multiple files to one archive - index
    #    archive to ensure unique name.
    # 8) For all following sets of files, same process,
    #    but must prefix paths with SAMEFOLDER and any
    #    additional folder names.
    foldertracker = []
    # Make directory folder in program directory
    # to hold 7zip files.
    zipfolder = getbasedir(folderx)[0]
    os.mkdir(zipfolder)
    foldertracker.append(zipfolder)
    walkdir(folderx)
    print('\nDone')
cyclefolders is the top-level wrapper function for the compression module (sevenzipper.py).
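Steps 3 through 7 in cyclefolders's comments boil down to a size-based plan: split the big files, give medium files their own archive, and batch the small ones. Here is a minimal sketch of that planning logic in plain Python, independent of 7-Zip-JBinding. It is not the module's segregatefiles or compresssmallfiles; the function names, the (name, size) input (in practice the sizes could come from os.path.getsize), the handling of files exactly at a threshold, and the batch limit are all assumptions.

```python
def bucket_by_size(files, chunk, multfilelimit):
    """Split (name, size) pairs into large, medium, and small lists.

    Hypothetical helper: chunk and multfilelimit play the roles of
    fld.CHUNK and MULTFILELIMIT. Files exactly at a threshold go to
    the smaller bucket (an assumption).
    """
    large, medium, small = [], [], []
    for name, size in files:
        if size > chunk:
            large.append((name, size))      # to be split into pieces
        elif size > multfilelimit:
            medium.append((name, size))     # one archive each
        else:
            small.append((name, size))      # batched with other small files
    return large, medium, small


def batch_small_files(small, limit):
    """Group small files into batches whose total size stays <= limit.

    A new batch starts at the file that would push the running total
    over the limit; a single file larger than limit still gets a
    batch of its own.
    """
    batches, current, total = [], [], 0
    for name, size in small:
        if current and total + size > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += size
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    files = [("a.dat", 1500), ("b.dat", 500), ("c.txt", 100),
             ("d.txt", 150), ("e.txt", 120)]
    large, medium, small = bucket_by_size(files, chunk=1000, multfilelimit=300)
    print(large)
    print(medium)
    print(batch_small_files(small, limit=300))
```

With these sample sizes, c.txt and d.txt (250 bytes together) share one batch, and e.txt starts a new one because 250 + 120 exceeds the 300-byte limit. Each batch index could then go into the archive name to keep names unique, as step 7 describes.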
#!java -jar C:\jython2.7.0\jython.jar
# unsevenzipper.py
"""
Use java 3rd party 7-zip compression
library (sevenZipJBindings) from
jython to un-7zip archives.
"""
# Need to adjust path to get necessary jar imports.
# XXX - it might be cleaner to chain imports by using
# the sevenzipper (s7 alias) below to reference
# double imported modules. For development and
# convenience I reimported everything as though
# sevenzipper.py and unsevenzipper.py were separate
# operations.
import sys
import folderstozip as fld
sys.path.append(fld.PATH7ZJB)
sys.path.append(fld.PATH7ZJBOSSPEC)
import os
import sevenzipper as s7
import SevenZipThingExtract
def subdirectoryornot(pathx):
    """
    Boolean function that returns
    True if string pathx is a
    subdirectory of the MineSight
    project folder and False if
    the files belong directly to
    the MineSight project folder.
    """
    pathx = pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH)
    pathlist = pathx.split(fld.BACKSLASH)
    return len(pathlist) > 1
def getdirectories(dirx):
    """
    Get a sorted list of tuples of directory
    names (one tuple per directory path) under
    the project folder, read from the 7zip
    archives in the project archive folder.
    Returns a two-tuple of that list and a
    dictionary indicating which 7z files are
    same-directory archives and which hold
    archived subdirectory files.
    dirx is a string for the file
    path of the directory to
    be walked (./Fwaulu for example).
    """
    dirs = os.walk(dirx)
    # One level, no subfolders.
    files = next(dirs)[2]
    # Get directories first.
    rawpaths = []
    subdirornot = {}
    for filex in files:
        # Skip uncompressed split file tracker.
        if filex == fld.SPLITFILETRACKER:
            continue
        # I don't know if it's a subdirectory or not, so I'll go with False.
        s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex, dirx, False)
        folders = list(s7tx.getfolders())
        rawpaths.extend(folders)
        # All the paths in folders have the same prefix -
        # just do one.
        subdirornot[filex] = subdirectoryornot(folders[0])
    # Get just the directories: drop the leading prefix
    # and the trailing filename from each path.
    justdirectories = set()
    for pathx in rawpaths:
        parts = pathx.replace(fld.SAMEFOLDERWIN, fld.BACKSLASH).split(fld.BACKSLASH)[1:-1]
        if parts:
            justdirectories.add(tuple(parts))
    justdirectories = sorted(justdirectories)
    return justdirectories, subdirornot
def makedirectories(dirn):
    """
    Create directory paths within archive
    project folder to accept uncompressed
    files.
    Returns subdirornot dictionary.
    dirn is a string for the file
    path of the directory to
    be walked (./Fwaulu for example).
    """
    justdirectories, subdirornot = getdirectories(dirn)
    # max() raises ValueError on an empty sequence, so fall back
    # to 0 when the project has no subdirectories.
    maxdepth = max([len(dirx) for dirx in justdirectories] or [0])
    # Create shallower directories before deeper ones.
    for x in xrange(0, maxdepth):
        justdirectoriesii = set([tuple(dirx[0:x + 1]) for dirx in justdirectories
                                 if len(dirx) >= x + 1])
        for diry in justdirectoriesii:
            dirw = dirn + fld.FRONTSLASH + fld.FRONTSLASH.join(diry)
            os.mkdir(dirw)
    return subdirornot
def extractfiles(dirx):
    """
    Extract files from 7z files
    in project archive folder.
    Side effect function.
    dirx is a string for the file
    path of the directory to
    be walked.
    """
    subdirornot = makedirectories(dirx)
    dirs = os.walk(dirx)
    # One level, no subfolders.
    files = next(dirs)[2]
    for filex in files:
        # Skip uncompressed split file tracker.
        if filex == fld.SPLITFILETRACKER:
            continue
        s7tx = SevenZipThingExtract(dirx + fld.FRONTSLASH + filex,
                                    dirx, subdirornot[filex])
        s7tx.extractfiles()
def gluetogethersplitfiles(dirx):
    """
    Make split up files whole.
    Side effect function.
    dirx is the folder in which the split
    files reside.
    """
    # Glue together big files.
    # Do this in a very controlling,
    # structured way:
    # 1) Read the split file tracker csv file.
    # 2) Determine the number and names and paths
    #    of files to be reconstructed and the
    #    number of parts in each.
    # 3) Check that everything is there for
    #    each file to be reconstructed.
    # 4) Get the new relative path.
    # 5) Glue back together programmatically.
    splitfiles = []
    # fld.SPLITFILETRACKER is structured as original path
    # of file split, number of file split.
    with open(fld.SAMEFOLDERWIN + dirx +
              fld.FRONTSLASH + fld.SPLITFILETRACKER, 'r') as f:
        for linex in f:
            # Skip blank lines (a trailing newline, for instance).
            if not linex.strip():
                continue
            strippedline = [x.strip() for x in linex.split(fld.UCOMMA)]
            splitfiles.append(tuple(strippedline))
    orignames = [x[0] for x in splitfiles]
    splitoriginals = set(orignames)
    # Make dictionary that is easy to cycle through.
    filesx = {}
    for orig in splitoriginals:
        basedir, basefolder = s7.getbasedir(orig)
        basepath = fld.SAMEFOLDER + basedir + basefolder[1:]
        filesx[orig] = {}
        filesx[orig][fld.BASEPATH] = basepath
        # A list (not a lazy generator) binds these part paths to
        # this orig right away. Parts keep their tracker file order.
        filesx[orig][fld.FILES] = [fld.SPLITFILE.format(basepath, filex[1])
                                   for filex in splitfiles if filex[0] == orig]
    for orig in filesx:
        with open(filesx[orig][fld.BASEPATH], fld.WB) as mainfile:
            for filex in filesx[orig][fld.FILES]:
                with open(filex, fld.RB) as splitfile:
                    mainfile.write(splitfile.read())
def restore(dirx):
    """
    Restores MineSight project directory
    inside program path.
    dirx is a string for the directory
    to be restored (./Fwaulu, for example).
    Side effect function.
    """
    extractfiles(dirx)
    gluetogethersplitfiles(dirx)
    print('Done')
restore is the main function for the decompression module (unsevenzipper.py).
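Step 3 of gluetogethersplitfiles's plan is to check that every piece is present before gluing, and the listing reads each piece whole with splitfile.read(), which can mean a lot of memory for multi-GB files. Here is a minimal sketch of a stricter recombination step. It assumes the part paths have already been built (for example with fld.SPLITFILE) and are listed in order. glue_parts and its buffer size are hypothetical. shutil.copyfileobj is the standard-library call that copies between open file objects in fixed-size blocks.

```python
import os
import shutil


def glue_parts(part_paths, target_path, bufsize=16 * 1024 * 1024):
    """Concatenate part_paths, in order, into target_path.

    Hypothetical helper. Every part is checked before anything is
    written, then each part is streamed in bufsize blocks so a large
    piece is never read into memory all at once.
    """
    missing = [p for p in part_paths if not os.path.isfile(p)]
    if missing:
        raise IOError("Missing split parts: " + ", ".join(missing))
    with open(target_path, "wb") as mainfile:
        for part in part_paths:
            with open(part, "rb") as splitfile:
                shutil.copyfileobj(splitfile, mainfile, bufsize)
```

Because the existence check runs first, a missing piece raises an error before the target file is opened. An incomplete restore therefore never leaves behind a truncated copy that looks finished.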
Notes:
1) I don't have admin rights at work and did not have javac (the java compiler) available. You can download a JDK (Java Development Kit) from Oracle, which includes javac; a JRE alone does not. Without admin rights you can't install the JDK normally, but you can still run javac straight from the downloaded JDK folder. My compilation went something like this:
<path to downloaded JDK>/bin/javac -cp "<path to downloaded 7-Zip-JBinding>/lib/*" <myclassname>.java
2) I've left all the split-up files and 7z archives in the folder where I decompress my files and recombine the split files. Depending on what you're working with, this can take up a lot of space. If space is at a premium, you'll probably want to write jython code to move or delete the archives after uncompressing them.
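For note 2, here is a minimal cleanup sketch. It assumes the archives are the files ending in ".7z" and that extraction has already been checked. clear_archives and its keep/dest parameters are hypothetical. Passing dest moves the archives with shutil.move. Leaving dest out deletes them with os.remove.

```python
import os
import shutil


def clear_archives(dirx, keep=(), dest=None):
    """Move (or delete) the .7z archives in dirx after extraction.

    Hypothetical helper. keep lists filenames to leave in place;
    dest, if given, is a folder to move the archives into,
    otherwise they are deleted. Returns the archive names handled.
    """
    handled = []
    for name in sorted(os.listdir(dirx)):
        path = os.path.join(dirx, name)
        # Only plain files with a .7z extension (any case) are touched.
        if name in keep or not os.path.isfile(path):
            continue
        if not name.lower().endswith(".7z"):
            continue
        if dest is None:
            os.remove(path)
        else:
            shutil.move(path, os.path.join(dest, name))
        handled.append(name)
    return handled
```

One way to use it would be to call clear_archives(dirx) right after restore(dirx) finishes, or to pass a dest folder on another drive if you want to keep the archives. The split pieces follow the module's own naming (fld.SPLITFILE), so they would need a separate pattern, for instance the same part list used when gluing.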
3) The most time-consuming parts of a run are compression, decompression, and splitting large files and recombining them. Porting some of this to java (instead of jython) might speed things up, but I code faster and generally better in jython, and my objective was control, not speed. YMMV (your mileage may vary) with this approach; there are far better general-purpose ones.
Thanks for stopping by.