class ZipTricks::FileReader

ZipTricks::FileReader < Reference < Object

Overview

A very barebones ZIP file reader. It is made for maximum interoperability, while attempting to stay reasonably concise.

Please BEWARE - using this is a security risk if you are reading files that have been supplied by users. This implementation has not been formally verified for correctness. As ZIP files contain relative offsets in lots of places it might be possible for a maliciously crafted ZIP file to put the decode procedure in an endless loop, make it attempt huge reads from the input file and so on. Additionally, the reader module for deflated data has no support for ZIP bomb protection. So either limit the FileReader usage to the files you trust, or triple-check all the inputs upfront.
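Since the reader offers no ZIP bomb protection, one way to guard the inflate step yourself is to cap how many bytes you are willing to produce. The following is a minimal sketch using Crystal's standard `Compress::Deflate::Reader`, not part of FileReader; the names `inflate_with_limit`, `InflatedSizeExceeded` and `MAX_INFLATED_BYTES` are hypothetical and exist only for this illustration.

```crystal
require "compress/deflate"

# Hypothetical cap for this sketch; pick a value that fits your application.
MAX_INFLATED_BYTES = 10 * 1024 * 1024

# Hypothetical error type raised when the cap is exceeded.
class InflatedSizeExceeded < Exception
end

# Inflates raw deflate data from `compressed`, producing at most `limit`
# bytes. Raises if the stream would expand beyond the limit.
def inflate_with_limit(compressed : IO, limit : Int) : Bytes
  output = IO::Memory.new
  Compress::Deflate::Reader.open(compressed) do |inflater|
    # Copy no more than `limit` inflated bytes into the output buffer.
    IO.copy(inflater, output, limit)
    # If another byte is still available, the data is larger than the cap.
    unless inflater.read_byte.nil?
      raise InflatedSizeExceeded.new("inflated data exceeds #{limit} bytes")
    end
  end
  output.to_slice
end
```

To use it, position the IO at the start of an entry's compressed data and call `inflate_with_limit(io, MAX_INFLATED_BYTES)`. If the IO holds more than one entry, you would additionally need to restrict reading to that entry's compressed size.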

Supported features

Deflate and stored storage modes
Zip64 (extra fields and offsets)
Data descriptors

Unsupported features

Archives split over multiple disks/files
Any ZIP encryption
EFS language flag and InfoZIP filename extra field
CRC32 checksums are not verified

Mode of operation

By default, FileReader ignores the data in local file headers (as it is often unreliable). It reads the ZIP file "from the tail", finds the end-of-central-directory signatures, then reads the central directory entries, reconstitutes the entries with their filenames, attributes and so on, and sets these entries up with the absolute offsets into the source file/IO object. These offsets can then be used to extract the actual compressed data of the files and to expand it.
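The "from the tail" step can be illustrated with a small sketch that searches a tail buffer backwards for the end-of-central-directory signature. This is an illustration of the general technique under the ZIP format's published layout, not the code FileReader uses; `find_eocd_offset`, `EOCD_SIGNATURE` and `EOCD_FIXED_SIZE` are names assumed for this example.

```crystal
# "PK\x05\x06" read as a little-endian 32-bit integer.
EOCD_SIGNATURE  = 0x06054b50_u32
# Size of the fixed EOCD fields, without the variable-length comment.
EOCD_FIXED_SIZE = 22

# Returns the index of the last EOCD signature found in `tail`, or nil if
# there is none. Scanning from the end means a trailing archive comment is
# skipped over before the record is found.
def find_eocd_offset(tail : Bytes) : Int32?
  return nil if tail.size < EOCD_FIXED_SIZE
  (tail.size - EOCD_FIXED_SIZE).downto(0) do |pos|
    candidate = IO::ByteFormat::LittleEndian.decode(UInt32, tail[pos, 4])
    return pos if candidate == EOCD_SIGNATURE
  end
  nil
end

# An empty archive consists of nothing but a 22-byte EOCD record.
empty_zip = Bytes.new(EOCD_FIXED_SIZE)
IO::ByteFormat::LittleEndian.encode(EOCD_SIGNATURE, empty_zip)
p find_eocd_offset(empty_zip) # prints 0
```

A bare signature match can also occur inside an archive comment, so a robust reader would additionally validate the fields that follow the candidate position.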

Recovering damaged or incomplete ZIP files

If the ZIP file you are trying to read does not contain the central directory records, #read_zip_structure will not work, since it starts the read process from the EOCD marker at the end of the central directory and then crawls "back" in the IO to figure out the rest. You can explicitly apply a fallback for reading the archive "straight ahead" using #read_zip_straight_ahead: that method scans your IO from the very start, skipping over the actual entry data. This is less efficient than central directory parsing since it involves a much larger number of reads (1 read from the IO per entry in the ZIP).
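To make the "straight ahead" idea concrete, here is a minimal sketch of walking local file headers from the start of an IO and skipping over each entry's data. It follows the standard local header layout and is not FileReader's implementation; `scan_local_headers` and `LocalEntry` are hypothetical names, and the sketch assumes entries without data descriptors or Zip64 sizes.

```crystal
# "PK\x03\x04" read as a little-endian 32-bit integer.
LOCAL_HEADER_SIGNATURE  = 0x04034b50_u32
# Size of the fixed local header fields, before filename and extra field.
LOCAL_HEADER_FIXED_SIZE = 30

# Hypothetical result type for this sketch.
record LocalEntry, filename : String, data_offset : Int64, compressed_size : UInt32

def scan_local_headers(io : IO) : Array(LocalEntry)
  entries = [] of LocalEntry
  le = IO::ByteFormat::LittleEndian
  loop do
    header = Bytes.new(LOCAL_HEADER_FIXED_SIZE)
    # Stop at end of input or at the first non-local-header record.
    break if io.read_fully?(header).nil?
    break unless le.decode(UInt32, header[0, 4]) == LOCAL_HEADER_SIGNATURE

    compressed_size = le.decode(UInt32, header[18, 4])
    name_length = le.decode(UInt16, header[26, 2])
    extra_length = le.decode(UInt16, header[28, 2])

    filename = io.read_string(name_length)
    io.skip(extra_length)
    # The compressed data starts right after the filename and extra field.
    entries << LocalEntry.new(filename, io.pos.to_i64, compressed_size)
    # Jump over the entry body to land on the next header.
    io.skip(compressed_size)
  end
  entries
end
```

The scan relies on the compressed size stored in each local header, which is why entries written with data descriptors (where that size is not known up front) cannot be walked this way.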

Defined in:

file_reader.cr

Constant Summary

MAX_END_OF_CENTRAL_DIRECTORY_RECORD_SIZE = (((((((4 + 4) + 2) + 4) + 2) + 2) + 2) + 2) + 65535: To prevent too many tiny reads, read the maximum possible size of end of central directory record upfront (all the fixed fields + at most 0xFFFF bytes of the archive comment)
MAX_LOCAL_HEADER_SIZE = (((((((((((4 + 2) + 2) + 2) + 2) + 2) + 4) + 4) + 4) + 2) + 2) + 65535) + 65535: To prevent too many tiny reads, read the maximum possible size of the local file header upfront.
SIZE_OF_USABLE_EOCD_RECORD = (((((4 + 2) + 2) + 2) + 2) + 4) + 4
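For reference, the sums above can be evaluated as follows. This is plain arithmetic on the expressions shown in the summary; the variable names are labels chosen for this example only.

```crystal
# Largest value a 2-byte length field can hold (comment, filename, extra field).
max_variable_length = 0xFFFF

# Fixed EOCD fields plus the largest possible archive comment.
max_eocd = (4 + 4 + 2 + 4 + 2 + 2 + 2 + 2) + max_variable_length

# Fixed local header fields plus the largest filename and extra field.
max_local_header = (4 + 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4 + 2 + 2) +
                   max_variable_length + max_variable_length

# Sum given for SIZE_OF_USABLE_EOCD_RECORD.
usable_eocd = 4 + 2 + 2 + 2 + 2 + 4 + 4

puts max_eocd         # prints 65557
puts max_local_header # prints 131100
puts usable_eocd      # prints 20
```

So reading the tail of the archive costs at most about 64 KiB, and a local header read at most about 128 KiB. The usable EOCD size of 20 bytes equals the 22 fixed EOCD bytes minus the 2-byte comment length field.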

Class Method Summary

.read_zip_straight_ahead(io) : Array(ZipEntry)
.read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
Class method convenience wrappers

Instance Method Summary

#get_compressed_data_offset(io, local_file_header_offset : Int) : UInt64
Get the compressed data offset for an entry at a given local file header offset
#read_cdir_entry(io) : ZipEntry
Read a single central directory entry from the IO.
#read_local_file_header(io) : ZipEntry
Parse the local header entry and get the offset in the IO at which the actual compressed data of the file starts within the ZIP.
#read_zip_straight_ahead(io) : Array(ZipEntry)
Read entries from a ZIP "straight ahead", without using the central directory.
#read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
Parse an IO handle to a ZIP archive into an array of ZipEntry objects, reading from the end of the IO object (central directory).

Class Method Detail

def self.read_zip_straight_ahead(io) : Array(ZipEntry)

def self.read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)

Class method convenience wrappers

Instance Method Detail

def get_compressed_data_offset(io, local_file_header_offset : Int) : UInt64

Get the compressed data offset for an entry at a given local file header offset

def read_cdir_entry(io) : ZipEntry

Read a single central directory entry from the IO. Exposed for testing.

def read_local_file_header(io) : ZipEntry

Parse the local header entry and get the offset in the IO at which the actual compressed data of the file starts within the ZIP.

def read_zip_straight_ahead(io) : Array(ZipEntry)

Read entries from a ZIP "straight ahead", without using the central directory. Useful for recovering damaged or truncated ZIP files. Does not support data descriptors.

def read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)

Parse an IO handle to a ZIP archive into an array of ZipEntry objects, reading from the end of the IO object (central directory).
