class ZipTricks::FileReader

Overview

A very barebones ZIP file reader. It is made for maximum interoperability, while attempting to stay somewhat concise.

Please BEWARE: using this is a security risk if you are reading files that have been supplied by users. This implementation has not been formally verified for correctness. Because ZIP files contain relative offsets in many places, a maliciously crafted ZIP file might put the decode procedure into an endless loop, make it attempt huge reads from the input file, and so on. Additionally, the reader module for deflated data has no ZIP bomb protection. So either limit FileReader usage to files you trust, or triple-check all the inputs upfront.

Supported features

Unsupported features

Mode of operation

By default, FileReader ignores the data in local file headers (as it is often unreliable). It reads the ZIP file "from the tail", finds the end-of-central-directory signatures, then reads the central directory entries, reconstitutes the entries with their filenames, attributes and so on, and sets these entries up with the absolute offsets into the source file/IO object. These offsets can then be used to extract the actual compressed data of the files and to expand it.
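
The "from the tail" search can be illustrated with a minimal sketch. It is not the FileReader implementation: `locate_eocd` and `EOCD_SIGNATURE` are hypothetical names, and the sketch assumes the tail of the archive has already been read into a `Bytes` buffer. It scans backwards for the end-of-central-directory signature `0x06054b50`, stored little-endian as `PK\x05\x06`.

```crystal
# Hypothetical helper, for illustration only.
# The EOCD signature 0x06054b50 as it appears on disk (little-endian).
EOCD_SIGNATURE = Bytes[0x50, 0x4b, 0x05, 0x06]

# Size of an EOCD record with an empty archive comment.
MIN_EOCD_SIZE = 22

# Returns the index of the last EOCD signature in `tail`, or nil if none is found.
def locate_eocd(tail : Bytes) : Int32?
  return nil if tail.size < MIN_EOCD_SIZE
  # Start at the last position where a complete record could still fit
  # and walk towards the beginning of the buffer.
  (tail.size - MIN_EOCD_SIZE).downto(0) do |i|
    return i if tail[i, 4] == EOCD_SIGNATURE
  end
  nil
end
```

The same four bytes can also occur inside an archive comment, so a robust reader would validate the fields that follow the signature before trusting a match.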

Recovering damaged or incomplete ZIP files

If the ZIP file you are trying to read does not contain the central directory records, #read_zip_structure will not work, since it starts the read process from the EOCD marker at the end of the central directory and then crawls "back" in the IO to figure out the rest. You can explicitly apply a fallback that reads the archive "straight ahead" instead, using #read_zip_straight_ahead.

Defined in:

file_reader.cr

Constant Summary

MAX_END_OF_CENTRAL_DIRECTORY_RECORD_SIZE = 4 + 4 + 2 + 4 + 2 + 2 + 2 + 2 + 65535

To prevent too many tiny reads, read the maximum possible size of the end of central directory record upfront (all the fixed fields + at most 0xFFFF bytes of the archive comment).

MAX_LOCAL_HEADER_SIZE = 4 + 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4 + 2 + 2 + 65535 + 65535

To prevent too many tiny reads, read the maximum possible size of the local file header upfront.

SIZE_OF_USABLE_EOCD_RECORD = 4 + 2 + 2 + 2 + 2 + 4 + 4
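
As a worked check of the three sums above, the following sketch recomputes them. The field names in the comments follow the usual ZIP record layout and are not taken from this page; the local variable names are illustrative only.

```crystal
# EOCD fixed fields: signature (4), two disk numbers (2 + 2),
# two entry counts (2 + 2), central directory size (4) and offset (4),
# comment length (2).
eocd_fixed = 4 + 2 + 2 + 2 + 2 + 4 + 4 + 2
max_eocd = eocd_fixed + 0xFFFF # plus the longest possible archive comment

# Local file header fixed fields: signature (4), version (2), flags (2),
# storage mode (2), DOS time (2), DOS date (2), CRC32 (4),
# compressed size (4), uncompressed size (4),
# filename length (2), extra field length (2).
local_fixed = 4 + 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4 + 2 + 2
max_local = local_fixed + 0xFFFF + 0xFFFF # longest filename + longest extra field

# The EOCD fields up to and including the central directory offset,
# that is, the fixed part without the comment length.
usable_eocd = eocd_fixed - 2

puts max_eocd    # 65557
puts max_local   # 131100
puts usable_eocd # 20
```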

Class Method Summary

Instance Method Summary

Class Method Detail

def self.read_zip_straight_ahead(io) : Array(ZipEntry)

def self.read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)

Class-method convenience wrappers around the instance methods of the same names.

Instance Method Detail

def get_compressed_data_offset(io, local_file_header_offset : Int) : UInt64

Get the compressed data offset for an entry at a given local file header offset
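
A minimal sketch of how such an offset can be derived, assuming the standard local file header layout (30 fixed bytes followed by the filename and the extra field). `compressed_data_offset_sketch` is a hypothetical name, and this is not the method's actual source.

```crystal
# Hypothetical helper, for illustration only.
def compressed_data_offset_sketch(io : IO, local_file_header_offset : Int) : UInt64
  le = IO::ByteFormat::LittleEndian
  io.seek(local_file_header_offset)

  signature = io.read_bytes(UInt32, le)
  raise "Not a local file header" unless signature == 0x04034b50_u32

  # Skip version (2), flags (2), storage mode (2), DOS time (2), DOS date (2),
  # CRC32 (4), compressed size (4) and uncompressed size (4).
  io.skip(22)
  filename_size = io.read_bytes(UInt16, le)
  extra_size = io.read_bytes(UInt16, le)

  # The compressed data begins right after the variable-length fields.
  local_file_header_offset.to_u64 + 30 + filename_size + extra_size
end
```

The header has to be read rather than assumed because the filename and extra field lengths recorded in the local header may differ from those in the central directory.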

def read_cdir_entry(io) : ZipEntry

Read a single central directory entry from the IO. Exposed for testing.

def read_local_file_header(io) : ZipEntry

Parse the local header entry and get the offset in the IO at which the actual compressed data of the file starts within the ZIP.

def read_zip_straight_ahead(io) : Array(ZipEntry)

Read entries from a ZIP "straight ahead", without using the central directory. Useful for recovering damaged or truncated ZIP files. Does not support data descriptors.
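
The general idea of a "straight ahead" read can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `LocalEntry` and `scan_straight_ahead` are hypothetical names, the input is an in-memory IO positioned at the first local file header, and, like the method above, the sketch does not handle data descriptors (entries whose sizes are stored after the data would be skipped incorrectly).

```crystal
# Hypothetical record and helper, for illustration only.
record LocalEntry, filename : String, compressed_size : UInt32, data_offset : Int64

def scan_straight_ahead(io : IO::Memory) : Array(LocalEntry)
  le = IO::ByteFormat::LittleEndian
  entries = [] of LocalEntry

  # Keep going while a complete fixed-size header (30 bytes) can still fit.
  while io.pos + 30 <= io.size
    signature = io.read_bytes(UInt32, le)
    # Anything other than a local file header (for example the central
    # directory, or garbage in a truncated file) ends the scan.
    break unless signature == 0x04034b50_u32

    io.skip(14) # version, flags, storage mode, DOS time, DOS date, CRC32
    compressed_size = io.read_bytes(UInt32, le)
    io.skip(4) # uncompressed size
    filename_size = io.read_bytes(UInt16, le)
    extra_size = io.read_bytes(UInt16, le)
    filename = io.read_string(filename_size)
    io.skip(extra_size)

    entries << LocalEntry.new(filename, compressed_size, io.pos.to_i64)
    # Jump over the compressed data to reach the next header.
    io.skip(compressed_size)
  end

  entries
end
```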

def read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)

Parse an IO handle to a ZIP archive into an array of ZipEntry objects, reading from the end of the IO object (central directory).
