class ZipTricks::FileReader
- ZipTricks::FileReader
- Reference
- Object
Overview
A very barebones ZIP file reader. It is made for maximum interoperability, while we attempt to keep it reasonably concise.
Please BEWARE - using this is a security risk if you are reading files that have been
supplied by users. This implementation has not been formally verified for correctness. As
ZIP files contain relative offsets in lots of places it might be possible for a maliciously
crafted ZIP file to put the decode procedure in an endless loop, make it attempt huge reads
from the input file and so on. Additionally, the reader module for deflated data has
no support for ZIP bomb protection. So either limit the FileReader usage to the files you
trust, or triple-check all the inputs upfront.
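One way to act on the ZIP bomb warning is to cap how much inflated output you accept per entry. The sketch below is not part of ZipTricks: `inflate_bounded` and its `limit` parameter are hypothetical names, and it assumes the entry body is raw deflate data that Crystal's standard `Compress::Deflate::Reader` can read.

```crystal
require "compress/deflate"

# Hypothetical guard (not a ZipTricks API): inflates at most `limit` bytes
# from `compressed` and raises if the stream would expand beyond that.
def inflate_bounded(compressed : IO, limit : Int32) : Bytes
  output = IO::Memory.new
  Compress::Deflate::Reader.open(compressed) do |inflater|
    # Ask for one byte more than allowed so an oversized stream is detectable
    copied = IO.copy(inflater, output, limit + 1)
    raise ArgumentError.new("inflated data exceeds #{limit} bytes") if copied > limit
  end
  output.to_slice
end
```

A caller would position the IO at the compressed data offset of an entry and pick `limit` from whatever uncompressed size it is willing to accept, rather than trusting the size declared in the archive.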
Supported features
- Deflate and stored storage modes
- Zip64 (extra fields and offsets)
- Data descriptors
Unsupported features
- Archives split over multiple disks/files
- Any ZIP encryption
- EFS language flag and InfoZIP filename extra field
- Verification of CRC32 checksums
Mode of operation
By default, FileReader ignores the data in local file headers (as it is
often unreliable). It reads the ZIP file "from the tail", finds the
end-of-central-directory signatures, then reads the central directory entries,
reconstitutes the entries with their filenames, attributes and so on, and
sets these entries up with the absolute offsets into the source file/IO object.
These offsets can then be used to extract the actual compressed data of
the files and to expand it.
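The first step described above, finding the end-of-central-directory record from the tail, can be sketched as follows. This is an illustration, not the FileReader implementation: `locate_eocd_offset` is a hypothetical helper, and it relies only on the standard EOCD signature bytes (`PK\x05\x06`) and the 22-byte fixed size of the record.

```crystal
EOCD_SIGNATURE = Bytes[0x50, 0x4b, 0x05, 0x06] # "PK\x05\x06"
MAX_EOCD_SIZE  = 22 + 0xFFFF                    # fixed fields + longest possible comment

# Hypothetical helper (not a ZipTricks API): returns the absolute offset of the
# last EOCD signature in a seekable IO, or nil if none is found.
def locate_eocd_offset(io : IO) : Int64?
  io.seek(0, IO::Seek::End)
  file_size = io.pos.to_i64
  tail_size = Math.min(file_size, MAX_EOCD_SIZE.to_i64)
  return nil if tail_size < 22

  # One large read of the tail instead of many tiny reads
  io.seek(file_size - tail_size, IO::Seek::Set)
  tail = Bytes.new(tail_size.to_i32)
  io.read_fully(tail)

  # Walk backwards so the signature closest to the end of the file wins
  (tail.size - 22).downto(0) do |i|
    return file_size - tail_size + i if tail[i, 4] == EOCD_SIGNATURE
  end
  nil
end
```

A real reader still has to validate the record it finds, since the same four bytes can also occur inside an archive comment or entry data.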
Recovering damaged or incomplete ZIP files
If the ZIP file you are trying to read does not contain the central directory
records, #read_zip_structure will not work, since it starts the read process
from the EOCD marker at the end of the central directory and then crawls
"back" in the IO to figure out the rest. You can explicitly apply a fallback
for reading the archive "straight ahead" instead, using #read_zip_straight_ahead.
That method scans your IO from the very start, skipping over the actual entry data. This is less efficient than central directory parsing since it involves a much larger number of reads (1 read from the IO per entry in the ZIP).
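To make the "straight ahead" idea concrete, here is a minimal sketch of walking local file headers from the start of an IO. It is not the code behind #read_zip_straight_ahead: `ScannedEntry` and `scan_straight_ahead` are hypothetical names, the field layout is the standard 30-byte local file header, and Zip64 extra fields are ignored.

```crystal
LOCAL_HEADER_SIGNATURE = 0x04034b50_u32

# Hypothetical stand-in for ZipEntry, holding only what this sketch needs
record ScannedEntry, filename : String, compressed_size : UInt32, data_offset : Int64

def scan_straight_ahead(io : IO) : Array(ScannedEntry)
  entries = [] of ScannedEntry
  begin
    loop do
      signature = io.read_bytes(UInt32, IO::ByteFormat::LittleEndian)
      break unless signature == LOCAL_HEADER_SIGNATURE
      io.skip(14) # version, flags, storage mode, time, date (2 bytes each), CRC32 (4)
      # Entries written with a data descriptor store 0 here, so this sketch
      # cannot skip over their data
      compressed_size = io.read_bytes(UInt32, IO::ByteFormat::LittleEndian)
      io.skip(4) # uncompressed size
      filename_size = io.read_bytes(UInt16, IO::ByteFormat::LittleEndian)
      extra_size = io.read_bytes(UInt16, IO::ByteFormat::LittleEndian)
      filename = io.read_string(filename_size.to_i32)
      io.skip(extra_size)
      entries << ScannedEntry.new(filename, compressed_size, io.pos.to_i64)
      io.skip(compressed_size) # jump over the entry data to the next header
    end
  rescue IO::EOFError
    # Truncated archive: keep whatever was recovered so far
  end
  entries
end
```

The scan stops at the first signature that is not a local file header, which in an intact archive is the start of the central directory.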
Defined in:
file_reader.cr
Constant Summary
- MAX_END_OF_CENTRAL_DIRECTORY_RECORD_SIZE = (((((((4 + 4) + 2) + 4) + 2) + 2) + 2) + 2) + 65535
  To prevent too many tiny reads, read the maximum possible size of end of central directory record upfront (all the fixed fields + at most 0xFFFF bytes of the archive comment)
- MAX_LOCAL_HEADER_SIZE = (((((((((((4 + 2) + 2) + 2) + 2) + 2) + 4) + 4) + 4) + 2) + 2) + 65535) + 65535
  To prevent too many tiny reads, read the maximum possible size of the local file header upfront.
- SIZE_OF_USABLE_EOCD_RECORD = (((((4 + 2) + 2) + 2) + 2) + 4) + 4
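The nested parentheses make the constant values hard to read at a glance. The snippet below repeats the same sums flat, in the same term order, and prints the totals; the variable names are only for illustration.

```crystal
# Same terms as the constants above, without the generated parentheses
max_eocd_size         = 4 + 4 + 2 + 4 + 2 + 2 + 2 + 2 + 0xFFFF
max_local_header_size = 4 + 2 + 2 + 2 + 2 + 2 + 4 + 4 + 4 + 2 + 2 + 0xFFFF + 0xFFFF
usable_eocd_size      = 4 + 2 + 2 + 2 + 2 + 4 + 4

puts max_eocd_size         # => 65557 (22 fixed bytes + 65535 comment bytes)
puts max_local_header_size # => 131100 (30 fixed bytes + 2 * 65535)
puts usable_eocd_size      # => 20
```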
Class Method Summary
- .read_zip_straight_ahead(io) : Array(ZipEntry)
- .read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
  Class method convenience wrappers
Instance Method Summary
- #get_compressed_data_offset(io, local_file_header_offset : Int) : UInt64
  Get the compressed data offset for an entry at a given local file header offset
- #read_cdir_entry(io) : ZipEntry
  Read a single central directory entry from the IO.
- #read_local_file_header(io) : ZipEntry
  Parse the local header entry and get the offset in the IO at which the actual compressed data of the file starts within the ZIP.
- #read_zip_straight_ahead(io) : Array(ZipEntry)
  Read entries from a ZIP "straight ahead", without using the central directory.
- #read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
  Parse an IO handle to a ZIP archive into an array of ZipEntry objects, reading from the end of the IO object (central directory).
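Class Method Detail
.read_zip_straight_ahead(io) : Array(ZipEntry)
.read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
Class method convenience wrappers
Instance Method Detail
#get_compressed_data_offset(io, local_file_header_offset : Int) : UInt64
Get the compressed data offset for an entry at a given local file header offset
#read_cdir_entry(io) : ZipEntry
Read a single central directory entry from the IO. Exposed for testing.
#read_local_file_header(io) : ZipEntry
Parse the local header entry and get the offset in the IO at which the actual compressed data of the file starts within the ZIP.
#read_zip_straight_ahead(io) : Array(ZipEntry)
Read entries from a ZIP "straight ahead", without using the central directory. Useful for recovering damaged or truncated ZIP files. Does not support data descriptors.
#read_zip_structure(io, read_local_headers : Bool = true) : Array(ZipEntry)
Parse an IO handle to a ZIP archive into an array of ZipEntry objects, reading from the end of the IO object (central directory).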