module MediaFilter

Defined in:

utils/media_filter.cr

Constant Summary

MAX_FILE_SIZE = begin
  if size_str = ENV["NOIR_MAX_FILE_SIZE"]?
    parsed = MediaFilter.parse_size(size_str)
    parsed > 0 ? parsed : (10 * 1024) * 1024
  else
    (10 * 1024) * 1024
  end
rescue
  (10 * 1024) * 1024
end

Maximum file size for processing (default 10MB, i.e. 10 * 1024 * 1024 bytes). Can be overridden with the environment variable NOIR_MAX_FILE_SIZE. Supported formats for NOIR_MAX_FILE_SIZE:

  • Plain bytes integer (e.g., 5242880)
  • Human-readable with unit suffix (K, KB, M, MB, G, GB), e.g., 5MB, 500K, 1G

Invalid or unparsable values fall back to the default (10MB).

MEDIA_EXTENSION_SET = MEDIA_EXTENSIONS.to_set

O(1) lookup set materialized from MEDIA_EXTENSIONS. Used on the hot path — every file in the project is checked once.

MEDIA_EXTENSIONS = [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp", ".tiff", ".svg", ".ico", ".psd", ".raw", ".cr2", ".nef", ".orf", ".sr2", ".heic", ".heif", ".mp4", ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm", ".m4v", ".mpg", ".mpeg", ".3gp", ".vob", ".rm", ".rmvb", ".asf", ".ogv", ".mp3", ".wav", ".flac", ".aac", ".ogg", ".wma", ".m4a", ".ape", ".ac3", ".dts", ".opus", ".amr", ".au", ".ra", ".aiff", ".zip", ".rar", ".7z", ".tar", ".gz", ".bz2", ".xz", ".dmg", ".iso", ".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".exe", ".dll", ".so", ".dylib", ".bin", ".app", ".deb", ".rpm", ".db", ".sqlite", ".sqlite3", ".mdb", ".accdb", ".ttf", ".otf", ".woff", ".woff2", ".eot"]

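File extensions that should be skipped: common media formats (images, video, audio) plus archives, office documents, executables and libraries, databases, and fonts.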


Class Method Detail

def self.binary_content_signature?(file_path : String) : Bool

Cheap binary-content sniff. Reads the first 512 bytes and returns true if the buffer contains a NUL byte (\x00), which is the canonical "this is binary, not text" marker — text files in any common encoding (UTF-8, UTF-16-with-BOM, Latin-1, etc.) don't contain interior NULs in practice.
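
The NUL-byte check described above can be sketched roughly as follows. This is an illustrative stand-in, not the MediaFilter source: the helper name looks_binary?, the sniff_len parameter, and the choice to return false on read errors are all assumptions, and the sketch does not special-case BOM-prefixed encodings.

```crystal
# Hypothetical helper illustrating a NUL-byte sniff; not the MediaFilter source.
def looks_binary?(path : String, sniff_len : Int32 = 512) : Bool
  File.open(path) do |file|
    buffer = Bytes.new(sniff_len)
    bytes_read = file.read(buffer) # may be fewer than sniff_len for small files
    # Only inspect the bytes that were actually read.
    buffer[0, bytes_read].includes?(0_u8)
  end
rescue IO::Error
  # Assumption: an unreadable file is reported as "not binary".
  false
end
```

Because only the first few hundred bytes are read, the cost per file stays small regardless of file size.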


def self.file_too_large?(file_path : String, max_size : Int32 = MAX_FILE_SIZE) : Bool

Checks whether a file is too large to process by comparing its size against max_size (defaults to MAX_FILE_SIZE).


def self.media_file?(file_path : String) : Bool

Checks whether a file should be skipped based on its extension.
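
A minimal sketch of a set-based extension check, in the spirit of MEDIA_EXTENSION_SET. The constant SKIP_EXTENSIONS and the helper skippable_extension? are hypothetical names, the list is a small illustrative subset, and the case-insensitive comparison is an assumption rather than confirmed behavior of media_file?.

```crystal
# Illustrative subset only; the full list is MEDIA_EXTENSIONS.
SKIP_EXTENSIONS = [".jpg", ".png", ".mp4", ".zip", ".pdf"].to_set

# Hypothetical stand-in for media_file?: one hash lookup per path.
def skippable_extension?(path : String) : Bool
  # File.extname returns the extension including the leading dot ("" if none).
  SKIP_EXTENSIONS.includes?(File.extname(path).downcase)
end

skippable_extension?("assets/logo.PNG") # => true
skippable_extension?("src/app.cr")      # => false
```

Converting the array to a set once, at constant-definition time, keeps each per-file check to a single hash lookup instead of a linear scan.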


def self.parse_size(str : String) : Int32

Parse size strings like "10MB", "500K", "1G" or raw bytes ("1048576")
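
One way such a parser could look is sketched below. It is a hypothetical re-implementation for illustration, not the MediaFilter source. Assumptions: units are 1024-based (consistent with the 10 * 1024 * 1024 default), suffixes are matched case-insensitively, and 0 is returned for unparsable input or values that do not fit in Int32.

```crystal
# Hypothetical sketch of a size parser; not the MediaFilter source.
def parse_size_sketch(str : String) : Int32
  match = str.strip.upcase.match(/\A(\d+)\s*(K|KB|M|MB|G|GB)?\z/)
  return 0 unless match # assumption: 0 signals "unparsable"

  number = match[1].to_i64?
  # Reject absurdly large numbers up front so the multiplication cannot overflow.
  return 0 if number.nil? || number > Int32::MAX

  multiplier = case match[2]?
               when "K", "KB" then 1024_i64
               when "M", "MB" then 1024_i64 ** 2
               when "G", "GB" then 1024_i64 ** 3
               else                1_i64 # no suffix: plain bytes
               end

  bytes = number * multiplier
  bytes > Int32::MAX ? 0 : bytes.to_i32
end

parse_size_sketch("10MB")    # => 10485760
parse_size_sketch("500K")    # => 512000
parse_size_sketch("1G")      # => 1073741824
parse_size_sketch("1048576") # => 1048576
parse_size_sketch("abc")     # => 0
```

Returning 0 for bad input pairs naturally with the `parsed > 0 ? parsed : default` guard in MAX_FILE_SIZE, although whether the real parse_size returns 0 or raises is not stated here.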


def self.should_skip_file?(file_path : String, max_size : Int32 = MAX_FILE_SIZE, info : File::Info | Nil = nil) : Bool

Combined check: returns true if the file should be skipped. Prefer skip_check on hot paths, since it returns the reason in the same call and the caller does not need to re-stat the file to log it.


def self.skip_check(file_path : String, max_size : Int32 = MAX_FILE_SIZE, info : File::Info | Nil = nil) : String | Nil

Decides whether a file should be skipped and, if so, returns the human-readable reason in a single pass, which avoids re-stat'ing the file just to compose the log message. Returns nil when the file should be processed.

When the caller has already obtained a File::Info (e.g. the detector walker stats each entry with follow_symlinks: false), it can be passed as info to skip the size stat entirely.
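
The single-pass idea with an optional, caller-supplied File::Info might look like the sketch below. Everything here is hypothetical: the names SKIPPED_EXTS and skip_check_sketch, the reason strings, the order of checks, and the choice to return nil for a path that cannot be stat'ed are assumptions, not the MediaFilter implementation.

```crystal
# Hypothetical sketch; names, messages, and check order are assumptions.
SKIPPED_EXTS = Set{".png", ".zip", ".pdf"}

def skip_check_sketch(path : String, max_size : Int32, info : File::Info? = nil) : String?
  ext = File.extname(path).downcase
  return "skipped extension (#{ext})" if SKIPPED_EXTS.includes?(ext)

  # Reuse the caller's stat result when provided; otherwise stat once here.
  info ||= File.info?(path)
  return nil unless info # assumption: an un-stat-able path is not skipped here

  size = info.size
  return "file too large (#{size} bytes > #{max_size})" if size > max_size
  nil
end
```

A directory walker that already calls `File.info(path, follow_symlinks: false)` for each entry could pass that value as `info`, so the size check costs no extra system call, and could log the returned string directly when it is not nil.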


def self.skip_reason(file_path : String, max_size : Int32 = MAX_FILE_SIZE, info : File::Info | Nil = nil) : String | Nil

Returns a human-readable reason why a file was skipped. Kept for backwards compatibility; new callers should use skip_check.

