mymodern
mymodern is a command line program written in Crystal that scrapes web pages from public library websites. It uses CSS selector rules to parse the pages and lists checked-out books and books on hold ready for pickup.
The program uses Crystal's concurrency support to handle multiple HTTP requests simultaneously. It is modular, allowing you to define modules for different library websites and customize the scraping logic for each.
Installation
To set up mymodern on your machine:
- Clone the repository.
- Navigate to the project directory.
- Run the following commands:
shards install
shards build
The library kostya/lexbor parses HTML using
CSS selectors. It wraps native C code and the compilation step for this
dependency happens automatically when you run shards install
Once installed, the library will be available locally within your project directory.
For more information about the project and how to customize it for your needs, please refer to our:
-
Detailed README for detailed installation instructions, and subsequent setup and usage notes.
-
Project Structure Documentation which outlines the specific parts you'll need to adapt or modify to work with your library's website.
Usage
~/at_the_prompt/crystal/mymodern$ bin/mymodern --help
Usage: mymodern [OPTIONS]
Scrape pages at public libraries' web sites.
-m, --mock this option mocks everything
-t, --trace trace where it's all happening
-h, --help show this message
Example program output:
Hennepin Books Out
The Astronomer
Tuesday December 04, 2018
Subterranean Twin Cities
Wednesday December 05, 2018
Renewed: 1 time
2 people waiting
Hennepin Books on Hold
A History of America in Ten Strikes
Wednesday November 14, 2018
St. Paul Books Out
St. Paul Books on Hold
The mysterious flame of Queen Loana
Saturday November 17, 2018
Development
The program calls upon identically named methods provided by one or another library's module.
module Hennepin
def self.lib_data
# returns data in the form of a NamedTuple
def self.parse_checkedout_page(page)
# return a sorted array of CheckedOutRecord
# uses already existing record and comparison definitions
def self.parse_on_hold_page(page)
# return a sorted array of OnHoldRecord
Code for another library where you're a regular patron would do the same, and parts of the program that invoke these methods iterate over an array of the module names, and don't hard code the number or names of the modules.
You can adapt the modules that define these few methods, name them Springfield and Shelbyville, and require them from a single file which also lists them in an array of module names.
require "./hennepin"
require "./stpaul"
MODULE_NAMES = [Hennepin, StPaul]
So this can be used to scrape web pages at 1, 2, or 3 libraries, and all things being equal, runs in constant time no matter how many you're doing. *
* Because concurrency.
1 web site, 1.5 times faster. All things being equal, 2 websites, could be 3 times faster, 3 web sites, 4.5 times faster. Conceivably.
Contributing
- Fork the repository (https://github.com/Traven-B/mymodern/fork)
- Create a new feature branch (
git checkout -b my-new-feature
) - Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create a new Pull Request
Contributors
- Traven-B Michael Kamb - creator, maintainer