# Croupier
Croupier is a smart task definition and execution library, which can be used for dataflow programming.
## What does it mean?
You use Croupier to define tasks. Tasks have:
- An id
- Zero or more input files or k/v store keys
- Zero or more output files or k/v store keys
- A `Proc` that consumes the inputs and returns a string

After the `Proc` returns, the data is saved to the output(s), unless the task has the `no_save` flag set to `true`, in which case it's expected to have saved it already.

Note: the return value for procs depends on several factors; see below.

Note: a reference to a k/v store key is of the form `kv://mykey`
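For example, here is a minimal sketch of a task whose output is a k/v store key rather than a file (the id, file name, and key are made up for illustration; the constructor keywords match the Usage example below):

```crystal
require "croupier"

# Reads input.txt and stores the word count under the k/v key "word_count".
# The kv:// prefix marks the output as a key in Croupier's k/v store instead of a file.
Croupier::Task.new(
  id: "count_words",
  inputs: ["input.txt"],
  output: "kv://word_count",
) do
  File.read("input.txt").split.size.to_s
end
```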
And here is the fun part:
Croupier will examine the inputs and outputs of your tasks and use them to build a dependency graph. This graph expresses the connections between your tasks and the files on disk, and between the tasks themselves, and Croupier uses that information to decide what to run.
So, suppose you have `task1`, which consumes `input.txt` and produces `fileA`, and `task2`, which takes `fileA` as input and outputs `fileB`. That means your tasks look something like this:
```mermaid
graph LR;
    id1(["📁 input.txt"])-->idt1["⚙️ task1"]-->id2(["📁 fileA"]);
    id2-->idt2["⚙️ task2"]-->id3(["📁 fileB"]);
```
Croupier guarantees the following:
- If `task1` has never run before, it will run and create `fileA`
- If `task1` has run before and `input.txt` has not changed, it will not run
- If `task1` has run before and `input.txt` has changed, it will run
- If `task1` runs, `task2` will run and create `fileB`
- `task1` will run before `task2`
That's a very long way to say: Croupier will run whatever needs running, based on the content of the dependency files and the dependencies between tasks. In this example it may look silly because it's simple, but it should work even for thousands of tasks and dependencies.
The state between runs is kept in `.croupier`, so if you delete that file all tasks will run.
Further documentation is available at the doc pages.
## Notes

### Notes about proc return types
- Procs in tasks without outputs can return `nil` or a string; the value is ignored.
- Procs with one output and `no_save == false` should return a string, which will be saved to that output. If `no_save == true`, the returned value is ignored (see the sketch below).
- Procs with multiple outputs and `no_save == false` should return an `Array(String)`, which will be saved to those outputs. If `no_save == true`, the returned value is ignored.
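As an illustration, here is a sketch of a single-output task that writes its own output. It assumes `no_save` is accepted as a constructor flag (matching the flag name above), and the file names are made up:

```crystal
require "croupier"

# With no_save: true the proc must write report.txt itself;
# whatever it returns is ignored.
Croupier::Task.new(
  output: "report.txt",
  inputs: ["data.txt"],
  no_save: true,
) do
  File.write("report.txt", File.read("data.txt").lines.size.to_s)
  nil
end
```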
### No target conflicts
If there are two or more tasks with the same output they will be merged into the first task created. The resulting task will:
- Depend on the combination of all dependencies of all merged tasks
- Run the procs of all merged tasks in order of creation
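For illustration, here is a minimal sketch of the merge behavior (the file names are hypothetical):

```crystal
require "croupier"

# Both tasks declare combined.txt, so they are merged into the first task created.
# The merged task depends on both a.txt and b.txt and runs both procs in creation order.
Croupier::Task.new(output: "combined.txt", inputs: ["a.txt"]) { "from the first proc" }
Croupier::Task.new(output: "combined.txt", inputs: ["b.txt"]) { "from the second proc" }

Croupier::TaskManager.run_tasks
```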
### Tasks without output
A task with no output will be registered under its id and is not expected
to create any output files. Other than that, it's just a regular task.
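For example, here is a sketch of an output-less task used purely for its side effect (the id and input are made up for illustration):

```crystal
require "croupier"

# No outputs: the task is registered under its id and its return value is ignored.
Croupier::Task.new(
  id: "notify",
  inputs: ["fileB"],
) do
  puts "fileB was rebuilt"
  nil
end
```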
### Tasks with multiple outputs

If a task expects the `TaskManager` to create multiple files, it should return an array of strings.
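For example, a sketch of a task that declares two outputs and returns one string per output (assuming the strings map to the outputs in order; the file names are made up):

```crystal
require "croupier"

# Two outputs, so the proc returns an Array(String) with two elements.
Croupier::Task.new(
  id: "split_case",
  inputs: ["input.txt"],
  outputs: ["upper.txt", "lower.txt"],
) do
  text = File.read("input.txt")
  [text.upcase, text.downcase]
end
```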
## Master/Subtask Tasks
Croupier supports hierarchical tasks through the master/subtask pattern. This allows you to create tasks that dynamically generate and manage other tasks at runtime.
### What are Master Tasks?
A master task is a special type of task that:
- Has `master_task: true` in its definition
- Runs on every build (typically with `always_run: true`)
- Dynamically creates, removes, or manages subtasks based on runtime conditions
- Has no outputs of its own (returns `nil`)
### When to Use the Master/Subtask Pattern
The master/subtask pattern is ideal when:
- You have a variable number of similar tasks (e.g., processing files in a directory)
- Tasks need to be created dynamically based on folder contents
- You want to avoid manually defining a task for each file
- The set of tasks changes frequently (files added/removed)
### Example: Static Site Generator
require "croupier"
# Master task watches content/ folder and creates subtasks for each markdown file
master_task = Croupier::Task.new(
id: "content_master",
inputs: ["content/"],
always_run: true,
master_task: true,
) do
current_files = Dir.glob("content/**/*.md").to_set
# Get previously created subtasks from k/v store
previous_data = Croupier::TaskManager.get("content_subtasks")
previous_files = previous_data ? previous_data.split("\n").to_set : Set(String).new
# Remove subtasks for deleted files
(previous_files - current_files).each do |deleted_file|
puts "🗑️ Removing subtask for deleted file: #{deleted_file}"
subtask_id = "render_#{Digest::SHA1.hexdigest(deleted_file)[0..6]}"
Croupier::TaskManager.tasks.each do |key, task|
Croupier::TaskManager.tasks.delete(key) if task.id == subtask_id
end
# Also remove output file
output_path = deleted_file.sub("content", "output").sub(".md", ".html")
File.delete?(output_path)
end
# Create subtasks for new/changed files
(current_files - previous_files).each do |new_file|
puts "✨ Creating subtask for new file: #{new_file}"
subtask_id = "render_#{Digest::SHA1.hexdigest(new_file)[0..6]}"
output_file = new_file.sub("content", "output").sub(".md", ".html")
subtask = Croupier::Task.new(
id: subtask_id,
inputs: [new_file],
outputs: [output_file],
) do
# Render markdown to HTML
content = File.read(new_file)
Markd.to_html(content)
end
Croupier::TaskManager.register_subtask("content_master", subtask)
end
# Save current list for next run
Croupier::TaskManager.set("content_subtasks", current_files.to_a.join("\n"))
nil # Master tasks return nil
end
# Run tasks (master task creates subtasks, then we run again to execute them)
Croupier::TaskManager.run_tasks
Croupier::TaskManager.run_tasks
### Key Methods

- `register_subtask(master_id, subtask)`: Register a newly created subtask with a master task
- `remove_subtasks(master_id)`: Remove all subtasks belonging to a master task
- `invalidate_graph_cache`: Force the task graph to rebuild (called automatically when subtasks are added/removed)
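For instance, a master task that wants to discard all of its previous subtasks at once, rather than deleting them one by one as in the example above, could do something like this sketch (assuming these methods live on `TaskManager`, as `register_subtask` does):

```crystal
# Drop every subtask previously registered under "content_master".
# The task graph is invalidated automatically and rebuilt on the next run_tasks call.
Croupier::TaskManager.remove_subtasks("content_master")
```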
### Important Notes

- **Double Execution**: When using master tasks, call `run_tasks` twice:
  - First run: the master task executes and creates/removes subtasks
  - Second run: the newly created subtasks execute

  In auto mode this is handled automatically: Croupier detects when the task graph changes and runs tasks again as needed.

- **Subtasks are Regular Tasks**: Subtasks participate in the task graph just like any other task. They can have dependencies and outputs, and are subject to incremental builds.

- **Graph Invalidation**: When `register_subtask` or `remove_subtasks` is called, the task graph is automatically invalidated and rebuilt on the next `run_tasks` call.

- **State Persistence**: Use the k/v store to persist state between runs (e.g., the list of files from the previous build).
### See It In Action

For a complete working example, see the `examples/ssg/` directory, which contains a full static site generator using the master/subtask pattern.
## Installation

- Add the dependency to your `shard.yml`:

  ```yaml
  dependencies:
    croupier:
      github: ralsina/croupier
  ```

- Run `shards install`
## Usage
This is the example described above, in actual code:
require "croupier"
Croupier::Task.new(
output: "fileA",
inputs: ["input.txt"],
) {
puts "task1 running"
File.read("input.txt").downcase
}
Croupier::Task.new(
output: "fileB",
inputs: ["fileA"],
) do
puts "task2 running"
File.read("fileA").upcase
end
Croupier::TaskManager.run_tasks
If we create an `input.txt` file with some text in it and run this program, it will print `task1 running` and `task2 running`, and produce `fileA` with that same text in lowercase and `fileB` with the text in uppercase.
The second time we run it, it will do nothing, because all task dependencies are unchanged.
If we modify `input.txt` or `fileA`, then one or both tasks will run, as needed.
## Auto Mode
Besides `run_tasks`, there is another way to run your tasks: `auto_run`. It runs tasks as needed, whenever their input files change. This allows for a sort of "continuous build", which is useful for things like web development.

You start auto mode with `TaskManager.auto_run` and stop it with `TaskManager.auto_stop`. It runs in a separate fiber, so your main fiber needs to do something else and yield; for details on that, see Crystal's docs.
This feature is still under development and may change, but here is an example of how it works, taken from the specs:
```crystal
# We create a proc that has a visible side effect
x = 0
counter = TaskProc.new { x += 1; x.to_s }

# This task depends on a file called "i" and produces "t1"
Task.new(output: "t1", inputs: ["i"], proc: counter)

# Launch in auto mode
TaskManager.auto_run

# We have to yield and/or do stuff in the main fiber
# so the auto_run fibers can run
Fiber.yield

# Trigger a build by creating the dependency
File.open("i", "w") << "foo"
Fiber.yield

# Stop the auto_run
TaskManager.auto_stop

# It should only have run once
x.should eq 1
File.exists?("t1").should eq true
```
## Development
Let's try to keep test coverage good :-)
- To run tests: `make test` or `crystal spec`
- To check coverage: `make coverage`
- To run mutation testing: `make mutation`
Other than that, anything is fair game. The `TODO.md` file has a section for things that were considered and rejected as a bad idea, but those decisions are conditional and can change when presented with a good argument.
## Contributing

- Fork it (https://github.com/ralsina/croupier/fork)
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Add some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create a new Pull Request
## Contributors
- Roberto Alsina - creator and maintainer