statistics
A statistical library to perform descriptive statistics and generate random values based on popular probability distributions.
Installation
Add the dependency to your shard.yml:
dependencies:
  statistics:
    github: lbarasti/statistics
Run shards install
Usage
require "statistics"
Descriptive statistics
You can compute mean, variance and standard deviation of a collection as follows.
include Statistics
x = [1, 10, 7]
mean(x) # 6
var(x) # 14
std(x) # 3.7416...
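The values above correspond to the population form of the variance (dividing by n rather than n - 1). A minimal sketch of that arithmetic, using only the Crystal standard library, follows; it is an illustration of the formulas and not the shard's actual implementation, and the local variable names are chosen here for clarity.

```crystal
# Plain-Crystal sketch of the arithmetic; not the shard's implementation.
x = [1, 10, 7]

n = x.size
mean = x.sum / n                             # (1 + 10 + 7) / 3 = 6.0
variance = x.sum { |v| (v - mean) ** 2 } / n # (25 + 16 + 1) / 3 = 14.0
std = Math.sqrt(variance)                    # 3.7416...

puts mean
puts variance
puts std
```

Dividing by n - 1 instead would give the sample variance (21 for this dataset), which does not match the output shown above.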
If you'd like to know a bit more about your dataset, you can simply describe it:
x = (1..1000).map { rand }.to_a # a uniformly distributed dataset
describe(x)
# {
# mean: 0.48, var: 0.08, std: 0.28,
# skewness: 0.04, kurtosis: 1.81,
# min: 0.01, middle: 0.49, max: 0.99,
# q1: 0.24, median: 0.49, q3: 0.73
# }
Statistics.describe returns a NamedTuple, so you can extract any value via indexing:
stats = describe(x)
stats[:q1] # returns the first quartile of your sample
For a complete list of the statistical functions provided, including quantile, moment and skew, check out the docs.
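To make the skewness and kurtosis figures easier to interpret, here is a rough sketch of how they can be derived from central moments. The helper central_moment is hypothetical and written for this example only; the shard's own moment and skew functions may use different signatures or conventions, so check the docs before relying on this.

```crystal
# Hypothetical helper, not part of the shard:
# k-th central moment in population form (divides by n).
def central_moment(x : Array(Int32), k : Int32) : Float64
  mean = x.sum / x.size
  x.sum { |v| (v - mean) ** k } / x.size
end

x = [1, 10, 7]
m2 = central_moment(x, 2) # 14.0, the variance
m3 = central_moment(x, 3) # -20.0
m4 = central_moment(x, 4) # 294.0

skewness = m3 / m2 ** 1.5 # third standardized moment, roughly -0.38
kurtosis = m4 / m2 ** 2   # fourth standardized moment, 1.5

puts skewness
puts kurtosis
```

Under this (non-excess) definition a uniform distribution has a theoretical kurtosis of 1.8, which is consistent with the 1.81 reported in the sample describe output above.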
Sampling
To work with distributions, import the Distributions namespace as follows.
include Statistics::Distributions
Now, here is how we sample values from a normal distribution with mean = 1.5 and std = 0.2.
Normal.new(1.5, 0.2).rand
We can generate an iterator of normally distributed random values as follows.
gen = Normal.new(1.5, 0.2)
1000.times.map { gen.rand }
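For readers curious about what sampling from a normal distribution involves, the sketch below draws values with the Box-Muller transform using only the Crystal standard library. This is one common technique and an assumption on our part; the shard may well use a different algorithm internally. The helper normal_sample is hypothetical and not part of the shard.

```crystal
# Hypothetical helper, not part of the shard:
# one normal draw via the Box-Muller transform.
def normal_sample(mean : Float64, std : Float64, rng = Random::DEFAULT) : Float64
  u1 = 1.0 - rng.rand # in (0, 1], so Math.log(u1) is finite
  u2 = rng.rand       # in [0, 1)
  z = Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  mean + std * z # shift and scale the standard normal draw
end

samples = Array.new(10_000) { normal_sample(1.5, 0.2) }

# With this many draws, the sample statistics should land close to 1.5 and 0.2.
sample_mean = samples.sum / samples.size
sample_std = Math.sqrt(samples.sum { |v| (v - sample_mean) ** 2 } / samples.size)

puts sample_mean
puts sample_std
```

The same sanity check can be applied to values drawn with Normal.new(1.5, 0.2).rand: the mean and standard deviation of a large sample should approach the parameters passed to the constructor.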
Supported distributions
The following distributions are supported:
- Constant
- Exponential
- Normal
- Poisson
- Uniform
Don't see your favourite one on the list? Just fork the repo, add your distribution to the distributions.cr file, and open a PR.
Development
This shard is a work in progress. Everyone's contribution is welcome.
The guiding principle at this stage is "make it work before you make it right", which in this context means: let's not focus on benchmarks and performance, but rather on usability and correctness.
References
- numpy.random: distributions and random sampling
- numpy statistics: order statistics, averages and variances
- scipy stats module and related tests
- julia random module
- julia statistics module
- julia distributions package
- on skewness and kurtosis, by Stan Brown
- more on skewness and kurtosis, from NIST
Contributing
- Fork it (https://github.com/lbarasti/statistics/fork)
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Add some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
Contributors
- lbarasti - creator and maintainer