module Crysda
Overview
`CrysDA` is a **{Crys}**tal shard for **{D}**ata **{A}**nalysis. It provides a modern, functional-style API for data manipulation: filtering, transforming, aggregating, and reshaping tabular data. The core of the library is `Crysda::DataFrame`, an immutable data structure interface.
Features
- [X] Filter, transform, aggregate and reshape tabular data
- [X] Modern, user-friendly and easy-to-learn data-science API
- [X] Reads plain and compressed TSV, CSV, JSON, or any delimited format, with or without a header, from local or remote sources
- [X] Supports grouped operations
- [X] Supports reading data from DB
- [X] Tables can contain atomic columns (Number, Float, Bool, String) as well as object columns
- [X] Reshape tables from wide to long and back
- [X] Table joins (left, right, semi, inner, outer)
- [X] Cross tabulation
- [X] Descriptive statistics (mean, min, max, median, ...)
- [X] Functional API inspired by dplyr, pandas
- [X] Many more...
Defined in:
crysda.cr
crysda/builder.cr
crysda/columns.cr
crysda/context.cr
crysda/dataframe.cr
crysda/groupdf.cr
crysda/joins.cr
crysda/reshape.cr
crysda/select.cr
crysda/simpledf.cr
crysda/utils.cr
Constant Summary
- MISSING_VALUE = "NA"
- PRINT_MAX_DIGITS = 3
- PRINT_MAX_ROWS = 10
- PRINT_MAX_WIDTH = 100
- PRINT_ROW_NUMBERS = true
- VERSION = {{ (`shards version \"/srv/crystaldoc.info/github-naqvis-CrysDA-v0.1.3/src\"`).chomp.stringify }}
Class Method Summary
- .bind_cols(left : DataFrame, right : DataFrame, rename_duplicates = true) : DataFrame
  Binds data frames by column.
- .bind_rows(*dfs : DataFrame) : DataFrame
  Adds new rows.
- .column_types(df : DataFrame) : Array(ColSpec)
  Returns the column types as an array of `ColSpec` structs.
- .dataframe_of(rows : Iterable(Hash(String, Any)))
  Creates a new data frame from an array of `{} of String => Any` hashes.
- .dataframe_of(rows : Iterable(DataFrameRow))
  Creates a new data frame from an array of `DataFrameRow`.
- .dataframe_of(cols : Iterable(DataCol))
  Creates a data frame from an array of `DataCol`.
- .dataframe_of(*rows : Hash(String, Any))
  Creates a new data frame from `{} of String => Any` hashes.
- .dataframe_of(*header : String)
  Creates a new data frame in place.
- .dataframe_of(*rows : DataFrameRow)
  Creates a new data frame from records encoded as key-value maps. Column types are inferred from the value types.
- .dataframe_of(*cols : DataCol)
  Creates a new data frame from a list of `DataCol` instances.
- .empty_df
  Creates an empty data frame with 0 observations.
- .from(resultset : DB::ResultSet)
  Builds a data frame from a `DB::ResultSet`.
- .from_json(json : String)
  Builds a data frame from a JSON string.
- .read_csv(file : String | IO, separator : Char = ',', quote_char : Char = '"', skip_blank_lines : Bool = true, skip : Int32 = 0, comment : Char | Nil = '#', header : Int32 | Nil = 0, na_value : String = MISSING_VALUE, true_values = ["T", "TRUE"], false_values = ["F", "FALSE"])
  Reads a comma-separated values file or IO into a data frame.
- .read_json(file : String | IO)
  Reads a JSON file or URL.
- .selector(&block : ColumnSelector)
  Helper method that returns the block as a `Proc`.
Class Method Detail
.bind_cols(left : DataFrame, right : DataFrame, rename_duplicates = true) : DataFrame
Binds data frames by column. Rows are matched by position, so all data frames must have the same number of rows.
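To make the positional matching concrete, here is a minimal sketch in plain Crystal (standard library only, not Crysda's implementation). It models a table as a hypothetical `Table` alias mapping column names to string arrays, and `bind_cols_sketch` is a hypothetical helper. The `_1`, `_2` suffix it applies when `rename_duplicates` is on is an assumed scheme, not necessarily the one Crysda uses.

```crystal
# Hypothetical column-oriented table: column name => cell values.
# Crystal's Hash preserves insertion order, so column order is kept.
alias Table = Hash(String, Array(String))

def row_count(t : Table) : Int32
  t.empty? ? 0 : t.first_value.size
end

# Sketch of positional column binding: row i of the result is row i of
# `left` followed by row i of `right`, so row counts must match.
# The "_1", "_2" suffix for duplicate names is a hypothetical scheme.
def bind_cols_sketch(left : Table, right : Table, rename_duplicates = true) : Table
  if row_count(left) != row_count(right)
    raise ArgumentError.new("row counts differ: #{row_count(left)} vs #{row_count(right)}")
  end

  result = left.dup
  right.each do |name, values|
    key = name
    if result.has_key?(key)
      raise ArgumentError.new("duplicate column: #{name}") unless rename_duplicates
      suffix = 1
      while result.has_key?("#{name}_#{suffix}")
        suffix += 1
      end
      key = "#{name}_#{suffix}"
    end
    result[key] = values
  end
  result
end

left = {"id" => ["1", "2"], "city" => ["london", "chicago"]}
right = {"id" => ["10", "20"]}
p bind_cols_sketch(left, right)
```

Checking the row counts up front mirrors the documented requirement; without it, a shorter table would silently misalign rows.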
.bind_rows(*dfs : DataFrame) : DataFrame
Adds new rows. Columns are matched by name, and the output contains a column if that column appears in any of the inputs; where an input lacks a column, its entries are set to nil (NA). Grouping is discarded when binding rows.
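The row-binding rules (match by name, take the union of columns, fill gaps with nil) can be sketched in plain Crystal as follows. `Table`, `row_count`, and `bind_rows_sketch` are hypothetical names, grouping is not modeled, and Crysda's real implementation may differ.

```crystal
# Hypothetical column-oriented table; nil marks a missing value (NA).
alias Table = Hash(String, Array(String?))

def row_count(t : Table) : Int32
  t.empty? ? 0 : t.first_value.size
end

# Sketch of row binding: the result has the union of all column names
# (in first-seen order); cells for columns absent from an input are nil.
def bind_rows_sketch(*tables : Table) : Table
  names = [] of String
  tables.each { |t| t.each_key { |k| names << k unless names.includes?(k) } }

  result = Table.new
  names.each { |n| result[n] = [] of String? }
  tables.each do |t|
    n = row_count(t)
    names.each do |name|
      if col = t[name]?
        result[name].concat(col)
      else
        n.times { result[name] << nil }
      end
    end
  end
  result
end

a = {"quarter" => ["1", "2"] of String?, "sales" => ["300.01", "290"] of String?}
b = {"quarter" => ["3"] of String?, "location" => ["london"] of String?}
merged = bind_rows_sketch(a, b)
p merged
```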
.column_types(df : DataFrame) : Array(ColSpec)
Returns the column types as an array of `ColSpec` structs.
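.dataframe_of(rows : Iterable(Hash(String, Any)))
Creates a new data frame from an array of `{} of String => Any` hashes.
.dataframe_of(rows : Iterable(DataFrameRow))
Creates a new data frame from an array of `DataFrameRow`.
.dataframe_of(cols : Iterable(DataCol))
Creates a data frame from an array of `DataCol`.
.dataframe_of(*rows : Hash(String, Any))
Creates a new data frame from `{} of String => Any` hashes.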
.dataframe_of(*header : String)
Creates a new data frame in place.
- `header`: the column names, passed as variadic arguments.
- Call `values` on the result to supply the cell values, listed row by row.
df = Crysda.dataframe_of("quarter", "sales", "location").values(1, 300.01, "london", 2, 290, "chicago")
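Because `values` receives one flat list, the cells are read in row-major order: with three headers, the six values in the example form two rows. Below is a minimal stdlib-only sketch of that distribution using a hypothetical `columns_from_values` helper and a hypothetical `Cell` union type; rejecting a value count that does not fill whole rows is an assumption of this sketch, not documented Crysda behavior.

```crystal
# Hypothetical cell type covering the values in the example above.
alias Cell = Int32 | Float64 | String

# Sketch: distribute a flat, row-major list of values over the header.
def columns_from_values(header : Array(String), values : Array(Cell)) : Hash(String, Array(Cell))
  raise ArgumentError.new("empty header") if header.empty?
  unless values.size % header.size == 0
    raise ArgumentError.new("#{values.size} values do not fill #{header.size} columns")
  end

  cols = Hash(String, Array(Cell)).new
  header.each { |h| cols[h] = [] of Cell }
  # Each slice of header.size values is one row; value i goes to column i.
  values.each_slice(header.size) do |row|
    header.each_with_index { |h, i| cols[h] << row[i] }
  end
  cols
end

cols = columns_from_values(["quarter", "sales", "location"],
  [1, 300.01, "london", 2, 290, "chicago"] of Cell)
p cols
```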
.dataframe_of(*rows : DataFrameRow)
Creates a new data frame from records encoded as key-value maps. Column types are inferred from the value types.
.dataframe_of(*cols : DataCol)
Creates a new data frame from a list of `DataCol` instances.
.read_csv(file : String | IO, separator : Char = ',', quote_char : Char = '"', skip_blank_lines : Bool = true, skip : Int32 = 0, comment : Char | Nil = '#', header : Int32 | Nil = 0, na_value : String = MISSING_VALUE, true_values = ["T", "TRUE"], false_values = ["F", "FALSE"])
Reads a comma-separated values file or IO into a data frame.
- `file`: a local file path or a URL. Compressed (gz, gzip) files are also read.
- `separator`: defaults to `,`; change it for other formats (e.g. `\t` for tab-separated files).
- `skip_blank_lines`: defaults to `true`; skips all blank lines.
- `skip`: defaults to `0`; the number of lines to skip at the start of the file.
- `comment`: defaults to `#`; lines starting with this character are ignored.
- `header`: the header line, defaulting to `0` (the first row). If set to `nil`, column names are generated automatically, starting with `Col1`. If `skip_blank_lines` and `comment` are enabled, the header line is located after blank and comment lines have been removed.
- `na_value`: defaults to `NA`; values matching this string are treated as `nil`.
- `true_values`: defaults to `["T", "TRUE"]`; values to treat as boolean `true`.
- `false_values`: defaults to `["F", "FALSE"]`; values to treat as boolean `false`.
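The sketch below applies these parameter rules to an in-memory string with Crystal's standard `CSV.parse`. It is not Crysda's reader: `read_csv_sketch`, `convert_cell`, and `Cell` are hypothetical, it does not fetch URLs or decompress files, numbers stay strings, and filtering line by line would break quoted fields that span lines. The order in which `skip` and the blank/comment filters apply, and dropping rows above the header, are assumptions.

```crystal
require "csv"

# Cells produced by this sketch: strings, booleans, or nil for NA.
alias Cell = String | Bool | Nil

# Maps one raw field to a cell using the na/true/false rules.
def convert_cell(raw : String, na_value : String,
                 true_values : Array(String), false_values : Array(String)) : Cell
  return nil if raw == na_value
  return true if true_values.includes?(raw)
  return false if false_values.includes?(raw)
  raw
end

def read_csv_sketch(text : String, separator : Char = ',', skip : Int32 = 0,
                    skip_blank_lines : Bool = true, comment : Char? = '#',
                    header : Int32? = 0, na_value : String = "NA",
                    true_values = ["T", "TRUE"], false_values = ["F", "FALSE"]) : Hash(String, Array(Cell))
  # Assumed order: drop the first `skip` lines, then blank and comment lines.
  lines = text.lines.skip(skip)
  lines = lines.reject(&.strip.empty?) if skip_blank_lines
  if c = comment
    lines = lines.reject(&.starts_with?(c))
  end
  rows = CSV.parse(lines.join("\n"), separator)

  if h = header
    names = rows[h]
    rows = rows[(h + 1)..]
  else
    # No header: generate Col1, Col2, ... from the first row's width.
    width = rows.first?.try(&.size) || 0
    names = (1..width).map { |i| "Col#{i}" }
  end

  cols = Hash(String, Array(Cell)).new
  names.each { |n| cols[n] = [] of Cell }
  rows.each do |row|
    names.each_with_index do |n, i|
      cols[n] << convert_cell(row[i]? || na_value, na_value, true_values, false_values)
    end
  end
  cols
end

data = "# sales export\nquarter,sales,flag\n\n1,300.01,T\n2,NA,FALSE\n"
df = read_csv_sketch(data)
p df
```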
.selector(&block : ColumnSelector)
Helper method that returns the block as a `Proc`; useful when calling `select` with multiple criteria.
It is a workaround: Crystal methods cannot take more than one block, and creating the `Proc` directly requires the full form `Crysda::ColumnSelector.new { |e| ... }`.
So instead of
df.select(
  Crysda::ColumnSelector.new { |s| ... },
  Crysda::ColumnSelector.new { |s| ... }
)
you can use this helper:
df.select(
  Crysda.selector { |e| ... },
  Crysda.selector { |e| ... }
)
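The pattern behind `selector` is a method that captures its block and returns it as a `Proc`, so several criteria can be passed as ordinary arguments. Below is a self-contained sketch with hypothetical names: `NameSelector` stands in for `Crysda::ColumnSelector` (whose real signature differs), and `select_columns` keeps a column when any criterion matches. That "any" rule is an assumption made only for illustration.

```crystal
# Hypothetical stand-in for Crysda::ColumnSelector:
# a Proc deciding whether a column name is selected.
alias NameSelector = Proc(String, Bool)

# Captures the block and returns it as a Proc (the helper pattern).
def selector(&block : String -> Bool) : NameSelector
  block
end

# Takes several criteria as Procs, since a method accepts only one block.
# Assumption for this sketch: a column is kept if any criterion matches.
def select_columns(names : Array(String), *criteria : NameSelector) : Array(String)
  names.select { |n| criteria.any?(&.call(n)) }
end

cols = ["quarter", "sales", "sales_tax", "location"]
picked = select_columns(cols,
  selector { |n| n.starts_with?("sales") },
  selector { |n| n == "location" })
p picked
```

Typing the captured block in the helper's signature lets the caller's block parameter be inferred, which is what saves writing the full `Proc` constructor at each call site.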