chem.cr

Crystal CI Version License


A modern library written in [Crystal]1 primarily designed for manipulating molecular files created by computational chemistry programs. It aims to be both fast and easy to use.

NOTE: PSIQUE documentation can be found here

IMPORTANT: this library is in alpha stage, meaning that there is missing functionality, documentation, etc. and there will be breaking changes.

Features | Installation | Usage | Tools | Benchmarks | Roadmap | Testing | Similar software | Contributing | Contributors | License

Features

Installation

First check that the Crystal compiler is installed correctly:

$ crystal --version
Crystal 0.35.1 [5999ae29b] (2020-06-19)

LLVM: 8.0.0
Default target: x86_64-unknown-linux-gnu

If the command fails, you need to install the crystal compiler by following [these steps]3.

Using Shards

Crystal handles dependencies by listing the packages on a shard.yml file in a Crystal project. First, create a new project:

$ crystal init app myapp
    create  myapp/.gitignore
    create  myapp/.editorconfig
    create  myapp/LICENSE
    create  myapp/README.md
    create  myapp/.travis.yml
    create  myapp/shard.yml
    create  myapp/src/myapp.cr
    create  myapp/src/myapp/version.cr
    create  myapp/spec/spec_helper.cr
    create  myapp/spec/myapp_spec.cr
Initialized empty Git repository in /home/crystal/myapp/.git/
$ cd myapp

Read more about projects at the [Crystal guides]4. Add this to your application's shard.yml:

dependencies:
  chem:
    github: franciscoadasme/chem.cr
    version: ~> 0.1

Then, resolve and install missing dependencies:

shards install

Dependencies are installed into the lib folder. More about dependencies at the [Shards guides]5.

Manual download

Either clone the repository or download a copy of the chem module to your local machine:

tag=$(curl -s https://api.github.com/repos/franciscoadasme/chem.cr/releases/latest | grep "tag_name" | cut -d\" -f4)
wget https://github.com/franciscoadasme/chem.cr/archive/$tag.tar.gz -O chem.cr-${tag#v}.tar.gz
tar xvf chem.cr-${tag#v}.tar.gz

This will download the latest version of the chem module to a folder named chem.cr-X.Y.Z, where X.Y.Z stands for the current version.

Requirements

Usage

First require the chem module. You must have used the shards command to install the dependencies:

require "chem"

Refer to the [Crystal guides]6 for more information about how to require other files.

Either way, write the following to avoid typing the Chem:: prefix:

include Chem

Let's first read a structure:

structure = Structure.read "/path/to/file.pdb"
structure # => <Structure "1cbn": 644 atoms, 47 residues, periodic>

You can also use a custom read method that accepts specific options:

Structure.from_pdb "/path/to/file.pdb"
Structure.from_pdb "/path/to/file.pdb", chains: ['A'] # read only chain A
Structure.from_pdb "/path/to/file.pdb", het: false # skip HET atoms
Structure.from_pdb "/path/to/file.pdb", alt_loc: 'A' # select alternate location A

You can access PDB header information via the #experiment property:

if expt = structure.experiment # checks if experiment data is present
  expt.title           # => "ATOMIC RESOLUTION (0.83 ANGSTROMS) CRYSTAL STRUCTURE..."
  expt.kind            # => XRayDiffraction
  expt.resolution      # => 0.83
  expt.deposition_date # => 1991-10-11
  ...
end

You can also read many structures at once:

# read all models
Array(Structure).from_pdb "/path/to/file.pdb"
# read 2th and 5th models
Array(Structure).from_pdb "/path/to/file.pdb", indexes: [1, 4]

Alternatively, you could use an IO iterator to read one by one:

PDB::Reader.new("/path/to/file.pdb").each { |s| ... }
PDB::Reader.new("/path/to/file.pdb").each(indexes: [1, 4]) { |s| ... }

Topology access

You can access topology objects using the bracket syntax (like a hash or associative array or dictionary):

structure['A'] # => <Chain A>
structure['A'][10] # => <Residue A:ARG10>
structure['A'][10]["CA"] # => <Atom A:ARG10:CA(146)>

Alternatively, you can use the #dig and #dig? methods:

structure.dig 'A' # => <Chain A>
structure.dig 'A', 10 # => <Residue A:ARG10>
structure.dig 'A', 10, "CA" # => <Atom A:ARG10:CA(146)>

structure.dig 'A', 10, "CJ" # causes an error because "CJ" doesn't exist
structure.dig? 'A', 10, "CJ" # => nil

Each topology object have several modifiable properties:

atom = structure.dig 'A', 10, "CA"
atom.element.name # => "Carbon"
atom.coords # => [8.47 4.577 8.764]
atom.occupancy # => 1.0
atom.bonded_atoms.map &.name # => ["N", "C", "HA", "CB"]

Thanks to Crystal's powerful standard library, manipulating topology objects is very easy:

# ramachandran angles
structure.residues.map { |r| {r.phi, r.psi} } # => [{129.5, 90.1}, ...]
# renumber residues starting from 1
structure.residues.each_with_index { |res, i| res.number = i + 1 }
# constrain Z-axis
structure.atoms.each &.constraint=(:z)
# total charge
structure.atoms.sum_of &.partial_charge
# iterate over secondary structure elements
structure.residues.chunk(&.sec).each do |sec, residues|
  sec # => HelixAlpha
  residues # => [<Residue A:ARG1>, <Residue A:LEU2>, ...]
end

Here #residues and #atoms return an array of Residue and Atom instances, respectively. Collections also provide iterator-based access, e.g., #each_atom, that avoids expensive memory allocations:

structure.atoms.any? &.constraint # array allocation to just check a condition
structure.each_atom.any? &.constraint # faster!

Atom selection

Right now, there is no custom language to select a subset of atoms. However, thanks to Crystal, one can achieve a similar result with an intuitive syntax:

structure.atoms.select { |atom| atom.partial_charge > 0 }
# or
structure.atoms.select &.partial_charge.>(0)
# compared to a custom language
structure.atoms.select "partial_charge > 0"

# select atoms within a cylinder of radius = 4 A and centered at the origin
structure.atoms.select { |atom| atom.x**2 + atom.y**2 < 4 }
# compared to a custom language
structure.atoms.select "sqrt(x) + sqrt(y) < 4" # or "x**2 + y**2 < 4"

One advantage to using Crystal itself is that it provides type-safety: doing something like atom.name**2 will result in a compilation error, whereas using a custom language will probably produce a confusing error during runtime. Additionally, the code block can be as big and complex as necessary with multiple intermediary computations. Furthermore, a negative condition may be confusing and not be trivial to write, but in Crystal you would simply use #reject instead.

Finally, the above also works for chain and residue collections:

# select protein chains
structure.chains.select &.each_residue.any?(&.protein?)
# select only solvent residues
structure.residues.select &.solvent?
# select residues with any atom within 5 A of the first CA atom
# (this is equivalent to "same residue as" or "fillres" in other libraries)
ca = structure.dig 'A', 1, "CA"
structure.residues.select do |res|
  res.each_atom.any? { |atom| Spatial.distance(atom, ca) < 5 }
end
# or
structure.atoms.select { |atom| Spatial.distance(atom, ca) < 5 }.residues

Coordinates manipulation

All coordinates manipulation is done using a CoordinatesProxy instance, available for any atom collection (i.e., structure, chain or residue) via #coords:

# geometric center
structure.coords.center
# center at origin
structure.coords.translate! -structure.coords.center
# wraps atoms into the primary unit cell
structure.coords.wrap
...

Benchmarks

chem.cr is implemented in pure Crystal, making it as fast or even faster than some C-powered packages.

The benchmark is designed as follows:

IMPORTANT: direct comparison of parsing times should be taken with a grain of salt because each library does something slightly different, e.g., error checking. Some of this functionality is listed below. Nonetheless, these results gives an overall picture in terms of the expected performance.

| | Biopython | chem.cr | Chemfiles | MDAnalysis | MDTraj | schrodinger | VMD | | -------------------- | --------: | ------: | --------: | ---------: | -----: | ----------: | ----: | | Parse 1CRN [ms] | 6.521 | 1.028 | 1.668 | 5.059 | 11.923 | 45.497 | 2.285 | | Parse 3JYV [s] | 0.837 | 0.086 | 0.199 | 0.404 | 1.490 | 0.766 | 0.162 | | Parse 1HTQ [s] | 16.146 | 1.673 | 2.540 | 1.387 | 18.969 | 11.997 | 0.236 | | Count [ms] | 0.210 | 0.009 | 0.322 | 0.041 | 0.079 | 25.997 | 0.165 | | Distance [ms] | 0.172 | 0.000 | 1.016 | 0.382 | 0.990 | 43.101 | 0.379 | | Ramachandran [ms] | 110.450 | 0.607 | - | 690.201 | 4.947 | 68.758 | 1.814 | | | | | | | | | | | License | Biopython | MIT | BSD | GPLv2 | LGPL | Proprietary | VMD | | Parse Header | yes | yes | yes | no | no | no | no | | Parse CONECT | no | yes | yes | no | yes | yes | yes | | Guess bonds | no | no | yes | no | yes | yes | yes | | Hybrid36 | no | yes | no | yes | no | no | no | | Hierarchical parsing | yes | yes | no | no | no | no | no | | Supports disorder | yes | yes | no | no | yes | yes | no |

Latest update: 2019-11-10

Scripts and details are provided at pdb-bench.

Roadmap

Topology manipulation

Input and Output

File formats

Analysis

Other

Testing

Run the tests with crystal spec.

Similar software

There are several libraries providing similar functionality. Here we list some of them and provide one or more reasons for why we didn't use them. However, each one of them is good in their own right so please do check them out if chem.cr does not work for you.

Contributing

  1. Fork it (https://github.com/franciscoadasme/chem.cr/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

License

Licensed under the MIT license, see the separate LICENSE file.