chem.cr

Crystal CI Version License

Features | Installation | Usage | Benchmark | Roadmap | Similar software | Testing | Contributing | Contributors | License

A modern library written in [Crystal]1 for manipulating molecular files used in computational chemistry and biology. It aims to be both fast and easy to use.

[!NOTE] PSIQUE was moved to the psique repository.

Features

[!IMPORTANT] This library is in alpha stage, meaning that there is missing functionality, documentation, etc. and there will be breaking changes.

Installation

Ensure the Crystal compiler is installed:

$ crystal -v
Crystal 1.13.1 [0cef61e51] (2024-07-12)

LLVM: 18.1.6
Default target: x86_64-unknown-linux-gnu

If the command fails, you need to install the crystal compiler by following [these steps]3.

Crystal requires listing the dependencies in the shard.yml file. Let's create a new project:

$ crystal init app myapp
    create  myapp/.gitignore
    create  myapp/.editorconfig
    create  myapp/LICENSE
    create  myapp/README.md
    create  myapp/.travis.yml
    create  myapp/shard.yml
    create  myapp/src/myapp.cr
    create  myapp/src/myapp/version.cr
    create  myapp/spec/spec_helper.cr
    create  myapp/spec/myapp_spec.cr
Initialized empty Git repository in /home/crystal/myapp/.git/
$ cd myapp

Add the following to the application's shard.yml:

dependencies:
  chem:
    github: franciscoadasme/chem.cr

Then, resolve and install missing dependencies:

$ shards install

Dependencies are installed into the lib folder. More about dependencies at the [Requiring files]4 guide.

Requirements

Usage

First require the chem module:

require "chem"

Let's first read a structure:

struc = Chem::Structure.read "file.pdb"
struc # => <Structure "1cbn": 644 atoms, 47 residues, periodic>

You can also use a custom .from_* method to specify reading options when available:

Chem::Structure.from_pdb "/path/to/file.pdb"
Chem::Structure.from_pdb "/path/to/file.pdb", chains: ['A'] # reads only chain A
Chem::Structure.from_pdb "/path/to/file.pdb", het: false    # skips HET atoms
Chem::Structure.from_pdb "/path/to/file.pdb", alt_loc: 'A'  # selects alternate location A

You can access PDB header information via the #experiment property:

if expt = struc.experiment # checks if experiment data is present
  expt.title               # => "ATOMIC RESOLUTION (0.83 ANGSTROMS) CRYSTAL STRUCTURE..."
  expt.kind                # => XRayDiffraction
  expt.resolution          # => 0.83
  expt.deposition_date     # => 1991-10-11
end

You can also read many structures at once:

# read all models
Array(Chem::Structure).from_pdb "/path/to/file.pdb"
# read 2th and 5th models
Array(Chem::Structure).from_pdb "/path/to/file.pdb", indexes: [1, 4]

Alternatively, you could use the reader class directly to read one by one via the #each method:

Chem::PDB::Reader.new("/path/to/file.pdb").each { |struc| ... }
Chem::PDB::Reader.new("/path/to/file.pdb").each(indexes: [1, 4]) { |struc| ... }

Topology access

You can access topology objects using the #dig methods:

struc.dig('A')           # => <Chain A>
struc.dig('A', 10)       # => <Residue A:ARG10>
struc.dig('A', 10, "CA") # => <Atom A:ARG10:CA(146)>

struc.dig 'A', 10, "CJ"  # raises a KeyError because "CJ" doesn't exist
struc.dig? 'A', 10, "CJ" # => nil

Each topology object have several modifiable properties:

atom = struc.dig 'A', 10, "CA"
atom.element.name            # => "Carbon"
atom.pos                     # => [8.47 4.577 8.764]
atom.occupancy               # => 1.0
atom.bonded_atoms.map &.name # => ["N", "C", "HA", "CB"]

atom.residue.number   # => 10
atom.residue.protein? # => true
atom.residue.pred     # => <Residue A:PHE9>

atom.chain.id            # => 'A'
atom.chain.residues.size # => 152

Thanks to Crystal's powerful standard library, manipulating topology objects is very easy:

# select chains longer than 50 residues
struc.chains.select { |chain| chain.residues.size > 50 }
# ramachandran angles
struc.residues.map { |residue| {residue.phi, residue.psi} } # => [{129.5, 90.1}, ...]
# renumber residues starting from 1
struc.residues.each_with_index { |residue, i| residue.number = i + 1 }
# constrain Z-axis
struc.atoms.each &.constraint=(:z)
# total partial charge
struc.atoms.sum_of &.partial_charge
# iterate over secondary structure elements
struc.residues.chunk(&.sec).each do |sec, residues|
  sec      # => HelixAlpha
  residues # => [<Residue A:ARG1>, <Residue A:LEU2>, ...]
end

The #chains, #residues, and #atoms methods return an array view of Chain, Residue and Atom instances, respectively. Refer to the Enumerable and Indexable modules for more information about available methods.

Atom selection

Unlike most other libraries for computational chemistry, there is no text-based language to select a subset of atoms (for now). However, one can achieve a rather similar experience using Crystal's own syntax:

struc.atoms.select { |atom| atom.partial_charge > 0 }
# or (Crystal's short block syntax)
struc.atoms.select &.partial_charge.>(0)
# compared to a custom language
struc.atoms.select "partial_charge > 0"

# select atoms within a cylinder of radius = 4 A along the Z axis and centered at the origin
struc.atoms.select { |atom| atom.x**2 + atom.y**2 < 4 }
# compared to a custom language
struc.atoms.select "sqrt(x) + sqrt(y) < 4" # or "x**2 + y**2 < 4"

Using Crystal itself for selection provides one big advantage: type-safety. Doing something like atom.name**2 will result in an error during compilation, pointing exactly the error's location. Instead, using a custom language will produce an error during runtime at some point during execution, where the message may be useful or not depending on the type of error.

Additionally, the code block can be as big and complex as necessary with multiple intermediary computations. Furthermore, a negative condition may be confusing and not be trivial to write, but in Crystal you would simply use the #reject method instead. The same syntax can also be used for counting, grouping, etc. via the standard library.

Thanks to the topology hierarchical access, the above also works for chains and residues:

# select protein chains
struc.chains.select &.residue.any?(&.protein?)
# select only solvent residues
struc.residues.select &.solvent?
# select residues with its CA atom within 5 A of the first CA atom (this is equivalent to "same residue as" or "fillres" in other libraries)
ca = struc.dig('A', 1, "CA")
struc.residues.select { |residue| residue.dig("CA").pos.distance ca.pos < 5 }

This may improve performance drastically when selecting chains/residues as it only requires traversing the chains/residues, which will be significantly smaller than the number of atoms. Most libraries do not offer such functionality, and one often needs to resort to select unique atoms within the desired residues.

Coordinates manipulation

All coordinates manipulation is done using a Positions3Proxy instance, available for any topology object containing atoms (i.e. structure, chain, and residue) via the #pos method:

# geometric center
struc.pos.center
# center at origin
struc.pos.center_at_origin
# wraps atoms into the primary unit cell
struc.pos.wrap
# rotate about an axis
struc.pos.rotate Chem::Spatial::Vec3[1, 2, 3], 90.degrees
# align coordinates to a reference structure
struc.pos.align_to ref_struc

Benchmark

chem.cr is implemented in pure Crystal, making it as fast or even faster than some C-powered packages.

There is a benchmark at the pdb-bench repository that compares chem.cr with popular software for computational chemistry. The benchmark includes the following tests:

Overall, chem.cr (orange) comes first in most tests, sometimes over two orders of magnitude faster than the tested software. Otherwise, it is slightly slower than the faster software, even compared to C/C++ code.

Parsing a large PDB file like 1HTQ seems to be slow in chem.cr, but the implementation may drastically differ between software (e.g. error checking, implicit bond guessing). Please refer to the table at the benchmark results for a detailed comparison.

Roadmap

Topology manipulation

Input and Output

File formats

Analysis

Other

Similar software

There are several libraries providing similar functionality. Here we list some of them and provide one or more reasons for why we didn't use them. However, each one of them is good in their own right, so please do check them out if chem.cr does not work for you.

Testing

Run the tests with the crystal spec command. Some tests work by compiling small programs, which may be slower to run. These may be skipped by running:

$ crystal spec --tag='~codegen'

Contributing

  1. Fork it (https://github.com/franciscoadasme/chem.cr/fork)
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Add some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create a new Pull Request

Contributors

License

Licensed under the MIT license, see the separate LICENSE file.