Models API

Embedder

The main SIMBA model for embedding MS/MS spectra.

class simba.core.models.transformers.embedder.Embedder(d_model, n_layers, dropout=0.1, weights=None, lr=None, use_element_wise=True, use_cosine_distance=True, use_adduct=False, categorical_adducts=False, adduct_mass_map='', use_ce=False, use_ion_activation=False, use_ion_method=False)[source]

Bases: LightningModule

Embeds spectra. The model receives pairs of molecules and is trained to predict the similarity within each pair.

__init__(d_model, n_layers, dropout=0.1, weights=None, lr=None, use_element_wise=True, use_cosine_distance=True, use_adduct=False, categorical_adducts=False, adduct_mass_map='', use_ce=False, use_ion_activation=False, use_ion_method=False)[source]

Initialize the Embedder.

normalized_dot_product(a, b)[source]
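
No description is given for normalized_dot_product. Going by its name and the use_cosine_distance option, it plausibly computes a cosine-style similarity, that is, a dot product divided by the product of the two vector norms. The following is a minimal sketch of that idea on plain Python lists, not the SIMBA implementation; the function name cosine_similarity_sketch and the eps guard are assumptions made for illustration.

```python
import math


def cosine_similarity_sketch(a, b, eps=1e-12):
    """Dot product of a and b divided by the product of their L2 norms.

    Hypothetical helper: the real method may operate on batched tensors
    and handle zero vectors differently.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps avoids a division by zero when either vector is all zeros.
    return dot / max(norm_a * norm_b, eps)


same_direction = cosine_similarity_sketch([3.0, 4.0], [3.0, 4.0])
orthogonal = cosine_similarity_sketch([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, regardless of their lengths.
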
forward(batch)[source]

The inference pass

step(batch, batch_idx, threshold=0.5)[source]

A training/validation/inference step.

training_step(batch, batch_idx)[source]

A training step

validation_step(batch, batch_idx)[source]

A validation step

predict_step(batch, batch_idx)[source]

A predict step

configure_optimizers()[source]

Configure the optimizer for training.

load_weights()[source]
load_pretrained_maldi_embedder(model_path)[source]
are_weights_changed(original_weights, new_weights, layer_test='spectrum_encoder.transformer_encoder.layers.0.norm2.bias')[source]
set_freeze_layers(layer_names_to_freeze, freeze)[source]
get_maldi_embedder_keys(model_path)[source]
get_all_keys()[source]

Spectrum Transformer Encoder

class simba.core.models.transformers.spectrum_transformer_encoder_custom.SpectrumTransformerEncoderCustom(*args, use_adduct: bool = False, categorical_adducts: bool = False, adduct_mass_map: str = '', use_ce: bool = False, use_ion_activation: bool = False, use_ion_method: bool = False, **kwargs)[source]

Bases: SpectrumTransformerEncoder

__init__(*args, use_adduct: bool = False, categorical_adducts: bool = False, adduct_mass_map: str = '', use_ce: bool = False, use_ion_activation: bool = False, use_ion_method: bool = False, **kwargs)[source]

Custom Spectrum Transformer Encoder with optional metadata usage.

use_adduct

use adduct info during training

Type:

bool

categorical_adducts

convert adduct mass to vector

Type:

bool

adduct_mass_map

file that maps adduct masses to vectors

Type:

str

use_ce

use collision energy during training

Type:

bool

use_ion_activation

use ion activation info during training

Type:

bool

use_ion_method

use ionization method during training

Type:

bool

precursor_hook(mz_array: Tensor, intensity_array: Tensor, **kwargs: dict)[source]

Define how additional information in the batch may be used.

Overwrite this method to define custom functionality dependent on information in the batch. Examples would be to incorporate any combination of the mass, charge, retention time, or ion mobility of a precursor ion.

The representation returned by this method is prepended to the peak representations that are fed into the Transformer encoder and ultimately contribute to the spectrum representation that is the first element of the sequence in the model output.

By default, this method returns a tensor of zeros.

Parameters:
  • mz_array (torch.Tensor of shape (n_spectra, n_peaks)) – The zero-padded m/z dimension for a batch of mass spectra.

  • intensity_array (torch.Tensor of shape (n_spectra, n_peaks)) – The zero-padded intensity dimension for a batch of mass spectra.

  • **kwargs (dict) – The additional data passed with the batch.

Returns:

The precursor representations.

Return type:

torch.Tensor of shape (batch_size, d_model)
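
The prepending described for precursor_hook can be illustrated with a shape-only sketch. Plain Python lists stand in for tensors so the snippet runs without PyTorch. The helper names zero_precursor and prepend_precursor and the sizes used are hypothetical, and the sketch does not reproduce how the encoder itself performs the concatenation.

```python
def zero_precursor(n_spectra, d_model):
    """Mimic the documented default: one all-zero vector per spectrum."""
    return [[0.0] * d_model for _ in range(n_spectra)]


def prepend_precursor(precursors, peaks):
    """Place each spectrum's precursor vector ahead of its peak vectors.

    precursors: n_spectra vectors of length d_model
    peaks: n_spectra lists of n_peaks vectors of length d_model
    """
    return [[prec] + list(spec_peaks) for prec, spec_peaks in zip(precursors, peaks)]


# Two spectra, three peaks each, d_model = 4 (hypothetical sizes).
peaks = [[[1.0] * 4 for _ in range(3)] for _ in range(2)]
sequence = prepend_precursor(zero_precursor(2, 4), peaks)
```

Each spectrum's sequence grows from n_peaks to n_peaks + 1 elements, with the precursor representation in the first position.
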

Ordinal Classification

class simba.core.models.ordinal.ordinal_classification.OrdinalClassification[source]

Bases: object

static from_float_to_class(array, n_classes)[source]

Convert a float between 0 and 1 to an integer class between 0 and N_max.
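
One way such a conversion could work is to scale the value onto the class range and round it. The sketch below assumes that N_max equals n_classes - 1 and that halves round up; neither assumption is confirmed by the documentation, and float_to_class_sketch is a hypothetical name, not the library function.

```python
import math


def float_to_class_sketch(values, n_classes):
    """Map floats in [0, 1] to integer classes in [0, n_classes - 1].

    Assumption: the top class is n_classes - 1 and halves round up.
    """
    n_max = n_classes - 1
    classes = []
    for v in values:
        clipped = min(max(v, 0.0), 1.0)  # keep inputs inside [0, 1]
        classes.append(int(math.floor(clipped * n_max + 0.5)))
    return classes


labels = float_to_class_sketch([0.0, 0.5, 1.0], n_classes=6)
```

With six classes, 0.0 maps to class 0, 1.0 maps to class 5, and 0.5 scales to 2.5, which rounds up to class 3 under the stated assumption.
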

static custom_random(array)[source]

Round the percentage values to integers, so that 1.5 -> 2.
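
The 1.5 -> 2 note suggests a round-half-up rule, which differs from Python's built-in round, where ties go to the nearest even integer. The sketch below shows the difference under that reading; round_half_up is a hypothetical helper and may not match what custom_random does.

```python
import math


def round_half_up(values):
    """Round each value to the nearest integer, sending .5 ties upward."""
    return [int(math.floor(v + 0.5)) for v in values]


samples = [0.5, 1.5, 2.5, 2.4]
half_up = round_half_up(samples)
# Built-in round sends ties to the nearest even integer instead.
bankers = [round(v) for v in samples]
```

The two rules agree on 1.5 and 2.4 but disagree on 0.5 and 2.5, which is where a custom rounding function changes the result.
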