BINN

This is the API reference for the BINN package. For usage examples, see Examples. Note that the API is still stabilizing and may change.

BINN

Bases: Module

A biologically informed neural network (BINN) in pure PyTorch.

If heads_ensemble=False, we build a standard sequential network with layer-to-layer connections.

If heads_ensemble=True, we build an 'ensemble of heads' network: each hidden layer also produces a separate head (dimension = n_outputs) which is passed through a sigmoid, then summed at the end.
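For intuition, the sum-of-heads aggregation can be sketched in plain Python. This is a simplified illustration only: the real heads are linear layers applied to each hidden representation, but the final combination step is the same shape.

```python
import math

def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))

def sum_of_heads(head_logits):
    """Each hidden layer's head emits n_outputs logits; sigmoid each
    head's logits, then sum element-wise across heads."""
    n_outputs = len(head_logits[0])
    return [
        sum(sigmoid(head[j]) for head in head_logits)
        for j in range(n_outputs)
    ]

# Three hidden layers, n_outputs = 2: one row of logits per head.
heads = [[0.0, 2.0], [1.0, -1.0], [-2.0, 0.5]]
out = sum_of_heads(heads)
```

Because every head contributes a bounded (sigmoid) vote, earlier layers influence the final prediction directly rather than only through the layers above them.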

Parameters:

Name Type Description Default

data_matrix DataFrame

A DataFrame of input features (samples x features). If not needed, can be None.

None

network_source str

If "reactome", loads mapping and pathways from load_reactome_db(), ignoring any provided here.

None

input_source str

The identifier type of the input features, passed to load_reactome_db() when network_source="reactome".

'uniprot'

mapping DataFrame

A DataFrame describing how each input feature maps into the pathway graph. If None, the user must rely on network_source="reactome".

None

pathways DataFrame

A DataFrame describing the edges among pathway nodes.

None

entity_col str

Datamatrix: the column holding the entity identifiers in the data matrix file.

'Protein'

input_col str

Mapping: the input column in the mapping file. Should correspond to the entity column in the data matrix.

'input'

translation_col str

Mapping: the translation column in the mapping file.

'translation'

target_col str

Pathways: the target column in the pathways file.

'target'

source_col str

Pathways: the source column in the pathways file.

'source'

activation str

The activation function to use in each layer. Defaults to "tanh".

'tanh'

n_layers int

Number of layers in the network (depth). Defaults to 4.

4

n_outputs int

Dimension of the final output (e.g., 2 for binary classification). Defaults to 2.

2

dropout float

Dropout probability. Defaults to 0.

0

heads_ensemble bool

If True, build an ensemble-of-heads network. Otherwise, a standard MLP.

False

device str

The PyTorch device to place this model on. Defaults to "cpu".

'cpu'

Attributes:

Name Type Description
inputs List[str]

The list of input feature names derived from the first connectivity matrix.

layers Module

The built network (either standard sequential or ensemble-of-heads).

layer_names List[List[str]]

The node (feature) names for each layer, for interpretability.

connectivity_matrices List[DataFrame]

The adjacency (pruning) masks for each layer, derived from the pathway network.
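A minimal construction sketch, using toy mapping and pathways DataFrames with the default column names (the identifiers and graph below are made up for illustration; the `from binn import BINN` import path is an assumption, adjust to your install):

```python
import pandas as pd

# Two input proteins mapped into a tiny two-level pathway hierarchy.
# Column names follow the constructor defaults: entity_col="Protein",
# input_col="input", translation_col="translation",
# source_col="source", target_col="target".
mapping = pd.DataFrame({
    "input": ["P04637", "P38398"],
    "translation": ["pathway_a", "pathway_b"],
})
pathways = pd.DataFrame({
    "source": ["pathway_a", "pathway_b"],
    "target": ["root", "root"],
})
data_matrix = pd.DataFrame({"Protein": ["P04637", "P38398"]})

try:
    from binn import BINN  # requires the binn package (and torch)
    model = BINN(
        data_matrix=data_matrix,
        mapping=mapping,
        pathways=pathways,
        n_layers=2,
        n_outputs=2,
    )
except Exception:  # binn/torch not installed, or toy graph rejected
    model = None
```

With real data, passing network_source="reactome" instead of mapping/pathways loads the Reactome graph via load_reactome_db().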

Source code in binn/model/binn.py
class BINN(nn.Module):
    """
    A biologically informed neural network (BINN) in pure PyTorch.

    If `heads_ensemble=False`, we build a standard sequential network
    with layer-to-layer connections.

    If `heads_ensemble=True`, we build an 'ensemble of heads' network:
    each hidden layer also produces a separate head (dimension = n_outputs)
    which is passed through a sigmoid, then summed at the end.

    Args:
        data_matrix (pd.DataFrame, optional):
            A DataFrame of input features (samples x features). If not needed, can be None.
        network_source (str, optional):
            If "reactome", loads `mapping` and `pathways` from `load_reactome_db()`, ignoring the ones provided.
        input_source (str, optional):
            The identifier type of the input features, passed to `load_reactome_db()`. Defaults to "uniprot".
        mapping (pd.DataFrame, optional):
            A DataFrame describing how each input feature maps into the pathway graph.
            If None, the user must rely on `network_source="reactome"`.
        pathways (pd.DataFrame, optional):
            A DataFrame describing the edges among pathway nodes.
        entity_col (str, optional):
            **Datamatrix**: The column holding the entity identifiers in the data matrix file.
        input_col (str, optional):
            **Mapping**: The input column in the mapping file. Should correspond to the entity column in the data matrix.
        translation_col (str, optional):
            **Mapping**: The translation column in the mapping file.
        target_col (str, optional):
            **Pathways**: The target column in the pathways file.
        source_col (str, optional):
            **Pathways**: The source column in the pathways file.
        activation (str, optional):
            The activation function to use in each layer. Defaults to "tanh".
        n_layers (int, optional):
            Number of layers in the network (depth). Defaults to 4.
        n_outputs (int, optional):
            Dimension of the final output (e.g., 2 for binary classification). Defaults to 2.
        dropout (float, optional):
            Dropout probability. Defaults to 0.
        heads_ensemble (bool, optional):
            If True, build an ensemble-of-heads network. Otherwise, a standard MLP.
        device (str, optional):
            The PyTorch device to place this model on. Defaults to "cpu".


    Attributes:
        inputs (List[str]):
            The list of input feature names derived from the first connectivity matrix.
        layers (nn.Module):
            The built network (either standard sequential or ensemble-of-heads).
        layer_names (List[List[str]]):
            The node (feature) names for each layer, for interpretability.
        connectivity_matrices (List[pd.DataFrame]):
            The adjacency (pruning) masks for each layer, derived from the pathway network.
    """

    def __init__(
        self,
        data_matrix: pd.DataFrame = None,
        network_source: str = None,
        input_source: str = "uniprot",
        mapping: pd.DataFrame = None,
        pathways: pd.DataFrame = None,
        entity_col: str = "Protein",
        input_col: str = "input",
        translation_col: str = "translation",
        target_col: str = "target",
        source_col: str = "source",
        activation: str = "tanh",
        n_layers: int = 4,
        n_outputs: int = 2,
        dropout: float = 0,
        heads_ensemble: bool = False,
        device: str = "cpu",
    ):
        super().__init__()

        self.device = device
        self.to(self.device)

        self.n_layers = n_layers
        self.heads_ensemble = heads_ensemble

        # Build the pathway network from dataframes

        if network_source == "reactome":
            reactome_db = load_reactome_db(input_source=input_source)
            mapping = reactome_db["mapping"]
            pathways = reactome_db["pathways"]

        # Build connectivity from the pathway network
        pn = dataframes_to_pathway_network(
            data_matrix=data_matrix,
            pathway_df=pathways,
            mapping_df=mapping,
            input_col=input_col,
            target_col=target_col,
            source_col=source_col,
            entity_col=entity_col,
            translation_col=translation_col,
        )

        # The connectivity matrices for each layer
        self.connectivity_matrices = pn.get_connectivity_matrices(n_layers=n_layers)

        # Collect layer sizes
        layer_sizes = []
        self.layer_names = []

        # First matrix => input layer size
        mat_first = self.connectivity_matrices[0]
        in_features, _ = mat_first.shape
        layer_sizes.append(in_features)

        self.inputs = mat_first.index.tolist()  # feature names
        self.layer_names.append(mat_first.index.tolist())

        # Additional layers
        for mat in self.connectivity_matrices[1:]:
            i, _ = mat.shape
            layer_sizes.append(i)
            self.layer_names.append(mat.index.tolist())

        # Build actual layers
        if heads_ensemble:
            self.layers = _generate_ensemble_of_heads(
                layer_sizes,
                self.connectivity_matrices,
                activation=activation,
                n_outputs=n_outputs,
                bias=True,
            )
        else:
            self.layers = _generate_sequential(
                layer_sizes,
                self.connectivity_matrices,
                activation=activation,
                n_outputs=n_outputs,
                dropout=dropout,
                bias=True,
            )

        # Weight init
        self.apply(_init_weights)

        # Print device info
        print(f"\n[INFO] BINN is on device: {self.device}")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Standard forward pass; if heads_ensemble=True, sum-of-heads is used."""
        return self.layers(x)

forward(x)

Standard forward pass; if heads_ensemble=True, sum-of-heads is used.

Source code in binn/model/binn.py
def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Standard forward pass; if heads_ensemble=True, sum-of-heads is used."""
    return self.layers(x)
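The forward pass simply applies the pre-built layers module; the biological pruning is baked in at construction time, when each layer's weights are masked by its connectivity matrix. A plain-Python sketch of one such masked linear step (illustrative only; the actual layers are built in PyTorch by the network generators, and the helper below is hypothetical):

```python
def masked_linear(x, weight, mask, bias):
    """y = (weight * mask) @ x + bias, with nested lists standing in
    for tensors. Zeros in the mask delete feature->node connections."""
    out = []
    for row_w, row_m, b in zip(weight, mask, bias):
        out.append(sum(w * m * xi for w, m, xi in zip(row_w, row_m, x)) + b)
    return out

# 3 input features -> 2 pathway nodes.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5]]
mask = [[1, 1, 0], [0, 0, 1]]  # node 0 sees features 0,1; node 1 sees feature 2
bias = [0.0, 0.0]
x = [1.0, 2.0, 4.0]
y = masked_linear(x, weight, mask, bias)  # -> [-1.5, -2.0]
```

Each connectivity matrix in connectivity_matrices plays the role of mask for its layer, so only biologically plausible edges carry weight.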