Scikit-learn API¶
For those not familiar with PyTorch, we've created a scikit-learn-style wrapper. It exposes the familiar fit/predict methods.
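For readers new to the scikit-learn convention, the pattern the wrapper follows can be illustrated with a toy estimator (hypothetical, for illustration only; `BINNClassifier` mirrors this interface):

```python
import numpy as np

# A toy estimator illustrating the scikit-learn fit/predict convention.
class MeanThresholdClassifier:
    def fit(self, X, y):
        # "Learn" a threshold: the mean feature value of the positive class.
        self.threshold_ = X[y == 1].mean()
        return self

    def predict(self, X):
        # Classify each instance by comparing it to the learned threshold.
        return (X >= self.threshold_).astype(int).ravel()

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
clf = MeanThresholdClassifier().fit(X, y)
print(clf.predict(np.array([[2.6]])))  # [1]
```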
In [1]:
from binn import BINNClassifier, Network, SuperLogger
import pandas as pd
As with the PyTorch API, we load the data and create a network. However, instead of building the model directly, we now create a BINNClassifier object (the scikit-learn wrapper class).
In [2]:
pathways = pd.read_csv("../data/pathways.tsv", sep="\t")
translation = pd.read_csv("../data/translation.tsv", sep="\t")
input_data = pd.read_csv("../data/test_qm.csv")
design_matrix = pd.read_csv("../data/design_matrix.tsv", sep="\t")
network = Network(
    input_data=input_data,
    pathways=pathways,
    mapping=translation,
    source_column="child",
    target_column="parent",
)
binn = BINNClassifier(
    network=network,
    n_layers=4,
    dropout=0.2,
    epochs=3,
    threads=10,
    logger=SuperLogger("logs/test"),
)
binn.clf.features
Missing logger folder: logs/test/lightning_logs
BINN is on the device: cpu
Out[2]:
Index(['A0M8Q6', 'O00194', 'O00391', 'O14786', 'O14791', 'O15145', 'O43707', 'O75369', 'O75594', 'O75636', ... 'Q9UBE0', 'Q9UBQ7', 'Q9UBR2', 'Q9UBX5', 'Q9UGM3', 'Q9UK55', 'Q9UNW1', 'Q9Y490', 'Q9Y4L1', 'Q9Y6Z7'], dtype='object', length=449)
We have to make our data matrix match the input layer of the BINN. Then we fit the model.
In [3]:
from util_for_examples import generate_data, fit_data_matrix_to_network_input
X = fit_data_matrix_to_network_input(input_data.reset_index(), features=binn.clf.features)
X, y = generate_data(X, design_matrix)
X_test = X[:10]
X_train = X[10:]
y_test = y[:10]
y_train = y[10:]
binn.fit(X_train, y_train, epochs=5)
GPU available: True (mps), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
You defined a `validation_step` but have no `val_dataloader`. Skipping val loop.

  | Name   | Type             | Params
--------------------------------------------
0 | layers | Sequential       | 364 K
1 | loss   | CrossEntropyLoss | 0
--------------------------------------------
364 K     Trainable params
0         Non-trainable params
364 K     Total params
1.457     Total estimated model params size (MB)

Consider setting `persistent_workers=True` in 'train_dataloader' to speed up the dataloader worker initialization.
The number of training batches (24) is smaller than the logging interval Trainer(log_every_n_steps=50). Set a lower value for log_every_n_steps if you want to see logs for the training epoch.
Epoch 4: 100%|██████████| 24/24 [00:02<00:00, 10.01it/s, v_num=0, train_loss=0.788, train_acc=0.503]
`Trainer.fit` stopped: `max_epochs=5` reached.
Epoch 4: 100%|██████████| 24/24 [00:07<00:00, 3.19it/s, v_num=0, train_loss=0.788, train_acc=0.503]
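The column alignment performed by `fit_data_matrix_to_network_input` above can be sketched in plain pandas. This is an illustrative assumption about what such alignment involves (reordering columns to the network's feature order and zero-filling missing features), not the library's actual implementation; the data and feature names here are made up:

```python
import pandas as pd

# Hypothetical data matrix with columns in arbitrary order and one feature missing.
data = pd.DataFrame({"P1": [0.5, 1.2], "P3": [2.0, 0.1]})

# Feature order the BINN's input layer expects (e.g. binn.clf.features).
features = ["P1", "P2", "P3"]

# Reorder the columns to match, filling features absent from the data with 0.
aligned = data.reindex(columns=features, fill_value=0)
print(list(aligned.columns))  # ['P1', 'P2', 'P3']
```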
We can now predict on the held-out test instances.
In [4]:
binn.predict(X_test)
Out[4]:
tensor([[-0.7339,  0.7361],
        [ 0.2570, -1.3820],
        [ 0.9844,  1.6763],
        [-0.2509,  0.0399],
        [-0.6966, -0.6890],
        [ 0.4983, -0.1070],
        [-0.7190, -0.5171],
        [-0.5098,  0.7708],
        [ 1.3238,  0.2883],
        [-0.4951,  0.3959]])
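The values returned by `predict` appear to be raw per-class scores (one column per class) rather than labels. Assuming that, the predicted class for each instance is the column index of the larger value; a quick sketch with NumPy, using the scores printed above:

```python
import numpy as np

# Raw two-class scores, copied from the predict output above.
logits = np.array([
    [-0.7339,  0.7361],
    [ 0.2570, -1.3820],
    [ 0.9844,  1.6763],
    [-0.2509,  0.0399],
    [-0.6966, -0.6890],
    [ 0.4983, -0.1070],
    [-0.7190, -0.5171],
    [-0.5098,  0.7708],
    [ 1.3238,  0.2883],
    [-0.4951,  0.3959],
])

# The class label is the index of the larger score in each row.
labels = logits.argmax(axis=1)
print(labels)  # [1 0 1 1 1 0 1 1 0 1]
```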