Loading Data

DPKS can load data directly from a variety of proteomics processing pipelines. If there is a file type that you would like DPKS to parse directly, please let us know.

The QuantMatrix is the main entry point to all analysis in DPKS. A new QuantMatrix object can be instantiated with your input data and a design matrix by passing the file paths:

quant_matrix = QuantMatrix(
    quantification_file="path_to_quant_file.tsv",
    design_matrix_file="path_to_design_matrix_file.tsv"
)

Or by passing in pandas DataFrames:

import pandas as pd

quant_data = pd.read_csv(
    "path_to_quant_file.tsv",
    sep="\t"
)

design_matrix = pd.read_csv(
    "path_to_design_matrix_file.tsv",
    sep="\t"
)

quant_matrix = QuantMatrix(
    quantification_file=quant_data,
    design_matrix_file=design_matrix
)

This is particularly useful if you want to process your data (reformat, filter, etc.) in some way before loading it into DPKS. Because the QuantMatrix object accepts either file paths or DataFrames, you have some flexibility in the type of data that you can load, and it is easy to write custom parsers for new result file types.

Tip

If you encounter errors during parsing, it is useful to first load your data as DataFrames and verify that everything is formatted correctly.
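
One check that is easy to run before constructing a QuantMatrix is whether every sample in the design matrix has a matching column in the quantification table. The sketch below is a hypothetical helper (`check_samples` is not part of DPKS) that works on plain lists of names so it has no dependencies. With pandas you could pass `list(quant_data.columns)` and the values of your design matrix's sample column.

```python
def check_samples(quant_columns, design_samples):
    """Return a list of problems found when matching design samples to quant columns."""
    problems = []
    columns = set(quant_columns)
    seen = set()
    for sample in design_samples:
        # A sample listed twice in the design matrix is almost always a mistake.
        if sample in seen:
            problems.append(f"duplicate sample in design matrix: {sample}")
        seen.add(sample)
        # Every design-matrix sample should name a column in the quantification table.
        if sample not in columns:
            problems.append(f"sample missing from quantification columns: {sample}")
    return problems


# Illustrative column and sample names only.
quant_columns = ["PeptideSequence", "Charge", "Protein", "SAMPLE_1.osw", "SAMPLE_2.osw"]
design_samples = ["SAMPLE_1.osw", "SAMPLE_2.osw", "SAMPLE_3.osw"]

problems = check_samples(quant_columns, design_samples)
for problem in problems:
    print(problem)
```

An empty list means the sample names line up. Anything else points to the name that needs fixing before loading.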

Generic Input

Quantitative Data

DPKS accepts a generic results file format. If no built-in parser is available for your data, you can reformat it to match the columns below.

Column	Description
PrecursorId	A unique identifier generally composed of the Peptide Sequence (with mods) and the charge.
Charge	The precursor charge.
PeptideSequence	The modified peptide sequence.
Decoy (Optional)	Indicating if the precursor is a decoy (used for filtering).
RetentionTime	The retention time of the precursor.
Protein	The protein accession code associated with the precursor.
PeptideQValue (Optional)	The global peptide level q-value (used for filtering).
ProteinQValue (Optional)	The global protein level q-value (used for filtering).
Sample Columns (Many Columns)	All other columns containing quantification data for your samples.

If you have already controlled for the global FDR, you do not need to include the Decoy, PeptideQValue, or ProteinQValue columns.

A generic file format may look like this:

PeptideSequence	Charge	Protein	RetentionTime	PeptideQValue	ProteinQValue	SAMPLE_1.osw	SAMPLE_2.osw	SAMPLE_3.osw
PEPTIK	4	P00352	5736.15	7.81e-06	0.0001169	29566.2	59295.7	24536.4
EFMEEVIQR	2	P04275	3155.5	9.41e-06	0.0001169	69900.3	195571.0	403947.0
SSSGTPDLPVLLTDLK	2	P00352	5386.69	7.815e-06	0.000116	115684.0	132524.0	217962.0
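
If your table has no PrecursorId column, one way to derive it is to join the modified peptide sequence and the charge, as the column description suggests. The sketch below does this with the standard library on a trimmed, illustrative table. `add_precursor_id` and the `_` separator are assumptions made for illustration, not a format that DPKS is known to require.

```python
import csv
import io

# A trimmed, illustrative table in the generic layout (tab-separated).
generic_tsv = (
    "PeptideSequence\tCharge\tProtein\tSAMPLE_1.osw\n"
    "PEPTIK\t4\tP00352\t29566.2\n"
    "EFMEEVIQR\t2\tP04275\t69900.3\n"
)


def add_precursor_id(rows, separator="_"):
    """Return copies of the rows with a PrecursorId built from sequence and charge."""
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row["PrecursorId"] = f"{row['PeptideSequence']}{separator}{row['Charge']}"
        result.append(new_row)
    return result


rows = list(csv.DictReader(io.StringIO(generic_tsv), delimiter="\t"))
rows_with_id = add_precursor_id(rows)
precursor_ids = [row["PrecursorId"] for row in rows_with_id]
print(precursor_ids)
```

Whatever scheme you choose, the identifier should be unique per precursor, so it is worth checking for duplicates after building it.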

Note

If you want to pass in already-quantified proteins, you can use a table like this:

Protein	SAMPLE_1.osw	SAMPLE_2.osw	SAMPLE_3.osw
P00352	29566.2	59295.7	24536.4
P04275	69900.3	195571.0	403947.0
P00352	115684.0	132524.0	217962.0

Design Matrix

A basic design matrix has two main columns:

Column	Description
Sample (Required)	A list of the samples. This helps differentiate between sample columns and annotation columns in the QuantMatrix.
Group (Optional)	The group the sample belongs to. Used in differential testing and explainable machine learning.

A minimal design matrix for the above input examples could look like this:

Sample
SAMPLE_1.osw
SAMPLE_2.osw
SAMPLE_3.osw

And an example using the Group column:

sample	group
AAS_P2009_167	6
AAS_P2009_169	4
AAS_P2009_176	6
AAS_P2009_178	4
AAS_P2009_187	4
AAS_P2009_194	6
AAS_P2009_196	4
AAS_P2009_203	6
AAS_P2009_205	4
AAS_P2009_212	6
AAS_P2009_214	4
AAS_P2009_221	6
AAS_P2009_230	6
AAS_P2009_232	4
AAS_P2009_239	6
AAS_P2009_241	4
AAS_P2009_248	6
AAS_P2009_250	4
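
If your sample annotations live in code rather than in a file, a design matrix like the one above can be written out as tab-separated text. This is a minimal sketch using only the standard library. `build_design_matrix` is a hypothetical helper, and the header names simply follow the example above.

```python
import csv
import io


def build_design_matrix(sample_to_group):
    """Render a sample -> group mapping as tab-separated design-matrix text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample", "group"])  # header row, matching the example above
    for sample, group in sample_to_group.items():
        writer.writerow([sample, group])
    return buffer.getvalue()


# Two samples taken from the example above.
design_text = build_design_matrix({"AAS_P2009_167": 6, "AAS_P2009_169": 4})
print(design_text)
```

The resulting text can be saved to a `.tsv` file and passed by path, or parsed into a DataFrame first.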

DIA-NN

You can load data directly from DIA-NN using the long-format diann-output.tsv file that it generates. The samples in your design matrix should match the values in the Run column of the DIA-NN output, but other columns can be indicated if desired.

Additionally, if you have used match-between-runs (MBR), the correct columns will be used to filter precursors at the indicated FDR threshold.

quant_matrix = QuantMatrix(
    quantification_file=quant_file,
    design_matrix_file=simple_design,
    quant_type="diann",
    diann_qvalue=0.01
)
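
Because the design matrix samples are matched against the Run column, a mismatch between the two is a likely source of loading problems. The sketch below compares the two sets of names on a tiny made-up long-format report. `compare_runs_to_design` is a hypothetical helper, not a DPKS function, and the run names and values are placeholders.

```python
import csv
import io

# A tiny, made-up long-format report with one row per precursor per run.
report_tsv = (
    "Run\tPrecursor.Id\tPrecursor.Quantity\n"
    "run_A\tPEPTIK2\t29566.2\n"
    "run_A\tEFMEEVIQR2\t69900.3\n"
    "run_B\tPEPTIK2\t59295.7\n"
)


def compare_runs_to_design(report_rows, design_samples, run_column="Run"):
    """Return (runs not in the design matrix, design samples not in the report)."""
    runs = {row[run_column] for row in report_rows}
    samples = set(design_samples)
    return sorted(runs - samples), sorted(samples - runs)


report_rows = list(csv.DictReader(io.StringIO(report_tsv), delimiter="\t"))
unlisted_runs, absent_samples = compare_runs_to_design(report_rows, ["run_A", "run_C"])
print("Runs missing from the design matrix:", unlisted_runs)
print("Design samples missing from the report:", absent_samples)
```

Both lists should be empty before the report and design matrix are passed to QuantMatrix.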