gssnng
Gene Set Scoring on the Nearest Neighbor Graph (gssnng) for Single Cell RNA-seq (scRNA-seq).
Notebook using gmt gene set files ===>>> Open In Colab
Notebook using the Decoupler/Omnipath style API ===>>> Open In Colab
See the paper ===>>> gssnng
The problem: scRNA-seq data are sparse, so in any given cell many genes of a gene set go undetected. This poor overlap makes per-cell gene set scoring difficult. gssnng addresses this by smoothing the data over the nearest neighbor graph of cells: each cell's profile is combined with those of its neighbors, creating a mini-pseudobulk expression profile per cell that can be scored with the single-sample gene set scoring methods often used for bulk RNA-seq.
Nearest neighbor graphs (NNGs) are constructed within user-defined groups (see the ‘groupby’ parameter below). Because the groups are independent, they can be processed in parallel, which speeds up the calculations. For example, an NNG could be constructed within each cluster, or jointly by cluster and sample. Smoothing can use either the adjacency matrix (every edge weighted 1) or the weighted graph, which gives less weight to more distant cells.
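The smoothing and scoring steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the gssnng implementation: the kNN graph is given as a hypothetical `neighbors` dict mapping each cell to its neighbors and edge weights, each cell is assumed to contribute to its own profile with weight 1, neighbor profiles are averaged rather than summed, and `mean_rank_score` is a simplified rank-based score chosen for illustration. It stands in for whichever single-sample method is actually used.

```python
# Minimal sketch of nearest-neighbor smoothing followed by a single-sample
# score. Assumptions (not taken from gssnng): a dense cells x genes list of
# lists, a precomputed kNN graph, self-inclusion with weight 1, averaging.
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def smooth(expr: Matrix, neighbors: Dict[int, Dict[int, float]],
           weighted: bool = False) -> Matrix:
    """Return one mini-pseudobulk profile per cell.

    neighbors[i] maps each neighbor j of cell i to an edge weight. With
    weighted=False every edge counts as 1 (the adjacency matrix); with
    weighted=True the stored weights are used, so distant cells count less.
    """
    n_genes = len(expr[0])
    smoothed: Matrix = []
    for i, row in enumerate(expr):
        acc = [float(v) for v in row]  # the cell itself, weight 1
        total = 1.0
        for j, w in neighbors.get(i, {}).items():
            w = w if weighted else 1.0
            for g in range(n_genes):
                acc[g] += w * expr[j][g]
            total += w
        smoothed.append([v / total for v in acc])  # weighted average
    return smoothed


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mean_rank_score(profile: Sequence[float], genes: Sequence[str],
                    gene_set: Set[str]) -> float:
    """Hypothetical score: mean rank of gene-set genes, scaled to (0, 1]."""
    ranks = average_ranks(profile)
    hits = [r for g, r in zip(genes, ranks) if g in gene_set]
    if not hits:
        return float("nan")  # no gene of the set is present in `genes`
    return sum(hits) / len(hits) / len(genes)


# Toy data: 3 cells x 3 genes and a small weighted kNN graph.
genes = ["A", "B", "C"]
expr = [[1, 0, 3], [3, 2, 1], [0, 4, 0]]
neighbors = {0: {1: 0.5}, 1: {0: 0.5, 2: 1.0}, 2: {1: 1.0}}

smoothed = smooth(expr, neighbors)                   # adjacency matrix
smoothed_w = smooth(expr, neighbors, weighted=True)  # weighted graph
scores = [mean_rank_score(p, genes, {"A", "C"}) for p in smoothed]
print(scores)
```

With `weighted=True`, a neighbor joined by an edge of weight 0.5 contributes half as much to the average as it would under the adjacency matrix, which is the "less weight to more distant cells" behavior described above.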
This package works with AnnData objects stored as h5ad files, and expression values are taken from adata.X. Groups can be defined by up to four categorical variables from the adata.obs table. Gene sets can be provided as .gmt files or through the OmniPath API (see below).
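Two of the inputs described above can be illustrated in plain Python: the .gmt format (one gene set per line: name, description, then gene symbols, tab-separated) and the grouping of cells by combinations of categorical labels. The sketch below is hypothetical: `read_gmt` and `group_cells` are not gssnng functions, and `obs` is a plain dict standing in for adata.obs so the example runs without anndata. Each resulting group would get its own nearest neighbor graph, and because groups are independent they could be handed to separate workers.

```python
# Minimal sketch, not the gssnng API: reading a .gmt file and grouping cells
# by up to four categorical columns. `obs` is a plain dict of columns that
# stands in for adata.obs so the example runs without anndata installed.
import io
from typing import Dict, List, Mapping, Sequence, TextIO, Tuple


def read_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Parse GMT lines: set name, description, then genes, tab-separated."""
    gene_sets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines or sets without genes
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def group_cells(obs: Mapping[str, Sequence[str]],
                groupby: Sequence[str]) -> Dict[Tuple[str, ...], List[int]]:
    """Map each combination of category labels to the cell indices in it."""
    if not 1 <= len(groupby) <= 4:
        raise ValueError("groupby takes one to four categorical columns")
    groups: Dict[Tuple[str, ...], List[int]] = {}
    for i in range(len(obs[groupby[0]])):
        key = tuple(obs[col][i] for col in groupby)
        groups.setdefault(key, []).append(i)
    return groups


gmt_text = "T_CELL\tna\tCD3D\tCD3E\tCD2\nB_CELL\tna\tCD79A\tMS4A1\n"
gene_sets = read_gmt(io.StringIO(gmt_text))
obs = {"cluster": ["0", "0", "1", "1"], "sample": ["s1", "s2", "s1", "s1"]}
groups = group_cells(obs, ["cluster", "sample"])  # one kNN graph per key
print(gene_sets)
print(groups)
```

With a real AnnData object, pandas offers a comparable mapping via `adata.obs.groupby(columns, observed=True).indices`, which returns a dict from group label to an array of integer positions.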
Note
This project is under active development. Please consider using a named release if you’re concerned about reproducibility.