Usage
Import the package
from Harmonics import *
Initialize the model
model = Harmonics_Model(
adata_list,
slice_name_list,
cond_list=cond_list,
cond_name_list=cond_name_list,
concat_label='slice_name',
proportion_label=None,
refine_k=0,
seed=1234,
parallel=True,
verbose=True,
)
Parameters
adata_list: list of anndata objects. Whole dataset for condition-agnostic analysis, control group for case-control study, or reference data for label transfer.
slice_name_list: list of strings, names of slices corresponding to adata_list.
cond_list: Default: None. list of anndata objects or None. None for condition-agnostic analysis, case group for case-control study, or query data for label transfer.
cond_name_list: Default: None. list of strings or None, names of slices corresponding to cond_list.
concat_label: Default: ‘slice_name’. string, key in .obs to store slice names.
proportion_label: Default: None. string or None, key in .obsm storing cell type proportions for low-resolution data.
refine_k: Default: 0. int, number of cell types with highest proportion used for niche refinement; set 0 to skip refinement.
seed: Default: 1234. int, random seed.
parallel: Default: True. bool, whether to run computations in parallel.
verbose: Default: True. bool, whether to print progress messages.
Construct cell representations
model.preprocess(
ct_key='celltype',
spatial_key='spatial',
method='joint',
n_step=3,
n_neighbors=20,
radius='auto',
cut_percentage=99,
kernel=None,
)
Parameters
ct_key: Default: ‘celltype’. string, key in .obs storing the cell type information.
spatial_key: Default: ‘spatial’. string, key in .obsm storing the spatial coordinates.
method: Default: ‘joint’. string or None, method for graph construction. Options:
'joint': n-step hop Delaunay triangulation with graph completion to at least n_neighbors per cell.'delaunay': n-step hop Delaunay triangulation.'knn': connect n_neighbors neighbors per cell.'radius': connect all cells within a specified radius. If radius=’auto’, the radius is set to the median distance to the n_neighbors-th nearest neighbor across cells.None: directly use cell type composition (for low-resolution data).
n_step: Default: 3. int, number of steps for n-step Delaunay triangulation.
n_neighbors: Default: 20. int, minimum number of neighbors per cell when method is ‘joint’ or ‘knn’.
radius: Default: ‘auto’. float or ‘auto’, radius used when method=’radius’. If set to ‘auto’, the radius is automatically determined as the median distance to the n_neighbors-th nearest neighbor.
cut_percentage: Default: 99. int, percentage of shortest edges to keep in the Delaunay triangulation adjacency graph.
kernel: Default: None. string or None, optional weighting scheme used when computing microenvironment cell type distributions. Options:
None: use unweighted averaging over neighboring cells.'gaussian': use an adaptive Gaussian kernel to assign distance-dependent weights to neighbors before averaging.
Over-clustering initialization (whole dataset / control group / reference)
model.initialize_clusters(
dim_reduction=True,
explained_var=None,
n_components=None,
n_components_max=100,
standardize=True,
method='kmeans',
Qmax=20,
)
Parameters
dim_reduction: Default: True. bool, whether to perform dimensionality reduction (PCA) before clustering.
explained_var: Default: None. float or None, target cumulative explained variance for dimensionality reduction.
n_components: Default: None. int or None, number of components to retain after dimensionality reduction. If None, retain no more than n_components_max.
n_components_max: Default: 100. int, maximum number of components allowed during reduction.
standardize: Default: True. bool, whether to z-score normalize each feature before dimensionality reduction.
method: Default: ‘kmeans’. string, clustering method for initialization. Options: ‘kmeans’, ‘gmm’.
Qmax: Default: 20. int, number of clusters for initialization.
Perform HDM to find solution
model.hier_dist_match(
assign_metric='jsd',
weighted_merge=True,
max_iters=100,
tol=1e-4,
Qmin=2,
)
Parameters
assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity between niches.
weighted_merge: Default: True. bool, whether to use weighted JSD (WJSD) during merging phase.
max_iters: Default: 100. int, maximum number of iterations for convergence.
tol: Default: 1e-4. float, tolerance for convergence.
Qmin: Default: 2. int, minimum number of niches to consider.
Select the solution
adata_list, adata_concat = model.select_solution(
n_niche=None,
niche_key='niche_label',
auto=True,
metric='jsd_v2',
threshold=0.1,
return_adata=True,
plot=True,
save=False,
fig_size=(10, 6),
save_dir=None,
file_name='score_vs_nichecount_basic.pdf',
)
Parameters
n_niche: Default: None. int or None, number of niches to select. If None, solution is selected automatically using metric.
niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.
auto: Default: True. bool, whether to automatically determine the solution if n_niche=None.
metric: Default: ‘jsd_v2’. string, metric used for solution selection. Supported options are:
'jsd': minimum pairwise Jensen-Shannon divergence between niches.'wjsd': weighted minimum pairwise Jensen-Shannon divergence between niches.'jsd_v2': bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.
threshold: Default: 0.1. float, threshold for selecting solution based on metric.
return_adata: Default: True. bool, whether to return an anndata object with niche assignments.
plot: Default: True. bool, whether to plot the minJSD curve.
save: Default: False. bool, whether to save the minJSD plot.
fig_size: Default: (10, 6). tuple, figure size for plotting.
save_dir: Default: None. string or None, directory to save the plot.
file_name: Default: ‘score_vs_nichecount_basic.pdf’. string, name of the saved plot file.
Over-clustering initialization (case group)
model.initialize_clusters_cond(
assign_metric='jsd',
threshold=0.1,
min_cell_per_niche=100,
dim_reduction=True,
explained_var=None,
n_components=None,
n_components_max=100,
standardize=True,
method='kmeans',
Rmax=10,
)
Parameters
assign_metric: Default: ‘jsd’. string, metric used for evaluating distribution similarity when assigning cells to BCNs.
threshold: Default: 0.1. float, minimum divergence threshold below which cells are assigned to BCNs.
min_cell_per_niche: Default: 100. int, minimum average number of cells per new niche.
dim_reduction: Default: True. bool, whether to perform dimensionality reduction (PCA) before clustering.
explained_var: Default: None. float or None, target cumulative explained variance for dimensionality reduction.
n_components: Default: None. int or None, number of components to retain after dimensionality reduction. If None, retain no more than n_components_max.
n_components_max: Default: 100. int, maximum number of components allowed during reduction.
standardize: Default: True. bool, whether to z-score normalize each feature before dimensionality reduction.
method: Default: ‘kmeans’. string, clustering method for initialization.
Rmax: Default: 10. int, number of clusters for initialization.
Perform HDM to find solution (case group)
model.hier_dist_match_cond(
assign_metric='jsd',
weighted_merge=True,
max_iters=100,
tol=1e-4,
)
Parameters
assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity between niches.
weighted_merge: Default: True. bool, whether to use weighted JSD (WJSD) during merging phase.
max_iters: Default: 100. int, maximum number of iterations for convergence.
tol: Default: 1e-4. float, tolerance for convergence.
Select the solution (case group)
cond_list, cond_concat = model.select_solution_cond(
n_csn=None,
niche_key='niche_label',
csn_key='csn_label',
auto=True,
metric='jsd_v2',
threshold=0.1,
return_adata=True,
plot=True,
save=False,
fig_size=(10, 6),
save_dir=None,
file_name='score_vs_nichecount_cond.pdf',
)
Parameters
n_csn: Default: None. int or None, number of CSNs to select. If None, solution is selected automatically using metric.
niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.
csn_key: Default: ‘csn_label’. string, key in .obs to store CSN assignment results.
auto: Default: True. bool, whether to automatically determine the solution if n_csn=None.
metric: Default: ‘jsd_v2’. string, metric used for solution selection. Supported options are:
'jsd': minimum pairwise Jensen-Shannon divergence between niches.'wjsd': weighted minimum pairwise Jensen-Shannon divergence between niches.'jsd_v2': bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.
threshold: Default: 0.1. float, threshold for selecting solution based on metric.
return_adata: Default: True. bool, whether to return an anndata object with CSN assignments.
plot: Default: True. bool, whether to plot the minJSD curve.
save: Default: False. bool, whether to save the minJSD plot.
fig_size: Default: (10, 6). tuple, figure size for plotting.
save_dir: Default: None. string or None, directory to save the plot.
file_name: Default: ‘score_vs_nichecount_cond.pdf’. string, name of the saved plot file.
Label transfer
trans_list, trans_concat = model.label_transfer(
assign_metric='jsd',
niche_key='niche_label',
return_adata=True,
)
Parameters
assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity when assigning cells to niches.
niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.
return_adata: Default: True. bool, whether to return an anndata object with niche assignments.
Cell type enrichment test
ct_results = ct_enrichment_test(
niche_dist,
cell_count_niche,
idx2ct_dict,
niche_summary,
method='fisher',
alpha=0.05,
fdr_method='fdr_by',
log2fc_threshold=1,
prop_threshold=0.01,
verbose=True,
eps=1e-10,
)
Parameters
niche_dist: array-like or sparse matrix, shape (n_niche, n_celltype), representing the proportion of each cell type in each niche.
cell_count_niche: array-like, shape (n_niche,), number of cells in each niche.
idx2ct_dict: dict, mapping from cell type indices to cell type names.
niche_summary: list of strings, names or labels for each niche.
method: Default: ‘fisher’. string, statistical test method. Options:
'fisher': two-sided Fisher’s exact test.'fisher_greater': one-sided Fisher’s exact test (greater).'chi2': chi-square test.
alpha: Default: 0.05. float, significance level for multiple testing correction.
fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.
log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.
prop_threshold: Default: 0.01. float, minimum proportion of cell type in niche required for enrichment.
verbose: Default: True. bool, whether to print progress messages.
eps: Default: 1e-10. float, small value to avoid division by zero.
Returns
ct_results: pandas.DataFrame containing enrichment results with columns:
niche_idx: index of the nicheniche: name of the nichecelltype_idx: index of the cell typecelltype: name of the cell typeoddsratioorchi2_stat: test statisticp-value: raw p-valueq-value: FDR-corrected p-valuelog2fc: log2 fold-changeprop: proportion of cell type in nicheenrichment: bool, whether cell type is significantly enriched.
Cell-cell interaction enrichment test
cci_results, test_norm_list, bg_norm_list, test_edge_count_list, bg_edge_count_list = cci_enrichment_test(
adata_list,
niche_key,
ct_key,
niche_summary=None,
spatial_key='spatial',
cut_percentage=99,
method='fisher',
alpha=0.05,
fdr_method='fdr_by',
log2fc_threshold=1,
prop_threshold=0.01,
verbose=True,
eps=1e-10,
)
Parameters
adata_list: anndata object or list of anndata objects, input datasets to test CCI enrichment.
niche_key: string, key in .obs representing niche labels.
ct_key: string, key in .obs representing cell type labels.
niche_summary: Default: None. list of niche names to test; if None, all unique niches in adata_list are used.
spatial_key: Default: ‘spatial’. string, key in .obsm representing spatial coordinates.
cut_percentage: Default: 99. float, percentage of shortest edges to retain in Delaunay adjacency graph.
method: Default: ‘fisher’. string, statistical test method. Options:
'fisher': two-sided Fisher’s exact test.'fisher_greater': one-sided Fisher’s exact test (greater).
alpha: Default: 0.05. float, significance level for multiple testing correction.
fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.
log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.
prop_threshold: Default: 0.01. float, minimum proportion of cell type pairs in niche required for enrichment.
verbose: Default: True. bool, whether to print progress messages.
eps: Default: 1e-10. float, small value to avoid division by zero.
Returns
cci_results: pandas.DataFrame containing CCI enrichment results with columns:
niche_idx: index of the nicheniche: name of the nichect1_idx,ct2_idx: indices of interacting cell typesct1,ct2: names of interacting cell typestest_edge_count,bg_edge_count: number of observed and background edgestest_edge_prop,bg_edge_prop: proportion of observed and background edgesoddsratio: odds ratio from statistical testp-value: raw p-valueq-value: FDR-corrected p-valuelog2fc: log2 fold-changeenrichment: bool, whether interaction is significantly enriched
test_norm_list: list of normalized test adjacency matrices for each niche.
bg_norm_list: list of normalized background adjacency matrices for each niche.
test_edge_count_list: list of total test edges per niche.
bg_edge_count_list: list of total background edges per niche.
Niche-niche colocalization enrichment test
df_results, edge_prop_mtx, n1_count = nnc_enrichment_test(
adata_list,
niche_key,
niche_summary=None,
spatial_key='spatial',
cut_percentage=99,
method='fisher',
alpha=0.05,
fdr_method='fdr_by',
log2fc_threshold=1,
prop_threshold=0.01,
verbose=True,
eps=1e-10,
)
Parameters
adata_list: anndata object or list of anndata objects, input datasets for NNC enrichment testing.
niche_key: string, key in .obs representing niche labels.
niche_summary: Default: None. list of niche names to test; if None, all unique niches in adata_list are used.
spatial_key: Default: ‘spatial’. string, key in .obsm representing spatial coordinates.
cut_percentage: Default: 99. float, percentage of shortest edges to retain in Delaunay adjacency graph.
method: Default: ‘fisher’. string, statistical test method. Options:
'fisher': two-sided Fisher’s exact test.'fisher_greater': one-sided Fisher’s exact test (greater).'chi2': chi-square test with continuity correction.
alpha: Default: 0.05. float, significance level for multiple testing correction.
fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.
log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.
prop_threshold: Default: 0.01. float, minimum proportion of edges between niches required for enrichment.
verbose: Default: True. bool, whether to print progress messages.
eps: Default: 1e-10. float, small value to avoid division by zero.
Returns
df_results: pandas.DataFrame containing NNC enrichment results with columns:
niche1_idx,niche2_idx: indices of interacting niches. Niche 1 is source niche and niche 2 is target niche.niche1,niche2: names of interacting niches. Niche 1 is source niche and niche 2 is target niche.edge_count: number of edges observed between niche pairsedge_prop: proportion of edges between niche pairs for the source nicheoddsratioorchi2_stat: statistic from the testp-value: raw p-valueq-value: FDR-corrected p-valuelog2fc: log2 fold-changeenrichment: bool, whether the interaction is significantly enriched
edge_prop_mtx: numpy array, normalized edge proportions between all niche pairs.
n1_count: numpy array, total outgoing edges for each niche.
Niche-niche colocalization matrix
edge_prop_mtx, n1_count = cal_nnc_mtx(
adata,
niche_key,
niche_summary=None,
adj_mtx_key=None,
spatial_key=None,
cut_percentage=99,
reserve_nonexist=False,
)
Parameters
adata: anndata object, input dataset used to compute the niche-niche colocalization matrix.
niche_key: string, key in .obs representing niche labels.
niche_summary: Default: None. list of niche names or labels. If None, all unique niche labels in adata.obs[niche_key] are used.
adj_mtx_key: Default: None. string or None, key in .obsp storing a precomputed adjacency matrix. If provided, this adjacency matrix is used directly.
spatial_key: Default: None. string or None, key in .obsm storing spatial coordinates. If adj_mtx_key is None, a Delaunay adjacency graph is constructed from these coordinates.
cut_percentage: Default: 99. float, percentage threshold for retaining Delaunay edges when constructing the adjacency graph from spatial coordinates. Only used when spatial_key is provided and adj_mtx_key is None.
reserve_nonexist: Default: False. bool, whether to retain niches with zero outgoing niche-niche edges by setting their total edge count to 1 before normalization, thereby avoiding division by zero.
Returns
edge_prop_mtx: numpy.ndarray, shape (n_niche, n_niche), row-normalized niche-niche colocalization matrix. Each entry represents the proportion of outgoing edges from one niche to another niche, excluding self-niche edges.
n1_count: numpy.ndarray, shape (n_niche,), total number of outgoing edges from each niche to other niches before normalization.
Niche-niche colocalization patterns differential test between groups
df_nnc_between_groups = nnc_between_groups_test(
g1_list,
g2_list,
niche_labels,
min_valid=3,
alpha=0.05,
alternative="two-sided",
fdr_method="fdr_by",
)
Parameters
g1_list: array-like, shape (n_sample1, n_niche, n_niche), collection of niche-niche colocalization matrices from group 1.
g2_list: array-like, shape (n_sample2, n_niche, n_niche), collection of niche-niche colocalization matrices from group 2.
niche_labels: list of strings, niche names or labels corresponding to the rows and columns of the colocalization matrices.
min_valid: Default: 3. int, minimum number of valid samples required in each group for testing a niche-niche pair. If either group has fewer than min_valid non-NaN values for a given pair, the corresponding test is not performed.
alpha: Default: 0.05. float, significance level for multiple testing correction.
alternative: Default: ‘two-sided’. string, alternative hypothesis for the Mann-Whitney U test. Options typically include:
'two-sided': tests whether the two groups differ.'greater': tests whether values in group 1 tend to be greater than those in group 2.'less': tests whether values in group 1 tend to be less than those in group 2.
fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.
Returns
df_nnc_between_groups: pandas.DataFrame containing niche-niche colocalization comparison results with columns:
niche1: source nicheniche2: target nichemean1: mean colocalization value in group 1mean2: mean colocalization value in group 2delta_mean: difference in mean colocalization value between group 1 and group 2n1_valid: number of valid samples in group 1n2_valid: number of valid samples in group 2p_value: raw p-value from the Mann-Whitney U testq_value: FDR-corrected p-valuerejected: bool, whether the comparison is significant after multiple testing correction.