Usage

Import the package

from Harmonics import *

Initialize the model

model = Harmonics_Model(
    adata_list,
    slice_name_list,
    cond_list=cond_list,
    cond_name_list=cond_name_list,
    concat_label='slice_name',
    proportion_label=None,
    refine_k=0,
    seed=1234,
    parallel=True,
    verbose=True,
)

Parameters

  • adata_list: list of anndata objects. Whole dataset for condition-agnostic analysis, control group for case-control study, or reference data for label transfer.

  • slice_name_list: list of strings, names of slices corresponding to adata_list.

  • cond_list: Default: None. list of anndata objects or None. None for condition-agnostic analysis, case group for case-control study, or query data for label transfer.

  • cond_name_list: Default: None. list of strings or None, names of slices corresponding to cond_list.

  • concat_label: Default: ‘slice_name’. string, key in .obs to store slice names.

  • proportion_label: Default: None. string or None, key in .obsm storing cell type proportions for low-resolution data.

  • refine_k: Default: 0. int, number of cell types with highest proportion used for niche refinement; set 0 to skip refinement.

  • seed: Default: 1234. int, random seed.

  • parallel: Default: True. bool, whether to run computations in parallel.

  • verbose: Default: True. bool, whether to print progress messages.

Construct cell representations

model.preprocess(
    ct_key='celltype',
    spatial_key='spatial',
    method='joint',
    n_step=3,
    n_neighbors=20,
    radius='auto',
    cut_percentage=99,
    kernel=None,
)

Parameters

  • ct_key: Default: ‘celltype’. string, key in .obs storing the cell type information.

  • spatial_key: Default: ‘spatial’. string, key in .obsm storing the spatial coordinates.

  • method: Default: ‘joint’. string or None, method for graph construction. Options:

    • 'joint': n-step hop Delaunay triangulation with graph completion to at least n_neighbors per cell.

    • 'delaunay': n-step hop Delaunay triangulation.

    • 'knn': connect n_neighbors neighbors per cell.

    • 'radius': connect all cells within a specified radius. If radius=’auto’, the radius is set to the median distance to the n_neighbors-th nearest neighbor across cells.

    • None: directly use cell type composition (for low-resolution data).

  • n_step: Default: 3. int, number of steps for n-step Delaunay triangulation.

  • n_neighbors: Default: 20. int, minimum number of neighbors per cell when method is ‘joint’ or ‘knn’.

  • radius: Default: ‘auto’. float or ‘auto’, radius used when method=’radius’. If set to ‘auto’, the radius is automatically determined as the median distance to the n_neighbors-th nearest neighbor.

  • cut_percentage: Default: 99. int, percentage of shortest edges to keep in the Delaunay triangulation adjacency graph.

  • kernel: Default: None. string or None, optional weighting scheme used when computing microenvironment cell type distributions. Options:

    • None: use unweighted averaging over neighboring cells.

    • 'gaussian': use an adaptive Gaussian kernel to assign distance-dependent weights to neighbors before averaging.

Over-clustering initialization (whole dataset / control group / reference)

model.initialize_clusters(
    dim_reduction=True,
    explained_var=None,
    n_components=None,
    n_components_max=100,
    standardize=True,
    method='kmeans',
    Qmax=20,
)

Parameters

  • dim_reduction: Default: True. bool, whether to perform dimensionality reduction (PCA) before clustering.

  • explained_var: Default: None. float or None, target cumulative explained variance for dimensionality reduction.

  • n_components: Default: None. int or None, number of components to retain after dimensionality reduction. If None, retain no more than n_components_max.

  • n_components_max: Default: 100. int, maximum number of components allowed during reduction.

  • standardize: Default: True. bool, whether to z-score normalize each feature before dimensionality reduction.

  • method: Default: ‘kmeans’. string, clustering method for initialization. Options: ‘kmeans’, ‘gmm’.

  • Qmax: Default: 20. int, number of clusters for initialization.

Perform HDM to find solution

model.hier_dist_match(
    assign_metric='jsd',
    weighted_merge=True,
    max_iters=100,
    tol=1e-4,
    Qmin=2,
)

Parameters

  • assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity between niches.

  • weighted_merge: Default: True. bool, whether to use weighted JSD (WJSD) during merging phase.

  • max_iters: Default: 100. int, maximum number of iterations for convergence.

  • tol: Default: 1e-4. float, tolerance for convergence.

  • Qmin: Default: 2. int, minimum number of niches to consider.

Select the solution

adata_list, adata_concat = model.select_solution(
    n_niche=None,
    niche_key='niche_label',
    auto=True,
    metric='jsd_v2',
    threshold=0.1,
    return_adata=True,
    plot=True,
    save=False,
    fig_size=(10, 6),
    save_dir=None,
    file_name='score_vs_nichecount_basic.pdf',
)

Parameters

  • n_niche: Default: None. int or None, number of niches to select. If None, solution is selected automatically using metric.

  • niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.

  • auto: Default: True. bool, whether to automatically determine the solution if n_niche=None.

  • metric: Default: ‘jsd_v2’. string, metric used for solution selection. Supported options are:

    • 'jsd': minimum pairwise Jensen-Shannon divergence between niches.

    • 'wjsd': weighted minimum pairwise Jensen-Shannon divergence between niches.

    • 'jsd_v2': bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.

  • threshold: Default: 0.1. float, threshold for selecting solution based on metric.

  • return_adata: Default: True. bool, whether to return an anndata object with niche assignments.

  • plot: Default: True. bool, whether to plot the minJSD curve.

  • save: Default: False. bool, whether to save the minJSD plot.

  • fig_size: Default: (10, 6). tuple, figure size for plotting.

  • save_dir: Default: None. string or None, directory to save the plot.

  • file_name: Default: ‘score_vs_nichecount_basic.pdf’. string, name of the saved plot file.

Over-clustering initialization (case group)

model.initialize_clusters_cond(
    assign_metric='jsd',
    threshold=0.1,
    min_cell_per_niche=100,
    dim_reduction=True,
    explained_var=None,
    n_components=None,
    n_components_max=100,
    standardize=True,
    method='kmeans',
    Rmax=10,
)

Parameters

  • assign_metric: Default: ‘jsd’. string, metric used for evaluating distribution similarity when assigning cells to BCNs.

  • threshold: Default: 0.1. float, minimum divergence threshold below which cells are assigned to BCNs.

  • min_cell_per_niche: Default: 100. int, minimum average number of cells per new niche.

  • dim_reduction: Default: True. bool, whether to perform dimensionality reduction (PCA) before clustering.

  • explained_var: Default: None. float or None, target cumulative explained variance for dimensionality reduction.

  • n_components: Default: None. int or None, number of components to retain after dimensionality reduction. If None, retain no more than n_components_max.

  • n_components_max: Default: 100. int, maximum number of components allowed during reduction.

  • standardize: Default: True. bool, whether to z-score normalize each feature before dimensionality reduction.

  • method: Default: ‘kmeans’. string, clustering method for initialization.

  • Rmax: Default: 10. int, number of clusters for initialization.

Perform HDM to find solution (case group)

model.hier_dist_match_cond(
    assign_metric='jsd',
    weighted_merge=True,
    max_iters=100,
    tol=1e-4,
)

Parameters

  • assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity between niches.

  • weighted_merge: Default: True. bool, whether to use weighted JSD (WJSD) during merging phase.

  • max_iters: Default: 100. int, maximum number of iterations for convergence.

  • tol: Default: 1e-4. float, tolerance for convergence.

Select the solution (case group)

cond_list, cond_concat = model.select_solution_cond(
    n_csn=None,
    niche_key='niche_label',
    csn_key='csn_label',
    auto=True,
    metric='jsd_v2',
    threshold=0.1,
    return_adata=True,
    plot=True,
    save=False,
    fig_size=(10, 6),
    save_dir=None,
    file_name='score_vs_nichecount_cond.pdf',
)

Parameters

  • n_csn: Default: None. int or None, number of CSNs to select. If None, solution is selected automatically using metric.

  • niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.

  • csn_key: Default: ‘csn_label’. string, key in .obs to store CSN assignment results.

  • auto: Default: True. bool, whether to automatically determine the solution if n_csn=None.

  • metric: Default: ‘jsd_v2’. string, metric used for solution selection. Supported options are:

    • 'jsd': minimum pairwise Jensen-Shannon divergence between niches.

    • 'wjsd': weighted minimum pairwise Jensen-Shannon divergence between niches.

    • 'jsd_v2': bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.

  • threshold: Default: 0.1. float, threshold for selecting solution based on metric.

  • return_adata: Default: True. bool, whether to return an anndata object with CSN assignments.

  • plot: Default: True. bool, whether to plot the minJSD curve.

  • save: Default: False. bool, whether to save the minJSD plot.

  • fig_size: Default: (10, 6). tuple, figure size for plotting.

  • save_dir: Default: None. string or None, directory to save the plot.

  • file_name: Default: ‘score_vs_nichecount_cond.pdf’. string, name of the saved plot file.

Label transfer

trans_list, trans_concat = model.label_transfer(
    assign_metric='jsd',
    niche_key='niche_label',
    return_adata=True,
)

Parameters

  • assign_metric: Default: ‘jsd’. string, metric used to evaluate distribution similarity when assigning cells to niches.

  • niche_key: Default: ‘niche_label’. string, key in .obs to store niche assignment results.

  • return_adata: Default: True. bool, whether to return an anndata object with niche assignments.

Cell type enrichment test

ct_results = ct_enrichment_test(
    niche_dist,
    cell_count_niche,
    idx2ct_dict,
    niche_summary,
    method='fisher',
    alpha=0.05,
    fdr_method='fdr_by',
    log2fc_threshold=1,
    prop_threshold=0.01,
    verbose=True,
    eps=1e-10,
)

Parameters

  • niche_dist: array-like or sparse matrix, shape (n_niche, n_celltype), representing the proportion of each cell type in each niche.

  • cell_count_niche: array-like, shape (n_niche,), number of cells in each niche.

  • idx2ct_dict: dict, mapping from cell type indices to cell type names.

  • niche_summary: list of strings, names or labels for each niche.

  • method: Default: ‘fisher’. string, statistical test method. Options:

    • 'fisher': two-sided Fisher’s exact test.

    • 'fisher_greater': one-sided Fisher’s exact test (greater).

    • 'chi2': chi-square test.

  • alpha: Default: 0.05. float, significance level for multiple testing correction.

  • fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.

  • log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.

  • prop_threshold: Default: 0.01. float, minimum proportion of cell type in niche required for enrichment.

  • verbose: Default: True. bool, whether to print progress messages.

  • eps: Default: 1e-10. float, small value to avoid division by zero.

Returns

  • ct_results: pandas.DataFrame containing enrichment results with columns:

    • niche_idx: index of the niche

    • niche: name of the niche

    • celltype_idx: index of the cell type

    • celltype: name of the cell type

    • oddsratio or chi2_stat: test statistic

    • p-value: raw p-value

    • q-value: FDR-corrected p-value

    • log2fc: log2 fold-change

    • prop: proportion of cell type in niche

    • enrichment: bool, whether cell type is significantly enriched.

Cell-cell interaction enrichment test

cci_results, test_norm_list, bg_norm_list, test_edge_count_list, bg_edge_count_list = cci_enrichment_test(
    adata_list,
    niche_key,
    ct_key,
    niche_summary=None,
    spatial_key='spatial',
    cut_percentage=99,
    method='fisher',
    alpha=0.05,
    fdr_method='fdr_by',
    log2fc_threshold=1,
    prop_threshold=0.01,
    verbose=True,
    eps=1e-10,
)

Parameters

  • adata_list: anndata object or list of anndata objects, input datasets to test CCI enrichment.

  • niche_key: string, key in .obs representing niche labels.

  • ct_key: string, key in .obs representing cell type labels.

  • niche_summary: Default: None. list of niche names to test; if None, all unique niches in adata_list are used.

  • spatial_key: Default: ‘spatial’. string, key in .obsm representing spatial coordinates.

  • cut_percentage: Default: 99. float, percentage of shortest edges to retain in Delaunay adjacency graph.

  • method: Default: ‘fisher’. string, statistical test method. Options:

    • 'fisher': two-sided Fisher’s exact test.

    • 'fisher_greater': one-sided Fisher’s exact test (greater).

  • alpha: Default: 0.05. float, significance level for multiple testing correction.

  • fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.

  • log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.

  • prop_threshold: Default: 0.01. float, minimum proportion of cell type pairs in niche required for enrichment.

  • verbose: Default: True. bool, whether to print progress messages.

  • eps: Default: 1e-10. float, small value to avoid division by zero.

Returns

  • cci_results: pandas.DataFrame containing CCI enrichment results with columns:

    • niche_idx: index of the niche

    • niche: name of the niche

    • ct1_idx, ct2_idx: indices of interacting cell types

    • ct1, ct2: names of interacting cell types

    • test_edge_count, bg_edge_count: number of observed and background edges

    • test_edge_prop, bg_edge_prop: proportion of observed and background edges

    • oddsratio: odds ratio from statistical test

    • p-value: raw p-value

    • q-value: FDR-corrected p-value

    • log2fc: log2 fold-change

    • enrichment: bool, whether interaction is significantly enriched

  • test_norm_list: list of normalized test adjacency matrices for each niche.

  • bg_norm_list: list of normalized background adjacency matrices for each niche.

  • test_edge_count_list: list of total test edges per niche.

  • bg_edge_count_list: list of total background edges per niche.

Niche-niche colocalization enrichment test

df_results, edge_prop_mtx, n1_count = nnc_enrichment_test(
    adata_list,
    niche_key,
    niche_summary=None,
    spatial_key='spatial',
    cut_percentage=99,
    method='fisher',
    alpha=0.05,
    fdr_method='fdr_by',
    log2fc_threshold=1,
    prop_threshold=0.01,
    verbose=True,
    eps=1e-10,
)

Parameters

  • adata_list: anndata object or list of anndata objects, input datasets for NNC enrichment testing.

  • niche_key: string, key in .obs representing niche labels.

  • niche_summary: Default: None. list of niche names to test; if None, all unique niches in adata_list are used.

  • spatial_key: Default: ‘spatial’. string, key in .obsm representing spatial coordinates.

  • cut_percentage: Default: 99. float, percentage of shortest edges to retain in Delaunay adjacency graph.

  • method: Default: ‘fisher’. string, statistical test method. Options:

    • 'fisher': two-sided Fisher’s exact test.

    • 'fisher_greater': one-sided Fisher’s exact test (greater).

    • 'chi2': chi-square test with continuity correction.

  • alpha: Default: 0.05. float, significance level for multiple testing correction.

  • fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.

  • log2fc_threshold: Default: 1. float, minimum log2 fold-change required for enrichment.

  • prop_threshold: Default: 0.01. float, minimum proportion of edges between niches required for enrichment.

  • verbose: Default: True. bool, whether to print progress messages.

  • eps: Default: 1e-10. float, small value to avoid division by zero.

Returns

  • df_results: pandas.DataFrame containing NNC enrichment results with columns:

    • niche1_idx, niche2_idx: indices of interacting niches. Niche 1 is source niche and niche 2 is target niche.

    • niche1, niche2: names of interacting niches. Niche 1 is source niche and niche 2 is target niche.

    • edge_count: number of edges observed between niche pairs

    • edge_prop: proportion of edges between niche pairs for the source niche

    • oddsratio or chi2_stat: statistic from the test

    • p-value: raw p-value

    • q-value: FDR-corrected p-value

    • log2fc: log2 fold-change

    • enrichment: bool, whether the interaction is significantly enriched

  • edge_prop_mtx: numpy array, normalized edge proportions between all niche pairs.

  • n1_count: numpy array, total outgoing edges for each niche.

Niche-niche colocalization matrix

edge_prop_mtx, n1_count = cal_nnc_mtx(
    adata,
    niche_key,
    niche_summary=None,
    adj_mtx_key=None,
    spatial_key=None,
    cut_percentage=99,
    reserve_nonexist=False,
)

Parameters

  • adata: anndata object, input dataset used to compute the niche-niche colocalization matrix.

  • niche_key: string, key in .obs representing niche labels.

  • niche_summary: Default: None. list of niche names or labels. If None, all unique niche labels in adata.obs[niche_key] are used.

  • adj_mtx_key: Default: None. string or None, key in .obsp storing a precomputed adjacency matrix. If provided, this adjacency matrix is used directly.

  • spatial_key: Default: None. string or None, key in .obsm storing spatial coordinates. If adj_mtx_key is None, a Delaunay adjacency graph is constructed from these coordinates.

  • cut_percentage: Default: 99. float, percentage threshold for retaining Delaunay edges when constructing the adjacency graph from spatial coordinates. Only used when spatial_key is provided and adj_mtx_key is None.

  • reserve_nonexist: Default: False. bool, whether to retain niches with zero outgoing niche-niche edges by setting their total edge count to 1 before normalization, thereby avoiding division by zero.

Returns

  • edge_prop_mtx: numpy.ndarray, shape (n_niche, n_niche), row-normalized niche-niche colocalization matrix. Each entry represents the proportion of outgoing edges from one niche to another niche, excluding self-niche edges.

  • n1_count: numpy.ndarray, shape (n_niche,), total number of outgoing edges from each niche to other niches before normalization.

Niche-niche colocalization patterns differential test between groups

df_nnc_between_groups = nnc_between_groups_test(
    g1_list,
    g2_list,
    niche_labels,
    min_valid=3,
    alpha=0.05,
    alternative="two-sided",
    fdr_method="fdr_by",
)

Parameters

  • g1_list: array-like, shape (n_sample1, n_niche, n_niche), collection of niche-niche colocalization matrices from group 1.

  • g2_list: array-like, shape (n_sample2, n_niche, n_niche), collection of niche-niche colocalization matrices from group 2.

  • niche_labels: list of strings, niche names or labels corresponding to the rows and columns of the colocalization matrices.

  • min_valid: Default: 3. int, minimum number of valid samples required in each group for testing a niche-niche pair. If either group has fewer than min_valid non-NaN values for a given pair, the corresponding test is not performed.

  • alpha: Default: 0.05. float, significance level for multiple testing correction.

  • alternative: Default: ‘two-sided’. string, alternative hypothesis for the Mann-Whitney U test. Options typically include:

    • 'two-sided': tests whether the two groups differ.

    • 'greater': tests whether values in group 1 tend to be greater than those in group 2.

    • 'less': tests whether values in group 1 tend to be less than those in group 2.

  • fdr_method: Default: ‘fdr_by’. string, method for false discovery rate correction.

Returns

  • df_nnc_between_groups: pandas.DataFrame containing niche-niche colocalization comparison results with columns:

    • niche1: source niche

    • niche2: target niche

    • mean1: mean colocalization value in group 1

    • mean2: mean colocalization value in group 2

    • delta_mean: difference in mean colocalization value between group 1 and group 2

    • n1_valid: number of valid samples in group 1

    • n2_valid: number of valid samples in group 2

    • p_value: raw p-value from the Mann-Whitney U test

    • q_value: FDR-corrected p-value

    • rejected: bool, whether the comparison is significant after multiple testing correction.