Usage
=====

Import the package
------------------

.. code-block:: python

    from Harmonics import *

Initialize the model
--------------------

.. code-block:: python

    model = Harmonics_Model(
        adata_list,
        slice_name_list,
        cond_list=cond_list,
        cond_name_list=cond_name_list,
        concat_label='slice_name',
        proportion_label=None,
        refine_k=0,
        seed=1234,
        parallel=True,
        verbose=True,
    )

**Parameters**

- **adata_list**: list of `anndata` objects. Whole dataset for condition-agnostic analysis, control group for case-control study, or reference data for label transfer.
- **slice_name_list**: list of strings, names of slices corresponding to `adata_list`.
- **cond_list**: Default: `None`. list of `anndata` objects or `None`. `None` for condition-agnostic analysis, case group for case-control study, or query data for label transfer.
- **cond_name_list**: Default: `None`. list of strings or `None`, names of slices corresponding to `cond_list`.
- **concat_label**: Default: `'slice_name'`. string, key in `.obs` to store slice names.
- **proportion_label**: Default: `None`. string or `None`, key in `.obsm` storing cell type proportions for low-resolution data.
- **refine_k**: Default: `0`. int, number of cell types with highest proportion used for niche refinement; set `0` to skip refinement.
- **seed**: Default: `1234`. int, random seed.
- **parallel**: Default: `True`. bool, whether to run computations in parallel.
- **verbose**: Default: `True`. bool, whether to print progress messages.

Construct cell representations
------------------------------

.. code-block:: python

    model.preprocess(
        ct_key='celltype',
        spatial_key='spatial',
        method='joint',
        n_step=3,
        n_neighbors=20,
        radius='auto',
        cut_percentage=99,
        kernel=None,
    )

**Parameters**

- **ct_key**: Default: `'celltype'`. string, key in `.obs` storing the cell type information.
- **spatial_key**: Default: `'spatial'`. string, key in `.obsm` storing the spatial coordinates.
- **method**: Default: `'joint'`.
  string or `None`, method for graph construction. Options:

  - ``'joint'``: n-step hop Delaunay triangulation with graph completion to at least `n_neighbors` per cell.
  - ``'delaunay'``: n-step hop Delaunay triangulation.
  - ``'knn'``: connect `n_neighbors` neighbors per cell.
  - ``'radius'``: connect all cells within a specified radius. If `radius='auto'`, the radius is set to the median distance to the `n_neighbors`-th nearest neighbor across cells.
  - ``None``: directly use cell type composition (for low-resolution data).

- **n_step**: Default: `3`. int, number of steps for n-step Delaunay triangulation.
- **n_neighbors**: Default: `20`. int, minimum number of neighbors per cell when method is `'joint'` or `'knn'`.
- **radius**: Default: `'auto'`. float or `'auto'`, radius used when `method='radius'`. If set to `'auto'`, the radius is automatically determined as the median distance to the `n_neighbors`-th nearest neighbor.
- **cut_percentage**: Default: `99`. int, percentage of shortest edges to keep in the Delaunay triangulation adjacency graph.
- **kernel**: Default: `None`. string or `None`, optional weighting scheme used when computing microenvironment cell type distributions. Options:

  - ``None``: use unweighted averaging over neighboring cells.
  - ``'gaussian'``: use an adaptive Gaussian kernel to assign distance-dependent weights to neighbors before averaging.

Over-clustering initialization (whole dataset / control group / reference)
---------------------------------------------------------------------------

.. code-block:: python

    model.initialize_clusters(
        dim_reduction=True,
        explained_var=None,
        n_components=None,
        n_components_max=100,
        standardize=True,
        method='kmeans',
        Qmax=20,
    )

**Parameters**

- **dim_reduction**: Default: `True`. bool, whether to perform dimensionality reduction (PCA) before clustering.
- **explained_var**: Default: `None`. float or `None`, target cumulative explained variance for dimensionality reduction.
- **n_components**: Default: `None`.
  int or `None`, number of components to retain after dimensionality reduction. If `None`, retain no more than `n_components_max`.
- **n_components_max**: Default: `100`. int, maximum number of components allowed during reduction.
- **standardize**: Default: `True`. bool, whether to z-score normalize each feature before dimensionality reduction.
- **method**: Default: `'kmeans'`. string, clustering method for initialization. Options: `'kmeans'`, `'gmm'`.
- **Qmax**: Default: `20`. int, number of clusters for initialization.

Perform HDM to find solution
----------------------------

.. code-block:: python

    model.hier_dist_match(
        assign_metric='jsd',
        weighted_merge=True,
        max_iters=100,
        tol=1e-4,
        Qmin=2,
    )

**Parameters**

- **assign_metric**: Default: `'jsd'`. string, metric used to evaluate distribution similarity between niches.
- **weighted_merge**: Default: `True`. bool, whether to use weighted JSD (WJSD) during merging phase.
- **max_iters**: Default: `100`. int, maximum number of iterations for convergence.
- **tol**: Default: `1e-4`. float, tolerance for convergence.
- **Qmin**: Default: `2`. int, minimum number of niches to consider.

Select the solution
-------------------

.. code-block:: python

    adata_list, adata_concat = model.select_solution(
        n_niche=None,
        niche_key='niche_label',
        auto=True,
        metric='jsd_v2',
        threshold=0.1,
        return_adata=True,
        plot=True,
        save=False,
        fig_size=(10, 6),
        save_dir=None,
        file_name='score_vs_nichecount_basic.pdf',
    )

**Parameters**

- **n_niche**: Default: `None`. int or `None`, number of niches to select. If `None`, solution is selected automatically using `metric`.
- **niche_key**: Default: `'niche_label'`. string, key in `.obs` to store niche assignment results.
- **auto**: Default: `True`. bool, whether to automatically determine the solution if `n_niche=None`.
- **metric**: Default: `'jsd_v2'`. string, metric used for solution selection. Supported options are:

  - ``'jsd'``: minimum pairwise Jensen-Shannon divergence between niches.
  - ``'wjsd'``: weighted minimum pairwise Jensen-Shannon divergence between niches.
  - ``'jsd_v2'``: bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.

- **threshold**: Default: `0.1`. float, threshold for selecting solution based on `metric`.
- **return_adata**: Default: `True`. bool, whether to return an `anndata` object with niche assignments.
- **plot**: Default: `True`. bool, whether to plot the minJSD curve.
- **save**: Default: `False`. bool, whether to save the minJSD plot.
- **fig_size**: Default: `(10, 6)`. tuple, figure size for plotting.
- **save_dir**: Default: `None`. string or `None`, directory to save the plot.
- **file_name**: Default: `'score_vs_nichecount_basic.pdf'`. string, name of the saved plot file.

Over-clustering initialization (case group)
-------------------------------------------

.. code-block:: python

    model.initialize_clusters_cond(
        assign_metric='jsd',
        threshold=0.1,
        min_cell_per_niche=100,
        dim_reduction=True,
        explained_var=None,
        n_components=None,
        n_components_max=100,
        standardize=True,
        method='kmeans',
        Rmax=10,
    )

**Parameters**

- **assign_metric**: Default: `'jsd'`. string, metric used for evaluating distribution similarity when assigning cells to BCNs.
- **threshold**: Default: `0.1`. float, minimum divergence threshold below which cells are assigned to BCNs.
- **min_cell_per_niche**: Default: `100`. int, minimum average number of cells per new niche.
- **dim_reduction**: Default: `True`. bool, whether to perform dimensionality reduction (PCA) before clustering.
- **explained_var**: Default: `None`. float or `None`, target cumulative explained variance for dimensionality reduction.
- **n_components**: Default: `None`. int or `None`, number of components to retain after dimensionality reduction. If `None`, retain no more than `n_components_max`.
- **n_components_max**: Default: `100`. int, maximum number of components allowed during reduction.
- **standardize**: Default: `True`.
  bool, whether to z-score normalize each feature before dimensionality reduction.
- **method**: Default: `'kmeans'`. string, clustering method for initialization.
- **Rmax**: Default: `10`. int, number of clusters for initialization.

Perform HDM to find solution (case group)
-----------------------------------------

.. code-block:: python

    model.hier_dist_match_cond(
        assign_metric='jsd',
        weighted_merge=True,
        max_iters=100,
        tol=1e-4,
    )

**Parameters**

- **assign_metric**: Default: `'jsd'`. string, metric used to evaluate distribution similarity between niches.
- **weighted_merge**: Default: `True`. bool, whether to use weighted JSD (WJSD) during merging phase.
- **max_iters**: Default: `100`. int, maximum number of iterations for convergence.
- **tol**: Default: `1e-4`. float, tolerance for convergence.

Select the solution (case group)
--------------------------------

.. code-block:: python

    cond_list, cond_concat = model.select_solution_cond(
        n_csn=None,
        niche_key='niche_label',
        csn_key='csn_label',
        auto=True,
        metric='jsd_v2',
        threshold=0.1,
        return_adata=True,
        plot=True,
        save=False,
        fig_size=(10, 6),
        save_dir=None,
        file_name='score_vs_nichecount_cond.pdf',
    )

**Parameters**

- **n_csn**: Default: `None`. int or `None`, number of CSNs to select. If `None`, solution is selected automatically using `metric`.
- **niche_key**: Default: `'niche_label'`. string, key in `.obs` to store niche assignment results.
- **csn_key**: Default: `'csn_label'`. string, key in `.obs` to store CSN assignment results.
- **auto**: Default: `True`. bool, whether to automatically determine the solution if `n_csn=None`.
- **metric**: Default: `'jsd_v2'`. string, metric used for solution selection. Supported options are:

  - ``'jsd'``: minimum pairwise Jensen-Shannon divergence between niches.
  - ``'wjsd'``: weighted minimum pairwise Jensen-Shannon divergence between niches.
  - ``'jsd_v2'``: bootstrap-based minimum Jensen-Shannon divergence with confidence intervals.

- **threshold**: Default: `0.1`.
  float, threshold for selecting solution based on `metric`.
- **return_adata**: Default: `True`. bool, whether to return an `anndata` object with CSN assignments.
- **plot**: Default: `True`. bool, whether to plot the minJSD curve.
- **save**: Default: `False`. bool, whether to save the minJSD plot.
- **fig_size**: Default: `(10, 6)`. tuple, figure size for plotting.
- **save_dir**: Default: `None`. string or `None`, directory to save the plot.
- **file_name**: Default: `'score_vs_nichecount_cond.pdf'`. string, name of the saved plot file.

Label transfer
--------------

.. code-block:: python

    trans_list, trans_concat = model.label_transfer(
        assign_metric='jsd',
        niche_key='niche_label',
        return_adata=True,
    )

**Parameters**

- **assign_metric**: Default: `'jsd'`. string, metric used to evaluate distribution similarity when assigning cells to niches.
- **niche_key**: Default: `'niche_label'`. string, key in `.obs` to store niche assignment results.
- **return_adata**: Default: `True`. bool, whether to return an `anndata` object with niche assignments.

Cell type enrichment test
-------------------------

.. code-block:: python

    ct_results = ct_enrichment_test(
        niche_dist,
        cell_count_niche,
        idx2ct_dict,
        niche_summary,
        method='fisher',
        alpha=0.05,
        fdr_method='fdr_by',
        log2fc_threshold=1,
        prop_threshold=0.01,
        verbose=True,
        eps=1e-10,
    )

**Parameters**

- **niche_dist**: array-like or sparse matrix, shape `(n_niche, n_celltype)`, representing the proportion of each cell type in each niche.
- **cell_count_niche**: array-like, shape `(n_niche,)`, number of cells in each niche.
- **idx2ct_dict**: dict, mapping from cell type indices to cell type names.
- **niche_summary**: list of strings, names or labels for each niche.
- **method**: Default: `'fisher'`. string, statistical test method. Options:

  - ``'fisher'``: two-sided Fisher's exact test.
  - ``'fisher_greater'``: one-sided Fisher's exact test (greater).
  - ``'chi2'``: chi-square test.

- **alpha**: Default: `0.05`.
  float, significance level for multiple testing correction.
- **fdr_method**: Default: `'fdr_by'`. string, method for false discovery rate correction.
- **log2fc_threshold**: Default: `1`. float, minimum log2 fold-change required for enrichment.
- **prop_threshold**: Default: `0.01`. float, minimum proportion of cell type in niche required for enrichment.
- **verbose**: Default: `True`. bool, whether to print progress messages.
- **eps**: Default: `1e-10`. float, small value to avoid division by zero.

**Returns**

- **ct_results**: `pandas.DataFrame` containing enrichment results with columns:

  - ``niche_idx``: index of the niche
  - ``niche``: name of the niche
  - ``celltype_idx``: index of the cell type
  - ``celltype``: name of the cell type
  - ``oddsratio`` or ``chi2_stat``: test statistic
  - ``p-value``: raw p-value
  - ``q-value``: FDR-corrected p-value
  - ``log2fc``: log2 fold-change
  - ``prop``: proportion of cell type in niche
  - ``enrichment``: bool, whether cell type is significantly enriched.

Cell-cell interaction enrichment test
-------------------------------------

.. code-block:: python

    cci_results, test_norm_list, bg_norm_list, test_edge_count_list, bg_edge_count_list = cci_enrichment_test(
        adata_list,
        niche_key,
        ct_key,
        niche_summary=None,
        spatial_key='spatial',
        cut_percentage=99,
        method='fisher',
        alpha=0.05,
        fdr_method='fdr_by',
        log2fc_threshold=1,
        prop_threshold=0.01,
        verbose=True,
        eps=1e-10,
    )

**Parameters**

- **adata_list**: `anndata` object or list of `anndata` objects, input datasets to test CCI enrichment.
- **niche_key**: string, key in `.obs` representing niche labels.
- **ct_key**: string, key in `.obs` representing cell type labels.
- **niche_summary**: Default: `None`. list of niche names to test; if `None`, all unique niches in `adata_list` are used.
- **spatial_key**: Default: `'spatial'`. string, key in `.obsm` representing spatial coordinates.
- **cut_percentage**: Default: `99`.
  float, percentage of shortest edges to retain in Delaunay adjacency graph.
- **method**: Default: `'fisher'`. string, statistical test method. Options:

  - ``'fisher'``: two-sided Fisher's exact test.
  - ``'fisher_greater'``: one-sided Fisher's exact test (greater).

- **alpha**: Default: `0.05`. float, significance level for multiple testing correction.
- **fdr_method**: Default: `'fdr_by'`. string, method for false discovery rate correction.
- **log2fc_threshold**: Default: `1`. float, minimum log2 fold-change required for enrichment.
- **prop_threshold**: Default: `0.01`. float, minimum proportion of cell type pairs in niche required for enrichment.
- **verbose**: Default: `True`. bool, whether to print progress messages.
- **eps**: Default: `1e-10`. float, small value to avoid division by zero.

**Returns**

- **cci_results**: `pandas.DataFrame` containing CCI enrichment results with columns:

  - ``niche_idx``: index of the niche
  - ``niche``: name of the niche
  - ``ct1_idx``, ``ct2_idx``: indices of interacting cell types
  - ``ct1``, ``ct2``: names of interacting cell types
  - ``test_edge_count``, ``bg_edge_count``: number of observed and background edges
  - ``test_edge_prop``, ``bg_edge_prop``: proportion of observed and background edges
  - ``oddsratio``: odds ratio from statistical test
  - ``p-value``: raw p-value
  - ``q-value``: FDR-corrected p-value
  - ``log2fc``: log2 fold-change
  - ``enrichment``: bool, whether interaction is significantly enriched

- **test_norm_list**: list of normalized test adjacency matrices for each niche.
- **bg_norm_list**: list of normalized background adjacency matrices for each niche.
- **test_edge_count_list**: list of total test edges per niche.
- **bg_edge_count_list**: list of total background edges per niche.

Niche-niche colocalization enrichment test
------------------------------------------

.. code-block:: python

    df_results, edge_prop_mtx, n1_count = nnc_enrichment_test(
        adata_list,
        niche_key,
        niche_summary=None,
        spatial_key='spatial',
        cut_percentage=99,
        method='fisher',
        alpha=0.05,
        fdr_method='fdr_by',
        log2fc_threshold=1,
        prop_threshold=0.01,
        verbose=True,
        eps=1e-10,
    )

**Parameters**

- **adata_list**: `anndata` object or list of `anndata` objects, input datasets for NNC enrichment testing.
- **niche_key**: string, key in `.obs` representing niche labels.
- **niche_summary**: Default: `None`. list of niche names to test; if `None`, all unique niches in `adata_list` are used.
- **spatial_key**: Default: `'spatial'`. string, key in `.obsm` representing spatial coordinates.
- **cut_percentage**: Default: `99`. float, percentage of shortest edges to retain in Delaunay adjacency graph.
- **method**: Default: `'fisher'`. string, statistical test method. Options:

  - ``'fisher'``: two-sided Fisher's exact test.
  - ``'fisher_greater'``: one-sided Fisher's exact test (greater).
  - ``'chi2'``: chi-square test with continuity correction.

- **alpha**: Default: `0.05`. float, significance level for multiple testing correction.
- **fdr_method**: Default: `'fdr_by'`. string, method for false discovery rate correction.
- **log2fc_threshold**: Default: `1`. float, minimum log2 fold-change required for enrichment.
- **prop_threshold**: Default: `0.01`. float, minimum proportion of edges between niches required for enrichment.
- **verbose**: Default: `True`. bool, whether to print progress messages.
- **eps**: Default: `1e-10`. float, small value to avoid division by zero.

**Returns**

- **df_results**: `pandas.DataFrame` containing NNC enrichment results with columns:

  - ``niche1_idx``, ``niche2_idx``: indices of interacting niches. Niche 1 is source niche and niche 2 is target niche.
  - ``niche1``, ``niche2``: names of interacting niches. Niche 1 is source niche and niche 2 is target niche.
  - ``edge_count``: number of edges observed between niche pairs
  - ``edge_prop``: proportion of edges between niche pairs for the source niche
  - ``oddsratio`` or ``chi2_stat``: statistic from the test
  - ``p-value``: raw p-value
  - ``q-value``: FDR-corrected p-value
  - ``log2fc``: log2 fold-change
  - ``enrichment``: bool, whether the interaction is significantly enriched

- **edge_prop_mtx**: numpy array, normalized edge proportions between all niche pairs.
- **n1_count**: numpy array, total outgoing edges for each niche.

Niche-niche colocalization matrix
---------------------------------

.. code-block:: python

    edge_prop_mtx, n1_count = cal_nnc_mtx(
        adata,
        niche_key,
        niche_summary=None,
        adj_mtx_key=None,
        spatial_key=None,
        cut_percentage=99,
        reserve_nonexist=False,
    )

**Parameters**

- **adata**: `anndata` object, input dataset used to compute the niche-niche colocalization matrix.
- **niche_key**: string, key in `.obs` representing niche labels.
- **niche_summary**: Default: `None`. list of niche names or labels. If `None`, all unique niche labels in `adata.obs[niche_key]` are used.
- **adj_mtx_key**: Default: `None`. string or `None`, key in `.obsp` storing a precomputed adjacency matrix. If provided, this adjacency matrix is used directly.
- **spatial_key**: Default: `None`. string or `None`, key in `.obsm` storing spatial coordinates. If `adj_mtx_key` is `None`, a Delaunay adjacency graph is constructed from these coordinates.
- **cut_percentage**: Default: `99`. float, percentage threshold for retaining Delaunay edges when constructing the adjacency graph from spatial coordinates. Only used when `spatial_key` is provided and `adj_mtx_key` is `None`.
- **reserve_nonexist**: Default: `False`. bool, whether to retain niches with zero outgoing niche-niche edges by setting their total edge count to 1 before normalization, thereby avoiding division by zero.
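
The effect of `reserve_nonexist` can be illustrated with a minimal pure-Python sketch of the row normalization. It assumes a niche-by-niche edge count matrix is already available. `row_normalize` and the toy counts are hypothetical names used only for illustration and are not part of the package. Leaving rows without outgoing edges as NaN when `reserve_nonexist=False` is also an assumption, not confirmed behavior of `cal_nnc_mtx`.

.. code-block:: python

    import math

    def row_normalize(edge_counts, reserve_nonexist=False):
        """Row-normalize a niche-by-niche edge count matrix, ignoring self-niche edges."""
        n = len(edge_counts)
        # Drop self-niche edges (the diagonal) before counting outgoing edges.
        off_diag = [[0 if i == j else edge_counts[i][j] for j in range(n)]
                    for i in range(n)]
        n1_count = [sum(row) for row in off_diag]
        edge_prop = []
        for row, total in zip(off_diag, n1_count):
            if total == 0:
                if reserve_nonexist:
                    total = 1  # the row stays all zeros instead of dividing by zero
                else:
                    # Assumption: rows without outgoing edges are left undefined.
                    edge_prop.append([math.nan] * n)
                    continue
            edge_prop.append([c / total for c in row])
        return edge_prop, n1_count

    counts = [
        [5, 3, 1],
        [2, 7, 0],
        [0, 0, 4],  # this niche only has self-niche edges
    ]
    prop_keep, totals = row_normalize(counts, reserve_nonexist=True)
    prop_nan, _ = row_normalize(counts, reserve_nonexist=False)

In this toy input the third niche has no edges to other niches, so its row is all zeros with `reserve_nonexist=True` and undefined otherwise.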

**Returns**

- **edge_prop_mtx**: `numpy.ndarray`, shape `(n_niche, n_niche)`, row-normalized niche-niche colocalization matrix. Each entry represents the proportion of outgoing edges from one niche to another niche, excluding self-niche edges.
- **n1_count**: `numpy.ndarray`, shape `(n_niche,)`, total number of outgoing edges from each niche to other niches before normalization.

Niche-niche colocalization patterns differential test between groups
--------------------------------------------------------------------

.. code-block:: python

    df_nnc_between_groups = nnc_between_groups_test(
        g1_list,
        g2_list,
        niche_labels,
        min_valid=3,
        alpha=0.05,
        alternative="two-sided",
        fdr_method="fdr_by",
    )

**Parameters**

- **g1_list**: array-like, shape `(n_sample1, n_niche, n_niche)`, collection of niche-niche colocalization matrices from group 1.
- **g2_list**: array-like, shape `(n_sample2, n_niche, n_niche)`, collection of niche-niche colocalization matrices from group 2.
- **niche_labels**: list of strings, niche names or labels corresponding to the rows and columns of the colocalization matrices.
- **min_valid**: Default: `3`. int, minimum number of valid samples required in each group for testing a niche-niche pair. If either group has fewer than `min_valid` non-NaN values for a given pair, the corresponding test is not performed.
- **alpha**: Default: `0.05`. float, significance level for multiple testing correction.
- **alternative**: Default: `'two-sided'`. string, alternative hypothesis for the Mann-Whitney U test. Options typically include:

  - ``'two-sided'``: tests whether the two groups differ.
  - ``'greater'``: tests whether values in group 1 tend to be greater than those in group 2.
  - ``'less'``: tests whether values in group 1 tend to be less than those in group 2.

- **fdr_method**: Default: `'fdr_by'`. string, method for false discovery rate correction.
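
The `min_valid` rule described above can be sketched for inputs laid out as `(n_sample, n_niche, n_niche)`. This is a pure-Python illustration that assumes missing colocalization values are encoded as NaN. `count_valid` and `testable_pairs` are hypothetical helpers, not functions of the package, and skipping self-niche pairs is likewise an assumption.

.. code-block:: python

    import math

    def count_valid(group, i, j):
        """Number of samples in `group` with a non-NaN value for niche pair (i, j)."""
        return sum(1 for mtx in group if not math.isnan(mtx[i][j]))

    def testable_pairs(g1_list, g2_list, min_valid=3):
        """Niche pairs (i, j) with at least `min_valid` valid samples in both groups."""
        n_niche = len(g1_list[0])
        pairs = []
        for i in range(n_niche):
            for j in range(n_niche):
                if i == j:
                    continue  # assumption: self-niche pairs are not compared
                if (count_valid(g1_list, i, j) >= min_valid
                        and count_valid(g2_list, i, j) >= min_valid):
                    pairs.append((i, j))
        return pairs

    nan = math.nan
    # Three samples per group and two niches; each matrix is row-normalized.
    g1 = [
        [[0.0, 1.0], [1.0, 0.0]],
        [[0.0, 1.0], [1.0, 0.0]],
        [[0.0, 1.0], [nan, nan]],  # niche 1 has no outgoing edges in this sample
    ]
    g2 = [[[0.0, 1.0], [1.0, 0.0]]] * 3

    pairs = testable_pairs(g1, g2, min_valid=3)

Here only the pair `(0, 1)` has three valid samples in both groups; the pair `(1, 0)` has two valid samples in group 1 and would be skipped.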

**Returns**

- **df_nnc_between_groups**: `pandas.DataFrame` containing niche-niche colocalization comparison results with columns:

  - ``niche1``: source niche
  - ``niche2``: target niche
  - ``mean1``: mean colocalization value in group 1
  - ``mean2``: mean colocalization value in group 2
  - ``delta_mean``: difference in mean colocalization value between group 1 and group 2
  - ``n1_valid``: number of valid samples in group 1
  - ``n2_valid``: number of valid samples in group 2
  - ``p_value``: raw p-value from the Mann-Whitney U test
  - ``q_value``: FDR-corrected p-value
  - ``rejected``: bool, whether the comparison is significant after multiple testing correction.
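
The relation between `p_value`, `q_value` and `rejected` can be made concrete with a small worked example. The sketch below assumes that `'fdr_by'` denotes the Benjamini-Yekutieli procedure (the name matches the `statsmodels` convention). `benjamini_yekutieli` is a hypothetical pure-Python helper written for illustration, not the package's implementation.

.. code-block:: python

    def benjamini_yekutieli(pvals, alpha=0.05):
        """Return (rejected, qvals) under the Benjamini-Yekutieli FDR procedure."""
        m = len(pvals)
        # Harmonic correction factor c(m) = sum_{k=1..m} 1/k.
        c_m = sum(1.0 / k for k in range(1, m + 1))
        order = sorted(range(m), key=lambda i: pvals[i])
        qvals = [0.0] * m
        running_min = 1.0
        # Walk from the largest p-value down so adjusted values stay monotone.
        for rank in range(m, 0, -1):
            idx = order[rank - 1]
            adjusted = pvals[idx] * m * c_m / rank
            running_min = min(running_min, adjusted)
            qvals[idx] = running_min
        rejected = [q <= alpha for q in qvals]
        return rejected, qvals

    rejected, qvals = benjamini_yekutieli([0.001, 0.04, 0.03, 0.5], alpha=0.05)

With four tests, only the smallest p-value remains significant at `alpha=0.05` in this example, since the correction multiplies each p-value by `m * c(m) / rank` before enforcing monotonicity.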