Tutorial for MIBI-TOF TNBC dataset

Need additional packages: scanpy seaborn plotly lifelines statannotations

Load the packages

[ ]:
%reload_ext autoreload
%autoreload 2

import os
import time
import scanpy as sc
import pandas as pd
import numpy as np
import anndata as ad
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.lines import Line2D

from Harmonics import *

import warnings
warnings.filterwarnings("ignore")

sc.settings.verbosity = 0
sc.settings.set_figure_params(dpi=50, dpi_save=500)

from matplotlib import rcParams
rcParams["figure.dpi"] = 50
rcParams["savefig.dpi"] = 500
rcParams['pdf.fonttype'] = 42
rcParams['svg.fonttype'] = 'none'
rcParams['ps.fonttype'] = 42
# rcParams['font.family'] = 'Arial'
rcParams['savefig.transparent'] = True
[2]:
data_dir = '../../../Data/Spatial/Proteomics/MIBI-TOF_TNBC_Keren2018/'
save_dir = '../../results/MIBI-TOF_TNBC_Keren2018/Harmonics/'
if not os.path.exists(save_dir):
    os.makedirs(save_dir)

Define the function to change p values to corresponding star representation, used to show the results of additional tests implemented in Harmonics

[3]:
def p2stars(p):
    if p < 0.001:
        return '***'
    elif p < 0.01:
        return '**'
    elif p < 0.05:
        return '*'
    else:
        return ''

Load dataset

Cells with unidentitied type are filtered out.

Use the cold group as the control group and use the mix and compartmentalized group as the case group

[4]:
cold_group = [15, 19, 22, 24, 25, 26]  # 6
mixed_group = [1, 2, 7, 8, 11, 12, 13, 14, 17, 18, 20, 21, 23, 27, 29, 31, 33, 38, 39]  # 19
compartmentalized_group = [3, 4, 5, 6, 9, 10, 16, 28, 32, 34, 35, 36, 37, 40, 41]  #15

adata_list = []
slice_name_list = []
cond_list = []
cond_name_list = []
for i in range(1, 42):
    if i == 30: continue
    adata = ad.read_h5ad(data_dir + f'TNBC_p{i}.h5ad')
    adata = adata[adata.obs['all_group_name'] != 'Unidentified', :].copy()
    if i in cold_group:
        adata_list.append(adata)
        slice_name_list.append(f'p{i}')
    else:
        cond_list.append(adata)
        cond_name_list.append(f'p{i}')

Run model

Instantiate Harmonics

[5]:
model = Harmonics_Model(adata_list,
                        slice_name_list,
                        cond_list=cond_list,
                        cond_name_list=cond_name_list,
                        concat_label='slice_name',  # default
                        proportion_label=None,  # default
                        seed=1234,  # default
                        parallel=True,  # default
                        verbose=True,  # default
                        )
Control set comprises 6 slices, 22748 cells/spots in total.
Condition set comprises 34 slices, 173205 cells/spots in total.

Preprocess the data (Generating the connection graph and calculating neighborhood cell type destribution for cells)

[6]:
model.preprocess(ct_key='all_group_name',
                 spatial_key='spatial',  # default
                 method='joint',  # default
                 n_step=3,  # default
                 n_neighbors=20,  # default
                 cut_percentage=99,  # default
                 )
Generating Delaunay neighbor graph...
100%|██████████| 40/40 [00:01<00:00, 29.43it/s]
All done!

Performing graph completion...
100%|██████████| 40/40 [00:11<00:00,  3.44it/s]
All done!

The cell types of interest are:
B
CD3 T
CD4 T
CD8 T
DC
DC/Mono
Endothelial
Keratin+ tumor
Macrophages
Mesenchymal like
Mono/Neu
NK
Neutrophils
Other immune
Tregs
Tumor

Generating one-hot matrix...
100%|██████████| 40/40 [00:00<00:00, 282.63it/s]
All done!

Dataset comprises 16 cell types.

Calculating cell type distribution for microenvironments...
Microenvironments comprise 40.06 cells/spots on average.
Minimum: 20, Maximum: 81

Perform overclustered initialization on the cell type distributions of cell neighborhoods for the control group. Resulting in Qmax niches. The distributions of niches are also computed.

[7]:
model.initialize_clusters(dim_reduction=True,  # default
                          explained_var=None,  # default
                          n_components=None,  # default
                          n_components_max=100,  # default
                          standardize=True,  # default
                          method='kmeans',  # default
                          Qmax=20,
                          )
Performing dimension reduction...
Returning 16 principal components.

Initializing niches...
20 initial niches defined.

Perform hierarchical distribution matching for the control group to reduce the niche number to no less than Qmin. This step results in niche assignment under a sequence of different niche numbers (usually from Qmax to Qmin).

[ ]:
model.hier_dist_match(assign_metric='jsd',  # default
                      weighted_merge=True,  # default
                      max_iters=100,  # default
                      tol=1e-4,  # default
                      test_kmeans=False,  # default
                      Qmin=2,  # default
                      )
Starting from 20 cell niches...

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
100%|██████████| 100/100 [00:02<00:00, 40.75it/s]
Unconverged at iteration 100!
13 cell niches left.
Merging cell niche 1 and cell niche 4...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 3, 5, 8, 9, 12, 13, 15, 16, 17, 19]
100%|██████████| 100/100 [00:01<00:00, 60.75it/s]
Unconverged at iteration 100!
11 cell niches left.
Merging cell niche 1 and cell niche 3...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 5, 8, 9, 12, 13, 15, 17, 19]
100%|██████████| 100/100 [00:01<00:00, 62.57it/s]
Unconverged at iteration 100!
10 cell niches left.
Merging cell niche 8 and cell niche 1...
Done!

Assigning cells to cell niche...
Current state: [0, 5, 8, 9, 12, 13, 15, 17, 19]
100%|██████████| 100/100 [00:01<00:00, 64.82it/s]
Unconverged at iteration 100!
9 cell niches left.
Merging cell niche 12 and cell niche 8...
Done!

Assigning cells to cell niche...
Current state: [0, 5, 9, 12, 13, 15, 17, 19]
 38%|███▊      | 38/100 [00:00<00:00, 63.01it/s]
Distribution of cell niches (centers) converge at iteration 39.
8 cell niches left.
Merging cell niche 12 and cell niche 17...
Done!

Assigning cells to cell niche...
Current state: [0, 5, 9, 12, 13, 15, 19]
 52%|█████▏    | 52/100 [00:00<00:00, 65.62it/s]
Distribution of cell niches (centers) converge at iteration 53.
7 cell niches left.
Merging cell niche 15 and cell niche 12...
Done!

Assigning cells to cell niche...
Current state: [0, 5, 9, 13, 15, 19]
100%|██████████| 100/100 [00:01<00:00, 65.35it/s]
Unconverged at iteration 100!
6 cell niches left.
Merging cell niche 15 and cell niche 5...
Done!

Assigning cells to cell niche...
Current state: [0, 9, 13, 15, 19]
 41%|████      | 41/100 [00:00<00:00, 67.87it/s]
Strictly converge at iteration 42.
5 cell niches left.
Merging cell niche 15 and cell niche 19...
Done!

Assigning cells to cell niche...
Current state: [0, 9, 13, 15]
  8%|▊         | 8/100 [00:00<00:01, 61.30it/s]
Distribution of cell niches (centers) converge at iteration 9.
4 cell niches left.
Merging cell niche 13 and cell niche 9...
Done!

Assigning cells to cell niche...
Current state: [0, 13, 15]
  7%|▋         | 7/100 [00:00<00:01, 69.65it/s]
Distribution of cell niches (centers) converge at iteration 12.
 11%|█         | 11/100 [00:00<00:01, 63.21it/s]

3 cell niches left.
Merging cell niche 15 and cell niche 0...
Done!

Assigning cells to cell niche...
Current state: [13, 15]
 20%|██        | 20/100 [00:00<00:01, 68.37it/s]
Distribution of cell niches (centers) converge at iteration 21.
2 cell niches left.
Niche count no more than 2.

Finished!


Automatically define the most appropriate number of basic cell niches based on minJSD score for the control group. The results are saved in .obs[niche_key]

[9]:
adata_list, adata_concat = model.select_solution(n_niche=None,  # default
                                                 niche_key='niche_label',  # default
                                                 auto=True,  # default
                                                 metric='jsd_v2',  # default
                                                 threshold=0.1,  # default
                                                 return_adata=True,  # default
                                                 plot=True,  # default
                                                 save=False,  # default
                                                 fig_size=None,  # default
                                                 save_dir=save_dir,
                                                 file_name=f'score_vs_nichecount_basic.pdf',
                                                 )
Automatically selecting best solution...
100%|██████████| 100/100 [00:00<00:00, 348.97it/s]
100%|██████████| 100/100 [00:00<00:00, 404.82it/s]
100%|██████████| 100/100 [00:00<00:00, 379.46it/s]
100%|██████████| 100/100 [00:00<00:00, 494.99it/s]
100%|██████████| 100/100 [00:00<00:00, 591.65it/s]
100%|██████████| 100/100 [00:00<00:00, 621.04it/s]
100%|██████████| 100/100 [00:00<00:00, 706.65it/s]
100%|██████████| 100/100 [00:00<00:00, 666.59it/s]
100%|██████████| 100/100 [00:00<00:00, 865.71it/s]
100%|██████████| 100/100 [00:00<00:00, 985.12it/s]
100%|██████████| 100/100 [00:00<00:00, 1069.42it/s]
Suggested range of niche count is from 5 to 6.
Recommended number of niches are [6]
Selecting 6 niches as the best solution.
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_18_3.png
Done!

Perform overclustered initialization on the cell type distributions of cell neighborhoods for the case group. Resulting in Rmax new niches. The distributions of new niches are also computed.

[10]:
model.initialize_clusters_cond(assign_metric='jsd',  # default
                               threshold=0.1,  # default
                               min_cell_per_niche=100,  # default
                               dim_reduction=True,  # default
                               explained_var=None,  # default
                               n_components=None,  # default
                               n_components_max=100,  # default
                               standardize=True,  # default
                               method='kmeans',  # default
                               Rmax=10,  # default
                               )
Assigning cells to fixed niches...
67789 out of 173205 cells are assigned to fixed niches.

Performing dimension reduction...
Returning 16 principal components.

Initializing niches...
10 new niches defined.

Perform hierarchical distribution matching for the case group to reduce the niche number to 0. This step results in niche assignment under a sequence of different niche numbers (usually from Rmax to 0).

[11]:
model.hier_dist_match_cond(assign_metric='jsd',  # default
                           weighted_merge=True,  # default
                           max_iters=100,  # default
                           tol=1e-4,  # default
                           )
Starting from 10 new cell niches...

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
 35%|███▌      | 35/100 [00:05<00:09,  6.78it/s]
Distribution of cell niches (centers) converge at iteration 36.
10 new cell niches left.
Merging new cell niche 14 and new cell niche 8...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15]
 94%|█████████▍| 94/100 [00:14<00:00,  6.68it/s]
Distribution of cell niches (centers) converge at iteration 95.
9 new cell niches left.
Merging new cell niche 14 and new cell niche 6...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15]
 73%|███████▎  | 73/100 [00:11<00:04,  6.55it/s]
Distribution of cell niches (centers) converge at iteration 74.
8 new cell niches left.
Merging new cell niche 14 and new cell niche 7...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 14, 15]
 19%|█▉        | 19/100 [00:02<00:12,  6.53it/s]
Distribution of cell niches (centers) converge at iteration 20.
7 new cell niches left.
Merging new cell niche 12 and new cell niche 14...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13, 15]
 14%|█▍        | 14/100 [00:02<00:13,  6.59it/s]
Distribution of cell niches (centers) converge at iteration 15.
6 new cell niches left.
Merging new cell niche 12 and new cell niche 15...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 9, 10, 11, 12, 13]
 56%|█████▌    | 56/100 [00:07<00:06,  7.23it/s]
Distribution of cell niches (centers) converge at iteration 57.
5 new cell niches left.
Merging new cell niche 12 and new cell niche 9...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 10, 11, 12, 13]
 10%|█         | 10/100 [00:01<00:13,  6.78it/s]
Distribution of cell niches (centers) converge at iteration 11.
4 new cell niches left.
Merging new cell niche 12 and new cell niche 13...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 10, 11, 12]
 21%|██        | 21/100 [00:02<00:10,  7.44it/s]
Distribution of cell niches (centers) converge at iteration 22.
3 new cell niches left.
Merging new cell niche 11 and new cell niche 12...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 10, 11]
 18%|█▊        | 18/100 [00:02<00:10,  7.72it/s]
Distribution of cell niches (centers) converge at iteration 19.
2 new cell niches left.
Merging new cell niche 11 into basic cell niche 1...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 10]
  8%|▊         | 8/100 [00:00<00:11,  8.08it/s]
Distribution of cell niches (centers) converge at iteration 9.
1 new cell niches left.
Merging new cell niche 10 into basic cell niche 1...
Done!

Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5]
No new cell niche, all cells assigned to basic niches.
0 new cell niches left.
No new cell niche left.

Finished!

Automatically define the most appropriate number of condition-specific niches based on minJSD score for the case group. The results of niche assignments are saved in .obs[niche_key] and .obs[csn_label]. All basic cell niches are named “basic” in .obs[csn_label] and condition-specific niches start with a prefix “R”.

[12]:
cond_list, cond_concat = model.select_solution_cond(n_csn=None,  # default
                                                    niche_key='niche_label',  # default
                                                    csn_key='csn_label',  # default
                                                    auto=True,  # default
                                                    metric='jsd_v2',  # default
                                                    threshold=0.1,  # default
                                                    return_adata=True,  # default
                                                    plot=True,  # default
                                                    save=False,  # default
                                                    fig_size=None,  # default
                                                    save_dir=save_dir,
                                                    file_name='score_vs_nichecount_cond.pdf',
                                                    )
Automatically selecting best solution...
100%|██████████| 100/100 [00:00<00:00, 231.25it/s]
100%|██████████| 100/100 [00:00<00:00, 232.76it/s]
100%|██████████| 100/100 [00:00<00:00, 261.93it/s]
100%|██████████| 100/100 [00:00<00:00, 247.27it/s]
100%|██████████| 100/100 [00:00<00:00, 249.47it/s]
100%|██████████| 100/100 [00:00<00:00, 266.48it/s]
100%|██████████| 100/100 [00:00<00:00, 271.80it/s]
100%|██████████| 100/100 [00:00<00:00, 352.75it/s]
100%|██████████| 100/100 [00:00<00:00, 382.92it/s]
100%|██████████| 100/100 [00:00<00:00, 573.89it/s]
Suggested range of condition specific niche count is from 4 to 5.
Recommended number of condition specific niches are [5]
Selecting 5 new niches as the best solution.
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_24_3.png
Done!

Save and reload the results

[ ]:
adata_concat.uns['ct_name'] = list(adata_concat.uns['ct2idx'].keys())
adata_concat.uns['ct_idx'] = list(adata_concat.uns['ct2idx'].values())
del adata_concat.uns['ct2idx'], adata_concat.uns['idx2ct']
adata_concat.write_h5ad(save_dir + 'Harmonics_basic_result_0.h5ad')

cond_concat.uns['ct_name'] = list(cond_concat.uns['ct2idx'].keys())
cond_concat.uns['ct_idx'] = list(cond_concat.uns['ct2idx'].values())
del cond_concat.uns['ct2idx'], cond_concat.uns['idx2ct']
cond_concat.write_h5ad(save_dir + 'Harmonics_cond_result_0.h5ad')
[14]:
adata_concat = ad.read_h5ad(save_dir + 'Harmonics_basic_result_0.h5ad')
adata_concat.uns['ct2idx'] = dict(zip(adata_concat.uns['ct_name'], adata_concat.uns['ct_idx']))
adata_concat.uns['idx2ct'] = dict(zip(adata_concat.uns['ct_idx'], adata_concat.uns['ct_name']))
cond_concat = ad.read_h5ad(save_dir + 'Harmonics_cond_result_0.h5ad')
cond_concat.uns['ct2idx'] = dict(zip(cond_concat.uns['ct_name'], cond_concat.uns['ct_idx']))
cond_concat.uns['idx2ct'] = dict(zip(cond_concat.uns['ct_idx'], cond_concat.uns['ct_name']))

for i, slice_name in enumerate(slice_name_list):
    adata = adata_concat[adata_concat.obs['slice_name'] == slice_name, :].copy()
    adata_list[i] = adata
for i, slice_name in enumerate(cond_name_list):
    adata = cond_concat[cond_concat.obs['slice_name'] == slice_name, :].copy()
    cond_list[i] = adata
[15]:
adata_concat_new = adata_concat.copy()
cond_concat_new = cond_concat.copy()
adata_concat_new, cond_concat_new
[15]:
(AnnData object with n_obs × n_vars = 22748 × 36
     obs: 'SampleID', 'cellLabelInImage', 'cellSize', 'tumorYN', 'group_name', 'immuneGroup_name', 'all_group_name', 'sample_types', 'slice_name', 'celltype_idx', 'n_neighbors', 'niche_label_13', 'niche_label_11', 'niche_label_10', 'niche_label_9', 'niche_label_8', 'niche_label_7', 'niche_label_6', 'niche_label_5', 'niche_label_4', 'niche_label_3', 'niche_label_2', 'niche_label_jsd_v2', 'niche_label_jsd', 'niche_label_fmi', 'niche_label_ari', 'niche_label_nmi', 'niche_label_asw', 'niche_label_js_asw', 'niche_label_fisher', 'niche_label_chi', 'niche_label_dbi', 'niche_label_dass_mean', 'niche_label_dass_min', 'niche_label_dafisher', 'niche_label_dachi', 'niche_label_0.09', 'niche_label_0.11', 'niche_label'
     uns: 'ct_idx', 'ct_name', 'niche_cell_count', 'niche_dist', 'niche_label_summary', 'score_dict', 'ct2idx', 'idx2ct'
     obsm: 'micro_dist', 'onehot', 'spatial',
 AnnData object with n_obs × n_vars = 173205 × 36
     obs: 'SampleID', 'cellLabelInImage', 'cellSize', 'tumorYN', 'group_name', 'immuneGroup_name', 'all_group_name', 'sample_types', 'slice_name', 'celltype_idx', 'n_neighbors', 'niche_label_10', 'csn_label_10', 'niche_label_9', 'csn_label_9', 'niche_label_8', 'csn_label_8', 'niche_label_7', 'csn_label_7', 'niche_label_6', 'csn_label_6', 'niche_label_5', 'csn_label_5', 'niche_label_4', 'csn_label_4', 'niche_label_3', 'csn_label_3', 'niche_label_2', 'csn_label_2', 'niche_label_1', 'csn_label_1', 'niche_label_0', 'csn_label_0', 'niche_label_jsd_v2', 'csn_label_jsd_v2', 'niche_label_jsd', 'csn_label_jsd', 'niche_label_fmi', 'csn_label_fmi', 'niche_label_ari', 'csn_label_ari', 'niche_label_nmi', 'csn_label_nmi', 'niche_label_asw', 'csn_label_asw', 'niche_label_js_asw', 'csn_label_js_asw', 'niche_label_fisher', 'csn_label_fisher', 'niche_label_chi', 'csn_label_chi', 'niche_label_dbi', 'csn_label_dbi', 'niche_label_dass_mean', 'csn_label_dass_mean', 'niche_label_dass_min', 'csn_label_dass_min', 'niche_label_dafisher', 'csn_label_dafisher', 'niche_label_dachi', 'csn_label_dachi', 'niche_label_0.09', 'csn_label_0.09', 'niche_label_0.11', 'csn_label_0.11', 'niche_label', 'csn_label'
     uns: 'ct_idx', 'ct_name', 'niche_cell_count', 'niche_dist', 'niche_label_summary', 'score_dict', 'ct2idx', 'idx2ct'
     obsm: 'micro_dist', 'onehot', 'spatial')

Plot the results

[16]:
tumor_color_dict = {f'{i}': sns.color_palette()[i] for i in range(2)}

niches = cond_concat_new.uns['niche_label_summary']
niche_colors = ['#1f77b4', '#ff7f0e', '#279e68', '#d62728', '#ffbb78', '#8c564b',
                '#aa40fc', '#b5bd61', '#aec7e8',  '#17becf', '#e377c2']
niche_color_dict = {niches[k]: niche_colors[k] for k in range(len(niches))}

n_basic_niches = len(adata_concat_new.uns['niche_label_summary'])
csns = [f'R{int(label)-n_basic_niches}' for label in niches[n_basic_niches:]]
csn_color_dict = {csns[k]: niche_colors[k+n_basic_niches] for k in range(len(csns))}
csn_color_dict['basic'] = '#d3d3d3'

celltypes = ['B', 'CD3 T', 'CD4 T', 'CD8 T', 'DC', 'DC/Mono', 'Endothelial', 'Keratin+ tumor', 'Macrophages', 'Mesenchymal like',
             'Mono/Neu', 'NK', 'Neutrophils', 'Other immune', 'Tregs', 'Tumor']#, 'Unidentified']
ct_colors = ['#1f77b4', '#ff7f0e', '#d62728', '#279e68', '#aa40fc', '#8c564b', '#e377c2', '#b5bd61', '#17becf', '#aec7e8',
             '#ffbb78', '#98df8a', '#ff9896', '#c5b0d5', '#c49c94', '#f7b6d2']#, '#d3d3d3']
ct_color_dict = {celltypes[k]: ct_colors[k] for k in range(len(celltypes))}

Control group (cold group)

[17]:
for i in range(len(cold_group)):

    print(f'p{cold_group[i]}')
    adata = adata_concat_new[adata_concat_new.obs['slice_name'] == f'p{cold_group[i]}', :].copy()

    fig, axes = plt.subplots(1, 3, figsize=(17, 4.5))

    sc.pl.embedding(adata, basis='spatial', color='tumorYN', palette=tumor_color_dict,
                    ax=axes[0], s=60, show=False, frameon=False, title='TumorYN', legend_fontsize=16)
    axes[0].set_title('TumorYN', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='niche_label', palette=niche_color_dict,
                    ax=axes[1], s=60, show=False, frameon=False, title='Cell Niche', legend_fontsize=16)
    axes[1].set_title('Cell Niche', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='all_group_name', palette=ct_color_dict,
                    ax=axes[2], s=60, show=False, frameon=False, title='Cell Type', legend_fontsize=16)
    axes[2].set_title('Cell Type', fontsize=20)

    ct_legend_elements = [
        Line2D([0], [0], marker='o', color='w', label=label,
            markerfacecolor=color, markersize=10)
        for label, color in ct_color_dict.items()
    ]
    axes[2].legend(handles=ct_legend_elements, loc=(1.0, 0.05), frameon=False, ncol=2)
    axes[2].axis('off')

    plt.tight_layout()
    plt.show()
p15
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_1.png
p19
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_3.png
p22
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_5.png
p24
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_7.png
p25
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_9.png
p26
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_32_11.png

Case group (mixed group)

[18]:
for i in range(len(mixed_group)):

    print(f'p{mixed_group[i]}')
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == f'p{mixed_group[i]}', :].copy()

    fig, axes = plt.subplots(1, 4, figsize=(25, 5.5))

    sc.pl.embedding(adata, basis='spatial', color='tumorYN', palette=tumor_color_dict,
                    ax=axes[0], s=70, show=False, frameon=False, title='TumorYN', legend_fontsize=16)
    axes[0].set_title('TumorYN', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='niche_label', palette=niche_color_dict,
                    ax=axes[1], s=70, show=False, frameon=False, title='Cell Niche', legend_fontsize=16)
    axes[1].set_title('Cell Niche', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='csn_label', palette=csn_color_dict,
                    ax=axes[2], s=70, show=False, frameon=False, title='CSN', legend_fontsize=16)
    axes[2].set_title('CSN', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='all_group_name', palette=ct_color_dict,
                    ax=axes[3], s=70, show=False, frameon=False, title='Cell Type', legend_fontsize=16)
    axes[3].set_title('Cell Type', fontsize=20)

    ct_legend_elements = [
        Line2D([0], [0], marker='o', color='w', label=label,
            markerfacecolor=color, markersize=10)
        for label, color in ct_color_dict.items()
    ]
    axes[3].legend(handles=ct_legend_elements, loc=(1.05, 0.1), frameon=False, ncol=1)
    axes[3].axis('off')

    plt.tight_layout()
    plt.show()
p1
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_1.png
p2
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_3.png
p7
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_5.png
p8
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_7.png
p11
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_9.png
p12
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_11.png
p13
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_13.png
p14
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_15.png
p17
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_17.png
p18
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_19.png
p20
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_21.png
p21
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_23.png
p23
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_25.png
p27
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_27.png
p29
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_29.png
p31
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_31.png
p33
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_33.png
p38
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_35.png
p39
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_34_37.png

Case group (compartmentalized group)

[ ]:
for i in range(len(compartmentalized_group)):

    print(f'p{compartmentalized_group[i]}')
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == f'p{compartmentalized_group[i]}', :].copy()

    fig, axes = plt.subplots(1, 4, figsize=(25, 5.5))

    sc.pl.embedding(adata, basis='spatial', color='tumorYN', palette=tumor_color_dict,
                    ax=axes[0], s=70, show=False, frameon=False, title='TumorYN', legend_fontsize=16)
    axes[0].set_title('TumorYN', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='niche_label', palette=niche_color_dict,
                    ax=axes[1], s=70, show=False, frameon=False, title='Cell Niche', legend_fontsize=16)
    axes[1].set_title('Cell Niche', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='csn_label', palette=csn_color_dict,
                    ax=axes[2], s=70, show=False, frameon=False, title='CSN', legend_fontsize=16)
    axes[2].set_title('CSN', fontsize=20)

    sc.pl.embedding(adata, basis='spatial', color='all_group_name', palette=ct_color_dict,
                    ax=axes[3], s=70, show=False, frameon=False, title='Cell Type', legend_fontsize=16)
    axes[3].set_title('Cell Type', fontsize=20)

    ct_legend_elements = [
        Line2D([0], [0], marker='o', color='w', label=label,
            markerfacecolor=color, markersize=10)
        for label, color in ct_color_dict.items()
    ]
    axes[3].legend(handles=ct_legend_elements, loc=(1.05, 0.1), frameon=False, ncol=1)
    axes[3].axis('off')

    plt.tight_layout()
    plt.show()
p3
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_1.png
p4
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_3.png
p5
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_5.png
p6
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_7.png
p9
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_9.png
p10
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_11.png
p16
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_13.png
p28
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_15.png
p32
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_17.png
p34
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_19.png
p35
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_21.png
p36
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_23.png
p37
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_25.png
p40
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_27.png
p41
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_36_29.png

Cell type composition (control group)

[20]:
basic_niche_labels = adata_concat_new.uns['niche_label_summary'].copy()
ct_labels = sorted(cond_concat_new.obs['all_group_name'].unique())
basic_niche_dist = adata_concat_new.uns['niche_dist'].toarray().copy()
basic_cell_count_niche = adata_concat_new.uns['niche_cell_count'].copy()

fig, ax = plt.subplots(figsize=(6, 6))
bar_width = 0.7
n_niches, n_cell_types = basic_niche_dist.shape

x = np.arange(n_niches)

for j in range(n_cell_types):
    bottom = np.sum(basic_niche_dist[:, :j], axis=1)
    ax.bar(x,
           basic_niche_dist[:, j],
           bottom=bottom,
           width=bar_width,
           color=ct_color_dict[ct_labels[j]],
           label=ct_labels[j])

ax.set_ylabel('Proportion', fontsize=20)
ax.set_xlabel('Niche', fontsize=20)
ax.set_xticks(x)
ax.set_xticklabels(basic_niche_labels, rotation=0, ha='center')
ax.tick_params(axis='x', labelsize=20)
ax.tick_params(axis='y', labelsize=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(False)

handles = [
    mpatches.Patch(color=color, label=ct)
    for ct, color in zip(celltypes, ct_colors)
]

ax.legend(handles=handles, title='Cell Types', loc=(1.05, 0.0), frameon=False, handleheight=0.8,
          handlelength=0.7, ncol=2, fontsize=20, title_fontsize=20)

plt.title('Cell Type Proportions in Different Cell Niches', fontsize=20)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_38_0.png

Cell type composition (case group)

[21]:
cond_niche_labels = cond_concat_new.uns['niche_label_summary'].copy()
ct_labels = sorted(cond_concat_new.obs['all_group_name'].unique())
cond_niche_dist = cond_concat_new.uns['niche_dist'].toarray().copy()
cond_cell_count_niche = cond_concat_new.uns['niche_cell_count'].copy()

fig, ax = plt.subplots(figsize=(16, 6))
bar_width = 0.7
n_niches, n_cell_types = cond_niche_dist.shape

x = np.arange(n_niches)

for j in range(n_cell_types):
    bottom = np.sum(cond_niche_dist[:, :j], axis=1)
    ax.bar(x,
           cond_niche_dist[:, j],
           bottom=bottom,
           width=bar_width,
           color=ct_color_dict[ct_labels[j]],
           label=ct_labels[j])

ax.set_ylabel('Proportion', fontsize=20)
ax.set_xlabel('Niche', fontsize=20)
ax.set_xticks(x)
ax.set_xticklabels(cond_niche_labels, rotation=0, ha='center')
ax.tick_params(axis='x', labelsize=20)
ax.tick_params(axis='y', labelsize=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(False)

handles = [
    mpatches.Patch(color=color, label=ct)
    for ct, color in zip(celltypes, ct_colors)
]

ax.legend(handles=handles, title='Cell Types', loc=(1.05, 0.0), frameon=False, handleheight=0.8,
          handlelength=0.7, ncol=2, fontsize=20, title_fontsize=20)

plt.title('Cell Type Proportions in Different Cell Niches', fontsize=20)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_40_0.png

Calculate the similarity between niches from different groups

Similarities are measured using 1-JSD score

[22]:
from scipy.spatial.distance import jensenshannon
from scipy.stats import pearsonr
from sklearn.metrics.pairwise import cosine_similarity

basic_niche_dist = adata_concat_new.uns['niche_dist'].toarray().copy()
cond_niche_dist = cond_concat_new.uns['niche_dist'].toarray().copy()

basic_niche_labels = adata_concat_new.uns['niche_label_summary'].copy()
cond_niche_labels = cond_concat_new.uns['niche_label_summary'].copy()

n_niche_basic = basic_niche_dist.shape[0]
n_niche_cond = cond_niche_dist.shape[0]

js_sim = np.zeros((n_niche_basic, n_niche_cond))
# cos_sim = cosine_similarity(basic_niche_dist, cond_niche_dist)
# corr_sim = np.zeros((n_niche_basic, n_niche_cond))

for i in range(n_niche_basic):
    for j in range(n_niche_cond):
        js_sim[i, j] = 1 - jensenshannon(basic_niche_dist[i], cond_niche_dist[j], base=2)
        # corr_sim[i, j], _ = pearsonr(basic_niche_dist[i], cond_niche_dist[j])

plt.figure(figsize=(10, 5))
sns.heatmap(
    js_sim,
    cmap='cividis',
    xticklabels=cond_niche_labels,
    yticklabels=basic_niche_labels,
    linewidths=0.5,
    linecolor='gray',
)
plt.xlabel("Niche (Condition Group)", fontsize=18)
plt.ylabel("Niches (Control Group)", fontsize=18)
plt.title("Pairwise JS Similarity between Niches", fontsize=18)
plt.xticks(fontsize=16, rotation=0)
plt.yticks(fontsize=16, rotation=0)
plt.grid(False)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_42_0.png

Cell type enrichment analysis

[23]:
ct_df = ct_enrichment_test(cond_concat_new.uns['niche_dist'],
                           cond_concat_new.uns['niche_cell_count'],
                           cond_concat_new.uns['idx2ct'],
                           cond_concat_new.uns['niche_label_summary'],
                           method='fisher',
                           alpha=0.05,
                           fdr_method='fdr_by',
                           log2fc_threshold=1,
                           prop_threshold=0.01,
                           verbose=True,
                           )
ct_df.head()
11 niches and 16 cell types in total.

[23]:
niche_idx niche celltype_idx celltype oddsratio p-value q-value log2fc prop enrichment
0 0 0 0 B 0.016995 1.395216e-90 2.715560e-89 -5.800202 0.000967 False
1 0 0 1 CD3 T 0.221063 3.362091e-19 3.402754e-18 -2.151897 0.005076 False
2 0 0 2 CD4 T 0.046033 4.525761e-110 9.957605e-109 -4.336657 0.003626 False
3 0 0 3 CD8 T 0.218917 2.397240e-75 4.043723e-74 -2.083698 0.021755 False
4 0 0 4 DC 0.000000 1.839345e-13 1.632975e-12 -26.131647 0.000000 False
[24]:
niche_labels = cond_concat_new.uns['niche_label_summary'].copy()
ct_labels = sorted(cond_concat_new.obs['all_group_name'].unique())

matrix_df = pd.DataFrame(
    data=cond_concat_new.uns['niche_dist'].toarray(),
    index=niche_labels,
    columns=ct_labels,
)

cn_dist_count = cond_concat_new.uns['niche_dist'].toarray() * cond_concat_new.uns['niche_cell_count'][:, np.newaxis]
cn_dist_norm = cn_dist_count / np.sum(cn_dist_count, axis=0)
matrix_df_norm = pd.DataFrame(
    data=cn_dist_norm,
    index=niche_labels,
    columns=ct_labels,
)

ct_df['stars'] = ct_df['q-value'].apply(p2stars)

stars_df = pd.DataFrame(
    '',
    index=matrix_df.index,
    columns=matrix_df.columns
)

for _, row in ct_df[ct_df['enrichment']].iterrows():
    niche = row['niche']
    ct    = row['celltype']
    if (niche in stars_df.index) and (ct in stars_df.columns):
        stars_df.loc[niche, ct] = row['stars']


fig, axes = plt.subplots(1, 1, figsize=(11, 8))

sns_heatmap_0 = sns.heatmap(
    matrix_df,
    cmap='Blues',
    # cbar_kws={'label': 'Cell type proportion'},
    linewidths=0.5,
    linecolor='gray',
    # square=True,
    ax=axes
)

for i, niche in enumerate(matrix_df.index):
    for j, ct in enumerate(matrix_df.columns):
        star = stars_df.iloc[i, j]
        if star:
            if matrix_df.iloc[i, j] > np.max(matrix_df.values) * 0.7:
                color='white'
            else:
                color='black'
            axes.text(j + 0.5, i + 0.6, star, ha='center', va='center', color=color, fontsize=20, fontweight='bold')

n_rows, n_cols = matrix_df.shape
axes.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.5, clip_on=False)
axes.plot([n_cols, n_cols], [0, n_rows], color='gray', linewidth=0.5, clip_on=False)

axes.set_xticklabels(axes.get_xticklabels(), rotation=90, ha='center', fontsize=20)
axes.set_yticklabels(axes.get_yticklabels(), rotation=0, ha='right', fontsize=20)
axes.set_ylabel('Niche', fontsize=20)
axes.set_xlabel('Cell Type', fontsize=20)
axes.set_title('Cell Type Proportions', fontsize=20)
axes.collections[0].colorbar.ax.yaxis.label.set_size(20)
axes.collections[0].colorbar.ax.tick_params(labelsize=16)
axes.grid(False)

plt.tight_layout()
plt.show()

fig, axes = plt.subplots(1, 1, figsize=(11, 8))

sns_heatmap_1 = sns.heatmap(
    matrix_df_norm,
    cmap='Blues',
    # cbar_kws={'label': 'Cell type proportion'},
    linewidths=0.5,
    linecolor='gray',
    # square=True,
    ax=axes
)

for i, niche in enumerate(matrix_df.index):
    for j, ct in enumerate(matrix_df.columns):
        star = stars_df.iloc[i, j]
        if star:
            if matrix_df_norm.iloc[i, j] > np.max(matrix_df_norm.values) * 0.7:
                color='white'
            else:
                color='black'
            axes.text(j + 0.5, i + 0.6, star, ha='center', va='center', color=color, fontsize=20, fontweight='bold')

n_rows, n_cols = matrix_df.shape
axes.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.5, clip_on=False)
axes.plot([n_cols, n_cols], [0, n_rows], color='gray', linewidth=0.5, clip_on=False)

axes.set_xticklabels(axes.get_xticklabels(), rotation=90, ha='center', fontsize=20)
axes.set_yticklabels(axes.get_yticklabels(), rotation=0, ha='right', fontsize=20)
axes.set_ylabel('Niche', fontsize=20)
axes.set_xlabel('Cell Type', fontsize=20)
axes.set_title('Column Normalized Cell Type Proportions', fontsize=20)
axes.collections[0].colorbar.ax.yaxis.label.set_size(20)
axes.collections[0].colorbar.ax.tick_params(labelsize=16)
axes.grid(False)

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_45_0.png
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_45_1.png

Immunoregulatory protein in different niches

[25]:
immunoregulatory_markers = ['PD1', 'PD-L1', 'Lag3', 'IDO']
csn_label = ['basic', 'R0', 'R1', 'R2', 'R3', 'R4']

exp_dict = {marker: {niche: {'pos': None, 'pos_tumor': None, 'pos_immune': None} for niche in csn_label} for marker in immunoregulatory_markers}

for j, niche in enumerate(csn_label):
    adata_sub = cond_concat_new[cond_concat_new.obs['csn_label'] == niche, :].copy()
    for k, marker in enumerate(immunoregulatory_markers):
        x = adata_sub[:, marker].X
        pos = (np.asarray(x).ravel() > 0.5)
        tumor = (adata_sub.obs['tumorYN'].values == '1')
        immune = ~tumor
        exp_dict[marker][niche]['pos'] = pos.mean()
        exp_dict[marker][niche]['pos_tumor'] = (pos & tumor).mean()
        exp_dict[marker][niche]['pos_immune'] = (pos & immune).mean()

records = []
for marker in immunoregulatory_markers:
    for niche in csn_label:
        records.append({'marker': marker,
                        'niche': niche,
                        'positive_fraction': exp_dict[marker][niche]['pos'],
                        'positive_fraction_tumor': exp_dict[marker][niche]['pos_tumor'],
                        'positive_fraction_immune': exp_dict[marker][niche]['pos_immune'],
                        })
df_plot = pd.DataFrame(records)
df_plot.head()
[25]:
marker niche positive_fraction positive_fraction_tumor positive_fraction_immune
0 PD1 basic 0.016459 0.007176 0.009283
1 PD1 R0 0.070915 0.000270 0.070645
2 PD1 R1 0.120535 0.000770 0.119765
3 PD1 R2 0.076131 0.000404 0.075727
4 PD1 R3 0.068528 0.007941 0.060587
[26]:
fig, axes = plt.subplots(1, 4, figsize=(14, 3), sharey=True)

for i, marker in enumerate(immunoregulatory_markers):
    df_m = df_plot[df_plot['marker'] == marker].copy()
    ax = axes[i]

    x = np.arange(len(df_m['niche']))
    width = 0.7

    ax.bar(x, df_m['positive_fraction_immune'], width, label='Immune', color="#92B4EA")
    ax.bar(x, df_m['positive_fraction_tumor'], width, bottom=df_m['positive_fraction_immune'], label='Tumor', color="#EB9E71" )
    ax.set_xticks(x)
    ax.set_xticklabels(df_m['niche'], fontsize=16)
    ax.set_ylabel(f'{marker}+ cell fraction', fontsize=16)
    ax.set_title(marker, fontsize=16)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.grid(False)

handles, labels = axes[-1].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(1.02, 0.8), ncol=1, frameon=False, fontsize=16)

plt.tight_layout(rect=[0, 0, 0.95, 1])
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_48_0.png
[27]:
fig, axes = plt.subplots(1, 4, figsize=(15.5, 3))

for i, marker in enumerate(immunoregulatory_markers):
    df_m = df_plot[df_plot['marker'] == marker].copy()
    ax = axes[i]

    x = np.arange(len(df_m['niche']))
    width = 0.7

    ax.bar(x, df_m['positive_fraction_immune'], width, label='Immune', color="#92B4EA")
    ax.bar(x, df_m['positive_fraction_tumor'], width, bottom=df_m['positive_fraction_immune'], label='Tumor', color="#EB9E71" )
    ax.set_xticks(x)
    ax.set_xticklabels(df_m['niche'], fontsize=16)
    ax.set_ylabel(f'{marker}+ cell fraction', fontsize=16)
    ax.set_title(marker, fontsize=16)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.grid(False)

handles, labels = axes[-1].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(1.02, 0.8), ncol=1, frameon=False, fontsize=16)

plt.tight_layout(rect=[0, 0, 0.95, 1])
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_49_0.png

Differential expression

In this section, we identify differential expression pattern of the same cell type across different niches.

Tumor cell from different niches

Compare Keratin+ tumor cell in the tumor-immune interaction niche (niche R3) to those in basic cell niches

Compute the mean expression of selected markers for a given cell type within one niche and in other niches across slices, then compare them using Wilcoxon tests and visualizes differences with violin plots and paired sample lines.

[28]:
from scipy.stats import wilcoxon, mannwhitneyu
from statannotations.Annotator import Annotator
np.random.seed(1234)

noi = 'R3'
ctoi = ['Keratin+ tumor']
min_cells = 20
markers = cond_concat_new.var_names.tolist()
markers = ['HLA-DR', 'HLA_Class_1', 'Vimentin', 'SMA']
n_markers = len(markers)

avg_expr_noi = [[] for _ in range(n_markers)]
avg_expr_basic = [[] for _ in range(n_markers)]
new_marker_list = []

for i, slice_name in enumerate(cond_name_list):
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
    adata_basic = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == 'basic'), :].copy()
    n_cell_noi = adata_noi.shape[0]
    n_cell_basic = adata_basic.shape[0]
    if min(n_cell_noi, n_cell_basic) < min_cells:
        print(f"Skip {slice_name} due to rare cell count ({n_cell_noi} vs {n_cell_basic}).")
        continue
    for j, marker in enumerate(markers):
        avg_expr_noi[j].append(np.mean(adata_noi[:, marker].X))
        avg_expr_basic[j].append(np.mean(adata_basic[:, marker].X))

res = []
for j, marker in enumerate(markers):
    x1 = np.array(avg_expr_noi[j], dtype=float)
    x2 = np.array(avg_expr_basic[j], dtype=float)
    u, p = wilcoxon(x1, x2, alternative="two-sided")
    res.append({"marker": marker, "pvalue": p, "U": u, "n1": len(x1), "n2": len(x2)})
res_df = pd.DataFrame(res)
alpha = 0.05
sig_df = res_df[res_df["pvalue"] < alpha]#.sort_values("pvalue")
print(f"Significant markers: {sig_df.shape[0]} / {len(markers)}")
sig_markers = sig_df["marker"].tolist()

if len(sig_markers) == 0:
    print("No significant markers under the chosen threshold.")
else:
    n_plot = len(sig_markers)
    if n_plot > 5:
        ncol = 5
    else:
        ncol = n_plot
    nrow = int(np.ceil(n_plot / ncol))

    fig, axes = plt.subplots(nrow, ncol, figsize=(3.2*ncol, 4*nrow))
    axes = np.array(axes).flatten()

    for k, marker in enumerate(sig_markers):
        j = markers.index(marker)
        vals_noi = avg_expr_noi[j]
        vals_basic = avg_expr_basic[j]
        ax = axes[k]

        df_plot = pd.DataFrame({
            "expr": np.concatenate([vals_noi, vals_basic]),
            "group": [f"{noi}"]*len(vals_noi) + ["Basic"]*len(vals_basic),
        })

        colors = [ct_color_dict[ctoi[0]], '#d3d3d3']
        # sns.boxplot(data=df_plot, x="group", y="expr", ax=ax, palette=colors, width=0.5, showfliers=False)
        sns.violinplot(data=df_plot, x="group", y="expr", ax=ax, palette=colors, inner='box', cut=1, width=0.8, alpha=0.8)#, scale='width')
        # sns.stripplot(data=df_plot, x="group", y="expr", ax=ax, jitter=True, size=7, alpha=0.7, color="white", edgecolor="black", linewidth=1)

        for xpos, vals in zip([0, 1],[vals_noi, vals_basic]):
            ax.hlines(np.median(vals), xpos-0.2, xpos+0.2,
                    color='white', lw=4, zorder=3)
        pairs = [(f"{noi}", "Basic")]
        annot = Annotator(ax, pairs, data=df_plot, x='group', y='expr')
        annot.configure(test='Wilcoxon', text_format='full', loc='inside', verbose=0, fontsize=16,)
        annot.apply_and_annotate()

        n = len(vals_noi)
        jitter = 0.1
        for k in range(n):
            x1 = 0 + np.random.uniform(-jitter, jitter)
            x2 = 1 + np.random.uniform(-jitter, jitter)
            y1, y2 = vals_noi[k], vals_basic[k]
            ax.plot([x1, x2], [y1, y2], color='gray', linewidth=1, alpha=0.5, zorder=1)
            ax.scatter([x1], [y1], color='white', s=30, alpha=0.8, edgecolor='black', zorder=2)
            ax.scatter([x2], [y2], color='white', s=30, alpha=0.8, edgecolor='black', zorder=2)

        # pv = sig_df.set_index("marker").loc[marker, "pvalue"]
        ax.set_xticklabels([f"{noi}", "Basic"], fontsize=16)
        ax.tick_params(axis="y", labelsize=16)
        ax.set_title(f"{marker}", fontsize=16)
        ax.set_ylabel("Mean Scaled Expression", fontsize=16)
        ax.set_xlabel("")
        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)
        ax.grid(False)

    for ax in axes[n_plot:]:
        ax.axis("off")

    plt.suptitle(f"Keratin+ tumor ({noi} vs Basic)", fontsize=20)
    plt.tight_layout()
    plt.show()
Skip p18 due to rare cell count (10 vs 5011).
Skip p31 due to rare cell count (3 vs 2892).
Significant markers: 4 / 4
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_52_1.png

Compare per-cell (flattened) scaled expression of selected markers for a given cell type in niche noi versus basic cell niches. For each marker it builds a violin plot with median lines, runs a Mann–Whitney test (annotated on the plot), and arranges the marker subplots in a grid titled with the cell type and niche.

[29]:
noi = 'R3'
ctoi = ['Keratin+ tumor']
markers = cond_concat_new.var_names.tolist()
markers = ['HLA-DR', 'HLA_Class_1', 'Vimentin', 'SMA']

n_markers = len(markers)
new_marker_list = []

adata = cond_concat_new.copy()

adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
adata_basic = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == 'basic'), :].copy()

ncol = 4
nrow = int(np.ceil(n_markers / ncol))
fig, axes = plt.subplots(nrow, ncol, figsize=(3*ncol, 4*nrow))
axes = axes.flatten()

for j, marker in enumerate(markers):
    vals_noi = adata_noi[:, marker].X.flatten()
    vals_basic = adata_basic[:, marker].X.flatten()
    ax = axes[j]

    df_plot = pd.DataFrame({
        'expr': np.concatenate([vals_noi, vals_basic]),
        'group': [noi]*len(vals_noi)
               + ['Basic']*len(vals_basic)
    })

    colors = [ct_color_dict[ctoi[0]], '#d3d3d3']
    sns.violinplot(x='group', y='expr', data=df_plot, ax=ax, palette=colors, cut=1, inner='box', alpha=0.8)#, scale='width')

    for xpos, vals in zip([0,1],[vals_noi, vals_basic]):
        ax.hlines(np.median(vals), xpos-0.2, xpos+0.2,
                  color='white', lw=4, zorder=3)

    pairs = [(noi, 'Basic')]
    annot = Annotator(ax, pairs, data=df_plot, x='group', y='expr')
    annot.configure(test='Mann-Whitney', text_format='star', loc='inside', verbose=0, fontsize=20,)
    annot.apply_and_annotate()

    ax.set_xticklabels([noi, 'Basic'], fontsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.set_title(marker, fontsize=16)
    ax.set_ylabel('Mean Scaled Expression', fontsize=16)
    ax.set_xlabel('Niche', fontsize=16)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.grid(False)

for ax in axes[n_markers:]:
    ax.axis('off')

plt.suptitle(f"{ctoi[0]} (Niche {noi} vs Basic)", fontsize=20)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_54_0.png

Macrophages from different niches

Compare macrophages in the two niches where they are enriched (niche R0 and R3) with macrophages in other niches.

Compute the mean expression of selected markers for a given cell type within the two niches of interest and in other niches across slices, then compare them using Wilcoxon tests and visualizes differences with violin plots and paired sample lines.

[30]:
from scipy.stats import wilcoxon

noi_list = ['R0', 'R3']
ctoi = 'Macrophages'
min_cells = 20
markers = cond_concat_new.var_names.tolist()
markers = ['CD68', 'CD45', 'HLA-DR', 'CD11c', 'HLA_Class_1', 'CD45RO']

n_markers = len(markers)

avg_expr_noi1 = [[] for _ in range(n_markers)]
avg_expr_noi2 = [[] for _ in range(n_markers)]
avg_expr_others = [[] for _ in range(n_markers)]
new_marker_list = []

for i, slice_name in enumerate(cond_name_list):

    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    # adata = allconcat.copy()

    adata_noi1 = adata[(adata.obs['all_group_name'] == ctoi) & (adata.obs['csn_label'] == noi_list[0]), :].copy()
    adata_noi2 = adata[(adata.obs['all_group_name'] == ctoi) & (adata.obs['csn_label'] == noi_list[1]), :].copy()
    adata_others = adata[(adata.obs['all_group_name'] == ctoi) & ~(adata.obs['csn_label'].isin(noi_list)), :].copy()

    n_cell_noi1 = adata_noi1.shape[0]
    n_cell_noi2 = adata_noi2.shape[0]
    n_cell_others = adata_others.shape[0]

    if min(n_cell_noi1, n_cell_noi2, n_cell_others) < min_cells:
        print(f"Skip {slice_name} due to rare cell count ({n_cell_noi1} vs {n_cell_noi2} vs {n_cell_others}).")
        continue

    for j, marker in enumerate(markers):
        avg_expr_noi1[j].append(np.mean(adata_noi1[:, marker].X))
        avg_expr_noi2[j].append(np.mean(adata_noi2[:, marker].X))
        avg_expr_others[j].append(np.mean(adata_others[:, marker].X))

alpha = 0.05
pairs = [(noi_list[0], "other"), ("other", noi_list[1]), (noi_list[0], noi_list[1])]

test_res = []
for j, marker in enumerate(markers):
    vals1 = np.array(avg_expr_noi1[j], dtype=float)
    vals2 = np.array(avg_expr_others[j], dtype=float)
    vals3 = np.array(avg_expr_noi2[j], dtype=float)

    p12 = wilcoxon(vals1, vals2, alternative="two-sided").pvalue
    p23 = wilcoxon(vals2, vals3, alternative="two-sided").pvalue
    p13 = wilcoxon(vals1, vals3, alternative="two-sided").pvalue

    pvals = np.array([p12, p23, p13], dtype=float)
    any_sig = np.any(pvals < alpha)
    test_res.append({
        "marker": marker,
        "n": len(vals1),
        "p_R0_other": p12,
        "p_other_R3": p23,
        "p_R0_R3": p13,
        "p_min": float(np.min(pvals)),
        "any_sig": bool(any_sig)
    })

test_df = pd.DataFrame(test_res)
plot_markers = test_df.loc[test_df["any_sig"], "marker"].tolist()

print(f"Markers to plot: {len(plot_markers)} / {len(markers)}")


if len(plot_markers) == 0:
    print("No markers passed the significance filter. Nothing to plot.")
else:
    n_plot = len(plot_markers)
    if n_plot > 4:
        ncol = 4
    else:
        ncol = n_plot
    nrow = int(np.ceil(n_plot / ncol))
    fig, axes = plt.subplots(nrow, ncol, figsize=(4*ncol, 4*nrow))
    axes = np.array(axes).flatten()

    colors = [csn_color_dict[noi_list[0]], "lightgray", csn_color_dict[noi_list[1]]]

    for idx, marker in enumerate(plot_markers):
        j = markers.index(marker)

        vals1 = np.array(avg_expr_noi1[j], dtype=float)
        vals2 = np.array(avg_expr_others[j], dtype=float)
        vals3 = np.array(avg_expr_noi2[j], dtype=float)

        ax = axes[idx]

        df_plot = pd.DataFrame({
            "expr": np.concatenate([vals1, vals2, vals3]),
            "group": [noi_list[0]]*len(vals1) + ["other"]*len(vals2) + [noi_list[1]]*len(vals3),
        })

        sns.violinplot(x="group", y="expr", data=df_plot, ax=ax, palette=colors, cut=1, inner='box', alpha=0.8)#, scale="width")
        # sns.boxplot(data=df_plot, x="group", y="expr", ax=ax, width=0.5, palette=colors, showfliers=False)
        # sns.stripplot(data=df_plot, x="group", y="expr", ax=ax, jitter=True, size=7, alpha=0.7, color="white", edgecolor="black", linewidth=1)

        for xpos, vals in zip([0, 1, 2], [vals1, vals2, vals3]):
            ax.hlines(np.median(vals), xpos-0.2, xpos+0.2,
                      color="white", lw=4, zorder=3)
        pairs = [(noi_list[0], 'other'), ('other', noi_list[1]), (noi_list[0], noi_list[1])]
        annot = Annotator(ax, pairs, data=df_plot, x='group', y='expr')
        annot.configure(test='Wilcoxon', text_format='full', loc='inside', verbose=0, fontsize=16,)
        annot.apply_and_annotate()

        jitter = 0.1
        for k in range(len(vals1)):
            xx1 = 0 + np.random.uniform(-jitter, jitter)
            xx2 = 1 + np.random.uniform(-jitter, jitter)
            xx3 = 2 + np.random.uniform(-jitter, jitter)

            ax.plot([xx1, xx2], [vals1[k], vals2[k]], color="gray", linewidth=0.8, alpha=0.5, zorder=1)
            ax.plot([xx2, xx3], [vals2[k], vals3[k]], color="gray", linewidth=0.8, alpha=0.5, zorder=1)

            ax.scatter(xx1, vals1[k], color='white', s=30, alpha=0.8, edgecolor="black", zorder=2)
            ax.scatter(xx2, vals2[k], color='white', s=30, alpha=0.8, edgecolor="black", zorder=2)
            ax.scatter(xx3, vals3[k], color='white', s=30, alpha=0.8, edgecolor="black", zorder=2)

        row = test_df.set_index("marker").loc[marker]
        ax.set_title(f"{marker}", fontsize=16)

        ax.set_xticklabels([noi_list[0], "Other", noi_list[1]], fontsize=16)
        ax.tick_params(axis="y", labelsize=16)
        ax.set_ylabel("Mean Scaled Expression", fontsize=16)
        ax.set_xlabel("Niche", fontsize=16)
        ax.spines["top"].set_visible(False)
        ax.spines["right"].set_visible(False)
        ax.grid(False)

    for ax in axes[n_plot:]:
        ax.axis("off")

    plt.suptitle(f"{ctoi} (Niche {noi_list[0]} vs Other vs Niche {noi_list[1]})", fontsize=20)
    plt.tight_layout()
    plt.show()
Skip p1 due to rare cell count (0 vs 162 vs 119).
Skip p2 due to rare cell count (0 vs 303 vs 208).
Skip p7 due to rare cell count (0 vs 82 vs 163).
Skip p8 due to rare cell count (0 vs 22 vs 247).
Skip p11 due to rare cell count (1 vs 355 vs 218).
Skip p17 due to rare cell count (10 vs 432 vs 136).
Skip p18 due to rare cell count (0 vs 2 vs 310).
Skip p21 due to rare cell count (0 vs 26 vs 185).
Skip p31 due to rare cell count (0 vs 5 vs 267).
Skip p33 due to rare cell count (0 vs 17 vs 135).
Skip p35 due to rare cell count (18 vs 17 vs 129).
Skip p38 due to rare cell count (0 vs 307 vs 379).
Skip p40 due to rare cell count (0 vs 19 vs 94).
Markers to plot: 6 / 6
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_57_1.png

Compare per-cell (flattened) scaled expression of selected markers for a given cell type in niches in noi_list versus all other niches. For each marker it builds a violin plot with median lines, runs a Mann–Whitney test (annotated on the plot), and arranges the marker subplots in a grid titled with the cell type and niche.

[31]:
noi_list = ['R0', 'R3']
ctoi = 'Macrophages'
markers = cond_concat_new.var_names.tolist()
markers = ['CD68', 'CD45', 'HLA-DR', 'CD11c', 'HLA_Class_1', 'CD45RO']

n_markers = len(markers)
new_marker_list = []

adata = cond_concat_new.copy()

adata_noi1 = adata[(adata.obs['all_group_name'] == ctoi) & (adata.obs['csn_label'] == noi_list[0]), :].copy()
adata_noi2 = adata[(adata.obs['all_group_name'] == ctoi) & (adata.obs['csn_label'] == noi_list[1]), :].copy()
adata_other = adata[(adata.obs['all_group_name'] == ctoi) & ~(adata.obs['csn_label'].isin(noi_list)), :].copy()

ncol = 4
nrow = int(np.ceil(n_markers / ncol))
fig, axes = plt.subplots(nrow, ncol, figsize=(4*ncol, 4*nrow))
axes = axes.flatten()

for j, marker in enumerate(markers):
    vals_noi1 = adata_noi1[:, marker].X.flatten()
    vals_noi2 = adata_noi2[:, marker].X.flatten()
    vals_other = adata_other[:, marker].X.flatten()
    ax = axes[j]

    df_plot = pd.DataFrame({
        'expr': np.concatenate([vals_noi1, vals_other, vals_noi2]),
        'group': [noi_list[0]]*len(vals_noi1)
               + ['other']*len(vals_other)
               + [noi_list[1]]*len(vals_noi2)
    })

    colors = [csn_color_dict[noi_list[0]], 'lightgray', csn_color_dict[noi_list[1]]]
    sns.violinplot(x='group', y='expr', data=df_plot, ax=ax, palette=colors, cut=1, inner='box', alpha=0.8)#, scale='width')

    for xpos, vals in zip([0,1,2],[vals_noi1, vals_other, vals_noi2]):
        ax.hlines(np.median(vals), xpos-0.2, xpos+0.2,
                  color='white', lw=4, zorder=3)

    pairs = [(noi_list[0], 'other'), ('other', noi_list[1]), (noi_list[0], noi_list[1])]
    annot = Annotator(ax, pairs, data=df_plot, x='group', y='expr')
    annot.configure(test='Mann-Whitney', text_format='star', loc='inside', verbose=0, fontsize=20,)
    annot.apply_and_annotate()

    ax.set_xticklabels([noi_list[0], 'Other', noi_list[1]], fontsize=16)
    ax.tick_params(axis='y', labelsize=16)
    ax.set_title(marker, fontsize=16)
    ax.set_ylabel('Mean Scaled Expression', fontsize=16)
    ax.set_xlabel('Niche', fontsize=16)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.grid(False)

for ax in axes[n_markers:]:
    ax.axis('off')

plt.suptitle(f"{ctoi} (Niche {noi_list[0]} vs Other vs Niche {noi_list[1]})", fontsize=20)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_59_0.png

Niche composition for different patients (slices)

[32]:
from scipy.cluster.hierarchy import linkage, leaves_list

group_colors = {
    'cold': '#1f77b4',
    'mixed': '#2ca02c',
    'compartmentalized': '#8c564b'
}
slice_color_dict = {}
for i in cold_group:
    slice_color_dict[f'p{i}'] = group_colors['cold']
for i in mixed_group:
    slice_color_dict[f'p{i}'] = group_colors['mixed']
for i in compartmentalized_group:
    slice_color_dict[f'p{i}'] = group_colors['compartmentalized']
[33]:
df_list = []

df_list.append(adata_concat_new.obs[['niche_label', 'slice_name']].copy())
df_list.append(cond_concat_new.obs[['niche_label', 'slice_name']].copy())
df = pd.concat(df_list, axis=0)

comp = pd.crosstab(df['slice_name'], df['niche_label'], normalize='index').sort_index()
cond_niche_labels = cond_concat_new.uns['niche_label_summary'].copy()
comp = comp[cond_niche_labels]

# linkage_result = linkage(comp.values, metric='correlation')
# ordered_indices = leaves_list(linkage_result)
# comp = comp.iloc[ordered_indices]
comp = comp.loc[slice_color_dict.keys()]

plt.figure(figsize=(17, 6))
ax = comp.plot(
    kind='bar',
    stacked=True,
    width=0.8,
    color=niche_colors,
    ax=plt.gca()
)

plt.ylabel('Niche Composition', fontsize=20)
plt.xlabel('Patient (Slice)', fontsize=20)
plt.title('Niche Composition per Patient', fontsize=20)
plt.yticks(fontsize=16)

ax.set_xticks(range(len(comp.index)))
ax.set_xticklabels(comp.index, rotation=90, ha='center', fontsize=16)

for tick_label in ax.get_xticklabels():
    slice_name = tick_label.get_text()
    if slice_name in slice_color_dict:
        tick_label.set_color(slice_color_dict[slice_name])

ax.legend(title='Niche Label', bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=20, title_fontsize=20, frameon=False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.grid(False)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_62_0.png
[34]:
df_list = []

# adata_concat_new.obs['csn_label'] = 'basic'
# df_list.append(adata_concat_new.obs[['csn_label', 'slice_name']].copy())
df_list.append(cond_concat_new.obs[['csn_label', 'slice_name']].copy())
df = pd.concat(df_list, axis=0)

comp = pd.crosstab(df['slice_name'], df['csn_label'], normalize='index').sort_index()
cond_niche_labels =  csns + ['basic']
comp = comp[cond_niche_labels]

# linkage_result = linkage(comp.values, method='ward', metric='euclidean')
# ordered_indices = leaves_list(linkage_result)
# comp = comp.iloc[ordered_indices]
comp = comp.loc[[slice_name for slice_name in slice_color_dict.keys() if slice_name not in ['p15', 'p19', 'p22', 'p24', 'p25', 'p26']]]

plt.figure(figsize=(14, 5))
ax = comp.plot(
    kind='bar',
    stacked=True,
    width=0.8,
    color=csn_color_dict.values(),
    ax=plt.gca()
)

plt.ylabel('Niche Composition', fontsize=20)
plt.xlabel('Patient (Slice)', fontsize=20)
plt.title('Niche Composition per Patient', fontsize=20)
plt.yticks(fontsize=16)

ax.set_xticks(range(len(comp.index)))
ax.set_xticklabels(comp.index, rotation=90, ha='center', fontsize=16)

for tick_label in ax.get_xticklabels():
    slice_name = tick_label.get_text()
    if slice_name in slice_color_dict:
        tick_label.set_color(slice_color_dict[slice_name])

ax.legend(title='Niche Label', bbox_to_anchor=(1.02, 1), loc='upper left', fontsize=20, title_fontsize=20, frameon=False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.grid(False)
plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_63_0.png

The niche composition comparison between mixed and compartmentalized group

[36]:
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Pie chart data
mixed_slice_name = [f'p{id}' for id in mixed_group]
compart_slice_name = [f'p{id}' for id in compartmentalized_group]

mixed_adata = cond_concat_new[cond_concat_new.obs['slice_name'].isin(mixed_slice_name), :].copy()
compart_adata = cond_concat_new[cond_concat_new.obs['slice_name'].isin(compart_slice_name), :].copy()

mixed_counts = mixed_adata.obs['niche_label'].value_counts(normalize=True)
compart_counts = compart_adata.obs['niche_label'].value_counts(normalize=True)

ordered_labels_mixed = [label for label in niche_color_dict.keys() if label in mixed_counts.index]
mixed_values = [mixed_counts[label] for label in ordered_labels_mixed]
mixed_colors = [niche_color_dict[label] for label in ordered_labels_mixed]
mixed_labels_text = [f"{v*100:.1f}%" for v in mixed_values]

ordered_labels_compart = [label for label in niche_color_dict.keys() if label in compart_counts.index]
compart_values = [compart_counts[label] for label in ordered_labels_compart]
compart_colors = [niche_color_dict[label] for label in ordered_labels_compart]
compart_labels_text = [f"{v*100:.1f}%" for v in compart_values]

# Build subplots
fig = make_subplots(
    rows=1, cols=2, specs=[[{'type': 'domain'}, {'type': 'domain'}]],
    subplot_titles=("Mixed Group", "Compartmentalized Group")
)

fig.add_trace(go.Pie(
    values=mixed_values,
    labels=ordered_labels_mixed,
    name="Mixed",
    marker=dict(colors=mixed_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=mixed_labels_text,
    textinfo='label+text',  # label + formatted percent
    textposition='outside',
    pull=[0.05]*len(mixed_values),
    hole=0,
    sort=True
), row=1, col=1)

fig.add_trace(go.Pie(
    values=compart_values,
    labels=ordered_labels_compart,
    name="Compartmentalized",
    marker=dict(colors=compart_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=compart_labels_text,
    textinfo='label+text',
    textposition='outside',
    pull=[0.05]*len(compart_values),
    hole=0,
    sort=True
), row=1, col=2)

# Update layout
fig.update_layout(
    title_text="Niche Composition Comparison",
    title_x=0.5,
    height=700,
    width=1100,
    showlegend=False,
    margin=dict(t=100, b=50),
    font=dict(size=16)
)

fig.show()

Data type cannot be displayed: application/vnd.plotly.v1+json

The niche composition comparison between mixed and compartmentalized group (all basic cell niches labeled as “basic”)

[37]:
# Pie chart data
mixed_counts = mixed_adata.obs['csn_label'].value_counts(normalize=True)
compart_counts = compart_adata.obs['csn_label'].value_counts(normalize=True)

ordered_labels_mixed = [label for label in csn_color_dict.keys() if label in mixed_counts.index]
mixed_values = [mixed_counts[label] for label in ordered_labels_mixed]
mixed_colors = [csn_color_dict[label] for label in ordered_labels_mixed]
mixed_labels_text = [f"{v*100:.1f}%" for v in mixed_values]

ordered_labels_compart = [label for label in csn_color_dict.keys() if label in compart_counts.index]
compart_values = [compart_counts[label] for label in ordered_labels_compart]
compart_colors = [csn_color_dict[label] for label in ordered_labels_compart]
compart_labels_text = [f"{v*100:.1f}%" for v in compart_values]

# Build subplots
fig = make_subplots(
    rows=1, cols=2, specs=[[{'type': 'domain'}, {'type': 'domain'}]],
    subplot_titles=("Mixed Group", "Compartmentalized Group")
)

fig.add_trace(go.Pie(
    values=mixed_values,
    labels=ordered_labels_mixed,
    name="Mixed",
    marker=dict(colors=mixed_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=mixed_labels_text,
    textinfo='label+text',  # label + formatted percent
    textposition='outside',
    pull=[0.05]*len(mixed_values),
    hole=0,
    sort=True
), row=1, col=1)

fig.add_trace(go.Pie(
    values=compart_values,
    labels=ordered_labels_compart,
    name="Compartmentalized",
    marker=dict(colors=compart_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=compart_labels_text,
    textinfo='label+text',
    textposition='outside',
    pull=[0.05]*len(compart_values),
    hole=0,
    sort=True
), row=1, col=2)

# Update layout
fig.update_layout(
    title_text="Niche Composition Comparison",
    title_x=0.5,
    height=500,
    width=1100,
    showlegend=False,
    margin=dict(t=100, b=50),
    font=dict(size=16)
)

fig.show()

Data type cannot be displayed: application/vnd.plotly.v1+json

The condition-specific niche composition comparison between mixed and compartmentalized group

[38]:
# Pie chart data
mixed_adata = mixed_adata[mixed_adata.obs['csn_label'] != 'basic', :].copy()
compart_adata = compart_adata[compart_adata.obs['csn_label'] != 'basic', :].copy()

mixed_counts = mixed_adata.obs['csn_label'].value_counts(normalize=True)
compart_counts = compart_adata.obs['csn_label'].value_counts(normalize=True)

ordered_labels_mixed = [label for label in csn_color_dict.keys() if label in mixed_counts.index]
mixed_values = [mixed_counts[label] for label in ordered_labels_mixed]
mixed_colors = [csn_color_dict[label] for label in ordered_labels_mixed]
mixed_labels_text = [f"{v*100:.1f}%" for v in mixed_values]

ordered_labels_compart = [label for label in csn_color_dict.keys() if label in compart_counts.index]
compart_values = [compart_counts[label] for label in ordered_labels_compart]
compart_colors = [csn_color_dict[label] for label in ordered_labels_compart]
compart_labels_text = [f"{v*100:.1f}%" for v in compart_values]

# Build subplots
fig = make_subplots(
    rows=1, cols=2, specs=[[{'type': 'domain'}, {'type': 'domain'}]],
    subplot_titles=("Mixed Group", "Compartmentalized Group")
)

fig.add_trace(go.Pie(
    values=mixed_values,
    labels=ordered_labels_mixed,
    name="Mixed",
    marker=dict(colors=mixed_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=mixed_labels_text,
    textinfo='label+text',  # label + formatted percent
    textposition='outside',
    pull=[0.05]*len(mixed_values),
    hole=0,
    sort=True
), row=1, col=1)

fig.add_trace(go.Pie(
    values=compart_values,
    labels=ordered_labels_compart,
    name="Compartmentalized",
    marker=dict(colors=compart_colors,
                # line=dict(color='#FFFFFF', width=2),
                ),
    text=compart_labels_text,
    textinfo='label+text',
    textposition='outside',
    pull=[0.05]*len(compart_values),
    hole=0,
    sort=True
), row=1, col=2)

# Update layout
fig.update_layout(
    title_text="Niche Composition Comparison",
    title_x=0.5,
    height=500,
    width=1100,
    showlegend=False,
    margin=dict(t=100, b=50),
    font=dict(size=16)
)

fig.show()

Data type cannot be displayed: application/vnd.plotly.v1+json

Classification of mixed and compartmentalized group using niche proportion

compartmentalized group = 1; mixed group = 0

Proportion of a specific CSN across all CSNs

[39]:
from sklearn.metrics import roc_curve, auc

group_labels = []
for cond_name in cond_name_list:
    adata = cond_concat[cond_concat.obs['slice_name'] == cond_name, :].copy()
    group = int(adata.obs['sample_types'][0] == 'compartmentalized')
    group_labels.append(group)

group_labels = np.array(group_labels)

ncol = 4
nrow = int(np.ceil((len(csns)) / ncol))
fig, axes = plt.subplots(nrow, ncol, figsize=(4*ncol, 4*nrow))
axes = axes.flatten()

for i, roi in enumerate(csns):

    ratios = []
    for cond_name in cond_name_list:
        adata = cond_concat[cond_concat.obs['slice_name'] == cond_name, :].copy()
        roi_ratio = (adata.obs['csn_label'] == roi).mean() / (1 - (adata.obs['csn_label'] == 'basic').mean())
        ratios.append(roi_ratio)

    ratios = np.array(ratios)

    # ROC curve
    fpr, tpr, _ = roc_curve(group_labels, ratios)
    roc_auc = auc(fpr, tpr)

    ax = axes[i]
    ax.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
    ax.plot([0, 1], [0, 1], 'k--', linewidth=0.8)
    ax.set_title(f'Niche {roi}', fontsize=16)
    ax.set_xlabel('FPR', fontsize=16)
    ax.set_ylabel('TPR', fontsize=16)
    ax.grid(False)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend(loc='lower right')

for j in range(len(csns), nrow*ncol):
    axes[j].axis('off')

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_72_0.png

Proportion of a specific CSN across the whole slice

[40]:
ncol = 4
nrow = int(np.ceil((len(csns)+1) / ncol))
fig, axes = plt.subplots(nrow, ncol, figsize=(4*ncol, 4*nrow))
axes = axes.flatten()

for i, roi in enumerate(csns):

    ratios = []
    for cond_name in cond_name_list:
        adata = cond_concat[cond_concat.obs['slice_name'] == cond_name, :].copy()
        roi_ratio = (adata.obs['csn_label'] == roi).mean()
        ratios.append(roi_ratio)

    ratios = np.array(ratios)

    # ROC curve
    fpr, tpr, _ = roc_curve(group_labels, ratios)
    roc_auc = auc(fpr, tpr)

    ax = axes[i]
    ax.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
    ax.plot([0, 1], [0, 1], 'k--', linewidth=0.8)
    ax.set_title(f'Niche {roi}', fontsize=16)
    ax.set_xlabel('FPR', fontsize=16)
    ax.set_ylabel('TPR', fontsize=16)
    ax.grid(False)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.legend(loc='lower right')

    if i == len(csns)-1:
        i += 1
        ratios = []
        for cond_name in cond_name_list:
            adata = cond_concat[cond_concat.obs['slice_name'] == cond_name, :].copy()
            roi_ratio = (adata.obs['csn_label'].isin(csns)).mean()
            ratios.append(roi_ratio)

        ratios = np.array(ratios)

        # ROC curve
        fpr, tpr, _ = roc_curve(group_labels, ratios)
        roc_auc = auc(fpr, tpr)

        ax = axes[i]
        ax.plot(fpr, tpr, label=f'AUC = {roc_auc:.3f}')
        ax.plot([0, 1], [0, 1], 'k--', linewidth=0.8)
        ax.set_title(f'All CSNs', fontsize=16)
        ax.set_xlabel('FPR', fontsize=16)
        ax.set_ylabel('TPR', fontsize=16)
        ax.grid(False)
        ax.spines['top'].set_visible(False)
        ax.spines['right'].set_visible(False)
        ax.legend(loc='lower right')

for j in range(len(csns)+1, nrow*ncol):
    axes[j].axis('off')

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_74_0.png

Niche-niche co-localization analysis

Split into two subtypes

[41]:
mixed_slice_name = [f'p{id}' for id in mixed_group]
compart_slice_name = [f'p{id}' for id in compartmentalized_group]
mixed_adata_list = [cond_list[i].copy() for i in range(len(cond_list)) if cond_list[i].obs['slice_name'][0] in mixed_slice_name]
compart_adata_list = [cond_list[i].copy() for i in range(len(cond_list)) if cond_list[i].obs['slice_name'][0] in compart_slice_name]
print('Mixed group adata list length:', len(mixed_adata_list))
print('Compartmentalized group adata list length:', len(compart_adata_list))
Mixed group adata list length: 19
Compartmentalized group adata list length: 15

mixed group

[42]:
csn_labels = ['basic', 'R0', 'R1', 'R2', 'R3', 'R4']
nnc_results_mixed = nnc_enrichment_test(mixed_adata_list,
                                        'csn_label',
                                        niche_summary=csn_labels,
                                        spatial_key='spatial',
                                        cut_percentage=99,
                                        method='fisher',
                                        alpha=0.05,
                                        fdr_method='fdr_by',
                                        log2fc_threshold=1,
                                        prop_threshold=0.01,
                                        verbose=True,
                                        )
nnc_df_mixed, edge_prop_mtx_mixed, n1_count_mixed = nnc_results_mixed

nnc_df_mixed['stars'] = nnc_df_mixed['q-value'].apply(p2stars)

matrix_df = pd.DataFrame(
    data=edge_prop_mtx_mixed,
    index=csn_labels,
    columns=csn_labels,
)

for i in range(matrix_df.shape[0]):
    for j in range(matrix_df.shape[1]):
        if i == j:
            matrix_df.iloc[i, j] = np.nan

stars_df = pd.DataFrame(
    '',
    index=matrix_df.index,
    columns=matrix_df.columns
)

for _, row in nnc_df_mixed[nnc_df_mixed['enrichment']].iterrows():
    n1 = row['niche1']
    n2 = row['niche2']
    if (n1 in stars_df.index) and (n2 in stars_df.columns):
        stars_df.loc[n1, n2] = row['stars']

plt.figure(figsize=(5, 4))

ax  = sns.heatmap(
    matrix_df,
    cmap='Greens',
    # cbar_kws={'label': 'Edge type proportion'},
    linewidths=0.7,
    linecolor='gray',
    # square=True,
)

for i, n1 in enumerate(matrix_df.index):
    for j, n2 in enumerate(matrix_df.columns):
        if i == j:
            ax.plot([i, i+1], [i, i+1], color='gray', linewidth=0.7)
            # ax.plot([i+1, i], [i, i+1], color='gray', linewidth=0.7)
            continue
        star = stars_df.iloc[i, j]
        if star:
            if matrix_df.iloc[i, j] > np.nanmax(matrix_df.values) * 0.7:
                color='white'
            else:
                color='black'
            ax.text(j + 0.5, i + 0.6, star, ha='center', va='center', color=color, fontsize=20, fontweight='bold')

n_rows, n_cols = matrix_df.shape
ax.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.7, clip_on=False)
ax.plot([n_cols, n_cols], [0, n_rows], color='gray', linewidth=0.7, clip_on=False)

ax.set_xticklabels(ax.get_xticklabels(), rotation=0, ha='center', fontsize=16)
ax.set_yticklabels(ax.get_yticklabels(), rotation=0, ha='right', fontsize=16)
ax.set_ylabel('Niche', fontsize=16)
ax.set_xlabel('Niche', fontsize=16)
ax.set_title('Edge Type Proportions (Mixed)', fontsize=16)
ax.collections[0].colorbar.ax.yaxis.label.set_size(16)
ax.collections[0].colorbar.ax.tick_params(labelsize=16)
ax.grid(False)

plt.tight_layout()
plt.show()
6 niches in total.

../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_79_1.png

Compartmentalized group

[43]:
csn_labels = ['basic', 'R0', 'R1', 'R2', 'R3', 'R4']
nnc_results_compart = nnc_enrichment_test(compart_adata_list,
                                        'csn_label',
                                        niche_summary=csn_labels,
                                        spatial_key='spatial',
                                        cut_percentage=99,
                                        method='fisher',
                                        alpha=0.05,
                                        fdr_method='fdr_by',
                                        log2fc_threshold=1,
                                        prop_threshold=0.01,
                                        verbose=True,
                                        )
nnc_df_compart, edge_prop_mtx_compart, n1_count_compart = nnc_results_compart

nnc_df_compart['stars'] = nnc_df_compart['q-value'].apply(p2stars)
matrix_df = pd.DataFrame(
    data=edge_prop_mtx_compart,
    index=csn_labels,
    columns=csn_labels,
)

for i in range(matrix_df.shape[0]):
    for j in range(matrix_df.shape[1]):
        if i == j:
            matrix_df.iloc[i, j] = np.nan

stars_df = pd.DataFrame(
    '',
    index=matrix_df.index,
    columns=matrix_df.columns
)

for _, row in nnc_df_compart[nnc_df_compart['enrichment']].iterrows():
    n1 = row['niche1']
    n2 = row['niche2']
    if (n1 in stars_df.index) and (n2 in stars_df.columns):
        stars_df.loc[n1, n2] = row['stars']

plt.figure(figsize=(5, 4))

ax  = sns.heatmap(
    matrix_df,
    cmap='Greens',
    # cbar_kws={'label': 'Edge type proportion'},
    linewidths=0.7,
    linecolor='gray',
    # square=True,
)

for i, n1 in enumerate(matrix_df.index):
    for j, n2 in enumerate(matrix_df.columns):
        if i == j:
            ax.plot([i, i+1], [i, i+1], color='gray', linewidth=0.7)
            # ax.plot([i+1, i], [i, i+1], color='gray', linewidth=0.7)
            continue
        star = stars_df.iloc[i, j]
        if star:
            if matrix_df.iloc[i, j] > np.nanmax(matrix_df.values) * 0.7:
                color='white'
            else:
                color='black'
            ax.text(j + 0.5, i + 0.6, star, ha='center', va='center', color=color, fontsize=20, fontweight='bold')

n_rows, n_cols = matrix_df.shape
ax.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.7, clip_on=False)
ax.plot([n_cols, n_cols], [0, n_rows], color='gray', linewidth=0.7, clip_on=False)

ax.set_xticklabels(ax.get_xticklabels(), rotation=0, ha='center', fontsize=16)
ax.set_yticklabels(ax.get_yticklabels(), rotation=0, ha='right', fontsize=16)
ax.set_ylabel('Niche', fontsize=16)
ax.set_xlabel('Niche', fontsize=16)
ax.set_title('Edge Type Proportions (Compartmentalized)', fontsize=16)
ax.collections[0].colorbar.ax.yaxis.label.set_size(16)
ax.collections[0].colorbar.ax.tick_params(labelsize=16)
ax.grid(False)

plt.tight_layout()
plt.show()
6 niches in total.

../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_81_1.png

Comparing niche-niche co-localization pattern between mixed group and compartmentalized group

[44]:
edge_prop_mtx_compart_list = []
edge_prop_mtx_mixed_list = []

niche_labels = ['basic', 'R0', 'R1', 'R2', 'R3', 'R4']
n_niches = len(niche_labels)

for i in range(len(compart_adata_list)):
    edge_prop_mtx, _ = cal_nnc_mtx(compart_adata_list[i], 'csn_label', niche_summary=niche_labels, adj_mtx_key='delaunay_adj_mtx')
    edge_prop_mtx_compart_list.append(edge_prop_mtx)

for i in range(len(mixed_adata_list)):
    edge_prop_mtx, _ = cal_nnc_mtx(mixed_adata_list[i], 'csn_label', niche_summary=niche_labels, adj_mtx_key='delaunay_adj_mtx')
    edge_prop_mtx_mixed_list.append(edge_prop_mtx)

nnc_btgroup_df = nnc_between_groups_test(edge_prop_mtx_compart_list,
                                         edge_prop_mtx_mixed_list,
                                         niche_labels,
                                         min_valid=3,
                                         alpha=0.05,
                                         alternative="two-sided",
                                         fdr_method="fdr_bh",
                                         )
nnc_btgroup_df.head()
[44]:
niche1 niche2 mean1 mean2 delta_mean n1_valid n2_valid p_value q_value rejected
0 basic R0 0.050620 0.003027 0.047593 15 19 0.006082 0.021570 False
1 basic R1 0.045948 0.002907 0.043041 15 19 0.105328 0.170161 False
2 basic R2 0.019912 0.000100 0.019812 15 19 0.000413 0.004131 False
3 basic R3 0.764465 0.986027 -0.221562 15 19 0.000078 0.001164 False
4 basic R4 0.119055 0.007939 0.111115 15 19 0.002008 0.010039 False
[45]:
delta_mtx = np.full((n_niches, n_niches), np.nan, dtype=float)
q_mtx = np.full((n_niches, n_niches), np.nan, dtype=float)

for _, r in nnc_btgroup_df.iterrows():
    i = niche_labels.index(r["niche1"])
    j = niche_labels.index(r["niche2"])
    delta_mtx[i, j] = r["delta_mean"]
    q_mtx[i, j] = r["q_value"]

plt.figure(figsize=(5, 4))
ax = sns.heatmap(
    delta_mtx,
    xticklabels=niche_labels,
    yticklabels=niche_labels,
    cmap="coolwarm",
    center=0,
    linewidths=0.5,
    linecolor="white",
    # cbar_kws={"label": r"$\Delta$ edge_prop"}
)
ax.set_title(r"$\Delta$ edge_prop (compartmentalized - mixed)", fontsize=16)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0, ha='center', fontsize=16)
ax.set_yticklabels(ax.get_yticklabels(), rotation=0, ha='right', fontsize=16)
ax.set_ylabel('Niche', fontsize=16)
ax.set_xlabel('Niche', fontsize=16)
ax.collections[0].colorbar.ax.yaxis.label.set_size(16)
ax.collections[0].colorbar.ax.tick_params(labelsize=16)
ax.grid(False)
ax.plot([0, n_niches], [n_niches, n_niches], color='white', linewidth=0.5, clip_on=False)
ax.plot([n_niches, n_niches], [0, n_niches], color='white', linewidth=0.5, clip_on=False)

for i in range(n_niches):
    for j in range(n_niches):
        if i == j: continue
        q = q_mtx[i, j]
        if np.isnan(q): continue
        star = p2stars(q)
        if star != "":
            ax.text(j + 0.5, i + 0.6, star, ha="center", va="center", fontsize=20, color="black", fontweight="bold")

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_84_0.png

Tumor-immune interaction niche (R3)

Cell-cell interaction analysis for R3

[46]:
cci_results = cci_enrichment_test(cond_list,
                                  'csn_label',
                                  'all_group_name',
                                  niche_summary=['R3'],
                                  spatial_key='spatial',
                                  cut_percentage=99,
                                  method='fisher',
                                  alpha=0.05,
                                  fdr_method='fdr_by',
                                  log2fc_threshold=1,
                                  prop_threshold=0.01,
                                  verbose=True,
                                  )
cci_df, test_norm_list, bg_norm_list, test_edge_count_list, bg_edge_count_list = cci_results
cci_df.head()
1 niches and 16 cell types in total.

Testing niche R3...
Finished!
[46]:
niche_idx niche ct1_idx ct1 ct2_idx ct2 test_edge_count bg_edge_count test_edge_prop bg_edge_prop oddsratio p-value q-value log2fc enrichment
0 0 R3 0 B 0 B 345.0 14794.0 0.002975 0.041401 0.069090 0.000000 0.000000 -3.798676 False
1 0 R3 1 CD3 T 0 B 254.0 771.0 0.002190 0.002158 1.015184 0.827408 1.000000 0.021693 False
2 0 R3 1 CD3 T 1 CD3 T 275.0 782.0 0.002371 0.002188 1.083819 0.252054 1.000000 0.115859 False
3 0 R3 2 CD4 T 0 B 436.0 7673.0 0.003760 0.021473 0.171981 0.000000 0.000000 -2.513795 False
4 0 R3 2 CD4 T 1 CD3 T 515.0 1890.0 0.004441 0.005289 0.838931 0.000362 0.003007 -0.252146 False
[47]:
niche_labels = ['R3']
ct_labels = sorted(adata_concat_new.obs['all_group_name'].unique())

cci_df['stars'] = cci_df['q-value'].apply(p2stars)

figrows = 1
figcols = 1

fig, axes = plt.subplots(figrows, figcols, figsize=(9, 8))

for idx in range(figrows * figcols):

    imgrow = idx // figcols
    imgcol = idx % figcols

    if figcols == 1 and figrows == 1:
        ax = axes
    elif figrows == 1:
        ax = axes[imgcol]
    elif figcols == 1:
        ax = axes[imgrow]
    else:
        ax = axes[imgrow, imgcol]

    if idx >= len(niche_labels):
        ax.axis('off')
        continue

    sub_df = cci_df[cci_df['niche_idx'] == idx]

    matrix_df = pd.DataFrame(
        data=test_norm_list[idx],
        index=ct_labels,
        columns=ct_labels,
    )

    for i in range(matrix_df.shape[0]):
        for j in range(matrix_df.shape[1]):
            if i < j:
                matrix_df.iloc[i, j] = np.nan

    stars_df = pd.DataFrame(
        '',
        index=matrix_df.index,
        columns=matrix_df.columns
    )

    for _, row in sub_df[sub_df['enrichment']].iterrows():
        ct1 = row['ct1']
        ct2 = row['ct2']
        if (ct1 in stars_df.index) and (ct2 in stars_df.columns):
            stars_df.loc[ct1, ct2] = row['stars']

    sns_heatmap = sns.heatmap(
        matrix_df,
        cmap='Oranges',
        mask=matrix_df.isna(),
        # cbar_kws={'label': 'Edge type proportion'},
        # linewidths=0.5,
        # linecolor='gray',
        # square=True,
        ax=ax,
    )

    n_rows, n_cols = matrix_df.shape

    for i, ct1 in enumerate(matrix_df.index):
        ax.plot([0, i+1], [i, i], color='gray', linewidth=0.5, clip_on=False)
        ax.plot([i+1, i+1], [i, n_rows], color='gray', linewidth=0.5, clip_on=False)
        for j, ct2 in enumerate(matrix_df.columns):
            star = stars_df.iloc[i, j]
            if star:
                if matrix_df.iloc[i, j] > np.nanmax(matrix_df.values) * 0.7:
                    color='white'
                else:
                    color='black'
                ax.text(j + 0.5, i + 0.6, star, ha='center', va='center', color=color, fontsize=16, fontweight='bold')

    ax.plot([0, 0], [0, n_rows], color='gray', linewidth=0.5, clip_on=False)
    ax.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.5, clip_on=False)
    # ax.plot([0, n_cols], [n_rows, n_rows], color='gray', linewidth=0.5, clip_on=False)
    # ax.plot([n_cols, n_cols], [0, n_rows], color='gray', linewidth=0.5, clip_on=False)

    ax.set_xticklabels(ax.get_xticklabels(), rotation=90, ha='center', fontsize=16)
    ax.set_yticklabels(ax.get_yticklabels(), rotation=0, ha='right', fontsize=16)
    ax.set_ylabel('Cell Type', fontsize=16)
    ax.set_xlabel('Cell Type', fontsize=16)
    ax.set_title(f'Niche {niche_labels[idx]}', fontsize=16)
    ax.collections[0].colorbar.ax.yaxis.label.set_size(16)
    ax.collections[0].colorbar.ax.tick_params(labelsize=16)
    ax.grid(False)

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_88_0.png

Niche neighborhood classification

[48]:
mix_cell_count = 100
csn_labels = ['basic', 'R0', 'R1', 'R2', 'R3', 'R4']
niche_neigh_rep_dict = {}
for i in range(len(cond_list)):
    adata = cond_list[i].copy()
    if np.sum(adata.obs['csn_label'] == 'R3') < mix_cell_count:
        print(f'Slice {adata.obs["slice_name"][0]} skipped due to low R3 cell count.')
        continue
    edge_prop_mtx, _ = cal_nnc_mtx(adata, 'csn_label', niche_summary=csn_labels, adj_mtx_key='delaunay_adj_mtx')
    rep = np.array(edge_prop_mtx[csn_labels.index('R3'), :], dtype=float)
    niche_neigh_rep_dict[adata.obs['slice_name'][0]] = rep[~np.isnan(rep)]
Slice p18 skipped due to low R3 cell count.
Slice p21 skipped due to low R3 cell count.
Slice p31 skipped due to low R3 cell count.
Slice p33 skipped due to low R3 cell count.
[53]:
from sklearn.metrics.pairwise import cosine_similarity
from scipy.spatial.distance import jensenshannon
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, leaves_list, fcluster
from matplotlib import gridspec
from scipy.cluster.hierarchy import dendrogram

patients = list(niche_neigh_rep_dict.keys())
n_patient = len(patients)
X = np.vstack([niche_neigh_rep_dict[p] for p in patients])  # n_patient x n_feature
edge_prop_df = pd.DataFrame(X, index=patients, columns=csn_labels)

# S = np.zeros((n_patient, n_patient))
# for i in range(n_patient):
#     for j in range(n_patient):
#         S[i, j] = 1 - jensenshannon(X[i], X[j], base=2)

# S = np.corrcoef(X)  # n_patient x n_patient
S = cosine_similarity(X)  # n_patient x n_patient
S_df = pd.DataFrame(S, index=patients, columns=patients)

D = 1.0 - S_df.values
np.fill_diagonal(D, 0.0)
# D = np.sqrt(D)
condensed_D = squareform(D, checks=False)
Z = linkage(condensed_D, method='average', metric='euclidean')

order = leaves_list(Z)
ordered_patients = [patients[i] for i in order]
S_ord = S_df.loc[ordered_patients, ordered_patients]

fig = plt.figure(figsize=(9, 9))
gs = gridspec.GridSpec(2, 2, height_ratios=[1, 6], width_ratios=[18, 1], hspace=0.02, wspace=0.15)

ax_dendro = fig.add_subplot(gs[0, 0])
ax_hm = fig.add_subplot(gs[1, 0])
cax = fig.add_subplot(gs[1, 1])

dendrogram(Z, ax=ax_dendro, color_threshold=0, no_labels=True, link_color_func=lambda k: "black")
ax_dendro.axis("off")

sns.heatmap(
    S_ord,
    xticklabels=ordered_patients,
    yticklabels=ordered_patients,
    cmap="RdBu_r",
    # linewidths=0.5,
    linecolor="white",
    cbar=True,
    ax=ax_hm,
    cbar_ax=cax,
)
ax_dendro.set_title('R3 niche neighborhood similarity heatmap', fontsize=20)
ax_hm.set_xticklabels(ax_hm.get_xticklabels(), rotation=90, ha='center', fontsize=16)
ax_hm.set_yticklabels(ax_hm.get_yticklabels(), rotation=0, ha='right', fontsize=16)
# ax_hm.set_ylabel('Patient', fontsize=16)
# ax_hm.set_xlabel('Patient', fontsize=16)
ax_hm.collections[0].colorbar.ax.yaxis.label.set_size(16)
ax_hm.collections[0].colorbar.ax.tick_params(labelsize=16)
ax_hm.grid(False)

# for label in ax_hm.get_xticklabels():
#     patient_id = label.get_text()
#     if patient_id in slice_color_dict:
#         label.set_color(slice_color_dict[patient_id])
# for label in ax_hm.get_yticklabels():
#     patient_id = label.get_text()
#     if patient_id in slice_color_dict:
#         label.set_color(slice_color_dict[patient_id])

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_91_0.png
[50]:
labels = fcluster(Z, t=4, criterion="maxclust")
cluster_df = pd.DataFrame({"patient": patients, "cluster": labels})
cluster_counts = cluster_df["cluster"].value_counts()
top2_clusters = cluster_counts.index[:2].tolist()
cluster_top2_df = cluster_df[cluster_df["cluster"].isin(top2_clusters)]

subtype_1_name = cluster_top2_df[cluster_top2_df["cluster"] == top2_clusters[0]]["patient"].tolist()
subtype_2_name = cluster_top2_df[cluster_top2_df["cluster"] == top2_clusters[1]]["patient"].tolist()
print("Subtype 1 patients (n={}):".format(len(subtype_1_name)), subtype_1_name)
print("Subtype 2 patients (n={}):".format(len(subtype_2_name)), subtype_2_name)

df1 = edge_prop_df.loc[subtype_1_name].copy()
df2 = edge_prop_df.loc[subtype_2_name].copy()
df1["Subtype"] = "Subtype 1"
df2["Subtype"] = "Subtype 2"
df_long = pd.concat([df1, df2], axis=0)
df_long = df_long.reset_index().rename(columns={"index": "patient"})
df_long = df_long.melt(id_vars=["patient", "Subtype"], value_vars=csn_labels, var_name="Neighbor_niche", value_name="Edge_prop")
Subtype 1 patients (n=20): ['p1', 'p2', 'p3', 'p4', 'p6', 'p7', 'p8', 'p11', 'p12', 'p14', 'p17', 'p20', 'p27', 'p29', 'p34', 'p36', 'p38', 'p39', 'p40', 'p41']
Subtype 2 patients (n=6): ['p5', 'p9', 'p10', 'p13', 'p23', 'p32']
[51]:
from scipy.stats import mannwhitneyu
from matplotlib.patches import Patch

subtype_color_dict = {"Subtype 1": "#D8B365", "Subtype 2": "#A6CEE3",  }

plt.figure(figsize=(1.7 * len(csn_labels), 4))
ax = plt.gca()

x_pos = np.arange(len(csn_labels))
offset = 0.22

# boxplot: Subtype 1
sns.boxplot(data=df_long[df_long["Subtype"] == "Subtype 1"], x="Neighbor_niche", y="Edge_prop",
            order=csn_labels, color=subtype_color_dict["Subtype 1"], width=0.35, ax=ax, positions=x_pos - offset, showfliers=False)

# boxplot: Subtype 2
sns.boxplot(data=df_long[df_long["Subtype"] == "Subtype 2"], x="Neighbor_niche", y="Edge_prop",
            order=csn_labels, color=subtype_color_dict["Subtype 2"], width=0.35, ax=ax, positions=x_pos + offset, showfliers=False)
# scatter
sns.stripplot(data=df_long, x="Neighbor_niche", y="Edge_prop", hue="Subtype", dodge=True, jitter=True, size=7, alpha=0.7,
              edgecolor="black", linewidth=1, palette={"Subtype 1": "white", "Subtype 2": "white"}, ax=ax)

stats_res = []
for i, niche in enumerate(csn_labels):
    x1 = df_long[(df_long["Neighbor_niche"] == niche) & (df_long["Subtype"] == "Subtype 1")]["Edge_prop"]
    x2 = df_long[(df_long["Neighbor_niche"] == niche) & (df_long["Subtype"] == "Subtype 2")]["Edge_prop"]

    y_max = max(x1.max(), x2.max())

    _, p = mannwhitneyu(x1, x2, alternative="two-sided")
    stats_res.append({"Neighbor_niche": niche, "pvalue": p})
    ax.text(i, y_max + 0.05, f"p = {p:.2e}" if p == p else "n.a.", ha="center", va="bottom", fontsize=16)

legend_elements = [
    Patch(facecolor=subtype_color_dict["Subtype 1"], edgecolor="black", label="Subtype 1"),
    Patch(facecolor=subtype_color_dict["Subtype 2"], edgecolor="black", label="Subtype 2")
]
ax.legend(handles=legend_elements, frameon=False, fontsize=16, loc="upper right")

ax.set_ylabel("Edge proportion", fontsize=16)
ax.set_xlabel("")
ax.set_xticklabels(csn_labels, rotation=0, ha="center", fontsize=16)
ax.tick_params(axis="y", labelsize=16)
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.grid(False)

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_93_0.png

Differential expression analysis

Keratin+ tumor

[56]:
from scipy.stats import wilcoxon, mannwhitneyu
from statannotations.Annotator import Annotator
np.random.seed(1234)

noi = 'R3'
ctoi = ['Keratin+ tumor']
markers = cond_concat_new.var_names.tolist()
markers = ['Ki67', 'phospho-S6']
n_markers = len(markers)

avg_expr_noi_subtype_1 = [[] for _ in range(n_markers)]
avg_expr_noi_subtype_2 = [[] for _ in range(n_markers)]
# avg_expr_other_subtype_1 = [[] for _ in range(n_markers)]
# avg_expr_other_subtype_2 = [[] for _ in range(n_markers)]
# avg_expr_whole_subtype_1 = [[] for _ in range(n_markers)]
# avg_expr_whole_subtype_2 = [[] for _ in range(n_markers)]
# new_marker_list = []

for i, slice_name in enumerate(subtype_1_name):
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
    # adata_other = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] != noi), :].copy()
    # adata_whole = adata[adata.obs['all_group_name'].isin(ctoi), :].copy()
    for j, marker in enumerate(markers):
        avg_expr_noi_subtype_1[j].append(np.mean(adata_noi[:, marker].X > 0.5))
        # avg_expr_other_subtype_1[j].append(np.mean(adata_other[:, marker].X > 0.5))
        # avg_expr_whole_subtype_1[j].append(np.mean(adata_whole[:, marker].X > 0.5))

for i, slice_name in enumerate(subtype_2_name):
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
    # adata_other = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] != noi), :].copy()
    # adata_whole = adata[adata.obs['all_group_name'].isin(ctoi), :].copy()
    for j, marker in enumerate(markers):
        avg_expr_noi_subtype_2[j].append(np.mean(adata_noi[:, marker].X > 0.5))
        # avg_expr_other_subtype_2[j].append(np.mean(adata_other[:, marker].X > 0.5))
        # avg_expr_whole_subtype_2[j].append(np.mean(adata_whole[:, marker].X > 0.5))

### noi
res = []
for j, marker in enumerate(markers):
    x1 = np.array(avg_expr_noi_subtype_1[j], dtype=float)
    x2 = np.array(avg_expr_noi_subtype_2[j], dtype=float)
    u, p = mannwhitneyu(x1, x2, alternative="two-sided")
    res.append({"marker": marker, "pvalue": p, "U": u, "n1": len(x1), "n2": len(x2)})
res_df_noi = pd.DataFrame(res)
alpha = 0.05
sig_df_noi = res_df_noi[res_df_noi["pvalue"] < alpha]#.sort_values("pvalue")
print(f"Significant markers ({noi}): {sig_df_noi.shape[0]} / {len(markers)}")
sig_markers = sig_df_noi["marker"].tolist()

# ### other
# res = []
# for j, marker in enumerate(markers):
#     x1 = np.array(avg_expr_other_subtype_1[j], dtype=float)
#     x2 = np.array(avg_expr_other_subtype_2[j], dtype=float)
#     u, p = mannwhitneyu(x1, x2, alternative="two-sided")
#     res.append({"marker": marker, "pvalue": p, "U": u, "n1": len(x1), "n2": len(x2)})
# res_df_other = pd.DataFrame(res)
# alpha = 0.05
# print(f"Significant markers (Other): {res_df_other[res_df_other['pvalue'] < alpha].shape[0]} / {len(markers)}")

# ### whole
# res = []
# for j, marker in enumerate(markers):
#     x1 = np.array(avg_expr_whole_subtype_1[j], dtype=float)
#     x2 = np.array(avg_expr_whole_subtype_2[j], dtype=float)
#     u, p = mannwhitneyu(x1, x2, alternative="two-sided")
#     res.append({"marker": marker, "pvalue": p, "U": u, "n1": len(x1), "n2": len(x2)})
# res_df_whole = pd.DataFrame(res)
# alpha = 0.05
# print(f"Significant markers (Whole): {res_df_whole[res_df_whole['pvalue'] < alpha].shape[0]} / {len(markers)}")

if len(sig_markers) == 0:
    print("No significant markers under the chosen threshold.")
else:
    n_plot = len(sig_markers)
    if n_plot > 5:
        ncol = 5
    else:
        ncol = n_plot
    nrow = int(np.ceil(n_plot / ncol))

    ### noi
    fig, axes = plt.subplots(nrow, ncol, figsize=(3*ncol, 3.5*nrow))
    axes = np.array(axes).flatten()

    for k, marker in enumerate(sig_markers):
        j = markers.index(marker)
        vals_subtype_1 = avg_expr_noi_subtype_1[j]
        vals_subtype_2 = avg_expr_noi_subtype_2[j]
        ax = axes[k]

        df_plot = pd.DataFrame({
            "expr": np.concatenate([vals_subtype_1, vals_subtype_2]),
            "group": ["Subtype 1"]*len(vals_subtype_1) + ["Subtype 2"]*len(vals_subtype_2),
            # "patient": subtype_1_name + subtype_2_name
        })
        # print(df_plot.sort_values("expr", ascending=False).head(5))

        colors = [subtype_color_dict["Subtype 1"], subtype_color_dict["Subtype 2"]]
        sns.boxplot(data=df_plot, x="group", y="expr", ax=ax, palette=subtype_color_dict, width=0.5, showfliers=False)
        # sns.violinplot(data=df_plot, x="group", y="expr", ax=ax, palette=subtype_color_dict, inner='box', cut=1)
        sns.stripplot(data=df_plot, x="group", y="expr", ax=ax, jitter=True, size=7, alpha=0.7, color="white", edgecolor="black", linewidth=1)

        pv = sig_df_noi.set_index("marker").loc[marker, "pvalue"]

        ax.set_xticklabels(["Subtype 1", "Subtype 2"], fontsize=14)
        ax.tick_params(axis="y", labelsize=14)
        ax.set_title(f"{marker}\np = {pv:.3e}", fontsize=14)
        ax.set_ylabel("Mean Scaled Expression", fontsize=14)
        ax.set_xlabel("")
        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)
        ax.grid(False)

    for ax in axes[n_plot:]:
        ax.axis("off")

    plt.suptitle(f"{ctoi[0]} (R3)", fontsize=16)
    plt.tight_layout()
    plt.show()
Significant markers (R3): 2 / 2
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_96_1.png

CD8 T cell

[57]:
from scipy.stats import wilcoxon, mannwhitneyu
from statannotations.Annotator import Annotator
np.random.seed(1234)

noi = 'R3'
ctoi = ['CD8 T']
markers = cond_concat_new.var_names.tolist()
markers = ['HLA-DR']
n_markers = len(markers)

avg_expr_noi_subtype_1 = [[] for _ in range(n_markers)]
avg_expr_noi_subtype_2 = [[] for _ in range(n_markers)]
# avg_expr_other_subtype_1 = [[] for _ in range(n_markers)]
# avg_expr_other_subtype_2 = [[] for _ in range(n_markers)]
# avg_expr_whole_subtype_1 = [[] for _ in range(n_markers)]
# avg_expr_whole_subtype_2 = [[] for _ in range(n_markers)]
# new_marker_list = []

for i, slice_name in enumerate(subtype_1_name):
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
    # adata_other = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] != noi), :].copy()
    # adata_whole = adata[adata.obs['all_group_name'].isin(ctoi), :].copy()
    for j, marker in enumerate(markers):
        avg_expr_noi_subtype_1[j].append(np.mean(adata_noi[:, marker].X))
        # avg_expr_other_subtype_1[j].append(np.mean(adata_other[:, marker].X))
        # avg_expr_whole_subtype_1[j].append(np.mean(adata_whole[:, marker].X))

for i, slice_name in enumerate(subtype_2_name):
    adata = cond_concat_new[cond_concat_new.obs['slice_name'] == slice_name, :].copy()
    adata_noi = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] == noi), :].copy()
    # adata_other = adata[(adata.obs['all_group_name'].isin(ctoi)) & (adata.obs['csn_label'] != noi), :].copy()
    # adata_whole = adata[adata.obs['all_group_name'].isin(ctoi), :].copy()
    for j, marker in enumerate(markers):
        avg_expr_noi_subtype_2[j].append(np.mean(adata_noi[:, marker].X))
        # avg_expr_other_subtype_2[j].append(np.mean(adata_other[:, marker].X))
        # avg_expr_whole_subtype_2[j].append(np.mean(adata_whole[:, marker].X))

### noi
res = []
for j, marker in enumerate(markers):
    x1 = np.array(avg_expr_noi_subtype_1[j], dtype=float)
    x2 = np.array(avg_expr_noi_subtype_2[j], dtype=float)
    u, p = mannwhitneyu(x1, x2, alternative="two-sided")
    res.append({"marker": marker, "pvalue": p, "U": u, "n1": len(x1), "n2": len(x2)})
res_df_noi = pd.DataFrame(res)
alpha = 0.05
sig_df_noi = res_df_noi[res_df_noi["pvalue"] < alpha]#.sort_values("pvalue")
print(f"Significant markers ({noi}): {sig_df_noi.shape[0]} / {len(markers)}")
sig_markers = sig_df_noi["marker"].tolist()

if len(sig_markers) == 0:
    print("No significant markers under the chosen threshold.")
else:
    n_plot = len(sig_markers)
    if n_plot > 5:
        ncol = 5
    else:
        ncol = n_plot
    nrow = int(np.ceil(n_plot / ncol))

    ### noi
    fig, axes = plt.subplots(nrow, ncol, figsize=(3*ncol, 3.5*nrow))
    axes = np.array(axes).flatten()

    for k, marker in enumerate(sig_markers):
        j = markers.index(marker)
        vals_subtype_1 = avg_expr_noi_subtype_1[j]
        vals_subtype_2 = avg_expr_noi_subtype_2[j]
        ax = axes[k]

        df_plot = pd.DataFrame({
            "expr": np.concatenate([vals_subtype_1, vals_subtype_2]),
            "group": ["Subtype 1"]*len(vals_subtype_1) + ["Subtype 2"]*len(vals_subtype_2),
            # "patient": subtype_1_name + subtype_2_name
        })
        # print(df_plot.sort_values("expr", ascending=False).head(5))

        colors = [subtype_color_dict["Subtype 1"], subtype_color_dict["Subtype 2"]]
        sns.boxplot(data=df_plot, x="group", y="expr", ax=ax, palette=subtype_color_dict, width=0.5, showfliers=False)
        # sns.violinplot(data=df_plot, x="group", y="expr", ax=ax, palette=subtype_color_dict, inner='box', cut=1)
        sns.stripplot(data=df_plot, x="group", y="expr", ax=ax, jitter=True, size=7, alpha=0.7, color="white", edgecolor="black", linewidth=1)

        pv = sig_df_noi.set_index("marker").loc[marker, "pvalue"]

        ax.set_xticklabels(["Subtype 1", "Subtype 2"], fontsize=14)
        ax.tick_params(axis="y", labelsize=14)
        ax.set_title(f"{marker}\np = {pv:.3e}", fontsize=14)
        ax.set_ylabel("Mean Scaled Expression", fontsize=14)
        ax.set_xlabel("")
        ax.spines["right"].set_visible(False)
        ax.spines["top"].set_visible(False)
        ax.grid(False)

    for ax in axes[n_plot:]:
        ax.axis("off")

    plt.suptitle(f"{ctoi[0]} (R3)", fontsize=16)
    plt.tight_layout()
    plt.show()
Significant markers (R3): 1 / 1
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_98_1.png

Survival analysis

[58]:
meta = pd.read_csv(data_dir + 'survival_data.csv', header=0, index_col=None)
meta.head()
[58]:
InternalId Survival_days Censored Event DONOR_NO
0 1 2612 0 1 30824
1 2 745 0 1 30805
2 3 3130 1 0 30812
3 4 2523 1 0 30838
4 5 1683 1 0 30865

Using the proportion of niche R0 as an indicator to classify slices into two subgroups

[60]:
from lifelines import CoxPHFitter
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

cold_group = [15, 19, 22, 24, 25, 26]  # 6
mixed_group = [1, 2, 7, 8, 11, 12, 13, 14, 17, 18, 20, 21, 23, 27, 29, 31, 33, 38, 39]  # 19
compartmentalized_group = [3, 4, 5, 6, 9, 10, 16, 28, 32, 34, 35, 36, 37, 40, 41]  #15

roi = 'R0'
thres_list = [0, 0.001, 0.003, 0.005, 0.01, 0.03,]
# thres = 0.03

fig, axes = plt.subplots(2, 3, figsize=(20, 10))
axes = axes.flatten()

for idx, thres in enumerate(thres_list):
    rows = []
    for i in range(1, 42):
        if i == 30: continue
        row = meta[meta['InternalId'] == i]
        if row.empty: continue
        survival = row['Survival_days'].iloc[0]
        event = row['Event'].iloc[0]

        if i in cold_group:
            continue
            # adata = adata_concat_new[adata_concat_new.obs['slice_name'] == f'p{i}', :].copy()
        else:
            adata = cond_concat_new[cond_concat_new.obs['slice_name'] == f'p{i}', :].copy()
        roi_ratio = (adata.obs['csn_label'] == roi).mean()

        rows.append({
            'id': i,
            'duration': survival/365,
            'event': event,
            'prop': roi_ratio
        })
        # print(survival, roi_ratio)

    df = pd.DataFrame(rows)
    df['group'] = (df['prop'] > thres)

    cph = CoxPHFitter()
    cph.fit(df, duration_col='duration', event_col='event', formula='group')
    hr = cph.hazard_ratios_['group']
    cox_pval = cph.summary.loc['group', 'p']

    group_high = df[df['group'] == 1]
    group_low = df[df['group'] == 0]

    # print(f"Threshold: {thres}")
    # print("High group IDs:", df[df['group'] == 1]['id'].tolist())
    # print("Low group IDs:", df[df['group'] == 0]['id'].tolist())

    kmf = KaplanMeierFitter()
    results = logrank_test(group_high['duration'], group_low['duration'],
                        event_observed_A=group_high['event'],
                        event_observed_B=group_low['event'])
    logrank_pval = results.p_value

    ax = axes[idx]
    kmf.fit(group_high['duration'], group_high['event'], label=f"high proportion n={df['group'].sum()}")
    kmf.plot_survival_function(ax=ax, linewidth=2.5, ci_show=False, show_censors=True,
                               censor_styles={'marker': '+', 'ms': 12, 'mew': 2,})

    kmf.fit(group_low['duration'], group_low['event'], label=f"low proportion n={df.shape[0] - df['group'].sum()}")
    kmf.plot_survival_function(ax=ax, linewidth=2.5, ci_show=False, show_censors=True,
                               censor_styles={'marker': '+', 'ms': 12, 'mew': 2,})

    ax.text(x=0.7, y=0.30, s=f'HR={hr:.2f}', fontsize=14, transform=ax.transAxes)
    ax.text(x=0.7, y=0.25, s=f'Cox p={cox_pval:.3f}', fontsize=14, transform=ax.transAxes)
    ax.text(x=0.7, y=0.20, s=f'Log-rank p={logrank_pval:.3f}', fontsize=14, transform=ax.transAxes)
    ax.set_title(f'Kaplan-Meier Curve (threshold={thres})', fontsize=18)
    ax.set_xlabel('Years', fontsize=18)
    ax.set_ylabel('Survival', fontsize=18)
    ax.legend(loc='lower left', fontsize=16)

    ax.grid(False)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_102_0.png

Using the proportion of macrophage as an indicator to classify slices into two subgroups

[61]:
ctoi = 'Macrophages'
thres_list = [0.06, 0.07, 0.075, 0.078, 0.08, 0.117]

fig, axes = plt.subplots(2, 3, figsize=(20, 10))
axes = axes.flatten()

for idx, thres in enumerate(thres_list):
    rows = []
    for i in range(1, 42):
        if i == 30: continue
        row = meta[meta['InternalId'] == i]
        if row.empty: continue
        survival = row['Survival_days'].iloc[0]
        event = row['Event'].iloc[0]

        if i in cold_group:
            continue
            # adata = adata_concat_new[adata_concat_new.obs['slice_name'] == f'p{i}', :].copy()
        else:
            adata = cond_concat_new[cond_concat_new.obs['slice_name'] == f'p{i}', :].copy()
        ctoi_ratio = (adata.obs['all_group_name'] == ctoi).mean()

        rows.append({
            'id': i,
            'duration': survival/365,
            'event': event,
            'prop': ctoi_ratio
        })
        # print(survival, ctoi_ratio)

    df = pd.DataFrame(rows)
    df['group'] = (df['prop'] > thres)

    cph = CoxPHFitter()
    cph.fit(df, duration_col='duration', event_col='event', formula='group')
    hr = cph.hazard_ratios_['group']
    cox_pval = cph.summary.loc['group', 'p']

    group_high = df[df['group'] == 1]
    group_low = df[df['group'] == 0]

    kmf = KaplanMeierFitter()
    results = logrank_test(group_high['duration'], group_low['duration'],
                        event_observed_A=group_high['event'],
                        event_observed_B=group_low['event'])
    logrank_pval = results.p_value

    ax = axes[idx]
    kmf.fit(group_high['duration'], group_high['event'], label=f"high proportion n={df['group'].sum()}")
    kmf.plot_survival_function(ax=ax, linewidth=2.5, ci_show=False, show_censors=True,
                               censor_styles={'marker': '+', 'ms': 12, 'mew': 2,})

    kmf.fit(group_low['duration'], group_low['event'], label=f"low proportion n={df.shape[0] - df['group'].sum()}")
    kmf.plot_survival_function(ax=ax, linewidth=2.5, ci_show=False, show_censors=True,
                               censor_styles={'marker': '+', 'ms': 12, 'mew': 2,})

    ax.text(x=0.70, y=0.40, s=f'HR={hr:.2f}', fontsize=14, transform=ax.transAxes)
    ax.text(x=0.70, y=0.35, s=f'Cox p={cox_pval:.3f}', fontsize=14, transform=ax.transAxes)
    ax.text(x=0.70, y=0.30, s=f'Log-rank p={logrank_pval:.3f}', fontsize=14, transform=ax.transAxes)
    ax.set_title(f'Kaplan-Meier Curve', fontsize=18)
    ax.set_xlabel('Years', fontsize=18)
    ax.set_ylabel('Survival', fontsize=18)
    ax.legend(loc='lower left', fontsize=16)

    ax.grid(False)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)

plt.tight_layout()
plt.show()
../../_images/Case-control_studies_2.2_MIBI-TOF_TNBC_Keren2018_runHarmonics_104_0.png