Run Harmonics on MERFISH FCandS dataset
Need additional packages: scanpy seaborn networkx
Load the packages
[ ]:
%reload_ext autoreload
%autoreload 2
import os
import time
import scanpy as sc
import pandas as pd
import numpy as np
import anndata as ad
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from Harmonics import *
import warnings
warnings.filterwarnings("ignore")
sc.settings.verbosity = 0
sc.settings.set_figure_params(dpi=50, dpi_save=500)
from matplotlib import rcParams
rcParams["figure.dpi"] = 50
rcParams["savefig.dpi"] = 500
rcParams['pdf.fonttype'] = 42
rcParams['svg.fonttype'] = 'none'
rcParams['ps.fonttype'] = 42
# rcParams['font.family'] = 'Arial'
rcParams['savefig.transparent'] = True
[2]:
data_dir = '../../../Data/Spatial/Transcriptomics/MERFISH_FCandS_Allen2023/processed/'
save_dir = f'../../results/MERFISH_FCandS_Allen2023/Harmonics/'
if not os.path.exists(save_dir):
os.makedirs(save_dir)
Define a function to match the niche label to ground truth annotation and a function to change p values to corresponding star representation, used to show the results of additional tests implemented in Harmonics.
[3]:
import numpy as np
import pandas as pd
import networkx as nx
def match_cluster_labels(true_labels, est_labels):
true_labels_arr = np.array(list(true_labels))
est_labels_arr = np.array(list(est_labels))
org_cat = list(np.sort(list(pd.unique(true_labels))))
est_cat = list(np.sort(list(pd.unique(est_labels))))
B = nx.Graph()
B.add_nodes_from([i + 1 for i in range(len(org_cat))], bipartite=0)
B.add_nodes_from([-j - 1 for j in range(len(est_cat))], bipartite=1)
for i in range(len(org_cat)):
for j in range(len(est_cat)):
weight = np.sum((true_labels_arr == org_cat[i]) * (est_labels_arr == est_cat[j]))
B.add_edge(i + 1, -j - 1, weight=-weight)
match = nx.algorithms.bipartite.matching.minimum_weight_full_matching(B)
if len(org_cat) >= len(est_cat):
return np.array([match[-est_cat.index(c) - 1] - 1 for c in est_labels_arr])
else:
unmatched = [c for c in est_cat if not (-est_cat.index(c) - 1) in match.keys()]
l = []
for c in est_labels_arr:
if (-est_cat.index(c) - 1) in match:
l.append(match[-est_cat.index(c) - 1] - 1)
else:
l.append(len(org_cat) + unmatched.index(c))
return np.array(l)
def p2stars(p):
if p < 0.001:
return '***'
elif p < 0.01:
return '**'
elif p < 0.05:
return '*'
else:
return ''
Load dataset
Load samples from all developmental stages
[4]:
age = 'all'
slice_name_list = np.loadtxt(data_dir + f"slice_name_list_{age}.txt", dtype=str, delimiter=" ").tolist()
adata_list = []
for slice_name in slice_name_list:
adata = ad.read_h5ad(data_dir + slice_name + '.h5ad')
adata_list.append(adata)
Run model
Instantiate Harmonics
[5]:
iter = 0
model = Harmonics_Model(adata_list,
slice_name_list,
cond_list=None, # default
cond_name_list=None, # default
concat_label='slice_name', # default
proportion_label=None, # default
seed=1234+iter, # default
parallel=True, # default
verbose=True, # default
)
Dataset comprises 31 slices, 378918 cells/spots in total.
Preprocess the data (Generating the connection graph and calculating neighborhood cell type destribution for cells)
[6]:
model.preprocess(ct_key='celltype_43',
spatial_key='spatial', # default
method='joint', # default
n_step=3, # default
n_neighbors=20, # default
cut_percentage=99, # default
)
Generating Delaunay neighbor graph...
100%|██████████| 31/31 [00:03<00:00, 9.38it/s]
All done!
Performing graph completion...
100%|██████████| 31/31 [00:31<00:00, 1.02s/it]
All done!
The cell types of interest are:
Astro-1
Astro-2
Endo-1
Endo-2
Endo-3
Epen
ExN-L2/3-1
ExN-L2/3-2
ExN-L5-1
ExN-L5-2
ExN-L5-3
ExN-L6-1
ExN-L6-2
ExN-L6-3
ExN-Olf
InN-Calb2-1
InN-Calb2-2
InN-Chat
InN-Lamp5
InN-Lhx6
InN-Olf-1
InN-Olf-2
InN-Pvalb-1
InN-Pvalb-2
InN-Pvalb-3
InN-Sst-1
InN-Sst-2
InN-Vip
MSN-D1-1
MSN-D1-2
MSN-D2
Macro
Micro-1
Micro-2
Micro-3
OPC
Olig-1
Olig-2
Olig-3
Peri-1
Peri-2
T cell
Vlmc
Generating one-hot matrix...
100%|██████████| 31/31 [00:00<00:00, 63.59it/s]
All done!
Dataset comprises 43 cell types.
Calculating cell type distribution for microenvironments...
Microenvironments comprise 41.35 cells/spots on average.
Minimum: 20, Maximum: 78
Perform overclustered initialization on the cell type distributions of cell neighborhoods. Resulting in Qmax niches. The distributions of niches are also computed.
[7]:
model.initialize_clusters(dim_reduction=True, # default
explained_var=None, # default
n_components=None, # default
n_components_max=100, # default
standardize=True, # default
method='kmeans', # default
Qmax=20, # default
)
Performing dimension reduction...
Returning 43 principal components.
Initializing niches...
20 initial niches defined.
Perform hierarchical distribution matching to reduce the niche number to no less than Qmin. This step results in niche assignment under a sequence of different niche numbers (usually from Qmax to Qmin).
[ ]:
model.hier_dist_match(assign_metric='jsd', # default
weighted_merge=True, # default
max_iters=100, # default
tol=1e-4, # default
test_kmeans=False, # default
Qmin=2, # default
)
Starting from 20 cell niches...
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
100%|██████████| 100/100 [01:03<00:00, 1.56it/s]
Unconverged at iteration 100!
20 cell niches left.
Merging cell niche 11 and cell niche 18...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19]
15%|█▌ | 15/100 [00:09<00:52, 1.61it/s]
Distribution of cell niches (centers) converge at iteration 16.
19 cell niches left.
Merging cell niche 8 and cell niche 6...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 19]
40%|████ | 40/100 [00:23<00:35, 1.68it/s]
Distribution of cell niches (centers) converge at iteration 41.
18 cell niches left.
Merging cell niche 13 and cell niche 17...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 19]
14%|█▍ | 14/100 [00:08<00:50, 1.70it/s]
Distribution of cell niches (centers) converge at iteration 15.
17 cell niches left.
Merging cell niche 4 and cell niche 8...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 4, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19]
28%|██▊ | 28/100 [00:15<00:40, 1.77it/s]
Distribution of cell niches (centers) converge at iteration 29.
16 cell niches left.
Merging cell niche 15 and cell niche 4...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 5, 7, 9, 10, 11, 12, 13, 14, 15, 16, 19]
10%|█ | 10/100 [00:05<00:50, 1.78it/s]
Distribution of cell niches (centers) converge at iteration 11.
15 cell niches left.
Merging cell niche 0 and cell niche 15...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 5, 7, 9, 10, 11, 12, 13, 14, 16, 19]
7%|▋ | 7/100 [00:03<00:52, 1.76it/s]
Distribution of cell niches (centers) converge at iteration 8.
14 cell niches left.
Merging cell niche 5 and cell niche 14...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 5, 7, 9, 10, 11, 12, 13, 16, 19]
16%|█▌ | 16/100 [00:08<00:42, 1.96it/s]
Distribution of cell niches (centers) converge at iteration 17.
13 cell niches left.
Merging cell niche 13 and cell niche 9...
Done!
Assigning cells to cell niche...
Current state: [0, 1, 2, 3, 5, 7, 10, 11, 12, 13, 16, 19]
41%|████ | 41/100 [00:19<00:27, 2.13it/s]
Distribution of cell niches (centers) converge at iteration 42.
12 cell niches left.
Merging cell niche 16 and cell niche 0...
Done!
Assigning cells to cell niche...
Current state: [1, 2, 3, 5, 7, 10, 11, 12, 13, 16, 19]
19%|█▉ | 19/100 [00:08<00:38, 2.13it/s]
Distribution of cell niches (centers) converge at iteration 20.
11 cell niches left.
Merging cell niche 11 and cell niche 5...
Done!
Assigning cells to cell niche...
Current state: [1, 2, 3, 7, 10, 11, 12, 13, 16, 19]
4%|▍ | 4/100 [00:02<00:50, 1.89it/s]
Distribution of cell niches (centers) converge at iteration 5.
10 cell niches left.
Merging cell niche 16 and cell niche 13...
Done!
Assigning cells to cell niche...
Current state: [1, 2, 3, 7, 10, 11, 12, 16, 19]
11%|█ | 11/100 [00:04<00:39, 2.25it/s]
Distribution of cell niches (centers) converge at iteration 12.
9 cell niches left.
Merging cell niche 16 and cell niche 1...
Done!
Assigning cells to cell niche...
Current state: [2, 3, 7, 10, 11, 12, 16, 19]
30%|███ | 30/100 [00:16<00:38, 1.81it/s]
Distribution of cell niches (centers) converge at iteration 31.
8 cell niches left.
Merging cell niche 16 and cell niche 2...
Done!
Assigning cells to cell niche...
Current state: [3, 7, 10, 11, 12, 16, 19]
24%|██▍ | 24/100 [00:12<00:41, 1.85it/s]
Distribution of cell niches (centers) converge at iteration 25.
7 cell niches left.
Merging cell niche 16 and cell niche 12...
Done!
Assigning cells to cell niche...
Current state: [3, 7, 10, 11, 16, 19]
8%|▊ | 8/100 [00:04<00:53, 1.73it/s]
Distribution of cell niches (centers) converge at iteration 9.
6 cell niches left.
Merging cell niche 16 and cell niche 10...
Done!
Assigning cells to cell niche...
Current state: [3, 7, 11, 16, 19]
7%|▋ | 7/100 [00:03<00:50, 1.83it/s]
Distribution of cell niches (centers) converge at iteration 8.
5 cell niches left.
Merging cell niche 19 and cell niche 16...
Done!
Assigning cells to cell niche...
Current state: [3, 7, 11, 19]
8%|▊ | 8/100 [00:03<00:37, 2.43it/s]
Distribution of cell niches (centers) converge at iteration 9.
4 cell niches left.
Merging cell niche 19 and cell niche 3...
Done!
Assigning cells to cell niche...
Current state: [7, 11, 19]
21%|██ | 21/100 [00:08<00:33, 2.35it/s]
Distribution of cell niches (centers) converge at iteration 22.
3 cell niches left.
Merging cell niche 7 and cell niche 19...
Done!
Assigning cells to cell niche...
Current state: [7, 11]
3%|▎ | 3/100 [00:01<00:47, 2.03it/s]
Distribution of cell niches (centers) converge at iteration 4.
2 cell niches left.
Niche count no more than 2.
Finished!
Metric-guided solution selection
[9]:
adata_list, adata_concat = model.select_solution(n_niche=None,
niche_key='niche_label', # default
auto=True,
metric='jsd_v2', # default
threshold=0.1, # default
return_adata=True, # default
plot=True, # default
save=False, # default
fig_size=(9, 5), # default
save_dir=save_dir,
file_name=f'score_vs_nichecount_basic_{age}_{iter}.pdf',
)
Automatically selecting best solution...
100%|██████████| 100/100 [00:01<00:00, 59.89it/s]
100%|██████████| 100/100 [00:01<00:00, 64.17it/s]
100%|██████████| 100/100 [00:01<00:00, 65.23it/s]
100%|██████████| 100/100 [00:01<00:00, 67.11it/s]
100%|██████████| 100/100 [00:01<00:00, 66.85it/s]
100%|██████████| 100/100 [00:01<00:00, 67.43it/s]
100%|██████████| 100/100 [00:01<00:00, 67.03it/s]
100%|██████████| 100/100 [00:01<00:00, 69.23it/s]
100%|██████████| 100/100 [00:01<00:00, 70.10it/s]
100%|██████████| 100/100 [00:01<00:00, 63.03it/s]
100%|██████████| 100/100 [00:01<00:00, 67.16it/s]
100%|██████████| 100/100 [00:01<00:00, 72.92it/s]
100%|██████████| 100/100 [00:01<00:00, 73.29it/s]
100%|██████████| 100/100 [00:01<00:00, 66.40it/s]
100%|██████████| 100/100 [00:01<00:00, 66.40it/s]
100%|██████████| 100/100 [00:01<00:00, 71.35it/s]
100%|██████████| 100/100 [00:01<00:00, 62.07it/s]
100%|██████████| 100/100 [00:01<00:00, 57.72it/s]
100%|██████████| 100/100 [00:02<00:00, 45.81it/s]
Suggested range of niche count is [11, 11].
Recommended number of niches are [11]
Selecting 11 niches as the best solution.
Done!
Save and reload the results
[ ]:
adata_concat.write_h5ad(save_dir + f'Harmonics_result_{age}_{iter}.h5ad')
[ ]:
adata_concat = ad.read_h5ad(save_dir + f'Harmonics_result_{age}_{iter}.h5ad')
adata_concat_new = adata_concat.copy()
for i, slice_name in enumerate(slice_name_list):
adata = adata_concat[adata_concat.obs['slice_name'] == slice_name, :].copy()
adata_list[i] = adata
# adata_concat_new
Plot the results
[8]:
domains = sorted(set(adata_concat_new.obs[f'domain_annotation']))
domain_color_dict = {f'{domains[i]}': sns.color_palette()[i] for i in range(len(domains))}
niches = sorted(set(adata_concat_new.obs['niche_label']))
niche_color_dict = {str(k): sns.color_palette()[k] for k in range(len(niches))}
niche_color_dict['1'], niche_color_dict['8'] = niche_color_dict['8'], niche_color_dict['1']
celltypes = ['Astro-1', 'Astro-2', 'Endo-1', 'Endo-2', 'Endo-3', 'Epen', 'ExN-L2/3-1', 'ExN-L2/3-2', 'ExN-L5-1', 'ExN-L5-2',
'ExN-L5-3', 'ExN-L6-1', 'ExN-L6-2', 'ExN-L6-3', 'ExN-Olf', 'InN-Calb2-1', 'InN-Calb2-2', 'InN-Chat', 'InN-Lamp5',
'InN-Lhx6', 'InN-Olf-1', 'InN-Olf-2', 'InN-Pvalb-1', 'InN-Pvalb-2', 'InN-Pvalb-3', 'InN-Sst-1', 'InN-Sst-2', 'InN-Vip',
'MSN-D1-1', 'MSN-D1-2', 'MSN-D2', 'Macro', 'Micro-1', 'Micro-2', 'Micro-3', 'OPC', 'Olig-1', 'Olig-2', 'Olig-3', 'Peri-1',
'Peri-2', 'T cell', 'Vlmc']
ct_colors = ['#ffff00', '#1ce6ff', '#ff34ff', '#ff4a46', '#008941', '#006fa6', '#a30059', '#ffdbe5', '#7a4900', '#0000a6', '#63ffac',
'#b79762', '#004d43', '#8fb0ff', '#997d87', '#5a0007', '#809693', '#6a3a4c', '#1b4400', '#4a3b53', '#3b5dff', '#4fc601',
'#ff2f80', '#61615a', '#ba0900', '#6b7900', '#00c2a0', '#ffaa92', '#ff90c9', '#b903aa', '#d16100', '#ddefff', '#000035',
'#7b4f4b', '#a1c299', '#ffb500', '#0aa6d8', '#c2ffed', '#00846f', "#E6C697", "#D00C6E", "#699191", '#a079bf']
ct_color_dict = {celltypes[k]: ct_colors[k] for k in range(len(celltypes))}
[9]:
matched_clusters = match_cluster_labels(adata_concat_new.obs['domain_annotation'], adata_concat_new.obs[f'niche_label'])
matched_labels = [domains[idx] if idx < len(domains) else 'Unmatched' for idx in matched_clusters]
adata_concat_new.obs[f'matched_cluster'] = [str(label) for label in matched_clusters]
adata_concat_new.obs[f'matched_label'] = matched_labels
start = 0
for i in range(len(adata_list)):
end = start + adata_list[i].shape[0]
adata = adata_concat_new[start:end].copy()
adata_list[i].obs['matched_cluster'] = adata_concat_new[start:end].obs['matched_cluster'].copy()
adata_list[i].obs['matched_label'] = adata_concat_new[start:end].obs['matched_label'].copy()
adata_list[i].obs['niche_label'] = adata_concat_new[start:end].obs['niche_label'].copy()
start = end
[10]:
for i in range(len(adata_list)):
print(slice_name_list[i])
adata = adata_list[i].copy()
fig, axes = plt.subplots(1, 4, figsize=(25, 5))
sc.pl.embedding(adata, basis='spatial', palette=domain_color_dict, color='domain_annotation',
ax=axes[0], s=25, show=False, frameon=False, title='Domain Annotation', legend_fontsize=16)
sc.pl.embedding(adata, basis='spatial', palette=niche_color_dict, color='matched_cluster',
ax=axes[1], s=25, show=False, frameon=False, title='Cell Niche (matched)', legend_fontsize=16)
sc.pl.embedding(adata, basis='spatial', palette=niche_color_dict, color='niche_label',
ax=axes[2], s=25, show=False, frameon=False, title='Cell Niche', legend_fontsize=16)
sc.pl.embedding(adata, basis='spatial', palette=ct_color_dict, color='celltype_43',
ax=axes[3], s=25, show=False, frameon=False, title='Cell Type', legend_loc=None)
plt.tight_layout()
plt.show()
Donor1_Slice0_4wk
Donor4_Slice0_4wk
Donor4_Slice1_4wk
Donor4_Slice2_4wk
Donor7_Slice0_4wk
Donor7_Slice1_4wk
Donor7_Slice2_4wk
Donor8_Slice0_4wk
Donor8_Slice1_4wk
Donor8_Slice2_4wk
Donor10_Slice0_24wk
Donor10_Slice1_24wk
Donor10_Slice2_24wk
Donor11_Slice0_24wk
Donor11_Slice1_24wk
Donor11_Slice2_24wk
Donor12_Slice0_24wk
Donor12_Slice1_24wk
Donor2_Slice0_90wk
Donor2_Slice1_90wk
Donor3_Slice0_90wk
Donor3_Slice1_90wk
Donor5_Slice0_90wk
Donor5_Slice1_90wk
Donor5_Slice2_90wk
Donor6_Slice0_90wk
Donor6_Slice1_90wk
Donor6_Slice2_90wk
Donor9_Slice0_90wk
Donor9_Slice1_90wk
Donor9_Slice2_90wk
[11]:
mapping_df = pd.DataFrame({"matched_cluster": adata_concat_new.obs["matched_cluster"].values, "niche_label": adata_concat_new.obs["niche_label"].values})
mapping = mapping_df.groupby("matched_cluster")["niche_label"].agg(lambda x: x.mode().iloc[0]).to_dict()
mapping
[11]:
{'0': '6',
'1': '4',
'10': '5',
'2': '2',
'3': '8',
'4': '9',
'5': '7',
'6': '10',
'7': '3',
'8': '0',
'9': '1'}
[12]:
perm = np.asarray([mapping[i] for i in adata_concat_new.uns['niche_label_summary']], dtype=int)
niche_labels = adata_concat_new.uns['niche_label_summary'].copy()
ct_labels = sorted(set(adata_concat_new.obs['celltype_43']))
niche_dist = adata_concat_new.uns['niche_dist'].toarray()[perm].copy()
cell_count_niche = adata_concat_new.uns['niche_cell_count'][perm].copy()
Cell type composition
[13]:
# niche_labels = adata_concat_new.uns['niche_label_summary'].copy()
# ct_labels = sorted(set(adata_concat_new.obs['celltype_43']))
# niche_dist = adata_concat_new.uns['niche_dist'].toarray()[perm].copy()
# cell_count_niche = adata_concat_new.uns['niche_cell_count'][perm].copy()
fig, ax = plt.subplots(figsize=(14, 8))
bar_width = 0.7
n_niches, n_cell_types = niche_dist.shape
x = np.arange(n_niches)
for j in range(n_cell_types):
bottom = np.sum(niche_dist[:, :j], axis=1)
ax.bar(x,
niche_dist[:, j],
bottom=bottom,
width=bar_width,
color=ct_color_dict[ct_labels[j]],
label=ct_labels[j])
ax.set_ylabel('Proportion', fontsize=20)
ax.set_xlabel('Niche', fontsize=20)
ax.set_xticks(x)
ax.set_xticklabels(niche_labels, rotation=0, ha='center')
ax.tick_params(axis='x', labelsize=20)
ax.tick_params(axis='y', labelsize=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.grid(False)
handles = [
mpatches.Patch(color=color, label=ct)
for ct, color in zip(celltypes, ct_colors)
]
ax.legend(handles=handles, title='Cell Types', loc=(1.05, 0.0), frameon=False, handleheight=0.8,
handlelength=0.7, ncol=3, fontsize=14, title_fontsize=14)
plt.title('Cell Type Proportions in Different Cell Niches', fontsize=20)
plt.tight_layout()
plt.show()