From sciagent-skills
Queries ARCHS4 REST API for RNA-seq expression z-scores, tissue patterns, co-expressed genes, sample metadata, and HDF5 matrices across 1M+ human/mouse samples.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skills
This skill uses the workspace's default tool permissions.
ARCHS4 (All RNA-seq and ChIP-seq Sample and Signature Search) is a resource of uniformly aligned and processed human and mouse RNA-seq data from NCBI GEO and SRA, covering 1 million+ samples. The REST API at https://maayanlab.cloud/archs4/api/ provides gene-level expression profiles, z-score normalized tissue expression, co-expression networks, and sample metadata search — all without authentication. Large-scale bulk queries can also use the downloadable HDF5 expression matrices.
For population variant frequencies, use gnomad-database; ARCHS4 provides expression evidence only
For pathway enrichment, use gget-genomic-databases (gget enrichr); ARCHS4 is for expression lookups
Python packages: requests, pandas, matplotlib, seaborn
Input: gene symbols (e.g., TP53, BRCA1); sample GEO/SRA IDs for direct sample queries
Rate limiting: add time.sleep(0.1) between sequential gene queries to avoid throttling
Install: pip install requests pandas matplotlib seaborn
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def archs4_get(endpoint: str, params: dict = None) -> dict:
    """Send a GET request to the ARCHS4 API and return parsed JSON."""
    r = requests.get(f"{ARCHS4_BASE}/{endpoint}", params=params, timeout=30)
    r.raise_for_status()
    return r.json()

# Quick check: top tissues expressing TP53
data = archs4_get("meta/genes/TP53/zscore")
tissues = data.get("values", [])
print(f"TP53 tissue expression entries: {len(tissues)}")
top5 = sorted(tissues, key=lambda x: x.get("zscore", 0), reverse=True)[:5]
for t in top5:
    print(f"  {t['tissue']:<40} z={t['zscore']:.2f}")
# TP53 tissue expression entries: 200
#   thymus                                   z=2.81
#   testis                                   z=2.44
Retrieve z-score normalized expression for a gene across all available tissue types. Z-scores are computed per-sample relative to the population distribution; positive values indicate above-average expression.
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_gene_tissue_zscore(gene_symbol: str, species: str = "human") -> pd.DataFrame:
    """Return tissue z-score expression profile for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol (e.g., 'TP53').
    species : str
        'human' or 'mouse' (default: 'human').
    """
    endpoint = f"meta/genes/{gene_symbol}/zscore"
    r = requests.get(
        f"{ARCHS4_BASE}/{endpoint}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # No records (e.g., wrong species or symbol case): nothing to sort
        return df
    return df.sort_values("zscore", ascending=False).reset_index(drop=True)

df = get_gene_tissue_zscore("MYC")
print(f"MYC tissue z-scores: {len(df)} tissue types")
print(df[["tissue", "zscore"]].head(10).to_string(index=False))
# MYC tissue z-scores: 200
#          tissue  zscore
#           colon    3.12
# small intestine    2.98
#        placenta    2.74

# Query mouse tissues for a gene
df_mouse = get_gene_tissue_zscore("Myc", species="mouse")
print("Mouse Myc: top 5 tissues")
print(df_mouse[["tissue", "zscore"]].head(5).to_string(index=False))
Find genes whose expression is most correlated with a query gene across all ARCHS4 samples. Useful for identifying pathway partners, regulators, or candidate targets.
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_coexpressed_genes(gene_symbol: str, top_n: int = 50,
                          species: str = "human") -> pd.DataFrame:
    """Return genes co-expressed with the query gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol.
    top_n : int
        Number of correlated genes to return (default: 50).
    species : str
        'human' or 'mouse' (default: 'human').
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("values", [])
    df = pd.DataFrame(records)
    if df.empty:
        # No records returned: nothing to sort
        return df
    return df.sort_values("correlation", ascending=False).reset_index(drop=True)

coexp = get_coexpressed_genes("PCNA", top_n=20)
print(f"Top co-expressed genes with PCNA (n={len(coexp)}):")
print(coexp[["gene", "correlation"]].head(10).to_string(index=False))
# Top co-expressed genes with PCNA (n=20):
# gene  correlation
# RFC4         0.91
# RFC2         0.89
# MCM6         0.87

# Extract gene list for downstream enrichment
gene_list = coexp["gene"].tolist()
print(f"Co-expression gene list: {gene_list[:10]}")
# Pass gene_list to Enrichr or pathway analysis tools
Search for RNA-seq samples by metadata keyword (tissue, disease condition, cell type, treatment). Returns GEO/SRA sample identifiers with metadata fields.
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def search_samples(keyword: str, species: str = "human",
                   limit: int = 100) -> pd.DataFrame:
    """Search ARCHS4 samples by metadata keyword.

    Parameters
    ----------
    keyword : str
        Search term (e.g., 'breast cancer', 'liver', 'HeLa').
    species : str
        'human' or 'mouse'.
    limit : int
        Maximum number of samples to return.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    records = data.get("samples", [])
    return pd.DataFrame(records)

samples = search_samples("pancreatic cancer", limit=50)
print(f"Samples matching 'pancreatic cancer': {len(samples)}")
if len(samples) > 0:
    print(samples[["sample_id", "series_id", "title"]].head(5).to_string(index=False))
# Samples matching 'pancreatic cancer': 50
#  sample_id  series_id                                        title
# GSM2345678  GSE123456  Pancreatic ductal adenocarcinoma - sample 1
Retrieve summary statistics and metadata for a gene including the number of samples expressing it, expression percentile, and available annotation.
import time
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_gene_metadata(gene_symbol: str, species: str = "human") -> dict:
    """Return metadata and expression summary for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    return r.json()

meta = get_gene_metadata("GAPDH")
print(f"Gene: {meta.get('gene_symbol', 'N/A')}")
print(f"Species: {meta.get('species', 'N/A')}")
print(f"Ensembl ID: {meta.get('ensembl_gene_id', 'N/A')}")
# Fall back to 'N/A' when the description is missing or null
print(f"Description: {(meta.get('description') or 'N/A')[:80]}")

# Compare metadata for a panel of housekeeping genes
housekeeping = ["GAPDH", "ACTB", "B2M", "HPRT1", "RPLP0"]
for gene in housekeeping:
    meta = get_gene_metadata(gene)
    print(f"  {gene:<8} {meta.get('ensembl_gene_id', 'N/A')}")
    time.sleep(0.1)
Generate a publication-ready barplot of z-score expression across the top tissues for a gene.
import requests
import pandas as pd
import matplotlib.pyplot as plt

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def plot_tissue_expression(gene_symbol: str, top_n: int = 20,
                           species: str = "human",
                           output_file: str = None) -> None:
    """Plot top tissue z-score expression for a gene.

    Parameters
    ----------
    gene_symbol : str
        HGNC gene symbol.
    top_n : int
        Number of top tissues to display.
    species : str
        'human' or 'mouse'.
    output_file : str
        If provided, save figure to this path.
    """
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": species},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    df = pd.DataFrame(records).sort_values("zscore", ascending=False).head(top_n)

    fig, ax = plt.subplots(figsize=(10, 6))
    colors = ["#D73027" if z > 0 else "#4575B4" for z in df["zscore"]]
    bars = ax.barh(df["tissue"][::-1], df["zscore"][::-1], color=colors[::-1])
    ax.axvline(0, color="black", linewidth=0.8, linestyle="--")
    ax.set_xlabel("Expression Z-Score")
    ax.set_title(f"ARCHS4 Tissue Expression: {gene_symbol} ({species})\nTop {top_n} tissues")
    ax.bar_label(bars, fmt="%.2f", padding=3, fontsize=8)
    plt.tight_layout()
    fname = output_file or f"{gene_symbol}_tissue_expression.png"
    plt.savefig(fname, dpi=150, bbox_inches="tight")
    print(f"Saved {fname} ({len(df)} tissues plotted)")

plot_tissue_expression("BRCA1", top_n=15, output_file="BRCA1_tissue_expression.png")
Download or stream from ARCHS4's precomputed HDF5 expression matrices for large-scale cross-sample analysis. The HDF5 files contain gene × sample count matrices for human and mouse.
import requests

# HDF5 files are available for bulk download from the ARCHS4 data portal
# URL pattern: https://maayanlab.cloud/archs4/download#expression
# Human gene-level: human_gene_v2.6.h5
# Mouse gene-level: mouse_gene_v2.6.h5
def get_h5_download_urls() -> dict:
    """Return download URLs for ARCHS4 HDF5 expression matrices."""
    base = "https://maayanlab.cloud/archs4"
    return {
        "human_gene": f"{base}/files/human_gene_v2.6.h5",
        "mouse_gene": f"{base}/files/mouse_gene_v2.6.h5",
        "human_transcript": f"{base}/files/human_transcript_v2.6.h5",
        "mouse_transcript": f"{base}/files/mouse_transcript_v2.6.h5",
    }

urls = get_h5_download_urls()
for key, url in urls.items():
    print(f"  {key:<22} {url}")

# To work with a downloaded HDF5 file:
try:
    import h5py
    import numpy as np

    h5_path = "human_gene_v2.6.h5"  # after download

    def extract_gene_from_h5(h5_path: str, gene_symbol: str,
                             n_samples: int = 1000) -> dict:
        """Extract expression values for a gene from the HDF5 matrix."""
        with h5py.File(h5_path, "r") as f:
            genes = [g.decode() for g in f["meta"]["genes"]["gene_symbol"][:]]
            if gene_symbol not in genes:
                raise ValueError(f"{gene_symbol} not found in HDF5")
            idx = genes.index(gene_symbol)
            expr = f["data"]["expression"][idx, :n_samples]
            sample_ids = [s.decode() for s in f["meta"]["samples"]["geo_accession"][:n_samples]]
        return {"gene": gene_symbol, "expression": expr, "sample_ids": sample_ids}

    result = extract_gene_from_h5(h5_path, "TP53", n_samples=500)
    print(f"TP53 expression: mean={result['expression'].mean():.2f},"
          f" max={result['expression'].max():.2f} (n={len(result['expression'])} samples)")
except ImportError:
    print("h5py not installed. Install with: pip install h5py")
except FileNotFoundError:
    print("HDF5 file not downloaded yet. Use the URLs above to download first.")
ARCHS4 reports gene expression as z-scores computed relative to all samples for that gene. A z-score of 0 means expression at the population mean; a z-score of 2.0 means expression 2 standard deviations above the mean. Z-scores are easier to interpret across datasets than raw counts because they are less sensitive to library size differences, and ARCHS4's uniform alignment across studies limits pipeline-related batch effects.
# Example: Positive z-score = above-average expression for that gene
# z > 2.0  → roughly the top 2.3% of samples for that gene (if values are approximately normal)
# z < -2.0 → roughly the bottom 2.3% of samples for that gene (same assumption)
# Use absolute z-score thresholds consistently when comparing across genes
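To make the definition concrete, here is a minimal sketch that computes z-scores for one gene from a handful of made-up expression values, using only the standard library. The values and the `zscores` helper are illustrative assumptions, not ARCHS4 data or ARCHS4 code, and ARCHS4's own normalization (for example any transform applied before scaling) may differ.

```python
from statistics import mean, pstdev

def zscores(values):
    """Return (x - mean) / sd for each value, using the population SD."""
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # Constant expression: every sample sits exactly at the mean.
        return [0.0 for _ in values]
    return [(x - mu) / sd for x in values]

# Hypothetical log-scale expression values for one gene in five samples.
expr = [2.0, 4.0, 4.0, 4.0, 6.0]
z = zscores(expr)
print([round(v, 2) for v in z])
# [-1.58, 0.0, 0.0, 0.0, 1.58]
```

The three samples at the mean get z = 0, and the two extremes sit about 1.58 standard deviations below and above it.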
| Access method | Best for | Limitations |
|---|---|---|
| REST API (/zscore, /correlations) | Quick single-gene queries, exploration | Aggregated profiles only, no per-sample access |
| REST API (/samples/search) | Discovering relevant datasets | Returns metadata, not expression values |
| HDF5 download | Bulk analysis, custom co-expression, ML | Requires 30–60 GB disk; download once |
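As an illustration of the "custom co-expression" use case in the table, the sketch below ranks genes by Pearson correlation against a query gene in a tiny in-memory matrix. The gene names, values, and the `pearson` and `top_coexpressed` helpers are hypothetical; with real data the per-gene rows would come from slices of the downloaded HDF5 matrix, and a vectorized NumPy implementation would be preferable at that scale.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(vx * vy)

def top_coexpressed(matrix, query, top_n=3):
    """Rank all other genes by correlation with `query` (highest first)."""
    scores = [(gene, pearson(matrix[query], values))
              for gene, values in matrix.items() if gene != query]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Toy genes x samples matrix (hypothetical values, four samples per gene).
toy = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],  # rises with GENE_A
    "GENE_C": [4.0, 3.0, 2.0, 1.0],  # falls as GENE_A rises
    "GENE_D": [1.0, 3.0, 2.0, 4.0],
}
ranked = top_coexpressed(toy, "GENE_A")
for gene, r in ranked:
    print(f"{gene}  r={r:.2f}")
```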
ARCHS4 indexes human samples using HGNC gene symbols (uppercase, e.g., TP53) and mouse samples using MGI symbols (first letter uppercase, e.g., Trp53). The species parameter accepts "human" or "mouse". Symbols in the wrong case for the species, or Ensembl IDs, return empty results.
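A small helper can apply these casing conventions before a query is sent. The `normalize_symbol` function below is a hypothetical convenience, not part of ARCHS4, and it is only a heuristic: some official symbols do not follow the simple pattern (for example human `C1orf...` symbols or mouse `...Rik` symbols), so such names should be passed through unchanged.

```python
def normalize_symbol(symbol: str, species: str = "human") -> str:
    """Apply the simple casing convention for a species (heuristic only)."""
    s = symbol.strip()
    if not s:
        raise ValueError("empty gene symbol")
    if species == "human":
        return s.upper()                      # e.g. tp53 -> TP53
    if species == "mouse":
        return s[0].upper() + s[1:].lower()   # e.g. TRP53 -> Trp53
    raise ValueError(f"unsupported species: {species!r}")

print(normalize_symbol(" tp53 "))          # TP53
print(normalize_symbol("TRP53", "mouse"))  # Trp53
```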
Goal: Compare tissue expression profiles of a gene panel and visualize as a heatmap to identify tissue-specific vs ubiquitous expression patterns.
import requests, time
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"
gene_panel = ["MYC", "TP53", "BRCA1", "EGFR", "KRAS", "CDK4"]
top_n_tissues = 25

def get_tissue_zscores(gene: str) -> pd.Series:
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("values", [])
    df = pd.DataFrame(records).set_index("tissue")["zscore"]
    return df

# Build expression matrix (genes × tissues)
all_data = {}
for gene in gene_panel:
    try:
        all_data[gene] = get_tissue_zscores(gene)
        print(f"  Fetched {gene}")
    except Exception as e:
        print(f"  Warning: {gene} failed: {e}")
    time.sleep(0.1)

matrix = pd.DataFrame(all_data).T  # genes × tissues

# Select top tissues by max absolute z-score
tissue_importance = matrix.abs().max(axis=0).sort_values(ascending=False)
top_tissues = tissue_importance.head(top_n_tissues).index
matrix_subset = matrix[top_tissues]

# Plot heatmap
fig, ax = plt.subplots(figsize=(14, 5))
sns.heatmap(
    matrix_subset,
    cmap="RdBu_r",
    center=0,
    vmin=-3,
    vmax=3,
    ax=ax,
    cbar_kws={"label": "Z-Score"},
    linewidths=0.5
)
ax.set_title("ARCHS4 Tissue Expression Profiles: Gene Panel")
ax.set_xlabel("Tissue")
ax.set_ylabel("Gene")
plt.xticks(rotation=45, ha="right", fontsize=8)
plt.tight_layout()
plt.savefig("archs4_panel_heatmap.png", dpi=150, bbox_inches="tight")
print(f"Saved archs4_panel_heatmap.png ({matrix_subset.shape})")
Goal: Start from a seed gene, retrieve co-expressed partners, then query their co-expressed genes in turn to build a two-hop co-expression neighborhood.
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def get_coexp(gene: str, top_n: int = 20, species: str = "human",
              min_corr: float = 0.0) -> list:
    """Return co-expressed gene symbols with correlation >= min_corr."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
        params={"species": species, "limit": top_n},
        timeout=30
    )
    r.raise_for_status()
    return [rec["gene"] for rec in r.json().get("values", [])
            if (rec.get("correlation") or 0) >= min_corr]

seed_gene = "PCNA"
min_correlation = 0.80  # applied to both hops

# Hop 1: direct co-expressed partners
hop1_genes = get_coexp(seed_gene, top_n=30, min_corr=min_correlation)
print(f"Hop 1 partners of {seed_gene}: {len(hop1_genes)}")
time.sleep(0.1)

# Hop 2: co-expressed genes of each partner
edges = set()
for gene in hop1_genes[:10]:  # limit for demonstration
    partners = get_coexp(gene, top_n=20, min_corr=min_correlation)
    for partner in partners:
        if partner != seed_gene:
            edges.add((gene, partner))
    time.sleep(0.1)

# Summarize the network
network_df = pd.DataFrame(list(edges), columns=["source", "target"])
hub_counts = network_df["source"].value_counts()
print(f"\nTwo-hop network: {len(edges)} edges")
print("Top hub genes:")
print(hub_counts.head(5))
network_df.to_csv(f"{seed_gene}_coexp_network.csv", index=False)
print(f"\nSaved {seed_gene}_coexp_network.csv")
Goal: Search for samples by disease keyword, summarize how many GEO series are available, and export sample metadata for downstream reanalysis selection.
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def search_and_summarize(keyword: str, species: str = "human",
                         limit: int = 200) -> pd.DataFrame:
    """Search samples and return a tidy metadata DataFrame."""
    r = requests.get(
        f"{ARCHS4_BASE}/samples/search",
        params={"query": keyword, "species": species, "limit": limit},
        timeout=30
    )
    r.raise_for_status()
    records = r.json().get("samples", [])
    return pd.DataFrame(records)

keyword = "colorectal cancer"
df = search_and_summarize(keyword, limit=150)
print(f"Samples matching '{keyword}': {len(df)}")
if len(df) > 0:
    # Summarize by GEO series
    series_counts = df["series_id"].value_counts()
    print("\nTop GEO series (by sample count):")
    print(series_counts.head(8).to_string())
    # Export sample list
    df.to_csv(f"{keyword.replace(' ', '_')}_samples.csv", index=False)
    print(f"\nSaved {keyword.replace(' ', '_')}_samples.csv ({len(df)} samples)")
    print(f"Unique GEO series: {df['series_id'].nunique()}")
| Parameter | Endpoint | Default | Range / Options | Effect |
|---|---|---|---|---|
| species | All gene endpoints | "human" | "human", "mouse" | Selects the species-specific sample index |
| limit | /correlations, /samples/search | 100 | 1–500 | Number of results returned |
| gene_symbol (path) | /meta/genes/{gene}/zscore, /correlations | — | HGNC symbol (human) or MGI symbol (mouse) | Query gene; case-sensitive |
| query | /samples/search | — | free-text string | Metadata keyword search across title, tissue, source fields |
| offset | /samples/search | 0 | integer | Pagination offset for large result sets |
| correlation (response field) | /correlations | — | -1.0–1.0 | Pearson correlation coefficient; filter > 0.7 for high co-expression |
| zscore (response field) | /zscore | — | continuous float | Expression z-score; > 2.0 = high expression |
| page_size (HDF5) | HDF5 slice | all | any integer | Number of samples to extract per read from HDF5 |
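The `limit` and `offset` parameters in the table can be combined to page through large search results. The sketch below shows one way to do that. The `paginate` helper and the `fetch_page(offset, limit)` callable are assumptions made for illustration, and a fake in-memory backend stands in for the API so the loop can be run offline.

```python
from typing import Callable, Iterator, List

def paginate(fetch_page: Callable[[int, int], List[dict]],
             page_size: int = 100,
             max_records: int = 1000) -> Iterator[dict]:
    """Yield records page by page until a short or empty page is returned."""
    offset = 0
    yielded = 0
    while yielded < max_records:
        page = fetch_page(offset, page_size)
        if not page:
            break
        for rec in page:
            if yielded >= max_records:
                return
            yield rec
            yielded += 1
        if len(page) < page_size:
            break  # last page reached
        offset += len(page)

# Fake backend with 250 hypothetical sample records.
FAKE = [{"sample_id": f"S{i}"} for i in range(250)]

def fake_fetch(offset: int, limit: int) -> List[dict]:
    return FAKE[offset:offset + limit]

records = list(paginate(fake_fetch, page_size=100))
print(len(records))  # 250
```

In real use, `fetch_page` would presumably wrap a `requests.get` call to /samples/search that passes `query`, `species`, `limit`, and `offset`, with a short sleep between pages.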
Use z-score thresholds consistently: Because z-scores are gene-specific, a z-score of 2.0 for a ubiquitous gene (GAPDH) and a tissue-restricted gene (TTR, liver) have different interpretive meaning. Always annotate which gene you are comparing and the tissue background.
Sleep between batch queries: ARCHS4 enforces a soft rate limit of ~10 requests/second. Add time.sleep(0.1) between sequential gene queries to avoid 429 Too Many Requests errors.
Download HDF5 for large-scale analyses: For queries covering 50+ genes or requiring per-sample expression values, the REST API is impractical. Download the HDF5 file once and use h5py slicing for fast matrix access; this avoids hitting rate limits and is 100× faster for bulk extraction.
Match gene symbol conventions by species: Human queries require HGNC uppercase symbols (e.g., TP53); mouse queries require MGI-style symbols (e.g., Trp53). Using the wrong case returns empty results without an error.
Validate co-expression findings across datasets: ARCHS4 co-expression aggregates across all tissue types. A high correlation may be driven by a single tissue or study. Cross-check with tissue-specific queries or manually inspect the top contributing GEO series.
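For the rate-limit advice above, a fixed sleep can be complemented by retrying with exponential backoff when a request fails. The following is a minimal sketch under stated assumptions: `call_with_backoff` is a hypothetical helper, the failing call is simulated rather than a real ARCHS4 request, and the `sleep` argument is injectable so the example runs without actually waiting.

```python
import time

def call_with_backoff(func, max_retries=4, base_delay=0.5,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)

# Simulated endpoint that fails twice (e.g. throttled) and then succeeds.
attempts = {"n": 0}

def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("simulated 429 Too Many Requests")
    return "ok"

delays = []  # record the waits instead of sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
# ok [0.5, 1.0]
```

With `requests`, `func` could be a lambda that performs the GET and calls `raise_for_status()`, and `retry_on` could be narrowed to `requests.exceptions.RequestException` so that unrelated errors are not retried.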
When to use: Rapidly determine whether a gene is broadly expressed (housekeeping) or tissue-restricted before designing experiments.
import requests

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def tissue_specificity_summary(gene_symbol: str) -> None:
    """Print a summary of high and low expression tissues for a gene."""
    r = requests.get(
        f"{ARCHS4_BASE}/meta/genes/{gene_symbol}/zscore",
        params={"species": "human"},
        timeout=30
    )
    r.raise_for_status()
    # Keep only records that carry a z-score so sorting cannot hit None
    records = [rec for rec in r.json().get("values", [])
               if rec.get("zscore") is not None]
    if not records:
        print(f"{gene_symbol}: no z-score records returned")
        return
    zscores = [rec["zscore"] for rec in records]
    top_high = sorted(records, key=lambda x: x["zscore"], reverse=True)[:5]
    top_low = sorted(records, key=lambda x: x["zscore"])[:3]
    print(f"\n{gene_symbol}: {len(zscores)} tissues")
    print(f"  Range: [{min(zscores):.2f}, {max(zscores):.2f}] "
          f"Mean: {sum(zscores)/len(zscores):.2f}")
    print("  High expression:")
    for t in top_high:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")
    print("  Low expression:")
    for t in top_low:
        print(f"    {t['tissue']:<35} z={t['zscore']:.2f}")

tissue_specificity_summary("TTR")  # Transthyretin, liver-specific
When to use: Generate a table of the top co-expressed genes for each gene in a panel, for example a list of differentially expressed genes.
import requests, time
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"

def batch_coexpr_table(gene_list: list, top_n: int = 10) -> pd.DataFrame:
    """For each gene in gene_list, return its top co-expressed genes."""
    rows = []
    for gene in gene_list:
        try:
            r = requests.get(
                f"{ARCHS4_BASE}/meta/genes/{gene}/correlations",
                params={"species": "human", "limit": top_n},
                timeout=30
            )
            r.raise_for_status()
            for rec in r.json().get("values", []):
                rows.append({
                    "query_gene": gene,
                    "coexp_gene": rec.get("gene"),
                    "correlation": rec.get("correlation"),
                })
            time.sleep(0.1)
        except Exception as e:
            print(f"Warning: {gene} skipped: {e}")
    return pd.DataFrame(rows)

deg_list = ["MYC", "CCND1", "CDK4", "RB1", "E2F1"]
coexp_table = batch_coexpr_table(deg_list, top_n=10)
print(f"Co-expression entries: {len(coexp_table)}")
print(coexp_table.groupby("query_gene")["coexp_gene"].count())
coexp_table.to_csv("deg_coexpression_table.csv", index=False)
print("Saved deg_coexpression_table.csv")
When to use: Identify relevant GEO accessions to download raw count matrices for a meta-analysis.
import requests
import pandas as pd

ARCHS4_BASE = "https://maayanlab.cloud/archs4/api/v1"
keyword = "glioblastoma"

r = requests.get(
    f"{ARCHS4_BASE}/samples/search",
    params={"query": keyword, "species": "human", "limit": 200},
    timeout=30
)
r.raise_for_status()
samples = pd.DataFrame(r.json().get("samples", []))

if len(samples) > 0:
    # Get unique GEO series accessions
    series = samples["series_id"].dropna().unique()
    print(f"Unique GEO series for '{keyword}': {len(series)}")
    for s in series[:10]:
        n = (samples["series_id"] == s).sum()
        print(f"  {s} ({n} samples)")
    # Export series list for GEO download script
    pd.Series(series, name="geo_series").to_csv(
        f"{keyword}_geo_series.txt", index=False
    )
    print(f"\nSaved {keyword}_geo_series.txt")
| Problem | Cause | Solution |
|---|---|---|
| HTTP 404 for gene query | Gene symbol not found in ARCHS4 index | Verify HGNC symbol spelling; check species parameter matches gene convention (human: uppercase, mouse: first-letter-upper) |
| HTTP 429 Too Many Requests | Exceeded ~10 req/s rate limit | Add time.sleep(0.1) between requests; for batch queries use a 0.5 s delay |
| Empty values list in z-score response | Gene is not expressed in any indexed tissue, or wrong species | Switch species; verify gene is protein-coding and has GEO coverage |
| Empty samples list from search | Keyword not matched in metadata fields | Try broader or alternative keywords (e.g., "liver" instead of "hepatic") |
| HDF5 gene not found | Symbol mismatch between HDF5 version and query | Check available genes in f["meta"]["genes"]["gene_symbol"][:]; try Ensembl ID or alias |
| requests.exceptions.Timeout | Slow API response under load | Increase timeout=60; retry with exponential backoff |
| Z-scores all near zero | Gene has very low or absent expression across tissues | Check the gene's expression in raw counts; the gene may be non-coding or very lowly expressed |
gnomad-database: Population variant frequencies; use after ARCHS4 to identify variants in highly expressed genes
gget-genomic-databases: Enrichr pathway enrichment for ARCHS4 co-expression gene lists (gget enrichr)
pydeseq2-differential-expression: Differential expression analysis on bulk RNA-seq; ARCHS4 HDF5 matrices can serve as reference cohorts