From sciagent-skills
Analyzes graphs and networks using NetworkX: creates directed/undirected/multi-edge graphs, computes centrality, shortest paths, communities; I/O (GraphML/GML), matplotlib viz. For social/biological/transport data.
npx claudepluginhub jaechang-hits/sciagent-skills --plugin sciagent-skills
This skill uses the workspace's default tool permissions.
NetworkX is a Python library for creating, manipulating, and analyzing complex networks and graphs. It provides data structures for undirected, directed, and multi-edge graphs along with a comprehensive collection of graph algorithms, generators, and I/O utilities. Use NetworkX when working with relationship data in social networks, biological interaction networks, transportation systems, citation graphs, or any domain involving pairwise entity relationships.
When NetworkX is not the right fit: for very large graphs, consider igraph or graph-tool instead; for parallel or GPU acceleration, graph-tool with OpenMP or cuGraph; for graph neural networks, torch-geometric.
Dependencies: networkx, matplotlib, scipy, pandas, numpy. Optional: pydot or pygraphviz (Graphviz layouts).
pip install networkx matplotlib scipy pandas numpy
import networkx as nx
# Create a graph and add edges with weights
G = nx.karate_club_graph()
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 34, Edges: 78
# Compute centrality and find most central node
bc = nx.betweenness_centrality(G)
top_node = max(bc, key=bc.get)
print(f"Most central node: {top_node}, betweenness: {bc[top_node]:.3f}")
# Detect communities
from networkx.algorithms import community
comms = community.greedy_modularity_communities(G)
print(f"Communities found: {len(comms)}")
import networkx as nx
# Undirected graph (most common)
G = nx.Graph()
G.add_node("protein_A", type="kinase", weight=1.5)
G.add_nodes_from(["protein_B", "protein_C"])
G.add_edge("protein_A", "protein_B", weight=0.9, interaction="phosphorylation")
G.add_edges_from([("protein_B", "protein_C"), ("protein_A", "protein_C")])
print(f"Nodes: {G.number_of_nodes()}, Edges: {G.number_of_edges()}")
# Nodes: 3, Edges: 3
# Directed graph (gene regulation, citations)
D = nx.DiGraph()
D.add_edges_from([("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneA")])
print(f"TF1 out-degree: {D.out_degree('TF1')}") # 2
# MultiGraph (multiple relationship types between same nodes)
M = nx.MultiGraph()
M.add_edge("A", "B", key="binding", affinity=0.8)
M.add_edge("A", "B", key="regulation", effect="inhibition")
print(f"Edges between A-B: {M.number_of_edges('A', 'B')}") # 2
import networkx as nx
G = nx.karate_club_graph()
# Query structure
print(f"Degree of node 0: {G.degree(0)}")
print(f"Neighbors of node 0: {list(G.neighbors(0))[:5]}")
print(f"Has edge 0-1: {G.has_edge(0, 1)}")
# Set and get attributes
G.nodes[0]["role"] = "instructor"
nx.set_node_attributes(G, {0: "high", 33: "high"}, "importance")
G[0][1]["weight"] = 0.95
# Iterate with data
for u, v, data in G.edges(data=True):
    if "weight" in data:
        print(f" Edge {u}-{v}: weight={data['weight']}")
        break
# Subgraphs (returns read-only view; use .copy() for mutable)
H = G.subgraph([0, 1, 2, 3, 4, 5]).copy()
print(f"Subgraph: {H.number_of_nodes()} nodes, {H.number_of_edges()} edges")
import networkx as nx
G = nx.karate_club_graph()
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G, weight="weight")
# For large graphs, approximate: nx.betweenness_centrality(G, k=100)
close_c = nx.closeness_centrality(G)
eigen_c = nx.eigenvector_centrality(G, max_iter=1000)
pr = nx.pagerank(G, alpha=0.85)
# Compare top nodes across measures
for name, metric in [("Degree", degree_c), ("Betweenness", between_c),
                     ("Closeness", close_c), ("PageRank", pr)]:
    top = max(metric, key=metric.get)
    print(f"{name:12s}: top node={top}, score={metric[top]:.4f}")
import networkx as nx
G = nx.karate_club_graph()
# Shortest path
path = nx.shortest_path(G, source=0, target=33)
length = nx.shortest_path_length(G, source=0, target=33)
print(f"Shortest path 0->33: {path} (length {length})")
print(f"Average shortest path length: {nx.average_shortest_path_length(G):.3f}")
# Connected components
print(f"Connected: {nx.is_connected(G)}")
components = list(nx.connected_components(G))
print(f"Components: {len(components)}, largest: {len(max(components, key=len))}")
# For directed graphs: strong/weak connectivity
D = nx.DiGraph([(0,1),(1,2),(2,0),(3,4)])
print(f"Strongly connected: {list(nx.strongly_connected_components(D))}")
# Connectivity measures
print(f"Node connectivity: {nx.node_connectivity(G)}")
print(f"Edge connectivity: {nx.edge_connectivity(G)}")
Partition networks into densely connected groups.
import networkx as nx
from networkx.algorithms import community
import itertools
G = nx.karate_club_graph()
# Greedy modularity maximization
comms_greedy = community.greedy_modularity_communities(G)
mod_score = community.modularity(G, comms_greedy)
print(f"Greedy: {len(comms_greedy)} communities, modularity={mod_score:.4f}")
# Label propagation (fast, non-deterministic)
comms_lpa = community.label_propagation_communities(G)
print(f"Label propagation: {len(list(comms_lpa))} communities")
# Girvan-Newman (hierarchical, edge betweenness removal)
gn = community.girvan_newman(G)
# Get first level of partition
first_level = next(gn)
print(f"Girvan-Newman first split: {len(first_level)} groups")
print(f" Sizes: {[len(c) for c in first_level]}")
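Louvain is another widely used modularity-based method, available as community.louvain_communities in NetworkX 2.8 and later; a minimal sketch on the same graph:

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()
# Louvain partition (NetworkX >= 2.8); seed makes the result reproducible
comms_louvain = community.louvain_communities(G, seed=42)
print(f"Louvain: {len(comms_louvain)} communities")
print(f"Modularity: {community.modularity(G, comms_louvain):.4f}")
```

Like the other detectors, it returns a list of node sets covering every node exactly once.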
import networkx as nx
import pandas as pd
import json
G = nx.karate_club_graph()
# Edge list (simple text format)
nx.write_edgelist(G, "karate.edgelist")
G_loaded = nx.read_edgelist("karate.edgelist", nodetype=int)
# GraphML (preserves all attributes, XML-based)
nx.write_graphml(G, "karate.graphml")
G_xml = nx.read_graphml("karate.graphml")
# JSON (node-link format, web-friendly for d3.js)
data = nx.node_link_data(G)
with open("karate.json", "w") as f:
    json.dump(data, f)
# Pandas integration
df = pd.DataFrame({"source": [1,2,3], "target": [2,3,4], "weight": [0.5,1.0,0.75]})
G_pd = nx.from_pandas_edgelist(df, "source", "target", edge_attr="weight")
df_out = nx.to_pandas_edgelist(G_pd)
print(f"Pandas round-trip: {len(df_out)} edges")
# NumPy/SciPy matrices
A = nx.to_numpy_array(G)
print(f"Adjacency matrix shape: {A.shape}")
A_sparse = nx.to_scipy_sparse_array(G, format="csr") # Memory-efficient
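GML is another attribute-preserving plain-text format supported by NetworkX (write_gml/read_gml); a minimal round-trip sketch:

```python
import networkx as nx

# GML round-trip: human-readable text that keeps node/edge attributes
G = nx.Graph()
G.add_edge("A", "B", weight=0.7)
nx.write_gml(G, "example.gml")
G2 = nx.read_gml("example.gml")
print(f"Weight survives round-trip: {G2['A']['B']['weight']}")  # 0.7
```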
import networkx as nx
import matplotlib.pyplot as plt
G = nx.karate_club_graph()
pos = nx.spring_layout(G, seed=42)
# Color by degree, size by betweenness centrality
bc = nx.betweenness_centrality(G)
fig, ax = plt.subplots(figsize=(10, 8))
nx.draw(G, pos=pos, ax=ax,
node_color=[G.degree(n) for n in G.nodes()], cmap=plt.cm.viridis,
node_size=[3000 * bc[n] + 100 for n in G.nodes()],
edge_color="gray", alpha=0.8, with_labels=True, font_size=8)
plt.tight_layout()
plt.savefig("network.png", dpi=300, bbox_inches="tight")
plt.savefig("network.pdf", bbox_inches="tight") # Vector format
print("Saved network.png and network.pdf")
import networkx as nx
# Erdos-Renyi random graph: n nodes, edge probability p
G_er = nx.erdos_renyi_graph(n=200, p=0.05, seed=42)
print(f"ER: {G_er.number_of_nodes()} nodes, {G_er.number_of_edges()} edges")
# Barabasi-Albert scale-free (power-law degree distribution)
G_ba = nx.barabasi_albert_graph(n=200, m=3, seed=42)
# Watts-Strogatz small-world
G_ws = nx.watts_strogatz_graph(n=200, k=6, p=0.1, seed=42)
print(f"WS clustering: {nx.average_clustering(G_ws):.3f}")
# Stochastic block model (community structure)
sizes, probs = [50, 50, 50], [[0.25,0.05,0.02],[0.05,0.35,0.07],[0.02,0.07,0.40]]
G_sbm = nx.stochastic_block_model(sizes, probs, seed=42)
# Built-in datasets and classic graphs
G_karate = nx.karate_club_graph() # Zachary's karate club
G_grid = nx.grid_2d_graph(5, 7) # 2D lattice
G_tree = nx.random_labeled_tree(50, seed=42) # Random tree; nx.random_tree was removed in NetworkX 3.4
G_geo = nx.random_geometric_graph(n=100, radius=0.2, seed=42)
# See references/algorithms_generators.md for full generator catalog
| Class | Directed | Multi-edge | Self-loops | Use Case |
|---|---|---|---|---|
| Graph | No | No | Yes | Undirected networks: social, PPI |
| DiGraph | Yes | No | Yes | Gene regulation, citations, web |
| MultiGraph | No | Yes | Yes | Multiple relationship types |
| MultiDiGraph | Yes | Yes | Yes | Transportation with routes |
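MultiDiGraph is the one class not demonstrated earlier; a minimal sketch of a transportation network with parallel directed routes (the station names and durations are made up):

```python
import networkx as nx

# MultiDiGraph: directed edges, multiple parallel edges between the same nodes
T = nx.MultiDiGraph()
T.add_edge("Boston", "NYC", key="train", duration=220)
T.add_edge("Boston", "NYC", key="bus", duration=270)
T.add_edge("NYC", "Boston", key="train", duration=225)
print(f"Boston->NYC routes: {T.number_of_edges('Boston', 'NYC')}")  # 2
# Access one specific parallel edge by its key
print(T["Boston"]["NYC"]["train"]["duration"])  # 220
```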
Attributes are stored as dictionaries at graph, node, and edge levels:
import networkx as nx
G = nx.Graph(name="example") # Graph-level attribute
G.add_node(1, label="hub", weight=1.5) # Node attributes
G.add_edge(1, 2, weight=0.8, type="ppi") # Edge attributes
# Bulk set/get
nx.set_node_attributes(G, {1: "red", 2: "blue"}, "color")
colors = nx.get_node_attributes(G, "color") # {1: 'red', 2: 'blue'}
| Layout | Function | Best For |
|---|---|---|
| Spring (force-directed) | spring_layout(G, seed=42) | General networks |
| Circular | circular_layout(G) | Regular graphs, cycles |
| Kamada-Kawai | kamada_kawai_layout(G) | Small-medium networks |
| Spectral | spectral_layout(G) | Highlighting clusters |
| Shell (concentric) | shell_layout(G, nlist=[[...],[...]]) | Layered/hierarchical |
| Planar | planar_layout(G) | Planar graphs only |
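A quick sketch of the table above in use: every layout function returns a dict mapping node to (x, y), which plugs straight into nx.draw(G, pos=...). The choice of nodes 0 and 33 for the inner shell is just an illustration.

```python
import networkx as nx

G = nx.karate_club_graph()
pos_spring = nx.spring_layout(G, seed=42)   # force-directed, general purpose
pos_kk = nx.kamada_kawai_layout(G)          # requires scipy; small-medium graphs
# Shell layout: inner shell of two hubs, outer shell of everyone else
inner, outer = [0, 33], [n for n in G if n not in (0, 33)]
pos_shell = nx.shell_layout(G, nlist=[inner, outer])
print(len(pos_spring), len(pos_kk), len(pos_shell))  # 34 34 34
```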
Goal: Identify influential actors, detect communities, and visualize.
import networkx as nx
import matplotlib.pyplot as plt
from networkx.algorithms import community
# Step 1: Load network and basic stats
G = nx.karate_club_graph()
print(f"Network: {G.number_of_nodes()} actors, {G.number_of_edges()} ties")
print(f"Density: {nx.density(G):.4f}, Clustering: {nx.average_clustering(G):.4f}")
# Step 2: Identify influential nodes
bc = nx.betweenness_centrality(G)
top_bc = sorted(bc.items(), key=lambda x: x[1], reverse=True)[:5]
print("Top 5 by betweenness:", [(n, f"{s:.3f}") for n, s in top_bc])
# Step 3: Detect communities
comms = community.greedy_modularity_communities(G)
print(f"Communities: {len(comms)}, modularity: {community.modularity(G, comms):.4f}")
# Step 4: Visualize with community coloring
pos = nx.spring_layout(G, seed=42)
fig, ax = plt.subplots(figsize=(10, 8))
for i, comm in enumerate(comms):
    nx.draw_networkx_nodes(G, pos, nodelist=list(comm), ax=ax,
                           node_color=[plt.cm.Set2(i)]*len(comm), node_size=400)
nx.draw_networkx_edges(G, pos, ax=ax, alpha=0.3)
nx.draw_networkx_labels(G, pos, ax=ax, font_size=8)
plt.axis("off")
plt.tight_layout()
plt.savefig("social_network_analysis.png", dpi=300, bbox_inches="tight")
print("Saved social_network_analysis.png")
Goal: Build a PPI network from tabular data, analyze topology, and identify hub proteins.
import networkx as nx
import pandas as pd
# Step 1: Load interaction data from DataFrame
interactions = pd.DataFrame({
"protein_a": ["TP53","TP53","BRCA1","BRCA1","MDM2","ATM","ATM","CHEK2","RB1","CDK2"],
"protein_b": ["MDM2","BRCA1","ATM","CHEK2","RB1","CHEK2","BRCA2","CDC25A","CDK2","CCNA2"],
"score": [0.99, 0.95, 0.92, 0.88, 0.91, 0.97, 0.85, 0.90, 0.87, 0.93]
})
G = nx.from_pandas_edgelist(interactions, "protein_a", "protein_b",
edge_attr="score")
print(f"PPI network: {G.number_of_nodes()} proteins, {G.number_of_edges()} interactions")
# Step 2: Network statistics
print(f"Connected: {nx.is_connected(G)}")
print(f"Diameter: {nx.diameter(G)}")
print(f"Avg path length: {nx.average_shortest_path_length(G):.2f}")
print(f"Transitivity: {nx.transitivity(G):.4f}")
# Step 3: Hub identification (multiple centrality measures)
degree_c = nx.degree_centrality(G)
between_c = nx.betweenness_centrality(G)
close_c = nx.closeness_centrality(G)
results = pd.DataFrame({
"protein": list(G.nodes()),
"degree_centrality": [degree_c[n] for n in G.nodes()],
"betweenness": [between_c[n] for n in G.nodes()],
"closeness": [close_c[n] for n in G.nodes()],
}).sort_values("betweenness", ascending=False)
print("\nHub proteins:")
print(results.head(5).to_string(index=False))
# Step 4: Export for downstream analysis
nx.write_graphml(G, "ppi_network.graphml")
results.to_csv("protein_centrality.csv", index=False)
print("Exported ppi_network.graphml and protein_centrality.csv")
| Parameter | Module | Default | Range / Options | Effect |
|---|---|---|---|---|
| weight | Paths/Centrality | None | Edge attribute name | Use weighted edges for path/centrality calculations |
| alpha | pagerank | 0.85 | 0.0-1.0 | Damping factor; lower = more uniform distribution |
| k | betweenness_centrality | None | int | Sample k nodes for approximation on large graphs |
| max_iter | eigenvector_centrality | 100 | int | Max iterations for convergence |
| seed | Generators/Layouts | None | int | Random seed for reproducibility |
| n / p / m | ER/BA generators | varies | int/float | Node count, edge probability, edges per new node |
| k / p | Watts-Strogatz | varies | int/float | Nearest neighbors, rewiring probability |
| nodetype | read_edgelist | str | int, float, str | Type conversion for node identifiers |
| edge_attr | from_pandas_edgelist | None | Column name(s) | Edge attribute columns to include from DataFrame |
| format | to_scipy_sparse_array | "csr" | "csr", "csc", "coo" | Sparse matrix format |
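A short sketch of how weight, k, and seed change results on a made-up three-node graph: the hop-count shortest path and the weighted shortest path can disagree, and k samples pivot nodes for approximate betweenness.

```python
import networkx as nx

# Hypothetical graph: direct A-C edge is heavy, detour via B is light
G = nx.Graph()
G.add_edge("A", "C", weight=5.0)
G.add_edge("A", "B", weight=1.0)
G.add_edge("B", "C", weight=1.0)
hop_path = nx.shortest_path(G, "A", "C")                        # ignores weights
weighted_path = nx.shortest_path(G, "A", "C", weight="weight")  # minimizes weight
print(hop_path, weighted_path)  # ['A', 'C'] ['A', 'B', 'C']
# k samples pivot nodes for approximate betweenness; seed fixes the sample
bc = nx.betweenness_centrality(G, k=3, seed=42)
```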
Always set random seeds for reproducible generators and layouts: seed=42 in both erdos_renyi_graph() and spring_layout().
Use approximate algorithms for large graphs: nx.betweenness_centrality(G, k=500) samples k nodes instead of all pairs.
Prefer from_pandas_edgelist over manual add_edge loops for bulk data loading -- handles attributes cleanly and is faster.
Copy subgraphs before modification: G.subgraph(nodes) returns a read-only view; call .copy() for a mutable independent graph.
Use GraphML or GML for persistent storage to preserve all node/edge attributes. Edge lists lose metadata unless explicitly handled.
Convert graph types explicitly: D.to_undirected() (DiGraph -> Graph), nx.Graph(M) (MultiGraph -> Graph, collapses multi-edges).
Use sparse matrices for large adjacency exports: to_scipy_sparse_array() is far more memory-efficient than to_numpy_array().
Anti-pattern -- Don't use nx.info(): Deprecated; use G.number_of_nodes(), G.number_of_edges(), nx.density(G) directly.
Anti-pattern -- Don't assume node ordering: Algorithms may return results in different orders. Always index by node key, not position.
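The explicit-conversion tip above can be sketched as follows; note that collapsing keeps at most one edge per node pair, so reciprocal directed edges and parallel multi-edges each merge into a single edge.

```python
import networkx as nx

# DiGraph -> Graph: reciprocal a<->b edges collapse into one undirected edge
D = nx.DiGraph([("a", "b"), ("b", "a"), ("b", "c")])
U = D.to_undirected()
print(U.number_of_edges())  # 2

# MultiGraph -> Graph: parallel edges collapse to a single edge
M = nx.MultiGraph()
M.add_edge("x", "y", kind="binding")
M.add_edge("x", "y", kind="regulation")
S = nx.Graph(M)
print(S.number_of_edges())  # 1
```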
Extract the minimum spanning tree and compare to the original graph.
import networkx as nx
import random
# Create weighted graph with reproducible random weights
G = nx.erdos_renyi_graph(50, 0.15, seed=42)
rng = random.Random(42) # py_random_state is a decorator, not an RNG; use random.Random
for u, v in G.edges():
    G[u][v]["weight"] = round(rng.random(), 2)
mst = nx.minimum_spanning_tree(G, weight="weight")
print(f"Original: {G.number_of_edges()} edges")
print(f"MST: {mst.number_of_edges()} edges")
total_weight = sum(d["weight"] for _, _, d in mst.edges(data=True))
print(f"MST total weight: {total_weight:.2f}")
Find cliques and compute graph coloring.
import networkx as nx
G = nx.karate_club_graph()
# Find all maximal cliques
cliques = list(nx.find_cliques(G))
print(f"Maximal cliques: {len(cliques)}")
largest_clique = max(cliques, key=len)
print(f"Largest clique size: {len(largest_clique)}, nodes: {largest_clique}")
# Greedy graph coloring
coloring = nx.greedy_color(G, strategy="largest_first")
n_colors = max(coloring.values()) + 1
print(f"Chromatic number (greedy upper bound): {n_colors}")
Build a directed acyclic graph and find execution order.
import networkx as nx
# Task dependency DAG
D = nx.DiGraph()
D.add_edges_from([
("download_data", "preprocess"),
("download_data", "validate"),
("preprocess", "analyze"),
("validate", "analyze"),
("analyze", "visualize"),
("analyze", "report"),
("visualize", "report"),
])
print(f"Is DAG: {nx.is_directed_acyclic_graph(D)}")
order = list(nx.topological_sort(D))
print(f"Execution order: {order}")
# Find all paths from start to end
paths = list(nx.all_simple_paths(D, "download_data", "report"))
print(f"Paths to report: {len(paths)}")
for p in paths:
    print(f" {' -> '.join(p)}")
| Problem | Cause | Solution |
|---|---|---|
| NetworkXError: Graph is not connected | Algorithm requires connected graph | Extract largest component: G.subgraph(max(nx.connected_components(G), key=len)).copy() |
| PowerIterationFailedConvergence | Eigenvector/PageRank did not converge | Increase max_iter (e.g., 1000) or check for disconnected components |
| Very slow centrality computation | O(n*m) complexity on large graphs | Use k parameter for sampling: betweenness_centrality(G, k=500) |
| nx.NetworkXNotImplemented | Algorithm not available for graph type | Convert graph type: G.to_undirected() or G.to_directed() |
| Memory error on large graphs | Dense adjacency matrix | Use to_scipy_sparse_array() instead of to_numpy_array() |
| Node IDs read as strings from file | read_edgelist defaults to str | Pass nodetype=int: nx.read_edgelist(f, nodetype=int) |
| Community detection returns frozen sets | Normal return type for communities | Convert: [list(c) for c in communities] |
| Self-loops in generated graphs | Configuration model allows self-loops | Remove: G.remove_edges_from(nx.selfloop_edges(G)) |
| Visualization too cluttered | Too many nodes/edges | Filter to subgraph, adjust alpha, increase figure size, or use interactive tools (Plotly, PyVis) |
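The first two fixes in the table can be sketched together on a deliberately disconnected graph:

```python
import networkx as nx

# Two components: {0, 1, 2} and {3, 4}
G = nx.Graph([(0, 1), (1, 2), (3, 4)])
# nx.diameter(G) would raise NetworkXError: Graph is not connected
largest = max(nx.connected_components(G), key=len)
H = G.subgraph(largest).copy()
print(f"Largest component diameter: {nx.diameter(H)}")  # 2
# If eigenvector centrality fails to converge, raise max_iter
ec = nx.eigenvector_centrality(H, max_iter=1000)
```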