Skill

databases

PostgreSQL databases on Giratina - PubMed, Legal, StackOverflow, ArXiv. Use when querying research data, legal documents, programming solutions, or scientific papers.

npx claudepluginhub built-simple/claude-brain-dump-repo --plugin neely-brain-dump

Tool Access

This skill is limited to using the following tools:

Bashmcp__postgres__query

Preview

**Server:** 192.168.1.100

SKILL.md

Similar Skills

query-expert

Generates optimized SQL/NoSQL queries for PostgreSQL, MySQL, MongoDB, Redis; analyzes EXPLAIN plans, designs indexes, troubleshoots slow queries and bottlenecks.

7 files

skills

postgresql

328

Guides PostgreSQL development: table design, indexing, constraints, PL/pgSQL, JSONB, full-text search, window functions, CTEs, EXPLAIN ANALYZE tuning, backups, replication, pgvector. For writing/optimizing queries, schemas, DB management.

1 file

partme-ai-full-stack-skills

query-builder

Generates optimized SQL queries for PostgreSQL, MySQL, SQLite and NoSQL for MongoDB, DynamoDB, Redis; supports ORMs like Prisma. Explains plans, indexes, and performance optimizations.

devkit

Stats

Parent Repo Stars0

Parent Repo Forks0

Last CommitDec 5, 2025

Actions

View Source View Plugin View on GitHub View README

Help us improve

Share bugs, ideas, or general feedback.

PostgreSQL Databases (Giratina)

Server: 192.168.1.100 PostgreSQL Version: 17.6 Location: /mnt/raid6/databases/postgresql/17/main Storage: 13TB RAID6 array (4.9TB available)

Database Summary

Database	Size	Records	Primary Use
pmc_fulltext	227 GB	4.5M articles, 255M citations	Medical/scientific research
legal_db	81 GB	9.2M documents	Legal document search
stackoverflow_complete	53 GB	18.5M solutions	Programming Q&A
arxiv_papers	16 GB	2.8M papers	Scientific preprints
TOTAL	377 GB	~340M records

pmc_fulltext (227 GB) - PubMed Central Research

Medical and scientific research articles with full citation network.

Stats

4,483,066 articles with full-text (68 GB raw text)
255,138,874 citations - massive citation network
88,560,660 author records
35 tables including network analysis, co-citation graphs
Query performance: 23ms for indexed queries

Key Tables

-- Articles with full text
SELECT id, title, abstract, full_text FROM articles LIMIT 10;

-- Citation network
SELECT * FROM citations WHERE source_pmid = '12345678';

-- Author search
SELECT * FROM authors WHERE name ILIKE '%Smith%';

Example Queries

-- Find articles by keyword in title
SELECT id, title, pub_date FROM articles
WHERE title ILIKE '%cancer%' AND pub_date > '2020-01-01'
LIMIT 100;

-- Get citation count for an article
SELECT COUNT(*) FROM citations WHERE target_pmid = '12345678';

-- Find co-authors
SELECT DISTINCT a2.name FROM authors a1
JOIN article_authors aa1 ON a1.id = aa1.author_id
JOIN article_authors aa2 ON aa1.article_id = aa2.article_id
JOIN authors a2 ON aa2.author_id = a2.id
WHERE a1.name = 'John Smith' AND a2.name != 'John Smith';

legal_db (81 GB) - Legal Documents

Legal documents from the "Pile of Law" dataset with excellent full-text search.

Stats

9,206,663 documents (124 GB raw → 11 GB stored, 91% compression)
70 GB of indexes (full-text search, trigrams, hash indexes)
Query performance: 1ms for simple lookups

Document Categories

Category	Count
Caselaw	5.9M
Miscellaneous	2.2M
Patents	543K
Contracts	353K

Key Tables

-- Main documents table
SELECT id, title, category, text FROM documents LIMIT 10;

-- Full-text search
SELECT id, title FROM documents
WHERE to_tsvector('english', text) @@ to_tsquery('contract & breach');

Example Queries

-- Search by category
SELECT id, title, LEFT(text, 500) FROM documents
WHERE category = 'caselaw'
LIMIT 100;

-- Full-text search with ranking
SELECT id, title, ts_rank(to_tsvector('english', text), query) AS rank
FROM documents, to_tsquery('intellectual & property') query
WHERE to_tsvector('english', text) @@ query
ORDER BY rank DESC
LIMIT 20;

-- Find patents
SELECT id, title FROM documents
WHERE category = 'patents' AND title ILIKE '%machine learning%';

stackoverflow_complete (53 GB) - Programming Solutions

Programming Q&A from Stack Overflow.

Stats

18,563,455 solutions
12,718,160 unique questions
12 programming languages tracked
Full-text search enabled on answers
Query performance: ~26 seconds for complex queries (needs optimization)

Top Languages

Language	Solutions
JavaScript	3.4M
Java	2.6M
Python	1.9M
C#	1.5M
PHP	1.2M

Key Tables

-- Questions
SELECT id, title, tags, score FROM questions LIMIT 10;

-- Answers/Solutions
SELECT id, question_id, body, score FROM answers LIMIT 10;

Example Queries

-- Find Python questions by keyword
SELECT id, title, score FROM questions
WHERE tags LIKE '%python%' AND title ILIKE '%pandas%'
ORDER BY score DESC
LIMIT 20;

-- Get top answers for a question
SELECT body, score FROM answers
WHERE question_id = 12345
ORDER BY score DESC;

-- Find high-scoring solutions
SELECT q.title, a.body, a.score
FROM answers a
JOIN questions q ON a.question_id = q.id
WHERE q.tags LIKE '%python%' AND a.score > 100
ORDER BY a.score DESC
LIMIT 10;

arxiv_papers (16 GB) - Research Papers

Scientific preprints from ArXiv.

Stats

2,770,235 papers
2.2 GB of abstracts with full-text search
Multiple full-text search indexes
Query performance: 23ms for category searches

Key Tables

-- Papers
SELECT id, title, abstract, categories, authors FROM papers LIMIT 10;

Example Queries

-- Search by category (cs.AI, physics, math, etc.)
SELECT id, title, authors FROM papers
WHERE categories LIKE '%cs.AI%'
ORDER BY id DESC
LIMIT 100;

-- Full-text search in abstracts
SELECT id, title FROM papers
WHERE to_tsvector('english', abstract) @@ to_tsquery('transformer & attention')
LIMIT 50;

-- Find papers by author
SELECT id, title, categories FROM papers
WHERE authors ILIKE '%Hinton%';

-- Recent ML papers
SELECT id, title, abstract FROM papers
WHERE categories LIKE '%cs.LG%'
ORDER BY id DESC
LIMIT 20;

Connection Details

User Credentials

Read-Only User (Recommended for Scripts)

Username: readonly
Password: ReadOnly2025
Permissions: SELECT only (cannot modify data)

Admin User (Local Only)

Username: postgres
Network access: Not configured (use locally on Giratina)

From Giratina (localhost)

psql -U postgres -d pmc_fulltext
psql -U postgres -d legal_db
psql -U postgres -d stackoverflow_complete
psql -U postgres -d arxiv_papers

From Other Nodes (192.168.1.x)

# Read-only access (recommended for all network connections)
PGPASSWORD='ReadOnly2025' psql -h 192.168.1.100 -U readonly -d pmc_fulltext

Connection String

postgresql://readonly:ReadOnly2025@192.168.1.100:5432/DATABASE_NAME

Note: No special characters in password - exclamation marks cause md5 auth failures with psycopg2.

Code Examples

Python (psycopg2)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    port=5432,
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM articles")
print(cursor.fetchone())

Python (SQLAlchemy + Pandas)

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine(
    "postgresql://readonly:ReadOnly2025@192.168.1.100:5432/legal_db"
)
df = pd.read_sql("SELECT * FROM documents LIMIT 1000", engine)

Node.js (pg)

const { Client } = require('pg');

const client = new Client({
  host: '192.168.1.100',
  port: 5432,
  database: 'stackoverflow_complete',
  user: 'readonly',
  password: 'ReadOnly2025'
});

await client.connect();
const res = await client.query('SELECT COUNT(*) FROM solutions');
console.log(res.rows);

Server-Side Cursors (Large Datasets)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)

# Named cursor = server-side cursor (streams results)
cursor = conn.cursor(name='fetch_large_data')
cursor.execute("SELECT * FROM citations WHERE cited_year > 2020")

# Fetch in batches to avoid memory issues
while True:
    rows = cursor.fetchmany(10000)  # 10k rows at a time
    if not rows:
        break
    for row in rows:
        process(row)

Network Access

Local Network (192.168.1.x) ✅

Direct connection works from any cluster node:

Silvally (192.168.1.52)
Hoopa (192.168.1.79)
Victini (192.168.1.115)
Talon (192.168.1.7)
Any laptop/desktop on the network

External Access

Use Cloudflare Tunnel:

cloudflared access tcp --hostname giratina-db.built-simple.ai --url localhost:5432 &
psql -h localhost -p 5432 -U readonly -d pmc_fulltext

Performance Tips

Operation	Speed	Notes
Simple SELECT by ID	~1ms	Using indexes
Category filter	~23ms	Indexed columns
Full-text search	Fast	GIN indexes enabled
Complex joins	Varies	Use EXPLAIN ANALYZE

Best Practices:

Use readonly user for analysis scripts
Fetch only needed columns (avoid SELECT *)
Filter on server side, not in Python
Use server-side cursors for large results
Use parallel processing for big jobs

Backup Files (on RAID6)

File	Size
legal_db.sql.gz	40 GB
legal_documents_20250828_095834.parquet.7z	78 GB
legal_master_duckdb.db.7z	39 GB