Skill

databases

Install
1
Install the plugin
$
npx claudepluginhub built-simple/claude-brain-dump-repo --plugin neely-brain-dump

Want just this skill?

Add to a custom plugin, then install with one command.

Description

PostgreSQL databases on Giratina - PubMed, Legal, StackOverflow, ArXiv. Use when querying research data, legal documents, programming solutions, or scientific papers.

Tool Access

This skill is limited to using the following tools:

Bashmcp__postgres__query
Skill Content

PostgreSQL Databases (Giratina)

Server: 192.168.1.100 PostgreSQL Version: 17.6 Location: /mnt/raid6/databases/postgresql/17/main Storage: 13TB RAID6 array (4.9TB available)

Database Summary

DatabaseSizeRecordsPrimary Use
pmc_fulltext227 GB4.5M articles, 255M citationsMedical/scientific research
legal_db81 GB9.2M documentsLegal document search
stackoverflow_complete53 GB18.5M solutionsProgramming Q&A
arxiv_papers16 GB2.8M papersScientific preprints
TOTAL377 GB~340M records

pmc_fulltext (227 GB) - PubMed Central Research

Medical and scientific research articles with full citation network.

Stats

  • 4,483,066 articles with full-text (68 GB raw text)
  • 255,138,874 citations - massive citation network
  • 88,560,660 author records
  • 35 tables including network analysis, co-citation graphs
  • Query performance: 23ms for indexed queries

Key Tables

-- Articles with full text
SELECT id, title, abstract, full_text FROM articles LIMIT 10;

-- Citation network
SELECT * FROM citations WHERE source_pmid = '12345678';

-- Author search
SELECT * FROM authors WHERE name ILIKE '%Smith%';

Example Queries

-- Find articles by keyword in title
SELECT id, title, pub_date FROM articles
WHERE title ILIKE '%cancer%' AND pub_date > '2020-01-01'
LIMIT 100;

-- Get citation count for an article
SELECT COUNT(*) FROM citations WHERE target_pmid = '12345678';

-- Find co-authors
SELECT DISTINCT a2.name FROM authors a1
JOIN article_authors aa1 ON a1.id = aa1.author_id
JOIN article_authors aa2 ON aa1.article_id = aa2.article_id
JOIN authors a2 ON aa2.author_id = a2.id
WHERE a1.name = 'John Smith' AND a2.name != 'John Smith';

legal_db (81 GB) - Legal Documents

Legal documents from the "Pile of Law" dataset with excellent full-text search.

Stats

  • 9,206,663 documents (124 GB raw → 11 GB stored, 91% compression)
  • 70 GB of indexes (full-text search, trigrams, hash indexes)
  • Query performance: 1ms for simple lookups

Document Categories

CategoryCount
Caselaw5.9M
Miscellaneous2.2M
Patents543K
Contracts353K

Key Tables

-- Main documents table
SELECT id, title, category, text FROM documents LIMIT 10;

-- Full-text search
SELECT id, title FROM documents
WHERE to_tsvector('english', text) @@ to_tsquery('contract & breach');

Example Queries

-- Search by category
SELECT id, title, LEFT(text, 500) FROM documents
WHERE category = 'caselaw'
LIMIT 100;

-- Full-text search with ranking
SELECT id, title, ts_rank(to_tsvector('english', text), query) AS rank
FROM documents, to_tsquery('intellectual & property') query
WHERE to_tsvector('english', text) @@ query
ORDER BY rank DESC
LIMIT 20;

-- Find patents
SELECT id, title FROM documents
WHERE category = 'patents' AND title ILIKE '%machine learning%';

stackoverflow_complete (53 GB) - Programming Solutions

Programming Q&A from Stack Overflow.

Stats

  • 18,563,455 solutions
  • 12,718,160 unique questions
  • 12 programming languages tracked
  • Full-text search enabled on answers
  • Query performance: ~26 seconds for complex queries (needs optimization)

Top Languages

LanguageSolutions
JavaScript3.4M
Java2.6M
Python1.9M
C#1.5M
PHP1.2M

Key Tables

-- Questions
SELECT id, title, tags, score FROM questions LIMIT 10;

-- Answers/Solutions
SELECT id, question_id, body, score FROM answers LIMIT 10;

Example Queries

-- Find Python questions by keyword
SELECT id, title, score FROM questions
WHERE tags LIKE '%python%' AND title ILIKE '%pandas%'
ORDER BY score DESC
LIMIT 20;

-- Get top answers for a question
SELECT body, score FROM answers
WHERE question_id = 12345
ORDER BY score DESC;

-- Find high-scoring solutions
SELECT q.title, a.body, a.score
FROM answers a
JOIN questions q ON a.question_id = q.id
WHERE q.tags LIKE '%python%' AND a.score > 100
ORDER BY a.score DESC
LIMIT 10;

arxiv_papers (16 GB) - Research Papers

Scientific preprints from ArXiv.

Stats

  • 2,770,235 papers
  • 2.2 GB of abstracts with full-text search
  • Multiple full-text search indexes
  • Query performance: 23ms for category searches

Key Tables

-- Papers
SELECT id, title, abstract, categories, authors FROM papers LIMIT 10;

Example Queries

-- Search by category (cs.AI, physics, math, etc.)
SELECT id, title, authors FROM papers
WHERE categories LIKE '%cs.AI%'
ORDER BY id DESC
LIMIT 100;

-- Full-text search in abstracts
SELECT id, title FROM papers
WHERE to_tsvector('english', abstract) @@ to_tsquery('transformer & attention')
LIMIT 50;

-- Find papers by author
SELECT id, title, categories FROM papers
WHERE authors ILIKE '%Hinton%';

-- Recent ML papers
SELECT id, title, abstract FROM papers
WHERE categories LIKE '%cs.LG%'
ORDER BY id DESC
LIMIT 20;

Connection Details

User Credentials

Read-Only User (Recommended for Scripts)

  • Username: readonly
  • Password: ReadOnly2025
  • Permissions: SELECT only (cannot modify data)

Admin User (Local Only)

  • Username: postgres
  • Network access: Not configured (use locally on Giratina)

From Giratina (localhost)

psql -U postgres -d pmc_fulltext
psql -U postgres -d legal_db
psql -U postgres -d stackoverflow_complete
psql -U postgres -d arxiv_papers

From Other Nodes (192.168.1.x)

# Read-only access (recommended for all network connections)
PGPASSWORD='ReadOnly2025' psql -h 192.168.1.100 -U readonly -d pmc_fulltext

Connection String

postgresql://readonly:ReadOnly2025@192.168.1.100:5432/DATABASE_NAME

Note: No special characters in password - exclamation marks cause md5 auth failures with psycopg2.


Code Examples

Python (psycopg2)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    port=5432,
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM articles")
print(cursor.fetchone())

Python (SQLAlchemy + Pandas)

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine(
    "postgresql://readonly:ReadOnly2025@192.168.1.100:5432/legal_db"
)
df = pd.read_sql("SELECT * FROM documents LIMIT 1000", engine)

Node.js (pg)

const { Client } = require('pg');

const client = new Client({
  host: '192.168.1.100',
  port: 5432,
  database: 'stackoverflow_complete',
  user: 'readonly',
  password: 'ReadOnly2025'
});

await client.connect();
const res = await client.query('SELECT COUNT(*) FROM solutions');
console.log(res.rows);

Server-Side Cursors (Large Datasets)

import psycopg2

conn = psycopg2.connect(
    host="192.168.1.100",
    database="pmc_fulltext",
    user="readonly",
    password="ReadOnly2025"
)

# Named cursor = server-side cursor (streams results)
cursor = conn.cursor(name='fetch_large_data')
cursor.execute("SELECT * FROM citations WHERE cited_year > 2020")

# Fetch in batches to avoid memory issues
while True:
    rows = cursor.fetchmany(10000)  # 10k rows at a time
    if not rows:
        break
    for row in rows:
        process(row)

Network Access

Local Network (192.168.1.x) ✅

Direct connection works from any cluster node:

  • Silvally (192.168.1.52)
  • Hoopa (192.168.1.79)
  • Victini (192.168.1.115)
  • Talon (192.168.1.7)
  • Any laptop/desktop on the network

External Access

Use Cloudflare Tunnel:

cloudflared access tcp --hostname giratina-db.built-simple.ai --url localhost:5432 &
psql -h localhost -p 5432 -U readonly -d pmc_fulltext

Performance Tips

OperationSpeedNotes
Simple SELECT by ID~1msUsing indexes
Category filter~23msIndexed columns
Full-text searchFastGIN indexes enabled
Complex joinsVariesUse EXPLAIN ANALYZE

Best Practices:

  • Use readonly user for analysis scripts
  • Fetch only needed columns (avoid SELECT *)
  • Filter on server side, not in Python
  • Use server-side cursors for large results
  • Use parallel processing for big jobs

Backup Files (on RAID6)

FileSize
legal_db.sql.gz40 GB
legal_documents_20250828_095834.parquet.7z78 GB
legal_master_duckdb.db.7z39 GB
Stats
Stars0
Forks0
Last CommitDec 5, 2025
Actions

Similar Skills