Guides researchers through open science practices: preregistration, FAIR data, repository choice, open access, licensing, and reproducible workflows. Use when writing data management plans or sharing research outputs.
npx claudepluginhub alterlab-ieu/alterlab-academic-skills --plugin alterlab-visualization
Slash command: /alterlab-writing-tools:alterlab-open-science
Open science represents a fundamental shift in how research is conducted, shared, and evaluated. Rather than treating the scientific process as a series of private activities culminating in a polished publication, open science makes the entire research lifecycle transparent -- from the initial hypothesis through data collection, analysis, and dissemination. This transparency serves multiple purposes: it increases trust in research findings, accelerates scientific progress by enabling reuse and replication, reduces waste by making negative results visible, and democratizes access to knowledge.
The open science movement encompasses a wide range of practices: preregistration of study designs and analyses, registered reports that receive peer review before data collection, open sharing of data and materials under FAIR principles (Findable, Accessible, Interoperable, Reusable), open access publishing through various routes (Green, Gold, Diamond), reproducible computational workflows using containers and notebooks, open peer review, and the use of persistent repositories for long-term data preservation.
This skill provides practical guidance for implementing each of these practices. It is not an advocacy document -- it acknowledges the real tensions between openness and privacy, the costs of open access publishing, and the career incentives that sometimes conflict with open practices. The goal is to equip researchers with the knowledge to make informed decisions about which open science practices to adopt, when, and how.
Use this skill when you need to:
Preregistration involves publicly recording your research plan -- hypotheses, methods, sample size, and analysis strategy -- before collecting or analyzing data. It distinguishes confirmatory analyses (hypothesis-testing) from exploratory analyses (hypothesis-generating), reducing the risk of p-hacking, HARKing (Hypothesizing After Results are Known), and other questionable research practices.
Key preregistration platforms:
| Platform | Best For | Features |
|---|---|---|
| OSF Registries | All disciplines | Multiple templates, embargo options, DOI, integrates with OSF projects |
| AsPredicted | Quick preregistration | 8-question template, simple, generates PDF |
| ClinicalTrials.gov | Clinical trials | Legally required for most interventional trials in the US |
| PROSPERO | Systematic reviews | Specific to health-related systematic reviews |
| EGAP | Political science / governance | Designed for experimental governance research |
What to include in a preregistration:
Example: Preregistration excerpt
Hypothesis 1: Participants in the spaced practice condition will score
higher on the delayed retention test (administered 7 days after training)
than participants in the massed practice condition.
Analysis: Independent samples t-test comparing mean retention scores
between conditions. Alpha = .05, two-tailed. If Levene's test indicates
unequal variances (p < .05), Welch's t-test will be used. Effect size
will be reported as Cohen's d with 95% CI.
Power analysis: A priori power analysis using G*Power (Faul et al., 2007)
indicates that n = 64 per group (N = 128 total) is required to detect
a medium effect (d = 0.50) with 80% power at alpha = .05.
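The sample size in the excerpt can be cross-checked without G*Power. The sketch below uses a different method: the normal approximation to the two-sample t-test with the usual small-sample correction term, n = 2(z_{1-α/2} + z_{1-β})²/d² + z²_{1-α/2}/4. G*Power instead performs the exact noncentral-t calculation. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample t-test.

    Normal approximation plus the z**2 / 4 small-sample correction; this
    approximates, rather than reproduces, an exact noncentral-t calculation.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-tailed critical value
    z_beta = z.inv_cdf(power)             # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)                   # round up to whole participants


if __name__ == "__main__":
    n = n_per_group(d=0.50)
    print(f"n = {n} per group, N = {2 * n} total")
```

With d = 0.50, alpha = .05 (two-tailed) and 80% power, the function returns 64 per group, which matches the excerpt. The preregistration itself should still name the tool and settings that were actually used.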
Preregistration does NOT mean:
Registered reports take preregistration further by embedding it in the peer review process. The study is reviewed in two stages:
Stage 1: Before data collection
Stage 2: After data collection
Benefits of registered reports:
Journals offering registered reports: Over 300 journals across disciplines now accept registered reports. The Center for Open Science maintains the complete list at cos.io/rr.
Example: Stage 1 submission structure
1. Introduction
1.1 Background and motivation
1.2 Existing evidence and gaps
1.3 Theoretical framework
1.4 Specific hypotheses (numbered, directional)
2. Methods
2.1 Design overview
2.2 Participants (target N, power justification, recruitment)
2.3 Materials and procedures
2.4 Measures (with psychometric evidence)
2.5 Analysis plan
2.5.1 Confirmatory analyses (mapped to hypotheses)
2.5.2 Robustness checks
2.5.3 Exploratory analyses (planned but not confirmatory)
2.6 Data exclusion criteria
2.7 Timeline
3. Pilot data (if available)
The FAIR principles provide a framework for making data maximally useful for both humans and machines:
Findable:
Accessible:
Interoperable:
Reusable:
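One way to cover all four principles in a single record is to publish machine-readable metadata alongside the files. This minimal sketch builds a schema.org `Dataset` record as JSON-LD using only the standard library. The title, DOI, author and URL are hypothetical placeholders. Repositories such as Zenodo generate their own metadata, so read this as an illustration of which fields serve which principle, not as a required format.

```python
import json


def dataset_jsonld(title, description, doi, license_url, creators, keywords,
                   file_url, media_type="text/csv"):
    """Build a minimal schema.org Dataset record (JSON-LD) for a deposit."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Findable: persistent identifier plus descriptive metadata
        "identifier": f"https://doi.org/{doi}",
        "name": title,
        "description": description,
        "keywords": keywords,
        "creator": [{"@type": "Person", "name": name} for name in creators],
        # Accessible: a resolvable download location and access conditions
        "isAccessibleForFree": True,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": file_url,
            # Interoperable: an open, standard file format
            "encodingFormat": media_type,
        },
        # Reusable: an explicit, machine-readable license
        "license": license_url,
    }


if __name__ == "__main__":
    record = dataset_jsonld(
        title="Spaced vs. massed practice retention data",   # hypothetical
        description="De-identified retention scores from a two-group study.",
        doi="10.5281/zenodo.0000000",                          # placeholder DOI
        license_url="https://creativecommons.org/licenses/by/4.0/",
        creators=["Example Author"],                           # hypothetical
        keywords=["spaced practice", "retention", "open data"],
        file_url="https://example.org/retention.csv",          # placeholder URL
    )
    print(json.dumps(record, indent=2))
```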
Practical data sharing checklist:
Before sharing:
[ ] Remove or anonymize personally identifiable information (PII)
[ ] Check IRB/ethics approval covers data sharing
[ ] Verify no proprietary restrictions from data providers
[ ] Clean variable names and remove internal codes
[ ] Create a comprehensive codebook/data dictionary
Preparing the deposit:
[ ] Choose appropriate file formats (CSV over XLSX, open formats preferred)
[ ] Write a README describing the dataset structure
[ ] Create a data dictionary with all variable definitions
[ ] Include analysis scripts that reproduce published results
[ ] Add a LICENSE file (CC-BY 4.0 recommended for data)
[ ] Include the study preregistration link if applicable
Depositing:
[ ] Upload to a persistent repository (Zenodo, Dryad, Figshare, or domain-specific)
[ ] Obtain a DOI
[ ] Set an embargo period if needed (e.g., until publication)
[ ] Link the dataset DOI to the paper DOI
[ ] Add the data availability statement to the manuscript
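Writing a data dictionary from scratch is tedious and error-prone. A first draft can be generated from the CSV itself. This minimal sketch produces one codebook row per column, with an inferred type, the number of missing values and the number of distinct values. It leaves the description and units blank for the researcher to fill in. The missing-value codes ("" and "NA") and the sample columns are assumptions, so adjust them to your data.

```python
import csv
import io
import sys


def infer_type(values):
    """Guess a column type from its non-missing string values."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "empty"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "numeric"
    return "string"


def draft_codebook(csv_file, missing=("", "NA")):
    """Return one draft codebook entry per column of a CSV file object."""
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    codebook = []
    for name in reader.fieldnames or []:
        # Short rows yield None; treat those as missing too
        values = [(r[name] or "").strip() for r in rows]
        present = [v for v in values if v not in missing]
        codebook.append({
            "variable": name,
            "type": infer_type(present),
            "n_missing": len(rows) - len(present),
            "n_unique": len(set(present)),
            "description": "",   # to be written by a human
            "units": "",
        })
    return codebook


if __name__ == "__main__":
    sample = io.StringIO("id,condition,score\n1,spaced,82\n2,massed,NA\n3,spaced,77.5\n")
    book = draft_codebook(sample)
    writer = csv.DictWriter(sys.stdout, fieldnames=list(book[0]))
    writer.writeheader()
    writer.writerows(book)
```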
Most major funders now require a data management plan (DMP) as part of grant applications. A DMP describes how data will be collected, organized, stored, shared, and preserved.
NSF DMP requirements (2 pages):
NIH Data Management and Sharing Plan (effective January 25, 2023):
Example: DMP excerpt
Data Types: This project will generate three primary data types:
(1) survey responses from approximately 500 participants (Qualtrics,
exported as CSV), (2) semi-structured interview transcripts from 30
participants (audio recordings transcribed to text), and (3) behavioral
log data from the learning platform (JSON format, approximately 2GB).
Sharing Plan: De-identified survey and behavioral data will be deposited
in the ICPSR data repository within 12 months of the project end date
and assigned a DOI. Interview transcripts will be shared in redacted
form, with participant consent for sharing obtained during recruitment.
Audio recordings will not be shared due to re-identification risk.
Standards: Survey data will follow DDI (Data Documentation Initiative)
metadata standards. Variable names will use the codebook published with
our validated instrument (Martinez et al., 2024). All dates will use
ISO 8601 format.
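Producing "de-identified" survey data usually starts with replacing direct identifiers. This minimal sketch uses keyed hashing (HMAC-SHA-256) to turn participant IDs into stable pseudonyms, so that records in different files still link. The key must be stored separately and never deposited. This is pseudonymization, not anonymization: indirect identifiers such as dates, rare demographics and free text still need review, and under GDPR pseudonymized data generally remain personal data. The field names (`participant_id`, `name`, `email`, `pid`) are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize(participant_id: str, key: bytes, length: int = 12) -> str:
    """Map an identifier to a stable pseudonym using HMAC-SHA-256."""
    digest = hmac.new(key, participant_id.encode("utf-8"), hashlib.sha256)
    return "P" + digest.hexdigest()[:length]


def deidentify(records, key, id_field="participant_id", drop=("name", "email")):
    """Replace the ID field with a pseudonym and drop direct identifiers."""
    cleaned = []
    for rec in records:
        out = {k: v for k, v in rec.items() if k not in drop and k != id_field}
        out["pid"] = pseudonymize(str(rec[id_field]), key)
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # keep this key secure; never share it with the data
    raw = [
        {"participant_id": "S-001", "name": "Jane Doe", "email": "jd@example.org", "score": 82},
        {"participant_id": "S-002", "name": "John Roe", "email": "jr@example.org", "score": 77},
    ]
    for row in deidentify(raw, key):
        print(row)
```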
DMP tools:
Open access (OA) removes paywalls so that anyone can read research without a subscription. There are several routes to OA:
Gold OA: Published in a fully open access journal. The author (or their funder/institution) pays an Article Processing Charge (APC). Examples: PLOS ONE, eLife, BMJ Open.
Green OA: The author deposits a version of the paper (preprint or accepted manuscript) in a repository. The journal may impose an embargo period (typically 6-12 months). No APC required. Repositories include institutional repositories, PubMed Central, arXiv, and SSRN.
Diamond OA (Platinum OA): The journal is open access with no APC -- costs are covered by institutions, scholarly societies, or grants. Examples include many humanities journals, the Journal of Machine Learning Research, and some society journals.
Hybrid OA: The journal is subscription-based but offers an OA option for individual articles (for an APC). Controversial because institutions pay twice (subscription + APC). Some funders (e.g., cOAlition S / Plan S) no longer fund hybrid OA.
Bronze OA: Free to read on the publisher website but without an open license. The publisher can remove access at any time. Not true OA.
APC cost ranges (2025-2026):
| Publisher Tier | Typical APC |
|---|---|
| Mega journals (PLOS ONE) | $1,500-$2,000 |
| Mid-tier specialty journals | $2,000-$4,000 |
| High-impact journals (Nature, Science OA options) | $5,000-$11,000 |
| Diamond OA journals | $0 |
Rights retention strategy: Many funders (including cOAlition S members) now support a Rights Retention Strategy in which authors apply a CC-BY license to their Author Accepted Manuscript (AAM) at submission, regardless of publisher policy. This enables immediate Green OA deposit of the AAM, without an embargo.
Preprint servers by discipline:
| Server | Disciplines |
|---|---|
| arXiv | Physics, mathematics, computer science, quantitative biology |
| bioRxiv | Biology |
| medRxiv | Health sciences, including clinical research (screened but not peer reviewed) |
| SSRN | Social sciences, economics, law |
| PsyArXiv | Psychology |
| SocArXiv | Sociology, political science |
| EdArXiv | Education |
| EarthArXiv | Earth sciences |
| ChemRxiv | Chemistry |
| OSF Preprints | All disciplines |
Creative Commons (CC) licenses provide standardized terms for sharing research outputs. Understanding them is essential for open science.
License options (most to least permissive):
| License | Allows | Requires | Restrictions |
|---|---|---|---|
| CC0 (Public Domain) | Anything | Nothing | None |
| CC-BY | Anything | Attribution | None |
| CC-BY-SA | Anything | Attribution, share-alike | Derivatives must use same license |
| CC-BY-NC | Non-commercial use | Attribution | No commercial use |
| CC-BY-NC-SA | Non-commercial use | Attribution, share-alike | No commercial use, same license |
| CC-BY-ND | Sharing only | Attribution | No derivatives |
| CC-BY-NC-ND | Non-commercial sharing | Attribution | No commercial use, no derivatives |
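The license table can be encoded as a small lookup that checks whether an intended reuse is permitted. This sketch assumes the 4.0 licenses plus CC0 1.0, identified by their SPDX short identifiers. It captures only the core conditions in the table and ignores finer points such as share-alike compatibility between licenses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Terms:
    attribution: bool   # must credit the creator
    commercial: bool    # commercial use allowed
    derivatives: bool   # adaptations allowed
    share_alike: bool   # adaptations must carry the same license


# Field order: attribution, commercial, derivatives, share_alike
CC_LICENSES = {
    "CC0-1.0": Terms(False, True, True, False),
    "CC-BY-4.0": Terms(True, True, True, False),
    "CC-BY-SA-4.0": Terms(True, True, True, True),
    "CC-BY-NC-4.0": Terms(True, False, True, False),
    "CC-BY-NC-SA-4.0": Terms(True, False, True, True),
    "CC-BY-ND-4.0": Terms(True, True, False, False),
    "CC-BY-NC-ND-4.0": Terms(True, False, False, False),
}


def reuse_allowed(license_id: str, commercial: bool, adapt: bool) -> bool:
    """Return True if the intended reuse fits the license's core conditions."""
    terms = CC_LICENSES[license_id]
    if commercial and not terms.commercial:
        return False
    if adapt and not terms.derivatives:
        return False
    return True


if __name__ == "__main__":
    # A company wants to adapt a CC-BY-NC figure for a product manual
    print(reuse_allowed("CC-BY-NC-4.0", commercial=True, adapt=True))
```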
Recommendations:
Reproducibility means that another researcher can take your data and code and obtain the same results. Computational reproducibility is the minimum standard; replicability (obtaining similar results with new data) is the aspirational goal.
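A lightweight test of computational reproducibility is to record checksums of the outputs (tables, model summaries) when results are first produced, then re-run the pipeline and compare. This sketch uses SHA-256 from the standard library. The manifest format (a JSON mapping from path to hash) and the file names are assumptions. Byte-level comparison is strict: outputs that embed timestamps or floating-point noise will differ even when the results agree. It therefore suits deterministic text outputs such as CSV tables better than figures.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large outputs need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(paths, manifest: Path) -> None:
    """Record the current hash of each output file."""
    hashes = {str(p): sha256_of(Path(p)) for p in paths}
    manifest.write_text(json.dumps(hashes, indent=2))


def verify_manifest(manifest: Path) -> list:
    """Return the files that are missing or whose hash has changed."""
    recorded = json.loads(manifest.read_text())
    return [p for p, digest in recorded.items()
            if not Path(p).exists() or sha256_of(Path(p)) != digest]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "table1.csv"
        out.write_text("condition,mean\nspaced,80.1\nmassed,74.3\n")
        manifest = Path(tmp) / "manifest.json"
        write_manifest([out], manifest)
        # An empty list means every recorded output was reproduced byte for byte
        print("mismatches:", verify_manifest(manifest))
```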
Levels of reproducibility:
Docker for reproducible research:
# Dockerfile for a reproducible R analysis
FROM rocker/tidyverse:4.3.0
# Install additional R packages
RUN install2.r --error lme4 brms papaja here
# Copy analysis files
COPY . /home/rstudio/project
WORKDIR /home/rstudio/project
# Run the analysis
CMD ["Rscript", "analysis/main.R"]
Binder for interactive reproducibility:
Binder (mybinder.org) takes a GitHub repository with an environment.yml (Python) or install.R (R) file and creates a live, interactive Jupyter or RStudio environment that anyone can use without installing anything.
Example: environment.yml for Binder
name: my-research-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy=1.26
  - pandas=2.1
  - scipy=1.11
  - matplotlib=3.8
  - seaborn=0.13
  - statsmodels=0.14
  - jupyter=1.0
  - pip
  - pip:
    - pingouin==0.5.3
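Binder rebuilds the environment from this file, so it can be worth checking at the top of a notebook that the running kernel matches the pins. This sketch makes several assumptions. Conda-style single `=` pins are treated as a version prefix, so `numpy=1.26` accepts 1.26.4. Pip-style `==` pins must match exactly. Other operators such as `>=` are not handled. The check relies on `importlib.metadata`, so it only sees Python distributions: it skips `python` itself and unpinned entries such as a bare `pip`, and it assumes the conda and PyPI package names coincide.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pin(spec: str):
    """Split 'name=ver' (conda, prefix match) or 'name==ver' (pip, exact)."""
    if "==" in spec:
        name, ver = spec.split("==", 1)
        return name.strip(), ver.strip(), True
    name, ver = spec.split("=", 1)
    return name.strip(), ver.strip(), False


def matches(installed: str, pinned: str, exact: bool) -> bool:
    """Exact match for '==', otherwise match on whole version components."""
    if exact:
        return installed == pinned
    return installed == pinned or installed.startswith(pinned + ".")


def check_pins(specs, get_version=version):
    """Return {name: (pinned, installed)} for every pin that is not satisfied."""
    problems = {}
    for spec in specs:
        if "=" not in spec:
            continue  # unpinned entry, e.g. a bare "pip"
        name, pinned, exact = parse_pin(spec)
        if name == "python":
            continue  # not a distribution; compare sys.version_info instead
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed is None or not matches(installed, pinned, exact):
            problems[name] = (pinned, installed)
    return problems


if __name__ == "__main__":
    pins = ["python=3.11", "numpy=1.26", "pandas=2.1", "pip", "pingouin==0.5.3"]
    for name, (want, have) in check_pins(pins).items():
        print(f"{name}: pinned {want}, found {have}")
```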
Code Ocean: A commercial platform that packages code, data, and the computing environment into a "compute capsule" that others can re-run, with a DOI for each published capsule. Some journals, including titles in the Nature portfolio, have used it so that reviewers can re-run submitted code.
Best practices for reproducible code:
Jupyter notebooks, R Markdown, and Quarto documents combine code, text, and results in a single document. They are powerful tools for reproducible research when used well.
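A notebook only documents a reproducible analysis if its saved outputs came from a clean, top-to-bottom run. Notebook files are JSON, with a `cells` list in which code cells carry an `execution_count`, so this property can be checked with the standard library. This minimal sketch flags notebooks whose non-empty code cells were not executed in the order 1, 2, 3, and so on, which is the pattern a "Restart & Run All" produces. Skipping empty cells is an assumption, because they typically never receive a count. `load_notebook` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_notebook(path) -> dict:
    """Read an .ipynb file, which is plain JSON."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def execution_counts(nb: dict) -> list:
    """Execution counts of non-empty code cells, in document order."""
    counts = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        # "source" may be a string or a list of strings; join handles both
        if not "".join(cell.get("source", "")).strip():
            continue
        counts.append(cell.get("execution_count"))
    return counts


def ran_top_to_bottom(nb: dict) -> bool:
    """True if code cells were executed exactly once each, in order 1..n."""
    counts = execution_counts(nb)
    return counts == list(range(1, len(counts) + 1))


if __name__ == "__main__":
    # In-memory example; in practice use load_notebook("analysis.ipynb")
    demo = {"cells": [
        {"cell_type": "markdown", "source": ["# Analysis"]},
        {"cell_type": "code", "source": ["import math"], "execution_count": 1},
        {"cell_type": "code", "source": ["x = math.pi"], "execution_count": 3},
    ]}
    print(ran_top_to_bottom(demo))
```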
Jupyter notebooks (.ipynb):
R Markdown (.Rmd) / Quarto (.qmd):
Best practices for notebooks:
Open peer review encompasses several practices that increase transparency in the review process:
Models of open peer review:
Journals practicing open peer review:
| Journal/Platform | Model |
|---|---|
| eLife | Published reviews with author responses |
| BMJ | Open identities, open reports |
| F1000Research | Post-publication open review |
| PLOS ONE (optional) | Authors can opt for open reports |
| PeerJ | Authors can publish review history |
| Frontiers | Open identities, structured reports |
Choosing the right repository depends on your discipline, data type, and funder requirements.
General-purpose repositories:
| Repository | Max File Size | License | DOI | Preservation |
|---|---|---|---|---|
| Zenodo | 50 GB per dataset | Flexible | Yes | CERN long-term |
| Dryad | No hard limit | CC0 required | Yes | Curated, long-term |
| Figshare | 5 GB free, 20 GB institutional | Flexible | Yes | Long-term |
| OSF | 5 GB per file, 50 GB per project | Flexible | Yes | Long-term |
| Harvard Dataverse | 2.5 GB per file | Flexible | Yes | Long-term |
Domain-specific repositories (selected):
| Repository | Domain | Notes |
|---|---|---|
| GenBank / SRA | Genomics | Required for sequence data |
| PDB | Protein structures | Required for structural biology |
| ICPSR | Social science | Curated, access-controlled options |
| PANGAEA | Earth sciences | Georeferenced data |
| Qualitative Data Repository | Qualitative research | Specialized for interview/ethnographic data |
| UK Data Archive | Social science (UK) | Long-term preservation |
| Archaeology Data Service | Archaeology | UK-based, international scope |
Replication studies attempt to reproduce the findings of a previous study. They are essential for scientific self-correction but historically undervalued.
Types of replication:
Planning a replication study:
Key considerations:
Research software is increasingly recognized as a first-class scholarly output. Making it open source enhances reproducibility and enables community contributions.
Best practices for research software:
Example: CITATION.cff
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Martinez"
    given-names: "Rosa"
    orcid: "https://orcid.org/0000-0002-1234-5678"
title: "PyRetention: A Python Package for Learning Retention Analysis"
version: 2.1.0
doi: 10.5281/zenodo.1234567
date-released: 2025-09-15
url: "https://github.com/martinez-lab/pyretention"
license: MIT
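ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so typos in a CITATION.cff can be caught before release. This minimal validator accepts either the bare iD or the https://orcid.org/ URL form. The iD in the example (0000-0002-1234-5678) is a placeholder whose check digit does not validate, so substitute your own. The function names are hypothetical.

```python
import re

# Bare iD or https://orcid.org/ URL; the last character may be a digit or X
ORCID_RE = re.compile(r"^(?:https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$")


def orcid_check_char(base_digits: str) -> str:
    """ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(value: str) -> bool:
    """True if value is a well-formed ORCID iD with a correct check character."""
    match = ORCID_RE.match(value.strip())
    if not match:
        return False
    digits = match.group(1).replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]


if __name__ == "__main__":
    for candidate in ["https://orcid.org/0000-0002-1825-0097",   # sample iD used in ORCID documentation
                      "https://orcid.org/0000-0002-1234-5678"]:  # placeholder from the example
        print(candidate, is_valid_orcid(candidate))
```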