Science Spec-Kit
Most earth scientists aren’t trained as software developers, but we write a lot of code. AI coding assistants are making this easier, but also riskier. They’re fast! They’re fun! But they can make mistakes and wander astray if you set them loose without structured guidance.
“Vibe coding” (prompting and hoping) is fine for side projects, but scientific analysis needs to be correct, documented, and reproducible. That’s where this toolkit comes in.
What is this?
Science Spec-Kit gives you a structured way to build analysis code, whether you’re writing it yourself, working with an AI assistant, or both. Before you write any code, you write down:
- What question you’re trying to answer
- What data you’re using
- What outputs you expect
- How you’ll know the results are correct
Then you build the code step by step, with checkpoints along the way.
Everything is written in plain English. Your collaborators and reviewers can understand your analysis plan without reading Python. Non-programmers can review your approach, catch logical errors, and understand exactly what the code is supposed to do.
Why bother?
Mistakes happen in science. There’s no way around that. But the goal is to:
- Catch mistakes earlier by thinking through the approach before coding
- Make mistakes easier to find by logging every decision and change
- Make reviews more thorough because reviewers can understand intentions, not just code
This isn’t only for AI-assisted coding. It’s useful for anyone who wants to organize their thoughts before starting. But if you are using an AI assistant and want quality results, you need structure to keep things on track.
Core Principles
These principles guide how the toolkit is designed and how you should think about your analysis:
- Reproducibility: Analysis runs from raw data to outputs without manual intervention. Someone else (or future you) should be able to run your code and get the same results.
- Data Integrity: Raw data is immutable—you never modify the original files. Transformations produce new files, leaving the originals untouched.
- Provenance: Every output traces back to code, data, and parameter choices. You can always answer “where did this number come from?”
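To make these principles concrete, here is a minimal Python sketch of a single analysis step. It is not part of Science Spec-Kit itself: the function names (`sha256_of`, `run_step`, `demo`), the file names, the `scale` parameter, and the `.provenance.json` sidecar file are all hypothetical choices made for illustration. The step reads a raw file, writes its result to a new file, checks that the raw file is unchanged, and records which input, parameters, and output belong together.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def run_step(raw_path: Path, out_path: Path, params: dict) -> dict:
    """Read raw data, write a derived file, and return a provenance record."""
    raw_hash_before = sha256_of(raw_path)

    # Hypothetical transformation: multiply every value by params["scale"].
    values = [float(token) for token in raw_path.read_text().split()]
    scaled = [value * params["scale"] for value in values]

    # Data integrity: results go to a new file, never back into the raw file.
    out_path.write_text("\n".join(str(value) for value in scaled) + "\n")
    assert sha256_of(raw_path) == raw_hash_before, "raw data was modified"

    # Provenance: tie the output to its input and parameter choices.
    record = {
        "input": raw_path.name,
        "input_sha256": raw_hash_before,
        "params": params,
        "output": out_path.name,
        "output_sha256": sha256_of(out_path),
    }
    sidecar = out_path.with_name(out_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return record


def demo() -> dict:
    """Run the step end to end on a tiny made-up dataset."""
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"
        raw.write_text("1.0\n2.0\n3.0\n")
        return run_step(raw, Path(tmp) / "scaled.txt", {"scale": 2.0})


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Running `demo()` twice should return identical records, because the same input and parameters produce the same output hash. That is a small-scale version of the reproducibility check: someone else (or future you) can rerun the step and compare hashes rather than eyeballing numbers.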
Next steps
Ready to try it? Head to the Quickstart to set up your first project.