geoarches documentation
If you're a user, start with the Getting Started guide, then explore the User Guide for more detailed instructions and tips.
If you're interested in contributing to the project, check out the Contributing guide for developer setup and guidelines.
What is geoarches?
geoarches is a research-friendly machine learning library for training, running, and evaluating models on geospatial data, mainly weather and climate data.
Built on PyTorch, PyTorch Lightning, and Hydra, geoarches offers a clean, modular structure for developing and scaling ML pipelines. Once installed, you can import its modules into your own project or use its main training and evaluation workflows directly.
geoarches powers ArchesWeather and ArchesWeatherGen models.
See the ArchesWeather section for more details.
Overview
geoarches is meant to jumpstart your ML pipeline with building blocks for data handling, model training, and evaluation. This is an ongoing effort to share engineering tools and research knowledge across projects.
Data
download/: Parallelized dataset download scripts with support for chunking to speed up read access.
dataloaders/: PyTorch datasets for loading and preprocessing NetCDF files into ML-ready tensors.
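To illustrate what "ML-ready tensors" can mean for forecasting, here is a minimal sketch that assumes the variables of a NetCDF file have already been read into NumPy arrays of shape (time, lat, lon), for example with xarray's open_dataset. ForecastPairDataset and its arguments are hypothetical names, not geoarches' dataloader API. The sketch stacks variables into a channel axis, normalizes each channel, and returns (input, target) pairs a fixed number of time steps apart.

```python
import numpy as np
import torch
from torch.utils.data import Dataset


class ForecastPairDataset(Dataset):
    """Hypothetical sketch: gridded variables -> (state_t, state_t+lead) tensor pairs.

    `variables` maps a variable name to an array of shape (time, lat, lon).
    geoarches' real datasets may be organized differently.
    """

    def __init__(self, variables: dict[str, np.ndarray], lead_steps: int = 1):
        self.names = sorted(variables)
        # Stack variables into a channel axis: (time, channels, lat, lon).
        data = np.stack([variables[n] for n in self.names], axis=1).astype(np.float32)
        # Per-channel normalization statistics over time, lat and lon.
        self.mean = data.mean(axis=(0, 2, 3), keepdims=True)
        self.std = data.std(axis=(0, 2, 3), keepdims=True) + 1e-6
        self.data = torch.from_numpy((data - self.mean) / self.std)
        self.lead_steps = lead_steps

    def __len__(self) -> int:
        # Only time indices that have a target `lead_steps` ahead are valid.
        return max(0, self.data.shape[0] - self.lead_steps)

    def __getitem__(self, idx: int) -> tuple[torch.Tensor, torch.Tensor]:
        return self.data[idx], self.data[idx + self.lead_steps]
```

Such a dataset can then be wrapped in a standard torch.utils.data.DataLoader for batching and shuffling.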
Model training
backbones/: Network architectures that plug into Lightning modules.
lightning_modules/: Training and inference wrappers that are agnostic to the backbone but specific to the ML task; they handle losses, optimizers, and metrics.
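The split between backbones and task wrappers can be sketched as follows. This is a hedged illustration, not geoarches' code: ForecastTask is a hypothetical name, and it subclasses plain torch.nn.Module so it runs without Lightning, while borrowing Lightning's method names (training_step, configure_optimizers). The wrapper owns the loss and optimizer; the backbone is any network passed in.

```python
import torch
from torch import nn


class ForecastTask(nn.Module):
    """Hypothetical task wrapper: any backbone in; loss and optimizer handled here.

    In geoarches this role is played by Lightning modules; this sketch uses a plain
    nn.Module but mirrors Lightning's method names.
    """

    def __init__(self, backbone: nn.Module, lr: float = 1e-3):
        super().__init__()
        self.backbone = backbone  # any network mapping state_t -> state_t+1
        self.loss_fn = nn.MSELoss()
        self.lr = lr

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(x)

    def training_step(self, batch: tuple[torch.Tensor, torch.Tensor], batch_idx: int = 0) -> torch.Tensor:
        # Task-specific logic lives here, independent of the backbone architecture.
        x, y = batch
        return self.loss_fn(self(x), y)

    def configure_optimizers(self) -> torch.optim.Optimizer:
        return torch.optim.AdamW(self.parameters(), lr=self.lr)
```

Swapping nn.Conv2d for a transformer or U-Net backbone would leave the loss and optimizer code untouched, which is the point of keeping the two layers separate.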
Evaluation
metrics/: Tested suite of efficient, memory-friendly metrics.
evaluation/: End-to-end scripts to benchmark model predictions and generate plots.
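One common way to make a metric memory-friendly is to accumulate running sums per batch instead of storing every prediction. The sketch below applies that idea to a latitude-weighted RMSE; the class name is hypothetical, and the cos(latitude) weighting is an assumption (a widespread convention for regular lat-lon grids), not a description of geoarches' metrics.

```python
import math

import torch


class StreamingLatWeightedRMSE:
    """Hypothetical metric: latitude-weighted RMSE that keeps only two running
    sums, so memory does not grow with the number of batches."""

    def __init__(self, latitudes_deg: torch.Tensor):
        # cos(latitude) weights normalized to mean 1 (normalization cancels in the ratio).
        w = torch.cos(torch.deg2rad(latitudes_deg.double()))
        self.weights = (w / w.mean())[:, None]  # shape (lat, 1), broadcasts over lon
        self.sq_err_sum = torch.zeros((), dtype=torch.float64)
        self.weight_sum = torch.zeros((), dtype=torch.float64)

    def update(self, pred: torch.Tensor, target: torch.Tensor) -> None:
        # pred and target have shape (..., lat, lon); only weighted sums are kept.
        sq_err = (pred.double() - target.double()) ** 2
        self.sq_err_sum += (sq_err * self.weights).sum()
        self.weight_sum += torch.broadcast_to(self.weights, sq_err.shape).sum()

    def compute(self) -> float:
        if self.weight_sum == 0:
            raise RuntimeError("update() must be called before compute()")
        # Pools squared errors over all samples, then takes the square root.
        return math.sqrt((self.sq_err_sum / self.weight_sum).item())
```

This version pools squared errors over every sample before the square root; some benchmarks instead average per-timestep RMSEs, which gives a slightly different number.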
Pipeline
main_hydra.py: Entry point for training or inference using Hydra configurations.
docs/archesweather/: Quickstart code for training and inference.