
geoarches documentation

If you're a user, start with the Getting Started guide, then explore the User Guide for more detailed instructions and tips.
If you're interested in contributing to the project, check out the Contributing guide for developer setup and guidelines.

What is geoarches?

geoarches is a research-friendly machine learning library for training, running, and evaluating models on geospatial data, mainly weather and climate data.

Built on PyTorch, PyTorch Lightning, and Hydra, geoarches offers a clean, modular structure for developing and scaling ML pipelines. Once installed, you can use its modules inside your own project, or use the main training and evaluation workflows.

geoarches powers the ArchesWeather and ArchesWeatherGen models.

See the ArchesWeather section for more details.

Overview

geoarches is meant to jumpstart your ML pipeline with building blocks for data handling, model training, and evaluation. This is an ongoing effort to share engineering tools and research knowledge across projects.

Data

  • download/: Parallelized dataset download scripts with support for chunking to speed up read access.
  • dataloaders/: PyTorch datasets for loading and preprocessing NetCDF files into ML-ready tensors.

Model training

  • backbones/: Network architectures that plug into Lightning modules.
  • lightning_modules/: Training and inference wrappers that are agnostic to the backbone but specific to the ML task. They handle losses, optimizers, and metrics.
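
The split between backbones and task wrappers can be illustrated with a small sketch. It is not taken from geoarches: the ForecastTask class, the batch keys "state" and "next_state", and the toy backbones are hypothetical, and a plain torch.nn.Module stands in for a Lightning module to keep the example short. The point is only that the wrapper owns the loss while accepting any backbone with a compatible input and output shape.

```python
import torch
from torch import nn


class ForecastTask(nn.Module):
    """Hypothetical task wrapper: owns the loss, accepts any backbone."""

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone      # architecture is injected, not hard-coded
        self.loss_fn = nn.MSELoss()   # the loss belongs to the task, not the backbone

    def training_step(self, batch: dict) -> torch.Tensor:
        # Predict the next state from the current one and score it.
        prediction = self.backbone(batch["state"])
        return self.loss_fn(prediction, batch["next_state"])


# Two interchangeable toy backbones, both mapping (batch, 8) -> (batch, 8).
linear_backbone = nn.Linear(8, 8)
mlp_backbone = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 8))

# Random tensors stand in for real data.
batch = {"state": torch.randn(4, 8), "next_state": torch.randn(4, 8)}

# The same task code runs unchanged with either backbone.
losses = [
    ForecastTask(backbone).training_step(batch)
    for backbone in (linear_backbone, mlp_backbone)
]
```

Swapping architectures then only means passing a different backbone object; the loss and training logic stay in one place.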

Evaluation

  • metrics/: Tested suite of efficient, memory-friendly metrics.
  • evaluation/: End-to-end scripts to benchmark model predictions and generate plots.
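
One common way to keep a metric memory-friendly is to accumulate running sums batch by batch instead of storing every prediction. The sketch below shows that pattern for RMSE in plain Python. It is an illustration under assumptions, not the geoarches implementation: the StreamingRMSE class and its update/compute methods are hypothetical names.

```python
import math
from typing import Iterable


class StreamingRMSE:
    """Hypothetical streaming metric: keeps two scalars, never the full data."""

    def __init__(self) -> None:
        self.sum_squared_error = 0.0
        self.count = 0

    def update(self, predictions: Iterable[float], targets: Iterable[float]) -> None:
        # Fold one batch into the running totals, then discard the batch.
        for prediction, target in zip(predictions, targets):
            self.sum_squared_error += (prediction - target) ** 2
            self.count += 1

    def compute(self) -> float:
        if self.count == 0:
            raise ValueError("compute() called before any update()")
        return math.sqrt(self.sum_squared_error / self.count)


metric = StreamingRMSE()
metric.update([1.0, 2.0], [1.0, 4.0])  # squared errors: 0, 4
metric.update([3.0, 5.0], [3.0, 3.0])  # squared errors: 0, 4
rmse = metric.compute()                # sqrt(8 / 4) = sqrt(2)
```

Memory use stays constant no matter how many batches are seen, which matters when evaluating over long time ranges.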

Pipeline

  • main_hydra.py: Entry point for training or inference using Hydra configurations.
  • docs/archesweather/: Quickstart code for training and inference.

Next steps