Building an Open Source Ecosystem in a Golden Age of Solar Physics

DOEPy Python Exchange / 27 March 2024

Will Barnes

American University / NASA Goddard Space Flight Center

What is solar physics?

What does a solar physicist do?

  • Ask a “physics” question
  • Identify an event of interest
  • Combine data to answer a question
    • Search for data
    • Download data
    • Ingest the data
    • Transform the data
    • Analyze the data
  • Publish a paper
  • Profit
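The "search for data" and "download data" steps are where a solar physicist first runs into data scattered across many archives. sunpy addresses this with `Fido`, a single search interface that dispatches one query to many data-provider clients. The sketch below is not sunpy's code. It is a minimal, stdlib-only illustration of that federated-search idea. It assumes a hypothetical `Provider` interface, toy in-memory catalogs, and invented names (`Record`, `InMemoryProvider`, `unified_search`, the archive names, and the file paths).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    """One search hit: which provider holds which observation, and where."""
    provider: str
    instrument: str
    start: datetime
    url: str


class Provider(Protocol):
    """Hypothetical interface that every archive client implements."""
    name: str

    def search(self, start: datetime, end: datetime, instrument: str) -> List[Record]:
        ...


class InMemoryProvider:
    """Stand-in for a remote archive: filters a fixed, made-up catalog."""

    def __init__(self, name: str, catalog: Iterable[Record]) -> None:
        self.name = name
        self.catalog = list(catalog)

    def search(self, start: datetime, end: datetime, instrument: str) -> List[Record]:
        # Keep records for the requested instrument inside [start, end).
        return [r for r in self.catalog
                if r.instrument == instrument and start <= r.start < end]


def unified_search(providers: Iterable[Provider], start: datetime,
                   end: datetime, instrument: str) -> List[Record]:
    """Send one query to every provider and merge the hits by start time."""
    hits: List[Record] = []
    for provider in providers:
        hits.extend(provider.search(start, end, instrument))
    return sorted(hits, key=lambda r: r.start)


# Usage with two toy archives (names, instruments, and paths are invented).
archive_a = InMemoryProvider("archive_a", [
    Record("archive_a", "euv_imager", datetime(2024, 3, 27, 12, 0), "a/euv_1200.fits"),
    Record("archive_a", "magnetograph", datetime(2024, 3, 27, 12, 0), "a/mag_1200.fits"),
])
archive_b = InMemoryProvider("archive_b", [
    Record("archive_b", "euv_imager", datetime(2024, 3, 27, 11, 0), "b/euv_1100.fits"),
    Record("archive_b", "euv_imager", datetime(2024, 3, 28, 11, 0), "b/euv_next_day.fits"),
])
results = unified_search([archive_a, archive_b],
                         datetime(2024, 3, 27), datetime(2024, 3, 28), "euv_imager")
for r in results:
    print(r.provider, r.start.isoformat(), r.url)
```

The user writes one query. The dispatcher hides which archive answered, and the merged, time-ordered list is what the "download" step would consume next.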

Challenges

  • Data are spread across many different providers
  • Data are often large
  • Data are often very heterogeneous
  • Data and metadata must be held together
  • Complex coordinate systems
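Keeping data and metadata together matters most when the data are transformed. If an image is cropped but its reference pixel is not shifted, every pixel silently gets the wrong sky coordinate. Here is a minimal sketch. It assumes a hypothetical container, `SimpleMap`, which is not sunpy's `Map`. It also assumes a purely linear FITS-style WCS (`CRPIX`, `CRVAL`, `CDELT`, with no rotation and no projection), a simplification of the real coordinate systems. sunpy's `Map` objects do this bookkeeping for real data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SimpleMap:
    """Toy image that keeps its pixel data and FITS-style metadata together.

    Assumes a purely linear WCS (no PCi_j rotation, no projection), so
    world = CRVAL + CDELT * (pixel - CRPIX), with 1-based FITS pixel numbers.
    """
    data: List[List[float]]  # data[row][col], i.e. [y][x]
    meta: Dict[str, float] = field(default_factory=dict)

    def pixel_to_world(self, x: int, y: int) -> Tuple[float, float]:
        """Convert 0-based array indices to world coordinates (e.g. arcsec)."""
        m = self.meta
        # +1 converts 0-based Python indices to 1-based FITS pixel numbers.
        wx = m["CRVAL1"] + m["CDELT1"] * (x + 1 - m["CRPIX1"])
        wy = m["CRVAL2"] + m["CDELT2"] * (y + 1 - m["CRPIX2"])
        return wx, wy

    def crop(self, x0: int, x1: int, y0: int, y1: int) -> "SimpleMap":
        """Cut out data[y0:y1][x0:x1] and update the metadata to match."""
        new_data = [row[x0:x1] for row in self.data[y0:y1]]
        new_meta = dict(self.meta)  # copy so the original map is untouched
        # The same sky position now sits x0 / y0 pixels closer to the origin.
        new_meta["CRPIX1"] = self.meta["CRPIX1"] - x0
        new_meta["CRPIX2"] = self.meta["CRPIX2"] - y0
        new_meta["NAXIS1"] = len(new_data[0]) if new_data else 0
        new_meta["NAXIS2"] = len(new_data)
        return SimpleMap(new_data, new_meta)


# Usage: a 4x4 toy image with made-up metadata (0.6 arcsec pixels).
image = SimpleMap(
    data=[[float(10 * y + x) for x in range(4)] for y in range(4)],
    meta={"CRPIX1": 2.5, "CRPIX2": 2.5, "CDELT1": 0.6, "CDELT2": 0.6,
          "CRVAL1": 0.0, "CRVAL2": 0.0, "NAXIS1": 4, "NAXIS2": 4},
)
sub = image.crop(1, 3, 1, 3)
# The same physical pixel maps to the same sky coordinate before and after.
print(image.pixel_to_world(2, 2), sub.pixel_to_world(1, 1))
```

Because `crop` is a method on the container, the array slice and the `CRPIX` update cannot drift apart. With a bare array and a separate header, that is easy to get wrong.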

Solution

A Brief History of Software in Solar Physics

  • Historically, nearly all software in IDL
  • SolarSoftware (SSW)–solar physicist’s toolbox
  • Pros:
    • Scripting language–low barrier to entry
    • Freely available
    • Centrally distributed
    • Everything you need in one place
  • Cons:
    • Proprietary language, license fees
    • Development is not coordinated
    • No tests or documentation
    • No clear path to contribute

sunpy: Solar Data Analysis in Python

  • Began in March of 2011 at NASA GSFC
  • Frustration with licensing fees, fragility of SSW
  • Early attempts using GDL → Python
  • v0.1 released 9/2011
  • v1.0 in 6/2019; paper currently has >200 citations
  • Some early skepticism due to heritage of IDL/SSW
  • Now the default choice, especially for early-career researchers (ECRs)
  • Open-source and openly developed–by the community, for the community
  • Built on SciPy ecosystem, especially astropy


The SunPy Project

Describe a software ecosystem (an informal survey)

  • “A set of software that can be installed in the same environment”
  • “A pipedream that will never occur.”
  • “Imagine a set of canals, linked together with locks.”
  • “A set of software that has enough in common to make it easy to share data between all the functionality…sharing a common programming language is neither required nor sufficient”
  • ChatGPT
    • “…a complex network of interacting software projects, developers, and organizations that build, maintain, and use a common platform or set of technologies.”
    • ELI5: “…a playground, where different computer programs, the people who make them, and the people who use them, all play together nicely and help each other out.”

Building a Software Ecosystem

  • My proposed criteria:
    • open–transparent path for adding to the ecosystem
    • integrated–share common data structures and abstractions
    • nonduplicative–minimize overlap in functionality
    • reliable–regularly test package health and ecosystem integration
  • Challenges:
    • coordinating development between groups with different interests
    • curating a set of interoperable software packages
    • maintaining health of ecosystem
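The "integrated" criterion is easiest to see in code. If packages agree on one data structure, a loader written by one group and an analysis routine written by another can be combined without either knowing about the other. Here is a minimal sketch. It assumes a hypothetical `ImageLike` protocol and three hypothetical packages, along with an assumed `"exptime"` metadata key. In the SunPy ecosystem, the shared structures are things like sunpy's `Map` and astropy's unit and coordinate objects, not this code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Protocol, Sequence, runtime_checkable


@runtime_checkable
class ImageLike(Protocol):
    """Hypothetical shared abstraction: pixel data plus a metadata mapping."""
    data: Sequence[Sequence[float]]
    meta: Dict[str, Any]


# Hypothetical "package A": an instrument-specific loader.
@dataclass
class ToyInstrumentImage:
    data: Sequence[Sequence[float]]
    meta: Dict[str, Any]


def load_toy_image() -> ToyInstrumentImage:
    """Stand-in for a file reader; returns made-up values."""
    return ToyInstrumentImage([[1.0, 2.0], [3.0, 4.0]], {"exptime": 2.0})


# Hypothetical "package B": analysis code written by a different group.
def mean_count_rate(img: ImageLike) -> float:
    """Mean counts per pixel per second; relies only on the shared protocol."""
    values = [v for row in img.data for v in row]
    return sum(values) / (len(values) * img.meta["exptime"])


# Hypothetical "package C": an unrelated class that happens to match the protocol.
class OtherImage:
    def __init__(self, data, meta):
        self.data = data
        self.meta = meta


print(isinstance(load_toy_image(), ImageLike), mean_count_rate(load_toy_image()))
print(mean_count_rate(OtherImage([[2.0, 2.0]], {"exptime": 1.0})))
```

A structural `Protocol` is used here only to keep the sketch dependency-free. Sharing a concrete class from a core package is the other common way to get the same interoperability. Either way, the "reliable" criterion then amounts to testing that combinations like A + B and C + B keep working as each package evolves.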

The SunPy Ecosystem

Functionality

Integration

Documentation

Testing

Duplication

Community

Development Status


Why is this so important now?

…Data are Used Together

…Data are Larger

…Data are More Complex

Conclusion

Summary

  • Solar data are large, complex, and heterogeneous, but hold great value in combination
  • sunpy is a community-developed Python package for solar data analysis
  • The SunPy Project maintains an ecosystem of tools for working with solar data
  • Maintaining this ecosystem is increasingly important in this “Golden Age”

Questions for Panelists

  • How have you sustained momentum within your open-source community?
  • Have you converted users to long-term contributors? How?
  • What challenges have you faced in building a scientific software ecosystem?