Our Sustainability Journey: CLOCKSS and Carbon Footprint Tracking

In July 2023, CLOCKSS began a 15-month journey to better understand the environmental impact of our long-term digital preservation service. While our mission is to safeguard scholarly content for future generations, we recognize that sustainability must also guide how we operate. We're now sharing a high-level analysis of our carbon footprint and some important lessons we've learned.

We started by forming a project team and partnering with experts from DIMPACT, who specialize in helping digital organizations assess and reduce their climate impact. Our first step was a two-hour workshop that dug deep into how CLOCKSS really works. How is content ingested, stored, and served? How do administrators and users interact with preserved content before and after it enters the archive? From this, we created a rough map of our content flow: into, around, and out of the CLOCKSS archive. The workshop also helped us visualize our infrastructure and create a working model of the archive.


A second workshop refined this model, identifying the actual machines involved in service delivery, where each is located, what each one does, and how often. This was a bit of a revelation, as no one had a complete understanding of all the distributed kit we use to deliver the archive. This cross-organizational alignment had unexpected benefits: for example, it helped us update our corporate asset register and strengthened shared understanding across the various research organizations that host CLOCKSS machines.

Next, we agreed on a set of data points to collect. Given that CLOCKSS is hosted at 12 global sites, we focused on gathering data at four locations: University of Edinburgh, University of Alberta, Stanford University, and Indiana University.

Each site offered unique insights. For example:

  • Stanford, located in a region with a strong renewable energy mix, had good data availability and is well along its clean energy transition.
  • Indiana, located in a region heavily reliant on fossil fuels and where CLOCKSS machines are deeply integrated into a high-performance computing environment, proved more complex, but the process sparked important conversations about sustainability.

We plugged the data into a modelling spreadsheet, and we estimate that CLOCKSS generates about 9 tonnes of carbon per month from its archiving service and another 1 tonne per year from travel. Integrity checking, a key process ensuring content authenticity, is our most carbon-intensive operation. Notably, this estimate covers only the use phase of our hardware. We haven't yet quantified the embodied emissions, the environmental cost of producing our servers, which will add to our overall footprint. We have gained some insight into the environmental cost of disposal, with host institutions having an array of thoughtful policies.
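
For readers curious about the arithmetic behind a use-phase estimate, here is a minimal sketch in Python. It is not the CLOCKSS or DIMPACT model: it assumes the common simplification that emissions equal energy consumed multiplied by the carbon intensity of the local grid. The function names, the 400 W power draw, the PUE overhead factor of 1.5, and the two grid intensities are all hypothetical placeholders, not measured CLOCKSS data. Only the final call uses figures from this post.

```python
def use_phase_emissions_kg(power_watts, hours, grid_g_co2e_per_kwh, pue=1.0):
    """Estimate use-phase emissions in kg CO2e for one machine.

    Simplified assumption: energy (kWh) x grid intensity (g CO2e/kWh).
    `pue` is a data-centre overhead multiplier (cooling, power delivery).
    """
    energy_kwh = power_watts / 1000 * hours * pue
    return energy_kwh * grid_g_co2e_per_kwh / 1000  # grams -> kilograms


def annual_total_tonnes(service_tonnes_per_month, travel_tonnes_per_year):
    """Combine a monthly service figure and a yearly travel figure."""
    return service_tonnes_per_month * 12 + travel_tonnes_per_year


# Hypothetical inputs, chosen only to show how location changes the result.
HOURS_PER_MONTH = 730   # average hours in a month
MACHINE_WATTS = 400     # placeholder power draw for one archive node

# The same machine on a low-carbon grid and on a fossil-heavy grid.
low_carbon_site = use_phase_emissions_kg(MACHINE_WATTS, HOURS_PER_MONTH, 100, pue=1.5)
fossil_heavy_site = use_phase_emissions_kg(MACHINE_WATTS, HOURS_PER_MONTH, 700, pue=1.5)
print(f"low-carbon grid:   {low_carbon_site:.1f} kg CO2e per month")
print(f"fossil-heavy grid: {fossil_heavy_site:.1f} kg CO2e per month")

# Annualizing the two estimates quoted in this post, taken at face value.
print(annual_total_tonnes(9, 1), "tonnes per year")
```

Under this simplification, identical hardware produces very different emissions depending on where it runs, which is one reason node location is among the decisions a model like this can inform.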


We presented our findings at iPRES 2024, the leading international digital preservation conference, where peer feedback will help us generalize the CLOCKSS model for broader application across the preservation community. This work is now underway under the auspices of the Digital Preservation Coalition’s new Carbon Footprint Taskforce.

This project wasn’t easy. It required sustained collaboration, deep inquiry, a lot of tenacious data gathering, and real organizational effort. But it was worth it. We now have clearer insights that will inform future decisions such as how many content copies we store, where we locate nodes, and other ways we can help to decrease our impact on the climate and environment.

As stewards of the scholarly record, we must preserve not just knowledge, but also the world that future scholars will inhabit.
