Understanding CLOCKSS: A Commitment to Preserving Content Safely and Securely

Since the recent cyber-attack on the British Library, many publishers have reached out with questions about the security of their content. While we cannot disclose every detail of our security protocols, we want to assure you that we are dedicated to the long-term preservation and security of all content entrusted to us. Here is an overview of how CLOCKSS works to protect data and digital assets.

The CLOCKSS Mission: Long-Term Preservation and Access

At the heart of our work is a clear and unwavering mission: to ensure the long-term preservation of, and access to, scholarly content. CLOCKSS is not simply another backup service; it maintains a curated, authoritative copy of the original content. Our goal is to make sure that this content remains unchanged and protected, forever.

We understand the immense responsibility involved in preserving this material, and that’s why security is embedded in everything we do. Our use of the award-winning LOCKSS open-source preservation software, developed at Stanford University, is central to ensuring both the integrity and safety of this content.

The Role of LOCKSS Software in Our Security Framework

LOCKSS isn’t just about storing files; it’s about ensuring that content remains intact and protected against threats. The software is designed to actively manage content with bit-level integrity checking. This means that every piece of content entrusted to CLOCKSS is continuously monitored to ensure that nothing is tampered with or altered.

The LOCKSS system operates as a self-healing secure storage network, meaning that if any content is compromised—whether through corruption or unauthorized access—the system automatically detects and corrects the issue. This robust protection keeps content safe from exfiltration or subversion, ensuring it remains as it should be, for the long term.
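To make the idea of bit-level integrity checking concrete, here is a minimal Python sketch. It is an illustration only and not the LOCKSS implementation: the function names (digest, is_intact) and the sample content are hypothetical, and SHA-256 is an assumed choice of checksum. The principle is that a digest recorded when content is ingested can later be recomputed and compared, so that even a single altered byte is detected.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a stored object's bytes."""
    return hashlib.sha256(data).hexdigest()


def is_intact(data: bytes, recorded_digest: str) -> bool:
    """Recompute the digest and compare it with the one recorded at ingest."""
    return digest(data) == recorded_digest


# Hypothetical sample content, used only for illustration.
original = b"Example article content"
recorded = digest(original)            # stored alongside the content at ingest

damaged = b"Example article c0ntent"   # one byte differs from the original

print(is_intact(original, recorded))   # unchanged copy passes the check
print(is_intact(damaged, recorded))    # altered copy fails the check
```

A check like this only detects damage. Repairing it requires a known-good copy held elsewhere, which is where the distributed network described later comes in.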

If you'd like to learn more about how our preservation system works, you can read about it in detail here.

Multi-Layered Security: Comprehensive and Adaptable

When it comes to securing the CLOCKSS archive, we take a layered approach. Security doesn’t just come from technology; it’s also embedded in governance, policy, and social practices. Some layers are well documented in the computer science literature, while others remain private for security reasons. Here are some key layers that protect CLOCKSS content:

1. Governance and Policy

CLOCKSS operates under the guidance of libraries and publishers worldwide, all of whom agree on policies that govern our practices. Our board includes representatives from organizations with in-house security expertise, and our team includes experts for whom security is always a top priority.

2. Secure Storage

All content in the CLOCKSS archive is stored in an actively managed, secure, distributed network of storage nodes. Each node is protected by a unique combination of security measures, offering a multi-faceted approach to securing the data.

3. Distributed Preservation

One of the core strengths of CLOCKSS is the distribution of content across 12 secure storage sites around the globe, hosted by universities and research institutes. This geographic diversity enhances security by reducing the risk of a single point of failure. These sites are in constant communication, so if any content becomes corrupted, it is quickly identified and repaired by other nodes in the network.
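The detect-and-repair behaviour described above can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions, not the actual LOCKSS protocol: the function audit_and_repair, the site names, and the simple majority rule are all hypothetical. Each site's copy is hashed, the digest held by the majority is treated as correct, and any site that disagrees receives a fresh copy from a site in the majority.

```python
import hashlib
from collections import Counter


def audit_and_repair(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite copies that disagree with the majority; return the repaired site names."""
    digests = {site: hashlib.sha256(data).hexdigest() for site, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        # Without a clear majority there is no trustworthy copy to repair from.
        raise ValueError("no majority agreement; manual review needed")
    good_copy = next(data for site, data in replicas.items() if digests[site] == winner)
    repaired = [site for site in replicas if digests[site] != winner]
    for site in repaired:
        replicas[site] = good_copy
    return repaired


# Twelve hypothetical sites holding the same content, one of them corrupted.
sites = {f"site-{i}": b"article bytes" for i in range(1, 13)}
sites["site-7"] = b"article bxtes"

repaired_sites = audit_and_repair(sites)
print(repaired_sites)
```

Because the repair source is chosen by agreement among many independently managed copies, a single damaged or tampered copy cannot overwrite the others.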

4. Diversity in Security Practices

Each of the 12 preservation sites is managed and protected in unique ways, adding an extra layer of defense. This diversity strengthens the security of the entire system, ensuring that no single vulnerability can compromise the entire archive.

5. Access Controls and Technical Security Measures

CLOCKSS operates as a dark archive, which means that content is protected and managed for long-term preservation and access, but it is not openly accessible to the public. This dark archive model significantly reduces human-related risks to security, such as accidental breaches or unauthorized access.

Security measures within the CLOCKSS archive include the use of SSL certificates, firewalls, network intrusion detection systems, physical security measures, and remote wiping capabilities. Additionally, we use encryption and strong authentication protocols to ensure only authorized users can administer the preservation of the content.

6. Regular Audits and Testing

We cannot afford to become complacent, which is why we conduct regular external audits and security tests to assess our defenses. For example, we conducted a series of penetration tests while temporarily removing certain security layers. Even without these layers, CLOCKSS withstood the simulated external attacks, demonstrating the resilience of our systems.

7. Pre-Trigger Scans for Malware and Viruses

As a dark archive, CLOCKSS stores and manages files in a state that prevents them from being accessed or executed in a typical browsing environment. This means that if any content contains malware or viruses, these remain locked in a “fossilized” state where they cannot activate. Prior to triggering content for release, we perform special checks to ensure the content is free from such threats.

8. Accreditation and Recognition

CLOCKSS is accredited under the Center for Research Libraries’ TRAC Audit scheme, receiving the highest score ever awarded to a trusted repository. This accreditation considers everything from our technology and technical infrastructure to our security arrangements, validating the strength of our preservation and protection processes.

9. Adapting to Changing Threats

At CLOCKSS, we understand that security is not static; it must evolve in response to emerging threats. The team at Stanford University, led by Thib Guicherd-Callin, actively monitors vulnerabilities and takes immediate action to mitigate risks. For example, in January 2022, a major vulnerability called "PwnKit" was disclosed in polkit, a component installed by default on most major Linux distributions, including those used by CLOCKSS systems. We patched all affected machines on the same day the vulnerability was announced, so that it could not be exploited on our systems.

Conclusion

CLOCKSS is more than just a repository — it's a secure, self-healing archival vault designed to preserve digital content for generations to come. Our multi-layered security practices, combined with cutting-edge technology and vigilant monitoring, ensure that the content we manage remains protected and secure.

As threats continue to evolve, CLOCKSS remains committed to adapting and strengthening our defenses.

References

Reich, V. A. (2002). Diffused Knowledge Immortalizes Itself. The LOCKSS Program. High Energy Physics Libraries Webzine, 7/2003.