Developing a Library Digital Preservation Strategy

Digital scholarship is at the heart of today’s research and learning. Without a clear plan, valuable knowledge can slip away through format changes, platform shifts, technology failures, or simple aging. A library digital preservation strategy sets out what you’ll keep, how you’ll keep it safe, and who is responsible - so the scholarly record you steward stays accessible, usable, and trustworthy for your researchers and community, now and into the future.

Getting started
A preservation strategy is a simple plan that says what you’ll keep, how you’ll keep it safe, and who does what - so people can still read and reuse your content years from now. This page walks you through the essentials in plain language and shows where CLOCKSS fits in.

Why have a strategy?
Before choosing tools or vendors, be clear on why you’re preserving. Purpose keeps the work focused and makes it easy to explain to colleagues, funders, and other stakeholders.

Keep access going: If a platform change goes wrong or a publisher closes, your readers aren’t stranded.
Show you’re trustworthy: Authors and readers can see a clear plan, and evidence that you safeguard their interests.
Work smarter: Agreed rules stop one-off decisions and endless changes.
Reduce risk: You address obsolescence, vendor lock-in, and rights issues up front.

Build your strategy in 10 simple steps
Use these steps as your starter template. Each step has a brief explanation and a small set of actions - no jargon required. (This is where the NASIG model digital preservation policy fits - see below.)

1) Set your goals (why preserve?)
Goals are your north star - start with 2 or 3 short statements everyone can remember.

“Keep long-term access to published articles, books and related files (figures, data, code).”

“Ensure continuity of access despite shifts in national policies and priorities.”

2) Decide the scope (what’s in / what’s out)
Scope stops scope creep. Say clearly what you’ll preserve now and what you won’t - plus why.

Include: Publications (PDF, HTML, XML, EPUB), digitised files, datasets & code, images/audio/video, and key metadata (DOIs, ORCIDs, RORs).
Exclude (and say why): Preliminary analyses or “throwaway” datasets that were later replaced by higher-quality versions, routine communications such as email exchanges or meeting notes about project logistics, and anything you don’t have rights to keep yet.

3) Make a quick inventory (what you have, where it lives)
A lightweight list gives you visibility without creating bureaucracy.

List main content groups, where files are stored, who owns them, and file types.
Mark priority content (high value or at risk), e.g., content disseminated by small publishers, or unique datasets.
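
If your files sit in ordinary folders, a few lines of Python can produce a first-pass inventory. This is only a minimal sketch. It assumes each top-level folder under a root directory is one content group, and the function names (build_inventory, print_inventory) are made up for illustration. Owners and priority flags still need to be added by hand.

```python
from collections import Counter
from pathlib import Path


def build_inventory(root):
    """Count file types per content group under `root`.

    Assumption: each top-level folder under `root` is one content group.
    Returns a dict of {group name: Counter of file extensions}.
    """
    root = Path(root)
    inventory = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(root)
        # Files sitting directly in the root go into a catch-all group.
        group = relative.parts[0] if len(relative.parts) > 1 else "(ungrouped)"
        extension = path.suffix.lower() or "(none)"
        inventory.setdefault(group, Counter())[extension] += 1
    return inventory


def print_inventory(inventory):
    """Print one line per content group, e.g. 'journals: .pdf: 2, .xml: 1'."""
    for group, counts in sorted(inventory.items()):
        summary = ", ".join(f"{ext}: {n}" for ext, n in sorted(counts.items()))
        print(f"{group}: {summary}")
```

Calling print_inventory(build_inventory("/path/to/content")) prints one line per group that you can paste into a spreadsheet and annotate with owners and priorities.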

4) Assign roles (who does what)
Named people = real accountability.

Name a lead person and a back-up (so you aren’t dependent on one person) for each task: selection, rights checks, transfer, ingest, integrity checks, monitoring, reporting, and incident response.

5) Set short, simple policies (the rules)
One page per topic is enough to start; you can expand later.

Selection & appraisal: how you choose and review priorities.
File formats: keep originals; create preservation copies like PDF/A where useful.
Metadata: DOI, creators + ORCID, licence, relationships.
Integrity (checksums): create on arrival; re-check on a schedule.
Access & rights: what you can share and when (e.g., during a platform outage).
Security & recovery: who can access files; how quickly you can restore.
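
To make the integrity rule concrete, here is a minimal sketch of "create on arrival, re-check on a schedule" using Python's standard hashlib module. SHA-256 is an assumption (your policy may name a different algorithm), and the function names and the in-memory dictionary are illustrative only. A real setup would store the recorded checksums somewhere durable and run the re-check from a scheduler.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_on_arrival(paths):
    """Create a checksum for each newly arrived file."""
    return {str(path): sha256_of(path) for path in paths}


def recheck(recorded):
    """Re-check stored checksums; return (path, problem) pairs.

    `problem` is "missing" if the file is gone, or "changed" if its
    checksum no longer matches the one recorded on arrival.
    """
    problems = []
    for path, expected in recorded.items():
        try:
            actual = sha256_of(path)
        except FileNotFoundError:
            problems.append((path, "missing"))
            continue
        if actual != expected:
            problems.append((path, "changed"))
    return problems
```

An empty list from recheck means every file still matches its recorded checksum. Anything else is worth an entry in your incident log.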

6) Map your workflow (how the work flows)
A simple “happy path” prevents confusion and delays.

Identify content and authorise its preservation
Transfer files (API/SFTP/export) with a file list
Validate (virus scan; basic checks; verify the checksum)
Ingest into your repository/preservation system
Replicate to another, independent trusted archive
Monitor jobs and scheduled integrity checks
Report monthly (what came in, what passed, any issues)
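
The transfer and validate steps can be sketched as a check of the received files against the file list that came with them. This is a hedged example, not a prescribed workflow. It assumes the file list is a plain-text manifest with one "checksum, whitespace, relative path" entry per line (the style produced by the sha256sum tool), and the function names are hypothetical. Virus scanning and format checks are out of scope here.

```python
import hashlib
from pathlib import Path


def read_manifest(manifest_path):
    """Parse lines of '<sha256> <relative path>' into {path: checksum}."""
    expected = {}
    for line in Path(manifest_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        checksum, name = line.split(maxsplit=1)
        expected[name.strip()] = checksum.lower()
    return expected


def validate_transfer(folder, manifest_path):
    """Compare the files in `folder` with the manifest.

    Returns a dict with four lists: ok, missing, mismatch, unexpected.
    """
    folder = Path(folder)
    manifest_path = Path(manifest_path)
    expected = read_manifest(manifest_path)
    report = {"ok": [], "missing": [], "mismatch": [], "unexpected": []}
    for name, checksum in sorted(expected.items()):
        target = folder / name
        if not target.is_file():
            report["missing"].append(name)
        # Reads the whole file at once: fine for a sketch, but very
        # large files should be hashed in chunks instead.
        elif hashlib.sha256(target.read_bytes()).hexdigest() == checksum:
            report["ok"].append(name)
        else:
            report["mismatch"].append(name)
    # Flag files that arrived but were never listed (the manifest itself is skipped).
    for path in sorted(folder.rglob("*")):
        if path.is_file() and path.resolve() != manifest_path.resolve():
            name = path.relative_to(folder).as_posix()
            if name not in expected:
                report["unexpected"].append(name)
    return report
```

Only a transfer with empty missing, mismatch, and unexpected lists should move on to ingest. The same report can feed the monthly summary of what came in and what passed.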

7) Choose storage & partners (avoid single points of failure)
Resilience comes from independence and diversity - more than one copy, place, and technology.

Keep important content in at least three different preservation services with different funding, governance, and technologies.
Pair your internal systems with an independent, community-governed dark archive for disaster-level risks and platform changes. (This is where CLOCKSS fits - see below.)

8) Plan time and money (sustainability)
Preservation is a practice, not a project. Budget for it.

Make a simple 3-year view of people, time, storage, memberships, and audits.
Set up a small steering group (library/IT/research office) that meets quarterly.

9) Set a few measures (prove it works)
A handful of metrics keeps you honest and shows progress.

Time from publication to “preserved copy.”
% of files with a recent checksum.
% of titles protected by an independent archive.
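
As a rough illustration, the three measures above can be calculated from a simple list of records. Everything about the data shape here is an assumption made for the example: each record stands for one preserved item with a publication date, a preserved date, a last integrity check date, and a flag for an independent copy. The 90-day meaning of "recent" is also an assumed default, and a real setup would likely track files and titles separately.

```python
from datetime import timedelta
from statistics import median


def preservation_measures(records, today, recent_days=90):
    """Calculate three simple preservation measures.

    Each record is a dict with (hypothetical) keys:
      published        - date the item was published
      preserved        - date a preserved copy existed, or None
      last_checked     - date of the last checksum check, or None
      independent_copy - True if an independent archive holds it
    """
    if not records:
        raise ValueError("no records to measure")

    # Time from publication to preserved copy, for items preserved so far.
    days = [(r["preserved"] - r["published"]).days
            for r in records if r["preserved"]]

    # An item counts as recently checked if its last check is within the window.
    cutoff = today - timedelta(days=recent_days)
    recent = sum(1 for r in records
                 if r["last_checked"] and r["last_checked"] >= cutoff)

    independent = sum(1 for r in records if r["independent_copy"])

    return {
        "median_days_to_preserved": median(days) if days else None,
        "pct_recent_checksum": 100 * recent / len(records),
        "pct_independent_archive": 100 * independent / len(records),
    }
```

Running this once a month against your inventory gives three numbers for the steering group without any extra tooling.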

10) Improve steadily (don’t set-and-forget)
Small, regular improvements beat big, rare overhauls.

Review priorities each quarter.
Run one restore drill per year for each big content type (text, AV, data).
Fix gaps and update policies as you learn.

Common pitfalls (and easy fixes)
Most problems are predictable. Here’s how to dodge them without fuss.

“Backups = preservation.” Backups restore systems; preservation restores content even if the original system/vendor is gone. Use both.
“Cloud services = preservation.” Cloud storage is like renting space in someone else’s filing cabinet. You can put your files there, share them, and get them back easily, but if you stop paying, the company changes, or a file gets deleted, the content can be gone.
Giving users direct access to the preserved master copy. The preserved version is the master copy you keep safe for the long term. The access version is the user-friendly copy you share for viewing or download. They come from the same source but serve different purposes: one protects, the other provides access.
Rights gaps. Make sure contracts explicitly allow (or require) preservation, and permit open access if the source disappears.
Relying on one vendor or location. Always have an independent preservation copy (see CLOCKSS).

How to get started (a small, clear plan)
Have a look at the NASIG Model Digital Preservation Policy (https://nasig.org/NASIG-model-digital-preservation-policy). It’s a friendly starting point.

Start small, learn fast, and scale. Here’s a 12-week outline you can actually follow.

Weeks 1–2 - Foundations

Choose a pilot (e.g., one journal list or one digitised collection).
Write 1 page of Goals & Scope and 1 page of Roles & Workflow.
Make a simple inventory (what, where, who, formats).

Weeks 3–6 - Make it real

Publish short policies: Formats, Metadata, Integrity, Access & Rights.
Automate file transfer + checksum validation for the pilot.
Start a monthly preservation report.

Weeks 7–12 - Build resilience

Add a second, independent preservation copy - this is where CLOCKSS comes in.
Run a restore test and document the steps and timing.
Expand the scope based on what you learned.

Where CLOCKSS fits (and why include it)
Your strategy needs an independent safety net. CLOCKSS is purpose-built to be that safety net for scholarship.

Independent assurance: CLOCKSS is a not-for-profit, community-governed dark archive that protects the scholarly record beyond any single business, country, or platform.
Global, resilient network: Content is preserved across multiple locations with regular integrity checks.
Trigger access when needed: If the original source goes away, CLOCKSS provides open access so your readers aren’t left without content.
Easy to plug in: We work with your publishers and platforms to set up feeds, verify ingest, and give you the evidence your stakeholders expect.

Next step: Building or refreshing your preservation strategy?
Bring CLOCKSS in as your independent layer.
Let’s map your content and set up a right-sized onboarding.

Tiny glossary (for beginners)
A few quick definitions to de-jargon the process.

Checksum (fixity): A digital fingerprint; re-checking it tells you whether a file has changed or become corrupted.
Dark archive: A preserved copy that stays closed unless specific conditions happen (e.g., the source disappears).
PDF/A: A long-term, preservation-friendly version of PDF.
DOI / ORCID / ROR: Persistent IDs for publications, researchers, and research organisations.