Thursday, July 21, 2011

Umbrella 2011: Preserving information on the web for the future

My notes on a session by the National Archives on how they archive government information.
Web Archiving and Government Sites
Web archiving is the automated collection of portions of the web for preservation and research, and is mainly carried out by cultural organisations. The National Archives (NA) archives government websites, while the likes of the British Library archive a much wider range of material.
There has been massive growth in the use of the internet by government, but content such as electronic-only publications and websites can disappear very rapidly, with the risk that it is lost from the public record. Websites are worth saving for historical reasons; otherwise there are gaps in recorded knowledge.
The UK Government Web Archive
The UK Government Web Archive on the NA website is free to use and consists of A-Z lists of archived websites and their content. Information can also be accessed by theme.
The NA seeks to capture the websites of all UK central government departments, central government agencies, public inquiries and Royal Commissions, along with some English regional material.
Web Continuity Issues
It also seeks to address web continuity issues: broken links, and websites that have been removed or had key content moved. Research into the URLs cited in Hansard over a defined period of time found that 60% were broken, so it is important to resolve such things. A redirection solution uses the UK Government Web Archive to make sure information remains accessible. Those setting this up can choose whether to link directly to the new location or to go through a bridging page that makes clear where the information has been moved and why.
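To make the redirection idea concrete for myself, here is a minimal Python sketch of how such a lookup might work. It is only my own illustration, not how the NA actually implements it: the archive prefix, the `resolve` function, the `Redirect` record and the "moved" lookup table are all hypothetical names I have made up, and the rule of falling back to an archived copy when a page is neither live nor moved is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical archive prefix, used purely for illustration.
ARCHIVE_PREFIX = "https://webarchive.example/"


@dataclass
class Redirect:
    target: str                 # where the visitor ends up
    via_bridging_page: bool     # True if an explanatory page is shown first
    note: Optional[str] = None  # text a bridging page might display


def resolve(url: str,
            live_urls: Set[str],
            moved: Dict[str, str],
            use_bridging_page: bool = True) -> Redirect:
    """Decide where a requested government URL should lead (assumed logic)."""
    if url in live_urls:
        # The page still exists, so no redirection is needed.
        return Redirect(target=url, via_bridging_page=False)
    if url in moved:
        new_url = moved[url]
        if use_bridging_page:
            # Explain where the content went before sending the visitor on.
            return Redirect(new_url, True, f"This content has moved to {new_url}")
        # Link directly to the new location.
        return Redirect(new_url, False)
    # Neither live nor moved: fall back to the archived copy.
    return Redirect(ARCHIVE_PREFIX + url, False, "Archived copy")
```

Calling `resolve` with a URL that is neither in `live_urls` nor in `moved` returns a target under the archive prefix, which is the behaviour that stops an old link in something like Hansard from simply breaking.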
Volume and Frequency
The comprehensive archiving of government websites is part of the transformational government initiative. There are around 1,500 UK central government websites in all, and sites are archived at least once a year. Additional crawls can be done if major changes are likely to be imminent, so around the 2010 General Election there was a project to capture all government websites before and after the election because of concern about the loss of information. Social media presents new problems, and the NA has started experimenting with capturing social media as well, but with limited success so far.
Statistics
The NA collects statistics on usage of the UK Government Web Archive. The peak was after the last General Election, with 140 million hits in one month. A lot of these come through the automated redirection points, others through web browser searches. There are about 3 million unique visits a month.
Problems and next steps
The Archive is already huge, with lots of duplication. To know which website to go to for any subject, you first need to know a lot of information: who was responsible for the information at what point in time, and what they were called. So the NA is looking at creating a semantic knowledge base to make material easier to locate and to link subjects across databases. It is also considering archiving various departmental intranets across government, but this raises lots of sensitivity issues around levels of content. It is now archiving datasets as well, to support transparency at data.gov.uk.
Additional Access Point
All sites are catalogued through the main National Archives catalogue.
Discussion
Social media problems: the NA has encountered lots of problems, both technical issues around how things like Twitter are viewed and copyright issues. Devolved nations' approach to archiving government information: the National Libraries (the National Library of Scotland and the National Library of Wales) have programmes for archiving content relating to their own jurisdictions. I didn't quite catch who does it for Northern Ireland. The UK Parliament is also building its own digital repository.
Thoughts
I really enjoyed this session. The work of the NA is something I have been known to be incredibly thankful for in the day job when trying to find information created by government bodies that lived short lives (English regional planning agencies spring to mind!) before being unceremoniously culled. The Archive means that a snapshot of the relevant website will still exist, and it gives me some form of pointer information; even the bridging redirection pages contain a lot of useful information on where responsibilities went and when bodies ceased to exist. So while I've been making practical use of the information for a good while, it was really nice to get the rationale and background to everything. It gave me some idea of just how much is archived, how frequently and by whom, when it comes to previous web incarnations of government information, and of the simplest access points.
