Historically, and even today, there is little in the way of digital recording media that will stand the test of time for archiving.
The term "archiving" means different things to the different sectors involved with data storage:
For the IT industry and profession, the following is, or may be, the general understanding of archiving:
Archiving is the intelligent backup of selected objects (datasets, items, etc.) that no longer need to be accessed regularly within the computing system. These files are moved from online disk storage to lower-cost media such as tape or optical disk.
To the archiving profession, archiving could mean maintaining data FOREVER, or long-term preservation with a life expectancy (LE) of 100, 200, 300, 400, 500, 1,000 or more years.
The IT industry has had major difficulty coming to terms with this definition of archiving and the technology it demands, and to date the IT industry and profession have not had a good track record in addressing these long-term preservation requirements.
How can these two competing archival processes be combined to accommodate the genuine requirement to maintain electronically created data, first for ongoing business processes and then for the long-term legal, moral and historical requirements of society, government and history?
Migration and emulation have been the processes most commonly identified and used by the IT profession to date to address long-term preservation requirements. Why? Because most magnetic and other data storage devices have come and gone (not to mention the software obsolescence issues), leaving us unable even to try to retrieve data held on 8, 5.25 or 3.5 inch floppy media; hard drives; optical discs in 5.25, 12 and 14 inch formats; removable or non-removable rigid disk packs from a multitude of data storage suppliers; and, last but not least, magnetic tape, from ½ inch 9-track tape way back in computing time to today’s Digital Linear Tape (DLT) and other formats capable of holding up to petabytes of data in automated devices. CD and then DVD were touted as the media of choice for long-term storage of lower volumes of digital data, but they too have fallen from grace because of their identified inherent short-term deterioration characteristics.
Where are we today?
Some of the current requirements to maintain data over time are:
Digital information, like all information, has records retention requirements as defined by government, regulatory agencies and specific regulatory organisations. We keep digital information for the same reasons we keep paper files, and those reasons now extend to digitally born files.
Legal requirements: These have expanded greatly in the past few years with new laws and regulations, especially in the USA, e.g. the Sarbanes-Oxley Act of 2002, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) and U.S. Securities and Exchange Commission (SEC) rules, as well as relevant legislation in the Australian region and the enforcement of numerous others.
Litigation defense: The need to defend the company or to protect its intellectual property is increasing each year. Trustworthy and complete documentation is needed, or the defense is vulnerable and losses could run into the millions of dollars.
Corporate Governance: Additional requirements beyond the above may be dictated by your company, mostly driven by self-protection.
Accountability: Certainly the Federal, State and Local Governments require the long-term retention of numerous document types: personnel records, land titles and Occupational Safety & Health (OS&H) records, to name a few.
Societal and Historical: This is wide-ranging, from historical files and religious records to today’s digitally born art and music…
Personal Values: The family album is undergoing its greatest transition today. Preservation is important, or how will your great-grandchildren be able to view the pictures of you and your children taken today?
Backup in today’s technical environment:
Backup is used for quick recovery from spinning media or near-line media of fixed content…
Retention is defined by the laws, regulations, common practices, or by the owner of the information or IT policies and practices.
Preservation, keeping files over an extended period of time, requires the highest level of certainty and accuracy, because the files may not be used or viewed often enough for any inaccuracies to become evident. These files also need the added assurance of being unalterable. The longer the time between creation and the discovery of an error, the lower the probability of being able to detect and correct it (or numerous errors in the affected files) while still retaining their trustworthiness.
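One common way to obtain assurance that a file has not been altered is to record a cryptographic digest (a "fixity" value) when the file is archived and to recompute and compare it at each later check. The Python sketch below assumes SHA-256 as the digest; the function names `fixity_digest` and `is_unaltered` are hypothetical illustrations, not part of any product discussed in this article.

```python
import hashlib
from pathlib import Path

def fixity_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, reading it in 1 MiB chunks by default."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def is_unaltered(path: Path, recorded_digest: str) -> bool:
    """True if the file still matches the digest recorded when it was archived."""
    return fixity_digest(path) == recorded_digest
```

A mismatch tells you that a file has changed, but not how to repair it; repair needs redundant copies or error-correcting codes.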
The risks for preservation are shaped above all by the long time horizon associated with retaining the files. Over short timeframes there are many retention options for overcoming issues. It is not so easy when preservation requirements stretch to a 30+ year horizon. If we extend the retention period for digital data out to 100, 500 or more years (that is, permanently, FOREVER), digital data has a permanence issue of major proportions.
Technology obsolescence is the most widely discussed risk, and it still has nearly all the same attributes as in the past several decades. The major gains have been the reduced cost of storage media and the increased storage capacity per unit. However, the life expectancy (LE) of magnetic tape tops out at 30 years. How many organizations are really planning on that horizon? The practical limit for the maintenance of digital data is still 3-5 years. As for technological obsolescence, the same major issues remain that have existed since computing was introduced to the masses in the 1980s and 1990s, and before that in large mainframe installations… frequent advancements, new elements introduced to the technology chain, and product discontinuances. Hardware is still highly proprietary… and for economic reasons it will remain so in the future. Vendors come and go…
Software has a similar problem, with many new programs entering the user market each day. File formats in applications continue to advance, often with little regard for their predecessors.
Media stability is the riskiest issue of all… look at the media warranty: if the media fails, the manufacturer will give you a new one, minus your precious information of course…
Managerial obsolescence has been with us all along, yet only recently has it been elevated for scrutiny and is now being discussed at the executive level. Essentially, management changes about as often as the technology does. In hindsight, many digital losses would have been preventable had management been better. In today’s business world there are frequent changes in the management ranks and continuous pressure on budgets. Will the company make the financial commitment to keep the files viable as they age?
As management changes, do some things fall between the cracks in policy on refresh, emulation or migration? Think about the files you left behind the last time you upgraded your PC!
Are resources consumed by more pressing business needs than preservation?
If I, as a manager, do what I should for preservation this year, I will miss my budget and my compensation will be reduced… Hmmm… and maybe I will have a new job next year… or who would know if I wait until next year? A missed cycle in the retention of digital information could be unrecoverable later. Thus, management risk is significant.
Let us turn, then, at least to the technological issues identified, as we will leave managerial risk and obsolescence for others to address.
A recent paper titled "Ending Digital Obsolescence", presented at the Society for Imaging Science and Technology (IS&T) Archiving Conference 2005 on 26 April 2005 in Washington DC, plus a second paper titled "Datasurance® - Patent Pending - Preservation Archive for Digital Files", delivered at the Association for Information and Image Management International (AIIM) 2005 Conference and Exposition, may provide the answer to this long-term process and media issue.
The first paper was presented by Ken Quick and Mike Maxwell of Affiliated Computer Services (ACS) of Dallas, TX, and the second by Mike Maxwell, a consultant representing ACS.
What is Datasurance®?
The Datasurance® product promises to preserve digital data in any format, be it music or voicemail, X-rays, MRIs, emails, databases, applications, operating systems (OS), charts and Excel spreadsheets, Word documents, PowerPoint presentations, .tif and other image files, black-and-white and colour digital images, or videos, on a long-term, fail-safe, copy-of-last-resort medium: MICROFILM!
How does the process work? Well, ACS gives us a glimpse of how they create this miracle: by applying 2-D barcode technology and the Datasurance® product media, (black & white) microfilm. ACS refers to the output media as analogue/digital tape.
There are thousands of different file types and formats, with more coming each year. Yet there is one thing they have in common… At the base level, they are a sequence of 0’s and 1’s (zeros and ones) that a program transforms into a colour spot, a program, a sound, a character or a command. Files are written to media and transmitted over networks as binary information, 0’s and 1’s. Whether stored on disk, tape or optical media, the files are pulses or spots representing 0’s and 1’s. This attribute is the key to the Datasurance® preservation concept.
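To make the "everything is 0’s and 1’s" point concrete, here is a minimal Python sketch that renders any file’s bytes as a string of 0 and 1 characters and rebuilds the identical bytes from that string. It assumes eight bits per byte, most significant bit first; the helper names are hypothetical, and this illustrates the principle only, not the Datasurance® encoding.

```python
def bytes_to_bits(data: bytes) -> str:
    """Render raw file bytes as '0'/'1' characters, 8 per byte, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)

def bits_to_bytes(bits: str) -> bytes:
    """Inverse of bytes_to_bits: rebuild the original bytes from the 0/1 string."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    # Each group of 8 characters is parsed as a base-2 number giving one byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

For example, `bytes_to_bits(b"Hi")` returns `"0100100001101001"`, and `bits_to_bytes` turns that string back into `b"Hi"`. The same round trip works whether the bytes come from a photograph, a spreadsheet or a sound file.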
2-D Data Matrix barcodes are non-proprietary, with the specifications in the public domain and readily available. The governing standard is ISO/IEC 16022:2000: Information technology - International symbology specification - Data Matrix, available from Standards Australia through SAI Global (Standards On-Line Select). Further information is available from The Barcode Software Center (Barcode Basics - DataMatrix), IDAutomation.com, Inc. (Free Online DataMatrix Barcode Image Generator ECC200) and Inlite Research Inc. (ClearImage DataMatrix). In addition, this 2-D barcode has built-in error correction code and two different cyclic redundancy codes to ensure that the information can be extracted, and read as written, even if the barcode is significantly damaged. The error correction code assures accuracy even if 25-40% of the 2-D barcode is unreadable; the correct data can still be rendered.
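Data Matrix ECC 200 protects each symbol with Reed-Solomon error correction, which can rebuild damaged data; a full Reed-Solomon decoder is too long for a short example. The Python sketch below swaps in something simpler, a CRC-32 check from the standard `zlib` module, to show only the detection half of the idea: a checksum stored alongside the data reveals whether that data was read back as written. The `seal`/`check` names and the 4-byte trailer layout are assumptions for illustration.

```python
import zlib

def seal(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload so later damage can be detected."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(sealed: bytes) -> bytes:
    """Return the payload if its stored CRC-32 still matches; raise ValueError otherwise."""
    if len(sealed) < 4:
        raise ValueError("too short to contain a CRC-32")
    payload, stored = sealed[:-4], int.from_bytes(sealed[-4:], "big")
    if zlib.crc32(payload) != stored:
        raise ValueError("CRC-32 mismatch: data was not read back as written")
    return payload
```

Unlike Reed-Solomon, a CRC cannot fix the damage it finds; it only flags it, which is why the barcode’s error correction code matters for recovery.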
Datasurance® uses this format to store the 0’s and 1’s as 2-D barcodes. The process creates as many 2-D barcodes as needed to represent a file, each sequentially encoded for proper decoding and reassembly. Now any file can be represented as a series of 2-D barcode “pictures.”
Very simply, the Datasurance® process takes the sequence of 0’s and 1’s in the file and converts it into a sequence of 2-D Data Matrix barcodes, as many as needed based on the size of the file. For example, a PowerPoint presentation that includes colour, text, sound, video, spreadsheets and animation is still, at the base level, just 0’s and 1’s.
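The splitting step might look like the following Python sketch. It assumes a hypothetical 8-byte header (sequence number and total symbol count) in front of each payload and a hypothetical payload of 1,500 bytes per symbol; real Data Matrix capacity depends on the symbol size chosen, and the actual Datasurance® header format is not described here.

```python
import struct

# Hypothetical 8-byte header: sequence number, then total symbol count (big-endian).
HEADER = struct.Struct(">II")

def split_into_symbols(data: bytes, payload_size: int = 1500) -> list[bytes]:
    """Split file contents into numbered payloads, one per (hypothetical) barcode symbol."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    chunks = [data[i:i + payload_size] for i in range(0, len(data), payload_size)]
    if not chunks:
        chunks = [b""]  # an empty file still gets one symbol so it can be restored
    total = len(chunks)
    return [HEADER.pack(seq, total) + chunk for seq, chunk in enumerate(chunks)]
```

With these assumptions, `split_into_symbols(bytes(4000))` yields three symbols carrying 1,500, 1,500 and 1,000 payload bytes. Putting the total count in every header lets a reader tell, from any single symbol, how many symbols make up the whole file.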
The process assembles the 2-D barcodes into groups and prints them to film. Each 2-D barcode is sequentially numbered to ensure its correct place in the sequence. There are several writers available today… The 2-D barcodes are printed on 16 or 35 mm silver halide microfilm and processed to AIIM/ANSI standards for long-term archival storage, with an LE of more than 100 years.
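Grouping the numbered barcodes onto film can be pictured as fixed-size frames. In this Python sketch, `per_frame` (how many symbols fit on one frame) is a hypothetical parameter; the real layout depends on the writer, the film width (16 or 35 mm) and the symbol size. Because every symbol carries its own number, its frame and position can be computed directly.

```python
from typing import Sequence

def group_into_frames(symbols: Sequence[bytes], per_frame: int) -> list[list[bytes]]:
    """Lay out sequentially numbered symbols in order, per_frame symbols per film frame."""
    if per_frame <= 0:
        raise ValueError("per_frame must be positive")
    return [list(symbols[i:i + per_frame]) for i in range(0, len(symbols), per_frame)]

def locate(seq: int, per_frame: int) -> tuple[int, int]:
    """Return (frame index, position within frame) for symbol number seq, counting from 0."""
    if per_frame <= 0:
        raise ValueError("per_frame must be positive")
    return divmod(seq, per_frame)
```

For instance, with two symbols per frame, `locate(4, 2)` returns `(2, 0)`: symbol 4 is the first symbol on the third frame.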
How do we interpret or retrieve the data from the Datasurance® product?
A file is recreated from the 2-D barcodes by scanning and decoding them to recover the sequence of 0’s and 1’s, which the process then converts back into the appropriate file. The resultant file will be an exact copy of the original input file. The same thing happens when a file comes over the internet or a modem to your computer: a series of 0’s and 1’s arrives, and a program on your computer converts it into a picture, a message, a web page, etc.
Because of the error correction code included in the 2-D barcode, the copy is identical to the original, even when part of a barcode is unreadable.
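The retrieval side can be sketched in Python as well. Symbols may be scanned in any order, so this sketch sorts them by sequence number, refuses to rebuild the file if any number is missing, and optionally compares the result with a SHA-256 digest recorded when the file was written. The 8-byte header layout (sequence number, total count) and the optional digest are assumptions for illustration, not the documented Datasurance® format.

```python
import hashlib
import struct
from typing import Iterable, Optional

# Hypothetical 8-byte header: sequence number, then total symbol count (big-endian).
HEADER = struct.Struct(">II")

def reassemble(scanned: Iterable[bytes], expected_sha256: Optional[str] = None) -> bytes:
    """Rebuild a file from decoded symbol contents that were scanned in any order."""
    pieces: dict[int, bytes] = {}
    total: Optional[int] = None
    for symbol in scanned:
        seq, count = HEADER.unpack(symbol[:HEADER.size])
        if total is None:
            total = count
        elif count != total:
            raise ValueError("symbols disagree on the total count")
        if seq >= total:
            raise ValueError(f"sequence number {seq} out of range")
        pieces[seq] = symbol[HEADER.size:]
    if total is None:
        raise ValueError("no symbols supplied")
    # Every number from 0 to total-1 must be present before the file can be rebuilt.
    missing = [n for n in range(total) if n not in pieces]
    if missing:
        raise ValueError(f"missing symbols: {missing}")
    data = b"".join(pieces[n] for n in range(total))
    # Optional end-to-end check against a digest recorded when the file was written.
    if expected_sha256 is not None and hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("reassembled file does not match the recorded digest")
    return data
```

The sequence numbers handle ordering and gaps; the end-to-end digest is a second, independent confirmation that the rebuilt file is byte-for-byte the original.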
To summarize, this process can be used to encode any digital file for storage in this form. One process is universal across thousands of file formats, programs and operating systems. Now everything digital can be preserved with this one approach: sound, colour pictures, voicemail, art, operating systems (OS), and valuable documents and objects.
Does this mean that microfilm has found a new life, after many years of declining volume as digital storage capacities grow in ever-smaller storage devices? Consider thumb drives of up to 4 GB and increasing; 1-inch hard drives (Seagate has announced an 8 GB 1-inch hard drive); 2.5-inch portable hard disk drives with capacities of 20-40 GB; and Blu-ray, currently at 50 GB, with discs of 100 GB and beyond on the way, up to 200 GB, which appears to be the limit of this technology and was always part of the Blu-ray plan.
Only time will tell whether the Datasurance® product cuts the mustard and becomes a winning, commercially viable product for the trustworthy retention of all things digital, to the delight of archivists, records and information managers, preservationists and historians.
We have seen the holy grail of long-term digital data storage trumpeted on a number of occasions, e.g. as early as 1998 with the claim "Norsam saves archives from obsolescence - View high-density, HD-Rosetta data with microscope, not computer". HD-Rosetta allows approximately 90,000 A4 (210 x 297 mm) analogue images, created from microfilm, original documents or other physically scanned images, to be stored on 2" discs as microscopic reproductions that are readable by the human eye through a microscope rather than a computer. These reproductions are put on corrosion-proof, non-magnetic master discs, called Pancake Discs, that are available in nickel, gold, titanium, glass, stainless steel, silicon or other media. See the 1998 report in The Seybold Report on Publishing Systems, "Norsam saves archives from obsolescence - View high-density, HD-Rosetta data with microscope, not computer".
Happy digital data storing on microfilm with the Datasurance® product, and then sleep easy each night knowing that your data is safe now and into the future, long after you retire.
Laurie Varendorff ARMA
Laurie Varendorff, ARMA, a former RMAA Western Australia Branch president and national director, has been involved in records management for 31 years. He has his own consulting and training business near Perth, Western Australia, and has tutored in recordkeeping and archival storage and preservation at Perth’s Edith Cowan University. Phone: +61 (0)8 9291 6925; mobile: 0417 094 147; email @ Laurie Varendorff
Please Note: This article was published in IDM (Image & Data Manager) under the title "Age-defying Storage" in the 1 July 2005 issue, and is available online at Image & Data Manager online.
Please Note: This article was published in The GREEN SHEET (INCORPORATING THE MICROGRAPHICS MARKET PLACE AND THE MICROGRAPHICS NEWSLETTER) as a feature article in Issue No. 36, ISSN 1476-3842, December 2005 edition, on page 16.
SPECIAL NOTE: Use of this article by publishers, commercial, government, or educational organisations requires a financial agreement to be negotiated with Laurie as the copyright holder for this work.