ISO 28500:2009 Addresses File Format Standard for Online Data
October 23, 2009 // Published as a news service by IHS
The International Organization for Standardization (ISO) issued ISO 28500:2009 - Information and documentation - WARC file format, which specifies a method for combining multiple data objects into a single aggregate archival file.
The format can be used to build applications for harvesting, managing, accessing and exchanging content, according to the ISO.
The WARC (Web ARChive) format is an extension of the ARC file format, which has been used by the Internet Archive since 1996 and by numerous heritage institutions to store "web crawls," extracts of entire web pages and their links, experts said.
The motivation to extend ARC arose from the discussions and experiences of organizations within the International Internet Preservation Consortium (IIPC), which were finding it difficult to store and manage content harvested from the Internet, ISO experts said.
The WARC format differs from ARC in that it supports recording Hypertext Transfer Protocol (HTTP) request headers and arbitrary metadata, assigning an identifier to every contained file, managing duplicates and migrated records, and segmenting records.
WARC files are intended to store every type of digital content, whether retrieved by HTTP or another protocol.
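To make these capabilities concrete, here is a minimal Python sketch of how records might be built and concatenated into one file. It assumes the WARC/1.0 record layout: a version line, named header fields such as WARC-Type, WARC-Record-ID and Content-Length, a blank line, the content block, and two trailing CRLFs. The function name make_warc_record and the example.com exchange are hypothetical, chosen for illustration only. Compression, revisit records for duplicates and record segmentation are deliberately left out.

```python
import uuid
from datetime import datetime, timezone

def make_warc_record(warc_type, target_uri, block, content_type, extra_headers=()):
    """Return (record_id, record_bytes) for one WARC/1.0 record (hypothetical helper)."""
    record_id = f"<urn:uuid:{uuid.uuid4()}>"  # unique identifier for this record
    headers = [
        ("WARC-Type", warc_type),
        ("WARC-Record-ID", record_id),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        *extra_headers,
        ("Content-Type", content_type),
        ("Content-Length", str(len(block))),  # length of the content block in bytes
    ]
    head = "WARC/1.0\r\n" + "".join(f"{name}: {value}\r\n" for name, value in headers) + "\r\n"
    # Each record ends with two CRLFs, so records can simply be concatenated.
    return record_id, head.encode("utf-8") + block + b"\r\n\r\n"

# Hypothetical HTTP exchange, for illustration only.
uri = "http://example.com/"
req_id, request = make_warc_record(
    "request", uri,
    b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n",  # the HTTP request headers are kept
    "application/http;msgtype=request",
)
_, response = make_warc_record(
    "response", uri,
    b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>hello</html>",
    "application/http;msgtype=response",
    extra_headers=[("WARC-Concurrent-To", req_id)],  # links the response to its request
)

# Concatenating the records yields one WARC file holding several data objects.
archive = request + response
print(archive.decode("utf-8"))
```

Because every record carries its own identifier and length, a reader can walk the file record by record without an external index, and related records (here, a request and its response) can point to each other by ID.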
ISO 28500:2009 was developed by ISO technical committee ISO/TC 46 - Information and documentation, subcommittee SC 4, Technical interoperability.
Source: International Organization for Standardization (ISO).