Brewster Kahle of the Internet Archive talks about archiving operations
Created in early 2006, Archive-It is a web archiving subscription service that allows institutions and individuals to build and preserve collections of digital content and create digital archives. Archive-It allows the user to customize their capture or exclusion of web content they want to preserve for cultural heritage reasons. Through a web application, Archive-It partners can harvest, catalog, manage, browse, search, and view their archived collections.
In terms of accessibility, the archived web sites are full-text searchable within seven days of capture. Content collected through Archive-It is captured and stored as a WARC file. A primary and a back-up copy are stored at the Internet Archive data centers. A copy of the WARC file can be given to subscribing partner institutions for geo-redundant preservation and storage in line with their best-practice standards. Periodically, the data captured through Archive-It is indexed into the Internet Archive’s general archive.
As of March 2014, Archive-It had more than 275 partner institutions in 46 U.S. states and 16 countries that have captured more than 7.4 billion URLs for more than 2,444 public collections. Archive-It partners are universities and college libraries, state archives, federal institutions, museums, law libraries, and cultural organizations, including the Electronic Literature Organization, North Carolina State Archives and Library, Stanford University, Columbia University, American University in Cairo, Georgetown Law Library, and many others.
Internet Archive Scholar
Main article: Internet Archive Scholar
In September 2020, the Internet Archive announced a new initiative to archive and preserve open access academic journals, called Internet Archive Scholar. Its full-text search index includes over 25 million research articles and other scholarly documents preserved in the Internet Archive. The collection spans from digitized copies of eighteenth-century journals through the latest open access conference proceedings and pre-prints crawled from the World Wide Web.
In 2021, the Internet Archive announced the initial version of the General Index, a publicly available index to a collection of 107 million academic journal articles.
In addition to web archives, the Internet Archive maintains extensive collections of digital media that are attested by the uploader to be in the public domain in the United States or licensed under a license that allows redistribution, such as Creative Commons licenses. Media are organized into collections by media type (moving images, audio, text, etc.), and into sub-collections by various criteria. Each of the main collections includes a “Community” sub-collection (formerly named “Open Source”) where general contributions by the public are stored.
The Audio Archive includes music, audiobooks, news broadcasts, old time radio shows, podcasts, and a wide variety of other audio files. As of January 2023, there are more than 15,000,000 free digital recordings in the collection. The subcollections include audio books and poetry, podcasts, non-English audio, and many others. The sound collections are curated by B. George, director of the ARChive of Contemporary Music.
Alongside the stock HTML5 audio player, Webamp, a player resembling Winamp, is available.
Digital Library of Amateur Radio and Communications
This project preserves recordings of amateur radio transmissions, with funding from the Amateur Radio Digital Communications foundation.
Live Music Archive
Main article: Live Music Archive
The Live Music Archive sub-collection includes more than 170,000 concert recordings from independent musicians, as well as more established artists and musical ensembles with permissive rules about recording their concerts, such as the Grateful Dead, and more recently, The Smashing Pumpkins. Also, Jordan Zevon has allowed the Internet Archive to host a definitive collection of his father Warren Zevon’s concert recordings. The Zevon collection ranges from 1976 to 2001 and contains 126 concerts including 1,137 songs.
The Great 78 Project
Main article: The Great 78 Project
The Great 78 Project aims to digitize 250,000 78 rpm singles (500,000 songs) from the period between 1880 and 1960, donated by various collectors and institutions. It has been developed in collaboration with the Archive of Contemporary Music and George Blood Audio, which is responsible for the audio digitization.
Not to be confused with Netlabel.
The Archive has a collection of freely distributable music that is streamed and available for download via its Netlabels service. The music in this collection generally comes from the Creative Commons-licensed catalogs of virtual record labels.
This collection contains more than 3.5 million items. Cover Art Archive, Metropolitan Museum of Art – Gallery Images, NASA Images, Occupy Wall Street Flickr Archive, and USGS Maps are some of the sub-collections of the Image collection.
Cover Art Archive
Logo of Cover Art Archive
The Cover Art Archive is a joint project between the Internet Archive and MusicBrainz, whose goal is to make cover art images available on the Internet. As of April 2021, this collection contains more than 1,400,000 items.
Metropolitan Museum of Art images
The images of this collection are from the Metropolitan Museum of Art. This collection contains more than 140,000 items.
The NASA Images archive was created through a Space Act Agreement between the Internet Archive and NASA to bring public access to NASA’s image, video, and audio collections in a single, searchable resource. The IA NASA Images team worked closely with all of the NASA centers to keep adding to the ever-growing collection. The nasaimages.org site launched in July 2008 and had more than 100,000 items online at the end of its hosting in 2012.
Occupy Wall Street Flickr archive
This collection contains Creative Commons-licensed photographs from Flickr related to the Occupy Wall Street movement. This collection contains more than 15,000 items.
This collection contains more than 59,000 items from the Libre Map Project.
This collection contains mathematical images created by mathematical artist Hamid Naderi Yeganeh.
One of the sub-collections of the Internet Archive’s Video Archive is the Machinima Archive. This small section hosts many Machinima videos. Machinima is a digital artform in which computer games, game engines, or software engines are used in a sandbox-like mode to create motion pictures, recreate plays, or even publish presentations or keynotes. The archive collects a range of Machinima films from internet publishers such as Rooster Teeth and Machinima.com as well as independent producers. The sub-collection is a collaborative effort among the Internet Archive, the How They Got Game research project at Stanford University, the Academy of Machinima Arts and Sciences, and Machinima.com.
This collection contains approximately 160,000 microfilmed items from a variety of libraries including the University of Chicago Libraries, the University of Illinois at Urbana-Champaign, the University of Alberta, Allen County Public Library, and the National Technical Information Service.
Moving image collection
See also: Wikipedia list of films freely available on the Internet Archive
The Internet Archive holds a collection of approximately 3,863 feature films. Additionally, the Internet Archive’s Moving Image collection includes: newsreels, classic cartoons, pro- and anti-war propaganda, The Video Cellar Collection, Skip Elsheimer’s “A.V. Geeks” collection, early television, and ephemeral material from Prelinger Archives, such as advertising, educational, and industrial films, as well as amateur and home movie collections.
Subcategories of this collection include:
- IA’s Brick Films collection, which contains stop-motion animation filmed with Lego bricks, some of which are “remakes” of feature films.
- IA’s Election 2004 collection, a non-partisan public resource for sharing video materials related to the 2004 United States presidential election.
- IA’s FedFlix collection, Joint Venture NTIS-1832 between the National Technical Information Service and Public.Resource.Org that features “the best movies of the United States Government, from training films to history, from our national parks to the U.S. Fire Academy and the Postal Inspectors”.
- IA’s Independent News collection, which includes sub-collections such as the Internet Archive’s World At War competition from 2001, in which contestants created short films demonstrating “why access to history matters”. Among their most-downloaded video files are eyewitness recordings of the devastating 2004 Indian Ocean earthquake.
- IA’s September 11 Television Archive, which contains archival footage from the world’s major television networks of the terrorist attacks of September 11, 2001, as they unfolded on live television.
Open Educational Resources
See also: Open educational resources
Open Educational Resources is a digital collection at archive.org. This collection contains hundreds of free courses, video lectures, and supplemental materials from universities in the United States and China. The contributors of this collection are ArsDigita University, Hewlett Foundation, MIT, Monterey Institute, and Naropa University.
TV News Search & Borrow
TV tuners at the Internet Archive
In September 2012, the Internet Archive launched the TV News Search & Borrow service for searching U.S. national news programs. The service is built on closed captioning transcripts and allows users to search and stream 30-second video clips. Upon launch, the service contained “350,000 news programs collected over 3 years from national U.S. networks and stations in San Francisco and Washington D.C.” According to Kahle, the service was inspired by the Vanderbilt Television News Archive, a similar library of televised network news programs. In contrast to Vanderbilt, which limits access to streaming video to individuals associated with subscribing colleges and universities, the TV News Search & Borrow allows open access to its streaming video clips. In 2013, the Archive received an additional donation of “approximately 40,000 well-organized tapes” from the estate of a Philadelphia woman, Marion Stokes. Stokes “had recorded more than 35 years of TV news in Philadelphia and Boston with her VHS and Betamax machines.”
This collection contains approximately 3,000 items from the Brooklyn Museum.
In December 2020, the film research library of Lillian Michelson was donated to the archive.