Knowledge (XXG)

Australian Web Archive


Open online database of archived Australian websites

The Australian Web Archive (AWA) is a publicly available online database of archived Australian websites, hosted by the National Library of Australia (NLA) on its Trove platform, an online library database aggregator. It comprises the NLA's own PANDORA archive, the Australian Government Web Archive (AGWA) and the National Library of Australia's ".au" domain collections. Access is through a single interface in Trove, which is publicly available. The Australian Web Archive was created in March 2019, and is one of the biggest web archives in the world. Its purpose is to provide a resource for historians and researchers, now and into the future.

History of the three components

The PANDORA service started archiving websites in October 1996.

In 2005, the NLA started archiving annual snapshots of the entire Australian web domain (URLs with the suffix ".au"), collected via large crawl harvests. Later, the earliest websites from the .au web domain, dating back to 1996, were obtained from the Internet Archive. In 2019 this content was first made publicly accessible through Trove.

The PANDORA infrastructure, which works well for selective, small-scale archiving, does not adapt to large-scale "bulk harvesting" of web content. A new technical system therefore had to be developed: a web archiving service that integrates the delivery of archived websites within a live website interface, presenting the archived websites seamlessly to the user, which is difficult to achieve technically.

AGWA

Australian Government websites are Commonwealth records, and are therefore publications to be managed in accordance with the Archives Act 1983.

The Australian Government Web Archive (AGWA) consists of bulk archiving of Commonwealth Government websites. The NLA began regular harvests of the websites in June 2011, after a significant obstacle had been overcome with an administrative agreement made in May 2010 allowing the NLA to collect, preserve and make accessible government websites without having to seek prior permission for each website or document, as was the case before that. The service uses the Heritrix web crawler for harvesting, WARC files for storage and Open Wayback for delivery of the service. There is a huge amount of publishing by the government, but there are many challenges to overcome in trying to preserve content, such as its sudden disappearance. In March 2014, the AGWA was made publicly accessible.

The AGWA meets the preservation and retention requirements for websites as "retain as national archives" (RNA) material under the Archives Act; however, videos and document files (such as PDFs or Word documents) are not always captured, so must be managed separately.

As of early 2015, the AGWA included content dating from 2005, which amounted to about 144 million files occupying 15 terabytes. It only included Commonwealth Government websites collected through bulk harvests of nearly 1000 seed URLs. The scheduling of the harvests was not yet routinely established, but harvests were being conducted roughly three times per year.

Amalgamation

In 2017, the AGWA and the PANDORA archive were amalgamated with the other web archive collections, to form the Trove web archive collection. After further development and the creation of the Australian Web Archive, government websites archived via AGWA and now included in AWA can still be searched separately using the "Advanced Search" option.

Description of AWA

A web archive is described by the NLA as a "collection of snapshots of websites captured while they are accessible on the web, and then preserved in a static copy". The collection archived in the AWA is "relevant to the cultural, social, political, research and commercial life and activities of Australia and Australians". It collects web material via both scheduled archiving of selected websites and publications as well as some ad hoc harvesting relating to significant events.

As of March 2019, when it began, AWA already contained around 600 terabytes of data, with 9 billion records. (The AWA help page gives different figures: 400 terabytes and 8 billion records.) It contains more functionality than the Wayback Machine, hosted by the Internet Archive, allowing full-text searching using a search engine built in-house. The developers also devised techniques to filter out unwanted "noise". The data remains on the Library servers, although a move to the cloud is envisaged in the future, as content grows. Usability by a wide range of users, and in particular the search functionality, were major focuses during development.

The archive is fully searchable, based on a combination of techniques used by the developers. Each team created a unique and complex search algorithm, by adapting a version of Google's page ranking algorithm (based on the frequency of clicks on a page), modified to lead to better, high-quality resources. Other technologies include a Bayesian filter (effectively a spam filter), a Not Safe For Work classifier from Yahoo, and machine learning.

There is a "Limit to the gov.au web domain" option before searching, and government websites archived via AGWA can still be searched separately using the "Advanced Search" option. Other options in Advanced Search are to limit by timespan of the snapshots, domain and file type.

With many of the earlier websites from the 1990s now lost, mainly because of the frequent change of web platforms, the Australian Web Archive is a significant initiative that will help to save current and future web pages, especially Australian content. Material will continue to be added to the Archive, and other online material collected in accordance with the National Library Act 1960, the legal deposit provisions of the Copyright Act 1968 and the NLA's digital collections selection policy.

Asia/Pacific websites

Websites in the Asia Pacific region are not included in the AWA, but the NLA partners with the Internet Archive to collect and preserve "selected Asia/Pacific websites related to specific events or socio-political groups".

See also

National edeposit


Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.