Archive of webpage
A comparison of tools for archiving and backing up web pages. Comparison criteria include (1) whether embedded link text remains clickable, (2) whether basic information such as the archive date and original URL is preserved, and (3) how information is organized, for example whether archived pages can be reorganized with tags.
- Free Services: The primary recommendation is Archive.is, which also stores the images embedded in a webpage, so the archived copy stays complete even if the original webpage is lost. The secondary option is Internet Archive: Wayback Machine, which lets you view archived versions of a webpage from different time periods.
- Paid Services: The primary recommendation is the desktop version of Evernote, because it can capture webpages that require login credentials. For bookmarking public webpages only, consider Raindrop or Pinboard, which automatically capture the webpage content and embedded images once the bookmark URL is added.
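The Wayback Machine's per-date snapshots mentioned above can also be addressed directly by URL. The following is a minimal sketch, not an official client: the function names are hypothetical, and the snapshot URL pattern and availability endpoint are written from the Wayback Machine's publicly documented behavior, so check them against the current documentation before relying on them. The code only builds URLs and makes no network request.

```python
from datetime import datetime
from urllib.parse import urlencode

# Assumed base addresses; verify against the Internet Archive documentation.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_url(original_url: str, when: datetime) -> str:
    """Build a URL asking for the snapshot closest to the given time."""
    # Snapshot timestamps use the form YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


def availability_query(original_url: str, when: datetime) -> str:
    """Build a query URL that asks whether a snapshot exists near a date."""
    params = {"url": original_url, "timestamp": when.strftime("%Y%m%d")}
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    day = datetime(2009, 4, 9)
    print(snapshot_url("http://example.com/", day))
    print(availability_query("example.com", day))
```

Opening the first URL in a browser should redirect to whichever snapshot is closest to the requested date, if one exists.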
Comparing the Article Backup Results of Different Social Media Websites
Medium:
- Wayback Machine: Backups may fail, and successful backups may lack images. See examples of a failure[1] and a success[2].
- Webpage archive: Backups can succeed (link), although in some examples the images are blurred[3].
- Perma.cc: Successful backup (example available).
- historio: When the backup loads, the content is visible for a few seconds, but it then goes blank, apparently because of a conflict with the CSS. The backup has to be read in mhtml format.
- Diigo (private access): Backups are read in mhtml format.
PTT:
- Wayback Machine: Partial success; some backups are blocked by the adult content warning[4][5].
- Webpage archive: Successful backup.
- Perma.cc: Backup failure due to 18+ warnings.
- historio: Successful backup.
- Diigo (private access): Successful backup.
Facebook:
- Wayback Machine: The backup shows only a login screen, even when the post is set to public.
- Webpage archive: Error message "Not Found (yet?)".
- Perma.cc: "You’re Temporarily Blocked" message.
- historio: The bookmarklet had no effect, so the backup failed.
- Diigo (private access): Backups are read in mhtml format.
Dcard:
- Wayback Machine: Backup failed with an HTTP 403 error.
- Webpage archive: Backup failed[6].
- Diigo (private access): Backups are read in mhtml format.
YouTube:
- Wayback Machine: (1) Videos cannot be played, (2) Comments are not visible [7]
- Webpage archive: (1) Videos cannot be played, (2) Comments are visible [8]
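Several entries above note that backups are read in mhtml format. An MHTML file is a MIME multipart/related message, so Python's standard `email` package can open it. The sketch below is illustrative only: the function names and the tiny `SAMPLE` document are hypothetical, and real files exported by Diigo or historio may differ in details such as encodings.

```python
import email
from email import policy

# Hypothetical, hand-written MHTML document used only for demonstration.
SAMPLE = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"; type="text/html"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Location: http://example.com/\r\n"
    b"\r\n"
    b'<html><body><a href="http://example.com/a">link</a></body></html>\r\n'
    b"--BOUNDARY\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/logo.png\r\n"
    b"\r\n"
    b"iVBORw0KGgo=\r\n"
    b"--BOUNDARY--\r\n"
)


def list_mhtml_parts(raw: bytes):
    """Return (content type, original location) for each stored resource."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    parts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the container, keep only the leaf resources
        location = part.get("Content-Location")
        parts.append((part.get_content_type(),
                      str(location) if location is not None else None))
    return parts


def extract_html(raw: bytes) -> str:
    """Return the decoded text of the first text/html part."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            return part.get_content()
    raise ValueError("no text/html part found")


if __name__ == "__main__":
    for content_type, location in list_mhtml_parts(SAMPLE):
        print(content_type, location)
```

Each part's `Content-Location` header holds the original URL of that resource, which is how an MHTML backup keeps images and the page's own address together in a single file.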
Desktop tools
| check | approach | filetype | cached media (images, Flash...) | clickable link text | kept the saved time* | kept the original URL | Comments |
| | Fx 2.0: Save as HTML (kept images) | html | saved in a separate directory | yes | yes | no | |
| | Fx 2.0: Save as HTML (html only) | html | no | yes | yes | no | |
| ☆ | Fx 2.0 + ScrapBook 1.2 | html | saved in a separate directory | yes | yes* | yes | |
| ☆ | Fx 1.5 + MAF 0.6.3: Save as MAF MHT Archive | mht | embedded into a single file | yes | yes | yes | |
| | Fx 2.0 + Google Toolbar for Firefox 3: Send with Gmail | html | no, media load from their original URLs | yes | yes | yes | |
| ☆ | IE 6.0.x: Save as MHT | mht | embedded into a single file | yes | yes | yes | |
| | Acrobat PDFMaker 7.0.5 | | embedded into a single file | yes | yes | yes | |
| | Print to Adobe Acrobat Printer | | embedded into a single file | no | yes | yes | |
| | Print to pdfFactory Pro v2.45 | | embedded into a single file | no | yes | yes | |
| | IE + Adobe Acrobat 7: Convert web page to PDF | | embedded into a single file | yes | yes | no | |
| | Unipage Unifier 1.0 RC3 (kept images or Flash...) | html | embedded into a single file | yes | yes | no | |
Online services
| check | approach | filetype | cached media (images, Flash...) | clickable link text | kept the saved time* | kept the original URL | Information organization / Comments |
| | BackupUrl.com (cache image) | html | yes | yes | yes | yes | no (visited: 2009-04-09) |
| ☆ | Evernote Web (no cache image) | html | no, media load from their original URLs | yes | yes | yes | tags; also offers sync software (visited: 2008-03-29) |
| | Furl (no cache image) | html | no, media load from their original URLs | yes | yes | yes | Topic (tags) |
| | Yahoo My Web 2.0 Beta (no cache image) | html | no, media load from their original URLs | yes | yes | yes | tags |
| ☆ | Google Notebook (no cache image) | html | no, media load from their original URLs | yes | yes | yes | tags |
| | "Jump" Knowledge | html | no, media load from their original URLs | yes | yes | yes | You can annotate webpages and share the link with others. |
| | toread (no cache image) | html (Email) | no, media load from their original URLs (media referenced by relative paths display normally) | yes | yes | yes | |
| | WebCite (access error: 2007-05-07) | html | no, media load from their original URLs (media referenced by relative paths display normally) | yes | yes | yes | You can browse or back up the same page at different times. |
* About "kept the saved time": Most saved files already have this property. However, it changes easily when a file is saved to different storage media or transferred by FTP to another location. The Fx 1.5 + ScrapBook 0.18.4 solution instead records this property through a separate function (metadata).
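The idea of recording the saved time as separate metadata, rather than trusting the file's own timestamp, can be sketched as a small sidecar file written next to the archived page. This is an illustration only and not how ScrapBook actually stores its data: the `.meta.json` suffix, field names, and function names are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def sidecar_path(archive_path) -> Path:
    """Hypothetical naming rule: page.html -> page.html.meta.json."""
    return Path(str(archive_path) + ".meta.json")


def write_sidecar(archive_path, original_url, saved_at=None) -> Path:
    """Record the original URL and saved time beside the archived file."""
    saved_at = saved_at or datetime.now(timezone.utc)
    record = {
        "original_url": original_url,
        "saved_at": saved_at.isoformat(),
    }
    path = sidecar_path(archive_path)
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def read_sidecar(archive_path) -> dict:
    """Load the metadata back; it does not depend on file timestamps."""
    return json.loads(sidecar_path(archive_path).read_text(encoding="utf-8"))
```

Because the saved time lives inside the file's content, copying the pair to another disk or uploading it by FTP leaves the recorded value unchanged.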
Winner is Firefox!