waybackurls

A simple Node.js package to fetch historical URLs from the Wayback Machine.

Available as both a CLI tool and a library for programmatic use.

Installation

Global Installation (CLI)

npm install -g waybackurls

Local Installation (Library)

npm install waybackurls

Usage

CLI Usage

Fetch URLs and print them to stdout:

waybackurls -d example.com

Save URLs to a file:

waybackurls -d example.com -o results.txt

Legacy syntax (still supported):

waybackurls example.com

Library Usage

Import and use in your JavaScript code:

const { fetchWaybackUrls } = require('waybackurls');

// Basic usage
(async () => {
  try {
    const urls = await fetchWaybackUrls('example.com');
    console.log(`Found ${urls.length} URLs:`);
    urls.forEach(url => console.log(url));
  } catch (error) {
    console.error('Error:', error.message);
  }
})();

With options:

const { fetchWaybackUrls } = require('waybackurls');

(async () => {
  try {
    const urls = await fetchWaybackUrls('example.com', {
      matchPrefix: true,  // Match *.example.com (default: true)
      collapse: 'urlkey'  // Deduplication strategy (default: 'urlkey')
    });

    // Process URLs: collect the distinct hostnames that were archived
    const uniqueDomains = new Set(urls.map(url => new URL(url).hostname));
    console.log('Unique subdomains found:', Array.from(uniqueDomains));
  } catch (error) {
    // Covers both a failed request and a URL that cannot be parsed
    console.error('Error:', error.message);
  }
})();
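
Archived URL lists may include malformed entries, and `new URL()` throws on input it cannot parse. The following is a minimal sketch of one way to guard against that. `safeHostnames` is a hypothetical helper written for this example and is not part of the package.

```javascript
// Hypothetical helper (not part of waybackurls): collect unique hostnames,
// skipping any entry that the URL parser rejects.
function safeHostnames(urls) {
  const hosts = new Set();
  for (const raw of urls) {
    try {
      hosts.add(new URL(raw).hostname);
    } catch {
      // Malformed URL: skip it rather than abort the whole run.
    }
  }
  return Array.from(hosts);
}

// Sample input standing in for the array returned by fetchWaybackUrls.
const sample = [
  'https://example.com/a',
  'http://blog.example.com/post?id=1',
  'not a url',
];
console.log(safeHostnames(sample)); // [ 'example.com', 'blog.example.com' ]
```

Skipping bad entries keeps one unparseable URL from discarding an otherwise useful result set. Whether to skip or to log them is a design choice that depends on the use case.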

API

`fetchWaybackUrls(domain, options)`

Fetches historical URLs for a domain from the Wayback Machine.

Parameters:

- domain (string, required): The domain to fetch URLs for (e.g., 'example.com').
- options (object, optional):
  - matchPrefix (boolean): Also match subdomains by querying *.domain (default: true).
  - collapse (string): CDX collapse strategy used for deduplication (default: 'urlkey').

Returns: Promise<string[]> - Array of historical URLs

Throws: Error if the domain is invalid or the API request fails.
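
Because the function can fail per call, querying several domains needs a way to keep one failure from hiding the other results. The sketch below shows one possible approach with `Promise.allSettled`. `fetchMany` is a hypothetical helper, not part of the package, and it takes the fetch function as an argument so the example runs without network access. The stub used here stands in for `fetchWaybackUrls`.

```javascript
// Hypothetical helper (not part of waybackurls). `fetcher` is any function
// shaped like fetchWaybackUrls: (domain, options) => Promise<string[]>.
async function fetchMany(domains, fetcher, options = {}) {
  const settled = await Promise.allSettled(
    // The async wrapper turns a synchronous throw into a rejection too.
    domains.map(async domain => fetcher(domain, options))
  );
  const results = {};
  const errors = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[domains[i]] = outcome.value;
    } else {
      errors[domains[i]] = String(outcome.reason?.message ?? outcome.reason);
    }
  });
  return { results, errors };
}

// Stub fetcher for demonstration only: fails for one domain.
const stubFetcher = async domain => {
  if (domain === 'invalid') throw new Error('Invalid domain');
  return [`https://${domain}/`];
};

fetchMany(['example.com', 'invalid'], stubFetcher).then(({ results, errors }) => {
  console.log('results:', results);
  console.log('errors:', errors);
});
```

With the package installed, passing `fetchWaybackUrls` in place of `stubFetcher` should behave the same way, assuming it rejects with an Error as described above.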

CLI Options

Usage: waybackurls -d <domain> [-o <output-file>]

Options:
  -d, --domain <domain>    Domain to fetch URLs for (required)
  -o, --output <file>      Output file (optional, defaults to stdout)

How It Works

This tool queries the Wayback Machine CDX API to retrieve the historical URLs archived for a given domain. The CDX (Capture Index) API exposes a searchable index of the captures held in the archive.
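
To make the request concrete, here is a minimal sketch of how such a CDX query could be built and its JSON response reduced to a list of URLs. The helper names `buildCdxUrl` and `parseCdxRows` are hypothetical, and the exact parameters this package sends are an assumption. Only the endpoint and the `url`, `output`, `fl`, and `collapse` parameters come from the public CDX server API.

```javascript
// Sketch only: the package's actual request may differ.
const CDX_ENDPOINT = 'https://web.archive.org/cdx/search/cdx';

// Hypothetical helper: build a CDX query URL for a domain.
function buildCdxUrl(domain, { matchPrefix = true, collapse = 'urlkey' } = {}) {
  // Assumption: matchPrefix maps to a leading "*." wildcard on the host.
  const target = matchPrefix ? `*.${domain}/*` : `${domain}/*`;
  const url = new URL(CDX_ENDPOINT);
  url.searchParams.set('url', target);
  url.searchParams.set('output', 'json');   // JSON rows instead of plain text
  url.searchParams.set('fl', 'original');   // return only the original URL field
  url.searchParams.set('collapse', collapse);
  return url.toString();
}

// Hypothetical helper: with output=json the CDX server returns an array of
// rows whose first row holds the field names, so that row is dropped.
function parseCdxRows(rows) {
  return rows.slice(1).map(row => row[0]);
}

console.log(buildCdxUrl('example.com'));
console.log(parseCdxRows([
  ['original'],
  ['http://example.com/'],
  ['http://example.com/about'],
]));
```

Collapsing on `urlkey` asks the server to return one row per distinct URL key, which is what keeps repeated captures of the same page from appearing many times.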

License

This project is licensed under the MIT License.

chethanyadav456/waybackurls
