Downloading images from the PRs #169297

Kirus59 · 2025-08-09T13:21:47Z

Kirus59
Aug 9, 2025

Select Topic Area

Question

Body

In my github action (written using Node.js) I need to extract images and videos from the PR to send them to Discord when it is merged (along with the changelog).

When sending a request via https.get() using the url https://github.com/user-attachments/assets/6ed7869c-2a5a-41d7-94f8-f76bb2c9ddf1, a response with status 404 was received. However, the link opened normally in the browser and using the curl command.

I also tried sending a request to a direct url (since https://github.com/user-attachments/... redirects to another one), and I received a 403 response with the following text:

<Error><Code>AccessDenied</Code><Message>There were headers present in the request which were not signed</Message><HeadersNotSigned>host</HeadersNotSigned><RequestId>3QNY0R0DNH7H3FNA</RequestId><HostId>7R52bKvYNCC4o7pxpVhTC3hGXGShAZGJvKk4MhNst7diRCtYGhvMthWiDjro3apUCpg23dUQ4aI=</HostId></Error>

I also tried to explicitly specify the header value 'Host', but it did not help.
I tried to use the aws4 library and the request:

const aws4 = require('aws4');
const https = require('https');

const url = new URL('https://github-production-user-asset-6210df.s3.amazonaws.com/157578255/473711935-c23a2122-2260-499d-aabe-d4c95403d6ef.png?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAVCODYLSA53PQK4ZA%2F20250809%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20250809T091659Z&X-Amz-Expires=300&X-Amz-Signature=6547f8e485b516ae6c9c8abe7a9757551418049875bd6261e2b72f21b96138c7&X-Amz-SignedHeaders=host');

const opts = {
  host: url.host,
  path: url.pathname + url.search,
  method: 'GET',
};

aws4.sign(opts);

const req = https.request(opts, (res) => {
  let data = '';
  res.on('data', (chunk) => { data += chunk; });
  res.on('end', () => {
    console.log('Status:', res.statusCode);
    console.log('Data:', data);
  });
});

req.on('error', (err) => {
  console.error('Error:', err);
});

req.end();

But, I received a 400 response with the following text:

<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>AWS4-HMAC-SHA256 Credential=undefined/20250809/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date, Signature=fed5c1c857fba66ce6f6da105c252e82c8071d69a3151979e910164fbe099609</ArgumentValue><RequestId>HS9Z0FRRWBRJYDPM</RequestId><HostId>rDaLy6LF2o7ONhDC2fMm+Olg6cnN9OxZ4kkf2md2LK+Eh/kzwY/tfsYDpXsk2trFyE0kECpepLtLwAhSa2v+Tf2kFnEFLo5s</HostId></Error>

Does anyone know how to solve this problem and successfully connect to a URL to download files?

Answered by Anipaleja

Aug 9, 2025

You’re getting 403/404 because PR attachment links from GitHub are served via short-lived, pre-signed AWS S3 URLs.

Here’s what’s going on:

Links like https://github.com/user-attachments/assets/... are redirects to AWS S3. The redirected URL already contains authentication in its query string (X-Amz-Algorithm, X-Amz-Signature, etc.). These URLs expire in ~5–10 minutes. If you try to hit the raw S3 domain without following the redirect, you’ll get 403 AccessDenied. If you try to sign them again (e.g., with aws4), you’ll get Only one auth mechanism allowed. If your request code doesn’t follow redirects, the original github.com/user-attachments/... will return 404.

How to download them in Nod…

View full answer

Anipaleja · 2025-08-09T13:51:24Z

Anipaleja
Aug 9, 2025

You’re getting 403/404 because PR attachment links from GitHub are served via short-lived, pre-signed AWS S3 URLs.

Here’s what’s going on:

Links like https://github.com/user-attachments/assets/... are redirects to AWS S3. The redirected URL already contains authentication in its query string (X-Amz-Algorithm, X-Amz-Signature, etc.). These URLs expire in ~5–10 minutes. If you try to hit the raw S3 domain without following the redirect, you’ll get 403 AccessDenied. If you try to sign them again (e.g., with aws4), you’ll get Only one auth mechanism allowed. If your request code doesn’t follow redirects, the original github.com/user-attachments/... will return 404.

How to download them in Node.js

You don’t need to re-sign or add AWS headers. Just follow the redirects and download the file within the valid time window.

import https from 'https';
import { URL } from 'url';

function downloadFile(fileUrl) {
  return new Promise((resolve, reject) => {
    const url = new URL(fileUrl);

    const options = {
      hostname: url.hostname,
      path: url.pathname + url.search,
      method: 'GET',
      headers: {
        'User-Agent': 'Mozilla/5.0', // avoids some GitHub 403s
      }
    };

    const req = https.request(options, (res) => {
      // Handle redirects
      if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
        return resolve(downloadFile(res.headers.location));
      }

      let data = [];
      res.on('data', chunk => data.push(chunk));
      res.on('end', () => {
        resolve(Buffer.concat(data)); // raw file data
      });
    });

    req.on('error', reject);
    req.end();
  });
}

// Example usage
(async () => {
  try {
    const url = 'https://github.com/user-attachments/assets/6ed7869c-2a5a-41d7-94f8-f76bb2c9ddf1';
    const fileBuffer = await downloadFile(url);
    console.log('Downloaded file size:', fileBuffer.length);
  } catch (err) {
    console.error('Download failed:', err);
  }
})();

Follow redirects from the github.com/user-attachments/... link.
Don’t re-sign the request — the AWS signature is already embedded.
Download the file before the pre-signed URL expires.
Set a User-Agent header to avoid certain 403 responses.

If you’re processing PRs in a GitHub Action, you can also use the GitHub API to fetch the PR body/comments. You can also extract all https://github.com/user-attachments/... links with a regex.

Download them using the above function.

0 replies

Kirus59 · 2025-08-09T14:27:41Z

Kirus59
Aug 9, 2025
Author

@Anipaleja

I tried your method, but the images didn't download from the PR...

PR link: Kirus59/space-station-14#11

Used in this way:

import fs from 'fs';
import util from 'util';

const writeFileAsync = util.promisify(fs.writeFile);

function httpsReq(fileUrl){
  return new Promise((resolve, reject) => {
    const url = new URL(fileUrl);

    const options = {
      hostname: url.hostname,
      path: url.pathname + url.search,
      method: 'GET',
      headers: {
        'User-Agent': 'Mozilla/5.0', // avoids some GitHub 403s
      }
    };

    const req = https.request(options, (res) => {
      // Handle redirects
      if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
        return resolve(httpsReq(res.headers.location));
      }

      let data = [];
      res.on('data', chunk => data.push(chunk));
      res.on('end', () => {
        resolve(Buffer.concat(data)); // raw file data
      });
    });

    req.on('error', reject);
    req.end();
  });
}

async function downloadMedia(url, outputFolder, recursive = true){
    if (!fs.existsSync(outputFolder)){
        fs.mkdirSync(outputFolder, { recursive: true });
    }

    try {
        const fileBuffer = await httpsReq(url);
        console.log('Downloaded file size:', fileBuffer.length);

        const savePath = path.join(__dirname, 'test.png');
        await writeFileAsync(savePath, fileBuffer);

        return new MediaData('test.png', 'image', fileBuffer.length);
    }
    catch (err){
        console.error('Download failed:', err);
    }

    return null;
}

0 replies

Anipaleja · 2025-08-09T14:35:32Z

Anipaleja
Aug 9, 2025

Hi @Kirus59,
It looks like the reason your test didn’t download anything from that PR isn’t the redirect logic, it’s that the code is trying to download directly from the PR URL without first getting the actual direct media links.

In your example PR (Kirus59/space-station-14#11), the images are stored on https://github.com/user-attachments/... links. GitHub’s web UI shows them inline, but your script never fetched those links, so you were effectively downloading the HTML page instead of the image.

Why your current code isn’t working

You’re not fetching the PR body/comments via the GitHub API. Passing the PR’s web URL downloads HTML, not the media file. You need the exact user-attachments URL from the PR markdown. These links are time-sensitive (AWS pre-signed URLs), so they must be fetched and downloaded in the same run.

Fixed approach

Use the GitHub API to get the PR body.
Extract all https://github.com/user-attachments/... URLs with regex.
Download each URL using your existing request logic.

Example:

import https from 'https';
import fs from 'fs';
import path from 'path';
import { Octokit } from '@octokit/rest';

const octokit = new Octokit({
  auth: process.env.GITHUB_TOKEN // repo read permissions required
});

async function getPRMediaLinks(owner, repo, prNumber) {
  const { data: pr } = await octokit.pulls.get({
    owner,
    repo,
    pull_number: prNumber
  });

  const body = pr.body || '';
  const regex = /(https:\/\/github\.com\/user-attachments\/[^\s)]+)/g;
  return [...body.matchAll(regex)].map(match => match[1]);
}

function httpsReq(fileUrl) {
  return new Promise((resolve, reject) => {
    const url = new URL(fileUrl);
    const options = {
      hostname: url.hostname,
      path: url.pathname + url.search,
      method: 'GET',
      headers: { 'User-Agent': 'Mozilla/5.0' }
    };

    const req = https.request(options, res => {
      if (res.statusCode >= 300 && res.statusCode < 400 && res.headers.location) {
        return resolve(httpsReq(res.headers.location));
      }
      let data = [];
      res.on('data', chunk => data.push(chunk));
      res.on('end', () => resolve(Buffer.concat(data)));
    });

    req.on('error', reject);
    req.end();
  });
}

async function downloadMedia(owner, repo, prNumber) {
  const links = await getPRMediaLinks(owner, repo, prNumber);

  for (let i = 0; i < links.length; i++) {
    const buffer = await httpsReq(links[i]);
    const savePath = path.join(process.cwd(), `image-${i}.png`);
    fs.writeFileSync(savePath, buffer);
    console.log(`Saved ${savePath} (${buffer.length} bytes)`);
  }
}

// Example run
downloadMedia('Kirus59', 'space-station-14', 11);

TLDR

Don’t request the PR page directly, use the API to get markdown. Extract all user-attachments URLs via regex. Download them immediately before the URLs expire. Set a User-Agent header to avoid certain 403 errors.

If you want, this can be wrapped into a GitHub Action so that it runs when a PR is merged, then pulls all media from the PR body, and then sends them directly to Discord.

That would make the process fully automatic.

0 replies

Kirus59 · 2025-08-09T14:50:40Z

Kirus59
Aug 9, 2025
Author

@Anipaleja
Found out what the error was, it turns out that my regex when extracting the url from the string <img width="1920" height="1080" alt="image" src="https://github.com/user-attachments/assets/6ed7869c-2a5a-41d7-94f8-f76bb2c9ddf1" /> also extracted the quotation mark at the end, which is why the request didn't work correctly 😄

I fixed the regex and now everything works, thank you so much for your help

1 reply

Anipaleja Aug 9, 2025

Awesome, glad you found the issue! Regex can be tricky with those trailing quotes, easy to overlook but makes all the difference. 😄

If you don’t mind, could you please mark my answers as “solutions” or “answers” here? It really helps me unlock that galaxy brain achievement and keeps me motivated to help even more!

Thanks again for the update, and feel free to reach out if you hit any other snags!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GitHub Community

Downloading images from the PRs #169297

Uh oh!

{{title}}

Uh oh!

Replies: 4 comments 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

GitHub Community

Downloading images from the PRs #169297

Uh oh!

Kirus59 Aug 9, 2025

Select Topic Area

Body

You’re getting 403/404 because PR attachment links from GitHub are served via short-lived, pre-signed AWS S3 URLs.

Here’s what’s going on:

How to download them in Nod…

Replies: 4 comments · 1 reply

Uh oh!

Anipaleja Aug 9, 2025

You’re getting 403/404 because PR attachment links from GitHub are served via short-lived, pre-signed AWS S3 URLs.

Here’s what’s going on:

How to download them in Node.js

Uh oh!

Uh oh!

Kirus59 Aug 9, 2025 Author

Uh oh!

Anipaleja Aug 9, 2025

Why your current code isn’t working

Fixed approach

TLDR

Uh oh!

Kirus59 Aug 9, 2025 Author

Uh oh!

Anipaleja Aug 9, 2025

Kirus59
Aug 9, 2025

Replies: 4 comments 1 reply

Anipaleja
Aug 9, 2025

Kirus59
Aug 9, 2025
Author

Anipaleja
Aug 9, 2025

Kirus59
Aug 9, 2025
Author