
Filter certain query params when generating filenames from URLs

What problem do you want to solve?

Filenames of generated HTML reports should be shorter and should not contain unnecessary or sensitive values (such as session IDs or tokens) taken from dynamic URLs.

What is the proposed solution?

Change the getFileNameFromUrl function, for example like this:

```javascript
/**
 * Check whether a given string has the format of a URL.
 *
 * @param {string} str candidate string that might be a URL
 * @returns {boolean} true if the given string can be parsed by Node's WHATWG URL parser
 */
const stringIsAValidUrl = (str) => {
    // TODO allow only certain protocols like https?
    try {
        new URL(str);
        return true;
    } catch {
        return false;
    }
};

const isString = (value) => typeof value === 'string' || value instanceof String;

// Comma-separated list of query parameter names to drop before building the filename.
const queryParamsToFilter = process.env.PA11Y_URL_PARAMS_FILENAME_FILTER || 'hash,secret,ck,session,sid,sessionid,sessid,phpsessid,session_id,token,password,pwd,nonce,redirect_uri,signature,hmac,state,utm_whatever';
const maxFilenameLength = 250;

const getFileNameFromUrl = (givenUrl) => {
    // Rejects undefined, null, the empty string and non-string values.
    if (!givenUrl || !isString(givenUrl)) {
        throw new Error('Invalid URL: ' + givenUrl);
    }
    let url = String(givenUrl);
    if (stringIsAValidUrl(url)) {
        const urlObj = new URL(url);
        // Drop sensitive or irrelevant query parameters (exact, case-sensitive names).
        queryParamsToFilter.split(',').forEach((queryParam) => {
            urlObj.searchParams.delete(queryParam.trim());
        });
        // Never put credentials into a filename.
        if (urlObj.username) {
            urlObj.username = 'redacted';
        }
        if (urlObj.password) {
            urlObj.password = 'removed';
        }
        url = urlObj.toString();
    }

    // Covers all URL non-alphanumeric reserved/unreserved characters (for consistency) per
    // https://developers.google.com/maps/documentation/urls/url-encoding
    const result = url
        .replaceAll(/^(?:https?|file):\/\/|\.html$/g, '')
        .replaceAll(/[!#$%&'()*+,./:;=?@[\]_~]+/g, '-')
        // The global flag is required so that BOTH a leading and a trailing hyphen are removed.
        .replaceAll(/^-+|-+$/g, '');
    return result.slice(0, maxFilenameLength);
};
```
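URLSearchParams.delete() matches parameter names exactly, so with the default list above a parameter spelled PHPSESSID (PHP's default session name) would not be removed. Below is a minimal sketch of a case-insensitive filter step. The helper name removeQueryParams and the trailing-`*` prefix syntax (e.g. `utm_*`, which the placeholder `utm_whatever` in the default list seems to hint at) are assumptions for illustration, not part of pa11y-ci or its reporters.

```javascript
/**
 * Hypothetical helper: remove query parameters whose names match a filter list,
 * ignoring case. Entries ending in "*" are treated as name prefixes (e.g. "utm_*").
 *
 * @param {string} url absolute URL (new URL() throws on invalid input)
 * @param {string[]} names parameter names or "prefix*" patterns
 * @returns {string} the URL without the matching parameters
 */
const removeQueryParams = (url, names) => {
    const exact = new Set();
    const prefixes = [];
    for (const raw of names) {
        const name = raw.trim().toLowerCase();
        if (name === '') continue;
        if (name.endsWith('*')) {
            prefixes.push(name.slice(0, -1));
        } else {
            exact.add(name);
        }
    }
    const urlObj = new URL(url);
    // Copy the keys first: deleting from the live URLSearchParams while iterating it would skip entries.
    for (const key of [...urlObj.searchParams.keys()]) {
        const lower = key.toLowerCase();
        if (exact.has(lower) || prefixes.some((prefix) => lower.startsWith(prefix))) {
            urlObj.searchParams.delete(key);
        }
    }
    return urlObj.toString();
};

console.log(removeQueryParams(
    'https://example.com/shop?PHPSESSID=abc&utm_source=mail&page=2',
    ['phpsessid', 'utm_*']
));
// → https://example.com/shop?page=2
```

Lower-casing both the configured names and the actual keys keeps the environment variable easy to write, at the cost of no longer being able to target one specific spelling.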

Application version

Latest Puppeteer 24.x with the latest versions of @aarongoldenthal/pa11y and @aarongoldenthal/pa11y-ci

Thanks for updating the dependencies in the forks!

Additional details

This would be a nice feature for pa11y-ci-reporter-cli-summary as well (see issue 41). The URLs are shortened there, but with dynamic URLs (from API calls, after logins, etc.) the CLI output might show no host or path segment at all, only query parameters.
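For the CLI summary, one way to keep the host and path visible is to give up the query string first and only then truncate the path. A minimal sketch follows; shortenUrlForDisplay and its 80-character default are hypothetical, and the protocol, credentials and fragment are deliberately dropped from the displayed text.

```javascript
/**
 * Hypothetical helper for CLI output: shorten a URL to at most maxLength
 * characters while keeping the host and as much of the path as possible.
 * The query string is only shown if it still fits; otherwise it is replaced by "?…".
 *
 * @param {string} url URL to display
 * @param {number} maxLength maximum length of the returned string
 * @returns {string} shortened display text
 */
const shortenUrlForDisplay = (url, maxLength = 80) => {
    let urlObj;
    try {
        urlObj = new URL(url);
    } catch {
        // Not a parseable URL: fall back to plain truncation.
        return url.length <= maxLength ? url : url.slice(0, maxLength - 1) + '…';
    }
    // urlObj.host includes the port but never the username/password.
    const base = urlObj.host + urlObj.pathname;
    if (base.length > maxLength) {
        return base.slice(0, maxLength - 1) + '…';
    }
    const full = base + urlObj.search;
    if (full.length <= maxLength) {
        return full;
    }
    // Keep host and path intact and signal that a query string was left out.
    const marker = '?…';
    return base.length + marker.length <= maxLength ? base + marker : base;
};

console.log(shortenUrlForDisplay(
    'https://example.com/account/orders?sessionid=0123456789abcdef&token=xyz', 40));
// → example.com/account/orders?…
```

Because the query string is the first thing to go, a long session token can no longer push the host and path out of the visible output.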

Edited by graste