Research / formulate future ways of automating/streamlining scaffolding/bootstrap of new content
The process of adding new content to the repository (e.g., adding a new site and the third-party elements that it has on it) is laborious and error-prone. We must streamline and automate this whenever possible.
To do so, we must separate the content authoring process into two broad tasks:
1. Scaffolding/bootstrapping of new content (i.e., adding a new site and its third-party elements, including basic blocking rules, to the repository, as well as flagging/reporting on third-party content that we already know about)
2. The editorial process of then carrying out research on those third-party elements, deciding which of them are trackers, and refining the blocking rules.
To avoid a misunderstanding, automation applies to (1) above, not (2).
To be perfectly clear, keeping (2) unautomated is one of the greatest strengths of Better.
Better is about quality over quantity; an open, transparent human editorial process, based on the principles of the Ethical Design Manifesto, is at our core.
With that out of the way, here is some preliminary information both on how we are doing things today and on possible ways of automating the scaffolding/bootstrapping of new content in the future.
Safari Web Inspector
Safari Web Inspector is our primary tool for investigations. Unfortunately, it does not have any sort of data export feature and cannot, for example, export a HAR file. HAR export would be the ideal solution, as it would give us ‘before’ and ‘after’ snapshots with the content blocker off and on. We currently gather these snapshots manually.
With Brandtley’s recent additions, electron-har can save a HAR file and can emulate the user-agent string, reported screen dimensions, etc., of an iPhone. This should get us pretty close to sampling the behaviour of a site when content blockers are not on, including any differences in behaviour on mobile.
This functionality alone will enable us to do the following (a rough sketch follows the list):
- Create scaffolding for a new site (e.g., /sites/new-site.com/index.md), including the ‘before’ statistics for page load time, weight, etc.
- Create draft scaffolding for all the third-party content on it that doesn’t already have an entry [and link to the ones that already do from the new site page] (e.g., /trackers/drafts/tracker-a.com/index.md, /trackers/drafts/tracker-b.com/index.md, etc.)
- Along with a whois look-up, create a draft company page, if necessary.
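As an illustration of the first two items, here is a minimal sketch that reads a HAR file (for example, one saved by electron-har) and creates draft tracker entries for any third-party hosts that do not already have one. The script name, repository paths, and the naive first-party check are assumptions for illustration, not the actual implementation.

```javascript
// Hypothetical scaffolding sketch: read a HAR file and create draft
// entries for third-party hosts that don't already have one.
'use strict';

const fs = require('fs');
const path = require('path');
const { URL } = require('url');

const harPath = process.argv[2];   // e.g., new-site.com.har
const siteHost = process.argv[3];  // e.g., new-site.com

const har = JSON.parse(fs.readFileSync(harPath, 'utf8'));

// Unique hosts of every request recorded in the HAR.
const hosts = new Set(
  har.log.entries.map(entry => new URL(entry.request.url).hostname)
);

for (const host of hosts) {
  // Naive first-party check; proper subdomain/TLD handling is omitted.
  if (host === siteHost || host.endsWith('.' + siteHost)) continue;

  if (fs.existsSync(path.join('trackers', host))) {
    // Already known: flag it so the new site page can link to it.
    console.log(`known third party: ${host}`);
    continue;
  }

  // Unknown: scaffold a draft entry for the editorial process to research.
  const draftDir = path.join('trackers', 'drafts', host);
  fs.mkdirSync(draftDir, { recursive: true });
  fs.writeFileSync(
    path.join(draftDir, 'index.md'),
    `# ${host}\n\nDraft: third-party host seen on ${siteHost}; needs research.\n`
  );
  console.log(`drafted: ${draftDir}/index.md`);
}
```

It might be invoked as `node scaffold-site.js new-site.com.har new-site.com`, where scaffold-site.js is a hypothetical name for the script.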
What it does not give us is a way of sampling the ‘after’ statistics for a page, as it uses Electron/Chromium instead of Safari to download web pages and is thus not affected by content blockers.
This is not a huge limitation, however, as Safari Web Inspector can be used, manually, to gather the ‘after’ statistics from the Simulator or the desktop. One question is whether there will be significant discrepancies between the two sets of timings (from electron-har and Safari Web Inspector). While this should be investigated, my hunch is that there won’t be, and that any such discrepancy will be smaller than the difference between timings taken on the simulator/device and those taken on the desktop.
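For the ‘before’ side, the basic statistics can be pulled straight out of the HAR file. Here is a minimal sketch using standard HAR 1.2 fields; the assumption is that the HAR file path is passed as the first argument.

```javascript
// Hypothetical sketch: derive simple 'before' statistics (requests, load
// time, page weight) from a HAR file using standard HAR 1.2 fields.
'use strict';

const fs = require('fs');

const har = JSON.parse(fs.readFileSync(process.argv[2], 'utf8')).log;

// onLoad is reported in milliseconds for the first captured page.
const page = har.pages && har.pages[0];
const loadTimeMs = page ? page.pageTimings.onLoad : null;

// Sum response body sizes; bodySize is -1 when the size is unknown.
const weightBytes = har.entries.reduce(
  (total, entry) => total + Math.max(entry.response.bodySize, 0),
  0
);

console.log(`Requests:  ${har.entries.length}`);
console.log(`Load time: ${loadTimeMs != null ? (loadTimeMs / 1000).toFixed(2) + ' s' : 'unknown'}`);
console.log(`Weight:    ${(weightBytes / (1024 * 1024)).toFixed(2)} MB`);
```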
Whois can be used as part of an automated process to scaffold the company page for a site, using a slug of the ‘Registrant Organization’ field from its standard output (e.g., in /companies/drafts/new-company-name-from-whois.md).
e.g., for doubleclick.com
Registrant Organization: Google Inc.
Would result in a draft company page at /companies/drafts/google-inc.md (with the backlink list of sites, trackers, etc. already filled in).
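A minimal sketch of that step, assuming the system whois command is on the PATH and that the registry returns a ‘Registrant Organization:’ line (not every TLD does); the script name, slug format, and output path are illustrative assumptions.

```javascript
// Hypothetical sketch: scaffold a draft company page from whois output.
'use strict';

const { execSync } = require('child_process');
const fs = require('fs');
const path = require('path');

const domain = process.argv[2];  // e.g., doubleclick.com
const output = execSync(`whois ${domain}`, { encoding: 'utf8' });

const match = output.match(/^Registrant Organization:\s*(.+)$/m);
if (!match) {
  console.error(`No Registrant Organization found for ${domain}`);
  process.exit(1);
}

// Slugify the organisation name, e.g., "Google Inc." becomes "google-inc".
const organisation = match[1].trim();
const slug = organisation
  .toLowerCase()
  .replace(/[^a-z0-9]+/g, '-')
  .replace(/^-+|-+$/g, '');

const draftPath = path.join('companies', 'drafts', `${slug}.md`);
fs.mkdirSync(path.dirname(draftPath), { recursive: true });
fs.writeFileSync(
  draftPath,
  `# ${organisation}\n\nDraft company page scaffolded from whois for ${domain}.\n`
);
console.log(`drafted: ${draftPath}`);
```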
Mozilla Lightbeam for Firefox
Mozilla Lightbeam is a Firefox extension that explores the third-party sites that you’re exposed to when you visit a page. It can output a JSON representation of the data in the following format:
[ [ source (string), target (string), timestamp (number), contentType (string), cookie (boolean), sourceVisited (boolean), secure (boolean), sourcePathDepth (number), sourceQueryDepth (number), sourceSubdomain (number), targetSubdomain (number), method (string), statusCode (number), cacheable (boolean), privateTab (boolean) ], … ]
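As a small example of how this export could be consumed, the sketch below reads a Lightbeam JSON file (the lightbeam.json filename is an assumption) and lists the third-party targets recorded for each first-party source, using the array positions shown above.

```javascript
// Hypothetical sketch: list third-party targets per first-party source
// from a Lightbeam JSON export.
'use strict';

const fs = require('fs');

const entries = JSON.parse(fs.readFileSync('lightbeam.json', 'utf8'));

const graph = new Map();
for (const entry of entries) {
  const [source, target, , , setsCookie] = entry;
  if (source === target) continue;  // skip first-party requests
  if (!graph.has(source)) graph.set(source, new Set());
  graph.get(source).add(setsCookie ? `${target} (sets cookies)` : target);
}

for (const [source, targets] of graph) {
  console.log(source);
  for (const target of targets) console.log(`  -> ${target}`);
}
```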
Unlike Safari Web Inspector and electron-har, Lightbeam does a deep inspection of relationships: it goes beyond the third-party sites loaded directly by the page you’re on and also follows the third-party sites that those third parties, in turn, expose you to. In this way, it can be used to find a wider array of trackers and should be considered a manual tool in the investigative process even if we do not find a way of integrating its JSON output into our automated scaffolding/bootstrapping process.
Like electron-har, it suffers from not respecting native content blockers and thus cannot be used to get an ‘after’ snapshot of a site’s performance.
Here’s a bit of code that, for example, opens up the Web Inspector:
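(What follows is a minimal JXA, i.e. JavaScript for Automation, sketch of that idea. It assumes Safari’s Develop menu is enabled and that the script is allowed to drive the UI via System Events; menu item names can vary across Safari versions.)

```javascript
// JXA sketch: bring Safari to the front and click
// Develop ▸ Show Web Inspector via GUI scripting (System Events).
const safari = Application('Safari');
safari.activate();

const systemEvents = Application('System Events');
systemEvents.processes.byName('Safari')
  .menuBars[0]
  .menuBarItems.byName('Develop')
  .menus[0]
  .menuItems.byName('Show Web Inspector')
  .click();
```

It could be saved as, say, open-inspector.js and run with `osascript -l JavaScript open-inspector.js`.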
Here are some links for JSX resources:
- JSX cookbook wiki
- JSX Release Notes
- JSX by Example
- Automation of OS X, the JS way
- Stack Overflow