Ahead of Time Compilation with Angular
April 14th, 2017
Pre-compiling your Angular code can significantly reduce bundle size and improve performance. In this article we will review an example app to see it in action.
EDIT: Headless Chrome is shipping in Chrome 59 so the need to use the full Canary path will eventually go away. You can check your Chrome version in the menu under Help > About Google Chrome.
This walkthrough shows you how to get headless Chrome up and running on OSX and explains in detail how to use the code examples provided by the Chrome team.
Headless mode in Chrome is a new way to interact with websites without having a window up on the screen. This might seem like a trivial improvement, but it is actually a huge step forward for scraping data from the web. Currently there are a number of stable but informal solutions for scraping, such as PhantomJS or NightmareJS (which is built on Electron). ~Neither of these tools is going away~ (edit: the sole PhantomJS maintainer has resigned) and they’re still great solutions for scraping. If you have existing systems that work with these tools, you can keep using them.
With that said, some users have run into trouble working with PhantomJS and Nightmare. Both have caveats when running on a shell-only system (one without an actual screen or window manager). For example, with Nightmare (and any Electron app), you need to install a virtual display manager in order to run the application. Additionally, since Nightmare is Electron-based, it has a different security model than Chrome and may fail to catch certain security issues during testing that would occur in production.
Headless Chrome has been released in Chrome 59. As of April 13, 2017, Chrome Canary is the only channel that contains Chrome 59, so right now you need to install Chrome Canary if you want to use headless browsing. This will change once the Chrome team brings Chrome 59 into the main Chrome build.
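If you want to check the version requirement from a script rather than the Help menu, a small helper along these lines could work. This is only a sketch: the function names are hypothetical, the version strings in the usage note are made-up examples, and it assumes the version text contains a four-part number such as the one shown on the About page.

```javascript
// Extract the major version from text such as "Google Chrome 59.0.3071.86".
// Returns null when no four-part version number is found.
function chromeMajorVersion(versionText) {
  const match = /(\d+)\.\d+\.\d+\.\d+/.exec(versionText);
  return match ? Number(match[1]) : null;
}

// Headless mode is described in this article as shipping in Chrome 59.
function supportsHeadless(versionText) {
  const major = chromeMajorVersion(versionText);
  return major !== null && major >= 59;
}
```

For example, `supportsHeadless('Google Chrome 58.0.3029.110')` is false, while a string containing a version of 59 or later passes the check.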
To install Chrome Canary, you can download it or install it with Homebrew:
brew install Caskroom/versions/google-chrome-canary
Many headless Chrome examples simply invoke a chrome command. This works on Linux but not on OSX, since that command does not get installed on your path (yet).
To find Chrome’s path, let’s fire up a terminal and look for where Chrome Canary was installed on our system.
sudo find / -type d -name "*Chrome Canary.app"
You’ll probably get some permission errors, but you’ll also get a path that looks something like this:
/Applications/Google Chrome Canary.app
Now that we have the path to Chrome Canary, a single command starts Chrome as a headless server.
/Applications/Google\ Chrome\ Canary.app/Contents/MacOS/Google\ Chrome\ Canary --headless --remote-debugging-port=9222 --disable-gpu https://chromium.org
Notice that we escaped the spaces in the file name and reached deep into the Mac .app bundle to the actual Chrome binary. We then passed the flags needed to start the headless browser and pointed it at an initial URL of https://chromium.org. The browser now waits for us to connect on port 9222 and give it further instructions. Keep this tab open and the server running, then open another terminal tab where we’ll connect to the browser.
I am going to use Node.js to connect to our running Chrome Canary instance. You’ll need Node installed for this part of the walkthrough.
Let’s generate a generic Node project with a single dependency, the Chrome Remote Interface package, which will help us communicate with Chrome. We’ll also create a blank index.js file:
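If you would rather start the browser from Node than type the escaped path by hand, the command above could be assembled roughly like this. It is a sketch, not an official launcher: the helper names are hypothetical, and it assumes the executable inside the bundle has the same name as the .app, as it does for the Canary path found earlier. Passing arguments as an array to `spawn` means the spaces need no escaping.

```javascript
const path = require('path');
const { spawn } = require('child_process');

// Derive the executable inside a macOS .app bundle, e.g.
// "/Applications/Google Chrome Canary.app" ->
// "/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"
function chromeBinaryPath(appPath) {
  const appName = path.posix.basename(appPath, '.app');
  return path.posix.join(appPath, 'Contents', 'MacOS', appName);
}

// Build the same flags used in the shell command above.
function headlessArgs(port, url) {
  return ['--headless', `--remote-debugging-port=${port}`, '--disable-gpu', url];
}

// Hypothetical helper: start headless Chrome and return the child process.
// It is defined here but not called, so loading this file starts nothing.
function launchHeadless(appPath, port, url) {
  return spawn(chromeBinaryPath(appPath), headlessArgs(port, url), { stdio: 'inherit' });
}
```

Calling `launchHeadless('/Applications/Google Chrome Canary.app', 9222, 'https://chromium.org')` would then be the scripted equivalent of the terminal command.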
mkdir my-headless-chrome && cd my-headless-chrome
npm init --yes
npm install --save chrome-remote-interface
touch index.js
Now we’re going to put some code into our index.js. This is the boilerplate example provided by the Chrome team. It instructs the browser to navigate to github.com and logs every network request made by the page by subscribing to events from the client’s Network domain.
const CDP = require('chrome-remote-interface');
CDP(client => {
  // extract domains
  const { Network, Page } = client;
  // setup handlers
  Network.requestWillBeSent(params => {
    console.log(params.request.url);
  });
  Page.loadEventFired(() => {
    client.close();
  });
  // enable events then start!
  Promise.all([Network.enable(), Page.enable()])
    .then(() => {
      return Page.navigate({ url: 'https://github.com' });
    })
    .catch(err => {
      console.error(err);
      client.close();
    });
}).on('error', err => {
  // cannot connect to the remote endpoint
  console.error(err);
});
Finally, start our Node application.
node index.js
And we’ll see all of the network requests made by Chrome, all without even having an actual browser window!
https://github.com/
https://assets-cdn.github.com/assets/frameworks-12d63ce1986bd7fdb5a3f4d944c920cfb75982c70bc7f75672f75dc7b0a5d7c3.css
https://assets-cdn.github.com/assets/github-2826bd4c6eb7572d3a3e9774d7efe010d8de09ea7e2a559fa4019baeacf43f83.css
https://assets-cdn.github.com/assets/site-f4fa6ace91e5f0fabb47e8405e5ecf6a9815949cd3958338f6578e626cd443d7.css
https://assets-cdn.github.com/images/modules/site/home-illo-conversation.svg
https://assets-cdn.github.com/images/modules/site/home-illo-chaos.svg
https://assets-cdn.github.com/images/modules/site/home-illo-business.svg
https://assets-cdn.github.com/images/modules/site/integrators/slackhq.png
https://assets-cdn.github.com/images/modules/site/integrators/zenhubio.png
https://assets-cdn.github.com/images/modules/site/integrators/travis-ci.png
https://assets-cdn.github.com/images/modules/site/integrators/atom.png
https://assets-cdn.github.com/images/modules/site/integrators/circleci.png
https://assets-cdn.github.com/images/modules/site/integrators/codeship.png
https://assets-cdn.github.com/images/modules/site/integrators/codeclimate.png
https://assets-cdn.github.com/images/modules/site/integrators/gitterhq.png
https://assets-cdn.github.com/images/modules/site/integrators/waffleio.png
https://assets-cdn.github.com/images/modules/site/integrators/heroku.png
https://assets-cdn.github.com/images/modules/site/logos/airbnb-logo.png
https://assets-cdn.github.com/images/modules/site/logos/sap-logo.png
https://assets-cdn.github.com/images/modules/site/logos/ibm-logo.png
https://assets-cdn.github.com/images/modules/site/logos/google-logo.png
https://assets-cdn.github.com/images/modules/site/logos/paypal-logo.png
https://assets-cdn.github.com/images/modules/site/logos/bloomberg-logo.png
https://assets-cdn.github.com/images/modules/site/logos/spotify-logo.png
https://assets-cdn.github.com/images/modules/site/logos/swift-logo.png
https://assets-cdn.github.com/images/modules/site/logos/facebook-logo.png
https://assets-cdn.github.com/images/modules/site/logos/node-logo.png
https://assets-cdn.github.com/images/modules/site/logos/nasa-logo.png
https://assets-cdn.github.com/images/modules/site/logos/walmart-logo.png
https://assets-cdn.github.com/assets/compat-8a4318ffea09a0cdb8214b76cf2926b9f6a0ced318a317bed419db19214c690d.js
https://assets-cdn.github.com/assets/frameworks-6d109e75ad8471ba415082726c00c35fb929ceab975082492835f11eca8c07d9.js
https://assets-cdn.github.com/assets/github-5d29649478f4a2b05588bbd0d25cd56ff5445b21df31b4cccca942ad8687e1e8.js
https://assets-cdn.github.com/images/modules/site/heroes/home-code-bg-alt-01.svg
https://assets-cdn.github.com/static/fonts/roboto/roboto-light.woff
https://assets-cdn.github.com/static/fonts/roboto/roboto-regular.woff
https://assets-cdn.github.com/static/fonts/roboto/roboto-medium.woff
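A long list of URLs like the one above is easier to digest once it is summarized. As a rough sketch, assuming you collect the logged URLs into an array instead of printing them, a hypothetical helper such as `countByExtension` could tally requests by file type:

```javascript
const path = require('path');

// Count request URLs by file extension; URLs whose path has no
// extension (such as "https://github.com/") are counted under "(none)".
function countByExtension(urls) {
  const counts = {};
  for (const raw of urls) {
    const { pathname } = new URL(raw);
    const ext = path.posix.extname(pathname) || '(none)';
    counts[ext] = (counts[ext] || 0) + 1;
  }
  return counts;
}
```

In the script above, that would mean pushing `params.request.url` onto an array inside the `Network.requestWillBeSent` handler and printing `countByExtension(urls)` in the `Page.loadEventFired` handler before closing the client.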
This is great for seeing which assets are loaded, but what if we want to walk the DOM for elements that exist in the page? We could use a script like this, which pulls out all of the image tags from github.com:
const CDP = require('chrome-remote-interface');
CDP(chrome => {
  chrome.Page.enable()
    .then(() => {
      return chrome.Page.navigate({ url: 'https://github.com' });
    })
    .then(() => {
      // fetch the root node of the document
      chrome.DOM.getDocument((error, params) => {
        if (error) {
          // on failure, the second argument carries the error details
          console.error(params);
          return;
        }
        const options = {
          nodeId: params.root.nodeId,
          selector: 'img',
        };
        // find every <img> element under the root
        chrome.DOM.querySelectorAll(options, (error, params) => {
          if (error) {
            console.error(params);
            return;
          }
          // print the attributes of each matching node
          params.nodeIds.forEach(nodeId => {
            const options = {
              nodeId: nodeId,
            };
            chrome.DOM.getAttributes(options, (error, params) => {
              if (error) {
                console.error(params);
                return;
              }
              console.log(params.attributes);
            });
          });
        });
      });
    });
}).on('error', err => {
  // cannot connect to the remote endpoint
  console.error(err);
});
You’ll see that we get one array per image tag, with attribute names and values interleaved, including the URL of each image.
[ 'src',
'https://assets-cdn.github.com/images/modules/site/home-illo-conversation.svg',
'alt',
'',
'width',
'360',
'class',
'd-block width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/home-illo-chaos.svg',
'alt',
'',
'class',
'd-block width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/home-illo-business.svg',
'alt',
'',
'class',
'd-block width-fit mx-auto mb-4' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/slackhq.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/zenhubio.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/travis-ci.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/atom.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/circleci.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/codeship.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/codeclimate.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/gitterhq.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/waffleio.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/integrators/heroku.png',
'alt',
'',
'class',
'd-block integrations-collage-img width-fit mx-auto' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/airbnb-logo.png',
'alt',
'Airbnb',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/sap-logo.png',
'alt',
'SAP',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/ibm-logo.png',
'alt',
'IBM',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/google-logo.png',
'alt',
'Google',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/paypal-logo.png',
'alt',
'PayPal',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/bloomberg-logo.png',
'alt',
'Bloomberg',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/spotify-logo.png',
'alt',
'Spotify',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/swift-logo.png',
'alt',
'Swift',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/facebook-logo.png',
'alt',
'Rails',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/node-logo.png',
'alt',
'Node',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/nasa-logo.png',
'alt',
'Nasa',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
[ 'src',
'https://assets-cdn.github.com/images/modules/site/logos/walmart-logo.png',
'alt',
'Walmart',
'class',
'logo-img px-2 px-sm-4 px-md-5 px-lg-0' ]
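Because each array above alternates attribute names and values, it can be handy to fold it into a plain object before working with it. A minimal sketch, with `attributesToObject` being a hypothetical helper name:

```javascript
// Convert a flat [name, value, name, value, ...] array, as printed above,
// into an object keyed by attribute name.
function attributesToObject(attributes) {
  const result = {};
  for (let i = 0; i + 1 < attributes.length; i += 2) {
    result[attributes[i]] = attributes[i + 1];
  }
  return result;
}
```

Logging `attributesToObject(params.attributes).src` in the last callback would then print just the image URLs.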
Happy scraping!
Jim has over 13 years of experience in software development. He brought his first computer home on the school bus after pulling it out of the dumpster at his school. He worked for 7 years in IT support and infrastructure before making the change over to development using primarily JavaScript. His primary focus is on Node.js, React and React Native development. Over the next few years he anticipates working more and more with GraphQL and Apollo to reduce the data fetching requirements of modern web and cross-platform mobile applications. Jim is self-taught, forward-looking and is passionate about cultivating our Twin Cities tech community by bringing in new developers from a wide variety of backgrounds. As a former frontend engineering instructor he is passionate about teaching, mentoring, listening and speaking about our craft.