How to programmatically get Google search results
Recently, I needed a way to search Google programmatically while developing a Linux applet, rofi-search, which allows interactively searching for top results across multiple search engines.
It is not widely known that there is an official Google API you can use to get this data right now!
Data scrapers
Common techniques for solving this problem involve scraping the DOM of Google's website for data.
Scraping libraries are available for many languages, notably the GoogleScraper package for Python and the google-it package for Node.js.
google-it code example in Node.js:

    const search = require('google-it');

    // await is only valid inside an async function in a CommonJS script
    (async () => {
      const response = await search({ 'query': 'How to programmatically get google search results' });
      console.log(response);
    })();
The disadvantage of data scrapers is that they need to be actively maintained and patched every time the upstream website is updated, not to mention that scraping is a relatively heavy solution with noticeable processing overhead.
However, it is often the only way to get data from proprietary data sources. Fortunately, this time we are in luck, as Google has made its public Custom Search Engine API available to everyone for free.
Google Custom Search Engine (CSE)
Google's CSE allows us to create a custom search engine configured to our requirements, whether that means prioritizing search results from specific websites or searching only our own website.
The important thing to know is that, if configured correctly, it can search the entire web!
The following are a few simple steps you will need to take to set up your search API:
- Create your own custom search engine at https://cse.google.com/cse/all and get the Search engine ID from the settings panel.
- Under the Basics settings section, where you got your CSE ID, find and enable the Search the entire web option.
- Get an API key for the created search engine.
- Now we can use our CSE ID and API key to make a GET API request to https://www.googleapis.com/customsearch/v1?key=yourkey&cx=yourid&q=query
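The request URL from the last step can be assembled safely with Node's built-in URL class. This is only a minimal sketch: buildSearchUrl is a hypothetical helper name, and yourkey and yourid are the same placeholders used in the URL above, to be replaced with your own API key and CSE ID.

```javascript
// Endpoint taken from the last step above.
const ENDPOINT = 'https://www.googleapis.com/customsearch/v1';

/**
 * Builds the GET request URL for a search.
 * buildSearchUrl is a hypothetical helper, not part of any Google library.
 */
function buildSearchUrl({ key, cx, q }) {
  const url = new URL(ENDPOINT);
  // searchParams escapes each value, so queries containing spaces,
  // ampersands or other special characters can be passed in as-is.
  url.searchParams.set('key', key);
  url.searchParams.set('cx', cx);
  url.searchParams.set('q', q);
  return url.toString();
}

console.log(buildSearchUrl({ key: 'yourkey', cx: 'yourid', q: 'query' }));
```

Letting the URL class do the escaping avoids hand-written string concatenation, which is easy to get wrong for queries with special characters.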
Here is a Node.js code example, which does not require any dependencies, that you can copy and paste:

    const https = require('https');

    /**
     * @param {Object} options
     * @param {String} options.key - Google API key
     * @param {String} options.cx - Search Engine ID
     * @param {String} options.q - search query
     *
     * ...for list of all supported options, see
     * https://developers.google.com/custom-search/v1/reference/rest/v1/cse/list
     *
     * @returns {Promise<Object>}
     */
    function search(options) {
        return new Promise(function(resolve, reject) {
            let query = '';

            // serialize url query parameters
            Object.keys(options || {}).forEach(function(prop) {
                query += prop + '=' + encodeURIComponent(options[prop]) + '&';
            });

            // api request options
            const opt = {
                hostname: 'www.googleapis.com',
                port: 443,
                path: `/customsearch/v1?${query}`,
                method: 'GET',
                headers: {
                    Accept: 'application/json'
                }
            };

            // response data
            let data = '';

            const req = https.request(opt, res => {
                // process incoming response data stream
                res.on('data', chunk => {
                    data += chunk;
                });

                // when the end of the stream is reached and there is no more data to read,
                // resolve or reject the promise depending on the response HTTP status code
                res.on('end', () => {
                    if (res.statusCode >= 200 && res.statusCode < 300) {
                        try {
                            resolve(JSON.parse(data));
                        } catch (error) {
                            // the response body was not valid JSON
                            reject(error);
                        }
                    } else {
                        reject(data);
                    }
                });
            });

            req.on('error', error => {
                reject(error);
            });

            req.end();
        });
    }

You can use the search function as follows:

    // await is only valid inside an async function in a CommonJS script
    (async () => {
        const response = await search({
            cx: '001111568431131411111:aa33rweuabc',
            key: 'AKDYksKYicBgvX6k7G8mFWCABo0HqelXM7dOsA0',
            q: 'How to programmatically get google search results'
        });

        // dumps search results
        console.log(response.items);
    })();

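Each entry in response.items carries more fields than most applications need. As a rough sketch, the results could be reduced to a compact list like this. summarizeResults is a hypothetical helper, the sample object is made-up data shaped like an API response, and the fallback to an empty list assumes that items may be absent when nothing matches.

```javascript
/**
 * Reduces a search response to the fields most useful for display.
 * summarizeResults is a hypothetical helper, not part of the API.
 */
function summarizeResults(response) {
  // Assumption: `items` may be missing when a query has no results,
  // so fall back to an empty list instead of throwing.
  const items = (response && response.items) || [];
  return items.map((item, index) => ({
    rank: index + 1, // 1-based position in the result list
    title: item.title,
    link: item.link,
    snippet: item.snippet
  }));
}

// Made-up sample shaped like an API response, for illustration only.
const sample = {
  items: [
    { title: 'Example Domain', link: 'https://example.com/', snippet: 'An example page.' }
  ]
};

console.log(summarizeResults(sample));
```

With a real response from the search function, the same helper would be called as summarizeResults(response).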
And there you have it: a lightweight solution that requires neither regularly updating scraping code nor installing a ton of dependencies. More importantly, it uses an official API supported by Google.
There are a few considerations worth keeping in mind, though.
CSE doesn't include features such as personalized results or oneboxes, and it may sometimes give slightly different results than scraping the data from Google's website.
However, in my own experience with CSE, the differences in search results have been insignificant.