Complete Guide To Web Scraping Using Make (Formerly Integromat)

Hideyuki Shibata
8 min read · Nov 2, 2023
Here’s what a simple web scraping scenario looks like

In this post, I’ll cover one of the handiest things you can do in Make: web scraping. You don’t need to write a single line of Python. Just follow along with this post and you’ll be on the right track!

I’ve left a Make scenario template you can try out at the bottom of this post. It’s up to you whether you read through the whole post or just skip to it!

If you need a much more detailed guide that covers a lot more about web scraping with Make, from small functions to scenario templates, I’ve just published a course that’s worth reading.

Use the following discount code to get 20% off :)

IREADYOURMEDIUM

Setting up a Google Sheets module

Let’s say you have a Google Sheet with a list of links you want to extract content from. I collected some links from Product Hunt:

List of links from Product Hunt

Now, go to Make and create a scenario. For the first module, pick Google Sheets > Search Rows. If you keep the links you want to scrape in Airtable, for example, you’d choose Airtable > Search Records here instead.

Choose Google Sheets > Search Rows

Go ahead and connect to the right Google Sheet. Setting up a filter is good practice; for the sake of this demo, I’ll simply filter for rows where a link exists. Let’s not worry about the other fields for now.

Set your search filter for Search Rows
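By the way, if you’re curious what Search Rows amounts to in code, here’s a minimal Python sketch using the gspread library. The credentials file, sheet name, and “Link” column header are all assumptions for this demo:

```python
import gspread

# Hypothetical credentials file and sheet name; adjust to your own setup.
gc = gspread.service_account(filename="credentials.json")
worksheet = gc.open("Product Hunt Links").sheet1

# "Search Rows" with a "rows where links exist" filter:
# keep only the rows whose Link cell is non-empty.
rows = [row for row in worksheet.get_all_records() if row.get("Link")]
print(rows)
```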

Right-click the module and press “Run this module only”.

“Run this module only” lets you easily map its output in the following steps

Now, you should see the collection of links and other pieces of information, like in the image below.

These actual values make it easier to map the keys in the following steps

If you don’t see the green check at this point, you might be doing something wrong. If you do see the green check, the next step is setting up the actual scraping step!

Moving on to the HTTP module

Now, add an HTTP module right after the Google Sheets module. The HTTP module is what actually accesses the URLs you want to scrape. You don’t need to know in detail what HTTP means for now; just remember that when you want to extract information from a web page, this is the module to go for.

A little intimidating but we’ll be okay

Choose Make a request.

Choose Make a request

The rest is very simple. Map the link from the previous Google Sheets module into the URL field of the HTTP module. Again, there’s no need to worry about other fields.

Map map map
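Under the hood, this step boils down to a plain HTTP GET request. Here’s a rough Python equivalent using the requests library; the URL is a hypothetical link from the sheet:

```python
import requests

# Hypothetical link pulled from the Google Sheet; in Make, this comes
# from the mapped URL field.
url = "https://www.producthunt.com/posts/some-product"

# Roughly what the HTTP module's "Make a request" does for each link:
response = requests.get(url, timeout=30)
data = response.text   # the full HTML, i.e. the "Data" output you'll see in Make
print(data[:200])      # peek at the first 200 characters
```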

Let’s run once

Setting up the actual scraping part is pretty much done. That was fast! Let’s run the scenario once and see what it does. Before hitting the Run once ▶️ button, go back to the Google Sheets module and set the Maximum number of returned rows to 1. This ensures the test run applies to only one link. If you have 1,000 links, for example, and don’t limit the number of rows returned, you’ll end up actually scraping all 1,000 links for a simple test (unless you forcefully stop the scenario), and you don’t really want that.

Set it to 1 when testing

Once you’re done with that setting, it’s time to Run once ▶️.

Voila! Basic scraping successful

Yes! We got the content under the Data label. We didn’t write a single line of code to achieve this. And we’re not done!

Extracting the information you want

While being able to extract the full HTML from a web page feels great, oftentimes what you want isn’t the whole thing. You might be after just the title, URL, image, or author. We’ll now look at how to extract the right information from the full HTML.

Let’s add an XML module.

XML: Extensible Markup Language (like it matters)

After clicking on XML, choose Perform XPath Query. It lets you specify which part of the full content you want to extract.

Perform XPath Query lets you extract elements you need from the entire HTML
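If it helps to see the idea in code, here’s a minimal sketch of what an XPath query does, using Python’s lxml library on a made-up snippet of HTML:

```python
from lxml import html

# Made-up stand-in for the full HTML the HTTP module returned.
data = "<html><body><h1>Product name</h1><p>Short description</p></body></html>"
tree = html.fromstring(data)

# The XPath query pinpoints just the part of the document you want:
print(tree.xpath("//h1/text()"))  # ['Product name']
```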

Once you’ve chosen the right module, you’ll be shown two fields.

The first field is XML. Here, map the Data from the previous HTTP module’s output.

Map map map

Now, the second field, “XPath Query”, is where things get a little tricky, but we’ll proceed one step at a time. Remember that the links in my Google Sheet are from Product Hunt? Let’s look at one of the pages and decide what to extract.

We’ll extract these pieces of text

For this demo, I’ll go for the product name and the description, as highlighted above. Now, open the Developer Tools and navigate to the element you want to extract (the title and description in my case).

Don’t worry I hate the Developer Tools too

Then right-click on the HTML element that holds the content. In my case, an h1 tag holds the product name, so I right-click on that and choose Copy > Copy XPath.

Don’t think, just copy

Go back to Make, and paste whatever you copied into the XPath Query field. Press OK when you are done.

See if it works!

Let’s Run once ▶️. If it works, you are all set!

Unfortunately, I didn’t get any output, as you can see below, although the scenario didn’t throw any error. What this means is that we didn’t specify the right XPath. This can happen even though we literally copy-pasted it: the Developer Tools show the page after the browser has run JavaScript and normalized the markup, so an absolute path copied from there often doesn’t line up with the raw HTML the HTTP module fetched. We have to go with something else.

First try! Nothing returned 🙃

Now, there are other ways to write the XPath we are after. You can write the XPath by hand, but that can be a little complicated. So let’s do this instead:

https://chat.openai.com/share/8f798c89-65f8-4572-b048-7707c33555e5

The XPath we copied earlier describes the element’s position in the full HTML structure, but that’s not the only way to write one. You can also build an XPath from the element itself, matching it by tag or attributes instead of by position. ChatGPT should create multiple XPaths for you, so try them; at least one of them should work!
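To see the difference in code, here’s a small lxml sketch on made-up markup (the class name is hypothetical). The absolute XPath encodes the element’s exact position and breaks when the fetched HTML doesn’t match the rendered DOM, while a relative one matches the element by what it is:

```python
from lxml import html

# Made-up stand-in for the raw HTML the HTTP module fetched.
page = html.fromstring("""
<html><body>
  <main><h1 class="styles_title">Acme Widget</h1></main>
</body></html>
""")

# Absolute XPath copied from DevTools: expects a div wrapper that isn't there.
print(page.xpath("/html/body/div[1]/main/h1/text()"))  # [] -> nothing found

# Relative XPaths: match the element by tag or attribute, not position.
print(page.xpath("//h1/text()"))                             # ['Acme Widget']
print(page.xpath("//h1[contains(@class, 'title')]/text()"))  # ['Acme Widget']
```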

It worked!

Do the same for other pieces of information you want to extract. I’ll go ahead and add another XML module to extract the product description.

You need one module for each element you want to extract

Putting data back into Google Sheets

We’ve successfully built a scenario that extracts information from web pages. What we want to do now is put the extracted information back into the Google Sheet we’re working with.

I’ve added two columns to the Google Sheet: one for product names, the other for product descriptions.

Back to the Google Sheet

Let’s go ahead and add a Google Sheets > Update a Row module. Don’t choose the Add a Row module here, because we’re updating the existing rows where the links already reside.

We’re not ADDING a row. We’re UPDATING one.

Moving on to the setup. After choosing the right Sheet, you’ll need to fill out the Row number field and the columns you want to update. For the Row number field, look at the very first Google Sheets module: the search operation it performs returns the row number of the link you’re working with. Go ahead and map that into the field.

The very first module returns the row numbers of the rows you are dealing with, so we’ll use that.

You can leave the link field empty, or any column that already has content, for that matter. Leaving these fields empty on the Make side will not overwrite the existing text in Google Sheets.

Map map map
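In code terms, Update a Row boils down to something like this gspread sketch; the row number, column positions, and values here are all hypothetical:

```python
import gspread

gc = gspread.service_account(filename="credentials.json")  # hypothetical credentials
worksheet = gc.open("Product Hunt Links").sheet1

# The row number is mapped from the first Google Sheets module's search output:
row_number = 2

# Write the scraped values into the new columns (here, B and C):
worksheet.update_cell(row_number, 2, "Acme Widget")          # product name
worksheet.update_cell(row_number, 3, "A widget for acmes.")  # product description
# Columns we leave alone (like the Link column) keep their existing values.
```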

Finally, delete whatever number we put into Maximum number of returned rows in the first Google Sheets module. You can set it to 10, 50, 100, or any other number depending on your needs, but it isn’t what matters in this demo, so I’ll just leave it empty.

This was set to 1 when testing

And there you have it! We’ve built a web scraping tool in about 15 minutes. All that’s left to do is watch the data flow into your Google Sheet.

Beautifully done!

Let’s hit Run Once ▶️ and see if it does what it’s supposed to do.

We made it!

Perfect! If you have a lot of links, say 1,000 or 10,000 or more, or if you expect another scenario or other people to keep adding links here, you might want to schedule the scenario to run every once in a while. If not, Run Once ▶️ will do the job just fine.

Get your template!

I’ll leave the Google Sheet I used in this demo here, and the Make scenario too. Feel free to give it a try!

If you need a much more detailed guide that covers a lot more about web scraping with Make, from small functions to scenario templates, I’ve just published a course that’s worth reading.

Use the following discount code to get 20% off :)

IREADYOURMEDIUM
