Web Scraping with Make.com: Essential Functions to Level Up Your Game (Formerly Integromat)

Hideyuki Shibata
5 min read · Jun 27, 2024


Functions are often used in the Tools > Set multiple variables module

You won’t find many resources on web scraping with Make.com, and resources on its functions? Pretty much non-existent. Still, these native functions are very powerful, and knowing them will definitely improve your web scraping skills in Make.

If you haven’t read my post on how to perform web scraping with Make.com, head over to it below.

If web scraping with Make.com is a breeze for you, you don’t have to read the post above, but I do recommend the one below. I wrote it for advanced Make users.

So, if you are still here, let’s go over the essential functions for web scraping with Make!

Download the template used in this tutorial!

Download the template and follow along ✌️ (No-code automation hacks and secrets will be delivered to your inbox every once in a while.)

Just to give you an idea, this is what the template looks like:

Download and follow along 👨‍💻

So, without further ado, let’s get started.

stripHTML()

You know how you get the entire source code when you run HTTP > Make a request against a web page?

Let’s get content from webflow.com
<!DOCTYPE html><!-- … 😅

What stripHTML() does is literally strip the HTML from the argument and leave the text content. Let’s set it up like the following screenshot:

stripHTML(scraped_data)
A closer look

Let’s ▶︎ Run once and see what comes out of it.

No HTML, just the text content 🥳

Beautiful! Some words end up stuck together because of how stripHTML() works, but that’s not a big deal if you are feeding the text into AI for further processing.
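If it helps to see the idea outside of Make, here is a minimal Python sketch of what tag stripping does. It is only an approximation: the helper name strip_html and the sample HTML are made up for illustration, and Make's stripHTML() may handle edge cases (scripts, entities and so on) differently. It does show why neighbouring words can end up glued together.

```python
import re

def strip_html(source: str) -> str:
    """Rough stand-in for stripHTML(): delete anything that looks like a tag.

    Assumption: a simple regex is enough for illustration. This is not
    Make's actual implementation.
    """
    return re.sub(r"<[^>]+>", "", source)

# Hypothetical snippet of scraped HTML
sample = "<h1>Build</h1><p>Launch</p>"

# The tags vanish and nothing is put in their place,
# so the two words end up joined.
print(strip_html(sample))
```

Since nothing replaces the removed tags, "Build" and "Launch" come out as one word, which is the same effect you see in the screenshot.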

replace()

replace() is another useful function for manipulating scraped data. In the last screenshot, you see all those empty areas, right? That doesn’t look good to begin with, and if you are going to store this long string in a database or spreadsheet, your colleagues might not like it (I wouldn’t!).

That’s where the replace() function comes in along with a little bit of regular expressions. Let’s add another Tools > Set multiple variables module and use the function.

TBH I don’t think I have a scenario where I don’t use the replace() function
A closer look

What this does is take any run of spaces or new lines and replace it with a single space. For example, let’s take a look at the text below:

  • [Space]
  • [Space][Space][Space][Space][Space][Space]
  • [New Line]
  • [New Line][New Line][New Line][New Line]
  • [Space][Space][New Line][New Line][New Line]

All these will be replaced with a single [Space], making the long string look prettier. The power of regular expressions!

Let me just add that replacing this text with an empty string instead of a [Space] would produce one long string with no spaces at all, so use a [Space]!

Let’s ▶︎ Run once and see the result.

No HTML, no unnecessary spaces and new lines

Yes! Much better.
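For anyone who thinks better in code, the same clean-up can be sketched in Python. The exact pattern from the screenshot isn't reproduced in the text, so the `\s+` pattern here ("one or more whitespace characters") is my assumption, and collapse_whitespace is just an illustrative name.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces/new lines with a single space.

    Assumption: the pattern \\s+ stands in for whatever regex the
    Make scenario uses.
    """
    return re.sub(r"\s+", " ", text)

# Hypothetical text with the kinds of gaps listed above
messy = "Build\n\n\n  Launch      Grow"

print(collapse_whitespace(messy))   # Build Launch Grow
print(re.sub(r"\s+", "", messy))    # BuildLaunchGrow (why an empty string is a bad idea)
```

The second print shows what happens when you replace with an empty string instead of a space: every word gets glued to the next one.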

substring()

Sometimes the scraped content can be too long for whatever purpose you have for web scraping. For example, if you are storing the scraped content in a Google Sheets cell, only 50,000 characters are allowed. Airtable gives you a bit more flexibility by allowing 100,000 characters in a single cell. Another example would be putting the scraped content in an LLM prompt. Depending on which model you are using, it can have a pretty limited context length (or max tokens), and giving it unnecessary information can cause hallucinations and increase cost.

So in these cases you might need to cut down the length of the scraped content, and that is exactly where substring() comes in.

Let’s set it up and see it in action.

It should look like this:

Gimme from 0 to 3,000 characters
A closer look

The arguments 0 and 3000 mean, “Hey, start from position 0 and finish at the 3,000th character”.

Let’s ▶︎ Run once.

substring() in action

It worked!
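As a rough analogy, this is what the truncation looks like with Python slicing. The helper name truncate and the sample strings are hypothetical; the point is only that anything past the limit is dropped and shorter text passes through untouched.

```python
def truncate(text: str, start: int = 0, end: int = 3000) -> str:
    """Keep the characters from position `start` up to (not including) `end`."""
    return text[start:end]

# Hypothetical scraped content that is far too long
long_text = "x" * 10_000
short_text = "Hello Make"

print(len(truncate(long_text)))   # 3000
print(truncate(short_text))       # Hello Make (shorter than the limit, so unchanged)
```

Pick the limit based on where the text is going: a spreadsheet cell limit, or whatever fits comfortably in your LLM prompt.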

contains()

Sometimes you want to perform an operation depending on whether the scraped data contains a certain set of characters. For example, you might want to ignore scraped data that contains “gmail.com”. You can do that with the contains() function.

To check whether a string contains a certain set of characters, you can set up a filter between modules and let things pass if they meet the condition.

However, that isn’t the only way. The contains() function lets you do the same thing, and in my opinion, there are cases where using contains() rather than a filter will make your workflow much simpler.

Let’s have a look!

We’ll look for the word “Webflow” in the text scraped from webflow.com
A closer look

This one is pretty simple! Let’s ▶︎ Run once.

True 🔮 Of course webflow.com contains “Webflow”
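The same check is easy to picture in Python. This is a sketch only: the sample strings and the email list are invented, and Python's `in` is case-sensitive, which may or may not match how contains() behaves in Make.

```python
def contains(text: str, search: str) -> bool:
    """Return True when `search` appears somewhere in `text` (case-sensitive here)."""
    return search in text

# Hypothetical text scraped from a page
scraped = "Webflow: build a custom website without code"
print(contains(scraped, "Webflow"))   # True

# The "ignore anything with gmail.com" idea from earlier, on made-up data
emails = ["ana@gmail.com", "lee@example.com"]
kept = [e for e in emails if not contains(e, "gmail.com")]
print(kept)                           # ['lee@example.com']
```

In a scenario, that True/False result is what you would store in a variable and branch on later, instead of adding a separate filter.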

That’s a wrap!

I hope you learned a thing or two about Make’s native functions. They can be very powerful once you get the hang of them, and when you do, your scenarios will start looking more and more purple because you will be using many Tools > Set multiple variables modules to run the functions. And I hope you get to that level one day!
