Claybooks: Extract data from paginated databases in seconds

What we used

Clay (Free) to scrape and clean paginated data

Free resources

Download the free template here

Need help?

Ask a question in the Slack community

Schedule a meeting with a Clay expert

Before we begin

Manually scraping data from large, paginated databases takes forever, and many web scrapers can’t navigate pagination well enough to automate it.

Good news: this Claybook uses ZenRows and Clay’s custom formulas to scrape paginated databases in seconds.

How it works:

Automatically create thousands of paginated database URLs using Clay formulas
Use ZenRows (natively integrated in Clay) to scrape each page, one by one
Use the scraped data for enrichment, personalization, outbound, and more

Step 1:

Copy this template

First, copy this template in Clay.

If you’re already a user, you’ll be directed to your workspace. If not, you’ll be prompted to create a free Clay account.

The interactive demos on the right will walk you through each step of this Claybook.

Step 2:

Add paginated page numbers

How many pages are in the paginated database you want to scrape? Add each page number to the first column, one per row. We’ll use these page numbers in the next step to create dynamic URLs.
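If the database shows a total record count and a fixed number of records per page, the page list can be worked out rather than counted by hand. The sketch below is only an illustration outside Clay: the function name `page_numbers` and the example figures (1,040 records, 25 per page) are assumptions, not values from the template.

```python
import math


def page_numbers(total_records: int, records_per_page: int, first_page: int = 1) -> list[int]:
    """Return one page number per row, enough to cover every record."""
    if total_records < 0 or records_per_page <= 0:
        raise ValueError("total_records must be >= 0 and records_per_page must be > 0")
    # Round up so a final, partially filled page is still included.
    page_count = math.ceil(total_records / records_per_page)
    return list(range(first_page, first_page + page_count))


# Hypothetical database: 1,040 records shown 25 per page.
rows = page_numbers(total_records=1040, records_per_page=25)
print(len(rows), rows[0], rows[-1])
```

Some sites number their first page 0 rather than 1; the `first_page` argument covers that case.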

Step 3:

Create pagination URLs using Clay formulas

Next, so you don’t have to paste URLs into Clay manually, we’ll use a formula that adds the page number to the database URL string automatically. Paste the URL of the database into the formula and insert the page number cell where the page number appears in the URL.

Once finished, run the ZenRows data scrape column.
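For readers who want to see the idea behind the URL formula outside Clay, here is a minimal Python sketch. It is not Clay's formula syntax: the `{page}` placeholder, the function name `build_page_urls`, and the example.com address are all assumptions made for illustration.

```python
def build_page_urls(url_template: str, pages: list[int]) -> list[str]:
    """Insert each page number into the spot marked by {page} in the URL."""
    if "{page}" not in url_template:
        raise ValueError("url_template must contain a {page} placeholder")
    # str.replace is used instead of str.format so other braces in the URL are left alone.
    return [url_template.replace("{page}", str(page)) for page in pages]


# Hypothetical paginated database URL; replace with the real one.
URL_TEMPLATE = "https://example.com/directory?category=saas&page={page}"

for url in build_page_urls(URL_TEMPLATE, [1, 2, 3]):
    print(url)
```

The same approach works when the page number sits in the path (for example `/directory/page/{page}`), since the placeholder can go anywhere in the string.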

Step 4:

Edit the AI data cleaning prompt to fit your needs

Next, to make sure we’re pulling usable data from databases that are often complex, use ChatGPT to clean the scraped dataset.
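As a rough idea of what a cleaning prompt might look like, the sketch below assembles one from a list of fields you care about. The wording of the prompt, the function name `build_cleaning_prompt`, the field names, and the character limit are all assumptions; the prompt that ships with the template may differ, so treat this only as a starting point for your own edits.

```python
def build_cleaning_prompt(scraped_text: str, fields: list[str], max_chars: int = 8000) -> str:
    """Build a prompt asking the model to turn raw page text into structured records."""
    if not fields:
        raise ValueError("fields must contain at least one field name")
    field_list = ", ".join(fields)
    # Trim very long pages so the prompt stays within a manageable size (limit is arbitrary).
    snippet = scraped_text.strip()[:max_chars]
    return (
        "Extract every record from the scraped page text below.\n"
        f"Return a JSON array of objects with exactly these keys: {field_list}.\n"
        "Use null for any missing value and do not add commentary.\n\n"
        f"Scraped text:\n{snippet}"
    )


# Hypothetical fields and page text, for illustration only.
prompt = build_cleaning_prompt(
    "Acme Inc | acme.example | Austin\nGlobex | globex.example | Berlin",
    fields=["name", "website", "city"],
)
print(prompt)
```

Asking for a fixed set of keys makes the output easier to split into columns later.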

Step 5:

Write data to another table for further enrichment

Last, write your newly scraped database data to a new table. Why? We want each data point in its own column, and the only way to do that is to write to a new table. In the new table, you can use this data for anything, such as further enrichment or outbound messaging.
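To picture what writing to a new table achieves, the sketch below flattens per-page results into one row per record with one column per field. It assumes each cleaned page is a JSON array of objects, which may not match the template's actual output; the function names and sample data are hypothetical, and Clay does this step for you inside the product.

```python
import csv
import io
import json


def flatten_pages(cleaned_pages: list[str]) -> list[dict]:
    """Merge the JSON array from each scraped page into one flat list of records."""
    records: list[dict] = []
    for page_json in cleaned_pages:
        records.extend(json.loads(page_json))
    return records


def to_csv(records: list[dict], fieldnames: list[str]) -> str:
    """Write one row per record, with each field in its own column."""
    buffer = io.StringIO()
    # Keys not listed in fieldnames are ignored; missing keys become empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical cleaned output for two pages.
pages = [
    '[{"name": "Acme", "city": "Austin"}]',
    '[{"name": "Globex", "city": "Berlin"}, {"name": "Initech", "city": null}]',
]
print(to_csv(flatten_pages(pages), fieldnames=["name", "city"]))
```

Two page rows go in and three record rows come out, which is why the data needs a table of its own.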
