Saturday, 30 April 2022

How to Build a Website Scraper with Puppeteer and Firebase Functions

Let’s create a simple website scraper that downloads a web page and extracts content from it. For this example, we will use the New York Times website as the source of content. The scraper will extract the top 10 news headlines on the page and display them on the web page. The scraping is done with the Puppeteer headless browser and the web application is deployed on Firebase Functions.

Scrape Website

1. Initialize a Firebase Function

Assuming that you have already created a Firebase project, you can initialize the Firebase functions in a local environment by running the following command:

mkdir scraper
cd scraper
npx firebase init functions
cd functions
npm install puppeteer

Follow the prompts to initialize the project. We also install the Puppeteer package from npm to drive the headless Chrome browser.

2. Create a Node.js Application

Create a new pptr.js file in the functions folder that will contain the application code for scraping the content of the page. The script downloads only the HTML content of the page and blocks all scripts, stylesheets, images, videos and fonts to reduce the time it takes to download the page.

We are using a CSS selector to pick the headlines on the page that are wrapped inside h3 tags. You may use Chrome Dev Tools to find the right selector for the headlines.

const puppeteer = require('puppeteer');

const scrapeWebsite = async () => {
  let stories = [];
  const browser = await puppeteer.launch({
    headless: true,
    timeout: 20000,
    ignoreHTTPSErrors: true,
    slowMo: 0,
    args: [
      '--disable-gpu',
      '--disable-dev-shm-usage',
      '--disable-setuid-sandbox',
      '--no-first-run',
      '--no-sandbox',
      '--no-zygote',
      '--window-size=1280,720',
    ],
  });

  try {
    const page = await browser.newPage();

    await page.setViewport({ width: 1280, height: 720 });

    // Block scripts, stylesheets, images, videos and fonts from downloading
    await page.setRequestInterception(true);

    page.on('request', (interceptedRequest) => {
      const blockResources = ['script', 'stylesheet', 'image', 'media', 'font'];
      if (blockResources.includes(interceptedRequest.resourceType())) {
        interceptedRequest.abort();
      } else {
        interceptedRequest.continue();
      }
    });

    // Change the user agent of the scraper
    await page.setUserAgent(
      'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36'
    );

    await page.goto('https://www.nytimes.com/', {
      waitUntil: 'domcontentloaded',
    });

    const storySelector = 'section.story-wrapper h3';

    // Only get the top 10 headlines
    stories = await page.$$eval(storySelector, (divs) =>
      divs.slice(0, 10).map((div, index) => `${index + 1}. ${div.innerText}`)
    );
  } catch (error) {
    console.log(error);
  } finally {
    if (browser) {
      await browser.close();
    }
  }
  return stories;
};

module.exports = scrapeWebsite;

3. Write the Firebase Function

Inside the index.js file, import the scraper function and export it as a Firebase function. We also write a scheduled function that runs every day and calls the scraper function.

It is important to increase the function’s memory and timeout limits since running Chrome with Puppeteer is resource-heavy.

// index.js
const functions = require('firebase-functions');
const scrapeWebsite = require('./pptr');

exports.scrape = functions
  .runWith({
    timeoutSeconds: 120,
    memory: '1GB', // the default 256MB is not enough for headless Chrome
  })
  .region('us-central1')
  .https.onRequest(async (req, res) => {
    const stories = await scrapeWebsite();
    res.type('html').send(stories.join('<br>'));
  });

exports.scrapingSchedule = functions.pubsub
  .schedule('every day 09:00')
  .timeZone('America/New_York')
  .onRun(async (context) => {
    const stories = await scrapeWebsite();
    console.log('The NYT headlines are scraped every day at 9 AM EST', stories);
    return null;
  });

4. Deploy the Function

If you wish to test the function locally, you may run the npm run serve command and navigate to the function endpoint on localhost. When you are ready to deploy the function to the cloud, the command is npm run deploy.
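For instance, assuming your Firebase project ID is my-scraper-project (a placeholder — substitute your own), a local test run might look like this:

```shell
# Start the local Functions emulator
npm run serve

# In another terminal, call the emulated function endpoint
# (the emulator serves HTTP functions at localhost:5001 by default)
curl http://localhost:5001/my-scraper-project/us-central1/scrape
```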

Puppeteer Firebase Function

5. Test the Scheduled Function

If you would like to test the scheduled function locally, run the npm run shell command to open an interactive shell where you can invoke functions manually with test data. Type the function name scrapingSchedule() and hit Enter to get the function’s output.

Firebase Functions Shell



from Digital Inspiration https://ift.tt/63ab4Eg

Thursday, 28 April 2022

How to Build an HTML Form for Uploading Files to Google Cloud Storage

Let’s write a simple web application that will allow users to upload files to Google Cloud Storage without authentication. The client side of the application will have an HTML form with one or more input fields. The server side is a Node.js application that will handle the file upload. The application may be deployed to Google Cloud Run, Firebase Functions or as a Google Cloud Function.

HTML Form

Our HTML form includes a name field and a file input field that accepts only image files. Both fields are required.

When the user submits the form, the form data is sent to the server, encoded as multipart/form-data, using the Fetch API. The server will validate the form data and if the form is valid, it will upload the file to Google Cloud Storage.

<form method="post" enctype="multipart/form-data">
  <input type="text" name="name" id="name" placeholder="Your name" required />
  <input type="file" name="image" accept="image/*" required />
  <input type="submit" value="Submit Form" />
</form>

<script>
  const formElem = document.querySelector('form');
  formElem.addEventListener('submit', async (e) => {
    e.preventDefault();
    const formData = new FormData();
    formData.append('name', e.target[0].value);
    formData.append('file', e.target[1].files[0]);
    const response = await fetch('/submitform', {
      method: 'POST',
      body: formData,
    });
    const data = await response.text();
    return data;
  });
</script>

Node.js Application

Our application will have two routes:

  1. The home (/) route that will display the form.
  2. The submit form route that will handle the file upload.
// index.js
const express = require('express');
const router = require('./router');

const app = express();

app.get('/', (_, res) => {
  res.sendFile(`${__dirname}/index.html`);
});

app.use(express.json({ limit: '50mb' }));
app.use(express.urlencoded({ extended: true, limit: '50mb' }));
app.use(router);

const port = process.env.PORT || 8080;
app.listen(port, async () => {
  console.log(`listening on port ${port}`);
});

Since the Express server cannot handle multipart form data on its own, we are using the Multer middleware to parse form data that includes both text fields and binary data. Also, we discard the original file name of the uploaded file and assign our own unique file name generated with the uuid library.

// router.js
const express = require('express');
const { Storage } = require('@google-cloud/storage');
const { v4: uuidv4 } = require('uuid');
const multer = require('multer');

const storage = new Storage();
const router = express.Router();
const upload = multer();

router.post('/submitform', upload.single('file'), async (req, res) => {
  const { name } = req.body;
  const { mimetype, originalname, size } = req.file;
  if (!mimetype || mimetype.split('/')[0] !== 'image') {
    return res.status(400).send('Only images are allowed');
  }
  if (size > 10485760 /* 10 MB */) {
    return res.status(400).send('Image must be less than 10MB');
  }
  const bucketName = '<<GOOGLE_CLOUD_STORAGE_BUCKET_NAME>>';
  const fileExtension = originalname.split('.').pop();
  const fileName = `${uuidv4()}.${fileExtension}`;
  const file = storage.bucket(bucketName).file(fileName);
  await file.save(req.file.buffer, {
    contentType: mimetype,
    resumable: false,
    public: true,
  });
  const url = `https://storage.googleapis.com/${bucketName}/${fileName}`;
  console.log(`File uploaded by ${name}`, url);
  return res.status(200).send(url);
});

module.exports = router;

Using Firebase Functions

If you are planning to deploy your file upload application to Firebase functions, some changes are required since our Multer middleware is not compatible with Firebase functions.

As a workaround, we can convert the image to base64 on the client side and then upload the image to Google Cloud Storage. Alternatively, you may use the Busboy middleware to parse the form data.

const convertBase64 = (file) => {
  return new Promise((resolve, reject) => {
    const fileReader = new FileReader();
    fileReader.readAsDataURL(file);
    fileReader.onload = () => {
      const base64String = fileReader.result;
      const base64Image = base64String.split(';base64,').pop();
      resolve(base64Image);
    };
    fileReader.onerror = (error) => {
      reject(error);
    };
  });
};

const handleUpload = async (file) => {
  const base64 = await convertBase64(file);
  const { type, size, name } = file;

  const response = await fetch('/submitform', {
    headers: { 'Content-Type': 'application/json' },
    method: 'POST',
    body: JSON.stringify({ type, size, name, base64 }),
  });

  const url = await response.text();
  console.log(`File uploaded by ${name}`, url);
};

The submit form handler will have to be tweaked to convert the base64 image to a buffer and then upload the image to Google Cloud Storage.

router.post('/submitform', async (req, res) => {
  const { name, type, size, base64 } = req.body;
  // Convert the base64 string back into a binary buffer
  const buffer = Buffer.from(base64, 'base64');
  const bucketName = '<<GOOGLE_CLOUD_STORAGE_BUCKET_NAME>>';
  const fileName = `${uuidv4()}.${name.split('.').pop()}`;
  const file = storage.bucket(bucketName).file(fileName);
  await file.save(buffer, {
    contentType: type,
    resumable: false,
    public: true,
  });
  return res.send(`File uploaded`);
});

Cors for Cross-origin Requests

If you are serving the form on a different domain than the form handler, you will need to add the cors middleware to your application.

const cors = require('cors')({ origin: true });
app.use(cors);

You should set the access control policy of your Google Cloud Storage bucket to “Fine-grained” and not “Uniform.” When individual files are uploaded to Cloud Storage, they are public but the container folder is still private.
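As a sketch, assuming the gsutil CLI is installed and authenticated, you can switch an existing bucket to fine-grained access by turning uniform bucket-level access off (BUCKET_NAME is a placeholder for your bucket):

```shell
# Switch the bucket from "Uniform" to "Fine-grained" access control
gsutil uniformbucketlevelaccess set off gs://BUCKET_NAME
```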



from Digital Inspiration https://ift.tt/3L1XsUr

Tuesday, 26 April 2022

How to Generate Dynamic QR Codes to Collect Payments through UPI

The BHIM UPI payment system has transformed the way we pay for goods and services in India. You scan a QR Code with your mobile phone, enter the secret PIN and the money gets instantly transferred from your bank account to the merchant’s bank account. There’s no transaction fee, the money is transferred in real-time and no data of the payer is shared with the payee.

Our online store initially accepted payments through credit cards only, but after we added the UPI QR Code on the checkout page, more than 50% of customers in India started making payments through UPI. Other than instant payouts, the big advantage of UPI is that the merchant need not pay any transaction fee to PayPal or Stripe.

UPI QR Code

Create Dynamic UPI QR Codes

When you sign up for any UPI app, be it PhonePe, Paytm, Google Pay, WhatsApp, Amazon Pay or any other BHIM UPI app, they will all provide you with a downloadable QR Code that you can attach in emails and invoices, embed on your website, or print and paste near your billing counter. Customers will scan this QR Code, enter the billing amount, and confirm the payment.

The QR codes provided by UPI apps are static and thus do not include the amount that has to be paid by the customer. Our UPI QR Code generator is designed to solve this problem. It generates a dynamic QR Code that includes the amount, so the merchant can control how much the customer has to pay after scanning the QR code.

Visit labnol.org/upi to generate dynamic QR codes for UPI payments. The website does not collect, store or process any of the data you enter in the QR Code form.

UPI QR Code in Google Sheets

If you are using Document Studio to generate customer invoices inside Google Sheets, you can write a simple function to embed the payment QR code in your PDF invoices. QR Codes can also be added to emails sent through Gmail Mail Merge.

QR Code in Google Sheets

Go to your Google Sheet, click the Extensions menu and choose Apps Script from the dropdown. Copy-paste the UPI function inside the script editor and save your project.

/**
 * Create a UPI QR Code for payments
 *
 * @param {29.99} amount The amount requested in INR
 * @param {"xyz@upi"} merchant_upi UPI address of the merchant
 * @param {"Blue Widgets"} merchant_name Full name of the payee
 * @param {"250"} size The size of the QR image in pixels
 * @return The QR Code
 * @customfunction
 */

function UPI(amount, merchant_upi, merchant_name, size) {
  if (amount.map) {
    return amount.map(function (amount2) {
      return UPI(amount2, merchant_upi, merchant_name, size);
    });
  }

  const googleChart = `https://chart.googleapis.com/chart?cht=qr&choe=UTF-8`;
  const upiData = `upi://pay?pn=${merchant_name}&pa=${merchant_upi}&am=${amount}`;
  return `${googleChart}&chs=${size}x${size}&chl=${encodeURIComponent(upiData)}`;
}

Now you can add the QR code to any cell in the Google Sheet by using the UPI function in combination with the IMAGE function as shown in the following example:

=IMAGE(UPI("19.95", "digitalinspirationindia@icici", "Digital Inspiration", "200"))

How UPI QR Codes are Generated

Internally, the QR Code for UPI payments contains the merchant’s UPI ID, the amount to be paid and the payee name in the following format:

upi://pay?pa=<merchant_upi_id>&pn=<payee_name>&am=<amount>&tn=<transaction_notes>

If the am parameter is not provided in the UPI url, the customer will have to manually enter the amount in the UPI app before confirming the payment. The UPI deeplink specs also recommend using the mam (minimum amount) parameter to specify the minimum amount that the customer has to pay. Set its value to “null” so that the customer cannot pay less than the specified amount.

You may also include custom notes in the QR code and these will be sent to you in the transaction history of your bank statement.
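To illustrate, here is a minimal sketch of assembling such a UPI deep link in Node.js. The makeUpiLink helper is hypothetical (not part of any SDK); it simply builds the upi://pay URL from the parameters described above, with URLSearchParams handling the URL-encoding of each value.

```javascript
// Build a upi://pay deep link from the merchant details.
const makeUpiLink = ({ pa, pn, am, tn }) => {
  const params = new URLSearchParams();
  params.set('pa', pa); // merchant UPI ID
  params.set('pn', pn); // payee name
  if (am) params.set('am', am); // amount in INR (optional)
  if (tn) params.set('tn', tn); // transaction notes (optional)
  return `upi://pay?${params.toString()}`;
};

console.log(makeUpiLink({ pa: 'xyz@upi', pn: 'Blue Widgets', am: '19.95' }));
// → upi://pay?pa=xyz%40upi&pn=Blue+Widgets&am=19.95
```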



from Digital Inspiration https://ift.tt/okIKcYw

Saturday, 23 April 2022

How to Use Google OAuth 2.0 to Access Google APIs with Refresh Token

Let’s build a simple web application that uses Google OAuth 2.0 to access Google APIs. The user can sign in with their Google account and authorize the application to access their Google Drive or any other Google service.

When the user signs in, Google redirects the user to the Google OAuth 2.0 authorization page. The user is asked to grant access to the application. The application then exchanges the authorization code for an access token and a refresh token. The access token will expire after an hour but the refresh token will be valid indefinitely (unless manually revoked by the user).

We’ll thus store the refresh token in Cloud Firestore, and use it to generate a new access token whenever the application needs to access Google APIs on behalf of the user.

We are not using Google Sign-in with Firebase Authentication since it does not provide the refresh token that is required to run background API tasks unattended.

Step 1: Create the Google OAuth 2.0 Client

Create a new OAuth 2.0 client inside your Google Cloud project as described in this step by step guide.

Inside your Google Cloud Console, go to the APIs & Services section, click Credentials, and then click Create credentials > OAuth client ID to create a new client ID.

Google OAuth Sign-in

During development, you can put http://localhost:5001/oauthCallback as the redirect URI since the Firebase emulator, by default, runs functions locally on port 5001.

Make a note of the Client ID and Client Secret provided by Google.

Step 2: Initialize Firebase Function

Open your terminal, create a new project directory and initialize the Firebase project.

$ mkdir oauth2-application
$ cd oauth2-application
$ npx firebase init functions
$ npm install googleapis

You may choose the Use an existing Firebase project option and then select the Google Cloud project that holds your OAuth client. Then switch to the functions directory.

Step 3. Initialize Firebase Environment Variables

Create a new .env file and add the following environment variables:

CLIENT_ID=<your client ID>
CLIENT_SECRET=<your client secret>
REDIRECT_URI=<your redirect URI>

Step 4. Generate Authorization URL

We’ll create a function that generates an authorization URL for the user to sign in with their Google account. In addition to the drive scope, our application also requests the userinfo.email scope to get the user’s email address.

const functions = require('firebase-functions');
const { google } = require('googleapis');

exports.googleLogin = functions.https.onRequest((request, response) => {
  const SCOPES = [
    'https://www.googleapis.com/auth/userinfo.email',
    'https://www.googleapis.com/auth/drive.metadata.readonly',
  ];
  const oAuth2Client = new google.auth.OAuth2(
    process.env.CLIENT_ID,
    process.env.CLIENT_SECRET,
    process.env.REDIRECT_URI
  );
  const authUrl = oAuth2Client.generateAuthUrl({
    access_type: 'offline',
    scope: SCOPES,
    prompt: 'consent',
    login_hint: request.query.email_address || '',
  });
  response.set('Cache-Control', 'private, max-age=0, s-maxage=0');
  response.redirect(authUrl);
});

We set the access_type to offline to get a refresh token. The prompt is set to consent to force the consent screen, which ensures Google returns a refresh token. We also set the login_hint to the user’s email address in case they are logged into multiple Google accounts.

Step 5. Store the Refresh Token

Once the user signs in, Google redirects the user to the redirect URI. The redirect URI contains the authorization code that we need to exchange for an access token and refresh token for storing in the database.

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { google } = require('googleapis');

admin.initializeApp();

exports.oAuthCallback = functions.https.onRequest(async (request, response) => {
  const { query: { error, code } = {} } = request;

  // User may deny access to the application.
  if (error) {
    response.status(500).send(error);
    return;
  }

  const oAuth2Client = new google.auth.OAuth2(
    process.env.CLIENT_ID,
    process.env.CLIENT_SECRET,
    process.env.REDIRECT_URI
  );

  // Exchange the authorization code for an access token.
  const { tokens } = await oAuth2Client.getToken(code);

  oAuth2Client.setCredentials(tokens);
  const oauth2 = google.oauth2({
    auth: oAuth2Client,
    version: 'v2',
  });

  // Get the user's email address and Google user ID
  const { data } = await oauth2.userinfo.get();
  const { id, email } = data;
  const { refresh_token } = tokens;

  // Store the refresh token in the Firestore database.
  // Set merge: true to not overwrite any other data in the same document
  await admin
    .firestore()
    .collection('users')
    .doc(id)
    .set({ id, email, refresh_token }, { merge: true });

  response.set('Cache-Control', 'private, max-age=0, s-maxage=0');
  response.send(`User ${email} is authorized! ${id}`);
});

Here’s how the documents are stored in the Firestore NoSQL database:

Firestore Access Token

Step 6: Access Google APIs

Now that we have the refresh token, we can use it to generate a new access token and access the Google APIs. In our example, the drive function will return the 5 most recent files from Google Drive of the authorized user.

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const { google } = require('googleapis');

admin.initializeApp();

exports.drive = functions.https.onRequest(async (request, response) => {
  const { user_id = '' } = request.query;
  const user = await admin.firestore().collection('users').doc(user_id).get();
  if (!user.exists) {
    response.status(404).send(`User ${user_id} not found`);
    return;
  }

  const { refresh_token } = user.data();
  const oAuth2Client = new google.auth.OAuth2(
    process.env.CLIENT_ID,
    process.env.CLIENT_SECRET,
    process.env.REDIRECT_URI
  );
  oAuth2Client.setCredentials({ refresh_token });

  const googleDrive = google.drive({ version: 'v3', auth: oAuth2Client });
  const { data: { files = [] } = {} } = await googleDrive.files.list({
    pageSize: 5,
    fields: 'files(id, name)',
  });

  response.status(200).send({ files });
});

Step 7: Test and Deploy the Functions

You can run the following command to test the functions locally:

firebase emulators:start --only functions

When you are ready to deploy the functions to your Firebase project, you can run the following command:

firebase deploy --only functions


from Digital Inspiration https://ift.tt/l25Iepx