Update Blog “using-retrofit-to-disguise-scraping-as-a-rest-api”

Harsh Shandilya 2023-09-13 07:23:21 +00:00
parent e74fd88f89
commit 454e11249e
1 changed file with 5 additions and 4 deletions


@@ -1,13 +1,14 @@
 ---
 title: Using Retrofit to disguise scraping as a REST API
-date: 2023-09-02T21:05:24.630Z
+date: 2023-09-13T07:08:10.659Z
 summary: We've all used Retrofit to interact with REST APIs for as long as we
 can remember, but what if there was no API?
 draft: true
 ---
-While trying to implement post search functionality in [Claw](https://msfjarvis.dev/g/compose-lobsters), my [lobste.rs](https://lobste.rs) client, I stumbled into a _tiny_ problem: there was no API! lobste.rs has a [web-based search](https://lobste.rs/search) but no equivalent mechanism via the JSON API I was using for doing everything else within the app.
+Square's Retrofit is best known for being the gold standard of REST clients in the JVM/Android ecosystem, but its excellent API design also lends itself to great extensibility which we will leverage today.
-Thankfully, lobste.rs has a fairly JavaScript-free front-end which makes it a suitable candidate for HTML scraping. The search page used URL query parameters to specify the search term which made it quite easy to reliably construct a URL which would contain the posts we were interested in and it looked something like this: `/search?q={query}&what=stories&order=newest&page={page}`.
+While trying to implement post search functionality in [Claw](https://msfjarvis.dev/g/compose-lobsters), my [lobste.rs](https://lobste.rs) client, I stumbled into a *tiny* problem: there was no API! lobste.rs has a [web-based search](https://lobste.rs/search) but no equivalent mechanism via the JSON API I was using for doing everything else within the app.
-However, with a little [Jsoup](https://jsoup.org) and Retrofit's [Converter](https://github.com/square/retrofit/blob/40c4326e2c608a07d2709bfe9544cb1d12850d11/retrofit/src/main/java/retrofit2/Converter.java) API it became relatively easy to implement this feature without the rest of the app being able to tell that it was not backed by REST.
+Thankfully, lobste.rs has a fairly JavaScript-free front-end which makes it a suitable candidate for HTML scraping. The search page used URL query parameters to specify the search term which made it quite easy to reliably construct a URL which would contain the posts we were interested in, and it looked something like this: `/search?q={query}&what=stories&order=newest&page={page}`.
+Retrofit has a [Converter](https://github.com/square/retrofit/blob/40c4326e2c608a07d2709bfe9544cb1d12850d11/retrofit/src/main/java/retrofit2/Converter.java) API which lets users convert request/response bodies to and from their HTTP representations. We will leverage this to convert the raw HTML body we will receive from the search page into a list of LobstersPost objects.
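To make the idea concrete, here is a minimal Kotlin sketch of how such a converter could be wired up. It is not Claw's actual implementation: the `SearchApi` interface, the two fields on `LobstersPost`, the `parsePosts` helper, and especially the `a.u-url` CSS selector are assumptions made for illustration, and the selector would need to be checked against the real search page markup.

```kotlin
import java.lang.reflect.ParameterizedType
import java.lang.reflect.Type
import okhttp3.ResponseBody
import org.jsoup.Jsoup
import retrofit2.Converter
import retrofit2.Retrofit
import retrofit2.http.GET
import retrofit2.http.Query

// Hypothetical model: the real LobstersPost in Claw has its own set of fields.
data class LobstersPost(val title: String, val url: String)

// Hypothetical service mirroring the search URL's query parameters.
interface SearchApi {
  @GET("/search?what=stories&order=newest")
  suspend fun searchPosts(
    @Query("q") query: String,
    @Query("page") page: Int,
  ): List<LobstersPost>
}

// Pure parsing step, kept separate from Retrofit so it can be tested on its own.
fun parsePosts(html: String, baseUrl: String): List<LobstersPost> =
  Jsoup.parse(html, baseUrl)
    .select("a.u-url") // ASSUMPTION: selector for story links, verify against the page
    .map { link -> LobstersPost(title = link.text(), url = link.absUrl("href")) }

// Turns the raw HTML response body into a list of posts.
class SearchConverter(private val baseUrl: String) :
  Converter<ResponseBody, List<LobstersPost>> {
  override fun convert(value: ResponseBody): List<LobstersPost> =
    parsePosts(value.string(), baseUrl)
}

// Only claims List<LobstersPost>; returning null lets other factories handle the rest.
class SearchConverterFactory(private val baseUrl: String) : Converter.Factory() {
  override fun responseBodyConverter(
    type: Type,
    annotations: Array<out Annotation>,
    retrofit: Retrofit,
  ): Converter<ResponseBody, *>? {
    if (type !is ParameterizedType || getRawType(type) != List::class.java) return null
    if (getParameterUpperBound(0, type) != LobstersPost::class.java) return null
    return SearchConverter(baseUrl)
  }
}
```

Registering the factory through `Retrofit.Builder().addConverterFactory(SearchConverterFactory("https://lobste.rs"))` would then let callers invoke `searchPosts` like any other Retrofit endpoint, without knowing that HTML scraping happens underneath.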