#announcements
2020-02-17
djanus 21:02:26

Hello! After more than 3 years in the making, I am proud to announce the release of Skyscraper 0.3.0, a scraping framework that helps you build structured dumps of whole websites.
Home: https://github.com/nathell/skyscraper/
Major improvements in 0.3.0:
• Skyscraper has been rewritten from scratch to be asynchronous and multithreaded, based on core.async.
• Skyscraper now supports saving the scrape results to a SQLite database.
• In addition to the classic `scrape` function that returns a lazy sequence of nodes, there is an alternative, non-lazy, imperative interface (`scrape!`) that treats producing new results as side-effects.
• reaver (using JSoup) is now available as an optional underlying HTML parsing engine, as an alternative to Enlive.
See NEWS.md for a complete list.
I’m particularly happy about the database abilities of this release – for a glimpse of what it can do, see https://cljdoc.org/d/skyscraper/skyscraper/0.3.0/doc/database-integration.
Happy scraping! 🏛️

🏛️ 32
👍 84
👏 48