crawl.blog/first/
Started a new blog about web crawling!

This time around I'm using and , though I'm usually more fond of Pelican. So far my experience has been a bit mixed, but it's not bad! I've also looked into but it didn't look quite ready for the content I'm looking to produce.

Any feedback is welcome :)

@Wraptile - Five years of web scraping? Did you get the whole 'Net?

Your planned posts look promising. If you're open to requests, would you do an article or two on "why we scraped ___" for some of your more interesting projects?

@ptvirgo there are always new things and challenges coming up, so the years really flew by. 😬

Regarding the idea - it's great! But I really need to dig around for interesting cases. Most of the super exciting projects off the top of my head fell through after estimation. My favorite one that I didn't get to work on was a broad crawler for the US government that would crawl escort forums to collect data on human trafficking. It was a tough one, but I hope they went through with it - great idea!

@Wraptile - FWIW, I would definitely read an article about what crawlers can, and can't, do to help fight human trafficking.

Mastodon for Tech Folks

This Mastodon instance is for people interested in technology. Discussions aren't limited to technology, because tech folks shouldn't be limited to technology either! We adhere to an adapted version of the TootCat Code of Conduct and have documented a list of blocked instances. Ash is the admin and is supported by Fuzzface, Brian!, and Daniel Glus as moderators. Hosting costs are largely covered by our generous supporters on Patreon – thanks for all the help!