Started a new blog about web crawling!

This time around I'm using and , though usually I'm more fond of Pelican. So far my experience has been a bit mixed, but it's not bad! I've looked into but it didn't look quite ready for the content I'm looking to produce.

Any feedback is welcome :)

I recently wrote a blog post on web-crawling with

I'm now preparing part 2 of this blog post, which will delve deeper into async crawling, and part 3, on reverse engineering websites for crawling.

I'm doing this in preparation for writing a book with publishing on web crawling with Python, so any feedback is welcome!

Amazon's API is as awkward as it gets. I feel like we're going back in time when it comes to API design.
What's next? Delete("write a poem about a key I'd like to delete")

Ugh, I've never had Python code as ugly in as I have with

That feeling when you are dumping these complex, super nested JSON documents and management tells you they need it in CSV format as well. 😀

So far my approach is to flatten the JSON and dump it as is. It looks ugly, but there's no other way short of parsing by hand, right?
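
For anyone curious, the flatten-and-dump approach could look roughly like this. It's only a minimal sketch using the standard library, not the exact code I run: the names `flatten` and `to_csv`, the dotted key separator and the sample documents are all made up for illustration. Nested dicts and lists become columns such as `user.tags.0`, and documents that lack a column get an empty cell.

```python
import csv
import io


def flatten(value, parent_key="", sep="."):
    """Collapse nested dicts and lists into one flat dict with dotted keys.

    Hypothetical helper: list items are keyed by their index.
    Note that empty dicts and empty lists produce no column at all.
    """
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        # Scalar leaf: store it under the key path built so far.
        return {parent_key: value}

    flat = {}
    for key, child in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(child, full_key, sep))
    return flat


def to_csv(documents):
    """Flatten every document and render all of them as one CSV string."""
    rows = [flatten(doc) for doc in documents]
    # Union of all keys, kept in first-seen order, becomes the header.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    # restval fills the columns a given document does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample documents, only for demonstration.
    docs = [
        {"id": 1, "user": {"name": "a", "tags": ["x", "y"]}},
        {"id": 2, "user": {"name": "b"}},
    ]
    print(to_csv(docs))
```

The header is the union of every document's keys, so the column count grows with the longest list in the data. That is where the ugliness comes from, and I don't see a way around it without deciding by hand which fields matter.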

Finally got 10k Stack Overflow rep!
I'm mostly doing web-crawling tags.

Lately the question quality hasn't been that great - there seems to be an influx of new users. However, there are a lot of great new mentors to compete with as well.
This means I kinda need to expand my tag subscription if I want to reach 20k and above :)

Thinking of writing a blog post on my operating system stack: web browser, terminal emulator and window manager.
There's just so much to say that I'm not sure how I could possibly fit it into one post and keep it simple.

Been running this stack for over a year now and it has been great!

Does anyone know a simple alternative to Redis that would use disk and RAM intelligently?

I love Redis but man, it's a memory hog, and it keeps data that hasn't been touched for days in memory rather than putting it away to disk...
