Categories
rails ruby Security Uncategorized

invisible reCAPTCHA v3 with Rails and Devise

We’ve recently been hit by more and more bots.

Some of them are crawling our site and hitting valid or invalid endpoints. We’ve seen plenty of credential stuffing attacks as well. Most of them are distributed across different IPs, with each IP hitting us at low frequency.

And most recently, someone abused our registration form to spam their recipients via our system.

It was quite clever actually. When you register, you enter your name, email and password. We then send a confirmation email saying something like

“Hey Roberta, thanks for joining. Please click here to confirm your account”.

Now those guys used their victim’s email address, and used the name field to link to a URL. So those users would get an email

“Hey lottery tickets http://some.link, thanks for joining. Please click here to confirm your account”.

Slimy. Naturally, our own email system took the hit of sending spam. Double ouch.

Luckily, we had some anomaly detection in place, and we blocked those guys quickly. They used some browser automation from a fixed set of IPs, so it was easy to block. At least until the next wave…

I’ve been dealing with those types of scenarios with fail2ban, and it’s really quite effective. We define regular expressions to inspect our log files matching certain patterns, and then ban if we see repeated offensive behaviour. fail2ban is limited though in some aspects.
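
To make the idea concrete, here is a minimal Python sketch of the "match a pattern, count repeats per IP, ban above a threshold" approach. It is not how fail2ban is implemented. The log format, the regular expression and the threshold of 3 are all assumptions made up for illustration, and the time window that fail2ban also takes into account is left out.

```python
import re
from collections import Counter

# Hypothetical log format and threshold, for illustration only.
FAILED_LOGIN = re.compile(r"Failed login .* from (?P<ip>\d{1,3}(?:\.\d{1,3}){3})")
MAX_RETRIES = 3


def ips_to_ban(log_lines, pattern=FAILED_LOGIN, max_retries=MAX_RETRIES):
    """Return the IPs whose log lines matched the pattern at least max_retries times."""
    hits = Counter()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            hits[match.group("ip")] += 1
    return sorted(ip for ip, count in hits.items() if count >= max_retries)


sample_log = [
    "Failed login for roberta@example.com from 203.0.113.7",
    "Failed login for bob@example.com from 203.0.113.7",
    "Completed 200 OK for 198.51.100.4",
    "Failed login for carol@example.com from 203.0.113.7",
    "Failed login for dave@example.com from 198.51.100.4",
]
print(ips_to_ban(sample_log))
```

Note that the sketch only works if the offending IP is on the same log line as the event we care about.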

First of all, those rules are a bit of a pain to create and maintain, and you need to make sure the offending IP appears on the application log record you want to capture. In some cases it’s easy, but not always. The bigger problem however is that fail2ban doesn’t scale. The more servers you have — let’s say in a load-balanced setup — the less accurate fail2ban becomes. Or you need to aggregate all your logs on a single fail2ban host, creating a single point of failure or a bottleneck…

So I was searching for a better solution. Sadly there aren’t many. Cloudflare, which we also use, offers some degree of protection. But it’s not as flexible. And of course there’s reCAPTCHA. You know, those annoying things asking you to pick traffic signs, or even just click “I’m not a robot”?

Now, I was initially hesitant to use it. I’m not sure why, but the fact that it doesn’t really have any real competition bothers me. Plus, as a user, I’m frequently annoyed by those challenges, and I hate this experience.

Luckily, the latest version of reCAPTCHA (v3) doesn’t present any user-facing challenges. It’s completely invisible. The no-competition problem is not something I can solve. I discovered that even Cloudflare itself uses reCAPTCHA in some cases! And these guys have their own JavaScript challenge and whatnot… So I decided to bite the bullet, and give it a shot.

Setting it up is surprisingly simple, and from my limited experience, quite effective. That is, the scores it produced were surprisingly accurate, although my ability to test different scenarios was limited.

I’ll try to give some pointers for implementing reCAPTCHA v3 with Rails 5.1 and Devise 4. The implementation can work on any form or controller however, and not just with Devise.
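
As a rough, language-agnostic illustration of what "using the score" means, here is a small Python sketch (not the Rails/Devise code). It assumes you have already sent the token to Google's verification endpoint and parsed the JSON reply, which for v3 includes `success`, `score` (between 0.0 and 1.0) and `action`. The function name, the `signup` action name and the 0.5 cut-off are assumptions for illustration; the right threshold depends on your own traffic.

```python
def looks_human(verification, expected_action, min_score=0.5):
    """Decide whether a parsed reCAPTCHA v3 verification reply should be trusted.

    `verification` is the already-parsed JSON reply (a dict).
    `min_score` is an assumed cut-off, not a recommendation.
    """
    if not verification.get("success"):
        return False
    # The action reported back should be the one our form asked for.
    if verification.get("action") != expected_action:
        return False
    return verification.get("score", 0.0) >= min_score


reply = {"success": True, "score": 0.9, "action": "signup"}
print(looks_human(reply, expected_action="signup"))
```

A request that fails the check doesn't have to be rejected outright; it could just as well be logged or sent down a stricter path.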

Categories
work

Is it zen at work?

I really enjoyed reading It Doesn’t Have to Be Crazy at Work recently. It’s another bestseller from Basecamp. After reading Rework before, a lot of things felt a bit familiar. Too familiar, perhaps. But their new book still has a few new ideas and covers things from a different angle. Well worth a read.

Working remotely, and at a company with very similar culture and values to Basecamp, a lot of what they write about resonated. Much of the way we structure things at work was inspired or wholesale copied from Basecamp to be completely honest. Why reinvent the wheel when someone hands you an instruction manual for building a perfect one?

But some things caught me by surprise. It felt a little too zen, or even contradictory in some cases? But it definitely gave me pause. Maybe we’re doing some things wrong, and can improve even further? I’m still unsure, but hope we can experiment with some ideas. Let me jump into a few examples…

Categories
Technology

SmugMug video data loss

I’ve written only recently about SmugMug, and expressed my frustration as a developer who built an open-source tool for their platform. This has led me to try to get my data out of SmugMug as I was considering moving away from it as well… Only to discover that some of my video data is lost and/or not being made available. This applies only to videos. The quality is potentially degraded, and the metadata that is available on SmugMug cannot be downloaded or exported out of their platform.

If you upload a video to SmugMug, they don’t actually store the original video for you. Here’s a quote from their official page:

Originals

We don’t keep a copy of the original video you upload. We make high-quality display copies, which are probably altered from what you send us.

I’m not sure what this high-quality display copy means in actual terms, but I won’t be surprised if some quality is lost in the process. For a company that prides itself on caring for photographers, where quality and reliability are key, I find it rather vague and disconcerting.

Furthermore, what isn’t mentioned on this page is that if you want to download your videos again, those videos will be stripped of the original metadata as well. This metadata includes information about the camera you used, the date/time of the video, location information etc. All of this data is still stored on SmugMug, but you can’t get it back when you download the video. It’s locked in. For me, personally, this is even worse than losing video quality. My video memories are very tightly linked to the time and location of those original scenes. Without this info, the videos are next to useless. I just can’t find them (without going manually through hundreds or thousands of dateless and location-less videos, that is).

Categories
Uncategorized

Introducing envwarden – manage your server secrets with Bitwarden

TL;DR

envwarden is a simple, open-source script that lets you easily manage your server secrets with the Bitwarden password manager.

Categories
python ruby Technology

An open letter to SmugMug

TL;DR

SmugMug is great, but its developer ecosystem is, in my humble opinion, crumbling, and could use some serious love, or be put out of its misery and die…

Dear SmugMug, there are lots of people, myself included, who want to see you thrive and succeed. People who are spending their free time, resources and energy on sharing their tools with the community. People who can build great things on top of SmugMug, and can make SmugMug even more successful than it currently is. Please don’t forget us. We are the potential evangelists, multipliers, and we do this for free. Please treat our free gifts with respect. These gifts might be free, but they are precious. They should be cherished, rather than ignored, or discarded.

Categories
optimization Performance rails ruby

The dark side of Rails Russian Doll Caching

Rails Russian Doll Caching is super cool. It’s simple, effective and makes caching much easier to reason about.

There’s a dark side to it though. Not in the negative, evil sense. But rather the hidden, unknown, confusing sense.

Categories
iphone mobile Technology

The one (stupid) feature

My wife and I started using Amazon Photos a while ago. I didn’t think that much of it at first, but it was included with our Prime membership, and offered an automatic upload from our phones, plus free storage (for photos), so why not?

Fast forward a couple of years. We’ve since cancelled Prime, and I wanted to switch to Dropbox, which has comparable automatic upload, a mobile app, and superior sync with a proper Linux client. But I couldn’t. Why? Because of this one (stupid) feature.

Which one?

Categories
Technology

Why I’m not using Fastmail

Prepare for a somewhat ranty post, but it doesn’t come from a bad place. I honestly want Fastmail to succeed. I’m eager to see more alternatives for email hosting, and clients (and there are scarily few).
I also acknowledge that some of the problems I bumped into are quite specific to my own setup, which isn’t common. So in some ways, it’s not about you, Fastmail. It’s me. Make your own judgement.

TL;DR – Fastmail is pretty neat, but their support sucks. Their support ticket system sucks even more, and their product is not clear enough to work without support. From my personal experience anyway.

Categories
monitoring optimization Performance python Technology

a scalable Analytics backend with Google BigQuery, AWS Lambda and Kinesis

In my previous post, I described the architecture of Gimel – an A/B testing backend using AWS Lambda and redis HyperLogLog. One of the commenters suggested looking into Google BigQuery as a potential alternative backend.

It looked quite promising, with the potential of increasing result accuracy even further. HyperLogLog is pretty awesome, but trades accuracy for space: it keeps a small, fixed-size summary and returns an approximate count of unique items. Google BigQuery offers very affordable analytics data storage with an SQL query interface.
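
To get a feel for how HyperLogLog counts unique items in a fixed amount of memory, here is a minimal pure-Python sketch of the algorithm. It is only an illustration, not redis's implementation (which adds further optimisations); the class name, the choice of SHA-1 as the hash and the 4096 registers are assumptions for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: approximate unique counts using 2**p small registers."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p                      # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Take a 64-bit hash of the item.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        index = h >> (64 - self.p)           # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)
        # Position of the leftmost 1-bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        harmonic = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / harmonic
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting.
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


hll = HyperLogLog()
for i in range(50000):
    hll.add(f"user-{i}")
    hll.add(f"user-{i}")                     # duplicates do not change the estimate
estimate = hll.count()
print(estimate)                              # close to, but rarely exactly, 50000
```

With 4096 registers the typical error is in the region of 1-2%, regardless of how many items are added. An exact count would have to remember every item it has seen.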

There was one more thing I wanted to look into, which could also improve the redis backend: batching writes. The current gimel architecture writes every event directly to redis. Whilst redis itself is fast and offers low latency, the AWS Lambda architecture means we might have lots of active simultaneous connections to redis. As another commenter noted, this can become a bottleneck, particularly on lower-end redis hosting plans. In addition, any other backend that does not offer low-latency writes could benefit from batching. Even before trying out BigQuery, I knew I’d be looking at much higher latency and needed to queue and batch writes.
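
The general idea of batching can be sketched in a few lines of Python. This is an in-process illustration only, not the queue-based pipeline this post is about: `BatchWriter`, its `flush` callback and the batch size are hypothetical names and values chosen for the example.

```python
class BatchWriter:
    """Buffer events and hand them to the backend in batches."""

    def __init__(self, flush, batch_size=100):
        self._flush = flush              # callable that writes a list of events
        self._batch_size = batch_size
        self._buffer = []

    def write(self, event):
        self._buffer.append(event)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered, if anything, as a single backend write.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._flush(batch)


sent = []                                # stands in for the slow backend
writer = BatchWriter(sent.append, batch_size=3)
for i in range(7):
    writer.write({"event": i})
writer.flush()                           # don't forget the leftovers
print([len(batch) for batch in sent])    # [3, 3, 1]
```

Seven events turn into three backend writes instead of seven. The price is that events sit in the buffer for a while before they are visible in the backend.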

Categories
monitoring optimization Performance python Technology

a Scalable A/B testing backend in ~100 lines of code (and for free*)

(updated: 2016-05-07)

tip-toeing on the shoulders of giants

Before I dive into the reasons for writing Gimel in the first place, I’d like to cover what it’s based on. Clearly, 100 lines of code won’t get you that far on their own. There are two (or three) essential components this backend is running on, which make it scalable and also light-weight in terms of actual code:

  1. AWS Lambda (and Amazon API Gateway) – handle the requests to both store experiment data and to return the experiment results.
  2. Redis – using Sets and HyperLogLog data structures to store the experiment data. It provides an extremely efficient memory footprint and great performance.
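
To illustrate the kind of data involved, here is a small Python sketch that uses plain in-memory sets as a stand-in for Redis. It is not Gimel's code: the class, the key layout and the method names are assumptions made up for this example. In Redis, the equivalent operations would be SADD/SCARD on Sets, or PFADD/PFCOUNT on HyperLogLog.

```python
from collections import defaultdict


class ExperimentStore:
    """In-memory stand-in for a Redis-backed A/B testing store (illustrative only)."""

    def __init__(self):
        self._sets = defaultdict(set)

    def participate(self, experiment, variant, client_id):
        # Adding to a set is idempotent, so repeat visits are counted once.
        self._sets[("participants", experiment, variant)].add(client_id)

    def convert(self, experiment, variant, client_id):
        self._sets[("conversions", experiment, variant)].add(client_id)

    def results(self, experiment, variants):
        report = {}
        for variant in variants:
            participants = len(self._sets[("participants", experiment, variant)])
            conversions = len(self._sets[("conversions", experiment, variant)])
            rate = conversions / participants if participants else 0.0
            report[variant] = {
                "participants": participants,
                "conversions": conversions,
                "rate": rate,
            }
        return report


store = ExperimentStore()
for client in ["u1", "u2", "u2", "u3", "u4"]:
    store.participate("button-colour", "green", client)
store.convert("button-colour", "green", "u2")
print(store.results("button-colour", ["green", "red"]))
```

Only unique client counts are needed to produce the results, which is why an approximate counter like HyperLogLog can do the job with far less memory than an exact set.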

For free?
