โŒ

Reading view

Weekly Update 447

Weekly Update 447

I'm home! Well, for a day, then it's off to the other side of the country (which I just flew over last night on the way back from Dublin ๐Ÿคฆโ€โ™‚๏ธ) for an event at the Microsoft Accelerator in Perth on Monday. Such is the path we've taken, but it does provide some awesome opportunities to meet up with folks around the world and see some really interesting stuff. Come by if you're over that way or if you're on the east coast of Aus, I'll be at NDC Melbourne only a couple of weeks later. And somewhere in the midst of all that, we'll get this HIBP UX rebuild finished...

Weekly Update 447
Weekly Update 447
Weekly Update 447
Weekly Update 447

References

  1. Sponsored by:ย Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing
  2. I'm speaking at the Microsoft Student Accelerator in Perth on Monday (it's free, and you don't need to be a student ๐Ÿ™‚)
  3. We're going to incorporate some more partners into HIBP where they can offer useful services to data breach victims (the thinking is that they'll appear on the dedicated breach page where they can offer something useful as it relates to that specific incident)
  4. The HIBP UX rebuild repo is tracking everything we're doing (chime in on the discussions or submit any issues you find)

Weekly Update 446

Weekly Update 446

After an unusually long day of travelling from Iceland, we've finally made it to the land of Guinness, Leprechauns, and a tax haven for tech companies. This week, there are a few more lessons from the successful phish against me the previous week, and in happier news, there is some really solid progress on the HIBP UX rebuild. We spent a bunch of time with Stefan and Ingiber (the guy rebuilding the front end) whilst in Reykjavik and now have a very clear plan mapped out to get this finished in the next 6 weeks. More on that in this week's update, enjoy!

Weekly Update 446
Weekly Update 446
Weekly Update 446
Weekly Update 446

References

  1. Sponsored by:ย Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing
  2. Silent Push has done some great analysis on the source of my phish (they've linked it similar attacks against SendGrid and Mailgun accounts, among others)
  3. Every outstanding HIBP UX rebuild task is now on public display (we're targeting 17 May to complete all this and roll out the new site)

Weekly Update 445

Weekly Update 445

Well, this certainly isn't what I expected to be talking about this week! But I think the fact it was someone most people didn't expect to be on the receiving end of an attack like this makes it all the more consumable. I saw a lot of "if it can happen to Troy, it can happen to anyone" sort of commentary and whilst it feels a bit of obnoxious for me to be saying it that way, I appreciate the sentiment and the awareness it drives. It sucked, but I'm going to make damn sure we get a lot of mileage out of this incident as an industry. I've no doubt whatsoever this is a net-positive event that will do way more good than harm. On that note, stay tuned for the promised "Passkeys for Normal People" blog post, I hope to be talking about that in next week's video (travel schedule permitting). For now, here's the full rundown of how I got phished:

Weekly Update 445
Weekly Update 445
Weekly Update 445
Weekly Update 445

References

  1. Sponsored by:ย Malwarebytes Browser Guard blocks phishing, ads, scams, and trackers for safer, faster browsing
  2. I obviously didn't like being on the receiving end of this, but I reckon 34 minutes from pwned to public disclosure is a new record ๐Ÿ˜Š (this is what I'm going to be driving organisations towards in many future data breach cases)
  3. Despite me falling for something I should have spotted, the public response and press had been outstandingly positive (that's a piece from this week's sponsor, I felt their writeup summed things up nicely)

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

You know when you're really jet lagged and really tired and the cogs in your head are just moving that little bit too slow? That's me right now, and the penny has just dropped that a Mailchimp phish has grabbed my credentials, logged into my account and exported the mailing list for this blog. I'm deliberately keeping this post very succinct to ensure the message goes out to my impacted subscribers ASAP, then I'll update the post with more details. But as a quick summary, I woke up in London this morning to the following:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

I went to the link which is on mailchimp-sso.com and entered my credentials which - crucially - did not auto-complete from 1Password. I then entered the OTP and the page hung. Moments later, the penny dropped, and I logged onto the official website, which Mailchimp confirmed via a notification email which showed my London IP address:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

I immediately changed my password, but not before I got an alert about my mailing list being exported from an IP address in New York:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

And, moments after that, the login alert from the same IP:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

This was obviously highly automated and designed to immediately export the list before the victim could take preventative measures.

There are approximately 16k records in that export containing info Mailchimp automatically collects and they appear as follows:

[redacted]@gmail.com,Weekly,https://www.troyhunt.com/i-now-own-the-coinhive-domain-heres-how-im-fighting-cryptojacking-and-doing-good-things-with-content-security-policies/#subscribe,2,"2024-04-13 22:03:08",160.154.[redacted].[redacted],"2024-04-13 22:00:50",160.154.[redacted].[redacted],5.[redacted lat],'-4.[redacted long],0,0,Africa/Abidjan,CI,AB,"2024-04-13 22:03:08",130912487,3452386287,,

Every active subscriber on my list will shortly receive an email notification by virtue of this blog post going out. Unfortunately, the export also includes people who've unsubscribed (why does Mailchimp keep these?!) so I'll need to work out how to handle those ones separately. I've been in touch with Mailchimp but don't have a reply yet, I'll update this post with more info when I have it.

I'm enormously frustrated with myself for having fallen for this, and I apologise to anyone on that list. Obviously, watch out for spam or further phishes and check back here or via the social channels in the nav bar above for more. Ironically, I'm in London visiting government partners, and I spent a couple of hours with the National Cyber Security Centre yesterday talking about how we can better promote passkeys, in part due to their phishing-resistant nature. ๐Ÿคฆโ€โ™‚๏ธ

More soon, I've hit the publish button on this 34 mins after the time stamp in that first email above.

More Stuff From After Initial Publish

Every Monday morning when I'm at home, I head into a radio studio and do a segment on scams. It's consumer-facing so we're talking to the "normies" and whenever someone calls in and talks about being caught in the scam, the sentiment is the same: "I feel so stupid". That, friends, is me right now. Beyond acknowledging my own foolishness, let me proceed with some more thoughts:

Firstly, I've received a gazillion similar phishes before that I've identified early, so what was different about this one? Tiredness, was a major factor. I wasn't alert enough, and I didn't properly think through what I was doing. The attacker had no way of knowing that (I don't have any reason to suspect this was targeted specifically at me), but we all have moments of weakness and if the phish times just perfectly with that, well, here we are.

Secondly, reading it again now, that's a very well-crafted phish. It socially engineered me into believing I wouldn't be able to send out my newsletter so it triggered "fear", but it wasn't all bells and whistles about something terrible happening if I didn't take immediate action. It created just the right amount of urgency without being over the top.

Thirdly, the thing that should have saved my bacon was the credentials not auto-filling from 1Password, so why didn't I stop there? Because that's not unusual. There are so many services where you've registered on one domain (and that address is stored in 1Password), then you legitimately log on to a different domain. For example, here's my Qantas entry:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

And the final thought for now is more a frustration that Mailchimp didn't automatically delete the data of people who unsubscribed. There are 7,535 email addresses on that list which is nearly half of all addresses in that export. I need to go through the account settings and see if this was simply a setting I hadn't toggled or something similar, but the inclusion of those addresses was obviously completely unnecessary. I also don't know why IP addresses were captured or how the lat and long is calculated but given I've never seen a prompt for access to the GPS, I imagine it's probably derived from the IP.

I'll park this here and do a deeper technical dive later today that addresses some of the issues I've raised above.

The Technical Bits

I'll keep writing this bit by bit (you may see it appear partly finished while reading, so give the page a refresh later on), starting with the API key that was created:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

This has now been deleted so along with rolling the password, there should no longer be any persistent access to the account.

Unfortunately, Mailchimp doesn't offer phishing-resistant 2FA:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

By no means would I encourage people not to enable 2FA via OTP, but let this be a lesson as to how completely useless it is against an automated phishing attack that can simply relay the OTP as soon as it's entered. On that note, another ridiculous coincidence is that in the same minute that I fell for this attack, I'd taken a screen cap of the WhatsApp message below and shown Charlotte - "See, this reinforces what we were talking about with the NCSC yesterday about the importance of passkeys":

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

Another interesting angle to this is the address the phish was sent to:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

The rest of that address is probably pretty predictable (and I do publish my full "normal" address on the contact page of this blog, so it's not like I conceal it from the public), but I find it interesting that the phish came to an address only used for Mailchimp. Which leaves two possibilities:

  1. Someone specifically targeted me and knew in advance the pattern I use for the address I sign up to services with. They got it right first go without any mail going to other addresses.
  2. Someone got the address from somewhere else, and I've only ever used it in one place...

Applying some Occam's razor, it's the latter. I find the former highly unlikely, and I'd be very interested to hear from anyone else who uses Mailchimp and received one of these phishes.

Still on email addresses, I originally read the phish on my iThing and Outlook rendered it as you see in the image above. At this point, I was already on the hook as I intended to login and restore my account, so the way the address then rendered on the PC didn't really stand out to me when I switched devices:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

That's so damn obvious ๐Ÿคฆโ€โ™‚๏ธ The observation here is that by not rendering the sender's address, Outlook on iOS hid the phish. But having said that, by no means can you rely on the address as a solid indicator of authenticity but in this case, it would have helped.

Curious as to why unsubscribed users were in the corpus of exported data, I went searching for answers. At no point does Mailchimp's page on unsubscribing mention anything about not deleting the user's data when they opt out of receiving future emails. Keeping in mind that this is AI-generated, Google provided the following overview:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

That "Purpose of Keeping Unsubscribes" section feels particularly icky and again, this is the AI and not Mailchimp's words, but it seems to be on point. I can go through and delete unsubscribed addresses (and I'll do that shortly as the last thing I'm going to do now is rush into something else), but then it looks like that has to be a regular process. This is a massive blindspot on Mailchimp's behalf IMHO and I'm going to provide that feedback to them directly (just remembered I do know some folks there).

I just went to go and check on the phishing site with the expectation of submitting it to Google Safe Browsing, but it looks like that will no longer be necessary:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

2 hours and 15 minutes after it snared my creds, Cloudflare has killed the site. I did see a Cloudflare anti-automation widget on the phishing page when it first loaded and later wondered if that was fake or they were genuinely fronting the page, but I guess that question is now answered. I know there'll be calls of "why didn't Cloudflare block this when it was first set up", but I maintain (as I have before in their defence), that it's enormously hard to do that based on domain or page structure alone without creating a heap of false positives.

On the question of the lat and long in the data, I just grabbed my own records and found an IP address belonging to my cellular telco. I had two records (I use them to test both the daily and weekly posts), both with the same IP address and created within a minute of each other. One had a geolocation in Brisbane and the other in far north Queensland, about 1,700km away. In other words, the coords do not pinpoint the location of the subscriber, but the record does contain "australia/brisbane,au,qld" so there's some rough geolocation data in there.

Loading the List into Have I Been Pwned

When I have conversations with breached companies, my messaging is crystal clear: be transparent and expeditious in your reporting of the incident and prioritise communicating with your customers. Me doing anything less than that would be hypocritical, including how I then handle the data from the breach, namely adding it to HIBP. As such, Iโ€™ve now loaded the breach and notifications are going out to 6.6k impacted individual subscribers and another 2.4k monitoring domains with impacted email addresses.

Looking for silver linings in the incident, Iโ€™m sure Iโ€™ll refer this blog post to organisations I disclose future breaches to. Iโ€™ll point out in advance that even though the data is โ€œjustโ€ email addresses and the risk to individuals doesnโ€™t present a likelihood of serious harm or risk their rights and freedoms (read that blog post for more), itโ€™s simply the right thing to do. In short, for those who read this in future, do not just as I say, but as I do.

The Washup

I emailed a couple of contacts at Mailchimp earlier today and put two questions to them:

  1. Are passkeys on your roadmap
  2. Where does Mailchimp stand on โ€œunsubscribeโ€ not deleting the data

A number of people have commented on social media about the second point possibly being to ensure that someone who unsubscribes canโ€™t then later be resubscribed. Iโ€™m not sure that argument makes a lot of sense, but Iโ€™d like to see people at least being given the choice. Iโ€™m going to wait on their feedback before deciding if I should delete all the unsubscribed emails myself, Iโ€™m not even sure if thatโ€™s possible via the UI or requires scripting against the API,.

The irony of the timing with this happening just as Iโ€™ve been having passkey discussions with the NCSC is something Iโ€™m going to treat as an opportunity. Right before this incident, Iโ€™d already decided to write a blog post for the normies about passkey, and now I have the perfect example of their value. Iโ€™d also discussed with the NCSC about creating a passkey equivalent of my whynohttps.com project which highlighted the largest services not implementing HTTPS by default. As such, Iโ€™ve just registered whynopasskeys.com (and its singular equivalent) and will start thinking more about how to build that out so we can collectively put some pressure on the services that donโ€™t support unphishable second factors. I actually attempted to register that domain whilst out walking today, only to be met with the following courtesy of DNSimple:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

Using a U2F key on really important stuff (like my domain registrar) highlights the value of this form of auth. Todayโ€™s phish could not have happened against this account, nor the other critical ones using a phishing resistant second factor and we need to collectively push orgs in this direction.

Sincere apologies to anyone impacted by this, but on balance I think this will do more good than harm and I encourage everyone to share this experience broadly.

Update 1: I'll keep adding more thoughts here via updates, especially if there's good feedback or questions from the community. One thing I'd intended to add earlier is that the more I ponder this, the more likely I think it is that my unique Mailchimp address was obtained from somewhere as opposed to guessed in any targeted fashion. A possible explanation is the security incident they had in 2022, which largely targeted crypto-related lists, but I imagine would likely have provided access to the email addresses of many more customers too. I'll put that to them when I get a response to my earlier email.

Update 2: I now have an open case with Mailchimp and they've advised that "login and sending for the account have been disabled to help prevent unauthorized use of the account during our investigation". I suspect this explains why some people are unable to now sign up to the newsletter, I'll try and get that reinstated ASAP (I'd rolled creds immediately and let's face it, the horse has already bolted).

Pondering this even further, I wonder if Mailchimp has any anti-automation controls on login? The credentials I entered into the phishing site were obviously automatically replayed to the legitimate site, which suggests something there is lacking.

I also realised another factor that pre-conditioned me to enter credentials into what I thought was Mailchimp is their very short-lived authentication sessions. Every time I go back to the site, I need to re-authenticate and whilst the blame still clearly lies with me, I'm used to logging back in on every visit. Keeping a trusted device auth'd for a longer period would likely have raised a flag on my return to the site if I wasn't still logged in.

Update 3: Mailchimp has now restored access to my account and the newsletter subscription service is working again. Here's what they've said:

We have reviewed the activity and have come to the same conclusion that the unauthorized export and API key from 198.44.136.84 was the scope of the access. Given we know how the access took place, the API key has been deleted, and the password has been reset, we have restored your access to the account.

They've also acknowledged several outstanding questions I have (such as whether passkeys are on the roadmap) and have passed them along to the relevant party. I'll update this post once I have answers.

There's been a lot of discussion around "Mailchimp are violating my local privacy laws by not deleting emails when I unsubscribe", and that's one of the outstanding questions I've sent them. But on that, I've had several people contact me and point out this is not the case as the address needs to be retained in order to ensure an opted-out individual isn't later emailed if their address is imported from another source. Read this explainer from the UK's ICO on suppression lists, in particular this para:

Because we donโ€™t consider that a suppression list is used for direct marketing purposes, there is no automatic right for people to have their information on such a list deleted.

I suspect this explains Mailchimp's position, but I suggest that should be clearer during the unsubscribe process. I just went through and tested it and at no time is it clear the email address will be retained for the purpose of supression:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

My suggestion would be to follow our approach for Have I Been Pwned where we give people three choices and allow them to choose how they'd like their data to be handled:

A Sneaky Phish Just Grabbed my Mailchimp Mailing List

At present, Mailchimp is effectively implementing the first option we provide and the folks that are upset were expecting the last option. Hopefully they'll consider a more self-empowering approach to how people's data is handled, I'll update this blog post once I have their response.

Update 4: Someone has pointed out that the sending email address in the phish actually belongs to a Belgian cleaning company called Group-f. It's not unusual for addresses like this to be used to send malicious mail as they usually don't have a negative reputation and more easily pass through spam filters. It also indicates a possible compromise on their end, so I've now reached to them to report the incident.

Update 5: I've been contacted by someone that runs a well-known website that received the same phishing email as me. They made the following observation regarding the address that received the phish:

We have subscribed to Mailchimp with an address that is only used to subscribe to services, no outgoing communication from us. The phishing emails were delivered to exactly this address, couldn't yet find them on any other address. This makes me very much believe that possibility #2 is the case - they got the address from somewhere.

This aligns with my earlier observation that a customer list may have been obtained from Mailchimp and used to send the phishing emails. They went on to say they were seeing multiple subsequent phishes targeting their Mailchimp account.

Btw, we got some more (Mailchimp) phishing emails today โ€” same style, this time 4 times writing about a new login detected, and once that an abuse report was received and we needed to take immediate action.

That a customer list may have been compromised was one of the questions I put to Mailchimp and am still awaiting an answer on. That was about 36 hours ago now, so I've just given them a little nudge.

Update 6: There have been a lot of suggestions that Mailchimp should be storing the hashes of unsubscribed emails rather than the full addresses in the clear. I understand the sentiment, and it does offer some protection, but it by no means ticks the "we no longer have the address" box. This is merely pseudoanonymisation, and the hashed address can be resolved back to the clear if you have a list of plain text candidates to hash and compare them to. There's a good explainer of this in the answer to this question on Security Stack Exchange about hashing email addresses for GDPR compliance. IMHO, my example of how we handle this in HIBP is the gold standard that Mailchimp should be implementing.

And there's also another problem: short of cracking the hashed addresses, you can never export a list of unsubscribed email addresses, for example, if you wanted to change mail campaign provider. The only way that would work is if the hashing algorithm is the same in the destination service, or you build some other level of abstraction at any other future point where you need to compare plain text values to the hashed impression list. It's messy, very messy.

Update 7: Validin has written a fantastic piece about Pulling the Threads of the Phish of Troy Hunt that takes a deep dive into the relationship between the domain the phish was hosted on and various other campaigns they've observed.

Given these similarities, we believe the phishing attempt of Troy Hunt is very likely Scattered Spider.

Scattered Spider certainly has previous form, and this was a very well-orchestrated phish. Four days on as I write this, it's hard not to be a bit impressed about how slick the whole thing was.

Weekly Update 444

Weekly Update 444

It's time to fly! ๐Ÿ‡ฌ๐Ÿ‡ง ๐Ÿ‡ฎ๐Ÿ‡ธ ๐Ÿ‡ฎ๐Ÿ‡ช That's two new flags (or if you're on Windows and can't see flag emojis, that's two new ISO codes) I'll be adding to my "places I've been list" as we start the journey by jetting out to London right after I publish this blog. If you're in the area, I'll be speaking at Oxford University on Wednesday at 17:00 and that's a free and open event. And since recording this morning, we have managed to confirm that I will be speaking at a community event in Reykjavik the following Monday morning, and you'll see a link on my 2025 events page as soon as they make one available. No public events planned for Ireland yet, but if you're in Dublin and would like to run something the week after I'm in Iceland, get in touch. Just to round out a big schedule, I'll be back in Aus speaking in Perth at Microsoft's Student Accelerator on 14 April and then it's off to NDC Melbourne shortly after that for a talk on the 30th. Then rest ๐Ÿ™‚

Weekly Update 444
Weekly Update 444
Weekly Update 444
Weekly Update 444

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. Cloudflare has found almost half of the passwords people use on their customers' sites are compromised (but somehow, that's not the story that got many people's attention)
  3. Cloudflare's stats were gathered via their leaked credential detection service (one of the sources they use for this is Have I Been Pwned's Pwned Passwords)
  4. And no, a password alone is not personally identifiable information (yes, that's an AI-generated response because, no, you can't find any reference whatsoever to a password being PII in any formal gov docs)
  5. The Lexipol breach went into HIBP (apparently it was carried out by "Puppygirl Hacker Polycule", who'd have thunk it?!)
  6. SpyX also went in (Zack reckons this is the 25th spyware service to be breached since 2017)
  7. We're smashing out front end work for the HIBP UX rebuild (go and check out that repo, submit issues and join in on the discussion, we'd love your input)

Weekly Update 443

Weekly Update 443

What an awesome response to the new brand! I'm so, so happy with all the feedback, and I've gotta be honest, I was nervous about how it would be received. The only negative theme that came through at all was our use of Sticker Mule, which apparently is akin to being a Tesla owner. Political controversy aside, this has been an extremely well-received launch and I've also loved seeing the issues raised on the open source repo for the front end and Ingiber's (near instant!) addressing of each and every one of them. Please keep that feedback coming, and I'll talk more about some of the changes we've made as a result in the next weekly update.

Weekly Update 443
Weekly Update 443
Weekly Update 443
Weekly Update 443

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. We've open sourced the repo with the front end dev work (please feel free to raise issues, chime in on the discussion and submit PRs)
  3. Every commit we make to the above repo is pushed out to a static site at preview.haveibeenpwned.com (remember - it's static - this is front end stuff only)
  4. We're pushing to the preview site using Cloudflare Pages (this is such a cool, easy way of deploying code)
  5. We've made the stickers available via a Sticker Mule store (there's no markup on these, just get 'em at cost)
  6. We've also put the stickers, 3D models and other visual assets in the open source branding repo (especially handy if you want to get stickers made at a place that aligns to your political preference ๐Ÿ˜)

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Designing the first logo for Have I Been Pwned was easy: I took a SQL injection pattern, wrote "have i been pwned?" after it and then, just to give it a touch of class, put a rectangle with rounded corners around it:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Job done! I mean really, what more did I need for a pet project with a stupid name that would likely only add to the litany of failed nerdy ideas I'd had before that? And then, to compress 11 and a bit years into a single sentence: it immediately became unexpectedly popular, I added an API and a notification service, I said "pwned" before US Congress, I added Pwned Passwords, went through a failed M&A, hired a developer and basically, devoted my life to running this service. There's been some "water under the bridge", so to speak.

The rebrand we're soft-launching today has been a long time coming, and true to that form, we're not rushing it. This is a "soft launch" in that we're sharing work in progress that's sufficiently evolved to put it out there to the public, but you won't see it in production anywhere yet. The website is no different, the social channels still have the same hero shots and avatars etc. This is the time to seek feedback and tweak before committing more effort into writing code and pushing this to the masses.

A quick primer on "why", as the question has come up a few times whilst previously discussing this. Assume for a moment that my valiant 2013 attempt at a logo was, itself, aesthetically sufficient. It's a hard one to use in different use cases (favicon, merch) and it's quite "busy" in it's current form with no easily recognisable symbol which makes it hard to apply to many use cases. And there are loads of use cases; I mentioned a couple just now, but how about in formal documents such a the contracts we write for enterprise customers? Or as it appears on Stripe-generated invoices, stickers, my 3D printed logos, email signatures and so on and so forth. And branding isn't just a logo, it's a whole set of different use cases and variants of the logo and colours such that you have flexibility to present the brand's image in a cohesive, recognisable fashion. Branding is an art form.

At one point there, I'd had a go at redoing the logo myself. It was terrible. You know how you can have this vision of something aesthetic in your mind and know instantly if it's the right thing when you see it, but just can't quite articulate it yourself? I'm like that with interior design... and logos. So, I reached out to Fiverr for help, and immediately regretted it:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

I mean... wow. Ok, I get free revisions, let's give the designer another chance:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Dammit! This just wasn't going to work, and we were going to need to make a much more serious commitment if we wanted this done right. So, we went to Luft Design in Norway as Charlotte and Mikael went way back, and with his help, we went around and around through various iterations of mood boards, design styles, colours and carved out time in Oslo during our visit there in December to sit with Stefan as well and really nut this thing out. I was adamant that I wanted something immediately recognisable but also modern and cohesive without being fussy. Basically, give me everything, which Mikael did:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Let me talk you through the logic of these three variations, beginning with the icon. Mikael initially gave us multiple possible variations of a totally different icons which implied different things. My issue with that is you have to know what the symbology means in order for it to make sense. Perhaps if you're starting from scratch that can work, but when you're a decade+ into a name and a brand, there's history that I think you need to carry forward. One of the variations Mikael did reused that original SQL injection pattern I applied to the logo back in 2013 and just for the sake of justifying my choice, here's what it means for the uninitiated:

Take a SQL query like this:

SELECT * From User WHERE Name = 'blah'

Now, imagine "blah" is untrusted user input, that being data that someone submits via a form, for example. They might then change "blah" to the following:

blah';DROP TABLE USER

We'll shortcut the whole SQL injection lesson about validation of untrusted data and parameterization of queries and just jump straight to the resultant query:

SELECT * From User WHERE Name = 'blah';DROP TABLE USER'

And now, due to the additional query appended to the original one, your user table is gone. However... the SQL has a syntax error as there's a rogue apostrophe hanging off the end, so we fix it by using commenting syntax like so:

blah';DROP TABLE USER;--

Chief among the characters in that pattern are these guys:

';--

And that's the history; these are characters that play a role in the form of attack that has led to so many of the breaches in HIBP today. Turns out they're also really easy to stylise and represent as a concise logo:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

We agonised over variations of this for months. The problem is that when you think about all the ones that are really recognisable without accompanying words, they're recognisable because the brand is massive. The Nike swoosh, the Mitsubishi diamonds, the Pepsi circle, the Apple logo etc. HIBP obviously doesn't have that level of cachet, but I really like the simplicity of reach of those, and that's what we have with this one as well as that connection to the history of the brand and the practical use of those characters.

But just as with many of those other recognisable logos, these are times when what is effectively just a logo alone isn't enough, so we have the longer form version:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

"Have I Been Pwned" is a mouthful. It's not just long to say, it's long to put on the screen, long to print as a sticker, long to put on a shirt and so on and so forth. "Pwned", on the other hand, is short, concise and, I'd argue, has acheived much greater recognition as a word due to HIBP. Reading how โ€œPWNEDโ€ went from hacker slang to the internetโ€™s favorite taunt, I think that's a fair conclusion to draw. For a moment, we even toyed with the idea of an actual rename to just "Pwned" and looked at trying to buy pwned.com via a broker which, uh, didn't work out real well:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Appartently, you can put a price on it! So no, we're not renaming anything, we're just providing various stylistic options for representing the logo. This is why we still have the much wordier versions as well:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Unlike old mate at Fiverr, a proper branding exercise like Mikael has done goes well beyond just the logo alone. For example, we have a colour palette:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

And we have typography:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Hoodies:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

And t-shirts:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

You get the idea.

But most importantly, there's the website. Obviously the brand needs to prevail across to the digital realm, but there's also the issue of the front-end tech stack we build on, and that's something I've been thinking about for months now:

In 2013, I built the front end of @haveibeenpwned on Bootstrap and jQuery. In 2025, @stebets and I are rebuilding it as part of a rebrand. What should we use? What are the front end tools that make web dev awesome today? (vanilla HTML, CSS and JS aside, of course)

โ€” Troy Hunt (@troyhunt) December 27, 2024

You can read all sorts of different suggestions in that thread but in the end, we decided to keep it simple:

  1. Bootstrap 5
  2. Vanilla JS (i.e. just write JavaScript without a framework dependency like jQuery)
  3. Sass (which compiles to CSS anyway)

And that's it. Except Stefan and I are busy guys and we really didn't want to invest our precious cycles rebuilding the front end, so we got Ingiber Olafsson to do it. Ingiber came to us via Stefan (so now we have two Icelanders, two Norwegians and... me), and he's been absolutely smashing out the new front end of HIBP:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

What I've really enjoyed with Ingiber's approach is that everything he's built is super clean, lightweight and visually beautiful (based on Mikael's work, of course). I've really appreciated his attention to detail that isn't always obvious too, for example making sure accessibility for the visually impaired is maximised:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

Ingiber has helped get us to the point where very soon, Stefan and I will begin the integration work to roll the new brand into the main website. That's not just branding work either as the UX is getting a major overhaul. Some stuff is fairly minor: the list of pwned websites is now way too large and we need to have a dedicated page per breach. Other stuff is much more major: we want to have a specific "login" facility (quoting as it will likely remain passwordless by sending a token via email), where we'll then consolidate everything from notification enrolment to domain management to viewing stealer logs. It's a significant paradigm shift that requires a lot of very careful thought.

A quick caveat on the examples above and the others in the repository: we've given Ingiber free reign to experiment and throw ideas around. As a result, we've got some awsome stuff we hadn't thought about before. We've also got some stuff that will be infeasible in the short term, for example, a link through to the official response of the breached company and the full timeline of events. I hope ideas like this keep coming (both from Ingiber and the community), but just keep in mind that some things you see in this repo won't be on the website the day we roll all this out.

As with so much of this project since day one, we're doing this out in the open for everyone to see. Part of that is this blog post heralding what's to come, and part of it is also open sourcing the ux-rebuild repository. I actually created that repo more than a year ago and started crowd-sourcing ideas before closing it off last month whilst Ingiber got working. It's now open again, and I'd like to invite anyone interested to check out what we're building, leave their comments (either here on in the repo), send PRs and so on and so forth. I'm really stoked with the work the guys I've mentioned in this blog post have done, but there will be other great ideas that none of us have thought of yet. And if you come up with something awesome, we already have truckloads of stickers and 3D printed logos I'd love to send you:

Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand
Soft-Launching and Open Sourcing the Have I Been Pwned Rebrand

So there we have it, that's the rebrand. Do please send us your feedback, not just about logos and look and feel, but also what you'd like to see UX and feature wise on the website. The discussions list on that repo is a great place the chime in or add new ideas, or even just the comments section below ๐Ÿ‘‡

Edit: Wow, all the responses have been awesome! Gotta be honest, I was nervous redefining the brand after so long, but I couldn't have hoped for a better response ๐Ÿ˜Š I have two quick additions to this post:

  1. Due to popular demand, I've opened a store on Sticker Mule where you can now purchase the stickers. These are listed at their cost price, there's no markup from us, just enjoy them and share liberally.
  2. I should have thought of this before publishing this post, but we've now published the static HTML pages to preview.haveibeenpwned.com. This is running on Cloudflare pages and is auto-deployed on each GitHub merge into main, so you'll see this continue to evolve over the coming weeks.

Weekly Update 442

Weekly Update 442

We survived the cyclone! That was a seriously weird week with lots of build-up to an event that last occurred before I was born. It'd been 50 years since a cyclone came this far south, and the media was full of alarming predictions of destruction. In the end, we maxed out at 52kts just after I recorded this video:

Itโ€™s here. But 47kts max gusts isnโ€™t too bad, nothing actually blowing over here (yet). pic.twitter.com/qFyrZdiyRW

โ€” Troy Hunt (@troyhunt) March 7, 2025

We remained completely untouched and unaffected beyond needing to sweep up some leaves once the rain (which has also been unremarkable), finally stops. It appears the worst damage has been a lot of homes without power and perhaps most obviously, the beaches have done a complete vanishing act with all the sand:

What our favourite beach is like today, versus before. Theyโ€™ll rebuild it, this isnโ€™t unprecedented, but yeah, thereโ€™s some work to be done now. pic.twitter.com/6zFMG7bZqK

โ€” Troy Hunt (@troyhunt) March 8, 2025

But hey, everyone is fine (not just us, the whole city AFAIK), so that's a good outcome. Back on topic, here's this week's video:

Weekly Update 442
Weekly Update 442
Weekly Update 442
Weekly Update 442

References:

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. We're filling in the gaps of the stealer logs that have come before, and doing our best to clean everything up a bit while we're there (but we're never going to have totally "clean" data: GIGO)
  3. Someone tried to phish a PayPal OTP from me and instead faced some great trolling by Elle (so proud ๐Ÿฅฒ)
  4. Someone also tried to phish my X credentials from me (that one really took some thinking to emphatically put it in the "phish" box)

We're Backfilling and Cleaning Stealer Logs in Have I Been Pwned

We're Backfilling and Cleaning Stealer Logs in Have I Been Pwned

I think I've finally caught my breath after dealing with those 23 billion rows of stealer logs last week. That was a bit intense, as is usually the way after any large incident goes into HIBP. But the confusing nature of stealer logs coupled with an overtly long blog post explaining them and the conflation of which services needed a subscription versus which were easily accessible by anyone made for a very intense last 6 days. And there were the issues around source data integrity on top of everything else, but I'll come back to that.

When we launched the ability to search through stealer logs last month, that wasn't the first corpus of data from an info stealer we'd loaded, it was just the first time we'd made the website domains they expose searchable. Now that we have an actual model around this, we're going to start going back through those prior incidents and backfilling the new searchable attributes. We've just done that with the 26M unique email address corpus from August last year and added a bunch previously unseen instances of an email address mapped against a website domain. We've also now flagged that incident as "IsStealerLog", so if you're using the API, you'll see that attribute now set to true.

For the most part, that data is all handled just the same as the existing stealer log data: we map email addresses to the domains they've appeared against in the logs then make all that searchable by full email address, email address domain or website domain (read last week's really, really long blog post if you need an explainer on that). But there's one crucial difference that we're applying both to the backfilling and the existing data, and that's related to a bit of cleaning up.

A theme that emerged last week was that there were email addresses that only appeared against one domain, and that was the domain the address itself was on. If john@gmail.com is in there and the only domain he appears against is gmail.com, what's up with that? At face value, John's details have been snared whilst logging on to Gmail, but it doesn't make sense that someone infected with an info stealer only has one website they've logging into captured by the malware. It should be many. This seems to be due to a combination of the source data containing credential stuffing rows (just email and password pairs) amidst info stealer data and somewhere in our processing pipeline, introducing integrity issues due to the odd inputs. Garbage in, garbage out, as they say.

So, we've decided to apply some Occam's razor to the situation and go with the simplest explanation: a single entry for an email address on the domain of that email address is unlikely to indicate an info stealer infection, so we're removing those rows. And not adding any more that meet that criteria. But there's no doubt the email address itself existed in the source; there is no level of integrity issues or parsing errors that causes john@gmail.com to appear out of thin air, so we're not removing the email addresses in the breach, just their mapping to the domain in the stealer log. I'd already explained such a condition in Jan, where there might be an email address in the breach but no corresponding stealer log entry:

The gap is explained by a combination of email addresses that appeared against invalidly formed domains and in some cases, addresses that only appeared with a password and not a domain. Criminals aren't exactly renowned for dumping perfectly formed data sets we can seamlessly work with, and I hope folks that fall into that few percent gap understand this limitation.

FWIW, entries that matched this pattern accounted for 13.6% of all rows in the stealer log table, so this hasn't made a great deal of difference in terms of outright volume.

This takes away a great deal of confusion regarding the infection status of the address owner. As part of this revision, we've updated all the stealer log counts seen on domain search dashboards, so if you're using that feature, you may see the number drop based on the purged data or increase based on the backfilled data. And we're not sending out any additional notifications for backfilled data either; there's a threshold at which comms becomes more noise than signal and I've a strong suspicion that's how it would be received if we started sending emails saying "hey, that stealer log breach from ages ago now has more data".

And that's it. We'll keep backfilling data, and the entire corpus within HIBP is now cleaner and more succinct. And we'll definitely clean up all the UX and website copy as part of our impending rebrand to ensure everything is a lot clearer in the future.

I'll leave you with a bit of levity related to subscription costs and value. As I recently lamented, resellers can be a nightmare to deal with, and we're seriously considering banning them altogether. But occasionally, they inadvertently share more than they should, and we get an insight into how the outside world views the service. Like a recent case where a reseller accidentally sent us the invoice they'd intended to send the customer who wanted to purchase from us, complete with a 131% price markup ๐Ÿ˜ฒ It was an annual Pwned 4 subscription that's meant to be $1,370, and simply to buy this on that customer's behalf and then hand them over to us, the reseller was charging $3,165. They can do this because we make the service dirt cheap. How do we know it's dirt cheap? Because another reseller inadvertently sent us this internal communication today:

We're Backfilling and Cleaning Stealer Logs in Have I Been Pwned

FWIW, we do have credit cards in Australia, and they work just the same as everywhere else. I still vehemently dislike resellers, but at least our customers are getting a good deal, especially when they buy direct ๐Ÿ˜Š

Weekly Update 441

Weekly Update 441

Processing data breaches (especially big ones), can be extremely laborious. And, of course, everyone commenting on them is an expert, so there's a heap of opinions out there. And so it was with the latest stealer logs, a corpus of data that took the better part of a month to process. And then I made things confusing in various ways which led to both Disqus comment and ticket hell. But hey, it's finally out and now it's back to normal breach processing for the foreseeable future ๐Ÿ™‚

Weekly Update 441
Weekly Update 441
Weekly Update 441
Weekly Update 441

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. I trawled through 23 billion stealer logs to get a 284M breached email addresses into HIBP (and learned that explaining this concept clearly is hard!)
  3. Apple is pulling support for their Advanced Data Protection E2E offering (but will the status quo change before they force existing users to disable it?)
  4. Spyware / stalkerware apps Cocospu and Spyic leaker their data for all to see (and since that recording, Spyzie has also been added to the list)
  5. The Zimi Senoa IoT switches are beautiful... (...but I think that Bluetooth mesh via a proprietary hub is going to be a show-stopper)

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

I like to start long blog posts with a tl;dr, so here it is:

We've ingested a corpus of 1.5TB worth of stealer logs known as "ALIEN TXTBASE" into Have I Been Pwned. They contain 23 billion rows with 493 million unique website and email address pairs, affecting 284M unique email addresses. We've also added 244M passwords we've never seen before to Pwned Passwords and updated the counts against another 199M that were already in there. Finally, we now have a way for domain owners to query their entire domain for stealer logs and for website operators to identify customers who have had their email addresses snared when entering them into the site. (Note: stealer logs are still freely and easily searchable by individuals, scroll to the bottom for a walkthrough.)

This work has been a month-long saga that began hot off the heels of processing the last massive stash of stealer logs in the middle of Jan. That was the first time we'd ever added more context to stealer logs by way of making the websites email addresses had been logged against searchable. To save me repeating it all here, if you're unfamiliar with stealer logs as a concept and what we've previously done with HIBP, start there.

Up to speed? Good, let's talk about ALIEN TXTBASE.

Origin Story

Last month after loading the aforementioned corpus of data, someone in a government agency reached out and pointed me in the direction of more data by way of two files totalling just over 5GB. Their file names respectively contained the numbers "703" and "704", the word "Alien" and the following text at the beginning of each file:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

Pulling the threads, it turned out the Telegram channel referred to contained 744 files of which my contact had come across just the two. The data I'm writing about today is that full corpus, published to Telegram as individual files:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

A quick side note on Telegram: There's been growing concern in recent years about the use of Telegram by organised crime, especially since the founder's arrest in France last year for not cracking down on illegal activity on the platform. Telegram makes it super easy to publish large volumes of data (such as we're talking about here) under the veil of anonymity and distribute it en mass. This is just one of many channels involved in cybercrime, but it's noteworthy due to the huge amount of freely accessible data.

The file in the image above contained over 36 million rows of data consisting of website URLs and the email addresses and passwords entered into them. But the file is just a sample - a teaser - with more data available via the subscription options offered in the message. And that's the monetisation route: provide existing data for free, then offer a subscription to feed newly obtained logs to consuming criminals with a desire to exploit the victims again. Again? The stealer logs are obtained in the first place by exploiting the victim's machine, for example:

How do people end up in stealer logs? By doing dumb stuff like this: โ€œAround October I downloaded a pirated version of Adobe AE and after that a trojan got into my pcโ€ pic.twitter.com/igEzOayCu6

โ€” Troy Hunt (@troyhunt) August 5, 2024

So now this guy has malware running on his PC which is siphoning up all his credentials as they're entered into websites. It's those credentials that are then sold in the stealer logs and later used to access the victim's accounts, which is the second exploitation. Pirating software is just one way victims become infected; have a read of this recent case study from our Australian Signals Directorate:

When working from home, Alice remotely accesses the corporate network of her organisation using her personal laptop. Aliceย downloaded, onto herย personal laptop, a version of Notepad++ from a website she believed to be legitimate. Anย info stealerย was disguised as the installer for the Notepad++ software.
When Alice attempted to install the software, the info stealer activated and beganย harvesting user credentialsย from her laptop. This included her work username and password, which she had saved in her web browserโ€™s saved logins feature. The info stealer then sent those user credentials to a remote command-and-control server controlled by a cybercriminal group.

Eventually, data like Alice's ends up in places like this Telegram channel and from there, it enables further crimes. From the same ASD article:

Stolen valid user credentials are highly valuable to cybercriminals, because they expedite the initial access to corporate networks and enterprise systems.

So, that's where the data has come from. As I said earlier, ALIEN TXTBASE is by no means the only Telegram channel out there, but it is definitely a major distribution channel.

Verification

When there's a breach of a discrete website, verification of the incident is usually pretty trivial. For example, if Netflix suffered a breach (and I have no indication they have, this is just an example), I can go to their website, head to the password reset field, enter a made-up email address and see a response like this:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

On the other hand, an address that does exist on the service usually returns a message to the effect of "we've sent you a password reset email". This is called an "enumeration vector" in that it enables you to enumerate through a list of email addresses and find out which ones have an account on the site.

But stealer logs don't come from a single source like Netflix, instead they contain the credentials for a whole range of different sites visited by people infected with malware. However, I can still take lines from the stealer logs that were captured against Netflix and test the email addresses. (Note: I never test if the password is valid, that would be a privacy violation that constitutes unauthorised access and besides, as you'll read next, there's simply no need to.)

Initially, I actually ran into a bit of a roadblock when testing this:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

I found this over and over again so, I went back and checked the source data and inspected this poor victim's record:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

Their Netflix credentials were snared when they were entered into the website with a path of "/ph-en/login", implying they're likely Filipino. Let's try VPN'ing into Manilla:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

And suddenly, a password reset gives me exactly what I need:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

That's a little tangent from stealer logs, but Netflix obviously applies some geo-fencing logic to certain features. This actually worked out better than expected verification-wise because not only was I able to confirm the presence of the email address on their service, but that the stealer log entry placing them in the Philippines was also geographically correct. It was reproducible too: when I saw "something went wrong", but the path began with "mx", I VPN'd into Mexico City and Netflix happily confirmed the reset email was sent. Another path had "ve", so it was off to Caracas and the Venezuelan victim's account was confirmed. You get the idea. So, strong signal on confirmation of account existence via password reset, now let's also try something more personal.

I emailed a handful of HIBP subscribers and asked for their support verifying a breach. I got a fast, willing response from one guy and sent over more than 1,100 rows of data against his address ๐Ÿ˜ฒ It's fascinating what you can tell about a person from stealer log data: He's obviously German based on the presence of websites with a .de address, and he uses all the usual stuff most of us do (Amazon, eBay, LinkedIn, Airbnb). But it's the less common ones that make for more interesting reading: He drives a Mercedes because he's been logging into an address there for owners, and it also appears he likes whisky given his account at an online specialist. He's a Firefox user, as he's logged in there too, and he seems to be a techie as he's logged into Seagate and a site selling some very specialised electrical testing equipment. But is it legit?

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

Imagine the heart-in-mouth moment the poor guy must have had seeing his digital life laid out in front of him like that and knowing criminals have this data. It'd be a hell of a shock.

Having said all that, whilst I'm confident there's a large volume of legitimate data in this corpus, it's also possible there will be junk. Fabricated email addresses, websites that were never used, etc. I hope folks who experience this can appreciate how hard it is for us to discern "legitimate" stealer logs from those that were made up. We've done as much parsing and validation as possible, but we have no way of knowing if someone@yourdomain.com is an email address that actually exists or if it does, if they ever actually used Netflix or Spotify or whatever. They're just strings of data, and all we can do is report them as found.

Searching Entries Against Your Website with a Pwned 5 Subscription

When we published the stealer logs last month, I kept caveating everything with "experimental". Even the first word of the blog post title was "experimenting", simply because we didn't know how this would be received. Would people welcome the additional data we were making available? Or find it unactionable noise? It turns out it was entirely the former, and I didn't see a single negative comment or, as it often has been in the past with stealer logs or malware breaches, angry victims demanding I send their entire row(s) of data. And I guess that makes sense given what we made available so, starting today, we're making even more available!

First, a bit of nomenclature. Consider the following stealer log entry:

https://www.netflix.com/en-ph/login:john@gmail.com:P@ssw0rd

There are four parts to this entry that are relevant to the HIBP services I'm going to write about here:

ย  ย  ย  Email Address ย 
ย  Website Domain ย  Email Alias ย  Email Domain ย 
https:// www.netflix.com /en-ph/login john @ example.com P@ssw0rd

Last month, we added functionality to the UI such that after verifying your email address you could see a collection of website domains. In the example above, this meant that John could use the free notification service to verify control of his email address after which he'd see www.netflix.com listed. (Note: we're presently totally redesigning this as part of our UX rebuild and it'll be much smoother in the very near future.) Likewise, we introduced an API to do exactly the same thing, but only after verifying control of the email domain. So, in the case above, the owner of example.com would be able to query john@example.com and get back www.netflix.com (along with any other website domains poor John had been using).

Today, we're introducing two new APIs, and they're big ones:

  1. Query stealer logs by email domain
  2. Query stealer logs by website domain

The first one is akin to our existing domain search feature so in the example above, the owner of the domain could query the stealer logs for example.com and get back each email address alias and the website domains they appear against. Here's what that output looks like:

{
  "john": [
    "netflix.com"
  ],
  "jane": [
    "netflix.com",
    "spotify.com"
  ]
}

The previous model only allowed querying by email address, so you could end up with an organisation needing to iterate through thousands of individual API requests. This model means that can now be done in a single request, which will make life much easier for larger organisations assessing the exposure of their workforce.

The second new API is designed for website operators who want to identify customers who've had their credentials snared at login. So, in our demo row above, Netflix could query www.netflix.com (after verifying control of the domain, of course) and retrieve a list of their customers from the stealer logs:

[
  "john@example.com",
  "jane@yahoo.com"
]

Both these new APIs are orientated towards larger organisations and can return vast volumes of data. When considering how to price this service, the simplest, most commensurate model we arrived at was to use a pricing tier we already had: Pwned 5:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

Whilst we'd previously only ever listed tiers 1 through 4, we'd always had higher tiers sitting there in the background for organisations needing higher rate limits. Surfacing this subscription and adding the ability to query stealer logs via these two new APIs makes it easy for new and existing subscribers alike to unlock the feature. And if you are an existing subscriber, the price is simply adjusted pro rata at Stripe's end such that your existing balance is carried forward. Per the above image, this subscription is available either monthly or annually so if you just want to see what's in the current corpus of data and keep the cost down, take it for a month then cancel it. (Note: the Pwned 5 subscription is also now required for the API to search by email address we launched last month, but the web UI that uses the notification service to show stealer log results by email is absolutely still free and will remain that way.)

Another small addition since last month is that we've added an "IsStealerLog" flag on the breach model. This means that anyone programmatically dealing with data in HIBP can now easily elect to handle stealer logs differently than other breaches. For example, a new breach with this flag set to "true" might then trigger a query of the new API endpoints to search by domain so that an organisation can update their records with any new stealer log entries.

Anyone searching by email domain already knows the scope of addresses on their domain as it's reported on their dashboard. Plus, when email notifications are sent on breach load it tells you exactly how many new addresses from your domains are in the breach. Starting today, we've also added a column to explain how many email addresses appear against your website domain:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

In other words, 3 people have had their email address grabbed by an info stealer when logging on to hibp-integration-tests.com, and the new API will return all of those addresses. It is only API-based for the moment, we'll consider if a UI makes sense as part of the rebranded site launch, it may not becuase of the potentially huge volumes of data.

Just one last thing: for the two new APIs that query by domain, we've set a rate limit which is entirely independent of the rate limit on, say, breached account searches. Whilst a Pwned 5 subscription would allow 1,000 requests to that API every minute, it's significantly more restricted when hitting those two new stealer log APIs. We haven't published a number as I expect we'll tweak it a bit based on usage, but it's more than enough for any normal use of the service whilst ensuring we don't get completely overwhelmed by high-overhead searches. The stealer log API that queries by email address inherits the 1,000 RPM rate limit of the Pwned 5 subscription.

We've Added 244M New Passwords to Pwned Passwords

One of the coolest most awesome best things we've ever done with HIBP is to create a massive repository of passwords that's all open source (both code and data) and can be queried anonymously without disclosing the searched password. Pwned Passwords is amazing, and it has gained huge traction:

There it is - weโ€™ve now passed 10,000,000,000 requests to Pwned Password in 30 days ๐Ÿ˜ฎ This is made possible with @Cloudflareโ€™s support, massively edge caching the data to make it super fast and highly available for everyone. pic.twitter.com/kw3C9gsHmB

โ€” Troy Hunt (@troyhunt) October 5, 2024

10 billion times a month, our API helps a service somewhere assist one of their customers with making a good password choice. And that's just the API - we have no idea the full scope of the service as having the data open source means people can just download the entire corpus and run it themselves.

Per the opening para, we now have an additional 244 million previously unseen passwords in this corpus. And, as always, they make for some fun reading:

  1. tender-kangaroo
  2. swimmingkangaroo59
  3. gentlekangaroo
  4. CaptainKangaroo340

And, uh, some kangaroos doing other stuff I can't really repeat here. Those passwords are at the final stages of loading and should flush through cache to Cloudflare's hundreds of edge nodes in the next few hours. That's another quarter of a billion that join the list of previously breached ones, whilst 199 million we'd already seen before have had their prevalence counts upped.

HIBP in Practice

It's amazing to see where my little pet project with the stupid name has gone, and nobody is more surprised than me when it pops up in cool places. Looking around for some stealer log references while writing this blog post, I came across this one:

This was already in place when you created a new account or updated your password. But now it's also verified on every login against the live HIBP database. Hats off to the tremendous service HIBP provides to the internet ๐Ÿ™ https://t.co/Z61AgDaL2t

โ€” DHH (@dhh) February 4, 2025

That's awesome! That's exactly the sort of use case that speaks to the motto of "do good things with breach data after bad things happen". By adding this latest trove of data, the folks using Basecamp will immediately benefit simply by virtue of the service being plugged into our API. And sidenote: David has done some amazing stuff in the past so I was especially excited to see this shout-out ๐Ÿ˜Š

This one is a similar story, albeit using Pwned Passwords:

Their service is phenomenal! We also inform users in our product if they set/change their password to a known password that has been hacked. Admins have the option to not allow for users to use these passwords, if they wish. pic.twitter.com/bvLfYm9xzH

โ€” Brad Marshall (@iamBMarshall) February 4, 2025

Inevitably, those requests form a slice of the 10 billion monthly we see that are now able to identify a quarter of a billion more compromised ones and hopefully, keep them out of harm's way.

For many organisations, the data we're making available via stealer logs is the missing piece of the puzzle that explains patterns that were previously unexplainable:

Gotta say Iโ€™m pretty happy with what we did with stealer logs last week, think weโ€™re gonna need to do more of this ๐Ÿ˜Ž pic.twitter.com/4rMaMmL8LU

โ€” Troy Hunt (@troyhunt) January 21, 2025

I've had so many emails to that effect after loading various stealer logs over the years. The constant theme I love hearing is how it benefits the good guys and, well, not so much the bad guys:

I love it when @haveibeenpwned screws over the bad guys ๐Ÿ˜Ž pic.twitter.com/I29aMhwClW

โ€” Troy Hunt (@troyhunt) January 16, 2025

The introduction of these new APIs today will finally help many organisations identify the source of malicious activity and even more importantly, get ahead of it and block it before it does damange. Whilst there won't be any set cadence to the addition of more stealer logs (obviously, we can't really predict when this stuff will emerge), I have no doubt we'll continue to add a lot more data yet.

Techie Bits

Processing this data has been non-trivial to say the least, so I thought I'd give you a bit of an overview of how I ultimately approached it. In essence, we're talking about transforming a very large amount of highly redundant text-based data and ultimately ending up with a distinct list of email addresses against website domains. Here's the high-level process:

  1. Start with 744 files of logs totalling 1.5TB and containing 23 billion rows
  2. Extract all the unique email addresses from the entire corpus of stealer logs using the open source Email Address Extractor tool (284M rows)
  3. In a .NET console app, process each file in point 1 and extract the domain and email address from valid lines (domain, email and password, all colon delimited) to produce 744 files totalling 390GB (9.7B rows)
  4. In another .NET console app, consolidate all 744 files from the previous point into a single file with a distinct set of website domain and email address pairs (789M rows)
  5. Take the file from the previous point, and in another .NET console app, extract all the unique domains (18M rows)
  6. Use SQL BCP to upload the files from the two previous points to SQL Azure
  7. Insert any new domains that don't already exist in HIBP (these are held in a dedicated table) via a TSQL statement (6.7M rows)
  8. Upload the 284M unique email addresses like with a typical data breach (69% of them were already in HIBP)
  9. Join the distinct list of domains and email addresses to the data uploaded in the previous point and insert email and domain pairs into the stealer log table, but only if they haven't been seen before via another TSQL statement (493M rows)
  10. Wait - if I took distinct website and email pairs in step 4 and got 789M rows then in step 9 it only inserted 493M, what happened? There were 220M rows already in HIBP from last month which will account for some of the gap (existing records aren't reinserted), and there was also some additional validation in SQL Server courtesy of code we only have in that environment. The remaining gap is explained by the .NET code not ignoring case on distinct, so in other words, I dumped and uploaded way more data than I had to and made SQL Sever do extra work ๐Ÿคฆโ€โ™‚๏ธ
  11. Go live and get ๐Ÿบ

It was actually much, much harder than this due to the trials and errors of attempting to distil what starts out as tens of billions of rows down into hundreds of millions of unique examples. I was initially doing a lot more of the processing in SQL Azure because hey, it's just cloud and you can turn up the performance as much as you want, right?! After running 80 cores of Azure SQL Hyperscale for days on end ($ouch$) and seeing no end in sight to the execution of queries intended to take distinct values across billions of rows, I fell back to local processing with .NET. You could, of course, use all sorts of other technologies of choice here, but it turned out that local processing with some hand-rolled code was way more efficient than delegating the task to a cloud-based RDBMS. The space used by the database tells a big part of the story:

Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

As I've said many times before, the cloud, my friends, is not always the answer. Do as much processing as possible locally on sunk-cost hardware unless there's a compelling reason not to.

I've detailed all this here in part because that's what I've always done with this project over the last 11 and a bit years, but also to illustrate how much time, effort, and money is burned processing data like this. It's very non-trivial, especially not when everything has to ultimately go into an increasingly large system with loads of external dependencies and be fast, reliable and cost-effective.

Conclusion

From 23 billion raw rows down to a much more manageable and discrete set of data, the latest stealer logs are now searchable via all the ways you've done for years, plus those two new domain-based stealer log APIs. We've called this breach "ALIEN TXTBASE Stealer Logs", let's see what positive outcomes we can all now produce with the data.

Edit 1: Let me re-emphasise an important point from the blog post I think got a bit buried: The web UI that uses the notification service to show stealer log results by email is absolutely still free and will remain that way. If youโ€™ve got an email address in this breach and you want to see the stealer log domains against it, do this:

  1. Go to the notification page: https://haveibeenpwned.com/NotifyMe
  2. Fill in your email address and send yourself the verification email
  3. Click the link emailed to you and scroll to the bottom of the page where you'll find a message similar to this:
Processing 23 Billion Rows of ALIEN TXTBASE Stealer Logs

We need to make this clearer as it's obviously confusing. Thanks for everyone's feedback, we're working on it.

Edit 2: I've just published a (much shorter!) blog post that addresses a common theme in the comments regarding email addresses that only appear against a single website domain, being the same domain as the email address itself is on. Check that out if that's you.

Weekly Update 440

Weekly Update 440

Wait - it's Tuesday already?! When you listen to this week's (ok, last week's) video, you'll probably get the sense I was a bit overloaded. Yeah, so that didn't stop, and the stealer log processing and new feature building just absolutely swamped me. Plus, I spent from then until now in Sydney at various meetings and events which was great, but didn't do a lot for my productivity. Be that as it may, we're now less than 12 hours off launching this all so, in the interests of not having me stay up all night putting the finishing touches on it, let me drop here and come back in a few days to talk about how it's all been received ๐Ÿคž

Weekly Update 440
Weekly Update 440
Weekly Update 440
Weekly Update 440

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite

Weekly Update 439

Weekly Update 439

We're now eyeball-deep into the HIBP rebrand and UX work, totally overhauling the image of the service as we know it. That said, a guiding principle has been to ensure the new looks is immediately recognisable and over months of work, I think we've achieved that. I'm holding off sharing anything until we're far enough down the road that we're confident in the direction we're heading, and then I want to invite the masses to contribute as we head towards a (re)launch.

Whilst I didn't talk about it in this week's video, let me just recap on why we're doing this: the decisions made for a pet project nearly 12 years ago now are very different to the decisions made for a mainstream service with so many dependencies on it today. We're at a point where we need more professionalism and cohesion and that's across everything from the website design and content, the branding on our formal documentation, the stickers I hand out all over the place, the swag we want to make and even the signatures on our emails. Our task is to keep the heart and soul of a humble community-first project whilst simultaneously making sure it actually looks like we know what we're doing ๐Ÿ™‚

Weekly Update 439
Weekly Update 439
Weekly Update 439
Weekly Update 439

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. Authorised access by DOGE employees is not a data breach (no, not even if you really, really, really don't like Donald and Elon)
  3. The HIBP rebrand is now a long way through, and we'd love to hear your ideas (it's not just the look and feel, I want to get a lot more functionality in there)
  4. The latest Zacks breach went into HIBP (that's right, this isn't their first rodeo)
  5. Apparently, our discussion about possibly banning resellers is newsworthy (and this isn't a done deal yet, we are also looking at the feasibility of automating away the pain)

Weekly Update 437

Weekly Update 437

It's IoT time! We're embarking on a very major home project (more detail of which is in the video), and some pretty big decisions need to be made about a very simple device: the light switch. I love having just about every light in our connected... when it works. The house has just the right light early each morning, it transitions into daytime mode right at the perfect time based on the amount of solar radiation in the sky, into evening time courtesy of the same device and then blacks out when we go to bed. And some lights come on with movement based on motion sensors in fans (Big Ass fans have occupancy sensors), cameras (Ubiquiti camera raise motion events), and tiny dedicated Zigbee sensors. But getting the right physical switches in combination with the right IoT relays has been a bit more challenging. Listen to this week's show let me know if you have any "bright" ideas ๐Ÿ™‚

Weekly Update 437
Weekly Update 437
Weekly Update 437
Weekly Update 437

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. Light switches, IoT relays and other complex discussions about simple circuits (it's such a critical component of the house, especially when you replicate the model >100 times over)
  3. Apparently, the YubiKey phish wasn't a phish (seriously folks, if I can't tell when comms is legit or not, how are the normies expected to get it right?!)
  4. The ABC's analysis of 4-digit PINs in HIBP is really well done! (although I did spend way too much time explaining to other journalists how there are only 10,000 possible values ๐Ÿค”)
  5. The HIBP Grafana dashboard is looking epic! (although I may be blowing way more time on it than anyone could reasonably justify...)

Weekly Update 436

Weekly Update 436

We're heading back to London! And making a trip to Reykjavik. And Dublin. I talked about us considering this in the video yesterday, and just before publishing this post, we pulled the trigger and booked the tickets. The plan is to pretty much repeat the US and Canada trip we did in September and spend the time meeting up with some of the law enforcement agencies and various other organisations we've been working with over the years. As I say in the video, if you're in one of these locations and are in a position to stand up a meetup or user group session, I'd love to hear from you. Europe is a hell of a long way to go so we do want to make the most of the travel, stand by for more plans as they emerge.

Weekly Update 436
Weekly Update 436
Weekly Update 436
Weekly Update 436

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. The HIBP "Wall of Graphs" looks awesome! (I'll blog it up, but there's more to be done first)
  3. Spamming ~500 companies attempting to look for bug bounties is muppet behaviour (all whilst putting them on CC too ๐Ÿคฆโ€โ™‚๏ธ)
  4. Despite a couple of dissenting voices re the muppet characterisation, 84.5% of people agreed with my description (or in other words, 15.5% of people were completely wrong)

Weekly Update 435

Weekly Update 435

If I'm honest, I was in two minds about adding additional stealer logs to HIBP. Even with the new feature to include the domains an email address appears against in the logs, my concern was that I'd get a barrage of "that's useless information" messages like I normally do when I load stealer logs! Instead, the feedback was resoundingly positive. This week I'm talking more about the logic behind this, some of the challenges we faced with it and what we might see in the future. Stay tuned, because I think we're going to be seeing a lot more of this in HIBP.

Weekly Update 435
Weekly Update 435
Weekly Update 435
Weekly Update 435

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. For the first time ever, we added a heap of additional info about stealer logs to HIBP (ok, it's just the domains an address appears against, but that turns out to have been really useful)

Weekly Update 433

Weekly Update 433

It sounds easy - "just verify people's age before they access the service" - but whether we're talking about porn in the US or Australia's incoming social media laws, the reality is way more complex than that. There's no unified approach across jurisdictions and even within a single country like Australia, the closest we've got to that is a government scheme usually intended for accessing public services. And even if there was a technically workable model, who wants to get either the gov or some big tech firm involved in their use of Instagram or Pornhub?! There's a social acceptance to be considered and not only that, circumvention of age controls is very easy when you can simply VPN into another jurisdiction and access the same website blocked in your locale. Or in the case of the adult material, I'm told (๐Ÿคทโ€โ™‚๏ธ) there are many other legally operating websites in other parts of the world that are less inclined to block individuals in specific states from foreign countries. There'll be no easy solutions for this one, but it'll make for an entertaining year ๐Ÿ˜Š

Weekly Update 433
Weekly Update 433
Weekly Update 433
Weekly Update 433

References

  1. Sponsored by:ย Report URI: Guarding you from rogue JavaScript! Donโ€™t get pwned; get real-time alerts & prevent breaches #SecureYourSite
  2. My trusty Synology DS1512+ finally died after 12 years of faithful service (since recording this video, the new DS923+ arrived and migration was super smooth)
  3. Pornhub addressed the age verification mandate from a bunch of US states by simply... blocking them (I wonder if there's a way around that...)
  4. Proton VPN has seen a "massive surge" in VPN signups from the US (...there we go ๐Ÿ™‚)
  5. The EFF reckons there is no effective age verification method (they also downplay the negative impacts of social media on kids, which I disagree with)
  6. The Glamira data breach made it into HIBP (link through to a Reddit thread where the company acknowledged the breach last year, no word on whether they disclosed to impacted individuals)

Weekly Update 432

Weekly Update 432

There's a certain irony to the Bluesky situation where people are pushing back when I include links to X. Now, where have we seen this sort of behaviour before? ๐Ÿค” When I'm relying on content that only appears on that platform to add context to a data breach in HIBP and that content is freely accessible from within the native Bluesky app (without needing an X account), we're out of reasonable excuses for the negativity. And if "because Elon" is the sole reason and someone is firm enough in their convictions on that, there's a very easy solution ๐Ÿ™‚

Weekly Update 432
Weekly Update 432
Weekly Update 432
Weekly Update 432

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. We're rebuilding the front-end of Have I Been Pwned (there's a lot of opinions on that thread!)
  3. People on Bluesky are complaining about posting links to content that only exist on X (not exactly the right way to encourage use of other platforms)

Weekly Update 431

Weekly Update 431

I fell waaay behind the normal video cadence this week, and I couldn't care less ๐Ÿ˜Š I mean c'mon, would you rather be working or sitting here looking at this view after snowboarding through Christmas?!

Christmas Day awesomeness in Norway ๐Ÿ‡ณ๐Ÿ‡ด Have a great one friends, wherever you are ๐Ÿง‘โ€๐ŸŽ„ pic.twitter.com/F2FtcJYzRC

โ€” Troy Hunt (@troyhunt) December 25, 2024

That said, Scott and I did carve out some time to chat about the, uh, "colourful" feedback he's had after finally putting a price on some Report URI features he'd been giving away free for years. And there's more data breaches, of course, including a couple I loaded over the previous week that I think were particularly interesting. Enjoy this week's video, next week's will be a 2024 wrap-up from somewhere much, much sunnier ๐Ÿ˜Ž

Weekly Update 431
Weekly Update 431
Weekly Update 431
Weekly Update 431

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. After many years, Scott put a price on the free tier of Report URI (and some of the feedback he got ๐Ÿ˜ฒ)
  3. I couldn't raise Young Living Essential Oils about their data breach (and their data is spread all over a popular clear web hacking forum too)
  4. The "French Citizens" data breach had Millions of French people in it... (...and a lot of other people too)

Weekly Update 430

Weekly Update 430

I'm back in Oslo! Writing this the day after recording, it feels like I couldn't be further from Dubai; the temperature starts with a minus, it's snowing and there's not a supercar in sight.

Back on business, this week I'm talking about the challenge of loading breaches and managing costs. A breach load immediately takes us from a very high percentage cache hit ratio on Cloudflare to zero. Consequently, our SQL costs skyrocket as the DB scales to support the load. Approximately 28 hours after loading the two breaches I mention in this week's update, we're still running a DB scale that's 350% larger than once we have a high cache hit ratio, and that directly hits my wallet. We need to work on this more because as I say in the video, I really don't like financial incentives that influence how breaches are handled, such as delaying them and bulking them together to reduce the impact of cache flush events like this. We'll give that more thought, I think there are a few ways to tackle this. For now, here's this week's video and some of the challenges we're facing:

Weekly Update 430
Weekly Update 430
Weekly Update 430
Weekly Update 430

References

  1. Sponsored by:ย 1Password Extended Access Management: Secure every sign-in for every app on every device.
  2. Some people really don't like supercars (although I suspect it's more about not liking to see either the enjoyment others take in them or the success they may have achieved)
  3. Being online means having constant attacks against your online things (but failed login attempts against my son's and my Microsoft accounts are just that - failed attempts)
  4. The German electricity provider Tibber had 50k records breached (a little one, but newsworthy enough to have hit the media)
  5. And the first-ever Senegalese data breach went into HIBP courtesy of Yonรฉma (not exactly a high cross-over with our usual subscribers, but a breach is still a breach)
โŒ