Troy Hunt
Have I Been Pwned 2.0 is Now Live! 19 May 2025 at 15:19

This has been a very long time coming, but finally, after a marathon effort, the brand new Have I Been Pwned website is now live!

Feb last year is when I made the first commit to the public repo for the rebranded service, and we soft-launched the new brand in March of this year. Over the course of this time, we've completely rebuilt the website, changed the functionality of pretty much every web page, added a heap of new features, and today, we're even launching a merch store 😎

Let me talk you through just some of the highlights, strap yourself in!

The Search

The signature feature of HIBP is that big search box on the front page, and now, it's even better - it has confetti!

Well, not for everyone, only about half the people who use it will see a celebratory response. There's a reason why this response is intentionally jovial, let me explain:

As Charlotte and I have travelled and spent time with so many different users of the service around the world, a theme has emerged over and over again: HIBP is a bit playful. It's not a scary place emblazoned with hoodies, padlock icons, and fearmongering about "the dark web". Instead, we aim to be more consumable to the masses and provide factual, actionable information without the hyperbole. Confetti guns (yes, there are several, and they're animated) lighten the mood a bit. The alternative is that you get the red response:

There was a very brief moment where we considered a more light-hearted treatment on this page as well, but somehow a bit of sad trombone really didn't seem appropriate, so we defaulted to a more demure response. But now it's on a timeline you can scroll through in reverse chronological order, with each breach summarising what happened. And if you want more info, we have an all-new page I'll talk about in a moment.

Just one little thing first - we've dropped username and phone number search support from the website. Username searches were introduced in 2014 for the Snapchat incident, and phone number searches in 2021 for the Facebook incident. And that was it. Those are the only times we ever loaded those classes of data, and there are several good reasons why. Firstly, they're both painful to parse out of a breach compared to email addresses, which we simply use a regex to extract (we've open sourced the code that does this). Usernames are just a string. Phone numbers are, well, it depends. They're not just numbers because if you properly internationalise them (like they were in the Facebook incident), they've also got a plus at the front, but they're frequently all over the place in terms of format. Secondly, we can't send notifications because nobody "owns" a username, and SMSs to phone numbers are very expensive compared to sending emails. Plus, every other incident in HIBP has had email addresses, so if we're asking "have I been pwned?" we can always answer that question without loading those two hard-to-parse fields, which aren't present in most breaches anyway. When the old site offered to accept them in the search box, it created confusion and support overhead: "why wasn't my number in the [whatever] breach?!". That's why it's gone from the website, but we've kept it supported on the API to ensure we don't break anything... just don't expect to see more data there.
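To make the parsing point concrete, here's a minimal Python sketch of pulling email addresses out of raw breach text with a regex. The pattern below is deliberately simple and is an assumption for illustration only; it is not the regex from the open sourced HIBP extraction code, and the sample rows are made up. Note how the same phone number shows up in three different shapes while the email addresses all fall out of one pattern:

```python
import re

# Illustrative pattern only (an assumption, not HIBP's actual regex):
# a local part, an @, then two or more dot-separated labels.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text: str) -> set:
    """Return the unique, lower-cased email addresses found in text."""
    return {match.lower() for match in EMAIL_PATTERN.findall(text)}


# Made-up rows: one phone number written three different ways.
sample = (
    "1,Alice@Example.com,+61 400 000 000\n"
    "2,bob@example.org,0400-000-000\n"
    "3,alice@example.com,(04) 0000 0000\n"
)

emails = extract_emails(sample)
print(sorted(emails))
```

Lower-casing before de-duplicating means the two spellings of the first address collapse into one entry. Doing the equivalent for the phone numbers would need per-country normalisation rules, which is the "it depends" problem.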

The Breach Page

There are many reasons we created this new page, not least of which is that the search results on the front page were getting too busy, and we wanted to palm off the details elsewhere. So, now we have a dedicated page for each breach, for example:

That's largely information we had already (albeit displayed in a much more user-friendly fashion), but what's unique about the new page is much more targeted advice about what to do after the breach:

I recently wrote about this section and how we plan to identify other partners who are able to provide appropriate services to people who find themselves in a breach. Identity protection providers, for example, make a lot of sense for many data breaches.

Now that we're live, we'll also work on fleshing this page out with more breach and user-specific data. For example, if the service supports 2FA, then we'll call that out specifically rather than rely on the generic advice above. Same with passkeys, and we'll add a section for that. A recent discussion with the NCSC while we were in the UK was around adding localised data breach guidance, for example, showing folks from the UK the NCSC logo and a link to their resource on the topic (which recommends checking HIBP 🙂).

I'm sure there's much more we can do here, so if you've got any great ideas, drop me a comment below.

The Dashboard

Over the course of many years, we introduced more and more features that required us to know who you were (or at least that you had access to the email address you were using). It began with introducing the concept of a sensitive breach during the Ashley Madison saga of 2015, which meant the only way to see your involvement in that incident was to receive an email to the address before searching. (Sidenote: There are many good reasons why we don't do that on every breach.) In 2019, when I put an auth layer around the API to tackle abuse (which it did beautifully!) I required email verification first before purchasing a key. And more things followed: a dedicated domain search dashboard, managing your paid subscription and earlier this year, viewing stealer logs for your email address.

We've now unified all these different places into one central dashboard:

From a glance at the nav on the left, you can see a lot of familiar features that are pretty self-explanatory. These combine relevant things for the masses and those that are more business-oriented. They're now all behind the one "Sign In", which verifies access to the email address before anything is shown. In the future, we'll also add passkey support to avoid needing to send an email first.

The dashboard approach isn't just about moving existing features under one banner; it will also give us a platform on which to build new features in the future that require email address verification first. For example, we've often been asked to provide people with the ability to subscribe their family's email addresses to notifications, yet have them go to a different address. Many of us play tech support for others, and this would be a genuinely useful feature that makes sense to place at a point where you've already verified your email address. So, stay tuned for that one, among many others.

The Domain Search Feature

More time went into this one feature than most of the other ones combined. There's a lot we've tried to do here, starting with a much cleaner list of verified domains:

The search results now give a much cleaner summary and add filtering by both email address and a hotly requested new feature - just the latest breach (it's in the drop-down):

All those searches now just return JSON from APIs and the whole dashboard acts as a single-page app, so everything is really snappy. The filtering above is done purely client-side against the full JSON of the domain search, an approach we've tested with domains of over a quarter million breached email addresses and found to still be workable (although arguably, you really want that data via the API rather than scrolling through it in a browser window).
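The idea of filtering an already-downloaded result set can be sketched in a few lines. This is Python rather than the site's TypeScript, and the data shape (a mapping of email alias to a list of breach names) and the function name are assumptions made for illustration, not the dashboard's actual code:

```python
from typing import Dict, List, Optional


def filter_results(
    results: Dict[str, List[str]],
    email_contains: str = "",
    breach: Optional[str] = None,
) -> Dict[str, List[str]]:
    """Filter in memory: keep aliases matching the text and, optionally, one breach."""
    needle = email_contains.lower()
    return {
        alias: breaches
        for alias, breaches in results.items()
        if needle in alias.lower() and (breach is None or breach in breaches)
    }


# Hypothetical domain search payload, already loaded as JSON.
domain_results = {
    "alice": ["Adobe", "LinkedIn"],
    "bob": ["Adobe"],
    "alicia": ["Dropbox"],
}

by_email = filter_results(domain_results, email_contains="ali")
by_breach = filter_results(domain_results, breach="Adobe")
```

Because nothing goes back to the server, each keystroke in the filter box costs one pass over the data, which is why even a few hundred thousand rows can stay usable.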

Verification of domain ownership has also been completely rewritten and has a much cleaner, simpler interface:

We still have work to do to make the non-email verification methods smoother, but that was the case before, too, so at least we haven't regressed. That'll happen shortly, promise!

The API

First things first: there have been no changes to the API itself. This update doesn't break anything!

There's a discussion over on the UX rebuild GitHub repo about the right way to do API documentation. The general consensus is OpenAPI and we started going down that route using Scalar. In fact, you can even see the work Stefan did on this here at haveibeenpwned.com/scalar:

It's very cool, especially the way it documents samples in all sorts of different languages and even has a test runner, which is effectively Postman in the browser. Cool, but we just couldn't finish it in time. As such, we've kept the old documentation for now and just styled it so it looks like the rest of the site (which I reckon is still pretty slick), but we do intend to roll to the Scalar implementation when we're not under the duress of such a big launch.

The Merch Store

You know what else is awesome? Merch! No, seriously, we've had so many requests over the years for HIBP branded merch and now, here we are:

We actually now have a real-life merch store at merch.haveibeenpwned.com! This was probably the worst possible use of our time, considering how much mechanical stuff we had to do to make all the new stuff work, but it was a bit of a passion project for Charlotte, so yeah, now you can actually buy HIBP merch. It's all done through Teespring (where have I heard that name before?!) and everything listed there is at cost price - we make absolutely zero dollars, it's just a fun initiative for the community 🙂

We did try out their option for stickers too, but they fell well short of what we already had up with our little one-item store on Sticker Mule so for now, that remains the go-to for laptop decorations. Or just go and grab the open source artwork and get your own printed from wherever you please.

The Nerdy Bits

We still run the origin services on Microsoft Azure using a combination of the App Service for the website, "serverless" Functions for most APIs (there are still a few async ones there that are called as a part of browser-based features), SQL Azure "Hyperscale" and storage account features like queues, blobs and tables. Pretty much all the coding there is C# with .NET 9.0 and ASP.NET MVC on .NET Core for the web app. Cloudflare still plays a massive role with a lot of code in workers, data in R2 storage and all their good bits around WAF and caching. We're also now exclusively using their Turnstile service for anti-automation and have ditched Google's reCAPTCHA completely - big yay!

The front end is now latest gen Bootstrap and we're using SASS for all our CSS and TypeScript for all our JavaScript. Our (other) man in Iceland Ingiber has just done an absolutely outstanding job with the interfaces and exceeded all our expectations by a massive margin. What we have now goes far beyond what we expected when we started this process, and a big part of that has been Ingiber's ability to take a simple requirement and turn it into a thing of beauty 😍 I'm very glad that Charlotte, Stefan and I got to spend time with him in Reykjavik last month and share some beers.

We also made some measurable improvements to website performance. For example, I ran a Pingdom website speed test just before taking the old one offline:

And then ran it over the new one:

So we cut out 28% of the page size and 31% of the requests. The load time is much of a muchness (and it's highly variable at that), but having solid measures for all the values in the column on the right is a very pleasing result. Consider also the commentary anyone in web dev would have seen over the years about how much bigger web pages have become, and here we are shaving off solid double-digit percentages 11 years later!

Finally, anything that could remotely be construed as tracking or ad bloat just isn't there, because we simply don't do any of that 🙂 In fact, the only real traffic stats we have are based on what Cloudflare sees when the traffic flows through their edge nodes. And that 1Password product placement is, as it's always been, just text and an image. We don't even track outbound clicks, that's up to them if they want to capture that on the landing page we link to. This actually makes discussions such as we're having with identity theft companies that want product placement much harder as they're used to getting the sorts of numbers that invasive tracking produces, but we wouldn't have it any other way.

The AI

I wanted to make a quick note of this here, as AI seems to be either constantly overblown or denigrated. Either it's going to solve the world's problems, or it just produces "slop". I used ChatGPT in particular really extensively during this rebuild, especially in the final days when time got tight and my brain got fried. Here are some examples where it made a big difference:

I'm using Bootstrap icons from here: https://icons.getbootstrap.com/

What's a good icon to illustrate a heading called "Index"?

This was right at the 11th hour when we realised we didn't have time to implement Scalar properly, and I needed to quickly migrate all the existing API docs to the new template. There are over 2,000 icons on that page, and this approach meant it took about 30 seconds to find the right one, each and every time.

We killed off some pages on the old site, but before rolling it over, I wanted to know exactly what was there:

Write me a PowerShell script to crawl haveibeenpwned.com and write out each unique URL it finds

And then:

Now write a script to take all the paths it found and see if they exist on stage.haveibeenpwned.com

It found good stuff too, like the security.txt file I'd forgotten to migrate. It also found stuff that never existed, so it's the usual "trust, but verify" situation.
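For a feel of what those two prompts boil down to, here's a rough Python sketch (not the PowerShell that was actually generated, which isn't shown here). It covers only the offline parts: collecting same-site paths from a page's HTML and building the matching stage URLs. The class and function names are made up for illustration, and the fetching and status checking are left out:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collects the paths of <a href> links that stay on the same host."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        absolute = urlsplit(urljoin(self.base_url, href))
        if absolute.netloc == urlsplit(self.base_url).netloc:
            self.paths.add(absolute.path or "/")


def same_site_paths(html: str, base_url: str) -> set:
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.paths


def stage_urls(paths, stage_host: str = "stage.haveibeenpwned.com") -> list:
    """Map each path found on the live site to the same path on the stage host."""
    return sorted(f"https://{stage_host}{path}" for path in paths)


page = (
    '<a href="/About">About</a>'
    '<a href="https://haveibeenpwned.com/.well-known/security.txt">security.txt</a>'
    '<a href="https://example.com/elsewhere">external</a>'
)
paths = same_site_paths(page, "https://haveibeenpwned.com/")
to_check = stage_urls(paths)
```

Each URL in `to_check` would then be requested and anything that doesn't come back successfully flagged for a manual look, which is the "verify" half of "trust, but verify".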

And just a gazillion little things where every time I needed anything from some CSS advice to configuring Cloudflare rules to idiosyncrasies in the .NET Core web app, the correct answer was seconds away. I'd say it was right 90% of the time, too, and if you're not using AI aggressively in your software development work now (and I'm sure there are much better ways, too) I'm pretty confident in saying "you're doing it wrong".

The Journey Here

It's hard to explain how much has gone into this, and that goes well beyond just what you see in front of you on the website today. It's seemingly little things, like minor revisions to the terms of use and privacy policy, which required many hours of time and thousands of dollars with lawyers (just minor updates to how we process data and a reflection of new services such as the stealer logs).

We pushed out the new site in the wee hours of Sunday morning my time, and almost everything went well:

One or two little glitches that we've fixed and pushed quickly, that's it. I've actually waited until now, 2 days after going live, to publish this post just so we could iron out as much stuff as possible first. We've pushed more than a dozen new releases already since that time, just to keep iterating and refining quickly. TBH, it's been a bit intense and has been an enormously time-consuming effort that's dominated our focus, especially over the last few weeks leading up to launch. And just to drive that point home, I literally got a health alert first thing Monday morning:

Nothing like empirical data to make a point! That last weekend when we went live was especially brutal; I don't think I've devoted that much high-intensity time to a software release for decades.

Have I Been Pwned has been a passion for a quarter of my life now. What I built in 2013 was never intended to take me this far or last this long, and I'm kinda shocked it did if I'm honest. I feel that what we've built with this new site and new brand has elevated this little pet project into a serious service that has a new level of professionalism. But I hope that in reading this, you see that it has maintained everything that has always been great about the service, and I'm so glad to still be here writing about it today in the 205th blog post with that tag. Thanks for reading, now go and enjoy the new website 😊

Edit (a few hours after initially posting): Let me expand on Cloudflare's Turnstile as it'll explain some idiosyncrasies some people have seen:

This is an anti-automation approach that doesn't involve palming traffic to Google (like reCAPTCHA did), and it can be implemented completely invisibly. There are more invasive implementations of it, but we're trying to be seamless here. It involves some Cloudflare script running in the browser and providing a challenge, which is then submitted with the HTTP request and verified server side. We've had it on HIBP in one form or another since 2023, and it can be awesome... until it isn't. If the challenge fails, what happens next? It depends.

On forms where we really need to block the robots (for example, any that send email), a failed Turnstile challenge was initially just showing a red error. It now says this:

Our anti-automation process thinks you're a bot, which you're obviously not! Try behaving like a human and clicking the button again and if it still misbehaves, give the page a reload.

We've often found a second click or a page reload solves the problem, so hopefully this sends people in the right direction. If it doesn't, we'll need to look at more in-your-face implementations of Turnstile that show a widget you need to interact with. To have a go yourself and see it in action, try the dashboard sign in page.

The other place Turnstile features heavily is on the main search page at the root of the site. We don't want that API being hit by bots, so it's a must have there. Here, like on the other pages of the new site, we're asynchronously posting to API endpoints and sending the challenge token along with the request. What we're doing differently on the front page, however, is that if the challenge fails and returns HTTP 401 when posted to the HIBP endpoint (you'll also see a response body of "Invalid Turnstile token"), we were meant to be falling back to a full page post. That wasn't happening in the new site when we first launched it. But it is now 🙂

When the full page post back occurs, Cloudflare will present a managed challenge. This is much more invasive, but it's also much more reliable and will then serve the same result as you would have seen anyway, albeit via a full page load. We implement the same managed challenge logic on the deep-linked account pages, which you can see here: https://haveibeenpwned.com/account/test@example.com
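The fallback described above is really just one branch. Here's a hedged Python sketch of the decision, with the two transports passed in as plain functions so it runs without a browser or network; the function names and return values are assumptions, and the real implementation lives in the site's front-end script rather than anything like this:

```python
from typing import Callable, Tuple

# (HTTP status code, response body) as returned by the async API call.
Response = Tuple[int, str]


def search_with_fallback(
    account: str,
    token: str,
    async_post: Callable[[str, str], Response],
    full_page_post: Callable[[str], str],
) -> str:
    """Try the async search first; on HTTP 401, fall back to a full page post."""
    status, body = async_post(account, token)
    if status == 401:
        # Turnstile token rejected: hand over to the full page post,
        # where Cloudflare can present a managed challenge instead.
        return full_page_post(account)
    return body


# Stand-ins for the real transports, for illustration only.
def rejecting_api(account: str, token: str) -> Response:
    return 401, "Invalid Turnstile token"


def accepting_api(account: str, token: str) -> Response:
    return 200, f"results for {account}"


def page_post(account: str) -> str:
    return f"full page results for {account}"


failed_then_fell_back = search_with_fallback("test@example.com", "bad-token", rejecting_api, page_post)
succeeded_async = search_with_fallback("test@example.com", "good-token", accepting_api, page_post)
```

Either way the person searching ends up with the same result; the only difference is whether it arrived asynchronously or via a full page load.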

According to the Cloudflare stats, about 82% of all our issued challenges are successfully solved:

Of the 18% that aren't, many will be due to bots stopped by Turnstile doing exactly what it's meant to do. It's likely a single-digit percentage of requests that are real humans being impeded, and we need to look at ways to get that number down, but at least the fallback positions are improved now. If you were having problems, give the site a good refresh, see how you go and leave your feedback in the comments below.

Troy Hunt
After the Breach: Finding new Partners with Solutions for Have I Been Pwned Users 8 May 2025 at 17:33

For many years, people would come to Have I Been Pwned (HIBP), run a search on their email address, get the big red "Oh no - pwned!" response and then... I'm not sure. We really didn't have much guidance until we partnered with 1Password and started giving specific advice about how to secure your digital life. So, that's passwords sorted, but the impact of data breaches goes well beyond passwords alone...

There are many different ways people are impacted by breaches, for example, identity fraud. Breaches frequently contain precisely the sort of information that opens the door to impersonation and just taking a quick look at the HIBP stats now, there's a lot of data out there:

227 breaches exposed physical address
243 breaches exposed date of birth
288 breaches exposed phone numbers

That's just the big numbers, then there's the long tail of all sorts of other exposed high-risk data, including partial credit cards (32 breaches), government-issued IDs (18 breaches) and passport numbers (7 breaches). As well as helping people choose good passwords, we want to help them stay safe in the other aspects of their lives put at risk when hackers run riot.

Identity protection services are a good example, and I might be showing my age here, but I've been using them since the 90's. Today, I use a local Aussie one called Truyu which is built by the Commonwealth Bank. Let me give you two examples from them to illustrate why it's a useful service:

The first one came on Melbourne Cup day last year, a day when Aussies traditionally get drunk and lose money betting on horse races. Because gambling (sorry - "gaming") is a heavily regulated industry, a whole bunch of identity data has to be provided if you want to set up an account with the likes of SportsBet. Whilst I personally maintain that gambling is a tax on people who can't do maths, Charlotte was convinced we should have a go anyway, which resulted in Truyu popping up this alert:

This was me (and yes, of course we lost everything we bet) but... what if it wasn't me, and my personal information had been used by someone else to open the account? That's the sort of thing I'd want to know about fast. As for all those "Illion Credit Header" entries, I asked Truyu to help explain what they mean and why they're important to know:

Illion Credit Header – Banking Finance Segment: This segment includes information that links you to financial institutions, such as banks, lenders, or credit card providers. It helps confirm your financial presence and association with trusted entities, but it can also reveal if your identity is being used across multiple banks fraudulently.
Illion Credit Header – Telecommunications Segment: This covers data from telco providers (e.g., Optus, Telstra, Vodafone), indicating that your identity has been used to open or inquire about telco services. Telco accounts are often targeted for fraud (SIM swaps, device purchases), so unexpected entries here can flag potential misuse of your ID.
Illion Credit Header – Utilities Segment: This segment includes information showing you've been associated with utility services like electricity, gas, or water. If someone uses your ID to set up a utility account, it will show here, often before more obvious signs of fraud occur.
Illion Credit Header – Public Records Segment: This includes any publicly available identity-linked records, such as court judgements, bankruptcies, and ASIC or other official listings.

Yep, I'd definitely want to know if it wasn't me that initiated all that!

Then, on a recent visit to see the Irish National Cyber Security Centre, we found ourselves hungry in Dublin. Google Maps recommended this epic sushi place, but when we arrived, a sign at the front advised they didn't accept credit cards - in 2025!! Carrying only digital cards, having no cash and being hungry for sushi, I explored the only other avenue the store suggested: creating a Revolut account. Doing so required a bunch of personal information because, like betting, finance is a heavily regulated industry. This earned me another early warning from Truyu about the use of my data:

I pay Truyu A$4.99 each month via a subscription on my iPhone, and IMHO, it's money well spent. For full disclosure, Truyu is also an enterprise subscriber to HIBP (like 1Password is), and you can see breaches we've processed in their app too. I've included them here because they're a great example of a service that adds real value "after the breach", and it's one I genuinely use myself.

The point of all this is that there are organisations out there offering services that are particularly relevant to data breach victims, and we'd like to find the really good ones and put them on the new HIBP website. We've even built out some all-new dedicated spaces, for example on the new breach page:

But choosing partners is a bit more nuanced than that. For example, a service like Truyu caters to an Aussie audience, and the way identity protection works in the US or UK, for example, is different. We need different partners in different parts of the world, and further, offering different services. Identity protection is one thing, but what else? There are many different risks that both individuals and organisations (of which there are hundreds of thousands using HIBP today) face after being in a data breach.

So, we're looking for more partners that can make a positive difference for the folks that land on HIBP, do a search and then ask "now what?!" We're obviously going to be very selective and very cautious about who we work with because the trust people have in HIBP is not something I'll ever jeopardise by selecting the wrong partners. And, of course, any other brand that appears on this site needs to be one that not only reflects our values and mission, but is complementary to our favourite password manager as well.

Now that we're on the cusp of launching this new site (May 17 is our target), I'm inviting any organisations that think they fit the bill to get in touch with me and explain how they can make a positive difference to data breach victims looking for answers "after the breach".

Troy Hunt
The Have I Been Pwned Alpine Grand Tour 2 May 2025 at 01:32

I love a good road trip. Always have, but particularly during COVID when international options were somewhat limited, one road trip ended up, well, "extensive". I also love the recent trips Charlotte and I have taken to spend time with many of the great agencies we've worked with over the years, including the FBI, CISA, CCCS, RCMP, NCA, NCSC UK and NCSC Ireland. So, that's what we're going to do next month across some very cool locations in Europe:

Whilst the route isn't set in stone, we'll start out in Germany and cover Liechtenstein, Switzerland, France, Italy and Austria. We have existing relationships with folks in all but one of those locations (France, call me!) and hope to do some public events as we recently have at Oxford University, Reykjavik and even Perth back on (almost) this side of the world. And that's the reason for writing this post today: if you're in proximity of this route and would like to organise an event or if you're a partner I haven't already reached out to, please get in touch. We usually manage to line up a healthy collection of events and assuming we can do that again on this trip, I'll publish them to the events page shortly. There's also a little bit of availability in Dubai on the way over we'll put to productive use, so definitely reach out if you're over that way.

If you're in another part of the world that needs a visit with a handful of HIBP swag, let me know, there's a bunch of other locations on the short list, and we're always thinking about what's coming next 🌍

Troy Hunt
Welcoming The Gambia National CSIRT to Have I Been Pwned 30 April 2025 at 19:29

Today, we're happy to welcome the Gambia National CSIRT to Have I Been Pwned as the 38th government to be onboarded with full and free access to their government domains. We've been offering this service for seven years now, and it enables national CSIRTs to gain greater visibility into the impact of data breaches on their respective nations.

Our goal at HIBP remains very straightforward: to do good things with data breaches after bad things happen. We hope this initiative helps support the Gambia National CSIRT as it has with many other governments around the world.

Troy Hunt
You'll Soon Be Able to Sign in to Have I Been Pwned (but Not Login, Log in or Log On) 24 April 2025 at 00:48

How do seemingly little things manage to consume so much time?! We had a suggestion this week that instead of being able to login to the new HIBP website, you should instead be able to log in. This initially confused me because I've been used to logging on to things for decades:

So, I went and signed in (yep, different again) to X and asked the masses what the correct term was:

When accessing your @haveibeenpwned dashboard, which of the following should you do? Preview screen for reference: https://t.co/9gqfr8hZrY
— Troy Hunt (@troyhunt) April 23, 2025

Which didn't result in a conclusive victor, so, I started browsing around.

Cloudflare's Zero Trust docs contain information about customising the login page, which I assume you can do once you log in:

Another, uh, "popular" site prompts you to log in:

After which you're invited to sign in:

You can log in to Canva, which is clearly indicated by the HTML title, which suggests you're on the login page:

You can log on to the Commonwealth Bank down here in Australia:

But the login page for ANZ bank requires you to log in, unless you've forgotten your login details:

Ah, but many of these are just the difference between the noun "login" (the page is a thing) and the verb "log in" (when you perform an action), right? Well... depends who you bank with 🤷‍♂️

And maybe you don't log in or login at all:

Finally, from the darkness of seemingly interchangeable terms that may or may not violate principles of English language, emerged a pattern. You also sign in to Google:

And Microsoft:

And Amazon:

And Yahoo:

And, as I mentioned earlier, X:

And now, Have I Been Pwned:

There are some notable exceptions (Facebook and ChatGPT, for example), but "sign in" did emerge as the frontrunner among the world's most popular sites. If I really start to overthink it, I do feel that "log[whatever]" implies something different to why we authenticate to systems today and is more a remnant of a bygone era. But frankly, that argument is probably no more valid than whether you're doing a verb thing or a noun thing.

Troy Hunt
Experimenting with Stealer Logs in Have I Been Pwned 13 January 2025 at 13:48

TL;DR — Email addresses in stealer logs can now be queried in HIBP to discover which websites they've had credentials exposed against. Individuals can see this by verifying their address using the notification service and organisations monitoring domains can pull a list back via a new API.

Nasty stuff, stealer logs. I've written about them and loaded them into Have I Been Pwned (HIBP) before but just as a recap, we're talking about the logs created by malware running on infected machines. You know that game cheat you downloaded? Or that crack for the pirated software product? Or the video of your colleague doing something that sounded crazy but you thought you'd better download and run that executable program showing it just to be sure? That's just a few different ways you end up with malware on your machine that then watches what you're doing and logs it, just like this:

These logs all came from the same person and each time the poor bloke visited a website and logged in, the malware snared the URL, his email address and his password. It's akin to a criminal looking over his shoulder and writing down the credentials for every service he's using, except rather than it being one shoulder-surfing bad guy, it's somewhat larger than that. We're talking about billions of records of stealer logs floating around, often published via Telegram where they're easily accessible to the masses. Check out Bitsight's piece titled Exfiltration over Telegram Bots: Skidding Infostealer Logs if you'd like to get into the weeds of how and why this happens. Or, for a really quick snapshot, here's an example that popped up on Telegram as I was writing this post:

As it relates to HIBP, stealer logs have always presented a bit of a paradox: they contain huge troves of personal information that by any reasonable measure constitute a data breach that victims would like to know about, but then what can they actually do about it? What are the websites listed against their email address? And what password was used? Reading the comments from the blog post in the first para, you can sense the frustration; people want more info and merely saying "your email address appeared in stealer logs" has left many feeling more frustrated than informed. I've been giving that a lot of thought over recent months and today, we're going to take a big step towards addressing that concern:

The domains an email address appears next to in stealer logs can now be returned to authorised users.

This means the guy with the Gmail address from the screen grab above can now see that his address has appeared against Amazon, Facebook and H&R Block. Further, his password is also searchable in Pwned Passwords so every piece of info we have from the stealer log is now accessible to him. Let me explain the mechanics of this:

Firstly, the volumes of data we're talking about are immense. In the case of the most recent corpus of data I was sent, there are hundreds of text files with well over 100GB of data and billions of rows. Filtering it all down, we ended up with 220 million unique rows of email address and domain pairs covering 69 million of the total 71 million email addresses in the data. The gap is explained by a combination of email addresses that appeared against invalidly formed domains and in some cases, addresses that only appeared with a password and not a domain. Criminals aren't exactly renowned for dumping perfectly formed data sets we can seamlessly work with, and I hope folks that fall into that few percent gap understand this limitation.
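As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes each log line looks like `url:email:password` and can be split on the last two colons, which is only one of many layouts seen in stealer logs. The function name `extract_pairs` and the hostname check are hypothetical and are not HIBP's actual pipeline.

```python
import re
from urllib.parse import urlsplit

# Deliberately loose hostname check (an assumption for this sketch):
# dot-separated labels followed by an alphabetic top-level domain.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,}$")


def extract_pairs(lines):
    """Return (pairs, all_emails) from lines assumed to be 'url:email:password'.

    pairs      -- set of unique (email, website domain) tuples
    all_emails -- every email address seen, with or without a usable domain
    """
    pairs = set()
    all_emails = set()
    for line in lines:
        # Split from the right so the colon in "https://" stays in the URL.
        # A password containing a colon would defeat this; real data is messier.
        parts = line.strip().rsplit(":", 2)
        if len(parts) != 3:
            continue
        url, email, _password = parts
        email = email.lower()
        if "@" not in email:
            continue
        all_emails.add(email)
        if "://" not in url:
            url = "//" + url  # lets urlsplit find the host in scheme-less URLs
        try:
            host = (urlsplit(url).hostname or "").lower()
        except ValueError:
            host = ""  # malformed URL, e.g. unbalanced brackets
        if DOMAIN_RE.match(host):
            pairs.add((email, host))
    return pairs, all_emails
```

Addresses that end up in `all_emails` but never in `pairs` correspond to the gap described above: they only appeared next to a malformed domain, or with a password and no domain at all.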

So, we now have 220 million records of email addresses against domains, how do we surface that information? Keeping in mind that "experimental" caveat in the title, the first decision we made is that it should only be accessible to the following parties:

The person who owns the email address
The company that owns the domain the email address is on

At face value it might look like that first point deviates from the current model of just entering an email address on the front page of the site and getting back a result (and there are very good reasons why the service works this way). There are some important differences though, the first of which is that whilst your classic email address search on HIBP returns verified breaches of specific services, stealer logs contain a list of services that may never have been breached. It means we're talking about much larger numbers that build up far richer profiles; instead of a few breached services someone used, we're talking about potentially hundreds of them. Secondly, many of the services that appear next to email addresses in the stealer logs are precisely the sort of thing we flag as sensitive and hide from public view. There's a heap of Pornhub. There are health-related services. Religious ones. Political websites. There are a lot of services there that merely by association constitute sensitive information, and we just don't want to take the risk of showing that info to the masses.

The second point means that companies doing domain searches (for which they already need to prove control of the domain) can pull back the list of the websites people in their organisation have email addresses next to. When the company controls the domain, they also control the email addresses on that domain and by extension, have the technical ability to view messages sent to their mailbox. Whether they have policies prohibiting this is a different story but remember, your work email address is your work's email address! They can already see the services sending emails to their people, and in the case of stealer logs, this is likely to be enormously useful information as it relates to protecting the organisation. I ran a few big names through the data, and even I was shocked at the prevalence of corporate email addresses against services you wouldn't expect to be used in the workplace (then again, using the corp email address in places you definitely shouldn't be isn't exactly anything new). That in itself is an issue, then there's the question of whether these logs came from an infected corporate machine or from someone entering their work email address into their personal device.

I started thinking more about what you can learn about an organisation's exposure in these logs, so I grabbed a well-known brand in the Fortune 500. Here are some of the highlights:

2,850 unique corporate email addresses in the stealer logs
3,159 instances of an address against a service they use, accompanied by a password (some email addresses appeared multiple times)
The top domains included paypal.com, netflix.com, amazon.com and facebook.com (likely within the scope of acceptable corporate use)
The top domains also included steamcommunity.com, roblox.com and battle.net (all gaming websites likely not within scope of acceptable use)
Dozens of domains containing the words "porn", "adult" or "xxx" (definitely not within scope!)
Dozens more domains containing the corporate brand, either as subdomains of their primary domain or org-specific subdomains of other services including Udemy (online learning), Amplify ("strategy execution platform"), Microsoft Azure (the same cloud platform that HIBP runs on) and Salesforce (needs no introduction)
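Highlights of this kind come down to simple aggregation over email address and domain pairs. The sketch below is a minimal illustration, assuming the pairs are already in memory as `(email, domain)` tuples. The function name `summarise_exposure` and the keyword list are hypothetical, and this is not how HIBP computes anything.

```python
from collections import Counter

# Keywords taken from the list above; substring matching is crude and will
# produce false positives, so treat hits as leads rather than conclusions.
FLAG_WORDS = ("porn", "adult", "xxx")


def summarise_exposure(pairs, corporate_domain, top_n=5):
    """Summarise stealer log entries for addresses on one corporate domain."""
    suffix = "@" + corporate_domain.lower()
    corp = [(e.lower(), d.lower()) for e, d in pairs if e.lower().endswith(suffix)]
    domain_counts = Counter(domain for _, domain in corp)
    return {
        # How many distinct corporate addresses appear at all
        "unique_addresses": len({email for email, _ in corp}),
        # How many address/service rows there are (one address can appear many times)
        "instances": len(corp),
        # The most frequently seen website domains
        "top_domains": domain_counts.most_common(top_n),
        # Domains containing any of the flagged keywords
        "flagged_domains": sorted(
            d for d in domain_counts if any(word in d for word in FLAG_WORDS)
        ),
    }
```

Calling `summarise_exposure(pairs, "corp.example")` with your own pairs returns a small dictionary you can review by hand. A count on its own says nothing about whether the address owner ever actually visited the site.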

That said, let me emphasise a critical point:

This data is prepared and sold by criminals who provide zero guarantees as to its accuracy. The only guarantee is that the presence of an email address next to a domain is precisely what's in the stealer log; the owner of the address may never have actually visited the indicated website.

Stealer logs are not like typical data breaches where it's a discrete incident leading to the dumping of customers of a specific service. I know that the presence of my personal email address in the LinkedIn and Dropbox data breaches, for example, is a near-ironclad indication that those services exposed my data. Stealer logs don't provide that guarantee, so please understand this when reviewing the data.

The way we've decided to implement these two use cases differs:

Individuals who can verify they control their email address can use the free notification service. This is already how people can view sensitive data breaches against their address.
Organisations monitoring domains can call a new API by email address. They'll need to have verified control of the domain the address is on and have an appropriately sized subscription (essentially what's already required to search the domain).
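For the organisational case, a per-address call might be prepared along these lines. This is a sketch under stated assumptions: the `stealerlogsbyemail` path is an assumption here, as the post does not spell it out, so check the HIBP API documentation before relying on it. The `hibp-api-key` header follows the existing v3 authorisation convention, and the helper name `build_stealer_log_request` is hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

API_ROOT = "https://haveibeenpwned.com/api/v3"


def build_stealer_log_request(email, api_key, user_agent="example-stealer-log-check"):
    """Build (but do not send) a request for the domains seen against one address.

    Assumption: the endpoint path below; confirm it against the API docs.
    """
    # URL-encode the address so characters such as "@" and "+" survive the path.
    url = f"{API_ROOT}/stealerlogsbyemail/{quote(email, safe='')}"
    return Request(url, headers={"hibp-api-key": api_key, "user-agent": user_agent})
```

The request can then be passed to `urllib.request.urlopen`. The caller needs a subscription sized for the domain and verified control of the domain the address is on, as described above.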

We'll make the individual searches cleaner in the near future as part of the rebrand I've recently been talking about. For now, here's what it looks like:

Because of the recirculation of many stealer logs, we're not tracking which domains appeared against which breaches in HIBP. Depending on how this experiment with stealer logs goes, we'll likely add more in the future (and fill in the domain data for existing stealer logs in HIBP), but additional domains will only appear in the screen above if they haven't already been seen.

We've done the searches by domain owners via API as we're talking about potentially huge volumes of data that really don't scale well to the browser experience. Imagine a company with tens or hundreds of thousands of breached addresses and then a whole heap of those addresses have a bunch of stealer log entries against them. Further, putting this behind a per-email address API rather than automatically showing it on domain search means it's easy for an org to not see these results, which I suspect some will elect to do for privacy reasons. The API approach was easiest while we explore this service, then we can build on that based on feedback. I mentioned this was experimental, right? For now, it looks like this:

Lastly, there's another opportunity altogether that loading stealer logs in this fashion opens up, and the penny dropped when I loaded that last one mentioned earlier. I was contacted by a couple of different organisations that explained how around the time the data I'd loaded was circulating, they were seeing an uptick in account takeovers "and the attackers were getting the password right first go every time!" Using HIBP to try and understand where impacted customers might have been exposed, they posited that it was possible the same stealer logs I had were being used by criminals to extract every account that had logged onto their service. So, we started delving into the data and sure enough, all the other email addresses against their domain aligned with customers who were suffering from account takeover. We now have that data in HIBP, and it would be technically feasible to provide this to domain owners so that they can get an early heads up on which of their customers they probably have to rotate credentials for. I love the idea as it's a great preventative measure, perhaps that will be our next experiment.

Onto the passwords and as mentioned earlier, these have all been extracted and added to the existing Pwned Passwords service. This service remains totally free and open source (both code and data), has a really cool anonymity model allowing you to hit the API without disclosing the password being searched for, and has become absolutely MASSIVE!
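The anonymity model works by sending only the first five characters of the password's SHA-1 hash and matching the remainder locally. The sketch below assumes the public range endpoint at `api.pwnedpasswords.com/range/` and a response of `SUFFIX:COUNT` lines. The helper names are hypothetical, and the user agent string is a placeholder.

```python
import hashlib
from urllib.request import Request, urlopen

RANGE_URL = "https://api.pwnedpasswords.com/range/"


def split_hash(password):
    """Return (prefix, suffix) of the uppercase SHA-1 hex digest: 5 + 35 chars."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_from_range(suffix, body):
    """Find our suffix in a range response made of 'SUFFIX:COUNT' lines."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0  # suffix absent: password not found under this prefix


def pwned_count(password):
    """Query the range API; only the 5-character hash prefix leaves the machine."""
    prefix, suffix = split_hash(password)
    req = Request(RANGE_URL + prefix, headers={"user-agent": "example-pwned-passwords-check"})
    with urlopen(req, timeout=10) as resp:
        return count_from_range(suffix, resp.read().decode("utf-8"))
```

Because the comparison against the suffix happens on the caller's side, the service sees only a prefix shared by many other hashes and never the password or its full hash.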

I thought that doing more than 10 billion requests a month was cool, but look at that data transfer - more than a quarter of a petabyte just last month! And it's in use at some pretty big name sites as well:

That's just where the API is implemented client-side, and we can identify the source of the requests via the referrer header. Most implementations are done server-side, and by design, we have absolutely no idea who those folks are. Shoutout to Cloudflare while we're here for continuing to provide the service behind this for free to help make a more secure web.

In terms of the passwords in this latest stealer log corpus, we found 167 million unique ones of which only 61 million were already in HIBP. That's a massive number, so we did some checks, and whilst there's always a bit of junk in these data sets (remember - criminals and formatting!) there's also a heap of new stuff. For example:

Tryingtogetkangaroo
Kangaroolover69
fuckkangaroos

And about 106M other non-kangaroo themed passwords. Admittedly, we did start to get a bit preoccupied looking at some of the creative ways people were creating previously unseen passwords:

passwordtoavoidpwned13
verygoodpassword
AVerryGoodPasswordThatNooneCanGuess2.0

And here's something especially ironic: check out these stealer log entries:

People have been checking these passwords on HIBP's service whilst infected with malware that logged the search! None of those passwords were in HIBP... but they all are now 🙂

Want to see something equally ironic? People using my Hack Yourself First website to learn about secure coding practices have also been infected with malware and ended up in stealer logs:

So, that's the experiment we're trying with stealer logs, and that's how to see the websites exposed against an email address. Just one final comment as it comes up every single time we load data like this:

We cannot manually provide data on a per-individual basis.

Hopefully, there's less need to now given the new feature outlined above, and I hope the massive burden of looking up individual records when there are 71 million people impacted is evident. Do leave your comments below and help us improve this feature to become as useful as we can possibly make it.