Email address validation: please stop

It’s something that’s been bugging me for a long time. All around the web, people are making flawed attempts at validating email addresses, causing a headache for their users, and probably for themselves.

I really started to notice this when I began to use the disposable addresses system that Gmail provides. Any mail sent to <youraddress>+<some_other_string>@gmail.com arrives in the Gmail inbox for <youraddress>@gmail.com. This is quite handy, and I personally use it for automatically tagging email I receive. For instance, for any email related to unicorns, I’d simply enter “<myaddress>+unicorns@gmail.com” on the sign-up form, and my mail filters would automatically tag all mail sent to that address for me (as an aside, these don’t really work as “proper” disposable email addresses as it’s easy to just strip everything after the “+” character in the local part of the address, and get the proper address). Sounds great, right? Well it is, until half of the internet fails at email address validation and rejects it.
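To show how little effort that stripping takes, here is a minimal Python sketch. The function name `strip_plus_tag` is made up for illustration, and it assumes Gmail-style semantics, where everything from the first “+” in the local part onwards is just a tag. Other providers may treat “+” differently.

```python
def strip_plus_tag(address: str) -> str:
    """Return the address with any "+tag" suffix removed from the local part.

    Assumes Gmail-style plus addressing; this is an illustration, not a
    general-purpose normaliser.
    """
    # Split on the last "@" so a quoted "@" in the local part is left alone.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return address  # no "@" at all, so there is nothing to strip
    base = local.split("+", 1)[0]
    if not base:
        return address  # the local part starts with "+", so leave it untouched
    return f"{base}@{domain}"
```

Calling `strip_plus_tag("myaddress+unicorns@gmail.com")` gives back the untagged address, which is why the “+” trick is useful for filtering but not for hiding your real address.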

The problem is that the email address specification allows for far more than most programmers expect it to. For instance, things like “ ! $ & * - = ^ ` | ~ # % ' + / ? _ { } ” are all valid, along with a whole bunch of others (even “@” if you quote or escape it). Some of these are a tad silly. Using another “@” sign by escaping, for instance, is just confusing, and is probably only used by sociopaths. Reject some of the others, however, and you’ll start to annoy your users.
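As a rough illustration of how permissive the unquoted form already is, the sketch below checks a local part against the “atext” character set and “dot-atom” shape from RFC 5322. The names `ATEXT` and `is_dot_atom` are my own, and the check deliberately ignores quoted strings, comments and length limits, so treat it as a partial picture rather than a validator.

```python
import string

# Characters RFC 5322 allows in an unquoted local part ("atext").
ATEXT = set(string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~")


def is_dot_atom(local: str) -> bool:
    """True if local is one or more non-empty runs of atext joined by dots.

    Quoted local parts (which can contain "@" and spaces) are not handled.
    """
    parts = local.split(".")
    # Every dot-separated piece must be non-empty and use only atext characters.
    return all(bool(part) and set(part) <= ATEXT for part in parts)
```

With this, local parts such as `me+unicorns` and `o'leary` pass, while `a..b` and `a@b` do not (the latter would need quoting).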

I was recently at a talk given by Andrew Godwin at FOSDEM. In it, he mentioned a problem Django ran into, where their regular expression used for email validation would hang on long input (scratch that, I think this is the bug he mentioned, that other one is hideously old). After some head scratching, they came up with an improved regular expression, which didn’t have the issue. I’m not sure that either solution actually validates according to the specification though, and if the validation errs on the side of being too strict, it’s probably out there irritating people right now. As a fun aside, Perl’s Mail::RFC822::Address module gives you a glimpse of a regular expression that actually follows the specification from RFC 822.

Even the best validation is only going to get you a syntactically correct email address, with no guarantee that it actually exists. If you want to know that you’re being given a valid address, send it an email and have the user click a validation link in it, and stop annoying your users!
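One way the “send a link” approach could look is sketched below in Python. Everything named here is hypothetical: `SECRET_KEY`, the `example.com` URL and the function names are placeholders, and actually sending the email is left out. The idea is just that the link carries a token only your server could have produced for that address, so a click proves the mailbox exists and is read.

```python
import hashlib
import hmac
import secrets
from urllib.parse import urlencode

# Hypothetical key; a real site would load a persistent secret instead.
SECRET_KEY = secrets.token_bytes(32)


def make_token(address: str) -> str:
    """Derive a hex token tied to this exact address."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirmation_link(address: str) -> str:
    """Build the link to put in the email (the URL is a placeholder)."""
    query = urlencode({"email": address, "token": make_token(address)})
    return f"https://example.com/confirm?{query}"


def is_confirmed(address: str, token: str) -> bool:
    """Check a token from a clicked link, using a constant-time comparison."""
    expected = make_token(address).encode("utf-8")
    return hmac.compare_digest(expected, token.encode("utf-8"))
```

Note that no syntax check appears anywhere: an address like `me+unicorns@gmail.com` is simply URL-encoded into the link, and the only test that matters is whether the owner clicks it.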

EDIT: I wrote a little follow up article on some of the points raised by commenters.

53 thoughts on “Email address validation: please stop”

  1. Lucy

    Just tried sending an email to +@foobar!$>_<@gmail.com from another gmail account…

    The address “_” in the “To” field was not recognized. Please make sure that all addresses are properly formed.

    Am I escaping it wrong, or is Google wrong?

  2. bma

    I believe “” also need to be escaped. (As far as I can tell from reading RFC 5322, any printable ASCII character can appear as long as it’s escaped with a \; certain characters are specified as not needing to be escaped.)

  3. bma

    Oh for fuck’s sake Sinjo, your blog is a pile of cock. Between the double quotes there should have been the angle brackets.

  4. bma

    Nathan: Sinjo is correct, “@” appears only in the “specials” set of characters, which may not appear in an atom except as part of a quoted-pair, which is a backslash then any printable character.

  5. Pingback: Tweets that mention Sinjo » Email address validation: please stop -- Topsy.com

  6. Sinjo Post author

    @bma: Sorry, don’t think I’m doing anything special that would cause that (ie it may well just be WordPress). :/

  7. Pingback: Email address validation: please stop « Interesting Tech

  8. Jeremy Weiskotten

    Annoying, overly strict validations aren’t just a problem with email addresses.

    I’m actually building a service to help developers and designers detect and measure the impact of these kinds of problems in their sites. We only support Ruby on Rails (2.3, 3) right now but have plans to expand in the future. We’re currently in alpha but will be launching publicly soon.

    Since we’re looking for feedback from people who care about these kinds of issues, I invite you and your visitors to check out http://www.tripwireapp.com, enter your email address, and email jeremy@tripwireapp.com to let me know you’d like early access (the only thing we ask in return is honest feedback).

  9. Mark O'Leary

    You are mildly inconvenienced by losing your archiving shortcut through bad email address sanity checking.

    On the other hand, my perfectly valid email address includes the apostrophe that is part of my name, and it is rejected by nearly three quarters of the websites out there. Most of my ecommerce and service sign-up decisions are predicated largely on whether their forms can cope with “O’Leary” as an email address component.

    Do it right, please, people?

    [at this point I pressed submit and got an error message: "please enter a valid email address". I rest my case. Nice job, WordPress.]

  10. Captain Irony

    Irony: I tried to leave a comment on your blog with the following email address:

    “agre32)(&213 _132.@your company.com”

    I received this message from your site:

    “Error: please enter a valid email address.”

  11. Sinjo Post author

    @Mark O’Leary, @Captain Irony: Haha, I had a feeling I’d get bitten by WordPress on this one. I’ve been thinking of replacing the default comment system with Disqus (for other reasons), though I have no idea if it will pull the same stunt.

  12. Adam N

    The important part is to use the most standard method possible to check the email address (i.e. never write your own, at least use a framework’s or even better, use something from the standard library of whatever language you’re using).

    Unfortunately, Python 2.x has 3 email validators, all a bit different :-/

  13. Graham

    Another thing I’ve seen is when sites blacklist based entirely on the username. I’ve had webmaster@mydomain.com blocked (without a proper error message I might add) … they just never sent me my email. I must be a scammer right? *rolls eyes*

  14. Ehren Murdick

    I disagree. Data validation is important for security reasons; it’s always better to whitelist and assume there are attack strategies out there that I can’t think of. The issue is over-zealous validation, not validation itself. I use RFC 822 for all of my email validations.

  15. Raf

    There are reasons why certain sites reject addresses with the + symbol. It allows you to create an unlimited amount of email addresses. If the site operators want to make sure email addresses are unique, such as when they’re giving out trial keys, it makes sense to at least make it a little bit tougher for people to make multiple accounts.

  16. Miguel

    @Raf:

    That’s not a valid reason. Qmail-based systems use the very common character ‘-’ for the same purpose. Should that be banned too? How many email addresses would be rendered unusable that way?

    And what about the hordes of people with their own domains and catch-all delivery enabled? They can “create” all the email addresses they want without using any particular character in the local part.

  17. Adam (DinnerPlanner)

    I’ve found it quite frustrating in the past with email validation from both sides of the fence.

    On one side you want to make it easy for your users when they screw something up, but keep the regex as simple as possible. On the other side, it’s quite annoying when your perfectly valid email is rejected for being invalid.

  18. Brooks Moses

    @Adam: Seems like the appropriate solution there may be a warning rather than a full rejection: “This email address looks weird; perhaps you’ve made a typo. Click ‘Yes’ to indicate that it’s correct, or re-enter your email address in this box.”

  19. greyzone

    Totally agree with Raf. Many websites do everything they can to prevent multiple registrations… so the + symbol will most likely not be accepted any time soon.

  20. Fred Fnord

    Dude, you think that’s bad? A lot of them can’t even get the HOST part right.

    One of my email addresses looks something like fred@foo-bar.com

    I have had that rejected at perhaps five percent of the sites I use it on. Because the hostname contains a hyphen.

  21. eoin

    @Lucy Many email clients make special use of < and > when including names and email addresses in the To field. Maybe try escaping them?

  22. v0mit

    Email validation is just a waste of bits and bytes. Most sites, especially forums, have email validation just to “prevent” spam, while sites like PayPal actually have a good reason to validate your email.

    A spammer equipped with a domain and a mail server can bypass the email validation of forums and blogs easily. Just create a single account “mailgoeshere@domain.com” and set it as the catch-all. Then all mail sent to addresses on the domain that don’t have an account associated with them will end up in that account.

    Then the spammer can just enter “random@domain.com” in the registration form, and the mail ends up in the catch-all account. He won’t need to make a new account for each email, and has “unlimited” emails at his disposal.

  23. Bill H

    Instead of validation, how about “fixing” broken e-mail addresses? That’s kind of the point of this type of code in the first place. The best implementation of “fixing” that I’ve seen is the MakeValidEmailAddress() function located here:

    http://barebonescms.com/documentation/extra_components/

    Which claims to pass the most ridiculous validation test suite I’ve seen, follows every e-mail standard out there, and uses a state engine instead of regular expressions. Addresses using gmail’s automatic tagging mechanism easily pass through unscathed and it will autocorrect any obvious mistakes you might make such as gmail,com versus gmail.com.

  24. Maarten

    Just another programmer’s item about e-mail validation. 99.99% of all normal, usable e-mail addresses you can put on a business card get validated by even the most stupid regular expressions around.

    Of course you can see it as a flaw on the programmers side, but .. get real, maybe it’s time to use a normal address the moment you grow up.

    Maarten

  25. foobar

    Just use regular characters for your email address and the problem is solved. Using special chars and quotes and apostrophes not only confuses people that you have to tell your address to over the phone but it gets treated like an invalid address by programmers because IT’S FUCKING RETARDED TO USE THESE SPECIAL CHARS. Stop trying to be a unique individual by requiring people to bend to your loser requests.

  26. pelrun

    foobar, read the second paragraph of the damn article again. Most special characters are silly to have in email addresses even if they’re technically valid, but ‘+’ is *specifically* useful and it’s painful to lose that functionality just because some input validator isn’t doing its job properly.

  27. Pingback: Almost all E-Mail validation algorithms are wrong « Kissaki Blog

  28. Matěj Cepl

    And of course, it is even worse (this is more of a rant against stupid MUAs, I am looking at you Thunderbird!) … no RFC requires the @domain part to be there either. So it is perfectly all right to send email

    From: me
    To: you
    Subject: test

    example

    And of course it is none of the MUA’s business whether such an email address will be canonicalized or delivered to a local user.

    Matěj

  29. Anonymous

    It is, in fact, impossible to validate email addresses with regular expressions. Valid addresses are given by a context free language that is not regular, i.e. it’s one level higher in the Chomsky Hierarchy.

  30. Benny Bottema

    Here’s a complete RFC 2822 compliant e-mail address validation in Java using a collection of regular expressions: http://code.google.com/p/vesijama/source/browse/trunk/src/org/codemonkey/vesijama/EmailValidationUtil.java

    From RFC 2822 itself:

    “This standard specifies a syntax for text messages that are sent between computer users, within the framework of “electronic mail” messages. This standard supersedes the one specified in Request For Comments (RFC) 822, “Standard for the Format of ARPA Internet Text Messages” [RFC822]“

  31. Pingback: Bananas Development Blog

  32. Alexandr Ciornii

    Montreal: this regex is generated. It could be written more readably with Perl regexes (Perl regexes can be much more readable than in other languages), but that would make it slightly slower (5%, I suppose).

  33. Pingback: Sinjo » Email address validation: an addendum

  34. Michael

    Perl’s Mail::RFC822::Address made me laugh when I saw it. And everyone’s insistence that it is impossible to validate email addresses using regular expressions got me ambitious. So I wrote an article on email address validation using regular expressions:

    http://squiloople.com/2009/12/20/email-address-validation/

    There’s a regular expression there which conforms to RFC 5321, another which conforms to RFC 5322 (complete with nested comments), and a class which allows for easy control over what to allow in an email address.

    Was I re-inventing the wheel? Yes. But then the wheel used to be made of wood. Now it’s better:

    http://svn.php.net/viewvc/php/php-src/trunk/ext/filter/logical_filters.c?revision=297353&view=markup#l499

    Although you should note that the expression used by PHP is based on an earlier expression of mine. I’ve updated it since then. I’m hoping they’ll update the native code to reflect this (using, I assume, my expression which conforms to RFC 5321 — comments and obsolete local-parts should just be ignored; RFC 5322 is a waste of space).

  35. Steve

    By pointing to the Perl validator and the mail spec, you made it clear that validation itself is not the problem.

  36. Gerold Setz

    Hi,

    Detecting disposable email addresses is not so easy (except the ones with a + sign), as the number of domains providing such temporary mail services is growing constantly. That’s why I started a new web-based service that anyone can use. It is better than maintaining local blacklists, I guess.

    Have a look at http://www.block-disposable-email.com/

    Gerold
