## The TL;DR
- Tokenization (nothing to do with NFTs) is one of the ways that backends protect sensitive information, like credit card numbers or social security numbers.
- Sensitive information is important in many of the apps you use every day, but it's dangerous to store and use unprotected.
- Tokenization replaces sensitive data with placeholders that point back to it.
- To tokenize your data, you'll need a system/platform to create and manage tokens.
- Tokenization works in step with encryption, but in theory may be more secure.
Security is always part and parcel of building a good application. But as more data privacy regulation seems to be on the way, developers are thinking more about how to secure their application (and user) data.
## Securing sensitive data is quite important
Today it’s pretty normal (standard?) to casually input your credit card information into some random site on the internet, or assume a government form on the IRS website is going to take care of that social security number they’re asking for. I remember back in high school when I was buying and selling fake NBA jerseys (not as profitable as Technically), I somehow trusted these random Google-translated Chinese websites with my debit card number and PIN.
So what’s actually happening behind the scenes when you upload sensitive information to a website? First things first, it’s probably getting stored somewhere.
Recall that application backends are almost always just a bunch of data, usually stored in a relational database like Postgres or MySQL. You’ve got all of your application data in that database – users, orders, products, whatever your app needs – and there are a lot of people and systems regularly accessing it. Most of that data isn’t that sensitive, and in fact is to some degree publicly available (prices of items on your site, for example).
🚨 Confusion Alert 🚨
In addition to actual developers on your team having access to your production database, it will also likely be connected to third party services. You might have your database hooked up to a data platform to do some ETL into your data warehouse, a monitoring service like Datadog, or an app builder like Retool. If any of those components are breached, your database is also in danger.
🚨 Confusion Alert 🚨
But user data, credit cards, SSNs, bank account numbers, and the like are very sensitive. And if they’re stored alongside the rest of your business data, anyone who has access to your database has access to that information. It only takes one compromised account for you to get in a lot of shit (see the Twitter admin panel breach).
Because of how delicate this data is (and perhaps in response to high profile breaches over the past decade), we’re seeing more regulation aimed at tightening up how companies store sensitive data. Credit cards are a great place to start. Since 2004, the major credit card companies (Visa, Mastercard, Discover, etc.) have required companies to comply with a set of standards called PCI DSS, or the Payment Card Industry Data Security Standard. The intricacies here are beyond my knowledge, but my understanding is that you must either encrypt or tokenize your users’ credit card information. Then you get to display this nice logo, after a PCI Level 1 compliance assessment that can apparently cost tens of thousands of dollars.
This is just one example, but there's a lot more regulation coming. Nacha, the governing body for the U.S.'s ACH system, just issued some new rules requiring companies processing a lot of ACH payments to either encrypt or tokenize their bank account data. There's also momentum around regulating data residency, which would require user data to be physically stored in the same country as the user who generated it.
All of which is to say – protecting sensitive information in your systems is very important. Tokenization is one way of doing it.
## Tokenization puts it all behind a curtain
Tokenization helps developers protect sensitive data in their systems by creating a layer of separation between your sensitive data (credit cards, bank account info, etc.), the rest of your data (products, users, orders, whatever), and the systems that need to use that sensitive data.
Imagine a piece of sensitive data. It might look something like this:
```
{
  "name": "technically",
  "credit_card_number": 1729364366719903,
  "credit_card_expiration": "07/27",
  "credit_card_security_code": 331
}
```
To protect this data, you’ll run it through your tokenization system and it will do two things:
**Spit out a token** to point to this piece of sensitive data.
The token might just be a random string of characters, like:
```
{
  "token_id": "Ei1763h-1kjdghf7-55461-kdsjd90-su1881"
}
```
Or in some systems, you might want it to retain some qualities of the original piece of data it's representing. If we were tokenizing a credit card number, we might make the token the same length as a card number, and maybe even carry over the last 4 digits of the actual card:
```
{
  "token_id": "5555-5555-5555-9903"
}
```
The benefit of this format is that it's more human-readable – you can probably tell it's meant to represent a credit card of some sort – and that it matches the original size and shape of the data, so storing it in your systems is easier.
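To make this concrete, here's a minimal sketch (in Python, with illustrative function names, not any real platform's API) of how a system might generate both token styles. The format-preserving version keeps the card number's length and last 4 digits while randomizing everything else:

```python
import secrets

def random_token() -> str:
    """Generate an opaque token with no relationship to the underlying data."""
    return secrets.token_hex(16)

def format_preserving_token(card_number: str) -> str:
    """Generate a token shaped like a card number, keeping only the last 4
    real digits. Everything else is random, so the token reveals nothing."""
    digits = card_number.replace("-", "")
    random_part = "".join(str(secrets.randbelow(10)) for _ in range(len(digits) - 4))
    tokenized = random_part + digits[-4:]
    # Re-insert dashes in groups of 4 so it reads like a card number
    return "-".join(tokenized[i:i + 4] for i in range(0, len(tokenized), 4))

token = format_preserving_token("1729-3643-6671-9903")
# e.g. "4821-0057-9138-9903" -- same shape, same last 4 digits, random otherwise
```

Note that the real system still has to remember which token maps to which card – the token itself carries no way to get back to the original number.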
**Translate the token back** to the original, useful data.
Once your data is tokenized, the token on its own is mostly useless to your systems – you can't charge a fake credit card number. So when you need to use that tokenized data, you'll have to translate the token back to the original data. Instead of storing the actual sensitive data in your systems and accessing it directly, you place the tokenization system in between your app and the sensitive data, and translate whenever you need to use the data directly. This translating is also called detokenization.
For a more concrete example, imagine you’re an e-commerce site selling some very legal, completely not fishy stuff. Your customers need to input their credit card information to buy something, and you want to save that information to make subsequent purchases easier (let’s say they create an account). Storing that credit card information directly in your systems and accessing it directly is risky (see previous section).
Instead, you store the credit card info in a completely separate system, and that system spits out a token representing the credit card. You store that token in your regular database for future use. When your customer (hopefully) returns and you need to charge their credit card again, you pop out the token, make a request to the tokenization system, and it sends you back the actual credit card information, so you can send that to your payment processor and (finally) make some money.
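The whole flow above might look something like this sketch, where `vault` stands in for the separate tokenization system's storage and the function names are illustrative (a real setup would make API calls to a service, not share a dictionary):

```python
import secrets

vault = {}  # stands in for the separate tokenization system's storage

def tokenize(card_number: str) -> str:
    """Store the card in the vault and return an opaque token for it."""
    token = secrets.token_hex(16)
    vault[token] = card_number
    return token

def detokenize(token: str) -> str:
    """Translate a token back to the original card number."""
    return vault[token]

# First purchase: your app database keeps only the token, not the card
user_record = {"name": "technically", "card_token": tokenize("1729364366719903")}

# Repeat purchase: translate the token back just long enough to charge the card
card = detokenize(user_record["card_token"])
# payment_processor.charge(card, amount)  # hypothetical processor call
```

The key point is that your regular database only ever sees `card_token`, so a breach of that database alone doesn't expose the card.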
## The tokenization platform, and where encryption fits in
A few misc. notes:
### What a tokenization “platform” is
We’ve mentioned this concept of the “tokenization system” a few times – what exactly is that? Do you build it or is it something you pay a company for?
Your tokenization system is just a series of little things your developers will put together to implement tokenization for your app. People really use the word platform too much these days. Anyway, it's usually made up of the following:
- A storage system for:
  - Storing sensitive data
  - Storing which token refers to which data
  - Managing authentication and authorization (who gets to access this system, what they get to do)
- API endpoints for:
  - Storing new data and tokenizing it
  - Creating tokens that look like the original data, if your system supports that
  - Detokenizing, or translating tokens back to the original data
  - Sharing tokenized or detokenized data with other systems (e.g. a payment processor)
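Squinting a bit, those pieces fit together like this toy sketch – isolated storage, a token-to-data mapping, and simple API-key authorization in front of the tokenize/detokenize endpoints. All the names here are illustrative, not a real product's API:

```python
import secrets

class TokenVault:
    """Toy tokenization platform: isolated storage, a token<->data
    mapping, and API-key checks in front of every operation."""

    def __init__(self, api_keys: set):
        self._store = {}        # token -> sensitive data
        self._api_keys = api_keys

    def _authorize(self, api_key: str) -> None:
        if api_key not in self._api_keys:
            raise PermissionError("unknown API key")

    def tokenize(self, api_key: str, data: str) -> str:
        """'Endpoint' for storing new data and returning a token for it."""
        self._authorize(api_key)
        token = secrets.token_hex(16)
        self._store[token] = data
        return token

    def detokenize(self, api_key: str, token: str) -> str:
        """'Endpoint' for translating a token back to the original data."""
        self._authorize(api_key)
        return self._store[token]

vault = TokenVault(api_keys={"backend-service-key"})
t = vault.tokenize("backend-service-key", "1729364366719903")
assert vault.detokenize("backend-service-key", t) == "1729364366719903"
```

A production version would run as its own service with real authentication, audit logging, and encrypted storage underneath – but the shape is the same.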
Is this impossible to build yourself? Absolutely not. But like any self-contained element of a backend these days, it's worth thinking about whether you'd prefer to buy it rather than build it yourself. There are third party tokenization platforms like Basis Theory and VGS that take care of all of the above (plus things like search, access policies, and embeddable forms), but they of course come with their own set of tradeoffs.
🔍 Deeper Look 🔍
Your sensitive data has to get stored somewhere. And in theory, your tokenization platform could also be breached. The real security benefit here is having it separated from the rest of your production data. That means access to it can be tightly controlled instead of just being given to everyone who can access your regular business data. But yes, it’s not entirely foolproof.
🔍 Deeper Look 🔍
### Where encryption fits in
If you've read up about encryption, a lot of this post may sound familiar. Tokenization is slightly different from encryption, and which one you choose to secure your sensitive data (you can do both, by the way) depends on your requirements and priorities – remember, every application is a snowflake ♥️. A few highlights:
- Tokens can look like your original data, while encrypted data is a long, random-looking string → in theory, tokens are easier to store
- Tokens can't be traced back to the original data without the tokenization system, while encrypted data can theoretically be brute forced (although this is highly unlikely)
- Encrypted data is decrypted with a key (and sharing keys is notoriously difficult), while tokenized data is detokenized through your platform (which you can authenticate to in various ways)
- Encryption is a lot more popular as of the time of writing
Basically, you’d most likely pick one or the other. If any of these bullet points are too in the weeds, just keep in mind that tokenization and encryption are two methods for securing your sensitive data, and both have their strengths and weaknesses.