Technically Monthly (Issue 2, May 2025)
Notes on this new MCP thing, Databricks vs Snowflake, new posts on CSS, AI model training, how marketers can get more technical, and updates to all 16 public company breakdowns on Technically.
Hello loyal, handsome Technically readers. This is the second ever Technically Monthly, and it’s a big one. We’ve got:
3 brand spanking new posts on Technically 2.0
Updates to all 16 of the public software company breakdowns on Technically
A guest explanation of what this MCP (model context protocol) thing is all about
A guest take on the new front in the cloud data competition between Databricks, GCP, Snowflake and others
A new term in the Technically universe: model inference
New on Technically 2.0
What’s CSS?
Available as a sneak peek to paid subscribers on Substack, and now in its permanent home in the Building Software Products knowledge base.
Today, as we wade through the ocean of the internet, we’re blessed with a vibrant marine ecosystem, a veritable coral reef of colors, borders, shapes, gradients, and textures.
We have one person to thank for this, and that’s Håkon Wium Lie, the inventor of CSS.
How do you train an AI model?
Available as a sneak peek to paid subscribers on Substack, and now in its permanent home in the AI, it's not that Complicated knowledge base. Thanks again to Vercel for sponsoring our writing on AI this year.
Language models are trained in fundamentally different ways than classic ML models. What do things like pre-training, instruction fine-tuning, and RLHF mean, and how does language model training just come down to curating an ideal set of questions & answers?
The top 5 technical things Marketers should know about engineering
Available free for all humankind (no pets please). Feel free to forward to that marketer we all know who has a developer inside waiting to get out.
Let's face it - marketers work with a highly technical set of tools and constantly rely on engineers to change the site or move data around.
Today more than ever, being more technical will help you understand what’s going on under the hood, become more self-sufficient, and ultimately just move faster.
Updates to all public company breakdowns
We’ve finally got our facts straight.
In April we updated all 16 public company breakdowns (including those on companies like Snowflake, MongoDB, and Cloudflare) based on the facts in their most recent SEC filings.
What’d we find most interesting?
The way in which public companies are handling their 💅 “AI makeover” since we first covered them. Some are acquiring companies, and some are slapping on a new coat of paint by supporting vectors, but every company feels a need to do…something. More on these AI makeovers in future issues.
Notes from the field: major technical news
AI in Practice: What’s MCP and what does it actually do?
From Technically’s AI adoption correspondent, Jason Ganz, who leads developer experience at dbt Labs.
You might have noticed a whole lot of announcements over the last several weeks that looked like this:
Well, I’m one of the guys in the picture. Let’s take a look at what MCP even is, why it matters, and where it might go from here.
What is Model Context Protocol (MCP)?
From the MCP docs:
MCP is an open protocol that standardizes how applications provide context to LLMs. Think of MCP like a USB-C port for AI applications. Just as USB-C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools.
Like the name says, MCP is a protocol to give context to a model. So if you break it down:
Model: A large language model like GPT-4o, Claude 3.7 Sonnet, or Gemini 2.5
Context: Information that gets fed into the model at inference time
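To make “context at inference time” concrete, here’s a minimal sketch in Python using the OpenAI chat API. (The model name and the file are illustrative placeholders, not a recommendation.)

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in your environment

# The "context": a page the model has never seen, fed in at inference time
page = open("page_37.txt").read()  # hypothetical file

response = client.chat.completions.create(
    model="gpt-4o",  # any chat model works here
    messages=[{"role": "user", "content": f"Summarize this page:\n\n{page}"}],
)
print(response.choices[0].message.content)
```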
Providing the Right Context at the Right Time
One of the things that make Large Language Models (LLMs) powerful is that they can do in-context learning, meaning as you feed them information in a chat, they’re able to keep using it.
If you put a page of a book into an LLM, it will be able to summarize that page, even if it’s never seen that book before.
But context windows are limited: by page 38 you might run out of context capacity, just as the story was heating up. So we need a more dynamic way of passing context to a language model.
This is a blocker for a whole bunch of useful applications that could take action on your behalf.
If I’m asking an LLM what my schedule looks like tomorrow, it can make a wild guess. But it doesn’t actually know what my schedule is, or have any reasonable way of finding that information.
That’s where MCP comes in - now, instead of having to copy and paste a screenshot of my Google Calendar into Claude, I can hook the Google Calendar MCP server into it, and when I ask “Do I have a lunch meeting tomorrow?”, it will query the Google Calendar MCP server to get the necessary context (what calendar events I have booked tomorrow).
MCP is the new API
If you’re thinking “this sounds a lot like an API,” you’re right. A good mental model for right now is that MCP servers can give LLMs access to any information that lives behind an API.
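To make that concrete, here’s roughly what a toy calendar MCP server could look like in Python. A minimal sketch, assuming the official MCP Python SDK (the mcp package and its FastMCP helper); the get_events tool is a hypothetical stand-in for a real Google Calendar lookup.

```python
# pip install mcp  (the official MCP Python SDK)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("calendar")

@mcp.tool()
def get_events(date: str) -> list[dict]:
    """Return the calendar events booked on a given date."""
    # Hypothetical stand-in for a real Google Calendar API call
    return [{"title": "Lunch meeting", "start": f"{date}T12:00", "end": f"{date}T13:00"}]

if __name__ == "__main__":
    mcp.run()  # serve the tool so an MCP client (like Claude Desktop) can call it
```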
But there’s one more important step - say I want to tell my AI system to schedule a 3-hour block at lunch for me to do “Important Business”. If I already have something scheduled at that time, MCP tools could take action in other systems to make that 3-hour block possible, like sending requests (with apologies, of course) to move my existing meetings.
You, the language model, and the MCP tools work together in a loop:
You ask the LLM to clear your calendar
The LLM determines what MCP tools to use (your calendar) and fetches the context it needs (your events)
The LLM tells you what it plans to do, and generally asks nicely for your approval
Once approved, the LLM uses MCP tools to move around your calendar events and schedule a block
When you hear about AI agents, it’s basically this: an AI system using tools and running loops like this one.
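Here’s that loop as a schematic in Python - a minimal sketch where the canned llm function and the tool functions are hypothetical placeholders for a real model and real MCP tools. The control flow is the part that matters.

```python
# A schematic agent loop. Everything here is a hypothetical placeholder,
# not any vendor's real SDK.

def get_events(date):
    return [{"title": "1:1 with Sam", "start": f"{date} 12:00"}]

def move_event(title, new_start):
    return f"Moved '{title}' to {new_start} (with apologies)."

# Canned "model" responses: fetch context, take action, then report back
SCRIPT = [
    {"type": "tool_call", "name": "get_events", "args": {"date": "tomorrow"}},
    {"type": "tool_call", "name": "move_event",
     "args": {"title": "1:1 with Sam", "new_start": "tomorrow 15:00"}},
    {"type": "text", "content": "Done: lunch is clear; your 1:1 moved to 3pm."},
]

def llm(messages):
    # Pick the next scripted step based on how many tool results we've seen
    return SCRIPT[sum(m["role"] == "tool" for m in messages)]

TOOLS = {"get_events": get_events, "move_event": move_event}

messages = [{"role": "user", "content": "Clear my calendar at lunch tomorrow."}]
while True:
    reply = llm(messages)
    if reply["type"] == "tool_call":                    # the model wants context or action
        result = TOOLS[reply["name"]](**reply["args"])  # run the tool
        messages.append({"role": "tool", "content": str(result)})
    else:                                               # the model is done
        print(reply["content"])
        break
```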
Where is MCP going?
MCP adoption started as a slow burn, but over the past month, an MCP workshop went about as viral as it’s possible for a dev tools conference talk to go, and we’ve seen a tremendous amount of momentum. In particular, we’ve seen:
An influx of official and unofficial MCP servers to serve as context sources
A commitment from OpenAI, Google, and Microsoft to support MCP
The second point is important - if all of the big players in the industry really do adopt MCP, it has a solid chance of becoming a true standard. Watch to see how deeply MCP actually gets integrated into products that aren’t Anthropic’s (the company that created MCP).
The other thing to watch is the evolution of the protocol itself - there’s a lot of functionality to build out before MCP fulfills its promise. The two things I’m looking for are support for remote MCP servers (hosted in the cloud - right now, MCP mostly works in desktop applications and is pretty clunky to set up if you’re not a developer) and support for enterprise authentication like OAuth, which will be necessary for businesses to adopt this at scale.
Datagopoly Watch: cloud providers get into the moving business
From Technically’s data infrastructure correspondent, Elliot Gunn of Datafold, a platform for data quality and automated data migration, who previously answered the question “What’s a data migration?”
There’s a new front in the cloud data infrastructure competition between Databricks, Snowflake, AWS and Google Cloud.
Previously, vendors fought for market share of data workloads, the actual compute time that you buy to run analytics or machine learning
Now, they're racing to automate business logic migration: how you define metrics like revenue, which can be more complex than you’d think
Whoever solves this first will lead in platform adoption, as logic migration is a key blocker to moving between platforms
Workload wars: The backstory
Over the past decade, cloud data vendors (Snowflake, Databricks, AWS, GCP) competed for market share of companies' data workloads.
Snowflake won analytics share from AWS’s Redshift by separating compute from storage, making it easier to scale up workloads as your data size grew. Databricks introduced the "Lakehouse" (don’t ask) to take share from Snowflake. Snowflake and BigQuery added Python support to capture machine learning workloads from Databricks.
Each vendor sought workload share by serving different types of workloads (analytics vs. machine learning).
Business logic is the new battleground
Historically, cloud data vendors didn't care what analytics you were computing - they just provided the compute. But you can't just switch platforms - you have to migrate both your data and your business logic.
The data part is mostly solved. The business logic part has been a money pit, requiring engineers or expensive consultants.
Every company is a unique snowflake (the actual ❄️, not the data warehouse) in how it calculates metrics. Business logic encodes years of institutional knowledge and is tedious to migrate.
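As a toy illustration (hypothetical data, not any real company’s books), here are two perfectly reasonable teams computing “revenue” from the same orders table and getting different answers:

```python
import pandas as pd

orders = pd.DataFrame({
    "amount":   [100, 100, 100],
    "refunded": [False, True, False],
    "status":   ["paid", "paid", "trial"],
})

# Team A: revenue is everything we billed
revenue_a = orders["amount"].sum()  # 300

# Team B: revenue excludes refunds and trials
revenue_b = orders.loc[~orders["refunded"] & (orders["status"] == "paid"),
                       "amount"].sum()  # 100

print(revenue_a, revenue_b)  # same data, two "correct" revenue numbers
```

Multiply that by hundreds of metrics and a decade of edge cases, and you can see why migrating the logic is the hard part.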
This is where cloud vendors make real money - from global companies modernizing decades-old systems, which can require years-long migration projects.
What cloud data platforms are doing
Three major players have made moves:
Google launched upgrades to its BigQuery Migration Service at Cloud Next
Snowflake made its previously premium SnowConvert migration tool free
Databricks acquired BladeBridge, a popular SQL-conversion tool
All claim AI as their edge, but their approaches vary:
BladeBridge handles customized logic but is now Databricks-only
SnowConvert only converts to Snowflake SQL
BigQuery Migration Service has patchy support for complex logic
None offer true end-to-end automation or automate the hardest step: data validation, which proves the new logic produces the same output as the old logic.
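For a feel of what that validation step involves, here’s a minimal sketch (hypothetical file names and schema): pull the same metric from the legacy and the migrated systems and check that every row matches.

```python
import pandas as pd

# Hypothetical exports of the same metric from the old and new warehouses
legacy = pd.read_csv("legacy_revenue_by_month.csv")  # columns: month, revenue
new = pd.read_csv("new_revenue_by_month.csv")        # columns: month, revenue

merged = legacy.merge(new, on="month", suffixes=("_old", "_new"))
merged["diff"] = (merged["revenue_old"] - merged["revenue_new"]).abs()

# The migrated logic is validated (for this metric) only if every month matches
mismatches = merged[merged["diff"] > 0.01]
print("Validated!" if mismatches.empty else mismatches)
```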
What to watch for
We're still far from commoditizing data migrations. Look out for:
Enterprise case studies validating that these automated tools really work
Support for modern frameworks like dbt or Coalesce.io, which make migration more straightforward
Self-serve solutions without heavy vendor support
Other cloud providers, like AWS, have sat on the sidelines so far, but could well respond with similar moves in the next 6-12 months.
Open table formats like Iceberg and Delta (which we’ll probably cover in a future note) are the elephant in the room - potentially letting you use multiple platforms on the same dataset without migration, which could throw the competitive dynamic into a blender.
New in the Universe: Model inference
Let’s close with a favorite term we recently added to the Technically universe of technical concepts, AI model inference:
Inference is a fancy term that just means using an ML model that has already been trained. In most contexts it refers to calling a model via an API, although technically anything that runs a trained model - like prompting an LLM in a chat window - is also inference.
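A quick sketch of the training/inference split using a classic ML model (scikit-learn, toy data for illustration):

```python
from sklearn.linear_model import LogisticRegression

# Training: happens once, up front (toy data)
X_train = [[0], [1], [2], [3]]
y_train = [0, 0, 1, 1]
model = LogisticRegression().fit(X_train, y_train)

# Inference: everything after this point just uses the already-trained model
print(model.predict([[2.5]]))  # no further learning happens here
```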
Coming up next month
Thanks for joining us for Technically Monthly. Stay tuned for the next one, which will feature posts on:
An overview of the market for data stores
What’s generative AI?
How can AI use websites?
How NoSQL databases are a great fit for AI