Codemods with Babel Plugins

author
Laurie Barth

Codemods are an incredibly powerful tool that not everyone has heard of, so let's start there. A codemod is an automated way to transform one set of syntax into another. They're often written to ship with API changes or other things that fundamentally change the interface developers interact with. This makes sense because you want users to adopt your latest and greatest improvements, but if you've changed things enough that all their existing code needs to be rewritten, that's a bit of a barrier. Writing a codemod automates those changes and let's them benefit right away.

So codemods are great, right? Right! Except there is something I noticed when searching around the internet for write-ups on them. Almost every tutorial is written using the JSCodeShift API or explains how to make a Babel plugin to do the transform without any of the other necessary context.

Until now! So without further ado, let's dive in.

An Overview

What are we trying to accomplish in this tutorial?

  • We want to use vanilla Babel to write a custom plugin that will transform our code
  • We want to preserve formatting as much as possible so that the git diff users see is laser-focused on the substantive changes the codemod is making
  • We want to include a testing suite and a CLI to make this production ready

Babel Transform

The first thing we're going to do is write a minimal transform that we can run as we set up the rest of our structure. We'll need to install @babel/core.

What's nice is that we can test this code without the rest of our setup, using ASTExplorer. Using the transform toggle at the top select babel v7 and start writing your plugin! Let's walk through an example.

Egghead is no longer just for videos, it has written content as well! So the API is changing from this:

<Egghead video={src} />

To this:

<Egghead content={src} />

Let's write a transform for this. The first thing to note is that babel is passed into every plugin and includes types. If you look at the API you'll notice that it includes all of the AST (Abstract Syntax Tree) types your code is parsed into. It also includes helper methods like isIdentifier that can substitute for type === 'Identifier' which makes the code a bit easier to follow.

export function updateContent(babel) {
const {types: t} = babel
return {
visitor: {
// our transform goes here
},
}
}

We've made a function that acts as our plugin, we've pulled types out of the included babel object and we're ready to return a visitor object. Our main functionality will live inside that object.

Using the AST we see that video is a JSXIdentifier, so we're going to visit that type.

export function updateContent(babel) {
const {types: t} = babel
return {
visitor: {
JSXIdentifier(path) {
if (path.node.name === `video`) {
path.replaceWith(t.jsxIdentifier(`content`))
// or path.node.name === `content`
}
},
},
}
}

When we visit a type we get access to path. path is the node object and additional metadata. In this case, we're looking at the name of the JSXIdentifier and checking if it's video. If it is, we want to replace it. There are two ways to do this.

We can replace the entire path with a new JSXIdentifier with the name "content". Or, we can use the existing JSXIdentifier and override the name. This is a simple type so either way is fine, but it's useful to know both are possible as you traverse more complicated types.

Preserving Formatting

So far our example is a single line, but it exists inside larger files and projects. If we don't preserve formatting we could wind up changing every line of that file with small things like spaces and semicolons.

We can prevent that using recast, so let's install that. Recast gives us functions that parse and print our code while preserving things like white space.

Let's import the functions we need and set up a wrapper function that will call our updateContent plugin.

import {parse, print} from 'recast'
export function babelRecast(code) {
const ast = parse(code)
const result = print(ast).code
return result
}

Right now, we're parsing code and printing it using recast. But it's missing a few things. First, recast defaults to using a parser called esprima, we want to use Babel. Second, we're not running our plugin on the code! So let's solve those problems.

Using a Babel parser with recast

There are a few different ways to use Babel with recast. The simplest is to use the Babel parser that recast includes for us.

import {parse, print} from 'recast'
export function babelRecast(code) {
const ast = parse(code, {parser: require('recast/parsers/babel')})
const result = print(ast).code
return result
}

This works perfectly fine. But what if you want something a bit more custom? As it turns out, you can pass a completely custom parser to recast. In this case, we're going to grab parseSync from @babel/core and set it up ourselves. [Note that you need to install all the plugins you use here].

import {parse, print} from 'recast'
import {parseSync} from '@babel/core'
export function babelRecast(code, filePath) {
const ast = parse(code, {
parser: {
parse: (source) =>
parseSync(source, {
plugins: [
`@babel/plugin-syntax-jsx`,
`@babel/plugin-proposal-class-properties`,
],
overrides: [
{
test: [`**/*.ts`, `**/*.tsx`],
plugins: [[`@babel/plugin-syntax-typescript`, {isTSX: true}]],
},
],
filename: filePath,
parserOpts: {
tokens: true, // recast uses this
},
}),
},
})
const result = print(ast).code
return result
}

There is a lot going on here, so it's worth talking through it. parser is still taking a custom parser, but it's one we're defining ourselves using the Babel parsing function parseSync. We're passing a few plugins as well as an override. The override looks for TypeScript files and includes the TypeScript syntax plugin when it finds one. The test RegEx is compared to the filename that we're passing into the function.

The last piece is the strangest. tokens:true. Inside of recast, it checks to see if the custom parser has been successful. It does so by looking at ast.tokens, otherwise it falls back to esprima. tokens are not returned by default, so we need to return them as part of the AST so recast respects our custom parser.

Running the plugin with recast

To make the example easier to follow we're going to use the first of our parsing examples. Using the AST we get as a result, we'll import transformFromAstSync from @babel/core. This function will take our code when it's in the form of an AST and transform it.

import {parse, print} from 'recast'
import {transformFromAstSync} from '@babel/core'
export function babelRecast(code) {
const ast = parse(code, {parser: require('recast/parsers/babel')})
const options = {plugins: [updateContent]}
const {ast: transformedAST} = transformFromAstSync(ast, code, options)
const result = print(transformedAST).code
return result
}

So that's it? Not quite.

Important options for our Babel transform

There are a few key options we need to pass to the transformFromAstSync function.

const options = {
cloneInputAst: false,
code: false,
ast: true,
plugins: [updateContent],
}

The first is cloneInputAst. By default, this option is set to true and ensures that the transformFromAstSync method will clone the input AST to avoid mutations. However, recast uses the AST to store metadata information, allowing it to preserve formatting. If we let Babel clone it, that information gets lost. So we set cloneInputAst to false in order to preserve the work recast has done.

Second, we set code to false to improve performance. If we're parsing a large project to modify code, this is useful.

Finally, we want to set ast to true because by default, Babel won't return the AST.

A working recast implementation

Let's put it all together. We're using a Babel parser inside recast to get our code as an AST. Then we pass our plugin, which does the actual code transformation, to the transformFromAstSync function and get the resulting ast. Note that we alias this to transformedAST because we already have an ast variable in this scope. Lastly, we use the recast print function to turn our transformed AST back into code and return it.

import {parse, print} from 'recast'
import {transformFromAstSync} from '@babel/core'
export function babelRecast(code) {
const ast = parse(code, {parser: require('recast/parsers/babel')})
const options = {
cloneInputAst: false,
code: false,
ast: true,
plugins: [updateContent],
}
const {ast: transformedAST} = transformFromAstSync(ast, code, options)
const result = print(transformedAST).code
return result
}

Now we're running our Babel plugin within recast, preserving formatting!

Wrapping our transform

We could stop here and write a custom CLI and testing suite that uses the babelRecast function. However, there is a faster way. We can use JSCodeshift!

Early on I said we didn't want to use JSCodeshift and now we are, so what's going on? Well, JSCodeshift comes with its own API that differs from using Babel types and visitors. It's custom to the library, rather than being more widely used. However, you can use the library without using that API, which is what we're going to do. This way, we can swap it in and out if we decide not to be tied to that particular tool.

We start by installing the dependecy. As it turns out, JSCodeshift is a module that exports a function. So long as that function takes in code and exports code, you can include whatever you want inside of it!

export default function jsCodeShift(file) {
const transformedSource = babelRecast(file.source, file.path)
return transformedSource
}

That's genuinely all we have to do. We don't have to use the API at all. And what do we get? Well, a lot.

We're now able to run our codemod via CLI:

jscodeshift -t <transform-filename> <transform-target-dir>

This command spins up workers for us and handles batching.

We're also able to make use of testUtils and set up a full testing suite. We make a transforms folder that includes a egghead-codemod.js file with all the code we've written so far. Inside that same directory we need __tests__ and __testfixtures__.

Inside __testfixtures__, we can put as many input/output files as we like. For example, example.input.js:

<Egghead video={src} />

And example.output.js:

<Egghead content={src} />

When we run tests, this input will be transformed using egghead-codemod.js and the result will get compared to the output file. Ensuring that things work as expected.

In order for this to work, we need to do one more thing. Inside __tests__ we need to create egghead-codemod-test.js where we call the tests.

const tests = [`example`]
const defineTest = require(`jscodeshift/dist/testUtils`).defineTest
describe(`codemods`, () => {
tests.forEach((test) =>
defineTest(__dirname, `egghead-codemod`, null, `${test}`),
)
})

Now, we can add additional file names to the tests array and it will run each of them.

One last thing. We showed how a custom parser can be used in recast to work with TypeScript and JavaScript at the same time. We can do a similar thing here to ensure tests run against other file types and don't overwrite their file extensions.

const typescriptTests = [`example`]
const defineTest = require(`jscodeshift/dist/testUtils`).defineTest
describe(`codemods`, () => {
typescriptTests.forEach((test) =>
defineTest(__dirname, `egghead-codemod`, null, `${test}`, {
parser: `ts`,
}),
)
})

And there we have it!

Back to our plugin

We now have a fully functional codemod. At the moment, it isn't doing a whole lot. But it can. I'll add a few extra tips here that might come in handy when writing your own.

In addition to types, Babel also includes template. So inside our plugin we can do this.

export function updateContent(babel) {
const {types: t, template} = babel
return {
visitor: {
ImportDeclaration(path) {
if (path.node.source.value !== `Egghead`) {
return
}
const newImport = template.statement
.ast`import {Egghead} from "egghead-content"`
path.replaceWith(newImport)
},
},
}
}

This allows us to write full replacement statements instead of having to change each part of the ImportDeclaration. It's also worth noting that this takes a template literal, so we can use variables as well.

If you want to make a change to both an import and the thing that uses it, you can take advantage of the fact that path comes with scope information. This is quite undocumented, so it took some asking around.

Let's assume that the example code is something like this:

import Egghead from 'egghead'
<Egghead video={src} />
export function updateContent(babel) {
const {types: t, template} = babel
return {
visitor: {
ImportDeclaration(path) {
if (path.node.source.value !== `Egghead`) {
return
}
const localName = path.node.specifiers?.[0]?.local?.name
const usages = path.scope.getBinding(localName)?.referencePaths
usages.forEach((item) => {
// each item is a usage of Egghead
})
},
},
}
}

This is a great way to use information about the import (or variable, etc) within the usage you're attempting to change.

Codemods can be incredibly involved, so knowing all these plugin tips is useful.

Make your own

Now it's time to dive in and make your own. I highly recommend starting with ASTExplorer to learn the syntax and get comfortable with it if you aren't already. And remember, custom Babel plugins aren't just for codemods.