Workspaces are first-class support in the package manager for monorepo structures. Originally introduced by yarn, they're also supported by npm starting with version 7. A monorepo is simply a collection of related packages; having them share a single git repo can eliminate a lot of redundancy in the project setup, as well as make it easier to work with interdependencies during development.
They're relatively simple. They consist of:

- a parent directory whose `package.json` contains a list (or wildcard pattern) defining subprojects
- one or more subprojects, each containing a `package.json` file similar to a regular npm package
```js
// <root>/package.json
{
  "workspaces": [
    "package-a", // `./package-a` is a subproject
    "package-b", // `./package-b` is a subproject
    "libs/*"     // each subdirectory of `./libs` is a subproject
  ]
}
```
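To make the relationship concrete, a subproject declares its sibling as an ordinary dependency — a hypothetical sketch (the name and version range here are made up), assuming the package manager satisfies a matching range with the local workspace copy rather than fetching it from the registry:

```js
// <root>/package-a/package.json (hypothetical)
{
  "name": "package-a",
  "version": "1.0.0",
  "dependencies": {
    // this range matches the local version of `./package-b`, so the
    // workspace copy is symlinked instead of downloaded
    "package-b": "^1.0.0"
  }
}
```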
For starters, they allow for a single definition of the various project-level configurations like linting, gitignore, test configuration, etc. Additionally, all `devDependencies` can live in the root `package.json`, so they're not duplicated amongst all of the subprojects. However, the most important reason is the way they deal with `node_modules` and the resulting impact on developing with subproject interdependencies.
With workspaces, your project will hoist `node_modules` (including symlinks to subprojects) and install them at the top level like this:

```
.
├── node_modules
│   ├── package-b -> ../packages/package-b/
│   └── some-library
└── packages
    ├── package-a
    └── package-b
```
Traditional `npm link` based alternatives created a structure like this:

```
.
├── package-a
│   └── node_modules
│       ├── package-b -> ../../package-b/
│       └── some-library
└── package-b
    └── node_modules
        └── some-library
```
Monorepos existed before the workspaces feature was added to package managers. The main problem with developing packages this way was handling interdependencies in the local environment.
First, let's look at what we actually want as an end result when we publish interdependent packages. Say we have 2 packages: `package-a` and `package-b`. `package-a` depends on `package-b`. Furthermore, `package-b` depends on some other third-party library, `some-library`. The file structure of these as separate packages in your dev environment would look something like:
```
.
├── package-a
│   └── node_modules
│       └── package-b
└── package-b
    └── node_modules
        └── some-library
```
If we publish both of our packages, some application that wants to use `package-a` would add it as a dependency:

```js
"dependencies": {
  "package-a": "^1.0.0"
}
```
When we `npm install`, its `node_modules` would look like this:

```
some-application
└── node_modules
    ├── package-a
    ├── package-b
    └── some-library
```
`npm` flattens all modules into one directory. The project depends directly on `package-a`, so that gets installed. `package-a` depends on `package-b`, so that gets installed too. Finally, `package-b` depends on `some-library`, so we install that as well, and the result is that all of those are peers in `node_modules`. This is the desired effect when the library is released.
Since these are closely related, we want to be able to work on both at the same time. Every time something changes in `package-b`, we need to pick up those changes in `package-a` to be able to test it. Using separate projects, we would need to publish `package-b` to an npm registry, then update the dependency in `package-a`. This can get really unwieldy, so ideally we'd have a way to test the changes before publishing anything. This is where workspaces come in.
To describe the advantage of workspaces, let's first examine the problems of the alternatives. There are numerous ways of achieving this behavior... `npm link`, `yalc`, npm local paths, etc. These all function in roughly the same way. In `package-a`, instead of downloading each new version from the npm registry, we instead symlink the dependency within `node_modules`:
```
.
├── package-a
│   └── node_modules
│       └── package-b -> ../../package-b/
└── package-b
    └── node_modules
        └── some-library
```
Now, when we change something in `package-b`, we can rebuild it, and that change will immediately be available in `package-a`. Everything works, and they all lived happily ever after.
Just kidding.
This works... sometimes. The main problem with this strategy is transitive dependency conflicts. As all of the above methods use a similar solution, they all suffer from the same problem.
Let's say we really want to live on the edge and have `package-a` depend on `some-library` as well. If we publish our libraries like this, in production, the end result is exactly the same:

```
some-application
└── node_modules
    ├── package-a
    ├── package-b
    └── some-library
```
`npm` detects the duplicate dependency, and it's installed just like before*. However, in our DEVELOPMENT environment, we now have this:

```
.
├── package-a
│   └── node_modules
│       ├── package-b -> ../../package-b/
│       └── some-library
└── package-b
    └── node_modules
        └── some-library
```
Now, we have 2 distinct copies of `some-library` on disk. Let's assume that they're the exact same version, and are thus completely identical copies. Disk is cheap these days, so what's the problem?
Well, in this situation, code in `package-a` that imports `some-library` will load the copy at `package-a/node_modules/some-library`, and code in `package-b` will load the copy at `package-b/node_modules/some-library`. Sometimes, this is no problem. The imported code is the same, and the behavior is the same. However, the actual VALUES exported from `some-library` are distinct.
To illustrate, say that `some-library` consists of just one file:

```js
// some-library/index.js
export const cache = {}
```
`package-b` references the `cache` value, e.g.

```js
// package-b/index.js
import { cache } from '@ehaynes/some-library'

export const getCached = (key) => {
  const value = cache[key]
  return `${key} -> ${value}`
}
```
And finally, `package-a` refers to `getCached`:

```js
// package-a/index.js
import { cache } from '@ehaynes/some-library'
import { getCached } from '@ehaynes/package-b'

cache['foo'] = 'oof'
console.log('cached value:', getCached('foo'))
```
So what happens when we run `package-a/index.js`?

```
$ node index.js
cached value: foo -> undefined
```
What happened? Step by step:

1. `package-a` imported `cache` from `package-a/node_modules/some-library`
2. `package-a` set the cache value for `'foo'` to `'oof'`
3. `package-b` imported a DIFFERENT value of `cache` from `package-b/node_modules/some-library`
4. `package-b` read the cache value for `'foo'` from that separate object, which resulted in `undefined`.
While this is a contrived example, there are many cases where a library might export a constant, and these cause cryptic issues that are hard to track down. Some examples:

- anything used as a `Map` key
- Symbols
- `instanceof` checks
- ~~demonic Java features shoehorned into Typescript~~ err, I mean, `@Decorators` and all of the `reflect-metadata` mess on which they rely
- Typescript class type definitions
- Static members in classes
- React `Context` objects
- The entire React 16.x render tree
The worst part about this is that it sort of works. You don't get warnings or errors or failures to start. The code runs, but doesn't work correctly.
The answer lies in how node resolves dependencies. When node attempts to import code, it does a hierarchical traversal. In other words, from the file doing the import, it looks for a `node_modules` directory as a peer of that file. If one exists and contains the name of the dependency, it loads the module from there. If not, it steps up a directory and repeats until it reaches the filesystem root. Thus, when `package-a/index.js` imports `some-library`, it searches:

1. `<root>/packages/package-a/node_modules/some-library` - doesn't exist
2. `<root>/packages/node_modules/some-library` - doesn't exist
3. `<root>/node_modules/some-library` - Found it!
Similarly, when `package-b` imports `some-library`, the same process happens, and the same (and only) copy on disk is resolved. Modules are only loaded once, so all subsequent imports get the same in-memory reference.
Similarly, when `package-a` imports `package-b`:

1. `<root>/packages/package-a/node_modules/package-b` - doesn't exist
2. `<root>/packages/node_modules/package-b` - doesn't exist
3. `<root>/node_modules/package-b -> <root>/packages/package-b` - This is a symlink to the `package-b` subproject, but that's fine. It's loaded just as if it were installed here from an npm repository.
> I assume this is a solved case, but what happens if `package-a` and `package-b` have different versions of `dependency-c`? I guess that would live in the respective packages' `node_modules`, as opposed to the top-level one?