Workspaces are first-class support in the package manager for monorepo structures. Originally introduced by yarn, they're also supported by npm starting with version 7. A monorepo is simply a collection of related packages; having them share a single git repo can eliminate a lot of redundancy in the project setup, as well as make it easier to work with interdependencies during development.
They're relatively simple. They consist of:

- a parent directory whose `package.json` contains a list (or wildcard pattern) defining subprojects
- one or more subprojects, each containing a `package.json` file similar to a regular npm package
```js
// <root>/package.json
{
  "workspaces": [
    "package-a", // `./package-a` is a subproject
    "package-b", // `./package-b` is a subproject
    "libs/*"     // each subdirectory of `./libs` is a subproject
  ]
}
```
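To make the relationship concrete, a subproject declares its sibling as an ordinary dependency — a hypothetical sketch (the name and version range here are made up), assuming the package manager satisfies a matching range with the local workspace copy rather than fetching it from the registry:

```js
// <root>/package-a/package.json (hypothetical)
{
  "name": "package-a",
  "version": "1.0.0",
  "dependencies": {
    // this range matches the local version of `./package-b`, so the
    // workspace copy is symlinked instead of downloaded
    "package-b": "^1.0.0"
  }
}
```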
For starters, they allow for a single definition of the various project-level configurations like linting, gitignore, test configuration, etc. Additionally, all `devDependencies` can live in the root `package.json`, so they're not duplicated amongst all of the subprojects. However, the most important reason is the way they deal with `node_modules` and the resulting impact on developing with subproject interdependencies.
With workspaces, your project will hoist `node_modules` (including symlinks to subprojects) and install them at the top level like this:

```
.
├── node_modules
│   ├── package-b -> ../packages/package-b/
│   └── some-library
└── packages
    ├── package-a
    └── package-b
```
Traditional `npm link` based alternatives created a structure like this:

```
.
├── package-a
│   └── node_modules
│       ├── package-b -> ../../package-b/
│       └── some-library
└── package-b
    └── node_modules
        └── some-library
```
Monorepos existed before the workspaces feature was added to package managers. The main problem with developing packages this way was handling interdependencies in the local environment.
First, let's look at what we actually want as an end result when we publish interdependent packages. Say we have 2 packages: `package-a` and `package-b`. `package-a` depends on `package-b`. Furthermore, `package-b` depends on some other third-party library, `some-library`. The file structure of these as separate packages in your dev environment would look something like:
```
.
├── package-a
│   └── node_modules
│       └── package-b
└── package-b
    └── node_modules
        └── some-library
```
If we publish both of our packages, some application that wants to use `package-a` would add it as a dependency:

```js
"dependencies": {
  "package-a": "^1.0.0"
}
```
When we `npm install`, its `node_modules` would look like this:

```
some-application
└── node_modules
    ├── package-a
    ├── package-b
    └── some-library
```
`npm` flattens all modules into one directory. The project depends directly on `package-a`, so that gets installed. `package-a` depends on `package-b`, so that gets installed too. Finally, `package-b` depends on `some-library`, so we install that as well, and the result is that all of those are peers in `node_modules`. This is the desired effect when the library is released.
Since these are closely related, we want to be able to work on both at the same time. Every time something changes in `package-b`, we need to pick up those changes in `package-a` to be able to test it. Using separate projects, we would need to publish `package-b` to an npm registry, then update the dependency in `package-a`. This can get really unwieldy, so ideally we'd have a way to test the changes before publishing anything. This is where workspaces come in.
To describe the advantage of workspaces, let's first examine the problems of the alternatives. There are numerous ways of achieving this behavior... `npm link`, `yalc`, npm local paths, etc. These all function in roughly the same way. In `package-a`, instead of downloading each new version from the npm registry, we instead symlink the dependency within `node_modules`:
```
.
├── package-a
│   └── node_modules
│       └── package-b -> ../../package-b/
└── package-b
    └── node_modules
        └── some-library
```
Now, when we change something in `package-b`, we can rebuild it, and that change will immediately be available in `package-a`. Everything works, and they all lived happily ever after.
Just kidding.
This works... sometimes. The main problem with this strategy is transitive dependency conflicts. As all of the above methods use a similar solution, they all suffer from the same problem.
Let's say we really want to live on the edge and have `package-a` depend on `some-library` as well. If we publish our libraries like this, in production, the end result is exactly the same:

```
some-application
└── node_modules
    ├── package-a
    ├── package-b
    └── some-library
```
`npm` detects the duplicate dependency, and it's installed just like before*. However, in our DEVELOPMENT environment, we now have this:

```
.
├── package-a
│   └── node_modules
│       ├── package-b -> ../../package-b/
│       └── some-library
└── package-b
    └── node_modules
        └── some-library
```
Now, we have 2 distinct copies of `some-library` on disk. Let's assume that they're the exact same version, and are thus completely identical copies. Disk is cheap these days, so what's the problem?
Well, in this situation, code in `package-a` that imports `some-library` will load the copy at `package-a/node_modules/some-library`, and code in `package-b` will load the copy at `package-b/node_modules/some-library`. Sometimes, this is no problem. The imported code is the same, and the behavior is the same. However, the actual VALUES exported from `some-library` are distinct.
To illustrate, say that `some-library` consists of just one file:

```js
// some-library/index.js
export const cache = {}
```
`package-b` references the `cache` value, e.g.

```js
// package-b/index.js
import { cache } from '@ehaynes/some-library'

export const getCached = (key) => {
  const value = cache[key]
  return `${key} -> ${value}`
}
```
And finally, `package-a` refers to `getCached`:

```js
// package-a/index.js
import { cache } from '@ehaynes/some-library'
import { getCached } from '@ehaynes/package-b'

cache['foo'] = 'oof'
console.log('cached value:', getCached('foo'))
```
So what happens when we run `package-a/index.js`?

```
$ node index.js
cached value: foo -> undefined
```
What happened? Step by step:

1. `package-a` imported `cache` from `package-a/node_modules/some-library`
2. `package-a` set the cache value for `'foo'` to `'oof'`
3. `package-b` imported a DIFFERENT value of `cache` from `package-b/node_modules/some-library`
4. `package-b` read the cache value for `'foo'` from that separate object, which resulted in `undefined`.
While this is a contrived example, there are many cases where a library might export a constant, and these cause cryptic issues that are hard to track down. Some examples:

- anything used as a `Map` key
- Symbols
- `instanceof` checks
- ~~demonic Java features shoehorned into Typescript~~ err, I mean, `@Decorators` and all of the `reflect-metadata` mess on which they rely
- Typescript class type definitions
- Static members in classes
- React `Context` objects
- The entire React 16.x render tree
The worst part about this is that it sort of works. You don't get warnings or errors or failures to start. The code runs, but doesn't work correctly.
The answer lies in how node resolves dependencies. When node attempts to import code, it does a hierarchical traversal. In other words, from the file doing the import, it looks for a `node_modules` directory as a peer of that file. If one exists and contains the name of the dependency, it loads the module from there. If not, it steps up a directory and repeats until it reaches the filesystem root. Thus, when `package-a/index.js` imports `some-library`, it searches:

1. `<root>/packages/package-a/node_modules/some-library` - doesn't exist
2. `<root>/packages/node_modules/some-library` - doesn't exist
3. `<root>/node_modules/some-library` - Found it!
Similarly, when `package-b` imports `some-library`, the same process happens, and the same (and only) copy on disk is resolved. Modules are only loaded once, so all subsequent imports get the same in-memory reference.
Similarly, when `package-a` imports `package-b`:

1. `<root>/packages/package-a/node_modules/package-b` - doesn't exist
2. `<root>/packages/node_modules/package-b` - doesn't exist
3. `<root>/node_modules/package-b -> <root>/packages/package-b` - This is a symlink to the `package-b` subproject, but that's fine. It's loaded just as if it were installed here from an npm repository.
> I assume this is a solved case, but what happens if `package-a` and `package-b` have different versions of `dependency-c`? I guess that would live in the respective packages' `node_modules`, as opposed to the top-level one?