Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Save drbawb/6fb32d42a64a7a9ad898d9fc4c626a86 to your computer and use it in GitHub Desktop.
Save drbawb/6fb32d42a64a7a9ad898d9fc4c626a86 to your computer and use it in GitHub Desktop.
From cd326356f6589596c204b3543482926a078e6213 Mon Sep 17 00:00:00 2001
From: Robert Straw <drbawb@fatalsyntax.com>
Date: Sat, 15 May 2021 13:46:18 -0500
Subject: [PATCH] PCI: Disable NVMe controller if a shutdown was requested.
When a PCIe FLR is performed, if the function is an NVMe storage device,
any pending shutdown notifications must be properly handled before the
FLR can be issued. This is particularly important if the Linux kernel was
not in charge of the NVMe drive's last configuration. (e.g: the device was
being used by the vfio-pci driver and a KVM guest.)
Per the NVMe 1.1 specification (s3.1.6) if CC.SHN has previously been set,
and the controller has completed its shutdown process (CSTS.SHST is set
to 10b), failure to disable the controller will lead to undefined behavior.
If no other FLR quirks are defined, and the device is an NVMe storage
device, this patch addresses the above behavior by waiting for the
pending shutdown to complete & then disabling the device.
This obviates the need for the previous Samsung SM961 specific FLR,
nvme_disable_and_flr. The SM951/SM961 appear to behave pathologically
if the NVMe is not followed prior to a PCI reset.
Signed-off-by: Robert Straw <drbawb@fatalsyntax.com>
---
drivers/pci/quirks.c | 48 +++++++++++++++++++++++++++++++-------------
1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 653660e3ba9e..37057e2e8bf5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3811,20 +3811,24 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
#define PCI_DEVICE_ID_INTEL_IVB_M_VGA 0x0156
#define PCI_DEVICE_ID_INTEL_IVB_M2_VGA 0x0166
+
/*
- * The Samsung SM961/PM961 controller can sometimes enter a fatal state after
- * FLR where config space reads from the device return -1. We seem to be
- * able to avoid this condition if we disable the NVMe controller prior to
- * FLR. This quirk is generic for any NVMe class device requiring similar
- * assistance to quiesce the device prior to FLR.
+ * If an FLR is being attempted on an NVMe device it must be ensured
+ * that the controller is not left enabled if a shutdown was previously
+ * requested.
+ *
+ * This quirk ensures that, in the event the device is enabled but a shutdown
+ * has been requested prior to the FLR, the device has time to complete
+ * its shutdown internally and is disabled according to section 3.1.6 of
+ * the NVMe 1.1 specification.
*
* NVMe specification: https://nvmexpress.org/resources/specifications/
- * Revision 1.0e:
+ * Revision 1.1:
* Chapter 2: Required and optional PCI config registers
* Chapter 3: NVMe control registers
* Chapter 7.3: Reset behavior
*/
-static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
+static int nvme_pending_shn_flr(struct pci_dev *dev, int probe)
{
void __iomem *bar;
u16 cmd;
@@ -3846,11 +3850,30 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
cfg = readl(bar + NVME_REG_CC);
- /* Disable controller if enabled */
- if (cfg & NVME_CC_ENABLE) {
+
+ /* Disable the controller before FLR if a shutdown has been requested. */
+ if (cfg & NVME_CC_ENABLE && !(cfg & NVME_CC_SHN_NONE)) {
u32 cap = readl(bar + NVME_REG_CAP);
unsigned long timeout;
+ /* Cap register provides max timeout in 500ms increments */
+ timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
+
+ for (;;) {
+ u32 status = readl(bar + NVME_REG_CSTS);
+
+ /* Wait for controller to acknowledge the pending shutdown. */
+ if (status & NVME_CSTS_SHST_CMPLT)
+ break;
+
+ msleep(100);
+
+ if (time_after(jiffies, timeout)) {
+ pci_warn(dev, "Timeout waiting for NVMe controller shutdown.\n");
+ break;
+ }
+ }
+
/*
* Per nvme_disable_ctrl() skip shutdown notification as it
* could complete commands to the admin queue. We only intend
@@ -3865,10 +3888,6 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
* NVME_QUIRK_DELAY_BEFORE_CHK_RDY. None of those are yet
* supported by this quirk.
*/
-
- /* Cap register provides max timeout in 500ms increments */
- timeout = ((NVME_CAP_TIMEOUT(cap) + 1) * HZ / 2) + jiffies;
-
for (;;) {
u32 status = readl(bar + NVME_REG_CSTS);
@@ -3885,6 +3904,7 @@ static int nvme_disable_and_flr(struct pci_dev *dev, int probe)
}
}
+
pci_iounmap(dev, bar);
pcie_flr(dev);
@@ -3920,10 +3940,10 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
reset_ivb_igd },
{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M2_VGA,
reset_ivb_igd },
- { PCI_VENDOR_ID_SAMSUNG, 0xa804, nvme_disable_and_flr },
{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
reset_chelsio_generic_dev },
+ { PCI_ANY_ID, PCI_ANY_ID, nvme_pending_shn_flr },
{ 0 }
};
--
2.31.1
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment