Issue on Cloud with FreeBSD upgrades

Hello,

We are currently experiencing issues with the latest versions of FreeBSD on our Cloud Servers: upgraded instances are left unable to boot.

For the moment, you should not upgrade to versions above 11.2-RELEASE until we have found a solution.

We have tracked the issue down to FreeBSD bug #238258; it is related to the structure of our image and its use of a raw ZFS disk.

We are working to fix this issue quickly and will keep you informed here as soon as possible.

Thanks for your patience.


Note that VMs running FreeBSD 11.x-RELEASE with UFS do not have this boot issue when upgraded to FreeBSD 12.1-RELEASE.

The creation of new FreeBSD servers has been switched to our UFS 11.2-RELEASE image; servers created from this image can be safely upgraded.

We are currently writing a guide to help you upgrade a server created with the ZFS image.

We are working to release an up-to-date 12.1-RELEASE image soon.

Here is a way to upgrade if you are currently using our 11.x image with a ZFS root:

  1. Create a new disk of the same size and attach it to your server
  2. Identify the disk on your server (here, xbd1), for example as shown below
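
One way to identify the newly attached disk (a sketch; on these servers the Xen disks show up as xbd*, as in the outputs later in this thread) is to list them and match on size:

# geom disk list
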
  3. Create a GPT partition table and add a freebsd-boot partition
# gpart create -s gpt xbd1

# gpart add -a 4k -s 512K -t freebsd-boot xbd1
  4. Install the bootloader (protective MBR and gptzfsboot) on the disk
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 xbd1
  5. Create a partition for the ZFS volume
# gpart add -a 1m -t freebsd-zfs -l disk0 xbd1
  6. Identify its GPT UUID (here, 636e5485-258f-11ea-9cce-00163e413eaa)
# glabel status
                                      Name  Status  Components
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/d7550f30-20b9-11ea-9b55-0cc47aa311b8     N/A  xbd25p2
gptid/46c1ed78-258f-11ea-9cce-00163e413eaa     N/A  xbd1p1
                                 gpt/disk0     N/A  xbd1p2
gptid/636e5485-258f-11ea-9cce-00163e413eaa     N/A  xbd1p2
  7. Add it as a mirror of the boot disk in the systemroot pool
# zpool attach systemroot /dev/xbd0 /dev/gptid/636e5485-258f-11ea-9cce-00163e413eaa
  8. Wait for resilvering to complete
# zpool status
 pool: systemroot
state: ONLINE
 scan: resilvered 1012M in 0h0m with 0 errors on Mon Dec 23 14:27:14 2019
  9. Remove the old boot disk from the mirror in the systemroot pool
# zpool detach systemroot xbd0
  10. Shut down your server using the admin interface
  11. Detach the old disk
  12. Put the new disk in position 0
  13. Specify “Raw boot system” on the new disk
  14. Start your server
  15. You can now delete the old disk
  16. You should now be able to upgrade using the usual FreeBSD upgrade process (sketched below).
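
For reference, the usual binary upgrade process (a sketch, assuming freebsd-update and a 12.1-RELEASE target; adjust the release to your needs):

# freebsd-update -r 12.1-RELEASE upgrade
# freebsd-update install
# shutdown -r now

After the reboot, run freebsd-update install again to finish installing the new userland.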

Don’t hesitate to ask questions about this process here.


@diconico07 Thanks a lot, but these steps didn’t work for me. I’m still trying to boot my previous volume, or at least just get the data back so I can put it on a new server.

It started on January 8, 2020, when I tried to upgrade my FreeBSD 11.2 following the normal process. After typing shutdown -r now, the server restarted with the CPU at 100% and it was impossible to access it. :confused:

Then I followed these steps:

  1. Created the server epave with the FreeBSD image. IP: 92.*.*.*
  2. The server has the disk sys-epave by default (20 GB).
  3. Attached the old disk web01 (raw (xen) boot system, 256 GB) to server epave (that’s the disk I detached from the previous server)
  4. Created a new disk sunrise (256 GB)
  5. Attached sunrise to the epave server.
  6. Position:
    • 0 sys-epave
    • 1 web01
    • 2 sunrise
  7. ssh admin@92...*
  8. uname -a
    FreeBSD epave 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018     root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64
    
  9. geom disk list
     Geom name: xbd0
     Providers:
     1. Name: xbd0
     Mediasize: 21474836480 (20G)
     Sectorsize: 512
     Stripesize: 32768
     Stripeoffset: 0
     Mode: r1w1e3
     descr: (null)
     ident: (null)
     rotationrate: unknown
     fwsectors: 0
     fwheads: 0
    
     Geom name: xbd25
     Providers:
     1. Name: xbd25
     Mediasize: 536870912 (512M)
     Sectorsize: 512
     Stripesize: 32768
     Stripeoffset: 0
     Mode: r1w1e2
     descr: (null)
     ident: (null)
     rotationrate: unknown
     fwsectors: 0
     fwheads: 0
    
     Geom name: xbd1
     Providers:
     1. Name: xbd1
     Mediasize: 274877906944 (256G)
     Sectorsize: 512
     Stripesize: 32768
     Stripeoffset: 0
     Mode: r0w0e0
     descr: (null)
     ident: (null)
     rotationrate: unknown
     fwsectors: 0
     fwheads: 0
    
     Geom name: xbd2
     Providers:
     1. Name: xbd2
     Mediasize: 274877906944 (256G)
     Sectorsize: 512
     Stripesize: 32768
     Stripeoffset: 0
     Mode: r0w0e0
     descr: (null)
     ident: (null)
     rotationrate: unknown
     fwsectors: 0
     fwheads: 0
    

So web01 is probably xbd1, and sunrise is xbd2.

gpart show

=>       3  41943029  xbd0  GPT  (20G)
         3       137     1  freebsd-boot  (69K)
       140  41942836     2  freebsd-ufs  (20G)
  41942976        56        - free -  (28K)

=>      8  1048560  xbd25  GPT  (512M)
        8       56         - free -  (28K)
       64  1027965      1  freebsd-swap  (502M)
  1028029        3         - free -  (1.5K)
  1028032    20480      2  ms-basic-data  (10M)
  1048512       56         - free -  (28K)

=>       63  536870849  xbd1  MBR  (256G)
         63  536870849        - free -  (256G)

glabel status

                                      Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                             gpt/gandiroot     N/A  xbd0p2
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/de280618-337a-11ea-a132-0cc47af4349e     N/A  xbd25p2
                            ext2fs/sunrise     N/A  xbd2

OK, let’s try the process from this thread.

$ su
Password:
root@epave:/ # /sbin/gpart create -s gpt xbd2
xbd2 created
root@epave:/ # /sbin/gpart add -a 4k -s 512K -t freebsd-boot xbd2
xbd2p1 added, but partition is not aligned on 32768 bytes
root@epave:/ # /sbin/gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 xbd2
partcode written to xbd2p1
bootcode written to xbd2
root@epave:/ # /sbin/gpart add -a 1m -t freebsd-zfs -l disk0 xbd2
xbd2p2 added
root@epave:/ # glabel status
                                      Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                             gpt/gandiroot     N/A  xbd0p2
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/de280618-337a-11ea-a132-0cc47af4349e     N/A  xbd25p2
gptid/0125513a-33b3-11ea-8dac-00163e6fc064     N/A  xbd2p1
                                 gpt/disk0     N/A  xbd2p2
gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064     N/A  xbd2p2

GPT UUID: 6b04af6d-33b3-11ea-8dac-00163e6fc064

root@epave:/ # zpool attach systemroot /dev/xbd1 /dev/gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064
cannot open 'systemroot': no such pool
root@epave:/ # zpool status
no pools available

I’m stuck here so far. I probably did something wrong.
My main priorities:

  1. Save the data from web01 (then I’ll reconfigure everything to make the data work again)
  2. Or better, directly reboot web01 without reconfiguring everything.

Any help, or push in the right direction is welcome.

You need to zpool import systemroot first in order to be able to do any action on it. Once the import is done, you should be able to continue the procedure (i.e., do the zpool attach and so on).
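
That is, something like (substituting your pool name if it differs):

# zpool import systemroot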

@diconico07 Thanks for the recommendation. This didn’t work.

root@epave:/usr/home/admin # zpool import systemroot
cannot import 'systemroot': no such pool available

So there is something here that is not clear.

root@epave:/usr/home/admin # zpool status
no pools available
root@epave:/usr/home/admin # zfs list
no datasets available

In case I was not clear previously.

web01 is the disk that was on the busted server and that contained the FreeBSD install and all the data.

gpart shows the web01 volume as xbd1, but that’s all.

root@epave:/usr/home/admin # gpart show
=>       3  41943029  xbd0  GPT  (20G)
         3       137     1  freebsd-boot  (69K)
       140  41942836     2  freebsd-ufs  (20G)
  41942976        56        - free -  (28K)

=>      8  1048560  xbd25  GPT  (512M)
        8       56         - free -  (28K)
       64  1027965      1  freebsd-swap  (502M)
  1028029        3         - free -  (1.5K)
  1028032    20480      2  ms-basic-data  (10M)
  1048512       56         - free -  (28K)

=>       40  536870832  xbd2  GPT  (256G)
         40       1024     1  freebsd-boot  (512K)
       1064        984        - free -  (492K)
       2048  536866816     2  freebsd-zfs  (256G)
  536868864       2008        - free -  (1.0M)

=>       63  536870849  xbd1  MBR  (256G)
         63  536870849        - free -  (256G)

PS: congratulations on the Luxembourg recovery

Hmm, that’s weird. Can you try a zpool import (with no other argument) and see if it lists any importable pool? (I just tried the procedure in the already-upgraded situation and I was able to import the pool.)

@diconico07 Ah, this time it returns something more interesting.

root@epave:/usr/home/admin # zpool import
   pool: gandiroot
     id: 3154313865369553295
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
	the '-f' flag.
   see: http://illumos.org/msg/ZFS-8000-EY
 config:

	gandiroot   ONLINE
	  xbd1      ONLINE

Well @karlcow, it seems your pool is named gandiroot, while in my tests it was named systemroot; you can safely substitute gandiroot for systemroot everywhere in the procedure. You will still have to do a zpool import -f gandiroot to make the pool available.
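
So, adapted to your disks, the import and attach steps would look something like this (using the gptid from your glabel status output):

# zpool import -f gandiroot
# zpool attach gandiroot /dev/xbd1 /dev/gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064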

@diconico07 So I tried, and it didn’t work. I will try again this weekend from the start, documenting it step by step.

Just for context about the initial issue: I had a server with FreeBSD 11.2-RELEASE on one 256 GB disk. I followed the normal steps to update with freebsd-update to the latest 11.2-RELEASE. Everything worked until shutdown -r now. The server’s (non-)reboot left me with a dashboard showing the CPU at 100%, and no way to ssh into the machine.
So this weekend, before doing the process outlined above, I made a snapshot of the busted disk. (My bad for not having done that BEFORE starting the upgrade.)

Thanks for all the support so far, even if it didn’t work (yet?). This is deeply appreciated.

Hello,

What output did you get with the command zpool import -f gandiroot?

@aegiap

root@epave:/usr/home/admin # zpool import
  pool: gandiroot
    id: 3154313865369553295
 state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
   the '-f' flag.
  see: http://illumos.org/msg/ZFS-8000-EY
config:

   gandiroot   ONLINE
     xbd1      ONLINE

root@epave:/usr/home/admin # zpool import -f gandiroot
root@epave:/usr/home/admin # glabel status
                                     Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                            gpt/gandiroot     N/A  xbd0p2
                            gpt/gandiswap     N/A  xbd25p1
                          gpt/gandiconfig     N/A  xbd25p2
gptid/de280618-337a-11ea-a132-0cc47af4349e     N/A  xbd25p2
gptid/0125513a-33b3-11ea-8dac-00163e6fc064     N/A  xbd2p1
                                gpt/disk0     N/A  xbd2p2
gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064     N/A  xbd2p2
root@epave:/usr/home/admin # zpool attach gandiroot /dev/xbd1 /dev/gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064
Make sure to wait until resilver is done before rebooting.

If you boot from pool 'gandiroot', you may need to update
boot code on newly attached disk '/dev/gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064'.

Assuming you use GPT partitioning and 'da0' is your new boot disk
you may use the following command:

   gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 da0

root@epave:/usr/home/admin # zpool status
 pool: gandiroot
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Tue Jan 14 14:07:28 2020
   149M scanned out of 10.9G at 695K/s, 4h30m to go
       148M resilvered, 1.33% done
config:

   NAME                                            STATE     READ WRITE CKSUM
   gandiroot                                       ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       xbd1                                        ONLINE       0     0     0
       gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064  ONLINE       0     0     0

errors: No known data errors

root@epave:/usr/home/admin # zpool status
 pool: gandiroot
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
   still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
   the pool may no longer be accessible by software that does not support
   the features. See zpool-features(7) for details.
 scan: resilvered 10.9G in 0h52m with 0 errors on Tue Jan 14 14:59:54 2020
config:

   NAME                                            STATE     READ WRITE CKSUM
   gandiroot                                       ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       xbd1                                        ONLINE       0     0     0
       gptid/6b04af6d-33b3-11ea-8dac-00163e6fc064  ONLINE       0     0     0

errors: No known data errors

OK. Restarting from the beginning one last time, before giving up.

Grange repair

Let’s restart from the start

Current server

Busted server (2020-01-19)

  • server nerval, Paris SD5, 1 Core, 4 GB RAM
  • 92...*, *.ghst.net
  • *, *.ghst.net
  • Volume: web01 raw (xen) boot system 256GB Storage, attached to nerval.
  • CPU load 100%

Trying to log in on the console.

% ssh 92.*@console.gandi.net
92.*@console.gandi.net's password:
Asking for console, please wait
Connected

Grabbing terminal
Ok

But then I can’t do anything else, not even quit it. I had to kill it manually from another terminal.

% kill -9 67728

And this doesn’t work either.

% ssh admin@92.*
ssh: connect to host 92.* port 22: Network is unreachable

Rescue mission

Stopping nerval

  • Let’s stop the server nerval with its 100% CPU load (via admin.gandi.net). It took a couple of attempts before the server actually stopped. DONE

  • Let’s detach web01 (via admin.gandi.net). DONE

Create a new server

With the web UI:

  • ginkgo

  • Paris SD5

  • FreeBSD 11.2

  • volume: sys-ginkgo (20GB)

  • 1 CPU, 1 GB RAM

Stopping the server to do additional configuration.

Additional config:

  • Changed the size to 256 GB

  • Made it raw (xen) boot system

So we have the same configuration as web01.

  • Restarting the server ginkgo

  • Logging in on the new server ginkgo


ssh admin@92.*

Password for admin@ginkgo:

FreeBSD ginkgo 11.2-RELEASE FreeBSD 11.2-RELEASE #0 r335510: Fri Jun 22 04:32:14 UTC 2018 root@releng2.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64

Maybe the difference between @diconico07’s instructions and my case is that I needed to create a new server to be able to boot anything, and to move the disk from the old unbootable server (nerval) to the newly created server (ginkgo).

On the new server ginkgo

$ glabel status
                                      Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                             gpt/gandiroot     N/A  xbd0p2
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/a96fe157-3a5b-11ea-a132-0cc47af4349e     N/A  xbd25p2

Nothing strange here, but then there’s this:

$ gpart show
=>       3  41943029  xbd0  GPT  (256G) [CORRUPT]
         3       137     1  freebsd-boot  (69K)
       140  41942836     2  freebsd-ufs  (20G)
  41942976        56        - free -  (28K)

=>      8  1048560  xbd25  GPT  (512M)
        8       56         - free -  (28K)
       64  1027965      1  freebsd-swap  (502M)
  1028029        3         - free -  (1.5K)
  1028032    20480      2  ms-basic-data  (10M)
  1048512       56         - free -  (28K)

I guess resizing the drive was not a good thing to do. grmmph.

If only I could access the old server.

OK, recovered. Growing the disk left the backup GPT header away from the end of the disk, which is why gpart flagged the table as [CORRUPT]; gpart recover rewrites it in the right place.

gpart recover /dev/xbd0

glabel status
                                      Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                             gpt/gandiroot     N/A  xbd0p2
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/a96fe157-3a5b-11ea-a132-0cc47af4349e     N/A  xbd25p2

gpart show
=>        3  536870901  xbd0  GPT  (256G)
          3        137     1  freebsd-boot  (69K)
        140   41942836     2  freebsd-ufs  (20G)
   41942976  494927928        - free -  (236G)

=>      8  1048560  xbd25  GPT  (512M)
        8       56         - free -  (28K)
       64  1027965      1  freebsd-swap  (502M)
  1028029        3         - free -  (1.5K)
  1028032    20480      2  ms-basic-data  (10M)
  1048512       56         - free -  (28K)

Let’s attach the old drive web01 to ginkgo.

glabel status
                                      Name  Status  Components
gptid/75fe8dce-7b83-11e8-8a34-109836303c01     N/A  xbd0p1
                             gpt/gandiroot     N/A  xbd0p2
                             gpt/gandiswap     N/A  xbd25p1
                           gpt/gandiconfig     N/A  xbd25p2
gptid/a96fe157-3a5b-11ea-a132-0cc47af4349e     N/A  xbd25p2

gpart show
=>        3  536870901  xbd0  GPT  (256G)
          3        137     1  freebsd-boot  (69K)
        140   41942836     2  freebsd-ufs  (20G)
   41942976  494927928        - free -  (236G)

=>      8  1048560  xbd25  GPT  (512M)
        8       56         - free -  (28K)
       64  1027965      1  freebsd-swap  (502M)
  1028029        3         - free -  (1.5K)
  1028032    20480      2  ms-basic-data  (10M)
  1048512       56         - free -  (28K)

=>       63  536870849  xbd1  MBR  (256G)
         63  536870849        - free -  (256G)

Here we are:

  • sys-ginkgo is /dev/xbd0
  • web01 is /dev/xbd1

Now I understand a bit better: the original instructions from @diconico07 are meant for someone who can still log in on the old server, which I can’t.

OK. End of the road. I will reconfigure the server from scratch and just keep the old disk around, in case I can get to its content again one day.
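
For future reference, one possible way to read the data off the old pool without booting from it (a sketch I have not verified on this setup) would be to attach the disk to a working server and import the pool read-only under an alternate root:

# zpool import -f -o readonly=on -R /mnt gandiroot

The datasets should then be browsable under /mnt, and readonly=on avoids touching the pool while copying the data out.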