2019-08-06 - SHARD 3 Outage

Status - Down

Overview

As part of the recent rolling restart, a defect was introduced that has corrupted SHARD 3.

This impacts SHARD 3 Node Operators only.

Target Resolution - 10:20pm Aug 6, 2019

Impact

SHARD 3 is not creating blocks, so Foundational Nodes on SHARD 3 are unable to earn rewards.

Root Cause

The wrong version of the node binary was deployed onto Harmony-operated nodes (someone had uploaded their local development version to the official bucket we use for internal node deployments), and an insufficient check in the block syncing code path admitted the resulting invalid blocks. We owe you a postmortem on this one over the next few days.

Action Required - SHARD 3 Node Operators Only

When notified by the Harmony team, please complete the following steps:

Restart your existing node

  • Log in to your AWS instance

  • Stop your node

  • Update the harmony software

  • Start your node

The following is a sample set of commands. Please change all <<items>> to your instance parameters.

ssh -i oregon-key-benchmark.pem ec2-user@<<AWS-INSTANCE>>   # Connect to your AWS instance
tmux attach                                                 # Attach to the running tmux session
# Press <CTRL><C> to kill the running node.sh
sudo pkill harmony                                          # Kill the running harmony process
sudo ./node.sh                                              # Restart the node
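
Before restarting, it may help to confirm that the old harmony process has actually exited. The following is a minimal sketch, not part of the official procedure: the helper name wait_for_exit and its 30-second default timeout are assumptions made for illustration, and it relies only on the standard pgrep utility.

```shell
# Hypothetical helper (not part of node.sh): wait until no process with the
# given exact name is running, or give up after a timeout in seconds.
wait_for_exit() {
  name="$1"
  timeout="${2:-30}"   # assumed default; adjust as needed
  elapsed=0
  # pgrep -x matches the process name exactly and exits 0 while a match exists
  while pgrep -x "$name" >/dev/null 2>&1; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "$name is still running after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "$name has stopped"
}
```

For example, after running sudo pkill harmony, running wait_for_exit harmony && sudo ./node.sh would start the node only once the old process is gone.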

Remediation Log (all times are in Pacific Daylight Time)

Additional information will be provided by the end of the day on Aug 9, 2019.