New computer checklist

Here’s a small outline of how I validate used computers as “usable” and “in working condition”. My hope is these steps help computer users spot “lemons” - machines that shouldn’t be depended on because they don’t work all the time.

§Basics

Before stress testing or examining SMART data, consider the following checklist:

Turn it on and ensure you can access the firmware settings/BIOS. F2 and Delete seem like the most common keys.
Reboot the machine a couple times (tip: Control-Alt-Delete reboots your computer when an OS isn’t loaded). Verify the machine POSTs every time (i.e. tries to load the OS).
Verify output and input devices. This is maybe not the most important step, because you’ll likely notice any failures on your own (“Hey, the screen doesn’t turn on!”), but checking now can save you time when you move on to the more advanced steps.
1. Play some music/audio if there’s audio output.
2. Verify network interfaces establish a link and can be used for network access.
3. Verify the display shows a picture (if you haven’t already).
4. Verify keyboard, mouse, and trackpad input works.
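
For the network check on Linux, here is a minimal sketch that lists each interface with its link state as the kernel reports it in sysfs. The helper name `list_links` is my own invention, and the sysfs root is a parameter only so the function is easy to try against a fake directory.

```shell
# Print "<interface> <state>" for every network interface except loopback.
# list_links is a hypothetical helper; it reads the operstate files the
# Linux kernel exposes under /sys/class/net.
list_links() {
    root="${1:-/sys/class/net}"
    for dev in "$root"/*; do
        # Skip anything that is not an interface directory.
        [ -e "$dev/operstate" ] || continue
        name=$(basename "$dev")
        if [ "$name" = "lo" ]; then
            continue
        fi
        printf '%s %s\n' "$name" "$(cat "$dev/operstate")"
    done
}
```

Run `list_links` with a cable plugged in and expect `up` next to that interface. A state of `up` only shows a link was established, so still try to reach something over the network.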

§Check storage SMART data

Make sure to check the SMART data of your storage devices. Example:

smartctl -x /dev/sda

You can run smartctl against all your devices with this one-liner (be sure to install jq first!):

lsblk --json |
    jq -r '.blockdevices[] | select(.type == "disk") | .name' |
    xargs -I{} sudo smartctl -x /dev/{}

Look for reallocated sector counts and other “Pre-fail” attributes. At the bare minimum, look for the line “SMART overall-health self-assessment test result: PASSED”. If the SMART output doesn’t mean anything to you, or you’re unsure how to read it, CrystalDiskInfo on Windows provides a user-friendly way to view the same information.
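
The full smartctl output is long, so here is a small sketch of a filter that keeps only the lines I would check first. The function name `smart_highlights` is hypothetical, and the attribute names are the common ATA ones; NVMe drives report different fields, so treat the pattern as a starting point.

```shell
# Keep the overall health verdict plus a few attributes that often
# precede disk failure. Reads smartctl text output on stdin.
# The attribute list is an assumption; extend it for your drives.
smart_highlights() {
    grep -E 'overall-health|Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
}
```

Usage: `sudo smartctl -x /dev/sda | smart_highlights`. Non-zero raw values on the sector attributes are worth a closer look.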

§Stress tests

By pushing your gear close to, but not past, its engineered limits, you can verify it won’t fail under load. Most of these steps are optional, depending on how reliable you need this machine to be. If it’s just a commodity machine being used to browse Facebook, they might not be necessary. The user will likely complain to you if there are issues with their computer. If I were putting a machine into my personal infrastructure as a server or router, I’d do all the steps. My rule of thumb: if you think these tests are damaging your gear, it needs to be tuned (to generate less load, therefore less heat) or replaced.

§But first, know the engineered limits!

Make sure you look up the datasheets for each component that you are planning to run a thermal load test against. In particular look for the max temperature that component is designed for. Make a note and ensure none of the tests come close to these engineered limits.

Here are a few websites that offer specification sheets for popular CPUs and GPUs:

Intel products on Intel Ark
AMD products can be found via the search on their website
Nvidia GPUs can be found here

Let’s take my laptop. It has an Intel Core i3-1115G4. According to Intel Ark, the max temperature allowed on the processor die is 100 °C. On the other hand, looking at my old AMD FX-8350 on AMD’s website, it must not exceed 61 °C. This data point matters because exceeding it will likely damage your hardware.
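
Once you have the datasheet number, a quick comparison against a live reading can be sketched like this. The helper `temp_ok` and its default 10-degree safety margin are my own assumptions. Linux thermal zones report millidegrees Celsius, and which zone maps to the CPU varies by machine.

```shell
# Succeed if a reading stays below the datasheet limit minus a margin.
# temp_ok is a hypothetical helper.
#   $1: reading in millidegrees Celsius (as found in sysfs)
#   $2: datasheet limit in whole degrees Celsius
#   $3: safety margin in degrees Celsius (assumed default: 10)
temp_ok() {
    reading_mc="$1"
    limit_c="$2"
    margin_c="${3:-10}"
    [ "$reading_mc" -le $(( (limit_c - margin_c) * 1000 )) ]
}
```

Usage: `temp_ok "$(cat /sys/class/thermal/thermal_zone0/temp)" 100 && echo "within limits"`. Check the `type` file next to `temp` to see what each zone actually measures.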

§Run a memory test

Download memtest86+ then write it to a USB device. If you have any sort of Linux live media, chances are it includes a copy of memtest86+ as well. Personally, I just boot memtest86+ off of GRML. Another way: Debian (and its derivatives) and NixOS both offer a package that installs memtest86+ into your bootloader menu. You could then select the memtest86+ boot option on the next reboot.

Bad RAM is fairly common to encounter out in the wild. I highly recommend this step because issues caused by bad RAM manifest in unique ways on each specific computer. Troubleshooting bad RAM issues in production can be difficult to impossible (“It just doesn’t work, send the machine in for repair.”). This step can take 1-6 hours.

§Run a CPU stress test

Boot a Linux environment then run stress-ng. Try stress-ng --cpu 0 for starters, which starts one worker per CPU. Specify a timeout using --timeout seconds. Let this run for a couple of hours, maybe a day. Use netdata or some other monitoring tool that graphs metrics over time. Verify the machine cools itself and sounds quiet enough under load. If that is not the case, consider throttling the CPU via cpufreq-set.
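
If you don’t have a monitoring tool handy, a rough sketch like the one below samples the thermal zones while stress-ng runs and reports the hottest reading at the end. The function names, the log file name, and the 10-second sampling interval are all assumptions of mine. The peak covers every thermal zone, not only the CPU.

```shell
# Print the largest number from a list of readings, one per line.
peak_temp() {
    sort -n | tail -n 1
}

# Run stress-ng on every CPU while sampling thermal zones every 10 seconds.
# run_cpu_stress is a hypothetical wrapper; duration and log are its
# own parameters, not stress-ng options.
run_cpu_stress() {
    duration="${1:-2h}"
    log="${2:-cpu-temps.log}"
    : > "$log"    # start with an empty log
    stress-ng --cpu 0 --timeout "$duration" --metrics-brief &
    pid=$!
    # Keep sampling for as long as stress-ng is alive.
    while kill -0 "$pid" 2>/dev/null; do
        cat /sys/class/thermal/thermal_zone*/temp >> "$log" 2>/dev/null || true
        sleep 10
    done
    wait "$pid"
    echo "Peak reading: $(peak_temp < "$log") millidegrees C"
}
```

Usage: `run_cpu_stress 2h`. Compare the peak against the datasheet limit you noted earlier.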

For more examples and advanced usage, be sure to check out the stress-ng article on the Ubuntu Wiki and the Red Hat Linux documentation for stress-testing utilizing this tool.

§Bonus: GPU stress test

If you have a GPU, consider a GPU stress test. FurMark seems to be the most demanding (a good thing). I usually skip this step unless I’m having stability issues. Most GPUs will cool fine as long as you have some air flow in your case. Pro tip: check out HWiNFO64 as a sensor monitoring tool to complement Windows stress tests.

I’m not sure what to suggest for Linux users. Maybe run a hundred glxgears or something.

§Disk benchmark

Consider running fio or some other disk benchmark. I’ve been using this one-liner. Change directory to a filesystem on whichever disk you wish to stress test (cd your-directory), then run the command.

sudo fio \
        --randrepeat=1 \
        --ioengine=libaio \
        --direct=1 \
        --gtod_reduce=1 \
        --name=test \
        --filename=random_read_write.fio \
        --bs=4k \
        --iodepth=64 \
        --size=1G \
        --readwrite=randrw \
        --rwmixread=75 \
        --runtime=3m
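
The command above writes a 1 GiB test file into the current directory, so it helps to confirm there is room first. Here is a small sketch of that check; `has_space` is a hypothetical helper built on POSIX `df -Pk`.

```shell
# Succeed if the filesystem holding a directory has enough free space.
# has_space is a hypothetical helper.
#   $1: required space in KiB
#   $2: directory to check (defaults to the current directory)
has_space() {
    need_kb="$1"
    dir="${2:-.}"
    # With -P, the second line of df output is the filesystem's row and
    # its fourth column is the available space in 1024-byte blocks.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge "$need_kb" ]
}
```

Usage: `has_space 1048576 . || echo "not enough room for the test file"`. When the benchmark finishes, the leftover file can be deleted with `sudo rm -f random_read_write.fio`.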

§That’s it

If anything is taken away from this short article, I hope folks start running a memory test every time they get a new PC or RAM. Bonus: maybe somebody saves a bunch of trouble by following the steps before trusting their hardware with workloads. Consumer-directed hardware testing mitigates a ton of confusion and frustration. If you can’t push your hardware close to its specified limits, it’s not good, viable hardware.

§See Also

ArchWiki - Benchmarking
ArchWiki - Stress testing
Google “Linux Stress Test”
