Commit 118cef1e authored by Winston Weinert

new blog post about new computer

parent 86976f2a
+++
title = "New computer checklist"
author = ["Winston (winny) Weinert"]
date = 2023-01-09T00:00:00-06:00
draft = false
cover = "computer_hello.png"
+++
{{< figure src="/ox-hugo/computer_hello.png" >}}
Here's a small outline of how I validate used computers as "usable" and "in
working condition". My hope is these steps help computer users spot "lemons" -
machines that shouldn't be depended on because they don't work all the time.
## Basics {#basics}
Before stress testing or examining SMART data, consider the following checklist:
1.  Turn it on and ensure you can access the firmware settings/BIOS. `F2` and
    `Delete` seem like the most common keys.
2.  Reboot the machine a couple of times (tip: `Control`-`Alt`-`Delete` reboots
    your computer when an OS isn't loaded). Verify the machine POSTs every time
    (i.e. tries to load the OS).
3.  Verify output and input devices. Maybe not the most important because
    you'll likely notice any failings ("Hey the screen doesn't turn on!").
    These steps can save you some time when moving on to the more advanced
    steps.
    1.  Play some music/audio if there's audio output.
    2.  Verify network interfaces establish a link and can be used for network access.
    3.  Verify the display shows a picture (if you haven't already).
    4.  Verify keyboard, mouse, and trackpad input works.
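
For the network step, a minimal sketch of a link check on Linux might look like the following. It assumes the kernel exposes each interface's state under `/sys/class/net/<interface>/operstate`; the function name `list_link_states` is made up for this example.

```sh
# Hypothetical helper: print "<interface> <state>" for every network interface.
# The optional first argument overrides the sysfs directory (handy for testing).
list_link_states() {
    net_dir=${1:-/sys/class/net}
    for iface in "$net_dir"/*; do
        # Skip entries that do not expose a link state.
        [ -r "$iface/operstate" ] || continue
        printf '%s %s\n' "${iface##*/}" "$(cat "$iface/operstate")"
    done
}
```

Running `list_link_states` with no arguments prints one line per interface. A wired port that stays `down` with a known-good cable plugged in deserves a closer look.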
## Stress tests {#stress-tests}
By pushing your gear close to, but not past, the engineered limits of the
hardware, you can verify it won't fail under load. Most of these steps are
optional, depending on how reliable you need the machine to be. If it's just a
commodity machine used to browse Facebook, they might not be necessary; the
user will likely complain to you if there are issues with their computer. If
I were putting a machine into my personal infrastructure as a server or router,
I'd do all the steps. My rule of thumb: if you think these tests are damaging
your gear, it needs to be tuned (to generate less load, and therefore less
heat) or replaced.
### But first, know the engineered limits! {#but-first-know-the-engineered-limits}
Make sure you look up the datasheets for each component that you are planning
to run a thermal load test against. In particular, look for the max
temperature the component is designed for. Make a note and ensure none of
the tests come close to these engineered limits.
Here are a few websites that offer specification sheets for popular CPUs and
GPUs:
- Intel products on [Intel Ark](https://ark.intel.com/content/www/us/en/ark.html)
- AMD products can be found [via the search on their website](https://www.amd.com/en/)
- Nvidia GPUs can be found [here](https://www.nvidia.com/en-us/geforce/graphics-cards/)
Let's take my laptop. It has an Intel i3-1115G4. According to [Intel Ark](https://ark.intel.com/content/www/us/en/ark/products/208652/intel-core-i31115g4-processor-6m-cache-up-to-4-10-ghz.html), the max
temperature allowed on the processor die is 100 °C. On the other hand, the page
for my old [AMD FX-8350 on AMD's website](https://www.amd.com/en/products/cpu/fx-8350) says it must not exceed 61 °C. This
data point matters because exceeding it will likely damage your hardware.
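
To keep an eye on that limit during the tests below, here is a small sketch for Linux. It assumes the kernel exposes readings in millidegrees Celsius under `/sys/class/thermal/thermal_zone*/temp`; which zone (if any) corresponds to the CPU die varies by machine. The function names `temp_ok` and `report_temps`, and the default 10-degree safety margin, are my own choices for illustration.

```sh
# Hypothetical helper: succeed if a reading (millidegrees C) stays at least
# margin_c degrees below the datasheet limit (whole degrees C).
temp_ok() {
    reading_mc=$1        # e.g. contents of /sys/class/thermal/thermal_zone0/temp
    limit_c=$2           # max temperature from the datasheet
    margin_c=${3:-10}    # safety margin, an arbitrary default
    [ $((reading_mc / 1000)) -le $((limit_c - margin_c)) ]
}

# Print every readable thermal zone and flag readings too close to the limit.
report_temps() {
    max_c=$1
    for zone in /sys/class/thermal/thermal_zone*/temp; do
        [ -r "$zone" ] || continue
        reading=$(cat "$zone")
        if temp_ok "$reading" "$max_c"; then
            verdict=ok
        else
            verdict=TOO-CLOSE
        fi
        printf '%s %s C %s\n' "$zone" "$((reading / 1000))" "$verdict"
    done
}
```

For the i3-1115G4 above, that would be `report_temps 100`.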
### Run a memory test {#run-a-memory-test}
{{< figure src="/ox-hugo/mt86plus-ddr5.png" caption="<span class=\"figure-number\">Figure 1: </span>memtest86+ running. [Source](https://www.memtest.org/)." >}}
Download [memtest86+](https://www.memtest.org/), then write it to a USB device. If you have any sort of
Linux live media, chances are it includes a copy of memtest86+ as well.
Personally, I just boot memtest86+ off of [GRML](https://grml.org/). Another way: Debian and its
derivatives, as well as NixOS, offer a package that installs memtest86+ into
your bootloader menu. You can then select the memtest86+ boot option on the
next reboot.
Bad RAM is fairly common out in the wild. I
highly recommend this step because issues caused by bad RAM manifest in unique
ways on each specific computer. Troubleshooting bad RAM issues in production
can be difficult to impossible _("It just doesn't work, send the machine in
for repair.")_. This step can take 1-6 hours.
### Run a CPU stress test {#run-a-cpu-stress-test}
{{< figure src="/ox-hugo/stress-ng.png" caption="<span class=\"figure-number\">Figure 2: </span>[Image credit Snapcraft](https://snapcraft.io/install/stress-ng/ubuntu)" >}}
Boot a Linux environment, then run `stress-ng`. Try `stress-ng --cpu 0` for
starters (`0` means one worker per online CPU). Specify a timeout using
`--timeout seconds`. Let this run for a couple of hours, maybe a day. Use
netdata or some other monitoring tool that graphs readings over time. Verify
the machine cools itself and sounds quiet enough under load. If that is not
the case, consider throttling the CPU via `cpufreq-set`.
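
One way to get a temperature record without a full monitoring stack is to sample a thermal zone while the stress test runs. This is only a sketch: `with_temp_log` is a hypothetical helper, and it assumes `/sys/class/thermal/thermal_zone0/temp` is a relevant sensor on your machine, which is not guaranteed.

```sh
# Hypothetical helper: run a command while appending timestamped readings
# from thermal zone 0 (millidegrees C) to a log file.
# Usage: with_temp_log LOGFILE INTERVAL_SECONDS COMMAND [ARGS...]
with_temp_log() {
    log_file=$1
    interval=$2
    shift 2

    # Background sampler: writes one line per interval until it is killed.
    (
        while :; do
            if [ -r /sys/class/thermal/thermal_zone0/temp ]; then
                reading=$(cat /sys/class/thermal/thermal_zone0/temp)
            else
                reading=unavailable
            fi
            printf '%s %s\n' "$(date +%H:%M:%S)" "$reading" >> "$log_file"
            sleep "$interval"
        done
    ) &
    sampler_pid=$!

    # Run the workload in the foreground and remember its exit status.
    status=0
    "$@" || status=$?

    # Stop the sampler and hand back the workload's exit status.
    kill "$sampler_pid" 2>/dev/null
    wait "$sampler_pid" 2>/dev/null || true
    return "$status"
}
```

A two-hour run sampled every five seconds would then be `with_temp_log cpu-temps.log 5 stress-ng --cpu 0 --timeout 2h`.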
For more examples and advanced usage, check out the `stress-ng`
article on the [Ubuntu Wiki](https://wiki.ubuntu.com/Kernel/Reference/stress-ng) and the [Red Hat Enterprise Linux documentation](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/8/html/optimizing_rhel_8_for_real_time_for_low_latency_operation/assembly_stress-testing-real-time-systems-with-stress-ng_optimizing-rhel8-for-real-time-for-low-latency-operation) on
stress testing with this tool.
### Bonus: GPU stress test {#bonus-gpu-stress-test}
{{< figure src="/ox-hugo/furmark-1-18-gigabyte-gtx1080-xtreme-gaming-stress-test.jpg" caption="<span class=\"figure-number\">Figure 3: </span>furmark - [Image Credit](https://geeks3d.com/furmark/gallery/)" >}}
If you have a GPU, consider a GPU stress test. [Furmark](https://geeks3d.com/furmark/) seems to be the most
demanding (a good thing). I usually skip this step unless I'm having
stability issues. Most GPUs will cool fine as long as you have _some_
airflow in your case. _Pro tip: check out [hwinfo64](https://www.hwinfo.com/download/) as a sensor monitoring tool
to complement Windows stress tests._

I'm not sure what to suggest for Linux users. Maybe run a hundred glxgears or
something.
### Disk benchmark {#disk-benchmark}
Consider running `fio` or some other disk benchmark. I've been using this
one-liner. Change directory to a filesystem on whichever disk you wish
to stress test (`cd your-directory`), then run the command.
```sh
sudo fio \
    --randrepeat=1 \
    --ioengine=libaio \
    --direct=1 \
    --gtod_reduce=1 \
    --name=test \
    --filename=random_read_write.fio \
    --bs=4k \
    --iodepth=64 \
    --size=1G \
    --readwrite=randrw \
    --rwmixread=75 \
    --runtime=3m
```
Regardless, make sure to check the [SMART](https://en.wikipedia.org/wiki/Self-Monitoring,_Analysis_and_Reporting_Technology) data of your storage devices. You can run `smartctl` against all your devices with this
one-liner:
```sh
lsblk --json |
jq -r '.blockdevices[].name' |
xargs -I{} sudo smartctl -x /dev/{}
```
Look for reallocated sector counts and other "Pre-failure" data points. At the
bare minimum, look for `SMART overall-health self-assessment test result: PASSED`.[^fn:1]
If the SMART data doesn't mean much to you, or you're unsure how to read it,
[CrystalDiskInfo](https://crystalmark.info/en/software/crystaldiskinfo/) on Windows provides a user-friendly way to
view the same information.[^fn:2]
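
If you want a script to flag the obvious problems, the two checks above can be sketched as filters over `smartctl` output. The function names are invented for this example, and the parsing assumes an ATA-style attribute table (attribute name in the second column, raw value in the last). SAS drives, and some NVMe drives, report health differently, so treat this as a starting point rather than a complete check.

```sh
# Succeeds if the smartctl output on stdin contains the overall PASSED verdict.
smart_passed() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Prints the raw reallocated sector count found in smartctl output on stdin.
# Prints nothing if the drive does not report that attribute.
reallocated_sectors() {
    awk '$2 == "Reallocated_Sector_Ct" { print $NF }'
}
```

For example, `sudo smartctl -x /dev/sda | smart_passed || echo "check this disk"`, where `/dev/sda` is a placeholder for your own device. A non-zero value from `reallocated_sectors` is worth investigating before trusting the disk.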
## That's it {#that-s-it}
If there's one takeaway from this short article, I hope it's that folks start
running a memory test every time they get a new PC or RAM. As a bonus, maybe
somebody saves a bunch of trouble by following these steps before trusting
their hardware with workloads. Consumer-directed hardware testing mitigates
_a ton_ of confusion and frustration. If you can't push your hardware close to
its specified limits, it's not good, viable hardware.
### See Also {#see-also}
- [ArchWiki - Benchmarking](https://wiki.archlinux.org/title/Benchmarking)
- [ArchWiki - Stress testing](https://wiki.archlinux.org/title/Stress_testing)
- Google "Linux Stress Test"
[^fn:1]: [Extended discussion & source of this 'SMART' tip.](https://serverfault.com/questions/419007/understanding-smartctl-a-output)
[^fn:2]: Don't worry, no anime waifus in CrystalDiskInfo "Standard Edition".