title: Tracking via pasted text
article: true
description: Plain text steganography and how it can be used against you
keywords: privacy, unicode, steganography
published: 2021-02-20
updated: 2021-05-10
web: https://nervuri.net/stega
gemini: gemini://rawtext.club/~nervuri/stega.gmi
gopher: gopher://rawtext.club/0/~nervuri/stega.txt
source: https://gitlab.com/nervuri/nervuri.net/-/blob/master/markdown/stega.md
license: CC-BY-SA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<center>
<div class="center">
  <img src="img/backstab.gif" alt="backstab" title="betrayed by technology" width="309" height="300">
  <!-- image source: https://www.cinejosh.com/news/3/33445/backstabbing-trending-in-current-politics.html -->
  <h1>$TITLE</h1>
  <p><i>Plain text steganography and how it can be used against you</i></p>
</div>
</center>

[Zero-width characters](https://en.wikipedia.org/wiki/Zero-width_space) can be used to embed hidden information inside plain text.  This is of primary concern to journalists and their sources, but it can affect anyone browsing the Internet.  For example, a page can be dynamically generated server-side to include, between every few words:

* your username / [certificate ID](https://en.wikipedia.org/wiki/Client_certificate) (if logged in)
* your IP address
* the current timestamp

By copying text from the page and pasting it somewhere public, you would be revealing this information to anyone who knew how to look for it.  Details and demo in this article:

[Be careful what you copy: Invisibly inserting usernames into text with Zero-Width Characters (Tim Ross, 2018)](https://medium.com/@umpox/be-careful-what-you-copy-invisibly-inserting-usernames-into-text-with-zero-width-characters-18b4e6f17b66)

To check if your browser displays zero-width characters, open:

* **[Zero-width character test](zero-width)**

Other plain text watermarking techniques / canary traps are explained on Zach Aysan's blog:

* [Zero-Width Characters: Invisibly fingerprinting text (2017)](https://www.zachaysan.com/writing/2017-12-30-zero-width-characters)
* [Text Fingerprinting Update: Stories and ideas from readers (2018)](https://www.zachaysan.com/writing/2018-01-01-fingerprinting-update)

To fingerprint text, server software would only need to encode a hidden number inside it, repeated between every few words, matching a log entry that contains information about the visitor (username, IP address, cookie, browser details, referrer link, timestamp).  To make pasted excerpts easy to find online, the software could similarly hide a unique page-specific identifier within the text, which can later be entered into search engines.
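
As a rough illustration of the idea, here is a minimal Python sketch.  It is an assumption-laden toy, not a description of any real tracking software: the choice of U+200B and U+200C as the two "bits", the 32-bit width, and the function names `encode_id`, `fingerprint` and `extract_id` are all made up for this example.

```python
# Hypothetical scheme: zero-width space = 0, zero-width non-joiner = 1.
ZERO, ONE = "\u200b", "\u200c"

def encode_id(visitor_id: int, bits: int = 32) -> str:
    """Turn a number into an invisible string of zero-width characters."""
    if not 0 <= visitor_id < 2 ** bits:
        raise ValueError("visitor_id does not fit in the given number of bits")
    return "".join(ONE if b == "1" else ZERO
                   for b in format(visitor_id, f"0{bits}b"))

def fingerprint(text: str, visitor_id: int, every: int = 3, bits: int = 32) -> str:
    """Append the hidden number to every `every`-th word of the text."""
    mark = encode_id(visitor_id, bits)
    words = text.split(" ")
    for i in range(every - 1, len(words), every):
        words[i] += mark
    return " ".join(words)

def extract_id(text: str, bits: int = 32):
    """Recover the number from the first complete mark, or None if absent."""
    hidden = [ch for ch in text if ch in (ZERO, ONE)]
    if len(hidden) < bits:
        return None
    return int("".join("1" if ch == ONE else "0" for ch in hidden[:bits]), 2)

page = "the quick brown fox jumps over the lazy dog"
marked = fingerprint(page, 123456)
excerpt = " ".join(marked.split(" ")[2:6])  # what a visitor might copy
print(extract_id(excerpt))
```

Because the number is repeated, any excerpt that contains one whole mark is enough to look up the matching log entry.  The sketch assumes the excerpt starts at a word boundary; a real decoder would also have to cope with marks that were cut in half.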

To achieve this, aside from zero-width characters, the software could use some of the other techniques described by Zach Aysan: *"differences in dashes (en, em, and hyphens), quotes (straight vs curly), word spelling (color vs colour), and the number of spaces after sentence endings"*, different [types of spaces](https://www.jkorpela.fi/chars/spaces.html), [homoglyphs](https://en.wikipedia.org/wiki/Homoglyph) (a vs а), diacritic forms (ț vs ţ), ligatures and other compatibility characters (ﬁ vs fi, Ⅳ vs IV, ½ vs 1/2), as well as inserting hard-to-detect typos into the text.
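
Each of these look-alike pairs consists of different code points, so every choice between the two forms can carry one hidden bit.  The Python sketch below only makes that difference visible; the helper names `code_points` and `survives_nfkc` are made up for this example.  It also checks which variants are folded away by NFKC normalisation (see Further reading) and which are not.

```python
import unicodedata

# Look-alike pairs, written as escapes so the difference survives copying.
PAIRS = [
    ("a", "\u0430"),       # Latin a vs Cyrillic a (homoglyph)
    ("\u021b", "\u0163"),  # t with comma below vs t with cedilla
    ("fi", "\ufb01"),      # two letters vs the single "fi" ligature
    ("IV", "\u2163"),      # two letters vs the Roman numeral four character
]

def code_points(text):
    """List each character of `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]

def survives_nfkc(plain, variant):
    """True if `variant` still differs from `plain` after NFKC normalisation."""
    return unicodedata.normalize("NFKC", variant) != plain

for plain, variant in PAIRS:
    print(code_points(plain), code_points(variant), survives_nfkc(plain, variant))
```

In this sketch the ligature and the Roman numeral are folded into plain letters by NFKC, while the Cyrillic homoglyph and the two diacritic forms survive it.  Normalisation alone is therefore not enough to remove this kind of watermark.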


## Solutions

A partial solution is to convert the text to [ASCII](https://en.wikipedia.org/wiki/ASCII), if the language allows.  There are also tools such as:

* [Less (CLI)](https://www.greenwoodsoftware.com/less/) - displays zero-width characters when used with the "-U" option.
* [SafeText (CLI)](https://github.com/DavidJacobson/SafeText) - also detects some homoglyphs.  It started out well, but development has stopped; in its current state, there are many problematic characters that it does not detect - see [issues](https://github.com/DavidJacobson/SafeText/issues).
* Several browser extensions that detect **a few** zero-width characters.

However, they don't protect against the more sophisticated versions of this hack.  A more complete tool would have to include not just a list of forbidden/allowed characters, but also a spellchecker and a way to detect trailing whitespace - an x-ray mode that might be triggered when dubious text is detected in the clipboard.  And it's not just text: image-based steganography can be used in a similar way.  A technical solution might never be perfect, but it could cover the vast majority of cases.
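
A very rough Python sketch of such an x-ray mode follows.  It covers only two of the ingredients mentioned above (an allow-list and trailing whitespace), it assumes the text is supposed to be plain English ASCII, and the names `ALLOWED` and `xray` are invented for this example.

```python
import string

# Allow-list: printable ASCII plus the newline.  Everything else is exposed.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " \n")

def reveal(ch: str) -> str:
    """Render a character as a visible <U+XXXX> tag."""
    return f"<U+{ord(ch):04X}>"

def xray(text: str) -> str:
    """Make disallowed characters and trailing whitespace visible."""
    out = []
    for line in text.split("\n"):
        body = line.rstrip(" \t")
        trailing = line[len(body):]
        shown = "".join(ch if ch in ALLOWED else reveal(ch) for ch in body)
        shown += "".join(reveal(ch) for ch in trailing)
        out.append(shown)
    return "\n".join(out)

sample = "hi\u200b there "
print(xray(sample))           # hi<U+200B> there<U+0020>
print(xray(sample) != sample)  # True means something dubious was found
```

Text in languages that need non-ASCII letters would be flagged wholesale by this allow-list, and spelling variants, typos and double spaces inside a line pass through untouched, which is why the paragraph above also calls for a spellchecker.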

An almost perfect non-technical solution is to retype the text.  You can also try downloading the page twice from different accounts / IP addresses and [diff](https://en.wikipedia.org/wiki/Diff) the two versions, or check if the hashes match.  Another solution is to take a screenshot of the text and run it through [OCR](https://en.wikipedia.org/wiki/Optical_character_recognition) software.
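
The comparison of two downloads could look something like the Python sketch below.  It assumes both copies have already been saved as strings; the function names `same_hash` and `differences` are invented for this example, and non-ASCII characters are shown as escapes so that invisible ones stand out.

```python
import difflib
import hashlib

def same_hash(a: str, b: str) -> bool:
    """Quick check: do the two copies have the same SHA-256 hash?"""
    return (hashlib.sha256(a.encode("utf-8")).hexdigest()
            == hashlib.sha256(b.encode("utf-8")).hexdigest())

def differences(a: str, b: str):
    """List character-level differences as (operation, in_a, in_b) tuples."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, ascii(a[i1:i2]), ascii(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

copy_1 = "hello world"          # fetched from one account / IP address
copy_2 = "hello\u200b world"    # fetched from another
print(same_hash(copy_1, copy_2))
for change in differences(copy_1, copy_2):
    print(change)
```

Pages that legitimately change between requests (timestamps, ads, comment counts) will differ anyway, so a mismatch is only a reason to look closer, not proof of fingerprinting.  Matching copies do not rule out a page-specific identifier either, since that would be the same for every visitor.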


## Tools for text steganography

* [StegCloak](https://github.com/KuroLabs/stegcloak)
* [Spam Mimic](https://www.spammimic.com/) (see Encode -> Alternate encodings)
* [zwfp](https://github.com/vedhavyas/zwfp)
* [SNOW](http://www.darkside.com.au/snow/)
* [WORDLISTTEXTSTEGANOGRAPHY & EMAILSTEGANO](https://web.archive.org/web/20180217185500/http://mok-kong-shen.de:80/)
* [inØsight — Zero Width Obfuscation](https://git.planetrenox.com/inzerosight/browser-extension) (extension for Firefox and Chromium)
* [Zero Width Shortener](https://zws.im/) - Shorten URLs using invisible spaces

[Unicode character search](https://www.fileformat.info/info/unicode/char/search.htm)


## Further reading

### Text steganography

* [Text based steganography (Robert Lockwood and Kevin Curran, 2017)](https://www.researchgate.net/publication/321844767_Text_based_steganography)
* [Text Steganography with Multi level Shielding (Sharon Rose Govada et al., 2012)](https://www.ijcsi.org/papers/IJCSI-9-4-3-401-405.pdf)
* [Any efficient text-based steganographic schemes? (crypto.stackexchange.com)](https://crypto.stackexchange.com/questions/6058/any-efficient-text-based-steganographic-schemes)
* [Steganography to hide text within text (security.stackexchange.com)](https://security.stackexchange.com/questions/20414/steganography-to-hide-text-within-text)
* [Chaffing and winnowing (Wikipedia)](https://en.wikipedia.org/wiki/Chaffing_and_winnowing)

### Control characters

* [Zero-width space (Wikipedia)](https://en.wikipedia.org/wiki/Zero-width_space)
* [Article explaining the role of a few zero-width characters](https://www.ptiglobal.com/2018/04/26/the-beauty-of-unicode-zero-width-characters/)
* [Partial list of Unicode spaces](https://www.jkorpela.fi/chars/spaces.html)
* [Unicode control characters (Wikipedia)](https://en.wikipedia.org/wiki/Unicode_control_characters)
* [Tags (Unicode block) (Wikipedia)](https://en.wikipedia.org/wiki/Tags_(Unicode_block))
* [Unicode Character Database](https://www.unicode.org/Public/UCD/latest/)

### Homoglyphs

* [Homoglyph (Wikipedia)](https://en.wikipedia.org/wiki/Homoglyph)
* [Confusable detection](https://www.unicode.org/reports/tr39/#Confusable_Detection)
* [confusables.txt](https://unicode.org/Public/security/latest/confusables.txt)

### NFKC normalisation

* ["Apply NFKC normalisation" - SafeText issue](https://github.com/DavidJacobson/SafeText/issues/1)
* [Unicode Normalization FAQ](https://www.unicode.org/faq/normalization.html)
* [Unicode Normalization Forms](https://unicode.org/reports/tr15/)

### Unicode security considerations

* [Unicode Security Issues FAQ](https://www.unicode.org/faq/security.html)
* [Unicode Security Considerations - Technical Report](https://www.unicode.org/reports/tr36/)