<h1 align="center">Welcome to utfu 👋</h1>
<p>
<img alt="Version" src="https://img.shields.io/badge/version-0.1.8-blue.svg?cacheSeconds=2592000" />
<img alt="Version" src="https://img.shields.io/badge/version-0.2.1-blue.svg?cacheSeconds=2592000" />
<a href="#" target="_blank">
<img alt="License: MIT" src="https://img.shields.io/badge/License-MIT-yellow.svg" />
</a>
@@ -9,7 +9,7 @@
</a>
</p>
> Attempts to fix busted character encodings carried over from legacy text formats. This is a work-in-progress and not yet ready for production use.
> Replaces busted characters carried over from legacy text encodings with the proper UTF-8 character.
## Install
@@ -19,16 +19,27 @@ yarn add utfu || npm install utfu
## Usage
Pass a string to either method, `hex` or `txt`. The former tries to do a regex search and replace for unicode chars. The latter tries to do a search and replace for characters in their typical misrendering (see chart [here](https://www.i18nqa.com/debug/utf8-debug.html)).
Say you've got a string that looks like this:
`There's no way I'm paying â‚¬30 for that!`
Pass it to any of the three methods (`hex`, `txt`, or `htx`) and you'll hopefully get back:
`There's no way I'm paying €30 for that!`
`hex` substitutes Unicode escape sequences (e.g., `\u20ac`), which is useful in some contexts. `txt` substitutes the actual character (e.g., `€`), and `htx` substitutes the HTML escape sequence (e.g., `&#x20AC;`). See the chart [here](https://www.i18nqa.com/debug/utf8-debug.html) for the mappings.
```javascript
import utfu from 'utfu'
import { hex, txt, htx } from 'utfu'
const dirtyText = 'On a certain level, it�s like shouting �fire� in a crowded theater.'
const cleanText = utfu.hex(dirtyText) || utfu.txt(dirtyText)
const cleanText = txt(dirtyText) // hex(dirtyText) would return \uXXXX escapes instead
// --> 'On a certain level, it’s like shouting “fire” in a crowded theater.'
const cleanHTML = htx(dirtyText)
// --> 'On a certain level, it&#x2019;s like shouting &#x201C;fire&#x201D; in a crowded theater.'
```
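For context, the kind of misrendering described above typically appears when UTF-8 bytes are decoded with a single-byte legacy encoding. The sketch below reproduces that in Node.js with `Buffer`. It is not part of utfu, and the helper names `garble` and `ungarble` are made up for illustration. It also assumes Node's `latin1` (ISO-8859-1) as a stand-in for Windows-1252, which only holds for bytes outside the 0x80-0x9F range, so it covers characters like `é` but not `€` or curly quotes.

```javascript
// Sketch only: Node's 'latin1' is ISO-8859-1, which agrees with Windows-1252
// everywhere except bytes 0x80-0x9F. That is enough for é (bytes C3 A9),
// but not for € or curly quotes, whose UTF-8 bytes fall in that range.

// Hypothetical helper: encode as UTF-8, then wrongly decode one byte per character.
const garble = clean => Buffer.from(clean, 'utf8').toString('latin1')

// Hypothetical helper: undo the mistake by reversing the two steps.
const ungarble = dirty => Buffer.from(dirty, 'latin1').toString('utf8')

const dirty = garble('café')
console.log(dirty) // cafÃ©
console.log(ungarble(dirty)) // café
```

A lookup table of known misrenderings, as this package uses, is the more forgiving route when the input mixes intact and damaged characters, because a byte-level round trip like the one above assumes the whole string was damaged the same way.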
## Run tests
@@ -39,7 +50,7 @@ yarn run test
## Author
👤 **Daniel Sieradski <daniel@self.agency>**
👤 **Daniel Sieradski <hello@self.agency>**
- Website: [self.agency](https://self.agency)
- Twitter: [@selfagency_llc](https://twitter.com/selfagency_llc)
{
"name": "utfu",
"version": "0.2.0",
"version": "0.2.1",
"@pika/pack": {
"pipeline": [
[
@@ -22,7 +22,17 @@
]
]
},
"description": "Attempts to fix busted character encodings carried over from legacy text formats.",
"description": "Replaces busted characters carried over from legacy text encodings with the proper UTF-8 character.",
"keywords": [
"utf",
"utf-8",
"unicode",
"windows-1252",
"fix",
"replace",
"convert",
"characters"
],
"repository": "https://gitlab.com/selfagency/utfu.git",
"author": "Daniel Sieradski <[email protected]>",
"license": "MIT",
@@ -45,12 +55,16 @@
"eslint-plugin-promise": "^4.2.1",
"eslint-plugin-security": "^1.4.0",
"eslint-plugin-standard": "^4.0.1",
"iconv": "^3.0.0",
"jest": "^25.4.0",
"prettier": "^2.0.5",
"typescript": "^3.8.3"
},
"scripts": {
"test": "jest"
},
"dependencies": {
"escape-unicode": "^0.2.0",
"he": "^1.2.0",
"windows-1252": "^1.0.0"
}
}
import { solo, duo, trio } from "./mappings.js";
const he = require('he');
const win = require('windows-1252');
const mappings = require('./mappings');
const hex = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process');
trio.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex);
});
duo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex);
});
solo.forEach(mapping => {
str = win.decode(win.encode(str));
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex);
});
return str;
@@ -16,16 +15,24 @@ const hex = str => {
const txt = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process');
trio.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char);
str = win.decode(win.encode(str));
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.chars);
});
duo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char);
});
solo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char);
return str;
};
const htx = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process');
str = win.decode(win.encode(str));
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, he.encode(mapping.utf8.chars));
});
return str;
};
export { hex, txt };
\ No newline at end of file
module.exports = {
hex,
txt,
htx
};
\ No newline at end of file
export function hex(str: any): string;
export function txt(str: any): string;
export function hex(str: any): any;
export function txt(str: any): any;
export function htx(str: any): any;
{
"name": "utfu",
"description": "Attempts to fix busted character encodings carried over from legacy text formats.",
"version": "0.2.0",
"description": "Replaces busted characters carried over from legacy text encodings with the proper UTF-8 character.",
"version": "0.2.1",
"license": "MIT",
"files": [
"dist-*/",
@@ -9,8 +9,22 @@
],
"pika": true,
"sideEffects": false,
"keywords": [
"utf",
"utf-8",
"unicode",
"windows-1252",
"fix",
"replace",
"convert",
"characters"
],
"repository": "https://gitlab.com/selfagency/utfu.git",
"dependencies": {},
"dependencies": {
"escape-unicode": "^0.2.0",
"he": "^1.2.0",
"windows-1252": "^1.0.0"
},
"devDependencies": {
"@babel/core": "^7.9.0",
"@babel/preset-env": "^7.9.5",
@@ -29,7 +43,6 @@
"eslint-plugin-promise": "^4.2.1",
"eslint-plugin-security": "^1.4.0",
"eslint-plugin-standard": "^4.0.1",
"iconv": "^3.0.0",
"jest": "^25.4.0",
"prettier": "^2.0.5",
"typescript": "^3.8.3"
import { solo, duo, trio } from './mappings'
const he = require('he')
const win = require('windows-1252')
const mappings = require('./mappings')
const hex = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process')
trio.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex)
})
duo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex)
})
solo.forEach(mapping => {
str = win.decode(win.encode(str))
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.hex)
})
return str
@@ -16,16 +14,20 @@ const hex = str => {
const txt = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process')
trio.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char)
str = win.decode(win.encode(str))
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.chars)
})
duo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char)
})
solo.forEach(mapping => {
str = str.replace(mapping.misrender.regex, mapping.utf8.char)
return str
}
const htx = str => {
if (typeof str !== 'string') throw new Error('utfu requires a string to process')
str = win.decode(win.encode(str))
mappings.forEach(mapping => {
str = str.replace(mapping.misrender.regex, he.encode(mapping.utf8.chars))
})
return str
}
export { hex, txt }
module.exports = { hex, txt, htx }
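The three exports differ only in which replacement they emit for each mapping. A minimal sketch of that shared pattern follows, assuming a mappings table shaped the way the code above reads it (`misrender.regex`, `utf8.chars`, `utf8.hex`). The two table entries and the names `sampleMappings`, `replaceAll`, `toHtmlEscape`, `hexSketch`, `txtSketch`, and `htxSketch` are hypothetical. `toHtmlEscape` is a dependency-free stand-in for `he.encode`, and the `win.decode(win.encode(str))` step is left out.

```javascript
// Hypothetical two-entry table in the shape the code above reads. The real
// mappings file is collapsed in this diff, so these entries are assumptions.
const sampleMappings = [
  {
    misrender: { chars: '\u00e2\u201a\u00ac', regex: /\u00e2\u201a\u00ac/g }, // â‚¬
    utf8: { chars: '\u20ac', hex: '\\u20ac' } // €
  },
  {
    misrender: { chars: '\u00e2\u20ac\u2122', regex: /\u00e2\u20ac\u2122/g }, // â€™
    utf8: { chars: '\u2019', hex: '\\u2019' } // ’
  }
]

// Stand-in for he.encode, limited to a single character, producing the
// &#x...; form shown in the README.
const toHtmlEscape = ch => `&#x${ch.codePointAt(0).toString(16).toUpperCase()};`

// One pass over the table; `pick` decides which replacement a method emits.
const replaceAll = (str, pick) => {
  if (typeof str !== 'string') throw new Error('utfu requires a string to process')
  // A function replacement keeps `$` sequences in the output from being
  // interpreted as replacement patterns.
  return sampleMappings.reduce((out, m) => out.replace(m.misrender.regex, () => pick(m)), str)
}

const hexSketch = str => replaceAll(str, m => m.utf8.hex)
const txtSketch = str => replaceAll(str, m => m.utf8.chars)
const htxSketch = str => replaceAll(str, m => toHtmlEscape(m.utf8.chars))

const dirty = 'It\u00e2\u20ac\u2122s \u00e2\u201a\u00ac30' // Itâ€™s â‚¬30
console.log(txtSketch(dirty)) // It’s €30
console.log(hexSketch(dirty)) // It\u2019s \u20ac30
console.log(htxSketch(dirty)) // It&#x2019;s &#x20AC;30
```

Passing the selector as a function keeps the type check and the loop in one place, so adding a fourth output format would only need another one-line wrapper.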
import { hex } from '../src/index.js'
import mappings from '../src/mappings'
mappings.forEach(mapping => {
const str = mapping.misrender.chars
test(`hex: replace ${str} with ${mapping.utf8.hex}`, () => {
expect(hex(str)).toBe(mapping.utf8.hex)
})
})
import he from 'he'
import { htx } from '../src/index.js'
import mappings from '../src/mappings'
mappings.forEach(mapping => {
const str = mapping.misrender.chars
const html = he.encode(mapping.utf8.chars)
test(`htx: replace ${str} with ${html}`, () => {
expect(htx(str)).toBe(html)
})
})
import { hex, txt } from '../src/index.js'
import { solo, duo, trio } from '../src/mappings'
trio.forEach(mapping => {
const str = mapping.misrender.chars
test(`replace ${str} with ${mapping.utf8.char}`, () => {
expect(txt(str)).toBe(mapping.utf8.char)
})
test(`replace ${str} with ${mapping.utf8.hex}`, () => {
expect(hex(str)).toBe(mapping.utf8.hex)
})
})
duo.forEach(mapping => {
const str = mapping.misrender.chars
test(`replace ${str} with ${mapping.utf8.char}`, () => {
expect(txt(str)).toBe(mapping.utf8.char)
})
test(`replace ${str} with ${mapping.utf8.hex}`, () => {
expect(hex(str)).toBe(mapping.utf8.hex)
})
})
import { txt } from '../src/index.js'
import mappings from '../src/mappings'
mappings.forEach(mapping => {
const str = mapping.misrender.chars
test(`txt: replace ${str} with ${mapping.utf8.chars}`, () => {
expect(txt(str)).toBe(mapping.utf8.chars)
})
})
@@ -2293,6 +2293,11 @@ escape-string-regexp@^2.0.0:
resolved "https://registry.yarnpkg.com/escape-string-regexp/-/escape-string-regexp-2.0.0.tgz#a30304e99daa32e23b2fd20f51babd07cffca344"
integrity sha512-UpzcLCXolUWcNu5HtVMHYdXJjArjsF9C0aNnquZYY4uW/Vu0miy5YoWvbV345HauVvcAUnpRuhMMcqTcGOY2+w==
escape-unicode@^0.2.0:
version "0.2.0"
resolved "https://registry.yarnpkg.com/escape-unicode/-/escape-unicode-0.2.0.tgz#8c33e161a541ba944b75d93f79f15c8a7f419ed9"
integrity sha512-7jMQuKb8nm0h/9HYLfu4NCLFwoUsd5XO6OZ1z86PbKcMf8zDK1m7nFR0iA2CCShq4TSValaLIveE8T1UBxgALQ==
escodegen@^1.11.1:
version "1.14.1"
resolved "https://registry.yarnpkg.com/escodegen/-/escodegen-1.14.1.tgz#ba01d0c8278b5e95a9a45350142026659027a457"
@@ -3018,6 +3023,11 @@ [email protected]^1.0.3:
dependencies:
function-bind "^1.1.1"
he@^1.2.0:
version "1.2.0"
resolved "https://registry.yarnpkg.com/he/-/he-1.2.0.tgz#84ae65fa7eafb165fddb61566ae14baf05664f0f"
integrity sha512-F/1DnUGPopORZi0ni+CvrCgHQ5FyEAHRLSApuYWMmrbSwoN2Mn/7k+Gl38gJnR7yyDZk6WLXwiGod1JOWNDKGw==
hosted-git-info@^2.1.4:
version "2.8.8"
resolved "https://registry.yarnpkg.com/hosted-git-info/-/hosted-git-info-2.8.8.tgz#7539bd4bc1e0e0a895815a2e0262420b12858488"
@@ -3068,11 +3078,6 @@ [email protected], [email protected]^0.4.24:
dependencies:
safer-buffer ">= 2.1.2 < 3"
iconv@^3.0.0:
version "3.0.0"
resolved "https://registry.yarnpkg.com/iconv/-/iconv-3.0.0.tgz#9a293ec123b16b4717e450714ddbb07f985b7d9c"
integrity sha512-bKTEP55J/e+UutBE3BDBWq6KukPWh3GBYCZGbLEY9vxRDUU2F3bqvPsp/a/DEdIamgF2MvW5lF0Rj1U/7KRL+g==
ignore@^4.0.6:
version "4.0.6"
resolved "https://registry.yarnpkg.com/ignore/-/ignore-4.0.6.tgz#750e3db5862087b4737ebac8207ffd1ef27b25fc"
@@ -6362,6 +6367,11 @@ [email protected]^2.0.0:
dependencies:
string-width "^2.1.1"
windows-1252@^1.0.0:
version "1.0.0"
resolved "https://registry.yarnpkg.com/windows-1252/-/windows-1252-1.0.0.tgz#721d9467d4da172d2b8dd2a56094b04355fe5976"
integrity sha1-ch2UZ9TaFy0rjdKlYJSwQ1X+WXY=
word-wrap@~1.2.3:
version "1.2.3"
resolved "https://registry.yarnpkg.com/word-wrap/-/word-wrap-1.2.3.tgz#610636f6b1f703891bd34771ccb17fb93b47079c"