Discussion:
What are these secret characters called?
Dan Purgert
2021-04-05 19:22:40 UTC
If on Windows 10 I copy from any Macrumors web page directly into gVim any
sentence containing the words "iPad" or "iPhone" I always get a strange
question mark indicating some kind of secret hidden character both before
and after the word as in "?iPad?" and "?iPhone?"
https://www.macrumors.com/2021/04/05/apple-ipad-select-mac-iphone-11-trade-in-value/

What are these secret characters called?
Do you know of a better program to identify such hidden secret characters?




-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEBcqaUD8uEzVNxUrujhHd8xJ5ooEFAl5nnnYACgkQjhHd8xJ5
ooFFQgf/Wd/67GaYp/s/5FtSCJl83OCTllraWmP6piWqk4qRe9NQnHswlNs8jWch
CtGPGLndzYGyR27n/mCiHcINNFPgLqfOtryU8Hgm7MAh9M9sEwFZpvFnOaM3k/IF
zaAWrJAV631vHFpWiLrM93EzCBHakHJvFyjtqv3fBH+rfjaCokGo8y65lmYSIqCD
s5gNd7lBjnMQ33jX6aH5W6b4mJbX4BHmw8KKwg5M6kbYrSrDIhrN1r368Wcu711X
qnKAiKD/RP0IHRMV7b/HUGBgoXt1Lo+y8+lknW41gau3JAKNPCz/UFGGyIt+qDXO
Qr1Bxgc82+hRTTrOlzkZbN27ftn3LA==
=dibe
-----END PGP SIGNATURE-----
--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: 05CA 9A50 3F2E 1335 4DC5 4AEE 8E11 DDF3 1279 A281
Stefan Ram
2021-04-05 19:38:45 UTC
Post by Dan Purgert
What are these secret characters called?
I was not able to find these characters here, and, therefore,
just give a generic answer:

Symbols used to indicate your device doesn’t have a font to
display the symbols (especially little boxes) sometimes are
called "tofu".
Post by Dan Purgert
Do you know of a better program to identify such hidden secret characters?
Some versions of vim support Unicode and keystrokes to show
the code of the current character. Many other editors do too.
Sometimes, you can also find more information looking at the
source code of a web page.
Lewis
2021-04-05 20:51:52 UTC
Post by Stefan Ram
Post by Dan Purgert
What are these secret characters called?
I was not able to find these characters here, and, therefore,
Symbols used to indicate your device doesn’t have a font to
display the symbols (especially little boxes) sometimes are
called "tofu".
There are no non-ASCII symbols around the word "iPad" in the headline on
that page.
--
"Are you pondering what I'm pondering?"
"Well, I think so Brain, but what if we stick to the seat covers?"
Lewis
2021-04-05 19:40:28 UTC
Post by Dan Purgert
If on Windows 10 I copy from any Macrumors web page directly into gVim any
sentence containing the words "iPad" or "iPhone" I always get a strange
question mark indicating some kind of secret hidden character both before
and after the word as in "?iPad?" and "?iPhone?"
https://www.macrumors.com/2021/04/05/apple-ipad-select-mac-iphone-11-trade-in-value/
There are no special or "secret" characters.

<div class="titlebar--3N4MCKxL">
<h1 class="heading--1cooZo6n heading--h5--3l5xQ3lN heading--white--2vAPsAl1 heading--noMargin--mnRHPAnD">
Apple Adjusts Trade-In Value of iPad Pro, iPhone 11, and Select Mac Models
</h1>
</div>

Nothing there at all (that is the header that appears in white text on a
red background at the top of the article).

Either gVim or Windows 10 is confused.
Post by Dan Purgert
Do you know of a better program to identify such hidden secret characters?
Since there are no "hidden secret characters" that is impossible to
answer.
--
I'm a trophy husband.
That seems unlikely.
I didn't say it was a first place trophy.
Athel Cornish-Bowden
2021-04-06 08:06:39 UTC
Post by Lewis
Post by Dan Purgert
If on Windows 10 I copy from any Macrumors web page directly into gVim any
sentence containing the words "iPad" or "iPhone" I always get a strange
question mark indicating some kind of secret hidden character both before
and after the word as in "?iPad?" and "?iPhone?"
https://www.macrumors.com/2021/04/05/apple-ipad-select-mac-iphone-11-trade-in-value/
There are no special or "secret" characters.
<div class="titlebar--3N4MCKxL">
<h1 class="heading--1cooZo6n heading--h5--3l5xQ3lN
heading--white--2vAPsAl1 heading--noMargin--mnRHPAnD">
Apple Adjusts Trade-In Value of iPad Pro, iPhone 11, and Select Mac Models
</h1>
</div>
Nothing there at all (that is the header that appears in white text on a
red background at the top of the article).
Either gVim or Windows 10 is confused.
Post by Dan Purgert
Do you know of a better program to identify such hidden secret characters?
Since there are no "hidden secret characters" that is impossible to
answer.
I get exactly the same as you (no hidden secret characters) when I look
at the HTML source in Chrome (Mac OS 10.13.2).
--
Athel -- British, living in France for 34 years
Dan Purgert
2021-04-06 15:10:24 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Post by Athel Cornish-Bowden
I get exactly the same as you (no hidden secret characters) when I look
at the HTML source in Chrome (Mac OS 10.13.2).
These Zero-Width Non-Joiner (ZWNJ) characters are hidden to most programs.
https://en.wikipedia.org/wiki/Zero-width_non-joiner

Andreas Karrer saw these ZWNJ characters in vim in the sentence
"On the iPad lineup, the iPad Pro has..."
When I paste that into Win gVim I get question marks around the word "iPad".

qEdit does better: it shows the exact ZWNJ characters, and it also shows
the many links that are invisible in Windows gVim pastes:
"On the ?iPad? lineup, the <link>iPad Pro</link> has..."
(I substituted a question mark for each invisible ZWNJ character)
https://en.wikipedia.org/wiki/File:ISOIEC-9995-7-081--IEC-60417-6077-1--Symbol-for-ZWNJ.svg

Another instance of the ZWNJ ligature character is where it says
"With the iPhone, Apple..."
What gVim shows is
"With the ?iPhone?, Apple..."
Again qEdit shows the exact ZWNJ ligature character at those question marks.
https://upload.wikimedia.org/wikipedia/commons/a/ad/ISOIEC-9995-7-081--IEC-60417-6077-1--Symbol-for-ZWNJ.svg

I think Andreas Karrer is correct since the characters shown in qEdit are
the same as those for the ZWNJ ligature symbol (sort of a "T pipe").

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEBcqaUD8uEzVNxUrujhHd8xJ5ooEFAl5nnnYACgkQjhHd8xJ5
ooFFQgf/Wd/67GaYp/s/5FtSCJl83OCTllraWmP6piWqk4qRe9NQnHswlNs8jWch
CtGPGLndzYGyR27n/mCiHcINNFPgLqfOtryU8Hgm7MAh9M9sEwFZpvFnOaM3k/IF
zaAWrJAV631vHFpWiLrM93EzCBHakHJvFyjtqv3fBH+rfjaCokGo8y65lmYSIqCD
s5gNd7lBjnMQ33jX6aH5W6b4mJbX4BHmw8KKwg5M6kbYrSrDIhrN1r368Wcu711X
qnKAiKD/RP0IHRMV7b/HUGBgoXt1Lo+y8+lknW41gau3JAKNPCz/UFGGyIt+qDXO
Qr1Bxgc82+hRTTrOlzkZbN27ftn3LA==
=dibe
-----END PGP SIGNATURE-----
--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: 05CA 9A50 3F2E 1335 4DC5 4AEE 8E11 DDF3 1279 A281
Andreas Karrer
2021-04-05 22:21:32 UTC
Post by Dan Purgert
If on Windows 10 I copy from any Macrumors web page directly into gVim any
sentence containing the words "iPad" or "iPhone" I always get a strange
question mark indicating some kind of secret hidden character both before
and after the word as in "?iPad?" and "?iPhone?"
https://www.macrumors.com/2021/04/05/apple-ipad-select-mac-iphone-11-trade-in-value/
What are these secret characters called?
U+200C, "Zero-Width Non-Joiner". This is generally used to prevent
ligatures. It doesn't make sense to use it before or after a
space or punctuation. In utf-8, this is encoded in three bytes, 0xE2
0x80 0x8C.

https://en.wikipedia.org/wiki/Zero-width_non-joiner
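
For anyone who wants to check the code point and its byte encoding locally, here is a minimal Python sketch. The variable names are just illustrative; it only uses the standard library.

```python
import unicodedata

# U+200C written as an escape so it is visible in the source
zwnj = "\u200c"

# Look up the official Unicode name of the character
name = unicodedata.name(zwnj)

# Encode to UTF-8 to see the three bytes mentioned above
utf8_bytes = zwnj.encode("utf-8")

print(name)                 # ZERO WIDTH NON-JOINER
print(utf8_bytes.hex(" "))  # e2 80 8c
```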
Post by Dan Purgert
Do you know of a better program to identify such hidden secret characters?
My vim (on Linux) displays the six characters: <200c>

- Andi
Dan Purgert
2021-04-05 22:57:34 UTC
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
Post by Andreas Karrer
Post by Dan Purgert
What are these secret characters called?
U+200C, "Zero-Width Non-Joiner". This is generally used to prevent
ligatures. It doesn't make sense to use it before or after a
space or punctuation. In utf-8, this is encoded in three bytes, 0xE2
0x80 0x8C.
https://en.wikipedia.org/wiki/Zero-width_non-joiner
Thank you for explaining what those invisible characters around the first
instance of 'iPad' and 'iPhone' in any given sentence were called!

In that Wikipedia article is an image showing EXACTLY the ZWNJ symbol that
shows up when I paste the text into qEdit freeware by K39 on Windows 10.
https://sourceforge.net/projects/qedit/

How the heck did you figure out what those invisible characters were anyway?
I had tried to figure out what they were but I couldn't find what they were.

They show up in Windows HxD (http://www.mh-nexus.de/) as a period (aka dot
or fullstop) around the first instance of iPhone or iPad in any sentence.
But on the hex side those dots show up as hex 00 (which wasn't useful).

In almost every other program they don't even show up (they're invisible).
Post by Andreas Karrer
Post by Dan Purgert
Do you know of a better program to identify such hidden secret characters?
My vim (on Linux) displays the six characters: <200c>
I don't know how you see them on vim because by the time I paste the text
into Windows gVim those ZWNJ characters (which are in front of and behind
the first instance of iPad or iPhone in a sentence) are already converted
automatically into question marks.

iPad becomes ?iPad? (but only on the first instance in any given sentence).
iPhone becomes ?iPhone? (only on the first instance in any given sentence).

In Windows gVim, a "ga" (get ascii) on the question mark shows as hex 3f
(but it has already been converted to the question mark by that point)
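
One possible explanation for the hex 3f (an assumption on my part, not something established in this thread) is that the text passes through a non-Unicode encoding somewhere between the clipboard and the buffer. A character with no mapping in that encoding is then commonly replaced by a literal "?". The Python sketch below reproduces that effect using Windows-1252 as a stand-in legacy code page; which encoding gVim actually used here is unknown.

```python
# Sample text with ZWNJ (U+200C) on both sides of "iPad"
pasted = "On the \u200ciPad\u200c lineup"

# cp1252 has no slot for U+200C, so errors="replace" substitutes "?"
legacy = pasted.encode("cp1252", errors="replace")

print(legacy)          # b'On the ?iPad? lineup'
print(hex(legacy[7]))  # 0x3f, a real question mark, the original is gone
```

Once that substitution has happened, no tool can recover the original character from the buffer, which would match what "ga" reports.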

In Windows Notepad++ the zero width non joiner characters are invisible.
But in Windows QEdit (by K39) they show up as does a LOT more secret text!

Can you let me in on the secret of how you figured out what they were?

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEBcqaUD8uEzVNxUrujhHd8xJ5ooEFAl5nnnYACgkQjhHd8xJ5
ooFFQgf/Wd/67GaYp/s/5FtSCJl83OCTllraWmP6piWqk4qRe9NQnHswlNs8jWch
CtGPGLndzYGyR27n/mCiHcINNFPgLqfOtryU8Hgm7MAh9M9sEwFZpvFnOaM3k/IF
zaAWrJAV631vHFpWiLrM93EzCBHakHJvFyjtqv3fBH+rfjaCokGo8y65lmYSIqCD
s5gNd7lBjnMQ33jX6aH5W6b4mJbX4BHmw8KKwg5M6kbYrSrDIhrN1r368Wcu711X
qnKAiKD/RP0IHRMV7b/HUGBgoXt1Lo+y8+lknW41gau3JAKNPCz/UFGGyIt+qDXO
Qr1Bxgc82+hRTTrOlzkZbN27ftn3LA==
=dibe
-----END PGP SIGNATURE-----
--
|_|O|_|
|_|_|O| Github: https://github.com/dpurgert
|O|O|O| PGP: 05CA 9A50 3F2E 1335 4DC5 4AEE 8E11 DDF3 1279 A281
Anders D. Nygaard
2021-04-06 19:07:40 UTC
Post by Dan Purgert
Post by Andreas Karrer
Post by Dan Purgert
What are these secret characters called?
U+200C, "Zero-Width Non-Joiner". This is generally used to prevent
ligatures. It doesn't make sense to use it before or after a
space or punctuation. In utf-8, this is encoded in three bytes, 0xE2
0x80 0x8C.
https://en.wikipedia.org/wiki/Zero-width_non-joiner
Thank you for explaining what those invisible characters around the first
instance of 'iPad' and 'iPhone' in any given sentence were called!
In that Wikipedia article is an image showing EXACTLY the ZWNJ symbol that
shows up when I paste the text into qEdit freeware by K39 on Windows 10.
https://sourceforge.net/projects/qedit/
How the heck did you figure out what those invisible characters were anyway?
I had tried to figure out what they were but I couldn't find what they were.
I can't answer for Andreas, of course, but "Show source" on the web page
gives me "<p>On the &zwnj;iPad&zwnj; lineup, the ..."

&zwnj; is an HTML (character) entity reference.
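
As a small illustration of how that entity turns into the invisible character, this Python sketch decodes the fragment quoted above with the standard library's html.unescape. The fragment is copied from the page source as quoted; nothing is fetched from the site.

```python
import html

# Fragment as it appears in "Show source", entity references intact
source = "<p>On the &zwnj;iPad&zwnj; lineup, the ..."

# html.unescape resolves named entities such as &zwnj; to their code points
decoded = html.unescape(source)

# Count the resulting U+200C characters
print(decoded.count("\u200c"))  # 2
```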

/Anders, Denmark

PS. PGP signatures are usually frowned upon here.
Andreas Karrer
2021-04-06 20:38:02 UTC
Post by Anders D. Nygaard
Post by Dan Purgert
Post by Andreas Karrer
Post by Dan Purgert
What are these secret characters called?
U+200C, "Zero-Width Non-Joiner". This is generally used to prevent
ligatures. It doesn't make sense to use it before or after a
space or punctuation. In utf-8, this is encoded in three bytes, 0xE2
0x80 0x8C.
https://en.wikipedia.org/wiki/Zero-width_non-joiner
Thank you for explaining what those invisible characters around the first
instance of 'iPad' and 'iPhone' in any given sentence were called!
In that Wikipedia article is an image showing EXACTLY the ZWNJ symbol that
shows up when I paste the text into qEdit freeware by K39 on Windows 10.
https://sourceforge.net/projects/qedit/
How the heck did you figure out what those invisible characters were anyway?
I had tried to figure out what they were but I couldn't find what they were.
I can't answer for Andreas, of course, but "Show source" on the web page
gives me "<p>On the &zwnj;iPad&zwnj; lineup, the ..."
&zwnj; is an HTML (character) entity reference.
/Anders, Denmark
I made a small html page with zwnjs inserted in 5 different ways:

https://karrer.net/tmp/zwnj.html

Some are displayed with Chrome/Firefox's "View source", some aren't.
Some get copy-pasted as Unicode characters, some don't. You can download
the page using wget or curl and will see that the zwnj's are there,
even though Chrome does not show them.
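
If you have the downloaded text in hand, a rough way to list such characters is to scan for Unicode "format" characters (general category Cf, which includes U+200C). The sketch below is only an illustration: find_format_chars is a made-up helper name, and the sample string is a stand-in, not the actual content of the test page.

```python
import unicodedata

def find_format_chars(text):
    """Return (index, code point, name) for every character in category Cf."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

# Stand-in sample with two ZWNJs written as escapes
sample = "fo\u200co / ba\u200cr"
hits = find_format_chars(sample)

for index, codepoint, name in hits:
    print(index, codepoint, name)
```

Note that this only finds literal characters. Entity references such as &zwnj; would have to be decoded first.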


- Andi
Dan Purgert
2021-04-07 20:47:15 UTC
Post by Andreas Karrer
https://karrer.net/tmp/zwnj.html
Thank you for going to the trouble to craft that zwnj.html test page.
https://karrer.net/tmp/zwnj.html

It was very helpful as I tested out various editors & workarounds.

When the first line of karrer.net/tmp/zwnj.html was pasted...
gVim indicated "fo?o / ba?r / ba?z / blech / blurf?l"
qEdit showed the zwnj (but it's hard to see especially in "blurf?l")
Notepad++ doesn't show any of them.

Even though Notepad++ doesn't show them a macro replaced them.
"fo&o / ba&r / ba&z / blech / blurf&l"

The macro is inserted into %AppData%\Notepad++\shortcuts.xml
https://stackoverflow.com/questions/12124434/easiest-way-to-convert-smart-curly-quotes-to-dumb-straight-quotes-in-notepad

<Macro name="Replace ZWNJ with Ampersand" Ctrl="no" Alt="no" Shift="no" Key="0">
<Action type="3" message="1700" wParam="0" lParam="0" sParam="" />
<Action type="3" message="1601" wParam="0" lParam="0" sParam="&#x200C;" />
<Action type="3" message="1625" wParam="0" lParam="0" sParam="" />
<Action type="3" message="1602" wParam="0" lParam="0" sParam='&amp;' />
<Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
<Action type="3" message="1701" wParam="0" lParam="1609" sParam="" />
</Macro>
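
For comparison, the same replacement can be sketched outside Notepad++ in a few lines of Python. This is not part of the macro; replace_zwnj is a hypothetical helper name, and the sample assumes the question marks gVim showed were ZWNJs.

```python
ZWNJ = "\u200c"  # zero-width non-joiner

def replace_zwnj(text, replacement="&"):
    """Replace every ZWNJ in text with a visible marker."""
    return text.replace(ZWNJ, replacement)

# Sample modelled on the first line of the test page as pasted above
sample = "fo\u200co / ba\u200cr / ba\u200cz / blech / blurf\u200cl"
result = replace_zwnj(sample)

print(result)  # fo&o / ba&r / ba&z / blech / blurf&l
```

Passing an empty string as the replacement strips the characters instead of marking them.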

What's interesting is this site I had been using supposedly has all
known HTML characters but it doesn't have these ZWNJ characters anywhere!
https://www.toptal.com/designers/htmlarrows/punctuation/

Nonetheless once you told me what they're called (and what they look like)
then I could google away to my heart's content to learn more.

Thank you for identifying the word for the zero width non joiner characters.
--
Removed PGP sig