
Parsing academic article fulltexts is a wild ride. In an article from one publisher, the HTML entity for ASCII code 7, "bell", has been inserted. Apparently they want the computer to beep as you read the text. For web display, I guess I'll have to find a way to play a sound file in its place. 🔔


The proper handling of code 7 for printers is to print nothing for that character and beep instead.


This is useful, because it alerts nearby researchers that a fresh article has been printed (presumably).


I assume that some very dusty professor somewhere is waiting by his 1890s printer and relying heavily on that feature.


How does it know when I'm reading that particular part?


Does it guess based on my scroll position and how long I've been looking at the page?


I think it's probably an error or omission. For printers, it would beep when it hit the particular character. In a terminal, I think it would beep as soon as it is printed on screen.


Also, it's an illegal character in XML 1.0 (which the document is serialized as), which suggests it's an error. I think it's a by-product of injecting serialized HTML directly from an SQL query into an XML structure.
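A minimal sketch, assuming you want to scrub the text before handing it to an XML 1.0 parser. The XML 1.0 `Char` production allows only #x9, #xA, #xD, [#x20-#xD7FF], [#xE000-#xFFFD] and [#x10000-#x10FFFF], and a character reference such as `&#7;` must also point at an allowed character. The helper name `strip_illegal_xml10` is hypothetical, and silently dropping characters is only one option (logging them or substituting a placeholder would work too).

```python
import re
import xml.etree.ElementTree as ET

# Numeric character references: decimal (&#7;) or hexadecimal (&#x7;).
_CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_xml10_char(cp: int) -> bool:
    """True if the code point matches the XML 1.0 `Char` production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_illegal_xml10(text: str) -> str:
    """Hypothetical helper: drop raw characters and references that XML 1.0 forbids."""

    def drop_bad_ref(m: re.Match) -> str:
        ref = m.group(1)
        cp = int(ref[1:], 16) if ref.startswith("x") else int(ref)
        # Keep legal references untouched; remove illegal ones such as &#7;.
        return m.group(0) if _is_xml10_char(cp) else ""

    text = _CHAR_REF.sub(drop_bad_ref, text)
    # Then remove any raw control characters (e.g. a literal BEL, \x07).
    return "".join(ch for ch in text if _is_xml10_char(ord(ch)))


if __name__ == "__main__":
    raw = "<p>Ring &#7;the bell\x07</p>"
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        print("rejected:", err)
    print(ET.fromstring(strip_illegal_xml10(raw)).text)  # Ring the bell
```

Doing the character references first matters: once the text is parsed, an illegal `&#7;` has already made the parse fail, so the cleanup has to happen on the raw string.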


I see plenty of examples of articles trying to sneak <bold> and <italic> into titles and text bodies by encoding them as entities in the string.
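To illustrate what that looks like after parsing, here is a small sketch using Python's `xml.etree.ElementTree`: when `<italic>` arrives escaped as `&lt;italic&gt;`, the parser decodes it to literal text, so the element has no children. The `reparse_escaped_markup` helper is hypothetical, drops the original attributes, and only works when the decoded text happens to be well-formed XML, so treat it as a sketch rather than a general repair.

```python
import xml.etree.ElementTree as ET


def reparse_escaped_markup(elem: ET.Element) -> ET.Element:
    """Hypothetical repair: parse an element's decoded text again as XML.

    Assumes the text is well-formed once decoded; attributes of `elem` are not kept.
    """
    inner = "".join(elem.itertext())
    return ET.fromstring(f"<{elem.tag}>{inner}</{elem.tag}>")


if __name__ == "__main__":
    # The publisher escaped the markup instead of emitting real elements.
    title = ET.fromstring("<title>A &lt;italic&gt;Zipf&lt;/italic&gt; study</title>")
    print(len(title))  # 0: no child elements, the tag is just text
    print(title.text)  # A <italic>Zipf</italic> study

    fixed = reparse_escaped_markup(title)
    print([child.tag for child in fixed])  # ['italic']
```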


Thanks @drewverlee, hope you’ll find the rest equally useful!


^ studying the important questions 😄


Omg this is important. I once read about a guy who is a professional ice cream taster, and he uses a golden spoon because normal spoons taste too much like spoon. I had to get myself a golden spoon for ice cream after that.


My favorite spoons are wooden. I find it conveys the taste best.


Don’t they taste like wood? 🤔


Probably, but maybe I like the taste of wood ☺️


They also don't conduct heat or cold as much.


hey! and does it taste better when you use a golden spoon? asking for a friend 😄


Can anyone recall an article, perhaps scientific, comparing the frequency of the most used names/symbols/phrases across different programming languages? By extension, the article argued that some languages could be easier to learn because a programmer needs to know fewer symbols to understand the average program.


I remember the article having histograms over the most frequently used words in different languages.
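The article isn't identified in this thread, so this is only a hypothetical sketch of the kind of measurement it described: count token frequencies in a program and ask how many distinct tokens are needed to cover a given share of all occurrences. The regex tokenizer is a crude assumption, not a real lexer for any particular language, and both function names are made up for illustration.

```python
import re
from collections import Counter


def token_frequencies(source: str) -> Counter:
    # Crude tokenizer (an assumption): identifiers/keywords, integers,
    # or any single non-space punctuation character.
    return Counter(re.findall(r"[A-Za-z_][A-Za-z0-9_]*|\d+|[^\sA-Za-z0-9_]", source))


def tokens_needed_for_coverage(freqs: Counter, fraction: float) -> int:
    """How many distinct tokens, most frequent first, cover `fraction` of all occurrences."""
    total = sum(freqs.values())
    covered = 0
    for k, (_, count) in enumerate(freqs.most_common(), start=1):
        covered += count
        if covered >= fraction * total:
            return k
    return len(freqs)


if __name__ == "__main__":
    freqs = token_frequencies("def f(x): return x + x")
    print(freqs.most_common(3))
    print(tokens_needed_for_coverage(freqs, 0.5))  # 3
```

Run over a real corpus, `freqs.most_common()` gives exactly the data behind a rank/frequency histogram like the ones described.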


just google "zipf's law 'programming language'"


Interesting, thank you!


Seems like a Zipfian distribution in programming languages would imply that having fewer symbols doesn't help as much as making the more common ones highly composable with the less common ones. Utility falls off gradually (as a power law) from more common to less common symbols, which allows dense compositions of common and less common words.
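To make the Zipf argument concrete, here is a worked sketch assuming an idealized Zipf distribution over $N$ distinct symbols with exponent $s$ ($s \approx 1$ is the classic case). The share of all occurrences covered by the $k$ most frequent symbols is a ratio of generalized harmonic numbers:

$$
f(r) = \frac{r^{-s}}{H_{N,s}}, \qquad H_{n,s} = \sum_{r=1}^{n} r^{-s}, \qquad \mathrm{coverage}(k) = \sum_{r=1}^{k} f(r) = \frac{H_{k,s}}{H_{N,s}}
$$

For $s = 1$, $H_{n,1} \approx \ln n + \gamma$ with $\gamma \approx 0.5772$. With a hypothetical vocabulary of $N = 10{,}000$ symbols, the top $k = 100$ cover about

$$
\frac{H_{100,1}}{H_{10000,1}} \approx \frac{5.187}{9.788} \approx 0.53
$$

So under this idealization 1% of the symbols account for roughly half of all uses, and coverage grows only logarithmically in $k$. Real code need not follow $s = 1$ exactly; the numbers are illustrative.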


Zipfian distributions seem to be the most generally "expressive" from a generic compression standpoint. By giving shorter symbols to more common things, evolving languages act like compression schemes and become better at expressing their domains. And for specific domains, you can get more compression/expressiveness by aligning naming semantics more closely with that domain.
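One concrete, well-known instance of "shorter symbols for more common things" is Huffman coding, which gives each symbol a prefix-free code whose length shrinks as its frequency grows. This is a swapped-in illustration of the compression analogy, not a model of how natural or programming languages actually evolve; `huffman_code_lengths` is a hypothetical helper name and the word counts are made up.

```python
import heapq


def huffman_code_lengths(freqs: dict) -> dict:
    """Return the Huffman code length (in bits) for each symbol in `freqs`."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # Convention: a lone symbol still needs one bit.
        return {next(iter(freqs)): 1}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    # The tie-breaker keeps heapq from ever comparing the dicts.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical word counts with a Zipf-like skew.
    counts = {"the": 8, "of": 4, "and": 2, "zipf": 1, "hecto": 1}
    for word, bits in sorted(huffman_code_lengths(counts).items(), key=lambda kv: kv[1]):
        print(word, bits)
```

With these counts the most frequent word gets a 1-bit code and the two rarest get 4 bits, which matches $-\log_2 p$ exactly because every probability here is a power of two.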


is there a known abbrev for "hundred"? e.g. 3k -> 3000, 3<letter> -> 300


@vemv "h", maybe? For hecto.


sounds adoptable 😄


I think if I saw 4h I'd assume hours...

๐Ÿ‘ 12
๐Ÿ˜ž 4