pages tagged blog http://meng6net.localhost/tag/blog/ <p><small>Copyright © 2005-2020 by <code>Meng Lu &lt;lumeng3@gmail.com&gt;</code></small></p> Meng Lu's home page ikiwiki Sun, 21 May 2017 17:25:59 +0000 Nature publishes substandard article 'Olympic feats raise suspicion' by Ewen Callaway http://meng6net.localhost/blog/Nature_publishes_substandard_article___39__Oympic_feats_raise_suspicion__39___by_Ewen_Callaway/ http://meng6net.localhost/blog/Nature_publishes_substandard_article___39__Oympic_feats_raise_suspicion__39___by_Ewen_Callaway/ blog news olympics society Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>The prestigious scientific magazine Nature published a news report titled <a href= "http://www.nature.com/news/why-great-olympic-feats-raise-suspicions-1.11109"> 'Olympic feats raise suspicion'</a> by Ewen Callaway (<a href= "https://twitter.com/ewencallaway">@ewencallaway</a>), which discusses the 16-year-old Chinese Olympic gold medalist swimmer <a href="http://en.wikipedia.org/wiki/Ye_Shiwen">Ye Shiwen</a> and says</p> <blockquote> <p>... how an athlete's performance history and the limits of human physiology could be used to catch dopers ...</p> <p>Was Ye’s performance anomalous?<br /> Yes. Her time in the 400 IM was more than 7 seconds faster than her time in the same event at a major meet in July 2011. But what really raised eyebrows was her showing in the last 50 metres, which she swam faster than US swimmer Ryan Lochte did when he won gold in the men’s 400 IM on Saturday, with the second-fastest time ever for that event. ...</p> <p>Doesn't a clean drug test during competition rule out the possibility of doping?<br /> No, says Ross Tucker, ...</p> </blockquote> <p>Apart from this weak reasoning, which lacks facts and is logically flawed, the report contains little data or scientific analysis. It seems a rather substandard article for Nature, especially on a potentially controversial topic. 
One of the reader comments was a particularly well-grounded criticism of it:</p> <blockquote> <p>Lai Jiang said:</p> <p>It is a shame to see Nature, which nearly all scientists, including myself, regard as the one of the most prestigious and influential physical science magazines to publish a thinly-veiled biased article like this. Granted, this is not a peer-reviewed scientific article and did not go through the scrutiny of picking referees. But to serve as a channel for the general populous to be in touch with and appreciate sciences, the authors and editors should at least present the readers with facts within proper context, which they failed to do blatantly.</p> <p>First, to compare a player's performance increase, the author used Ye's 400m IM time and her performance at the World championship 2011, which are 4:28.43 and 4:35.15 respectively, and reached the conclusion that she has got an "anomalous" increase by ~7 sec (6.72 sec). In fact she's previous personal best was 4:33.79 at Asian Games 2010 [1]. This leads to a 5.38 sec increase. In a sport event that 0.1 sec can be the difference between the gold and silver medal, I see no reason that 5.38 sec can be treated as 7 sec.</p> <p>Second, as previously pointed out, Ye is only 16 years old and her body is still developing. Bettering oneself by 5 sec over two years may seem impossible for an adult swimmer, but certainly happens among youngsters. Ian Thorpe's interview revealed that his 400m freestyle time increased 5 sec between the age of 15 and 16 [2]. For regular people including the author it may be hard to imagine what an elite swimmer can achieve as he or she matures, combined with scientific and persistent training. But jumping to a conclusion that it is "anomalous" based on "Oh that's so tough I can not imagine it is real" is hardly sound.</p> <p>Third, to compare Ryan Lochte's last 50m to Ye's is a textbook example of what we call to cherry pick your data. 
Yes, Lochte is slower than Ye in the last 50m, but (as pointed out by Zhenxi) Lochte has a huge lead in the first 300m so that he chose to not push himself too hard to conserve energy for latter events (whether this conforms to the Olympic spirit and the "use one' s best efforts to win a match" requirement that the BWF has recently invoked to disqualify four badminton pairs is another topic worth discussing, probably not in Nature, though). On the contrary, Ye is trailing behind after the first 300m and relies on freestyle, which she has an edge, to win the game. Failing to mention this strategic difference, as well as the fact that Lochte is 23.25 sec faster (4:05.18) over all than Ye creates the illusion that a woman swam faster than the best man in the same sport, which sounds impossible. Put aside the gender argument, I believe this is still a leading question that implies the reader that something fishy is going on.</p> <p>Fourth, another example of cherry picking. In the same event there are four male swimmers that swam faster than both Lochter (29.10 sec) [3] and Ye (28.93 sec) [4]: Hagino (28.52 sec), Phelps (28.44 sec), Horihata (27.87 sec) and Fraser-Holmes (28.35 sec). As it turns out if we are just talking about the last 50m in a 400m IM, Lochter would not have been the example to use if I were the author. What kind of scientific rigorousness that author is trying to demonstrate here? Is it logical that if Lochter is the champion, we should assume he leads in every split? That would be a terrible way to teach the public how science works.</p> <p>Fifth, which is the one I oppose the most. The author quotes Tucks and implies that a drug test can not rule out the possibility of doping. Is this kind of agnosticism what Nature really wants to educate its readers? By that standard I estimate that at least half of the peer-reviewed scientific papers in Nature should be retracted. 
How can one convince the editors and reviewers that their proposed theory works for every possible case? One cannot. One chooses to apply the theory to typical examples and demonstrate that in (hopefully) all scenarios considered the theory works to a degree, and that should warrant a publication, until a counterexample is found. I could imagine that the author has a skeptical mind which is critical to scientific thinking, but that would be put into better use if he can write a real peer-reviewed paper that discusses the odds of Ye doping on a highly advanced non-detectable drug that the Chinese has come up within the last 4 years (they obviously did not have it in Beijing, otherwise why not to use it and woo the audience at home?), based on data and rational derivation. This paper, however, can be interpreted as saying that all athletes are doping, and the authorities are just not good enough to catch them. That may be true, logically, but definitely will not make the case if there is ever a hearing by FINA to determine if Ye has doped. To ask the question that if it is possible to false negative in a drug test looks like a rigged question to me. Of course it is, other than the drug that the test is not designed to detect, anyone who has taken Quantum 101 will tell you that everything is probabilistic in nature, and there is a probability for the drug in an athlete's system to tunnel out right at the moment of the test. A slight change as it may be, should we disregard all test results because of it? Let's be practical and reasonable. And accept WADA is competent at its job. Her urine sample is stored for 8 years following the contest for future testing as technology advances. Innocent until proven guilty, shouldn't it be?</p> <p>Sixth, and the last point I would like to make, is that the out-of- competition drug test is already in effect, which the author failed to mention. 
Per WADA president's press release [5], drug testing for olympians began at least 6 months prior to the opening of the London Olympic. Furthermore there are 107 athletes who are banned from this Olympic for doping. That maybe the reason that everyone will pass at the Olympic games. Hardly anyone fails in competition testing? Because those who did dope are already sanctioned? The author is free to suggest that a player could have doped beforehand and fool the test at the game, but this possibility certainly is ruled out for Ye.</p> <p>Over all, even though the author did not falsify any data, he did ( intentionally or not) cherry pick data that is far too suggestive to be fair and unbiased, in my view. If you want to cover a story of a suspected doping from a scientific point of view, be impartial and provide all the facts for the reader to judge. You are entitled to your interpretation of the facts, and the expression thereof in your piece, explicitly or otherwise , but only showing evidences which favor your argument is hardly good science or journalism. 
Such an article in a journal like Nature is not an appropriate example of how scientific research or report should be done.</p> <p>1 http://www.fina.org/H2O/index.php?option=com_wrapper&amp;view=wrapper&amp;Itemid=1241<br /> 2 http://www.youtube.com/watch?v=8ETPUKlOwV4<br /> 3 http://www.london2012.com/swimming/event/men-400m-individual-medley/phase=swm054100/index.html<br /> 4 http://www.london2012.com/swimming/event/women-400m-individual-medley/phase=sww054100/index.html<br /> 5 http://playtrue.wada-ama.org/news/wada-presidents-addresses-london-2012-press-conference/?utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=wada-presidents-addresses-london-2012-press-conference</p> </blockquote> <p>How well said!</p> /blog/Nature_publishes_substandard_article___39__Oympic_feats_raise_suspicion__39___by_Ewen_Callaway/#comments Back up MediaWiki http://meng6net.localhost/blog/backup_mediawiki/ http://meng6net.localhost/blog/backup_mediawiki/ blog computing mediawiki sysadmin Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>I wrote a <a href= "http://meng6.net/pages/computing/sysadmin/backup_mediawiki/">note</a> about backing up MediaWiki, including its database (MySQL or SQLite), the content pages exported as an XML file, all images, and the entire directory where MediaWiki is installed, which includes <code>LocalSettings.php</code> and the extensions, where customizations usually live.</p> /blog/backup_mediawiki/#comments Drawing polyhedra with textured faces in Mathematica http://meng6net.localhost/blog/drawing_polyhedra_with_textured_faces_in_Mathematica/ http://meng6net.localhost/blog/drawing_polyhedra_with_textured_faces_in_Mathematica/ blog computing geometry graphics mathematics note polyhedron programming Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>In Mathematica and the Wolfram Language, with the feature called <a href= "https://reference.wolfram.com/language/ref/Texture.html">Texture</a>, it's easy to draw 3D solids such as polyhedra with <a href= 
"https://en.wikipedia.org/wiki/Image_texture">image textures</a> on the surface such as the faces of polyhedra. When playing with this, I drew a <a href= "https://en.wikipedia.org/wiki/Rhombic_hexecontahedron">rhombic hexecontahedron</a> (<a href= "https://www.wolframalpha.com/input/?t=crmtb01&amp;f=ob&amp;i=rhombic+hexecontahedron">properties</a>), the logo of <a href= "https://www.wolframalpha.com/">Wolfram|Alpha</a> with national flags on its faces:</p> <p><a href= "http://meng6net.localhost/blog/drawing_polyhedra_with_textured_faces_in_Mathematica/image/rhombic_hexecontahedron__60-largest-countries-flags.png"> <img src= "http://meng6net.localhost/blog/drawing_polyhedra_with_textured_faces_in_Mathematica/900x-rhombic_hexecontahedron__60-largest-countries-flags.png" width="900" height="902" class="img" /></a></p> <p>☝ Rhombic hexecontahedron wrapped with the 60 most populous countries' national flags, one different flag drawn on each of the 60 rhombic faces. (It's somewhat odd that the most populous countries happen to have a lot of red and green colors in their national flags.)</p> <p>The code is in <a href= "https://github.com/lumeng/repo-meng-lib/blob/master/Mathematica/Notebook/polyhedra_with_textured_faces.nb"> my Git repository</a>.</p> <p>Remarks:</p> <ul> <li>It was a bit tricky to understand what the option <a href= "https://reference.wolfram.com/language/ref/VertexTextureCoordinates.html"> VertexTextureCoordinates</a> is and does, and how to set it up for what I'm trying to draw. 
It turns out that for putting the national flags on the faces of a rhombic hexecontahedron, since both are (usually) quadrilaterals, i.e. isomorphic to a square, <code>VertexTextureCoordinates -&gt; {{0, 0}, {1, 0}, {1, 1}, {0, 1}}</code> works, as it aligns the four vertices of a rectangular flag to the four vertices of a rhombic face.</li> </ul> <p>Some things left to be improved:</p> <ul> <li> <p>Avoid flags that are not isomorphic to the face of the given polyhedron;</p> </li> <li> <p>Compute suitable values of <code>VertexTextureCoordinates</code> to reduce or even avoid distortion of the flags;</p> </li> <li> <p>Find a reasonable way to align national flags whose shapes are not isomorphic to the shape of a face of the given polyhedron, such that the main feature of the national flag wrapped on a polyhedron face is easily recognizable.</p> </li> </ul> /blog/drawing_polyhedra_with_textured_faces_in_Mathematica/#comments Installing and configuring Java on Mac http://meng6net.localhost/zh/blog/installing_and_configuring_java_on_mac/ http://meng6net.localhost/zh/blog/installing_and_configuring_java_on_mac/ blog computing Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>Configuring Java on a Mac is quite complicated, especially if you want several versions of Java to coexist, for example the newly released Oracle Java 8 alongside the Java 6 that ships with the Mac. After reading many references on online forums, oracle.com, and apple.com, and trying many different settings, I finally got Oracle JDK 7 and JDK 8 installed side by side on my MacBook Pro. The Java 6 that comes with the operating system, OS X 10.9.2, is kept and remains the system's default Java version. It took perhaps a full working day in total to figure everything out. I wrote a note for future reference: 〈<span class="createlink">Installing and configuring Java on Mac</span> 〉.</p> Installing and configuring Java on Mac http://meng6net.localhost/blog/installing_and_configuring_java_on_mac/ http://meng6net.localhost/blog/installing_and_configuring_java_on_mac/ blog computing Tue, 16 May 2017 23:59:39 +0000 2017-05-21T17:25:59Z <p>After reading many references and trying various settings, I finally got Oracle JDK 7 and JDK 8 installed on my MacBook Pro running OS X 10.9.2, while leaving the system's own Java, which is Java 6, unchanged. 
I've written a note 〈 <a href= "http://meng6net.localhost/computing/installing_and_configuring/installing_and_configuring_Java_on_macOS/"> Installing and configuring Java on Mac</a> 〉.</p> /blog/installing_and_configuring_java_on_mac/#comments Note on setting up Java projects using Gradle http://meng6net.localhost/blog/note_on_setting_up_java_projects_using_gradle/ http://meng6net.localhost/blog/note_on_setting_up_java_projects_using_gradle/ blog computing Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p><a href="https://en.wikipedia.org/wiki/Gradle">Gradle</a> (http://www.gradle.org/) is a modern build automation tool. I've recently picked it up. Here is a note on setting up a minimal but not entirely trivial Java project using Gradle and building it (including running tests): 〈 <a href= "http://meng6net.localhost/computing/installing_and_configuring/installing_and_configuring_gradle/"> Installing and configuring Gradle</a> 〉.</p> /blog/note_on_setting_up_java_projects_using_gradle/#comments Relearning p-value http://meng6net.localhost/blog/relearning_p-value/ http://meng6net.localhost/blog/relearning_p-value/ academics blog fallacy note p-value statistics Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>After reading <a href= "http://www.tandfonline.com/doi/pdf/10.1080/00031305.2016.1154108">"The ASA's statement on p-values: context, process, and purpose"</a> and some other related references, here are some excerpts and notes I took on p-values and null-hypothesis significance testing.</p> <ul> <li> <p>The American Statistical Association (ASA) has stated principles about p-values and null hypothesis significance testing, five of which are excerpted here:</p> <ol> <li>"P-values can indicate how incompatible the data are with a specified statistical model."</li> <li>"P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone."<br /> " … It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself."</li> <li>"Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold."<br /> "… Practices that reduce data analysis or scientific inference to mechanical “bright-line” rules (such as “p &lt; 0.05”) for justifying scientific claims or conclusions can lead to erroneous beliefs and poor decision-making. …"</li> <li>"Proper inference requires full reporting and transparency."</li> <li>"A p-value, or statistical significance, does not measure the size of an effect or the importance of a result."<br /> "… Smaller p-values do not necessarily imply the presence of larger or more important effects, and larger p-values do not imply a lack of importance or even lack of effect. Any effect, no matter how tiny, can produce a small p-value if the sample size or measurement precision is high enough, and large effects may produce unimpressive p-values if the sample size is small or measurements are imprecise. …"</li> </ol> </li> <li> <p>The null hypothesis is usually a hypothesis that the observed data and their distribution result from random chance rather than from effects caused by some intrinsic mechanism. It is usually what one sets out to disprove or reject in order to establish evidence for, or belief in, a real effect due to an underlying intrinsic mechanism. 
In turn, the details of the statistical model used in this evaluation can be used to make quantitative estimates of properties of the underlying mechanism.</p> </li> <li> <p>The p-value is the probability, computed under the assumption that the null hypothesis is true, of obtaining data at least as extreme as the data actually observed. As the ASA statement quoted above points out, it is not the probability that the null hypothesis is true, nor the probability that one has falsely rejected it.</p> <ul> <li>The smaller it is, the less compatible the observed data are with the null hypothesis.</li> <li>Being able or not being able to reject the null hypothesis may tell one whether the observed data suggest that there is an effect; however, it does not tell one how large the effect is or whether the effect is real. See <a href= "https://en.wikipedia.org/wiki/Effect_size">effect size</a>.</li> <li>"a p-value near 0.05 taken by itself offers only weak evidence against the null hypothesis".</li> <li>UK statistician and geneticist Sir Ronald Fisher introduced the p-value in the 1920s. "The p-value was never meant to be used the way it's used today."</li> </ul> </li> <li> <p>As ASA p-value principle No. 3 states, the decision to reject the null hypothesis should not be based solely on whether the p-value passes a "bright-line" threshold. Rather, in order to reject the null hypothesis, one must make a subjective judgment involving the degree of risk acceptable for being wrong. 
The degree of risk of being wrong may be specified in terms of confidence levels, which characterize the sampling variability.</p> </li> <li> <p>Other terms used for data cherry-picking include data dredging, significance chasing, significance questing, selective inference, <a href= "https://www.urbandictionary.com/define.php?term=p-hacking">p-hacking</a>, snooping, fishing, and double-dipping.</p> </li> <li> <p>"The difference between statistically significant and statistically insignificant is not, itself, statistically significant."</p> </li> <li> <p>"According to one widely used calculation [<sup id= "fnref:1"><a href="http://meng6net.localhost/tag/blog/#fn:1" rel="footnote">1</a></sup>], a p-value of 0.01 corresponds to a false-alarm probability of at least 11%, depending on the underlying probability that there is a true effect; a p-value of 0.05 raises that chance to at least 29%." See the following figure:</p> </li> </ul> <p><span class="createlink">p-value and probable cause.png</span></p> <h2>Some related concepts</h2> <ul> <li> <p>The <a href= "https://en.wikipedia.org/wiki/Standard_score">standard score</a>, or z-score, is the deviation from the mean in units of standard deviation. A small p-value corresponds to a z-score of large magnitude (a large positive z-score in a one-sided, upper-tail test).</p> </li> <li> <p><a href= "https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule">68-95-99.7 rule</a></p> </li> <li> <p><a href="https://en.wikipedia.org/wiki/MAGIC_criteria">MAGIC criteria</a>.</p> <ul> <li>Magnitude - How big is the effect? Large effects are more compelling than small ones.</li> <li>Articulation - How specific is it? Precise statements are more compelling than imprecise ones.</li> <li>Generality - How generally does it apply?</li> <li>Interestingness - Interesting effects are those that "have the potential, through empirical analysis, to change what people believe about an important issue".</li> <li>Credibility - Credible claims are more compelling than incredible ones. 
The researcher must show that the claims made are credible.</li> </ul> </li> </ul> <h2>References</h2> <ul> <li> <p>"The problem with p-values: how significant are they, really?", phys.org Science News Wire, 2013, <a href= "http://phys.org/wire-news/145707973/the-problem-with-p-values-how-significant-are-they-really.html"> http://phys.org/wire-news/145707973/the-problem-with-p-values-how-significant-are-they-really.html</a></p> </li> <li> <p>Regina Nuzzo, "Scientific method: statistical errors," 2014, <a href= "http://folk.ntnu.no/slyderse/Nuzzo%20and%20Editorial%20-%20p-values.pdf"> http://folk.ntnu.no/slyderse/Nuzzo%20and%20Editorial%20-%20p-values.pdf</a></p> </li> <li> <p>Tom Siegfried, "Odds Are, It's Wrong - Science fails to face the shortcomings of statistics," 2010, <a href= "https://www.sciencenews.org/article/odds-are-its-wrong">https://www.sciencenews.org/article/odds-are-its-wrong</a></p> </li> <li> <p>Gelman, A., and Loken, E., "The Statistical Crisis in Science," American Scientist, 102., 2014, <a href= "http://www.americanscientist.org/issues/feature/2014/6/thestatistical-crisis-in-science"> http://www.americanscientist.org/issues/feature/2014/6/thestatistical-crisis-in-science</a></p> </li> <li> <p>"The vast majority of statistical analysis is not performed by statisticians," simplystatistics.org, 2013, <a href= "http://simplystatistics.org/2013/06/14/the-vast-majority-of-statistical-analysis-is-not-performed-by-statisticians/"> http://simplystatistics.org/2013/06/14/the-vast-majority-of-statistical-analysis-is-not-performed-by-statisticians/</a></p> </li> <li> <p>"On the scalability of statistical procedures: why the p-value bashers just don't get it," simplystatistics.org, 2014, <a href= "http://simplystatistics.org/2014/02/14/on-the-scalability-of-statistical-procedures-why-the-p-value-bashers-just-dont-get-it/"> http://simplystatistics.org/2014/02/14/on-the-scalability-of-statistical-procedures-why-the-p-value-bashers-just-dont-get-it/</a></p> 
</li> <li> <p>Andrew Gelman and Hal Stern, "The Difference Between “Significant” and “Not Significant” is not Itself Statistically Significant," The American Statistician, Volume 60, Issue 4, 2006, <a href= "http://www.tandfonline.com/doi/abs/10.1198/000313006X152649">http://www.tandfonline.com/doi/abs/10.1198/000313006X152649</a></p> </li> </ul> <div class="footnotes"> <hr /> <ol> <li id="fn:1">Goodman, S. N., "Of P-Values and Bayes: A Modest Proposal," Epidemiology 12, 295–297 (2001), <a href= "http://journals.lww.com/epidem/fulltext/2001/05000/of_p_values_and_bayes__a_modest_proposal.6.aspx"> http://journals.lww.com/epidem/fulltext/2001/05000/of_p_values_and_bayes__a_modest_proposal.6.aspx</a><a href="http://meng6net.localhost/tag/blog/#fnref:1" rev="footnote">↩</a></li> </ol> </div> /blog/relearning_p-value/#comments Removing newline characters http://meng6net.localhost/blog/removing_newline_characters/ http://meng6net.localhost/blog/removing_newline_characters/ blog editing emacs tip 国学 文字学 Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>Meng Lu, 2013-7-6</p> <p>Suppose you want to remove the newlines between the Chinese characters in</p> <pre><code>南海少年遊俠客,  
詩成嘯傲凌滄州,  
曾因酒醉鞭名馬,
生怕情深累美人。
</code></pre> <p>-- note that the 1st and 2nd Chinese commas <code>,</code> actually have two or more white spaces following them -- and change it to a single line</p> <pre><code>南海少年遊俠客,詩成嘯傲凌滄州,曾因酒醉鞭名馬,生怕情深累美人。
</code></pre> <p>One way to do this is using Emacs.</p> <h2>Use <code>query-replace-regexp</code></h2> <p>Press <kbd>M</kbd>-<kbd>x</kbd> and type <code>query-replace-regexp</code>, or use the shortcut <kbd>C</kbd>-<kbd>M</kbd>-<code>%</code>;</p> <p>Type the regexp to match:</p> <pre><code>\([[:nonascii:]]\) *
 *\([[:nonascii:]]\)
</code></pre> <p>Note that the line break in the regexp needs to be typed into the <a href= "http://www.gnu.org/software/emacs/manual/html_node/emacs/Minibuffer.html"> Emacs minibuffer</a> with <kbd>C</kbd>-<kbd>q</kbd> <kbd>C</kbd>-<kbd>j</kbd>.</p> 
<p>Type the regexp to substitute:</p> <pre><code>\1\2
</code></pre> <p>This means the white space character(s) (if any) and the newline character between non-ASCII characters will be removed in the substituted version, so the result is the character on the first line followed by that on the second line.</p> <h2>Use <code>fill-paragraph</code></h2> <ul> <li> <p>Set the <code>fill-column</code> variable, which controls how wide a line of text can go before it is wrapped, to a very large value for the current buffer: <kbd>C</kbd>-<code>x</code> <code>f</code>, <code>10000000</code></p> </li> <li> <p>Highlight the paragraph you'd like to modify: move the cursor to the beginning, hold <kbd>Shift</kbd> down, and use the up and down arrow keys to extend or shrink the selection;</p> </li> <li> <p>Press <kbd>M</kbd>-<kbd>x</kbd> and type <code>fill-paragraph</code>.</p> </li> </ul> <p>This should remove all newline characters in the text. Interestingly, if there are multiple white space characters at the end of a line before the newline character, it will keep one of them:</p> <pre><code>南海少年遊俠客, 詩成嘯傲凌滄州, 曾因酒醉鞭名馬,生怕情深累美人。
</code></pre> <p>Note there is an additional white space after the 1st and the 2nd <code>,</code>.</p> <p>The single white space character is still redundant; it can be removed with</p> <pre><code>M-x query-replace-regexp
, *
,
</code></pre> /blog/removing_newline_characters/#comments Random notes about Unicode http://meng6net.localhost/blog/unicode_review/ http://meng6net.localhost/blog/unicode_review/ blog character encoding character set computing note unicode utf8 Tue, 16 May 2017 23:59:39 +0000 2017-05-16T23:59:39Z <p>Unicode 7 was released in June. 
I read the <a href= "http://www.unicode.org/versions/Unicode7.0.0/">release news</a> and was prompted to review various concepts about Unicode and character encoding in general, since this is one of those technical issues that one encounters frequently, usually without appreciating or understanding its full technicality (due to its terseness and complexity), and hence without taking sufficient care of it. But if you're unlucky, as everyone will be at some point, it bites back and you'll have to pay back the <a href="https://en.wikipedia.org/wiki/Technical_debt">technical debt</a>.</p> <p>Among the first articles I read on the topic some years ago, besides the obvious Wikipedia articles, was Joel Spolsky's oft-referenced article <a href= "http://www.joelonsoftware.com/articles/Unicode.html">〈The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)〉</a>, which humorously introduces the history, motivation, and basic ideas behind the various concepts, along with practical information about character sets, Unicode, and UTF-8. In my native tongue, Chinese, RUAN Yifeng (阮一峰) has <a href= "http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html"> a reading note article about the topic</a> which explains the basics clearly and succinctly. Xah Lee wrote <a href= "http://xahlee.info/comp/unicode_index.html">a series of concise summaries about Unicode characters</a>. (They are also particularly neatly formatted using HTML &amp; CSS.) 
Some articles summarize interesting subsets of characters, such as the <a href= "http://xahlee.info/comp/unicode_arrows.html">arrow characters</a>, which I find quite handy as a reference.</p> <p>My short mnemonic on the topic, for now, is just the following few sentences. Unicode is a character set intended to be a unified set containing the characters of all languages (the more precise term might be writing systems), to solve the various technical difficulties of having different character sets for different languages and using them in multilingual contexts. In practice, the most recently released version, Unicode 7.0, defines 113,021 characters, each assigned a unique code point. A Unicode code point such as <a href= "http://www.wolframalpha.com/input/?t=crmtb01&amp;f=ob&amp;i=U%2B1234"> <code>U+1234</code></a> identifies a character uniquely by the hexadecimal number following <code>U+</code>. Unicode itself does not specify how characters are represented on computer storage media as sequences of bits, viz. 0s and 1s. UTF-8 is a character encoding scheme, i.e. a protocol for representing Unicode code points (the characters) as sequences of bits, such as representing the code point <code>U+00FF</code>, i.e. the character <code>ÿ</code>, as <code>1100001110111111</code>. Conversely, a piece of text data on a computer storage medium, which ultimately is a sequence of bits, cannot be interpreted or decoded unless the accompanying encoding, such as UTF-8, is known. UTF-8 is an efficient encoding scheme. 
Some of its advantages include 1) backward compatibility with ASCII: the characters in ASCII, including English letters, Arabic numerals, and common English punctuation marks, are represented by exactly the same sequences of bits in ASCII and UTF-8, so old or English-language text data encoded in ASCII can be decoded exactly with UTF-8 as well, which minimizes compatibility glitches; 2) variable-length bit sequences for representing individual characters, which reduce the space wasted on padding and disambiguation. <a href= "https://en.wikipedia.org/wiki/UTF-8#Description">The encoding scheme sketched in the Wikipedia article "UTF-8"</a> is a useful quick reminder of the related concepts. I'll see if I can add more to this in the future.</p> <p>Mathematica, the software that I use all the time in and outside of my work, only supports plane-0 Unicode characters (at least in the front end, i.e. the notebook interface), that is, <code>U+0000</code> to <code>U+FFFF</code>, which unfortunately leaves out many Chinese radicals found in classical Chinese texts. I once developed some prototypes for natural language processing of classical Chinese texts in Mathematica, but because of this limitation they could not be done quite nicely. Databases are another context where <a href= "http://dev.mysql.com/doc/refman/5.0/en/charset.html">character sets</a>, character encodings, and collations need careful treatment and can become involved. 
My most frequently used database is MySQL; since version 5.5 it has offered <a href= "http://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html"><code> utf8mb4</code></a>, which can store characters whose UTF-8 encoding takes up to 4 bytes and therefore covers characters beyond plane 0 as well.</p> <p>The ranges of Unicode code points representing Chinese, Japanese and Korean (CJK) characters (as I identified) are</p> <ul> <li>[<code>U+4E00</code>, <code>U+9FFF</code>]</li> <li>[<code>U+3400</code>, <code>U+4DFF</code>]</li> <li>[<code>U+F900</code>, <code>U+FAFF</code>]</li> <li>[<code>U+20000</code>, <code>U+2A6DF</code>]</li> <li>[<code>U+2F800</code>, <code>U+2FA1F</code>]</li> </ul> <p>Some useful references:</p> <ol> <li> <p>http://www.fileformat.info/info/unicode/, e.g. <a href= "http://www.fileformat.info/info/unicode/char/00fc/index.htm">ü</a></p> </li> <li> <p>http://www.wolframalpha.com, e.g. <a href= "http://www.wolframalpha.com/input/?t=crmtb01&amp;f=ob&amp;i=%C3%BC"> ü</a></p> </li> <li> <p>A reference about sorting: http://collation-charts.org</p> </li> <li> <p>Unicode.org has some computer-readable data files: http://www.unicode.org/Public/UNIDATA/</p> </li> </ol> <p>And, lastly, the technically precise way to write the two words is <code>Unicode</code> and <code>UTF-8</code>, not <code>unicode</code>, <code>utf8</code> or <code>UTF8</code>.</p> /blog/unicode_review/#comments Leveled logging in Bash http://meng6net.localhost/blog/leveled-logging_in_bash/ http://meng6net.localhost/blog/leveled-logging_in_bash/ Bash blog programming Thu, 19 Mar 2015 07:49:38 +0000 2015-06-12T01:48:05Z <p>I usually use Bash for fairly trivial and/or ad-hoc tasks. But sometimes, when a Bash program becomes complex, leveled logging can be useful for debugging and monitoring program states. By leveled logging, I mean printing messages grouped by different tags such as <code>DEBUG</code> and <code>INFO</code>, and controlling which of the groups are printed with a "level" parameter. 
This is roughly what's supported by <a href= "https://docs.oracle.com/javase/8/docs/api/java/util/logging/Level.html"> <code>java.util.logging.Level</code></a> and <a href= "https://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/Level.html"> <code>org.apache.log4j.Level</code></a> in Java.</p> <p>To do this in Bash, I simply defined a set of level constants that is the union of the two schemes given above:</p> <table class="datatable"> <thead> <tr> <th><code>LEVEL</code></th> <th><code>java.util.logging.Level</code></th> <th><code>org.apache.log4j.Level</code></th> </tr> </thead> <tbody> <tr> <td>9</td> <td>(off)</td> <td>(off)</td> </tr> <tr> <td>8</td> <td>n/a</td> <td><code>FATAL</code></td> </tr> <tr> <td>7</td> <td><code>SEVERE</code></td> <td><code>ERROR</code></td> </tr> <tr> <td>6</td> <td><code>WARNING</code></td> <td><code>WARN</code></td> </tr> <tr> <td>5</td> <td><code>INFO</code></td> <td><code>INFO</code></td> </tr> <tr> <td>4</td> <td><code>CONFIG</code></td> <td>n/a</td> </tr> <tr> <td>3</td> <td><code>FINE</code></td> <td><code>DEBUG</code></td> </tr> <tr> <td>2</td> <td><code>FINER</code></td> <td><code>TRACE</code></td> </tr> <tr> <td>1</td> <td><code>FINEST</code></td> <td><code>ALL</code></td> </tr> </tbody> </table> <p><a href="http://meng6net.localhost/data/logging_levels.dsv">Direct data download</a></p> <p>I then defined a <code>mengLog</code> function that prints a message if the message's logging level is the same as or higher than a global logging level <code>$mengLOGGING_LEVEL</code>. The logging level can thus be understood as an indication of priority: the higher the level, the higher the priority. 
For instance, when <code>$mengLOGGING_LEVEL=6</code>, messages with a level equal to or higher than <code>6</code> have enough priority to be printed; therefore WARNING/WARN, SEVERE/ERROR, and FATAL messages are printed in the example <code>leveled-logging_example.sh</code>.</p> <p>Code:</p> <ul> <li><a href= "https://github.com/lumeng/repo-meng-lib/blob/master/Bash/mengLog.sh"> <code>mengLog.sh</code></a></li> <li><a href= "https://github.com/lumeng/repo-meng-lib/blob/master/Bash/leveled-logging_example.sh"> <code>leveled-logging_example.sh</code></a></li> </ul> /blog/leveled-logging_in_bash/#comments
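<p>The threshold rule from the leveled logging post does not depend on Bash. As a rough illustration only, here is a minimal Python sketch of it. This is not the code in <code>mengLog.sh</code>: the names <code>LEVELS</code>, <code>LOGGING_LEVEL</code>, and <code>log</code> are hypothetical, and the level names are taken from the log4j column of the table, plus <code>CONFIG</code>.</p>

```python
# Level numbers follow the table in the post; names use the log4j column
# where one exists. This is an illustration in Python, not mengLog.sh.
LEVELS = {
    "FATAL": 8, "ERROR": 7, "WARN": 6, "INFO": 5,
    "CONFIG": 4, "DEBUG": 3, "TRACE": 2, "ALL": 1,
}

# Hypothetical global threshold, playing the role of $mengLOGGING_LEVEL=6.
LOGGING_LEVEL = 6


def log(level_name: str, message: str, threshold: int = LOGGING_LEVEL) -> bool:
    """Print the message if its level is at or above the threshold.

    Returns True if the message was printed, False if it was suppressed.
    """
    level = LEVELS[level_name]
    if level >= threshold:
        print(f"[{level_name}] {message}")
        return True
    return False


log("WARN", "disk space is low")      # printed: 6 >= 6
log("INFO", "starting backup")        # suppressed: 5 < 6
log("FATAL", "cannot open database")  # printed: 8 >= 6
```

<p>With the threshold set to 9, the "(off)" row of the table, no message is printed, because the highest message level is 8.</p>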