<h1>Removing newline characters</h1>
<p><small>Copyright © 2005-2020 by <code>Meng Lu &lt;lumeng3@gmail.com&gt;</code></small></p>
<p>Meng Lu, 2013-7-6</p>
<p>Suppose you want to remove the newlines between the lines of Chinese text below:</p>
<pre><code>南海少年遊俠客，  
詩成嘯傲凌滄州，  
曾因酒醉鞭名馬，
生怕情深累美人。
</code></pre>
<p>Note that the 1st and 2nd Chinese commas <code>，</code> are each followed by two or more white spaces. The goal is to turn the text into a single line:</p>
<pre><code>南海少年遊俠客，詩成嘯傲凌滄州，曾因酒醉鞭名馬，生怕情深累美人。
</code></pre>
<p>One way to do this is with Emacs.</p>
<h2>Use <code>query-replace-regexp</code></h2>
<p>Press <kbd>M</kbd>-<kbd>x</kbd> and type <code>query-replace-regexp</code>, or use the shortcut <kbd>C</kbd>-<kbd>M</kbd>-<kbd>%</kbd>.</p>
<p>Type the regexp to match:</p>
<pre><code>\([[:nonascii:]]\) *
 *\([[:nonascii:]]\)
</code></pre>
<p>Note that the line break in the regexp needs to be typed into the <a href= "http://www.gnu.org/software/emacs/manual/html_node/emacs/Minibuffer.html"> Emacs minibuffer</a> with <kbd>C</kbd>-<kbd>q</kbd> <kbd>C</kbd>-<kbd>j</kbd>.</p>
<p>Type the replacement:</p>
<pre><code>\1\2
</code></pre>
<p>This means the white space character(s) (if any) and the newline character between two non-ASCII characters are removed, so the result is the last character of the first line followed directly by the first character of the second line.</p>
<h2>Use <code>fill-paragraph</code></h2>
<ul>
<li> <p>Set the <code>fill-column</code> variable, which controls how wide a line of text can get before it is wrapped, to a very large value for the current buffer: <kbd>C</kbd>-<kbd>x</kbd> <kbd>f</kbd>, <code>10000000</code></p> </li>
<li> <p>Highlight the paragraph 
you'd like to modify: move the cursor to its beginning, hold <kbd>Shift</kbd> down, and press the up and down arrow keys to extend or shrink the selection;</p> </li>
<li> <p>Press <kbd>M</kbd>-<kbd>x</kbd> and type <code>fill-paragraph</code>.</p> </li>
</ul>
<p>This should remove all newline characters in the selected text. Interestingly, if a line ends with multiple white space characters before the newline character, one of them is kept:</p>
<pre><code>南海少年遊俠客， 詩成嘯傲凌滄州， 曾因酒醉鞭名馬，生怕情深累美人。
</code></pre>
<p>Note the additional white space after the 1st and the 2nd <code>，</code>.</p>
<p>That single white space character is still redundant; it can be removed with</p>
<pre><code>M-x query-replace-regexp
， *
，
</code></pre>
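<p>For readers who want to do the same clean-up outside Emacs, the two replacements above can be sketched in Python. This is only an illustration, not part of the Emacs workflow: the function names <code>join_nonascii_lines</code> and <code>strip_spaces_after_comma</code> are hypothetical, "non-ASCII" is assumed to mean any code point above U+007F (a rough stand-in for <code>[[:nonascii:]]</code>), and the comma is assumed to be the full-width comma U+FF0C. The sketch uses lookarounds instead of the <code>\1\2</code> capture groups so that lines consisting of a single character are also joined.</p>

```python
import re

# Assumption: "non-ASCII" means any code point above U+007F, which
# approximates the Emacs [[:nonascii:]] character class.
NONASCII = r"[^\x00-\x7f]"

# Optional spaces, a newline, optional spaces, sitting between two
# non-ASCII characters. Lookarounds leave the neighbours unconsumed.
LINE_BREAK = re.compile(r"(?<=" + NONASCII + r") *\n *(?=" + NONASCII + r")")

# Assumption: the "Chinese comma" is the full-width comma U+FF0C.
COMMA_SPACES = re.compile("\uff0c +")


def join_nonascii_lines(text: str) -> str:
    """Delete line breaks (and adjacent spaces) between non-ASCII characters."""
    return LINE_BREAK.sub("", text)


def strip_spaces_after_comma(text: str) -> str:
    """Delete spaces that follow a full-width comma."""
    return COMMA_SPACES.sub("\uff0c", text)


if __name__ == "__main__":
    poem = (
        "南海少年遊俠客\uff0c  \n"
        "詩成嘯傲凌滄州\uff0c  \n"
        "曾因酒醉鞭名馬\uff0c\n"
        "生怕情深累美人。\n"
    )
    print(join_nonascii_lines(poem), end="")
```

<p>Line breaks between ASCII text are left alone, as with the Emacs regexp, and the newline at the very end of the text is kept because no non-ASCII character follows it.</p>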