Merge multiple text files into one file: Difference between revisions

From LemonWiki共筆
Line 1: Line 1:
== Steps ==
Step 1: Check that each text file ends with a [[Return symbol | newline]]<ref>[https://stackoverflow.com/questions/34943632/linux-check-if-there-is-a-newline-at-the-end-of-a-file eof - Linux - check if there is a newline at the end of a file - Stack Overflow]</ref>
* {{kbd | key=<nowiki>tail -c 1 file.txt</nowiki>}} on {{Linux}}. Parameter "-c <span style="text-decoration: underline;">number</span>: The location is <span style="text-decoration: underline;">number</span> bytes." quoted from the [http://man7.org/linux/man-pages/man1/tail.1.html command's manual]. If the file ends with a newline, the command prints only that newline, so the output looks empty. {{exclaim}} Open question: how to check multiple files at once?
Line 10: Line 12:
* {{kbd | key=sort -us -o bundle_unique.txt bundle.txt}}<ref>[http://unix.stackexchange.com/questions/19641/how-to-remove-duplicate-lines-in-a-large-multi-gb-textfile linux - How to remove duplicate lines in a large multi-GB textfile? - Unix & Linux Stack Exchange]</ref> OS: {{Linux}}, or Cygwin on {{Win}}. "-u means Unique keys; -s means stable sort; -o means output", quoted from the [https://www.computerhope.com/unix/usort.htm sort] manual. Note that the output is sorted, so the original line order is not preserved.


Step 4: (optional) Remove the header row of the CSV file
Step 5: Verify the merge
* Count the number of lines: {{kbd | key=<nowiki>wc -l filename</nowiki>}}<ref>[https://www.tecmint.com/wc-command-examples/ 6 WC Command Examples to Count Number of Lines, Words, Characters in Linux]</ref>
== References ==
<references />


[[Category:Data Science]]
[[Category:Text file processing]]

Revision as of 16:29, 30 January 2020

Steps

Step 1: Check that each text file ends with a newline[1]
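
To check several files at once, one possible approach is to loop over them and test the last byte of each. The sketch below is only an illustration: the function name `missing_final_newline` is hypothetical, and it assumes a POSIX-style shell with `tail` available.

```shell
# Hypothetical helper: print the name of every file that does NOT end
# with a newline. Command substitution strips a trailing newline, so
# "$(tail -c 1 FILE)" is empty when the last byte is a newline.
# Empty files are skipped because they have no last byte to test.
missing_final_newline() {
  for f in "$@"; do
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

Calling `missing_final_newline *.txt` would list only the files that need a newline appended before merging; no output means every file is ready.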

Step 2: Merge the content

  • copy /b *.txt bundle.txt or copy /b file1.txt + file2.txt bundle.txt on Windows
  • cat *.txt > bundle.txt or cat file1.txt file2.txt > bundle.txt on Linux [2][3]
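
If some files might lack a final newline, plain concatenation joins the last line of one file to the first line of the next. A minimal sketch that combines Steps 1 and 2 follows; the function name `merge_files` is hypothetical, and it assumes the output file is not one of the inputs.

```shell
# Hypothetical helper: merge_files OUTPUT INPUT...
# Concatenates the inputs into OUTPUT and appends a newline after any
# non-empty input whose last byte is not already a newline.
merge_files() {
  out=$1
  shift
  : > "$out"                      # create or truncate the output file
  for f in "$@"; do
    cat "$f" >> "$out"
    if [ -s "$f" ] && [ -n "$(tail -c 1 "$f")" ]; then
      printf '\n' >> "$out"       # supply the missing final newline
    fi
  done
}
```

For example, `merge_files /tmp/bundle.txt *.txt` writes the bundle outside the current directory, so the `*.txt` glob cannot pick up the output file itself.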

Step 3: (optional) Remove the duplicated lines

  • sort -us -o bundle_unique.txt bundle.txt[4] OS: Linux, or Cygwin on Windows. "-u means Unique keys; -s means stable sort; -o means output", quoted from the sort manual.
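
Because `sort` reorders the lines, it may not suit files where the original order matters. As an alternative that is not part of the steps above, the commonly used `awk` idiom below keeps the first occurrence of each line in its original position. The function name `dedupe_keep_order` is hypothetical, and the sketch assumes the set of distinct lines fits in memory.

```shell
# Hypothetical helper: dedupe_keep_order INPUT OUTPUT
# awk prints a line only the first time it is seen: seen[$0]++ is 0
# (false) on first sight, so !seen[$0]++ is true exactly once per line.
dedupe_keep_order() {
  awk '!seen[$0]++' "$1" > "$2"
}
```

Usage would look like `dedupe_keep_order bundle.txt bundle_unique.txt`.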

Step 4: (optional) Remove the header row of the CSV file
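
The page gives no command for this step, so the following is only one possible sketch. It assumes every CSV file starts with the same single-line header and ends with a newline; the function name `merge_csv` is hypothetical.

```shell
# Hypothetical helper: merge_csv OUTPUT INPUT...
# Keeps the header row from the first input only, then appends the
# data rows (line 2 onward) of every input.
merge_csv() {
  out=$1
  shift
  head -n 1 "$1" > "$out"         # header from the first input
  for f in "$@"; do
    tail -n +2 "$f" >> "$out"     # skip each file's own header
  done
}
```

For example, `merge_csv /tmp/bundle.csv part1.csv part2.csv` produces one file with a single header row.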

Step 5: Verify the merge

  • Count the number of lines: wc -l filename[5]
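
One way to use the line count for verification is to compare the merged file against the total of its inputs. The sketch below applies only to a plain concatenation, that is, before duplicates or header rows have been removed; the function name `verify_merge` is hypothetical.

```shell
# Hypothetical helper: verify_merge MERGED INPUT...
# Succeeds (exit status 0) when MERGED has as many lines as all the
# inputs together. wc -l counts newline characters.
verify_merge() {
  merged=$1
  shift
  expected=$(cat "$@" | wc -l)
  actual=$(wc -l < "$merged")
  [ "$expected" -eq "$actual" ]
}
```

For example, `verify_merge bundle.txt file1.txt file2.txt && echo OK` prints OK only when the counts match.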

References

<references />