Search the full text in PDF files
Latest revision as of 14:13, 19 January 2019
Finding data across multiple PDF files (cross-document full-text search of PDFs)
Suggestion
- Full-text search: Adobe Reader is a good choice because it highlights and locates the keywords you type.
- Metadata search: metadata is data about data. You can fill in information such as the author and keywords when you generate the PDF file. PDF Explorer and xPDFSearch (a Total Commander extension) are both good choices for metadata search.
Comparison of Solutions
| Good choice | PDF type | Software / service | Full-text search | Metadata search | Comments |
|---|---|---|---|---|---|
| ✓ | Text-PDF | Adobe Reader 7.0.7 or Adobe Acrobat | OK | OK (but slow) | (1) Combines full-text and metadata search in one search function; (2) locates the keywords you type |
| | Text-PDF | Adobe SHARE beta | OK (English only) | No | Accessed: 2007-11-28 |
| ✓ | Text-PDF | Foxit Reader v. 5.1.0 (Foxit Reader Portable) | OK | No | Locates the keywords you type |
| | Text-PDF | Gmail | No | No | However, Gmail search can match attachment filenames, including Mandarin Chinese ones. Accessed: 2007-05-17 |
| | Text-PDF | Google Desktop Search v4 | OK, but indexes only the first 10,000 words | Title only | |
| | Text-PDF | Locate32 3.0.8.1200 | No (finds only some words) | OK (English only) | Accessed: 2008-02-07 |
| | Text-PDF | PDF Explorer 1.5 | OK | OK | (1) Does not highlight or locate the keywords you type; (2) extracts and indexes embedded images |
| | Text-PDF | PDF-XChange Viewer 1.0 (Build 0017) | OK | No | (1) Searching "elearning" also finds "creative learning", "e-Learning", and "elearning."; (2) introduction by 異塵行者 (in Chinese) |
| | Text-PDF | Windows Desktop Search 02.06.5000.5378 | OK (with the Adobe PDF IFilter installed) | OK, e.g. author:someone | |
| | Text-PDF | Windows Search 4.0 | OK | OK (Chinese supported) | (1) Does not highlight or locate the keywords you type; (2) indexes too many file types and is hard to customize |
| | Text-PDF | xPDFSearch 1.02 (Total Commander extension) | OK | OK | Does not highlight or locate the keywords you type |
| | Text-PDF | Yahoo! Desktop Search 1.2 | OK | No | (1) Does not highlight or locate the keywords you type; (2) does not support Chinese folder names |
| ✓ | Text-PDF | Yahoo! Mail | OK | No | Supports English only. Accessed: 2007-05-17 |
| | Image-PDF | Google Desktop Search + OmniPage Search Indexer | OK, but indexes only the first 10,000 words | Title only | Fast; English only |
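The PDF-XChange Viewer row notes that searching "elearning" also finds "creative learning" and "e-Learning". One plausible explanation, which is an assumption and not documented behavior of the viewer, is that it ignores case, spaces and punctuation before matching: "creative learning" then becomes "creativelearning", which contains "elearning". The minimal sketch below reproduces the observed results under that assumption; `normalize` and `loose_match` are hypothetical helper names.

```python
import re

def normalize(text: str) -> str:
    """Lower-case and drop everything except letters and digits (assumed behavior)."""
    # \W matches non-word characters; "_" counts as a word character, so strip it too
    return re.sub(r"[\W_]+", "", text.lower())

def loose_match(query: str, text: str) -> bool:
    """True if the normalized query occurs anywhere in the normalized text."""
    return normalize(query) in normalize(text)

for phrase in ["creative learning", "e-Learning", "elearning.", "learning"]:
    print(f"{phrase!r}: {loose_match('elearning', phrase)}")
```

Under this assumption the first three phrases match and plain "learning" does not, which is consistent with the table entry; it also shows why such a search can return surprising hits across word boundaries.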
pdfgrep for Linux & macOS[1]
- PDF type: Text-PDF
- Full text search: Available
- Metadata search: Not available
- Annotation search:
- Chinese issue: ok
- Indexing for better performance:
- Locate the keywords you type: ok
- Support boolean search: (1) OR: to match content that contains TERM_A or TERM_B, use alternation and quote the pattern so the shell does not treat | as a pipe, e.g. pdfgrep -n --max-count 10 'TERM_A|TERM_B' foo.pdf (2) AND: add the option -P, --perl-regexp[2]. To match content that contains both TERM_A and TERM_B (on the same line), e.g. pdfgrep -n --max-count 10 -i -P '(?=.*TERM_A)(?=.*TERM_B).*' foo.pdf
- Comments:
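To see what the two pdfgrep patterns above match, the sketch below applies the same regular expressions line by line to plain text with Python's `re` module, which, like PCRE, supports lookaheads. It does not read PDF files; the sample lines and the `matching_lines` helper are hypothetical, and it assumes pdfgrep reports matches per text line.

```python
import re

OR_PATTERN = r"TERM_A|TERM_B"                # either term
AND_PATTERN = r"(?=.*TERM_A)(?=.*TERM_B).*"  # both terms, in any order

def matching_lines(pattern: str, lines: list[str], ignore_case: bool = False) -> list[int]:
    """Return the 1-based numbers of the lines in which the pattern matches."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    return [n for n, line in enumerate(lines, start=1) if rx.search(line)]

sample = [
    "TERM_A appears here",
    "only TERM_B here",
    "TERM_B first, then TERM_A",
    "neither term",
]

print(matching_lines(OR_PATTERN, sample))   # [1, 2, 3]
print(matching_lines(AND_PATTERN, sample))  # [3]

# "." does not match a newline, so terms on different lines are not combined:
print(re.search(AND_PATTERN, "TERM_A\nTERM_B"))  # None
```

Each lookahead `(?=.*TERM)` checks, without consuming text, that the term occurs somewhere later on the line, so stacking two of them expresses AND regardless of word order. The `-i` flag in the pdfgrep example corresponds to `ignore_case=True` here.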
$ Evernote [Last visited: 2018-11-13]
- PDF type: Text-PDF / Image-PDF[3]
- Full text search: Available. Search syntax: <KEYWORD> +resource:application/pdf[4]
- Metadata search: Available
- Annotation search:
- Chinese issue: ok
- Indexing for better performance:
- Locate the keywords you type: Highlights the keyword in search-result snippets, but I could not jump to the next/previous occurrence of the keyword.
- Support boolean search:
- Comments:
$ PDF Search v. 1.7 for macOS
- PDF type: Text-PDF. Not for Image-PDF.
- Full text search: Available
- Metadata search: Not available
- Annotation search:
- Chinese issue: Not OK!
- Indexing for better performance: Available
- Locate the keywords you type: Available
- Support boolean search: Available. See details on Narrate Results.
- Comments: Good for searching English PDF documents; it still has some problems with Mandarin Chinese text.
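Several entries list "Indexing for better performance". As a rough illustration of why an index speeds up search, and not a description of how any tool above works internally, the sketch below builds a word-level inverted index once and then answers queries by dictionary lookup instead of rescanning every file. Whitespace tokenization is an assumption, and the optional `max_words` cap is a hypothetical parameter that mimics the "indexes only the first 10,000 words" limit noted for Google Desktop Search.

```python
from collections import defaultdict
from typing import Optional

def build_index(docs: dict[str, str], max_words: Optional[int] = None) -> dict[str, set[str]]:
    """Map each lower-cased word to the names of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        words = text.lower().split()       # naive whitespace tokenization
        if max_words is not None:
            words = words[:max_words]      # words past the cap are never indexed
        for word in words:
            index[word].add(name)
    return index

def search(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Documents containing every term (boolean AND); empty set if none."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*postings) if postings else set()

docs = {
    "long.pdf": " ".join(["filler"] * 10_000 + ["needle"]),
    "short.pdf": "a needle in a short file",
    "zh.pdf": "全文搜索工具",
}

full = build_index(docs)
capped = build_index(docs, max_words=10_000)
print(sorted(search(full, "needle")))    # ['long.pdf', 'short.pdf']
print(sorted(search(capped, "needle")))  # ['short.pdf']
print(search(full, "搜索"))              # set()
```

The capped index misses a keyword that appears only after word 10,000. The last line shows a general limitation of whitespace tokenization: Chinese text has no spaces between words, so 搜索 inside 全文搜索工具 is never indexed as a word of its own; indexers that handle Chinese need a word segmenter or character n-grams instead.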
Microsoft OneNote [Last visited: 2018-06-19]
- PDF type: Text-PDF / Image-PDF
- Full text search: Available.
- Metadata search:
- Annotation search:
- Chinese issue:
- Indexing for better performance:
- Locate the keywords you type: Does not highlight the location of the matched keyword.
- Support boolean search:
- Comments:
PDF type
- Text-PDF: a PDF file generated from document files, so it contains a text layer.
- Image-PDF: a PDF file generated from image files (e.g. scans), so it contains only pictures of the text.
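Full-text search in the tools above generally needs a Text-PDF (the Image-PDF row relies on an OCR indexer). A rough way to guess which kind a file is, assuming that a PDF with a text layer references fonts while a pure scan contains only images, is to look for `/Font` entries, including inside Flate-compressed streams. This is a heuristic sketch, not a PDF parser: it ignores encryption and non-Flate filters, and an OCRed scan with an invisible text layer counts as text, which is what a search tool cares about. `looks_like_text_pdf` is a hypothetical name.

```python
import re
import zlib

STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

def looks_like_text_pdf(data: bytes) -> bool:
    """Heuristic: True if the PDF bytes appear to reference a font."""
    if b"/Font" in data:                 # font dictionaries stored uncompressed
        return True
    for m in STREAM_RE.finditer(data):
        try:
            # decompressobj leaves trailing bytes (the EOL before "endstream")
            # in unused_data instead of raising an error
            inflated = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue                     # not Flate data, e.g. an embedded JPEG
        if b"/Font" in inflated:         # e.g. inside a PDF 1.5+ object stream
            return True
    return False
```

Usage: `looks_like_text_pdf(pathlib.Path("foo.pdf").read_bytes())`. For real files, a tool that actually parses PDFs is more reliable than scanning raw bytes.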
(left blank intentionally)
* PDF type: Text-PDF / Image-PDF
* Full text search:
* Metadata search:
* Annotation search:
* Chinese issue:
* Indexing for better performance:
* Locate the keywords you type:
* Support boolean search:
* Comments:
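Other software for organizing and managing PDF documents
- Adobe Digital Editions v1.0.467
  - organization: bookshelves (work like folders; a PDF file can be placed on only one bookshelf)
  - search: can search text only within a single PDF file; cross-file search is not supported.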
Further reading
- Chin-Hsi Lin (2009). 研究生2.0: 用PDF XChange Viewer查詢英文關鍵詞 (Graduate Student 2.0: Searching English keywords with PDF-XChange Viewer)
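References
1. macos - pdfgrep: how to install pdfgrep on Mac - Stack Overflow. https://stackoverflow.com/questions/45130195/pdfgrep-how-to-install-pdfgrep-on-mac
2. 解決 pdfgrep 正則表示式搜尋時遇到 PCRE support disabled at compile time! (Fixing "PCRE support disabled at compile time!" when searching with regular expressions in pdfgrep). https://errerrors.blogspot.com/2018/04/resolve-pdfgrep-pcre-support-disabled.html
3. Tips for searching scanned PDFs – Evernote Help & Learning. https://help.evernote.com/hc/en-us/articles/208313388 (Traditional Chinese version: 搜尋掃描的 PDF 檔小撇步 – 支援 & 學習中心, https://help.evernote.com/hc/zh-tw/articles/208313388)
4. How to use Evernote's advanced search syntax – Evernote Help & Learning. https://help.evernote.com/hc/en-us/articles/208313828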