MySQL full text search equivalents to Google search
Latest revision as of 14:28, 20 December 2024
AND
Google search: keyword1 keyword2 is the same as keyword1 AND keyword2 or keyword1 +keyword2.
Notes: (1) the queries below are exact-word searches, meaning each keyword is matched literally; (2) replace column_name with your own column name.
- Google: 易筋經 AND 吸星大法
- MySQL: column_name REGEXP '易筋經' AND column_name REGEXP '吸星大法'
- MySQL: column_name LIKE '%易筋經%' AND column_name LIKE '%吸星大法%' (online demo)
- MySQL: LOCATE('易筋經', column_name) > 0 AND LOCATE('吸星大法', column_name) > 0
- MySQL: (column_name LIKE '%易筋經%吸星大法%' OR column_name LIKE '%吸星大法%易筋經%')
The last form has to list every ordering of the keywords, so it becomes tedious with more keywords: n keywords need n! patterns.
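The ordering-based LIKE form can be checked with a short Python sketch (Python only illustrates the logic here; it is not MySQL). It builds one pattern per ordering with itertools.permutations; the helper names are hypothetical. The equivalence with the plain "every keyword occurs" test assumes the keywords contain no LIKE wildcards and do not overlap each other in the text.

```python
from itertools import permutations


def and_match(text: str, keywords: list[str]) -> bool:
    """Every keyword occurs somewhere in text (the LOCATE(...) > 0 AND ... form)."""
    return all(k in text for k in keywords)


def ordered_like_patterns(keywords: list[str]) -> list[str]:
    """One LIKE pattern per ordering of the keywords, e.g. '%a%b%'.
    OR-ing them together is equivalent to and_match, assuming the keywords
    contain no % or _ and do not overlap in the text."""
    return ["%" + "%".join(p) + "%" for p in permutations(keywords)]


print(ordered_like_patterns(["易筋經", "吸星大法"]))
# ['%易筋經%吸星大法%', '%吸星大法%易筋經%']
print(len(ordered_like_patterns(["a", "b", "c", "d"])))  # 24 = 4!
print(and_match("吸星大法與易筋經", ["易筋經", "吸星大法"]))  # True
```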
OR
Google search: keyword1 OR keyword2
- Google: 易筋經 OR 吸星大法
- MySQL: column_name REGEXP '易筋經' OR column_name REGEXP '吸星大法'
- MySQL: LOCATE('易筋經', column_name) > 0 OR LOCATE('吸星大法', column_name) > 0
- MySQL: column_name LIKE '%易筋經%' OR column_name LIKE '%吸星大法%' (online demo)
NOT
Google search: keyword1 NOT keyword2 is the same as keyword1 -keyword2
- Google: 易筋經 NOT 吸星大法
- MySQL: column_name REGEXP '易筋經' AND column_name NOT REGEXP '吸星大法' (online demo)
- MySQL: LOCATE('易筋經', column_name) > 0 AND LOCATE('吸星大法', column_name) = 0
- MySQL: column_name LIKE '%易筋經%' AND column_name NOT LIKE '%吸星大法%'
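Taken together, the AND, OR and NOT rules can be turned into a WHERE clause mechanically. The Python sketch below is a hypothetical helper, not part of MySQL. It supports only a space (AND), OR between two terms and a leading - (NOT). It builds LOCATE() conditions and returns the keywords separately so they can be bound as %s parameters, the placeholder style used by drivers such as PyMySQL and mysql-connector-python. The column name is inserted as-is, so it must come from trusted code.

```python
def google_query_to_where(query: str, column: str = "column_name") -> tuple[str, list[str]]:
    """Translate a simple Google-style query into a MySQL WHERE clause plus parameters."""
    groups: list[list[str]] = []  # terms inside one group are OR-ed together
    excluded: list[str] = []      # terms prefixed with '-'
    pending_or = False
    for token in query.split():
        if token == "OR":
            pending_or = bool(groups)  # ignore a leading OR
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        elif pending_or:
            groups[-1].append(token)
            pending_or = False
        else:
            groups.append([token])

    clauses: list[str] = []
    params: list[str] = []
    for group in groups:
        parts = [f"LOCATE(%s, {column}) > 0" for _ in group]
        clauses.append("(" + " OR ".join(parts) + ")")
        params.extend(group)
    for term in excluded:
        clauses.append(f"LOCATE(%s, {column}) = 0")
        params.append(term)
    return " AND ".join(clauses), params


print(google_query_to_where("易筋經 -吸星大法"))
# ('(LOCATE(%s, column_name) > 0) AND LOCATE(%s, column_name) = 0', ['易筋經', '吸星大法'])
```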
* wildcard operator
Google's * wildcard operator: "Use *, an asterisk character, known as a wildcard, to match one or more words in a phrase" [1] (online demo). Note that the MySQL LIKE wildcard % matches any sequence of zero or more characters, not whole words.
- Google: 狐狸*叫
- MySQL: column_name LIKE '狐狸%叫'[2]
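A sketch of translating a Google-style phrase with * into a LIKE pattern, assuming MySQL's default LIKE escape character (backslash) and that the resulting pattern is passed as a query parameter. Literal % and _ in the phrase are escaped first so they are not treated as wildcards; the function name is hypothetical.

```python
def google_wildcard_to_like(query: str, escape: str = "\\") -> str:
    """Escape LIKE wildcards in the phrase, then turn each * into %."""
    escaped = (
        query.replace(escape, escape * 2)   # escape the escape character itself
        .replace("%", escape + "%")         # literal percent sign
        .replace("_", escape + "_")         # literal underscore
    )
    return escaped.replace("*", "%")


print(google_wildcard_to_like("狐狸*叫"))   # 狐狸%叫
print(google_wildcard_to_like("100%*off"))  # 100\%%off
```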
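English issue
When the keyword is a short English word such as AI, the query column_name LIKE '%AI%' may return rows you do not want, such as Tainan, main or hair (with a case-insensitive collation, which is the default, AI also matches ai).
- Workaround: (1) remove all non-alphanumeric characters[3]; (2) use REGEXP word boundaries[4], e.g. (REPLACE(CONVERT(column_name USING ascii), '?', ' ') REGEXP '([[:<:]])AI([[:>:]])'). CONVERT(... USING ascii) turns each non-ASCII character, such as a Chinese character, into ?, and REPLACE() then turns each ? into a space, so AI written directly next to Chinese text is still treated as a separate word.
- Note: the [[:<:]] and [[:>:]] markers only work before MySQL 8.0.4. MySQL 8.0.4 and later use the ICU regular expression library, where the equivalent pattern is '\\bAI\\b' (the backslash is doubled inside a MySQL string literal).
Cited from MySQL :: MySQL 5.7 Reference Manual :: 12.5.2 Regular Expressions:
[[:<:]], [[:>:]]
These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).
Tutorial (in Chinese): 解決簡短英文單字的 MySQL 查詢:搜尋 app 而不是 apple ("Solving MySQL queries for short English words: searching for app, not apple"), https://errerrors.blogspot.com/2021/01/how-to-find-abbreviations-from-article-written-in-english-and-chinese-in-mysql.html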
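The CONVERT/REPLACE/word-boundary trick can be mimicked in Python to see why it works. This is a rough analogue under stated assumptions (case-insensitive matching, one ? per non-ASCII character), not MySQL's regex engine, and the function name is hypothetical. Python's \b treats Chinese characters as word characters, so without the ASCII conversion, AI written directly next to Chinese text is not found.

```python
import re


def has_english_word(text: str, word: str) -> bool:
    """Rough analogue of
    REPLACE(CONVERT(col USING ascii), '?', ' ') REGEXP '[[:<:]]word[[:>:]]'."""
    # Every non-ASCII character becomes '?', then every '?' becomes a space.
    ascii_text = text.encode("ascii", errors="replace").decode("ascii").replace("?", " ")
    pattern = r"\b" + re.escape(word) + r"\b"
    # IGNORECASE mimics a case-insensitive collation (an assumption).
    return re.search(pattern, ascii_text, flags=re.IGNORECASE) is not None


print(has_english_word("Tainan main hair", "AI"))  # False
print(has_english_word("台南的AI代理", "AI"))      # True
print(re.search(r"\bAI\b", "台南的AI代理"))        # None: no boundary next to 的 or 代
```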
Ignore special characters
Ignore line breaks (return symbols) and span tags
- Example:
  - Searching Google for "意法" site:ptt.cc returns a page in which 意 and 法 sit on adjacent but different lines: 意 ends line n and 法 begins line n+1 (https://www.ptt.cc/man/Learn_Buddha/D7D5/D575/DFD/D15C/M.1317265764.A.B7D.html).
- Approach: (1) remove the HTML tags; (2) remove the line-break (carriage return) characters.
Ignore white space and halfwidth/fullwidth symbols (半形字元和全形字元)
- Examples:
  - Searching Google for "嗎有" returns results containing 嗎? 有 and 嗎- 有.
  - Searching Google for "人物誌Persona" returns results containing 人物誌(Persona), 人物誌(Persona) and 「人物誌」(persona).
- Approach: (1) remove spaces; (2) remove halfwidth and fullwidth symbols.
- References: PHP remove symbols from string - Stack Overflow
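One possible way to apply both approaches before matching, as a Python sketch. The function name is hypothetical; NFKC normalization is used here to fold fullwidth characters into their halfwidth forms, and every Unicode punctuation (P*) or symbol (S*) character is dropped along with white space. The keyword and the text should go through the same function. casefold() is included because the 人物誌Persona search also matched persona in lowercase.

```python
import re
import unicodedata


def normalize_for_search(text: str) -> str:
    """Drop HTML tags, line breaks, white space and punctuation/symbols."""
    text = re.sub(r"<[^>]*>", "", text)         # (1) remove HTML tags such as <span>
    text = unicodedata.normalize("NFKC", text)  # fold fullwidth forms to halfwidth
    kept = [
        ch for ch in text
        if not ch.isspace()                                       # spaces, \r, \n
        and not unicodedata.category(ch).startswith(("P", "S"))  # punctuation, symbols
    ]
    return "".join(kept).casefold()


print(normalize_for_search("「人物誌」(persona)"))  # 人物誌persona
print(normalize_for_search("嗎有") in normalize_for_search("嗎? 有"))  # True
```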
Highlight search query keywords on resulting pages
Returned result: show 10 characters before and after the search keywords. (For comparison, Google result pages show about 130 to 240 characters in total.)
MySQL approach
SQL syntax
Given a search keyword, return a snippet around its first occurrence in the text, using the MySQL SUBSTRING(), POSITION() and CHAR_LENGTH() functions.
SET @term := "吸星大法";
SET @message := "笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。
原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";
SELECT
@message
, CASE
WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
, IF(
POSITION(@term IN @message) > 0 AND
POSITION(@term IN @message) -10 < 1
, 1
, POSITION(@term IN @message) -10)
, CHAR_LENGTH(@term) + 20
)
ELSE ''
END AS `scrapbook`;
-- Returned result of scrapbook column: Show 10 characters before and after the search keywords.
-- 行比武時,以此功對付吸星大法,使其全身凍僵、天池
Run on sqlfiddle
Explanation of the SQL syntax
(1) MySQL POSITION() function - w3resource: "MySQL POSITION() returns the position of a substring within a string."
SET @term := "吸星大法";
SET @message := "笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。
原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";
SELECT POSITION(@term IN @message);
-- > returns 46
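(2) Keep the start position from being 0 or negative: the minimum start position in a string is 1. In MySQL, SUBSTRING() with a start position of 0 returns an empty string, and a negative start position counts back from the end of the string.
SELECT IF(
POSITION(@term IN @message) > 0 AND
POSITION(@term IN @message) -10 < 1
, 1
, POSITION(@term IN @message) -10);
-- > returns 36 = 46 - 10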
(3) Show 10 characters before and after the search keyword. MySQL SUBSTRING() function - w3resource: "returns a specified number of characters from a particular position of a given string."
SELECT
@message
, CASE
WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
, IF(
POSITION(@term IN @message) > 0 AND
POSITION(@term IN @message) -10 < 1
, 1
, POSITION(@term IN @message) -10)
, CHAR_LENGTH(@term) + 20
)
ELSE ''
END AS `scrapbook`;
-- > returns 行比武時,以此功對付吸星大法,使其全身凍僵、天池
(4) When the keyword does not occur in the text, POSITION() returns 0 and the snippet is empty:
SET @term := "吸星大法";
SET @message := "原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";
SELECT
@message
, CASE
WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
, IF(
POSITION(@term IN @message) > 0 AND
POSITION(@term IN @message) -10 < 1
, 1
, POSITION(@term IN @message) -10)
, CHAR_LENGTH(@term) + 20
)
ELSE ''
END AS `scrapbook`;
-- Returned result of scrapbook column: an empty string, because the keyword is not found.
-- [EMPTY]
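The same snippet logic in Python, as a sketch for checking the expected output (Python indexes from 0, POSITION() from 1; the function name is hypothetical). One difference from the SQL: here at most width characters follow the keyword even when the start is clamped, whereas the SQL always requests CHAR_LENGTH(@term) + 20 characters from the clamped start.

```python
def snippet(term: str, message: str, width: int = 10) -> str:
    """Up to `width` characters before and after the first occurrence of term,
    or '' when term does not occur (like the CASE ... ELSE '' above)."""
    i = message.find(term)       # 0-based; POSITION() is 1-based
    if i < 0:
        return ""
    start = max(i - width, 0)    # clamp, like the IF(... < 1, 1, ...) in the SQL
    end = i + len(term) + width  # slicing past the end is safe in Python
    return message[start:end]


MESSAGE = (
    "笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。"
    "左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封。"
)
print(snippet("吸星大法", MESSAGE))  # 行比武時,以此功對付吸星大法,使其全身凍僵、天池
```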
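Google sheet approach
Using the REGEXEXTRACT function. Note: REGEXEXTRACT is case-sensitive! (Column A labels: 文章 = article, 關鍵字 = keyword, 搜尋結果摘要 = search-result snippet.)
|   | A | B |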
| 1 | 文章 | 笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。 原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html |
| 2 | 關鍵字 | 吸星大法 |
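| 3 | 搜尋結果摘要 | =IF(ISERROR(REGEXEXTRACT(LOWER(B1), "(.{0,10}"&LOWER(B2)&".{0,10})")), "", REGEXEXTRACT(LOWER(B1), "(.{0,10}"&LOWER(B2)&".{0,10})")) |
Both the text and the keyword are lowercased with LOWER(), so an uppercase keyword such as AI still matches. Using .{0,10} instead of .{10} still returns a snippet when fewer than 10 characters precede or follow the keyword. The keyword is inserted into the pattern as-is, so regular expression characters in it (such as . or ?) would need escaping.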
1. English Keyword Version - "AI agent/agents"
This Google Sheets formula suggests a title by extracting the text around the "AI agent" mention.
Note: case-insensitive!:
=IF(
REGEXMATCH(A2, "(?i)\bAI\s*agents?\b"),
REGEXEXTRACT(
A2,
".{0,10}(?i)\bAI\s*agents?\b.{0,10}"
)&" ...",
""
)
Here's a breakdown of the Google Sheets formula that extracts excerpts containing "AI agent" or "AI agents":
The formula has two main parts:
- REGEXMATCH to check if the phrase exists
- REGEXEXTRACT to get the surrounding context if found
Pattern explanation:
- `(?i)` makes the match case-insensitive
- `\b` ensures word boundaries
- `\s*` allows any number of spaces
- `s?` makes the 's' optional (matches both singular and plural)
The formula will:
- Search for "AI agent" or "AI agents" in cell A2
- If found, extract up to 10 characters before and after the match
- Add "..." to indicate truncation
- Return empty string if no match
Will match:
- "AI agent"
- "AI agents"
- "ai Agent"
- "Ai AGENTS"
- "The AI agent is"
- "Multiple AI agents are"
- "AIagent" (\s* also accepts zero spaces; use \s+ instead to require a space between AI and agent)
Won't match:
- "AImagent"
- "AI agentify"
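The match lists can be checked outside Sheets with Python's re module, which behaves like RE2 (the engine behind the Google Sheets REGEX functions) for these ASCII-only samples. This is a verification sketch; STRICT is a hypothetical variant that uses \s+ to require a space.

```python
import re

# Pattern from the formula above; (?i) is expressed as re.IGNORECASE.
PATTERN = re.compile(r"\bAI\s*agents?\b", re.IGNORECASE)
# Hypothetical stricter variant: \s+ requires at least one space.
STRICT = re.compile(r"\bAI\s+agents?\b", re.IGNORECASE)

samples = ["AI agent", "AI agents", "ai Agent", "Ai AGENTS",
           "The AI agent is", "Multiple AI agents are",
           "AIagent", "AImagent", "AI agentify"]
for s in samples:
    print(f"{s!r}: \\s* -> {bool(PATTERN.search(s))}, \\s+ -> {bool(STRICT.search(s))}")
```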
2. Chinese Keyword Version - "AI代理" or "AI 代理"
This Google Sheets formula suggests a title by extracting the text around "AI代理" (AI agent).
Note: case-insensitive for the "AI" part!:
=IF(
REGEXMATCH(A2, "(?i)\bAI\s*代理"),
REGEXEXTRACT(
A2,
".{0,10}(?i)\bAI\s*代理.{0,10}"
)&" ...",
""
)
Here's a breakdown of the Google Sheets formula that extracts excerpts containing "AI代理":
The formula has two main parts:
- REGEXMATCH to check if the phrase exists
- REGEXEXTRACT to get the surrounding context if found
Pattern explanation:
- `(?i)` makes the match case-insensitive (affects the "AI" part)
- `\b` ensures word boundary before "AI"
- `\s*` allows any number of spaces between "AI" and "代理"
The formula will:
- Search for "AI代理" or "AI 代理" in cell A2
- If found, extract up to 10 characters before and after the match
- Add "..." to indicate truncation
- Return empty string if no match
Will match:
- "AI代理"
- "AI 代理"
- "ai代理"
- "ai 代理"
- "This is AI代理 system"
- "About AI 代理 research"
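- "AI代理人" (AI agent person): the pattern checks nothing after 代理, and RE2's \b only recognizes ASCII word characters, so a trailing \b cannot exclude this case
Won't match:
- "智能代理" (Intelligent agent)
- "代理AI" (Agent AI)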
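A similar Python check for the Chinese pattern. re.ASCII makes \b ASCII-only, which approximates RE2's \b in Google Sheets; under that assumption every Chinese character is a non-word character, which is why a \b after 代理 can never match.

```python
import re

# re.ASCII: \b and \w only know ASCII word characters, approximating RE2's \b.
FLAGS = re.IGNORECASE | re.ASCII
PATTERN = re.compile(r"\bAI\s*代理", FLAGS)

will_match = ["AI代理", "AI 代理", "ai代理", "ai 代理",
              "This is AI代理 system", "About AI 代理 research", "AI代理人"]
wont_match = ["智能代理", "代理AI"]

print([bool(PATTERN.search(s)) for s in will_match])  # every entry True
print([bool(PATTERN.search(s)) for s in wont_match])  # every entry False

# A trailing \b does not help: there is no boundary between two Chinese
# characters, nor between a Chinese character and the end of the text.
print(re.search(r"\bAI\s*代理\b", "AI代理人", FLAGS))  # None
```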
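Microsoft Spreadsheet approach
Using the FIND, MID and CONCATENATE functions. Note: FIND is case-sensitive! (SEARCH is the case-insensitive alternative.) Column A labels: 文章 = article, 關鍵字 = keyword, 搜尋結果摘要 = search-result snippet.
|   | A | B |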
| 1 | 文章 | 笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。 原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html |
| 2 | 關鍵字 | 吸星大法 |
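| 3 | 搜尋結果摘要 | =IF(ISERROR(FIND(B2, B1)), "", CONCATENATE(MID(B1, MAX(FIND(B2, B1)-10, 1), MIN(FIND(B2, B1)-1, 10)), MID(B1, FIND(B2, B1), LEN(B2)+10))) |
The first MID() returns up to 10 characters before the keyword, and the second returns the keyword plus the 10 characters after it. MIN(FIND(B2, B1)-1, 10) keeps the two parts from overlapping when the keyword starts within the first 10 characters.
Try it online: https://docs.google.com/spreadsheets/d/1ij-50vYqRXJwM71OEWXZHJzrkYfpZYCK-0MFJ3jvY1E/edit?usp=sharing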
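To see why the length of the first MID() matters, here is a Python emulation with 1-based FIND() and MID() helpers (hypothetical names; None stands in for Excel's #VALUE! error). It compares a version whose first MID() always takes 10 characters with one that takes only the characters before the keyword, on a sample text where the keyword starts at position 5.

```python
from typing import Optional


def find(needle: str, text: str) -> Optional[int]:
    """Excel FIND(): 1-based and case-sensitive; None stands in for #VALUE!."""
    i = text.find(needle)
    return i + 1 if i >= 0 else None


def mid(text: str, start: int, n: int) -> str:
    """Excel MID(): n characters starting at 1-based position start."""
    return text[start - 1:start - 1 + n]


def unbounded_first_mid(b1: str, b2: str) -> str:
    """First MID() always takes 10 characters."""
    p = find(b2, b1)
    if p is None:
        return ""
    return mid(b1, p - 10 if p - 10 >= 1 else 1, 10) + mid(b1, p, 10 + len(b2))


def bounded_first_mid(b1: str, b2: str) -> str:
    """First MID() takes only the (up to 10) characters before the keyword."""
    p = find(b2, b1)
    if p is None:
        return ""
    return mid(b1, max(p - 10, 1), min(p - 1, 10)) + mid(b1, p, len(b2) + 10)


text = "abcdKEYefghijklmnop"
print(unbounded_first_mid(text, "KEY"))  # abcdKEYefgKEYefghijklmn (KEY repeated)
print(bounded_first_mid(text, "KEY"))    # abcdKEYefghijklmn
```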
PHP approach
PHP solution: php - highlight multiple keywords in search - Stack Overflow (unverified)
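Since the linked PHP answer is unverified, here is a minimal Python sketch of the same idea: wrap every occurrence of several keywords in a tag. Assumptions: matching is case-insensitive, the text and keywords are HTML-escaped before tagging, and longer keywords come first in the alternation so that 吸星大法 is preferred over 吸星. The function name and default tag are hypothetical.

```python
import html
import re


def highlight(text: str, keywords: list[str], tag: str = "b") -> str:
    """Wrap each occurrence of any keyword in <tag>...</tag> (case-insensitive)."""
    escaped = html.escape(text)  # make the text safe to embed in HTML
    words = sorted({k for k in keywords if k}, key=len, reverse=True)  # longest first
    if not words:
        return escaped
    # Escape keywords the same way as the text, then for the regex.
    pattern = re.compile("|".join(re.escape(html.escape(w)) for w in words), re.IGNORECASE)
    return pattern.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", escaped)


print(highlight("左冷禪以此功對付吸星大法", ["吸星", "吸星大法"]))
# 左冷禪以此功對付<b>吸星大法</b>
```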
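Ranking factors
Possible factors:
- Google PageRank - Wikipedia
- Where the keywords are located in the text, e.g. measured with MySQL POSITION() function - w3resource / MySQL LENGTH() function - w3resource. For multibyte text such as Chinese, CHAR_LENGTH() counts characters, whereas LENGTH() counts bytes.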
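A hypothetical sketch of the keyword-position factor: score each document by POSITION(term IN text) / CHAR_LENGTH(text), roughly an ORDER BY on that expression in MySQL, so documents that mention the term earlier relative to their length rank higher. This is one illustrative scoring rule, not Google's or MySQL's ranking.

```python
from typing import Optional


def relative_position(term: str, text: str) -> Optional[float]:
    """Like POSITION(term IN text) / CHAR_LENGTH(text) (1-based); None if absent."""
    i = text.find(term)
    if i < 0 or not text:
        return None
    return (i + 1) / len(text)


def rank_by_position(term: str, docs: list[str]) -> list[str]:
    """Earlier relative mention ranks higher; documents without the term are dropped.
    list.sort() is stable, so ties keep their input order."""
    scored = [(relative_position(term, d), d) for d in docs]
    matches = [(s, d) for s, d in scored if s is not None]
    matches.sort(key=lambda pair: pair[0])
    return [d for _, d in matches]


docs = ["前言:吸星大法", "吸星大法是武功", "易筋經"]
print(rank_by_position("吸星大法", docs))  # ['吸星大法是武功', '前言:吸星大法']
```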
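Related articles
to explore strange new worlds / related articles:
- MySQL :: MySQL 5.1 Reference Manual :: 12.5.2 Regular Expressions
- regex - MySQL REGEXP word boundaries and double quotes - Stack Overflow, e.g. column REGEXP "[[:<:]]word[[:>:]]"
- Google 進階搜尋 (Google Advanced Search), Punctuation, symbols & operators in search - Search Help
- Yahoo Advanced Web Search
- Advanced search options of Bing
- 在 Excel 或 Google 試算表中,布林搜尋多個關鍵字 (Boolean search for multiple keywords in Excel or Google Sheets)
- MySQL Fulltext Search 使用方式 (how to use MySQL full-text search) | Tsung's Blog: supports English only, as of that 2012 post (MySQL 5.7.6 and later include an ngram full-text parser for Chinese, Japanese and Korean)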
other search cases: if the column ... (inspired by OutWit)
- contains ____
- does not contain ____
- begins with ____
- does not begin with ____
- ends with ____
- does not end with ____
- equals ____
- does not equal ____
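These search cases map directly onto LIKE patterns and comparisons. A Python sketch with hypothetical helper names: it returns a condition with a %s placeholder plus the value to bind, escaping LIKE wildcards in the value with MySQL's default escape character (backslash). The column name is inserted as-is and must come from trusted code.

```python
def like_escape(value: str) -> str:
    """Escape LIKE wildcards with MySQL's default escape character (backslash)."""
    return value.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def condition(case: str, value: str, column: str = "column_name") -> tuple[str, str]:
    """Map a search case to (SQL condition with %s placeholder, value to bind)."""
    v = like_escape(value)
    templates = {
        "contains": ("LIKE", f"%{v}%"),
        "does not contain": ("NOT LIKE", f"%{v}%"),
        "begins with": ("LIKE", f"{v}%"),
        "does not begin with": ("NOT LIKE", f"{v}%"),
        "ends with": ("LIKE", f"%{v}"),
        "does not end with": ("NOT LIKE", f"%{v}"),
        "equals": ("=", value),         # plain comparison, no wildcards
        "does not equal": ("<>", value),
    }
    op, param = templates[case]
    return f"{column} {op} %s", param


print(condition("begins with", "吸星"))  # ('column_name LIKE %s', '吸星%')
```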
References
- [1] Google's * Wildcard Operator - Google Guide
- [2] Using MySQL LIKE Operator to Select Data Based on Pattern Matching
- [3] mysql - How to remove all non-alpha numeric characters from a string? - Stack Overflow
- [4] regex - MySQL REGEXP word boundaries [[:<:]] [[:>:]] and double quotes - Stack Overflow (https://stackoverflow.com/questions/18901704/mysql-regexp-word-boundaries-and-double-quotes)