MySQL full text search equivalents to Google search

From LemonWiki共筆
Latest revision as of 14:28, 20 December 2024

AND

Google search: keyword1 keyword2 is the same as keyword1 AND keyword2 or keyword1 +keyword2. Note: (1) the following examples search for exact words; (2) replace column_name with your own column name.

  • Google: 易筋經 AND 吸星大法
  • MySQL: column_name REGEXP '易筋經' AND column_name REGEXP '吸星大法'
  • MySQL: column_name LIKE '%易筋經%' AND column_name LIKE '%吸星大法%' (online demo[1])
  • MySQL: LOCATE('易筋經', column_name) > 0 AND LOCATE('吸星大法', column_name) > 0
  • MySQL: column_name LIKE '%易筋經%吸星大法%' OR column_name LIKE '%吸星大法%易筋經%' (Note: this becomes unwieldy with more keywords, because every ordering needs its own pattern.)
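
When the keywords come from user input, the AND conditions above are usually generated rather than typed by hand. The following is a minimal Python sketch of that idea, not part of the original page. The function names are hypothetical, the %s placeholder assumes a MySQL driver that uses that parameter style, and the column name is assumed to be a trusted identifier.

```python
from typing import List, Tuple


def escape_like(term: str) -> str:
    """Escape backslash, % and _ so a keyword is matched literally by LIKE."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_and_clause(column: str, keywords: List[str]) -> Tuple[str, List[str]]:
    """Return a WHERE fragment and its parameters for an AND search.

    `column` is assumed to be a trusted identifier, never user input.
    """
    conditions = " AND ".join(f"{column} LIKE %s" for _ in keywords)
    params = [f"%{escape_like(k)}%" for k in keywords]
    return conditions, params


clause, params = build_and_clause("column_name", ["易筋經", "吸星大法"])
print(clause)
print(params)
```

Passing the keywords as parameters keeps them out of the SQL text, and the number of conditions grows with the number of keywords without listing every ordering.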

OR

Google search: keyword1 OR keyword2

  • Google: 易筋經 OR 吸星大法
  • MySQL: column_name REGEXP '易筋經' OR column_name REGEXP '吸星大法'
  • MySQL: LOCATE('易筋經', column_name) > 0 OR LOCATE('吸星大法', column_name) > 0
  • MySQL: column_name LIKE '%易筋經%' OR column_name LIKE '%吸星大法%' (online demo[2])

NOT

Google search: keyword1 NOT keyword2 is the same as keyword1 -keyword2

  • Google: 易筋經 NOT 吸星大法
  • MySQL: column_name REGEXP '易筋經' AND column_name NOT REGEXP '吸星大法' (online demo)
  • MySQL: LOCATE('易筋經', column_name) > 0 AND LOCATE('吸星大法', column_name) = 0
  • MySQL: column_name LIKE '%易筋經%' AND column_name NOT LIKE '%吸星大法%'
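
The AND, OR and NOT mappings can be combined into one small translator. The sketch below is an illustration under simplifying assumptions: it handles only space-separated keywords, the words AND, OR and NOT, and a leading minus sign. It does not handle quoted phrases or escape LIKE wildcards, and the function name is hypothetical.

```python
from typing import List, Tuple


def google_query_to_where(column: str, query: str) -> Tuple[str, List[str]]:
    """Translate a simple Google-style query into a LIKE-based WHERE fragment."""
    groups: List[List[str]] = []  # conditions are OR-ed inside a group, AND-ed between groups
    params: List[str] = []
    join_with_or = False
    negate = False
    for token in query.split():
        if token == "AND":
            continue  # AND is the default between keywords
        if token == "OR":
            join_with_or = True
            continue
        if token == "NOT":
            negate = True
            continue
        if token.startswith("-") and len(token) > 1:
            negate, token = True, token[1:]
        operator = "NOT LIKE" if negate else "LIKE"
        condition = f"{column} {operator} %s"
        params.append(f"%{token}%")
        if join_with_or and groups:
            groups[-1].append(condition)
        else:
            groups.append([condition])
        join_with_or = negate = False
    parts = ["(" + " OR ".join(g) + ")" if len(g) > 1 else g[0] for g in groups]
    return " AND ".join(parts), params


print(google_query_to_where("column_name", "易筋經 -吸星大法"))
print(google_query_to_where("column_name", "易筋經 OR 吸星大法"))
```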

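* wildcard operator

Google * wildcard operator: "Use *, an asterisk character, known as a wildcard, to match one or more words in a phrase" [1] (online demo)

  • Google: 狐狸*叫
  • MySQL: column_name LIKE '狐狸%叫'[2] (this pattern is anchored to the whole column value, so the value must begin with 狐狸 and end with 叫; use '%狐狸%叫%' to match the phrase anywhere in the column)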
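
As a rough illustration, the conversion from a Google-style * phrase to a LIKE pattern can be sketched in Python. The function names are hypothetical. Note one difference in meaning: the LIKE wildcard % matches zero or more characters, while Google's * is described as matching one or more words.

```python
def escape_like(term: str) -> str:
    """Escape backslash, % and _ so the text is matched literally by LIKE."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def wildcard_to_like(phrase: str) -> str:
    """Turn a phrase containing * into a LIKE pattern that matches anywhere in a column."""
    pieces = [escape_like(piece) for piece in phrase.split("*") if piece]
    return "%" + "%".join(pieces) + "%"


print(wildcard_to_like("狐狸*叫"))
```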

English issue

When the keyword is short and written in English, e.g. AI, a query using column_name LIKE '%AI%' may return results you do not want, e.g. Tainan, main, hair and so on.

  • Approach: (1) replace all non-alphanumeric characters[3] with spaces, then (2) match with REGEXP word boundaries[4], e.g. (REPLACE(CONVERT(column_name USING ascii), '?', ' ') REGEXP '([[:<:]])AI([[:>:]])'). Here CONVERT(... USING ascii) turns every non-ASCII character into ?, and REPLACE then turns each ? into a space.


Cited from MySQL :: MySQL 5.7 Reference Manual :: 12.5.2 Regular Expressions

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

Tutorial article: 解決簡短英文單字的 MySQL 查詢:搜尋 app 而不是 apple (Solving MySQL queries for short English words: search for "app" rather than "apple")
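
The effect of word boundaries can be checked outside the database. The sketch below uses Python's re module with ASCII-only word characters, which roughly mirrors the approach above where non-ASCII characters are treated as separators. The function name is hypothetical. As far as the MySQL documentation describes it, the [[:<:]] and [[:>:]] markers belong to the older regular expression library; MySQL 8.0.4 and later use ICU, where \\b is the word-boundary marker instead, so check which version you run.

```python
import re


def contains_word(text: str, word: str) -> bool:
    """Return True when `word` appears as a whole word, using ASCII word boundaries."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.ASCII)
    return pattern.search(text) is not None


for sample in ["Generative AI is everywhere", "Tainan", "main", "hair", "導入AI技術"]:
    print(sample, contains_word(sample, "AI"))
```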

Ignore special characters

Ignore line breaks and span tags

  • Example:
    • Searching Google for "意法" site:ptt.cc returns results where 意 and 法 are adjacent but on different lines: 意 is at the end of the n-th line and 法 is at the beginning of the (n+1)-th line [3].
  • Approach: (1) remove the HTML tags, then (2) remove the line-break characters (carriage return).
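
A minimal Python sketch of this two-step cleanup follows. The function name and the sample markup are made up for illustration, and the regular expression is a naive tag stripper that is only adequate for simple markup, not a full HTML parser.

```python
import re


def strip_tags_and_breaks(html_text: str) -> str:
    """Remove HTML tags, then remove carriage returns and line feeds."""
    without_tags = re.sub(r"<[^>]+>", "", html_text)
    return re.sub(r"[\r\n]+", "", without_tags)


sample = "<span>今天的主意</span>\r\n<span>法國菜</span>"
cleaned = strip_tags_and_breaks(sample)
print(cleaned)
print("意法" in cleaned)
```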

Ignore whitespace and halfwidth/fullwidth symbols (半形字元和全形字元)

  • Examples:
    • Searching Google for "嗎有" returns results that contain 嗎? 有 and 嗎- 有.
    • Searching Google for "人物誌Persona" returns results that contain 人物誌(Persona), 人物誌(Persona) and 「人物誌」(persona).
  • Approach: (1) remove the whitespace, then (2) remove the halfwidth and fullwidth symbols.
  • References: PHP remove symbols from string - Stack Overflow
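
One way to apply this approach is to normalize both the stored text and the keyword before comparing them. The following Python sketch is an assumption about how that could look, not the method from the referenced answer: it folds fullwidth forms to halfwidth with Unicode NFKC normalization, ignores case, and keeps only letters and digits.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Fold fullwidth forms and case, then drop whitespace and symbols."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


keyword = normalize_for_search("人物誌Persona")
for candidate in ["人物誌(Persona)", "「人物誌」(persona)", "嗎? 有", "嗎- 有"]:
    print(candidate, "->", normalize_for_search(candidate))
```

The normalized keyword can then be searched for inside the normalized text with an ordinary substring test.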

Highlight search query keywords on resulting pages

Returned result: show the 10 characters before and after the search keywords. (cf. Google result pages show about 130 to 240 characters in total.)

MySQL approach

SQL syntax

Given the search keywords, return the text around the first occurrence of the match, using the MySQL SUBSTRING(), POSITION() and CHAR_LENGTH() functions.


SET @term := "吸星大法";
SET @message := "笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。

原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";


SELECT 
@message

, CASE
  WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
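        -- start at 1 when fewer than 10 characters precede the keyword (SUBSTRING returns '' for position 0)
        , IF(
            POSITION(@term IN @message) > 0 AND
            POSITION(@term IN @message) - 10 < 1
            , 1
            , POSITION(@term IN @message) - 10)
        , CHAR_LENGTH(@term) + 20
      )
  ELSE ''
END AS `scrapbook`;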

-- Returned result of scrapbook column: Show 10 characters before or after the search keywords.
-- 行比武時,以此功對付吸星大法,使其全身凍僵、天池

Run on sqlfiddle

Explanation of the SQL syntax

(1) MySQL POSITION() function - w3resource "MySQL POSITION() returns the position of a substring within a string."

SET @term := "吸星大法";
SET @message := "笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。

原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";
SELECT POSITION(@term IN @message);

-- > returns 46

(2) Keep the start position from being 0 or negative: the minimum start position is 1.

SELECT IF(
            POSITION(@term IN @message) > 0 AND
            POSITION(@term IN @message) - 10 < 1
            , 1
            , POSITION(@term IN @message) - 10);

-- > returns 36 = 46 - 10

(3) Show the 10 characters before and after the search keywords. MySQL SUBSTRING() function - w3resource: "returns a specified number of characters from a particular position of a given string."


SELECT 
@message

, CASE
  WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
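        -- start at 1 when fewer than 10 characters precede the keyword (SUBSTRING returns '' for position 0)
        , IF(
            POSITION(@term IN @message) > 0 AND
            POSITION(@term IN @message) - 10 < 1
            , 1
            , POSITION(@term IN @message) - 10)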
        , CHAR_LENGTH(@term) + 20
      )
  ELSE ''
END AS `scrapbook`;

-- > returns 行比武時,以此功對付吸星大法,使其全身凍僵、天池

When the keyword is not found, the same query returns an empty string:

SET @term := "吸星大法";
SET @message := "原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html";


SELECT 
@message

, CASE
  WHEN POSITION(@term IN @message) > 0 THEN SUBSTRING(@message
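        -- start at 1 when fewer than 10 characters precede the keyword (SUBSTRING returns '' for position 0)
        , IF(
            POSITION(@term IN @message) > 0 AND
            POSITION(@term IN @message) - 10 < 1
            , 1
            , POSITION(@term IN @message) - 10)
        , CHAR_LENGTH(@term) + 20
      )
  ELSE ''
END AS `scrapbook`;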

-- Returned result of scrapbook column: Show 10 characters before or after the search keywords.
-- [EMPTY]
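
The same excerpt logic can also be applied in application code after fetching the row. The Python sketch below is an illustration with a hypothetical function name and a shortened sample message. Unlike the SQL version, it always keeps at most 10 characters after the keyword, even when the start position is clamped.

```python
def make_snippet(message: str, term: str, context: int = 10) -> str:
    """Return `context` characters on each side of the first occurrence of `term`."""
    index = message.find(term)  # 0-based index, or -1 when the term is absent
    if index < 0:
        return ""
    start = max(0, index - context)  # never start before the beginning of the text
    end = index + len(term) + context  # slicing past the end of the text is safe
    return message[start:end]


term = "吸星大法"
message = "左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封"
snippet = make_snippet(message, term)
print(snippet)
print(repr(make_snippet("原文網址", term)))
```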

Google Sheets approach

Using the REGEXEXTRACT function. Note: REGEXEXTRACT is case-sensitive, so the formula lowercases both the article and the keyword before matching.

A B
1 文章 笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。 原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html
2 關鍵字 吸星大法
3 搜尋結果摘要 =IF(ISERROR(REGEXEXTRACT(LOWER(B1), "(.{0,10}"&LOWER(B2)&".{0,10})")), "", REGEXEXTRACT(LOWER(B1), "(.{0,10}"&LOWER(B2)&".{0,10})"))


1. English keyword version: "AI agent" or "AI agents"

The following Google Sheets formula extracts an excerpt around the first "AI agent" mention in cell A2. Note: the match is case-insensitive.

=IF(
  REGEXMATCH(A2, "(?i)\bAI\s*agents?\b"),
  REGEXEXTRACT(
    A2,
    ".{0,10}(?i)\bAI\s*agents?\b.{0,10}"
  )&" ...",
  ""
)

Here's a breakdown of the Google Sheets formula that extracts excerpts containing "AI agent" or "AI agents":

The formula has two main parts:

  1. REGEXMATCH to check if the phrase exists
  2. REGEXEXTRACT to get the surrounding context if found

Pattern explanation:

  • `(?i)` makes the match case-insensitive
  • `\b` ensures word boundaries
  • `\s*` allows any amount of whitespace, including none
  • `s?` makes the 's' optional (matches both singular and plural)

The formula will:

  • Search for "AI agent" or "AI agents" in cell A2
  • If found, extract up to 10 characters before and after the match
  • Add "..." to indicate truncation
  • Return empty string if no match

Will match:

  • "AI agent"
  • "AI agents"
  • "ai Agent"
  • "Ai AGENTS"
  • "The AI agent is"
  • "Multiple AI agents are"

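Won't match:

  • "AImagent"
  • "AI agentify"

Note that "AIagent" does match, because \s* also allows zero spaces; use \s+ instead to require at least one space.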
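
The match lists for the English pattern can be verified with a short script. This is a Python sketch rather than Google Sheets itself: Python does not accept (?i) in the middle of a pattern, so the case-insensitive flag is passed separately, and re.ASCII is used on the assumption that the Sheets regex engine treats \b as an ASCII word boundary. The function name is hypothetical.

```python
import re

# Same pattern as the formula, with the flags passed separately.
PATTERN = re.compile(r".{0,10}\bAI\s*agents?\b.{0,10}", re.IGNORECASE | re.ASCII)


def excerpt(text: str) -> str:
    """Return the matched excerpt followed by ' ...', or '' when there is no match."""
    match = PATTERN.search(text)
    return match.group(0) + " ..." if match else ""


for sample in ["Multiple AI agents are", "ai Agent", "AIagent", "AImagent", "AI agentify"]:
    print(repr(sample), "->", repr(excerpt(sample)))
```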

2. Chinese keyword version: "AI代理" or "AI 代理"

The following Google Sheets formula extracts an excerpt around the first "AI代理" mention in cell A2. Note: the match is case-insensitive.

=IF(
  REGEXMATCH(A2, "(?i)\bAI\s*代理"),
  REGEXEXTRACT(
    A2,
    ".{0,10}(?i)\bAI\s*代理.{0,10}"
  )&" ...",
  ""
)

Here's a breakdown of the Google Sheets formula that extracts excerpts containing "AI代理":

The formula has two main parts:

  1. REGEXMATCH to check if the phrase exists
  2. REGEXEXTRACT to get the surrounding context if found

Pattern explanation:

  • `(?i)` makes the match case-insensitive (affects the "AI" part)
  • `\b` ensures word boundary before "AI"
  • `\s*` allows any number of spaces between "AI" and "代理"

The formula will:

  • Search for "AI代理" or "AI 代理" in cell A2
  • If found, extract up to 10 characters before and after the match
  • Add "..." to indicate truncation
  • Return empty string if no match

Will match:

  • "AI代理"
  • "AI 代理"
  • "ai代理"
  • "ai 代理"
  • "This is AI代理 system"
  • "About AI 代理 research"

Won't match:

  • "智能代理" (intelligent agent)
  • "代理AI" (agent AI)

Note that "AI代理人" (AI agent person) does match, because the pattern has no boundary after "代理".
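
For mixed Chinese and English text, the meaning of \b matters. The Python sketch below compares an ASCII word boundary, which is assumed here to be how the Sheets regex engine behaves, with Python's default Unicode word boundary. With the Unicode boundary, a Chinese character directly before "AI" counts as a word character, so the boundary disappears and the match fails.

```python
import re

# ASCII word boundary: CJK characters are treated as non-word characters.
ASCII_PATTERN = re.compile(r"\bAI\s*代理", re.IGNORECASE | re.ASCII)
# Unicode word boundary (Python default): CJK characters count as word characters.
UNICODE_PATTERN = re.compile(r"\bAI\s*代理", re.IGNORECASE)

for sample in ["AI代理", "ai 代理", "AI代理人", "智能代理", "代理AI", "關於AI代理"]:
    print(sample, bool(ASCII_PATTERN.search(sample)), bool(UNICODE_PATTERN.search(sample)))
```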

Microsoft Spreadsheet approach

Using the FIND, MID and CONCATENATE functions. Note: the FIND function is case-sensitive!

A B
1 文章 笑傲江湖中嵩山派掌門左冷禪所創掌法,可發出至陰至寒的真氣。左冷禪與任我行比武時,以此功對付吸星大法,使其全身凍僵、天池穴被封;與岳不群比劍奪帥時,左又使出寒冰神掌,與紫霞神功旗鼓相當、不分勝敗。 原文網址:https://kknews.cc/zh-tw/culture/xzaxbq.html
2 關鍵字 吸星大法
3 搜尋結果摘要 =IF(ISERROR(FIND(B2, B1)), "", CONCATENATE(MID(B1, MAX(FIND(B2, B1)-10, 1), MIN(10, FIND(B2, B1)-1)), MID(B1, FIND(B2, B1), 10+LEN(B2))))

Try it online

PHP approach

PHP solution: php - highlight multiple keywords in search - Stack Overflow (unverified)
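
For comparison, the idea of highlighting several keywords at once can be sketched in Python. This is not the code from the Stack Overflow answer. The function name and the choice of the <mark> tag are assumptions, and the text is HTML-escaped so that only the added tags are treated as markup.

```python
import html
import re
from typing import List


def highlight(text: str, keywords: List[str]) -> str:
    """Wrap every occurrence of any keyword in <mark> tags, escaping the rest."""
    words = sorted((k for k in keywords if k), key=len, reverse=True)  # prefer longer keywords
    if not words:
        return html.escape(text)
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<mark>" + html.escape(match.group(0)) + "</mark>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight("以此功對付吸星大法", ["吸星大法", "此功"]))
```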

Ranking factors

Possible factors

Related articles

to explore strange new worlds / related articles:

Other search cases: if the column ... (inspired by OutWit)

  • contains ____
  • does not contain ____
  • begins with ____
  • does not begin with ____
  • ends with ____
  • does not end with ____
  • equals ____
  • does not equal ____
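
Each of these cases maps naturally onto a LIKE pattern or a comparison operator. The Python sketch below shows one possible mapping. The case names used as dictionary keys, the function names and the %s placeholder style are assumptions made for illustration.

```python
from typing import Tuple

# operator and pattern template for each search case
CASES = {
    "contains": ("LIKE", "%{}%"),
    "does not contain": ("NOT LIKE", "%{}%"),
    "begins with": ("LIKE", "{}%"),
    "does not begin with": ("NOT LIKE", "{}%"),
    "ends with": ("LIKE", "%{}"),
    "does not end with": ("NOT LIKE", "%{}"),
    "equals": ("=", "{}"),
    "does not equal": ("<>", "{}"),
}


def escape_like(term: str) -> str:
    """Escape backslash, % and _ so the value is matched literally by LIKE."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def like_condition(column: str, case: str, value: str) -> Tuple[str, str]:
    """Return a WHERE fragment and its single parameter for one search case."""
    operator, template = CASES[case]
    if operator.endswith("LIKE"):
        value = escape_like(value)  # only LIKE patterns need wildcard escaping
    return f"{column} {operator} %s", template.format(value)


print(like_condition("column_name", "begins with", "狐狸"))
print(like_condition("column_name", "does not equal", "狐狸"))
```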

References