Boost Search Engine

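  • 1. Project Background
    • 1.1 Basic Principles of a Search Engine
    • 1.2 The Boost Library
    • 1.3 Project Goal
  • 2. High-Level Flow of the Boost Search Engine
  • 3. Tech Stack and Environment
    • 3.1 Tech Stack
    • 3.2 Environment
  • 4. Understanding Indexes
    • 4.1 Forward Index
    • 4.2 Inverted Index
    • 4.3 How Do We Segment Words?
    • 4.4 Simulating a Lookup
  • 5. Data Processing
    • 5.1 Downloading the Boost Library Locally
    • 5.2 Understanding Tags
    • 5.3 Overall Framework for Stripping Tags
    • 5.4 Implementing EnumFile
    • 5.5 Implementing ParseHtml
      • 5.5.1 ReadFile: Reading File Contents
      • 5.5.2 ParseTitle: Extracting the Title
      • 5.5.3 ParseContent: Extracting the Content
      • 5.5.4 ParseUrl: Building the URL
    • 5.6 Implementing SaveHtml
  • 6. Building the Index
    • 6.1 Installing and Using jieba
    • 6.2 Index Framework
    • 6.3 Implementing BuildIndex
      • 6.3.1 BuildForwardIndex: Building the Forward Index
      • 6.3.2 BuildInveredIndex: Building the Inverted Index
    • 6.4 GetForwardIndex
    • 6.5 GetInvertedList
    • 6.6 Making index a Singleton
  • 7. Search Engine Module
    • 7.1 InitSearcher
    • 7.2 Search
    • 7.3 Installing and Using jsoncpp
    • 7.4 Testing the Search Function
    • 7.5 Getting a Content Summary
  • 8. Search Server
    • 8.1 Upgrading gcc
    • 8.2 Adding the cpp-httplib Library
    • 8.3 Testing cpp-httplib
    • 8.4 Setting the Root Directory
    • 8.5 Writing the Search Server
  • 9. Front-End Code
    • 9.1 Page Structure
    • 9.2 Page Styles
    • 9.3 Front-End and Back-End Interaction
  • 10. Project Additions
    • 10.1 Improving Deduplication
    • 10.2 Adding Logging
  • 11. Project Extensions
    • 11.1 Improving Summaries
    • 11.2 Deploying the Service in the Background
    • 11.3 Other Extensions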

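1. Project Background

What is a search engine? Baidu, which many of us use every day, is one: we type in what we want to find, and it returns related content.

1.1 Basic Principles of a Search Engine

A brief look at how a search engine works:

  • We send a request to the server, for example a search for the keyword "boost". On receiving the request, the server searches its own resources, packs the results into a response, and sends it back to us.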

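1.2 The Boost Library

Boost is a battle-tested, portable, open-source C++ library that serves as a backup to the standard library. It is very powerful, but it has one small shortcoming: its documentation does not support search. If we want to look up a function, for example, the cplusplus site supports that.

The Boost documentation does not, although it may add search in the future.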

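1.3 Project Goal

The goal of the project is simple: add a search function for Boost. As noted above, the server needs resources to search, and there are two ways to get them:

  • Search other web pages: this requires a crawler and a certain level of technical skill.
  • Download Boost and search the resources locally.

We use the second approach and download the Boost library.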

2. High-Level Flow of the Boost Search Engine

(1) Organize the data

  • After downloading the Boost library, we process every file whose suffix is html, that is, we clean the data (strip the tags). Here is a simple html file first; we need to save its title, content, and url.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chapter 30. Boost.Process</title>
<link rel="stylesheet" href="../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="The Boost C++ Libraries BoostBook Documentation Subset">
<link rel="up" href="libraries.html" title="Part I. The Boost C++ Libraries (BoostBook Subset)">
<link rel="prev" href="poly_collection/acknowledgments.html" title="Acknowledgments">
<link rel="next" href="boost_process/concepts.html" title="Concepts">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%">
<td valign="top">
...

3. Tech Stack and Environment

3.1 Tech Stack

Back end: C/C++, C++11, STL, the Boost library, Jsoncpp, cppjieba, cpp-httplib.
Front end: html5, css, js, jQuery, Ajax.

3.2 Environment

  • CentOS 7 remote server, vim, gcc (g++), Makefile, VS Code.

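4. Understanding Indexes

Next, what is an index? The idea is simple: we number the documents so that a number identifies exactly one document. That is the basic principle of an index. Indexes come in two kinds, forward and inverted.

  • Forward index: look up a document by its ID; the result is unique.
  • Inverted index: look up document IDs by keyword; the result is not unique.

An example with two documents makes this concrete: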

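4.1 Forward Index

We number each document:

Document ID    Document name    Document content
1              Document A       我的手机牌子是华为的 (My phone's brand is Huawei)
2              Document B       我的手机牌子是小米的 (My phone's brand is Xiaomi)

The forward index is straightforward: given a document ID, we can directly find the document's content.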

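4.2 Inverted Index

We segment every document into words and collect the distinct words. Under each distinct word we hang the IDs of the documents that contain it.

Keyword (unique)    Document IDs
我的 (my)            1, 2
手机 (phone)         1, 2
牌子 (brand)         1, 2
华为 (Huawei)        1
小米 (Xiaomi)        2

The inverted index maps a keyword to the IDs of the documents that contain it.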
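The table above can be sketched in code. This is only a minimal illustration, assuming the documents have already been segmented by hand; `InvertedIndex` and `BuildToyInvertedIndex` are hypothetical names introduced here and are not the project's real index code.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias for this sketch: keyword -> IDs of documents containing it.
using InvertedIndex = std::map<std::string, std::vector<int>>;

// docs[i] holds the already-segmented words of the document whose ID is i + 1.
InvertedIndex BuildToyInvertedIndex(const std::vector<std::vector<std::string>>& docs)
{
    InvertedIndex index;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        const int doc_id = static_cast<int>(i) + 1;
        for (const std::string& word : docs[i]) {
            std::vector<int>& ids = index[word];
            // Documents are visited in ID order, so checking the last entry
            // is enough to record each document at most once per word.
            if (ids.empty() || ids.back() != doc_id) {
                ids.push_back(doc_id);
            }
        }
    }
    return index;
}
```

Feeding it the two example documents yields the IDs 1 and 2 for 我的, only 1 for 华为, and only 2 for 小米, matching the table.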

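4.3 How Do We Segment Words?

We said above that documents are segmented into words. Why? Segmentation makes lookup more efficient. How should we do it? We could segment by hand, but an existing library already does this and we can use it directly. If we did segment by hand, the result would look like this:

  • 我的手机牌子是华为的: 我的/手机/牌子/是/华为/的.
  • 我的手机牌子是小米的: 我的/手机/牌子/是/小米/的.

Note: the segmentation above is informal, and a real segmenter may split the text differently. One way to improve efficiency is worth mentioning here. Words such as "了", "从", "吗", "the", and "a" often carry little meaning in a document, so we can skip them during segmentation. Words of this kind are called stop words.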
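Stop-word filtering can be sketched as a simple set lookup. The function name `DropStopWords` and the idea of passing the stop-word list as a set are assumptions made for this illustration; the project may filter stop words differently.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns the words that are not in the stop-word set,
// keeping their original order.
std::vector<std::string> DropStopWords(const std::vector<std::string>& words,
                                       const std::unordered_set<std::string>& stop_words)
{
    std::vector<std::string> kept;
    for (const std::string& word : words) {
        if (stop_words.count(word) == 0) {
            kept.push_back(word);
        }
    }
    return kept;
}
```

With the stop words "是" and "的", the segmented sentence 我的/手机/牌子/是/华为/的 would be reduced to 我的/手机/牌子/华为, so fewer words need to be indexed.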

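4.4 Simulating a Lookup

Let us simulate the lookup flow:

User enters 我的 -> look it up in the inverted index -> extract the document IDs (1, 2) -> use the forward index -> find the document contents -> summarize each result as title + content (desc) + url -> construct the response.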
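The flow above can be sketched as a keyword lookup followed by a forward-index lookup. `ToyDoc` and `ToyLookup` are hypothetical names for this illustration, and the sketch assumes document IDs start at 1 and that the forward index is a vector ordered by ID; summary generation and response building are left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record type for this sketch.
struct ToyDoc {
    std::string title;
    std::string content;
    std::string url;
};

// forward[i] is the document with ID i + 1; inverted maps a keyword to document IDs.
std::vector<ToyDoc> ToyLookup(const std::string& keyword,
                              const std::map<std::string, std::vector<int>>& inverted,
                              const std::vector<ToyDoc>& forward)
{
    std::vector<ToyDoc> results;
    auto it = inverted.find(keyword);
    if (it == inverted.end()) {
        return results;  // keyword is not indexed: empty result
    }
    for (int doc_id : it->second) {
        // Skip IDs that do not correspond to a stored document.
        if (doc_id >= 1 && static_cast<std::size_t>(doc_id) <= forward.size()) {
            results.push_back(forward[static_cast<std::size_t>(doc_id) - 1]);
        }
    }
    return results;
}
```

Each returned record carries the title, content, and url from which a summary and the final response would then be built.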

5. Data Processing

5.1 Downloading the Boost Library Locally

(1) First download the Boost library. We use the latest version, which is 1.84.0 here. Download it to the desktop, upload it to the remote CentOS 7 server with the rz command, and then extract it.

[xiaomaker@VM-28-13-centos data]$ rz -E
[xiaomaker@VM-28-13-centos data]$ ll
total 141756
-rw-r--r-- 1 xiaomaker xiaomaker 145151722 Feb 13 17:54 boost_1_84_0.tar.gz
[xiaomaker@VM-28-13-centos data]$ tar xzf boost_1_84_0.tar.gz
[xiaomaker@VM-28-13-centos data]$ ll
total 141760
drwxr-xr-x 8 xiaomaker xiaomaker      4096 Dec  7 05:37 boost_1_84_0
-rw-r--r-- 1 xiaomaker xiaomaker 145151722 Feb 13 17:54 boost_1_84_0.tar.gz
[xiaomaker@VM-28-13-centos data]$

(2) Look at the contents of the Boost library:

This is the full content of the Boost library. To keep the project simple, we use only the html files under Boost's doc/html directory. Support for all html files can be added gradually later.

boost_1_84_0/doc/html
[xiaomaker@VM-28-13-centos doc]$ cd html
[xiaomaker@VM-28-13-centos html]$ ll
total 3080
-rw-r--r--  1 xiaomaker xiaomaker   3476 Dec  7 05:22 about.html
drwxr-xr-x  2 xiaomaker xiaomaker   4096 Dec  7 05:23 accumulators
-rw-r--r--  1 xiaomaker xiaomaker   5858 Dec  7 05:23 accumulators.html
drwxr-xr-x  2 xiaomaker xiaomaker   4096 Dec  7 05:23 align
-rw-r--r--  1 xiaomaker xiaomaker   4440 Dec  7 05:23 align.html
drwxr-xr-x  2 xiaomaker xiaomaker   4096 Dec  7 05:23 any
-rw-r--r--  1 xiaomaker xiaomaker   9102 Dec  7 05:23 any.html
drwxr-xr-x  3 xiaomaker xiaomaker   4096 Dec  7 05:23 array
-rw-r--r--  1 xiaomaker xiaomaker   8377 Dec  7 05:23 array.html
-rw-r--r--  1 xiaomaker xiaomaker  36597 Dec  7 05:27 array_types.html
-rw-r--r--  1 xiaomaker xiaomaker 288197 Dec  7 05:26 asio_HTML.manifest
-rw-r--r--  1 xiaomaker xiaomaker   6685 Dec  7 05:32 Assignable.html
-rw-r--r--  1 xiaomaker xiaomaker    700 Dec  7 05:02 atomic.html
-rw-r--r--  1 xiaomaker xiaomaker  20627 Dec  7 05:27 auxiliary.html
drwxr-xr-x  2 xiaomaker xiaomaker   4096 Dec  7 05:02 bbv2
-rw-r--r--  1 xiaomaker xiaomaker    640 Dec  7 05:02 bbv2.html
-rw-r--r--  1 xiaomaker xiaomaker   9635 Dec  7 05:32 BidirectionalIterator.html
drwxr-xr-x 39 xiaomaker xiaomaker   4096 Dec  7 05:32 boost
...

(3) Next, copy everything in boost_1_84_0/doc/html into the data/input directory.

[xiaomaker@VM-28-13-centos boost_searcher]$ mkdir data/input -p
[xiaomaker@VM-28-13-centos boost_searcher]$ cp -rf ../../data/boost_1_84_0/doc/html/* data/input/

This copies the contents of boost_1_84_0/doc/html into data/input.

Now we can start stripping tags. Create a .cpp file for the parser.

[xiaomaker@VM-28-13-centos boost_searcher]$ touch parser.cpp

5.2 Understanding Tags

(1) Before stripping tags we need to know what they look like. Open any html file:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chapter 45. Boost.YAP</title>
<link rel="stylesheet" href="../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="index.html" title="The Boost C++ Libraries BoostBook Documentation Subset">
<link rel="up" href="libraries.html" title="Part I. The Boost C++ Libraries (BoostBook Subset)">
<link rel="prev" href="xpressive/appendices.html" title="Appendices">
<link rel="next" href="boost_yap/manual.html" title="Manual">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%">
<td valign="top">
...

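(3) We could create one output file per source html file, but that would produce far too many files. Instead, we put the tag-stripped results of all documents into a single file, with documents separated by '\3', in a format like this:

XXXXXXXXXXXXXXXXX\3YYYYYYYYYYYYYYYYYYYYY\3ZZZZZZZZZZZZZZZZZZZZZZZZZ\3

Why '\3' as the separator: in the ASCII table, control characters are non-printable. The documents we collected (the html pages in data/input) consist almost entirely of printable characters and are very unlikely to contain non-printable control characters, so the separator will not pollute the document content.

We do not use exactly the format above, however. Instead we remove every '\n' from each document and use this format:

title\3content\3url \n title\3content\3url \n title\3content\3url \n ...

This lets us call getline(ifstream, line) to fetch one whole document at a time: title\3content\3url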
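Reading this format back could look like the sketch below. It assumes each line holds exactly one record with two '\3' separators; `SplitRecord` is a hypothetical helper name, not a function from the project.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: splits one "title\3content\3url" line into its fields.
// Returns false if the line does not contain two separators.
bool SplitRecord(const std::string& line,
                 std::string* title, std::string* content, std::string* url)
{
    const char kSep = '\3';
    const std::size_t first = line.find(kSep);
    if (first == std::string::npos) {
        return false;
    }
    const std::size_t second = line.find(kSep, first + 1);
    if (second == std::string::npos) {
        return false;
    }
    *title = line.substr(0, first);
    *content = line.substr(first + 1, second - first - 1);
    *url = line.substr(second + 1);  // getline has already removed the '\n'
    return true;
}
```

A caller would typically loop with std::getline over the output file and pass each line to this helper, skipping lines for which it returns false.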

(4) Create a file to hold the tag-stripped content.

[xiaomaker@VM-28-13-centos data]$ cd raw_html/
[xiaomaker@VM-28-13-centos raw_html]$ touch raw.txt
[xiaomaker@VM-28-13-centos raw_html]$ ll
total 0
-rw-rw-r--. 1 xiaomaker xiaomaker 0 Feb 14 20:17 raw.txt

5.3 Overall Framework for Stripping Tags

(1) A simple skeleton for parser.cpp:

#include <iostream>
#include <string>
#include <vector>
#include <boost/filesystem.hpp>

const std::string src_path = "data/input";          // a directory holding all the html pages
const std::string output = "data/raw_html/raw.txt"; // the file that receives the tag-stripped pages

typedef struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content
    std::string url;     // url of the document on the official site
}DocInfo_t;

// function declarations
bool EnumFile(const std::string& src_path, std::vector<std::string>& file_list);
bool ParseHtml(const std::vector<std::string>& file_list, std::vector<DocInfo_t>& results);
bool SaveHtml(const std::vector<DocInfo_t>& results, const std::string& output);

int main()
{
    std::vector<std::string> file_list;
    // Step 1: recursively collect every html file name, with its path, into file_list
    // so that the files can be read one by one later
    if(!EnumFile(src_path, file_list))
    {
        std::cerr << "enum file name error" << std::endl;
        return 1;
    }
    // Step 2: read each file listed in file_list and parse it
    std::vector<DocInfo_t> results;
    if(!ParseHtml(file_list, results))
    {
        std::cerr << "parse html error" << std::endl;
        return 2;
    }
    // Step 3: write the parsed documents to output, using \3 as the field separator
    if(!SaveHtml(results, output))
    {
        std::cerr << "save html error" << std::endl;
        return 3;
    }
    return 0;
}

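Basic idea:

  1. Collect the names of all source html files and store them in an array.
  2. Walk the array, strip the tags from each file, and pack the result into a DocInfo_t struct holding title, content, and url; store the structs in an array.
  3. Walk the array of structs and write the contents to the destination file in the agreed format.

(2) To implement these three functions we can install the Boost library and use its interfaces:

[xiaomaker@VM-28-13-centos boost_searcher]$ sudo yum install -y boost-devel
[sudo] password for xiaomaker:

For a quick introduction to the Boost library, see its user manual.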

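5.4 Implementing EnumFile

EnumFile saves the names of all files with the html suffix under the given src_path directory into the file_list array.

bool EnumFile(const std::string& src_path, std::vector<std::string>& file_list);

(1) Implementation:

bool EnumFile(const std::string& src_path, std::vector<std::string>& file_list) // step 1
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);
    if(!fs::exists(root_path)) // check that the path exists
    {
        std::cerr << src_path << " not exists" << std::endl;
        return false;
    }
    // a default-constructed iterator marks the end of the recursive traversal
    fs::recursive_directory_iterator end;
    for(fs::recursive_directory_iterator iter(root_path); iter != end; ++iter)
    {
        if(!fs::is_regular_file(*iter)) // skip anything that is not a regular file; html files are regular files
        {
            continue;
        }
        if(iter->path().extension() != ".html") // skip files that do not end in .html
        {
            continue;
        }
        //std::cout << "debug: " << iter->path().string() << std::endl;
        // the current path is valid: a regular file ending in .html
        file_list.push_back(iter->path().string()); // save the html path for later text analysis
    }
    return true;
}

(2) Code walkthrough: the function checks that src_path exists, walks the directory tree with recursive_directory_iterator, skips every entry that is not a regular file or does not have the .html extension, and appends each remaining path to file_list.

(3) Run it with the debug line enabled and check the result:

[xiaomaker@VM-28-13-centos boost_searcher]$ ./parser
debug: data/input/about.html
debug: data/input/accumulators/user_s_guide.html
debug: data/input/accumulators/acknowledgements.html
debug: data/input/accumulators/reference.html
debug: data/input/accumulators.html
...

Output like this shows that it works.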

5.5 Implementing ParseHtml

Now we parse each html file.

bool ParseHtml(const std::vector<std::string>& file_list, std::vector<DocInfo_t>& results);

(1) The skeleton of ParseHtml:

bool ParseHtml(const std::vector<std::string>& file_list, std::vector<DocInfo_t>& results) // step 2
{
    for(auto& file : file_list)
    {
        // 1. read the file
        std::string result;
        if(!ns_util::FileUtil::ReadFile(file, result))
        {
            continue;
        }
        DocInfo_t doc;
        // 2. parse the file content and extract the title
        if(!ParseTitle(result, &doc.title))
        {
            continue;
        }
        // 3. parse the file content and extract the content
        if(!ParseContent(result, &doc.content))
        {
            continue;
        }
        // 4. build the url from the file path
        if(!ParseUrl(file, &doc.url))
        {
            continue;
        }
        // parsing is done; everything about this document is now stored in doc
        results.push_back(std::move(doc)); // a plain push_back copies doc, which can be slow; std::move moves it instead
        //ShowDoc(doc);
        //break;
    }
    return true;
}

(2) The rough flow of ParseHtml:

  • Read each file into a string.
  • Extract the title from the string.
  • Extract the content from the string.
  • Build the url from the file path.

The following sections implement each of these functions.

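5.5.1 ReadFile: Reading File Contents

(1) We put this function in a utility header (util.hpp), because other functions may use it later.

#pragma once
#include <iostream>
#include <string>
#include <fstream>
#include <vector>

// LOG is the project's logging macro, added in section 10.2
namespace ns_util
{
    class FileUtil
    {
    public:
        static bool ReadFile(const std::string& file_path, std::string& out)
        {
            std::ifstream in(file_path, std::ios::in);
            if(!in.is_open())
            {
                LOG(FATAL, "open file " + file_path + " error");
                //std::cerr << "open file " << file_path << " error" << std::endl;
                return false;
            }
            std::string line;
            // How do we know the file has been read to the end? getline returns a reference
            // to the stream, and while() can test it because the stream defines a conversion to bool
            while(std::getline(in, line))
            {
                out += line;
            }
            in.close();
            return true;
        }
    };
}

(2) Code walkthrough: ReadFile opens the file, reads it line by line with std::getline, and appends each line to out. Because std::getline discards the trailing '\n' of every line, out ends up holding the whole file with its newlines removed.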
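The read loop in ReadFile can be tried in isolation on any input stream. The sketch below uses the same `while (std::getline(...))` pattern; `JoinLines` is a hypothetical name, and the example reads from a string stream instead of a file so that it needs no file on disk.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper mirroring the loop in ReadFile: concatenates all lines
// of the stream. std::getline drops the '\n' delimiter, so the result has no newlines.
std::string JoinLines(std::istream& in)
{
    std::string out;
    std::string line;
    // The loop ends when getline fails (for example at end of input),
    // because the returned stream then converts to false.
    while (std::getline(in, line)) {
        out += line;
    }
    return out;
}
```

For an input of two lines, "<title>A</title>" followed by "<p>B</p>", the result is the single string "<title>A</title><p>B</p>". Note that words separated only by a line break in the source become adjacent after joining.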

5.5.2 ParseTitle: Extracting the Title

(1) Look at the content of an html file: the title sits inside the <title> tag.

(2) Extract the title from the string.

bool ParseTitle(const std::string& file, std::string* title)//解析指定的文件, 提取title{size_t begin = file.find(""</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>begin <span class="token operator">==</span> std<span class="token double-colon punctuation">::</span>string<span class="token double-colon punctuation">::</span>npos<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"获取<title>字符串失败"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span><span class="token punctuation">}</span>size_t end <span class="token operator">=</span> file<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span><span class="token string">"");if(end == std::string::npos){LOG(FATAL, "获取字符串失败");return false;}begin += std::string(""</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>begin <span class="token operator">></span> end<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"获取的下标不正确"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token 
boolean">false</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token operator">*</span>title <span class="token operator">=</span> file<span class="token punctuation">.</span><span class="token function">substr</span><span class="token punctuation">(</span>begin<span class="token punctuation">,</span> end <span class="token operator">-</span> begin<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><strong>(3)代码解析:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/63445fd6db8448f3ad3e3e858bba1946.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/63445fd6db8448f3ad3e3e858bba1946.png" /></p><h4>5.5.3 实现提取content的函数ParseContent</h4><p>我们获取content,不是把所有的内容都拿出来,而是要去标签,这里需要借助一个状态。</p><p>(1)我们知道标签是这样的表示。那么我们这里使用一个状态。我们默认第一个字符是<。</p><pre><code class="prism language-cpp"><span class="token keyword">static</span> <span class="token keyword">bool</span> <span class="token function">ParseContent</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> file<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">*</span> content<span class="token punctuation">)</span><span class="token comment">//解析指定的文件, 提取content</span><span class="token punctuation">{</span><span class="token comment">//去标签</span><span 
class="token keyword">enum</span> <span class="token class-name">status</span><span class="token punctuation">{</span>LABLE<span class="token punctuation">,</span>CONTENT<span class="token punctuation">}</span><span class="token punctuation">;</span><span class="token keyword">enum</span> <span class="token class-name">status</span> s <span class="token operator">=</span> LABLE<span class="token punctuation">;</span><span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">char</span> c <span class="token operator">:</span> file<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">switch</span> <span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">case</span> LABLE<span class="token operator">:</span><span class="token keyword">if</span><span class="token punctuation">(</span>c <span class="token operator">==</span> <span class="token char">'>'</span><span class="token punctuation">)</span><span class="token punctuation">{</span>s <span class="token operator">=</span> CONTENT<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">break</span><span class="token punctuation">;</span><span class="token keyword">case</span> CONTENT<span class="token operator">:</span><span class="token keyword">if</span><span class="token punctuation">(</span>c <span class="token operator">==</span> <span class="token char">'<'</span><span class="token punctuation">)</span><span class="token punctuation">{</span>s <span class="token operator">=</span> LABLE<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">else</span><span class="token punctuation">{</span><span class="token comment">//我们不需要原文件当中的\n,因为我们需要用\n作为html解析之后的文本分隔符</span><span class="token keyword">if</span><span class="token 
punctuation">(</span>c <span class="token operator">==</span> <span class="token char">'\n'</span><span class="token punctuation">)</span><span class="token punctuation">{</span>c <span class="token operator">=</span> <span class="token char">' '</span><span class="token punctuation">;</span><span class="token punctuation">}</span>content<span class="token operator">-></span><span class="token function">push_back</span><span class="token punctuation">(</span>c<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">break</span><span class="token punctuation">;</span><span class="token keyword">default</span><span class="token operator">:</span><span class="token keyword">break</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><strong>(2)代码解析:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/6876e1bdcc6d46ae9161bc736f883dda.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/6876e1bdcc6d46ae9161bc736f883dda.png" /></p><h4>5.5.4 实现提取url函数ParseUrl</h4><p><font color="red">(1)boost库的官方文档,和我们下载下来的文档,是有路径的对应关系的。如下:</font></p><p><strong>官网url</strong>:https://www.boost.org/doc/libs/1_84_0/doc/html/accumulators.html<br /> <strong>我们下载下来的url</strong>:boost_1_84_0/doc/html/accumulators.html</p><p><strong>我们拷贝到我们项目中的样例</strong>:data/input/accumulators.html<br /> 我们把下载下来的boost库当中的 doc/html/* 拷贝到了data/input/</p><p>url_head = 
“https://www.boost.org/doc/libs/1_84_0/doc/html”;这是固定的<br /> url_tail = data/input /accumulators.html 转换成 url_tail = /accumulators.html</p><p><font color="red">url = url_head + url_tail就相当于形成了一个官网链接。</font></p><p><strong>(2)具体实现:</strong></p><pre><code class="prism language-cpp"><span class="token keyword">static</span> <span class="token keyword">bool</span> <span class="token function">ParseUrl</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> file_path<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">*</span> url<span class="token punctuation">)</span><span class="token comment">//解析指定的文件, 构建url</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>string url_head <span class="token operator">=</span> <span class="token string">"https://www.boost.org/doc/libs/1_84_0/doc/html"</span><span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>string url_tail <span class="token operator">=</span> file_path<span class="token punctuation">.</span><span class="token function">substr</span><span class="token punctuation">(</span>src_path<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token operator">*</span>url <span class="token operator">=</span> url_head <span class="token operator">+</span> url_tail<span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><strong>(3)代码解析:</strong></p><p><noscript><img 
decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/8dc67050ef04417aaced7a17744362e8.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/8dc67050ef04417aaced7a17744362e8.png" /></p><p><strong>(4)使用如下函数检测是否构建成功:</strong></p><pre><code class="prism language-cpp"><span class="token keyword">void</span> <span class="token function">ShowDoc</span><span class="token punctuation">(</span>DocInfo_t<span class="token operator">&</span> doc<span class="token punctuation">)</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"title: "</span> <span class="token operator"><<</span> doc<span class="token punctuation">.</span>title <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"content: "</span> <span class="token operator"><<</span> doc<span class="token punctuation">.</span>content <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"url: "</span> <span class="token operator"><<</span> doc<span class="token punctuation">.</span>url <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span><span class="token 
punctuation">}</span></code></pre><p><strong>(5)测试结果:</strong></p><pre><code class="prism language-powershell">title: Struct template result&lt<span class="token punctuation">;</span>This<span class="token punctuation">(</span>InputIterator<span class="token punctuation">,</span> InputIterator<span class="token punctuation">)</span>&gt<span class="token punctuation">;</span>content: Struct template result&lt<span class="token punctuation">;</span>This<span class="token punctuation">(</span>InputIterator<span class="token punctuation">,</span> InputIterator<span class="token punctuation">)</span>&gt<span class="token punctuation">;</span>HomeLibrariesPeopleFAQMoreStruct template result&lt<span class="token punctuation">;</span>This<span class="token punctuation">(</span>InputIterator<span class="token punctuation">,</span> InputIterator<span class="token punctuation">)</span>&gt<span class="token punctuation">;</span>boost::proto::functional::distance::result&lt<span class="token punctuation">;</span>This<span class="token punctuation">(</span>InputIterator<span class="token punctuation">,</span> InputIterator<span class="token punctuation">)</span>&gt<span class="token punctuation">;</span>Synopsis/<span class="token operator">/</span> In header: &lt<span class="token punctuation">;</span>boost/proto/functional/std/iterator<span class="token punctuation">.</span>hpp&gt<span class="token punctuation">;</span>template&lt<span class="token punctuation">;</span>typename This<span class="token punctuation">,</span> typename InputIterator&gt<span class="token punctuation">;</span> struct result&lt<span class="token punctuation">;</span>This<span class="token punctuation">(</span>InputIterator<span class="token punctuation">,</span> InputIterator<span class="token punctuation">)</span>&gt<span class="token punctuation">;</span> <span class="token punctuation">{</span><span class="token operator">/</span><span class="token operator">/</span> typestypedef typename 
std::iterator_traits&lt<span class="token punctuation">;</span>typename boost::remove_const&lt<span class="token punctuation">;</span>typename boost::remove_reference&lt<span class="token punctuation">;</span>InputIterator&gt<span class="token punctuation">;</span>::<span class="token function">type</span>&gt<span class="token punctuation">;</span>::<span class="token function">type</span>&gt<span class="token punctuation">;</span>::difference_type <span class="token function">type</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">;</span>Copyright © 2008 Eric NieblerDistributed under the Boost Software License<span class="token punctuation">,</span> Version 1<span class="token punctuation">.</span>0<span class="token punctuation">.</span> <span class="token punctuation">(</span>See accompanyingfile LICENSE_1_0<span class="token punctuation">.</span>txt or <span class="token function">copy</span> at http:<span class="token operator">/</span><span class="token operator">/</span>www<span class="token punctuation">.</span>boost<span class="token punctuation">.</span>org/LICENSE_1_0<span class="token punctuation">.</span>txt<span class="token punctuation">)</span>url: https:<span class="token operator">/</span><span class="token operator">/</span>www<span class="token punctuation">.</span>boost<span class="token punctuation">.</span>org/doc/libs/1_84_0/doc/html/boost/proto/functional/distance/resu_1_3_32_5_26_2_1_1_2_4<span class="token punctuation">.</span>html</code></pre><p><strong>(6)我们可以将拿到的这个url去官网上看看是不是正确的:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/e490e83839054748af535a08ef10057f.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' 
data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/e490e83839054748af535a08ef10057f.png" /></p><h3>5.6 SaveHtml函数的实现</h3><pre><code class="prism language-cpp"><span class="token keyword">bool</span> <span class="token function">SaveHtml</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>DocInfo_t<span class="token operator">></span><span class="token operator">&</span> results<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> output<span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre><p><strong>(1)我们已经得到每一个文件的结构体了,现在开始保存文件到要求的文件当中:</strong></p><pre><code class="prism language-cpp"><span class="token keyword">bool</span> <span class="token function">SaveHtml</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>DocInfo_t<span class="token operator">></span><span class="token operator">&</span> results<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> output<span class="token punctuation">)</span><span class="token comment">//第三步</span><span class="token punctuation">{</span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">SEP</span> <span class="token char">'\3'</span></span><span class="token comment">//按照二进制写入</span>std<span class="token double-colon punctuation">::</span>ofstream <span class="token function">out</span><span class="token 
punctuation">(</span>output<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>out <span class="token operator">|</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>binary<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span><span class="token operator">!</span>out<span class="token punctuation">.</span><span class="token function">is_open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"open: "</span> <span class="token operator">+</span> output <span class="token operator">+</span> <span class="token string">"failed"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << "open: " << output << "failed" << std::endl;</span><span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//接下来进行文件内容写入</span><span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> item <span class="token operator">:</span> results<span class="token punctuation">)</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>string out_string<span class="token punctuation">;</span>out_string <span class="token operator">=</span> item<span class="token punctuation">.</span>title<span class="token 
punctuation">;</span>out_string <span class="token operator">+=</span> SEP<span class="token punctuation">;</span>out_string <span class="token operator">+=</span> item<span class="token punctuation">.</span>content<span class="token punctuation">;</span>out_string <span class="token operator">+=</span> SEP<span class="token punctuation">;</span>out_string <span class="token operator">+=</span> item<span class="token punctuation">.</span>url<span class="token punctuation">;</span>out_string <span class="token operator">+=</span> <span class="token char">'\n'</span><span class="token punctuation">;</span>out<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>out_string<span class="token punctuation">.</span><span class="token function">c_str</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> out_string<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span>out<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><strong>(2)代码解析:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/147ba767fe8f496f9a562ea9faa6d0b0.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' 
data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/147ba767fe8f496f9a562ea9faa6d0b0.png" /></p><p><strong>(3) Verify that the file was saved:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/69755ca81b5b418d9461a89fe0df9e4d.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/69755ca81b5b418d9461a89fe0df9e4d.png" /></p><p><strong>(4) Verify that nothing was lost:</strong> the number of .html files under data/input should equal the number of lines in raw.txt (8586 in both cases).</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ <span class="token function">ls</span> <span class="token punctuation">.</span><span class="token operator">/</span><span class="token keyword">data</span><span class="token operator">/</span>input/ <span class="token operator">-</span>Rl <span class="token punctuation">|</span> grep <span class="token operator">-</span>E <span class="token string">"*.html"</span> <span class="token punctuation">|</span> wc <span class="token operator">-</span>l
8586
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ <span class="token function">cat</span> <span class="token punctuation">.</span><span class="token operator">/</span><span class="token keyword">data</span><span class="token operator">/</span>raw_html/raw<span class="token punctuation">.</span>txt <span class="token punctuation">|</span> wc <span class="token operator">-</span>l
8586
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$</code></pre><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/bc6f1f4acc7843218d763c0c711a6533.png" /></noscript><img decoding="async" class="lazyload aligncenter" 
src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/bc6f1f4acc7843218d763c0c711a6533.png" /></p><h2>6. Building the Index</h2><p>Next we build the index. Building an index means constructing the data structures for storage and lookup that speed up the search path keyword -&gt; document ID -&gt; document content. As introduced earlier, we build both a forward index and an inverted index.<br /> <font color="red">Before building the index, we need to install jieba, the word-segmentation tool that will split text into terms for us.</font></p><h3>6.1 Installing and Using jieba</h3><p>(1) For word segmentation we can use the cppjieba library directly. Run the following command to clone the cppjieba repository to the local machine.</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos jieba]</span>$ git clone https:<span class="token operator">/</span><span class="token operator">/</span>gitcode<span class="token punctuation">.</span>net/qq_55172408/cppjieba<span class="token punctuation">.</span>git</code></pre><p><strong>The contents of cppjieba are as follows:</strong></p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos jieba]</span>$ tree cppjieba/cppjieba/├── appveyor<span class="token punctuation">.</span>yml├── ChangeLog<span class="token punctuation">.</span>md├── CMakeLists<span class="token punctuation">.</span>txt├── deps│ ├── CMakeLists<span class="token punctuation">.</span>txt│ ├── gtest│ │ ├── CMakeLists<span class="token punctuation">.</span>txt│ │ ├── include│ │ │ └── gtest│ │ │ ├── gtest-death-test<span class="token punctuation">.</span>h│ │ │ ├── gtest<span class="token punctuation">.</span>h│ │ │ ├── gtest-message<span class="token punctuation">.</span>h│ │ │ ├── gtest-<span class="token keyword">param</span><span class="token operator">-</span>test<span class="token punctuation">.</span>h│ │ │ ├── gtest-<span class="token keyword">param</span><span class="token operator">-</span>test<span class="token punctuation">.</span>h<span class="token punctuation">.</span>pump│ │ │ ├── gtest_pred_impl<span class="token punctuation">.</span>h│ │ │ ├── gtest-printers<span class="token 
punctuation">.</span>h│ │ │ ├── gtest_prod<span class="token punctuation">.</span>h│ │ │ ├── gtest-spi<span class="token punctuation">.</span>h│ │ │ ├── gtest-<span class="token function">test-part</span><span class="token punctuation">.</span>h│ │ │ ├── gtest-typed-test<span class="token punctuation">.</span>h│ │ │ └── internal│ │ │ ├── gtest-death-<span class="token function">test-internal</span><span class="token punctuation">.</span>h│ │ │ ├── gtest-filepath<span class="token punctuation">.</span>h│ │ │ ├── gtest-internal<span class="token punctuation">.</span>h│ │ │ ├── gtest-linked_ptr<span class="token punctuation">.</span>h│ │ │ ├── gtest-<span class="token keyword">param</span><span class="token operator">-</span>util-generated<span class="token punctuation">.</span>h│ │ │ ├── gtest-<span class="token keyword">param</span><span class="token operator">-</span>util-generated<span class="token punctuation">.</span>h<span class="token punctuation">.</span>pump│ │ │ ├── gtest-<span class="token keyword">param</span><span class="token operator">-</span>util<span class="token punctuation">.</span>h│ │ │ ├── gtest-port<span class="token punctuation">.</span>h│ │ │ ├── gtest-string<span class="token punctuation">.</span>h│ │ │ ├── gtest-tuple<span class="token punctuation">.</span>h│ │ │ ├── gtest-tuple<span class="token punctuation">.</span>h<span class="token punctuation">.</span>pump│ │ │ ├── gtest-<span class="token function">type</span><span class="token operator">-</span>util<span class="token punctuation">.</span>h│ │ │ └── gtest-<span class="token function">type</span><span class="token operator">-</span>util<span class="token punctuation">.</span>h<span class="token punctuation">.</span>pump│ │ └── src│ │ ├── gtest-all<span class="token punctuation">.</span>cc│ │ ├── gtest<span class="token punctuation">.</span>cc│ │ ├── gtest-death-test<span class="token punctuation">.</span>cc│ │ ├── gtest-filepath<span class="token punctuation">.</span>cc│ │ ├── 
gtest-internal-inl<span class="token punctuation">.</span>h│ │ ├── gtest_main<span class="token punctuation">.</span>cc│ │ ├── gtest-port<span class="token punctuation">.</span>cc│ │ ├── gtest-printers<span class="token punctuation">.</span>cc│ │ ├── gtest-<span class="token function">test-part</span><span class="token punctuation">.</span>cc│ │ └── gtest-typed-test<span class="token punctuation">.</span>cc│ └── limonp│ ├── ArgvContext<span class="token punctuation">.</span>hpp│ ├── BlockingQueue<span class="token punctuation">.</span>hpp│ ├── BoundedBlockingQueue<span class="token punctuation">.</span>hpp│ ├── BoundedQueue<span class="token punctuation">.</span>hpp│ ├── Closure<span class="token punctuation">.</span>hpp│ ├── Colors<span class="token punctuation">.</span>hpp│ ├── Condition<span class="token punctuation">.</span>hpp│ ├── Config<span class="token punctuation">.</span>hpp│ ├── FileLock<span class="token punctuation">.</span>hpp│ ├── ForcePublic<span class="token punctuation">.</span>hpp│ ├── LocalVector<span class="token punctuation">.</span>hpp│ ├── Logging<span class="token punctuation">.</span>hpp│ ├── Md5<span class="token punctuation">.</span>hpp│ ├── MutexLock<span class="token punctuation">.</span>hpp│ ├── NonCopyable<span class="token punctuation">.</span>hpp│ ├── StdExtension<span class="token punctuation">.</span>hpp│ ├── StringUtil<span class="token punctuation">.</span>hpp│ ├── Thread<span class="token punctuation">.</span>hpp│ └── ThreadPool<span class="token punctuation">.</span>hpp├── dict│ ├── hmm_model<span class="token punctuation">.</span>utf8│ ├── idf<span class="token punctuation">.</span>utf8│ ├── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>utf8│ ├── pos_dict│ │ ├── char_state_tab<span class="token punctuation">.</span>utf8│ │ ├── prob_emit<span class="token punctuation">.</span>utf8│ │ ├── prob_start<span class="token punctuation">.</span>utf8│ │ └── prob_trans<span class="token 
punctuation">.</span>utf8│ ├── README<span class="token punctuation">.</span>md│ ├── stop_words<span class="token punctuation">.</span>utf8│ └── user<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>utf8├── include│ └── cppjieba│ ├── DictTrie<span class="token punctuation">.</span>hpp│ ├── FullSegment<span class="token punctuation">.</span>hpp│ ├── HMMModel<span class="token punctuation">.</span>hpp│ ├── HMMSegment<span class="token punctuation">.</span>hpp│ ├── Jieba<span class="token punctuation">.</span>hpp│ ├── KeywordExtractor<span class="token punctuation">.</span>hpp│ ├── limonp│ │ ├── ArgvContext<span class="token punctuation">.</span>hpp│ │ ├── BlockingQueue<span class="token punctuation">.</span>hpp│ │ ├── BoundedBlockingQueue<span class="token punctuation">.</span>hpp│ │ ├── BoundedQueue<span class="token punctuation">.</span>hpp│ │ ├── Closure<span class="token punctuation">.</span>hpp│ │ ├── Colors<span class="token punctuation">.</span>hpp│ │ ├── Condition<span class="token punctuation">.</span>hpp│ │ ├── Config<span class="token punctuation">.</span>hpp│ │ ├── FileLock<span class="token punctuation">.</span>hpp│ │ ├── ForcePublic<span class="token punctuation">.</span>hpp│ │ ├── LocalVector<span class="token punctuation">.</span>hpp│ │ ├── Logging<span class="token punctuation">.</span>hpp│ │ ├── Md5<span class="token punctuation">.</span>hpp│ │ ├── MutexLock<span class="token punctuation">.</span>hpp│ │ ├── NonCopyable<span class="token punctuation">.</span>hpp│ │ ├── StdExtension<span class="token punctuation">.</span>hpp│ │ ├── StringUtil<span class="token punctuation">.</span>hpp│ │ ├── Thread<span class="token punctuation">.</span>hpp│ │ └── ThreadPool<span class="token punctuation">.</span>hpp│ ├── MixSegment<span class="token punctuation">.</span>hpp│ ├── MPSegment<span class="token punctuation">.</span>hpp│ ├── PosTagger<span class="token punctuation">.</span>hpp│ ├── PreFilter<span class="token 
punctuation">.</span>hpp│ ├── QuerySegment<span class="token punctuation">.</span>hpp│ ├── SegmentBase<span class="token punctuation">.</span>hpp│ ├── SegmentTagged<span class="token punctuation">.</span>hpp│ ├── TextRankExtractor<span class="token punctuation">.</span>hpp│ ├── Trie<span class="token punctuation">.</span>hpp│ └── Unicode<span class="token punctuation">.</span>hpp├── README_EN<span class="token punctuation">.</span>md├── README<span class="token punctuation">.</span>md└── test├── CMakeLists<span class="token punctuation">.</span>txt├── demo<span class="token punctuation">.</span><span class="token function">cpp</span>├── load_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── testdata│ ├── curl<span class="token punctuation">.</span>res│ ├── extra_dict│ │ └── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>small<span class="token punctuation">.</span>utf8│ ├── gbk_dict│ │ ├── hmm_model<span class="token punctuation">.</span>gbk│ │ └── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>gbk│ ├── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>0<span class="token punctuation">.</span>1<span class="token punctuation">.</span>utf8│ ├── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>0<span class="token punctuation">.</span>utf8│ ├── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>1<span class="token punctuation">.</span>utf8│ ├── jieba<span class="token punctuation">.</span>dict<span class="token punctuation">.</span>2<span class="token punctuation">.</span>utf8│ ├── load_test<span class="token punctuation">.</span>urls│ ├── review<span class="token punctuation">.</span>100│ ├── review<span class="token punctuation">.</span>100<span class="token punctuation">.</span>res│ ├── server<span class="token 
punctuation">.</span>conf│ ├── testlines<span class="token punctuation">.</span>gbk│ ├── testlines<span class="token punctuation">.</span>utf8│ ├── userdict<span class="token punctuation">.</span>2<span class="token punctuation">.</span>utf8│ ├── userdict<span class="token punctuation">.</span>english│ ├── userdict<span class="token punctuation">.</span>utf8│ └── weicheng<span class="token punctuation">.</span>utf8└── unittest├── CMakeLists<span class="token punctuation">.</span>txt├── gtest_main<span class="token punctuation">.</span><span class="token function">cpp</span>├── jieba_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── keyword_extractor_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── pos_tagger_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── pre_filter_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── segments_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── textrank_test<span class="token punctuation">.</span><span class="token function">cpp</span>├── trie_test<span class="token punctuation">.</span><span class="token function">cpp</span>└── unicode_test<span class="token punctuation">.</span><span class="token function">cpp</span>
17 directories<span class="token punctuation">,</span> 136 files
<span class="token namespace">[xiaomaker@VM-28-13-centos jieba]</span>$</code></pre><p><strong>We only need to care about two directories:</strong></p><ul><li><strong>cppjieba/include</strong>: the header files.</li><li><strong>cppjieba/dict</strong>: the dictionaries.</li></ul><p>(2) Next, let's see how cppjieba is used. The test directory contains a demo.cpp file that we can experiment with.</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ <span class="token function">pwd</span>
<span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/test
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ 
ll
total 20
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker  148 Feb 14 14:02 CMakeLists<span class="token punctuation">.</span>txt
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 2797 Feb 14 14:02 demo<span class="token punctuation">.</span><span class="token function">cpp</span>
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 1532 Feb 14 14:02 load_test<span class="token punctuation">.</span><span class="token function">cpp</span>
drwxrwxr-x 4 xiaomaker xiaomaker 4096 Feb 14 14:02 testdata
drwxrwxr-x 2 xiaomaker xiaomaker 4096 Feb 14 14:02 unittest
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$</code></pre><p><strong>① Compiling demo.cpp directly does not work; the compiler reports an error.</strong></p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ g+<span class="token operator">+</span> demo<span class="token punctuation">.</span><span class="token function">cpp</span>
demo<span class="token punctuation">.</span><span class="token function">cpp</span>:1:30: fatal error: cppjieba/Jieba<span class="token punctuation">.</span>hpp: No such file or directory
 <span class="token comment">#include "cppjieba/Jieba.hpp"</span>
^~~~~~~~~~~~~~~~~~~~ ^
compilation terminated<span class="token punctuation">.</span>
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$</code></pre><p>The error occurs because, from the test directory, the header and dictionary paths that demo.cpp expects do not exist. Creating symbolic links in the test directory fixes this. The link targets are wherever you cloned cppjieba.</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ ln <span class="token operator">-</span>s ~<span class="token operator">/</span>code_cpp/jieba/cppjieba/include/ include
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ ln <span class="token operator">-</span>s ~<span class="token operator">/</span>code_cpp/jieba/cppjieba/dict/ dict
<span class="token 
namespace">[xiaomaker@VM-28-13-centos test]</span>$ ll
total 20
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker  148 Feb 14 14:02 CMakeLists<span class="token punctuation">.</span>txt
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 2853 Feb 24 12:16 demo<span class="token punctuation">.</span><span class="token function">cpp</span>
lrwxrwxrwx 1 xiaomaker xiaomaker 45 Feb 24 12:18 dict <span class="token operator">-</span>&gt; <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/dict/
lrwxrwxrwx 1 xiaomaker xiaomaker 48 Feb 24 12:17 include <span class="token operator">-</span>&gt; <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/include/
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 1532 Feb 14 14:02 load_test<span class="token punctuation">.</span><span class="token function">cpp</span>
drwxrwxr-x 4 xiaomaker xiaomaker 4096 Feb 14 14:02 testdata
drwxrwxr-x 2 xiaomaker xiaomaker 4096 Feb 14 14:02 unittest
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$</code></pre><p><strong>② Next, update the include paths at the top of demo.cpp so that they go through the new links.</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/744cd569fcb749a8a1e4c164a691f10e.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/744cd569fcb749a8a1e4c164a691f10e.png" /></p><p><strong>③ Compiling again still fails, this time with a different error.</strong></p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ g+<span class="token operator">+</span> demo<span class="token punctuation">.</span><span class="token 
function">cpp</span>
In file included <span class="token keyword">from</span> include/cppjieba/Jieba<span class="token punctuation">.</span>hpp:4<span class="token punctuation">,</span>
                 <span class="token keyword">from</span> demo<span class="token punctuation">.</span><span class="token function">cpp</span>:1:
include/cppjieba/QuerySegment<span class="token punctuation">.</span>hpp:7:10: fatal error: limonp/Logging<span class="token punctuation">.</span>hpp: No such file or directory
 <span class="token comment">#include "limonp/Logging.hpp"</span>
^~~~~~~~~~~~~~~~~~~~
compilation terminated<span class="token punctuation">.</span></code></pre><p>This time the compiler cannot find limonp/Logging.hpp. The fix is to copy the deps/limonp directory into include/cppjieba.</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos cppjieba]</span>$ <span class="token function">cp</span> deps/limonp/ include/cppjieba/ <span class="token operator">-</span>rf</code></pre><p><strong>④ Now the demo compiles and runs:</strong></p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ g+<span class="token operator">+</span> demo<span class="token punctuation">.</span><span class="token function">cpp</span>
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ ll
total 460
<span class="token operator">-</span>rwxrwxr-x 1 xiaomaker xiaomaker 447008 Feb 24 12:46 a<span class="token punctuation">.</span>out
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker  148 Feb 14 14:02 CMakeLists<span class="token punctuation">.</span>txt
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 2861 Feb 24 12:20 demo<span class="token punctuation">.</span><span class="token function">cpp</span>
lrwxrwxrwx 1 xiaomaker xiaomaker 45 Feb 24 12:18 dict <span class="token operator">-</span>&gt; <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/dict/
lrwxrwxrwx 1 xiaomaker xiaomaker 48 Feb 24 12:17 include <span class="token operator">-</span>&gt; <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/include/
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 1532 Feb 14 14:02 load_test<span class="token punctuation">.</span><span class="token function">cpp</span>
drwxrwxr-x 4 xiaomaker xiaomaker 4096 Feb 14 14:02 testdata
drwxrwxr-x 2 xiaomaker xiaomaker 4096 Feb 14 14:02 unittest
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$ <span class="token punctuation">.</span><span class="token operator">/</span>a<span class="token punctuation">.</span>out
他来到了网易杭研大厦
<span class="token namespace">[demo]</span> Cut With HMM
他<span class="token operator">/</span>来到<span class="token operator">/</span>了<span class="token operator">/</span>网易<span class="token operator">/</span>杭研<span class="token operator">/</span>大厦
<span class="token namespace">[demo]</span> Cut Without HMM
他<span class="token operator">/</span>来到<span class="token operator">/</span>了<span class="token operator">/</span>网易<span class="token operator">/</span>杭<span class="token operator">/</span>研<span class="token operator">/</span>大厦
我来到北京清华大学
<span class="token namespace">[demo]</span> CutAll
我<span class="token operator">/</span>来到<span class="token operator">/</span>北京<span class="token operator">/</span>清华<span class="token operator">/</span>清华大学<span class="token operator">/</span>华大<span class="token operator">/</span>大学
小明硕士毕业于中国科学院计算所,后在日本京都大学深造
<span class="token namespace">[demo]</span> CutForSearch
小明<span class="token operator">/</span>硕士<span class="token operator">/</span>毕业<span class="token operator">/</span>于<span class="token operator">/</span>中国<span class="token operator">/</span>科学<span class="token operator">/</span>学院<span class="token operator">/</span>科学院<span class="token 
operator">/</span>中国科学院<span class="token operator">/</span>计算<span class="token operator">/</span>计算所<span class="token operator">/</span>,<span class="token operator">/</span>后<span class="token operator">/</span>在<span class="token operator">/</span>日本<span class="token operator">/</span>京都<span class="token operator">/</span>大学<span class="token operator">/</span>日本京都大学<span class="token operator">/</span>深造
<span class="token namespace">[xiaomaker@VM-28-13-centos test]</span>$</code></pre><h3>6.2 Index Skeleton</h3><p>(1) After creating index.hpp, we need to build the forward and inverted indexes and also provide lookup interfaces. The overall skeleton of index.hpp is as follows:</p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span>
<span class="token comment">// standard headers required by the types used below</span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;iostream&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;vector&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;fstream&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;unordered_map&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"util.hpp"</span></span>

<span class="token keyword">namespace</span> ns_index
<span class="token punctuation">{</span>
    <span class="token keyword">struct</span> <span class="token class-name">DocInfo</span>
    <span class="token punctuation">{</span>
        std<span class="token double-colon punctuation">::</span>string title<span class="token punctuation">;</span>   <span class="token comment">// document title</span>
        std<span class="token double-colon punctuation">::</span>string content<span class="token punctuation">;</span> <span class="token comment">// document content</span>
        std<span class="token double-colon punctuation">::</span>string url<span class="token punctuation">;</span>     <span class="token comment">// URL of this document on the official site</span>
        <span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">;</span>     <span class="token comment">// document id</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>

    <span class="token keyword">struct</span> <span class="token class-name">InvertedElem</span>
    <span class="token punctuation">{</span>
        <span class="token keyword">int</span> doc_id<span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>string word<span class="token punctuation">;</span>
        <span class="token keyword">int</span> weight<span class="token punctuation">;</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>

    <span class="token keyword">typedef</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator">&lt;</span>InvertedElem<span class="token operator">&gt;</span> InvertedList_t<span class="token punctuation">;</span> <span class="token comment">// inverted list (posting list)</span>

    <span class="token keyword">class</span> <span class="token class-name">Index</span>
    <span class="token punctuation">{</span>
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token function">Index</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span>

        DocInfo<span class="token operator">*</span> <span class="token function">GetForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">)</span> <span class="token comment">// forward index: look up a document by doc_id</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        InvertedList_t<span class="token operator">*</span> <span class="token function">GetInveredList</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&amp;</span> word<span class="token punctuation">)</span> <span class="token comment">// inverted index: get the inverted list for a keyword</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        <span class="token comment">// build the forward and inverted indexes from the tag-stripped, formatted documents</span>
        <span class="token keyword">bool</span> <span class="token function">BuildIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&amp;</span> input<span class="token punctuation">)</span> <span class="token comment">// input: path of the data file produced by the parser</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        <span class="token operator">~</span><span class="token function">Index</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span>

    <span class="token keyword">private</span><span class="token operator">:</span>
        DocInfo<span class="token operator">*</span> <span class="token function">BuildForwardIndex</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&amp;</span> line<span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        <span class="token keyword">bool</span> <span class="token function">BuildInveredIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> DocInfo<span class="token operator">&amp;</span> doc<span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>

        <span class="token comment">// the forward index is an array; the array subscript is the document id</span>
        std<span class="token double-colon punctuation">::</span>vector<span class="token operator">&lt;</span>DocInfo<span class="token operator">&gt;</span> forward_index<span class="token punctuation">;</span> <span class="token comment">// forward index</span>
        <span class="token comment">// the inverted index maps one keyword to a group of InvertedElem</span>
        std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator">&lt;</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span> InvertedList_t<span class="token operator">&gt;</span> inverted_index<span class="token punctuation">;</span> <span class="token comment">// inverted index</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>Below we implement the functions in index.hpp one by one.</p><h3>6.3 Implementing BuildIndex</h3><pre><code class="prism language-cpp"><span class="token keyword">bool</span> <span class="token function">BuildIndex</span><span class="token punctuation">(</span><span class="token 
keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> input<span class="token punctuation">)</span><span class="token punctuation">;</span></code></pre><p>(1)根据我们已经处理好的数据,通过它来构建索引。</p><pre><code class="prism language-cpp"><span class="token keyword">bool</span> <span class="token function">BuildIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> input<span class="token punctuation">)</span><span class="token comment">//parse处理完毕的数据交给我</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>fstream <span class="token function">in</span><span class="token punctuation">(</span>input<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>in <span class="token operator">|</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>binary<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token operator">!</span>in<span class="token punctuation">.</span><span class="token function">is_open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"sorry"</span> <span class="token operator">+</span> input <span class="token operator">+</span> <span class="token string">"error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token 
comment">//std::cerr << "sorry" << input << "error" << std::endl;</span><span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span><span class="token punctuation">}</span>std<span class="token double-colon punctuation">::</span>string line<span class="token punctuation">;</span><span class="token keyword">int</span> count <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span><span class="token keyword">while</span> <span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">getline</span><span class="token punctuation">(</span>in<span class="token punctuation">,</span> line<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span>DocInfo<span class="token operator">*</span> doc <span class="token operator">=</span> <span class="token function">BuildForwardIndex</span><span class="token punctuation">(</span>line<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>doc <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>WARNING<span class="token punctuation">,</span> <span class="token string">"build"</span> <span class="token operator">+</span> line <span class="token operator">+</span> <span class="token string">"error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << "build" << line << "error" << std::endl;</span><span class="token keyword">continue</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token 
function">BuildInveredIndex</span><span class="token punctuation">(</span><span class="token operator">*</span>doc<span class="token punctuation">)</span><span class="token punctuation">;</span>
        count<span class="token operator">++</span><span class="token punctuation">;</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>count <span class="token operator">%</span> <span class="token number">50</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            <span class="token function">LOG</span><span class="token punctuation">(</span>NORMAL<span class="token punctuation">,</span> <span class="token string">"Documents indexed so far: "</span> <span class="token operator">+</span> std<span class="token double-colon punctuation">::</span><span class="token function">to_string</span><span class="token punctuation">(</span>count<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token comment">//std::cout << "Documents indexed so far: " << count << std::endl;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p><strong>(2) Code walkthrough:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/fccc5bd9f0f84846911d471ce7b3e6b5.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/fccc5bd9f0f84846911d471ce7b3e6b5.png" /></p><h4>6.3.1 
Building the forward index: BuildForwardIndex</h4><p>(1) This is straightforward: a document's position in the array is its document ID, so we only need to pack each processed document into a struct and append it to the array.</p><pre><code class="prism language-cpp">DocInfo<span class="token operator">*</span> <span class="token function">BuildForwardIndex</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> line<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
    <span class="token comment">//1. Parse line by splitting it on the separator</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> results<span class="token punctuation">;</span>
    std<span class="token double-colon punctuation">::</span>string sep <span class="token operator">=</span> <span class="token string">"\3"</span><span class="token punctuation">;</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">StringUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Cutstring</span><span class="token punctuation">(</span>line<span class="token punctuation">,</span> results<span class="token punctuation">,</span> sep<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>results<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">3</span><span class="token punctuation">)</span> <span class="token comment">// anything other than 3 fields means the split went wrong</span>
    <span class="token punctuation">{</span>
        <span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"build 
error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">//2. Fill the fields into a DocInfo</span>
    DocInfo doc<span class="token punctuation">;</span>
    doc<span class="token punctuation">.</span>title <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token comment">//title</span>
    doc<span class="token punctuation">.</span>content <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token comment">//content</span>
    doc<span class="token punctuation">.</span>url <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span> <span class="token comment">//url</span>
    doc<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// document id</span>
    <span class="token comment">//3. 
Append it to the vector</span>
    forward_index<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>doc<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">.</span><span class="token function">back</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>(2) Put the Cutstring function in the utility header (util.hpp). split comes from the Boost string algorithms library (boost/algorithm/string.hpp).</p><pre><code class="prism language-cpp"><span class="token keyword">namespace</span> ns_util
<span class="token punctuation">{</span>
    <span class="token keyword">class</span> <span class="token class-name">StringUtil</span>
    <span class="token punctuation">{</span>
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">Cutstring</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> target<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span><span class="token operator">&</span> out<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string sep<span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            boost<span class="token double-colon 
punctuation">::</span><span class="token function">split</span><span class="token punctuation">(</span>out<span class="token punctuation">,</span> target<span class="token punctuation">,</span> boost<span class="token double-colon punctuation">::</span><span class="token function">is_any_of</span><span class="token punctuation">(</span>sep<span class="token punctuation">)</span><span class="token punctuation">,</span> boost<span class="token double-colon punctuation">::</span>token_compress_on<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><h4>6.3.2 Building the inverted index: BuildInveredIndex</h4><p>(1) Next we build the inverted index from the DocInfo struct that was just filled in, which requires word segmentation. <font color="red">That means bringing in the jieba library to do the segmentation.</font> Use symbolic links to link the downloaded jieba headers and dictionaries into the project:</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ ln <span class="token operator">-</span>s <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/include/cppjieba cppjieba
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ ln <span class="token operator">-</span>s <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/dict/ dict
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ ll
total 24
lrwxrwxrwx 1 xiaomaker xiaomaker 57 Feb 14 14:04 cppjieba <span class="token operator">-</span>> <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/include/cppjieba/
drwxrwxr-x 4 xiaomaker xiaomaker 4096 Feb 13 18:10 <span class="token keyword">data</span>
lrwxrwxrwx 1 xiaomaker xiaomaker 45 Feb 14 14:08 dict <span class="token operator">-</span>> <span class="token operator">/</span>home/xiaomaker/code_cpp/jieba/cppjieba/dict/
<span class="token operator">-</span>rw-rw-r-<span class="token 
operator">-</span> 1 xiaomaker xiaomaker 6101 Feb 15 18:24 index<span class="token punctuation">.</span>hpp
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker  379 Feb 15 14:21 Makefile
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 6963 Feb 15 18:32 parser<span class="token punctuation">.</span><span class="token function">cpp</span>
<span class="token operator">-</span>rw-rw-r-<span class="token operator">-</span> 1 xiaomaker xiaomaker 1842 Feb 15 18:38 util<span class="token punctuation">.</span>hpp
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$</code></pre><p>(2) With the links in place, we can write the word-segmentation utility.</p><pre><code class="prism language-cpp"><span class="token keyword">namespace</span> ns_util
<span class="token punctuation">{</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/jieba.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> HMM_PATH <span class="token operator">=</span> <span class="token string">"./dict/hmm_model.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> USER_DICT_PATH <span class="token operator">=</span> <span class="token string">"./dict/user.dict.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> IDF_PATH <span class="token operator">=</span> <span 
class="token string">"./dict/idf.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">const</span> <span class="token keyword">char</span> <span class="token operator">*</span><span class="token keyword">const</span> STOP_WORD_PATH <span class="token operator">=</span> <span class="token string">"./dict/stop_words.utf8"</span><span class="token punctuation">;</span>
    <span class="token keyword">class</span> <span class="token class-name">JiebaUtil</span>
    <span class="token punctuation">{</span>
    <span class="token keyword">private</span><span class="token operator">:</span>
        <span class="token keyword">static</span> cppjieba<span class="token double-colon punctuation">::</span>Jieba jieba<span class="token punctuation">;</span>
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">Split</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> src<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span><span class="token operator">&</span> out<span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            jieba<span class="token punctuation">.</span><span class="token function">CutForSearch</span><span class="token punctuation">(</span>src<span class="token punctuation">,</span> out<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// search-mode segmentation; other modes exist, but this is the only one we need here</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    cppjieba<span class="token double-colon 
punctuation">::</span>Jieba <span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">jieba</span><span class="token punctuation">(</span>DICT_PATH<span class="token punctuation">,</span> HMM_PATH<span class="token punctuation">,</span> USER_DICT_PATH<span class="token punctuation">,</span> IDF_PATH<span class="token punctuation">,</span> STOP_WORD_PATH<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p><strong>(3) Implementation:</strong></p><pre><code class="prism language-cpp"><span class="token keyword">bool</span> <span class="token function">BuildInveredIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> DocInfo<span class="token operator">&</span> doc<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
    <span class="token keyword">struct</span> <span class="token class-name">word_cnt</span> <span class="token comment">// counts how often a word occurs</span>
    <span class="token punctuation">{</span>
        <span class="token keyword">int</span> title_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// occurrences in the title</span>
        <span class="token keyword">int</span> content_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">// occurrences in the content</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token comment">// segmentation: title</span>
    std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span> word_cnt<span class="token operator">></span> word_map<span class="token punctuation">;</span> <span class="token 
comment">// temporary map that holds the word frequencies</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> title_word<span class="token punctuation">;</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Split</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>title<span class="token punctuation">,</span> title_word<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// count word frequencies in the title</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> s <span class="token operator">:</span> title_word<span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// normalize to lowercase</span>
        word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>title_cnt<span class="token operator">++</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token comment">// segmentation: content</span>
    std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> content_word<span class="token punctuation">;</span>
    ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon 
punctuation">::</span><span class="token function">Split</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>content<span class="token punctuation">,</span> content_word<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// count word frequencies in the content</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> s <span class="token operator">:</span> content_word<span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// normalize to lowercase</span>
        word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>content_cnt<span class="token operator">++</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> word_pair <span class="token operator">:</span> word_map<span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        InvertedElem item<span class="token punctuation">;</span>
        item<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> doc<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>
        item<span class="token punctuation">.</span>word <span class="token operator">=</span> word_pair<span class="token punctuation">.</span>first<span class="token punctuation">;</span>
        item<span class="token punctuation">.</span>weight <span class="token operator">=</span> word_pair<span class="token 
punctuation">.</span>second<span class="token punctuation">.</span>title_cnt <span class="token operator">*</span> X <span class="token operator">+</span> word_pair<span class="token punctuation">.</span>second<span class="token punctuation">.</span>content_cnt <span class="token operator">*</span> Y<span class="token punctuation">;</span>
        InvertedList_t<span class="token operator">&</span> inverted_list <span class="token operator">=</span> inverted_index<span class="token punctuation">[</span>word_pair<span class="token punctuation">.</span>first<span class="token punctuation">]</span><span class="token punctuation">;</span>
        inverted_list<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>item<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p><strong>(4) Code walkthrough:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/330bb25dab6c446a8266e0ae0b17ff30.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/330bb25dab6c446a8266e0ae0b17ff30.png" /></p><p><strong>(5) Weight calculation:</strong><br /> <strong>What is a weight?</strong> A word that is searched for more often is considered to carry more weight, and within a single document, the more times a keyword appears, the higher its weight. Here we keep the calculation simple: the weight is the number of occurrences in the title times X plus the number of occurrences in the content times Y.</p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span 
class="token directive keyword">define</span> <span class="token macro-name">X</span> <span class="token expression"><span class="token number">10</span> </span><span class="token comment">// each occurrence in the title counts 10</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">Y</span> <span class="token expression"><span class="token number">1</span></span> <span class="token comment">// each occurrence in the content counts 1</span></span></code></pre><p>What is the weight used for? When we search, a single keyword can map to several documents, so we can place the documents with the higher weights first.<br /> <strong>Our structure now looks like this:</strong><br /> <noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/435e0b24f1e84ee6bbff8e5cdf6fc90c.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/435e0b24f1e84ee6bbff8e5cdf6fc90c.png" /></p><h3>6.4 The GetForwardIndex function</h3><p><strong>(1) Look up a document's content by its id:</strong></p><pre><code class="prism language-cpp">DocInfo<span class="token operator">*</span> <span class="token function">GetForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">)</span> <span class="token comment">// forward index: find the document content by doc_id</span>
<span class="token punctuation">{</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>doc_id <span class="token operator">>=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        <span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span 
class="token punctuation">,</span> <span class="token string">"doc_id out range. error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">//std::cerr << "doc_id out range. error" << std::endl;</span>
        <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">[</span>doc_id<span class="token punctuation">]</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><h3>6.5 The GetInvertedList function</h3><p><strong>(1) Get the inverted list (posting list) for a keyword:</strong></p><pre><code class="prism language-cpp">InvertedList_t<span class="token operator">*</span> <span class="token function">GetInveredList</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> word<span class="token punctuation">)</span> <span class="token comment">// inverted index: get the inverted list for a keyword</span>
<span class="token punctuation">{</span>
    <span class="token keyword">auto</span> iter <span class="token operator">=</span> inverted_index<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span>iter <span class="token operator">==</span> inverted_index<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
    <span class="token punctuation">{</span>
        <span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span 
class="token string">"no have InvertedList_t"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">//std::cerr << word << "no have InvertedList_t" << std::endl;</span>
        <span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> <span class="token operator">&</span><span class="token punctuation">(</span>iter<span class="token operator">-></span>second<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>With all of the above done, we can <font color="red">turn index into a singleton.</font></p><h3>6.6 Making index a singleton</h3><p>We now make index a singleton. The Boost search engine never needs more than one Index object: a single index is enough to serve every lookup. Building an index is also very expensive, because every page has to be segmented, counted, filled into structs, and inserted, so building it more than once would waste a great deal of time.</p><p><font color="red">Complete index code (singleton):</font></p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;iostream&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;vector&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;fstream&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive 
keyword">include</span> <span class="token string">&lt;unordered_map&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;mutex&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"util.hpp"</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"log.hpp"</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">X</span> <span class="token expression"><span class="token number">10</span></span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">Y</span> <span class="token expression"><span class="token number">1</span></span></span>
<span class="token keyword">namespace</span> ns_index
<span class="token punctuation">{</span>
    <span class="token keyword">struct</span> <span class="token class-name">DocInfo</span>
    <span class="token punctuation">{</span>
        std<span class="token double-colon punctuation">::</span>string title<span class="token punctuation">;</span> <span class="token comment">// document title</span>
        std<span class="token double-colon punctuation">::</span>string content<span class="token punctuation">;</span> <span class="token comment">// document content</span>
        std<span class="token double-colon punctuation">::</span>string url<span class="token punctuation">;</span> <span class="token comment">// the document's url on the official site</span>
        <span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">;</span> <span class="token comment">// document id</span>
    <span class="token punctuation">}</span><span class="token 
punctuation">;</span>
    <span class="token keyword">struct</span> <span class="token class-name">InvertedElem</span>
    <span class="token punctuation">{</span>
        <span class="token keyword">int</span> doc_id<span class="token punctuation">;</span>
        std<span class="token double-colon punctuation">::</span>string word<span class="token punctuation">;</span>
        <span class="token keyword">int</span> weight<span class="token punctuation">;</span>
    <span class="token punctuation">}</span><span class="token punctuation">;</span>
    <span class="token keyword">typedef</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>InvertedElem<span class="token operator">></span> InvertedList_t<span class="token punctuation">;</span> <span class="token comment">// inverted list (posting list)</span>
    <span class="token keyword">class</span> <span class="token class-name">Index</span>
    <span class="token punctuation">{</span>
    <span class="token keyword">private</span><span class="token operator">:</span>
        <span class="token function">Index</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span>
        <span class="token function">Index</span><span class="token punctuation">(</span><span class="token keyword">const</span> Index<span class="token operator">&</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token keyword">delete</span><span class="token punctuation">;</span>
        Index<span class="token operator">&</span> <span class="token keyword">operator</span><span class="token operator">=</span><span class="token punctuation">(</span><span class="token keyword">const</span> Index<span class="token operator">&</span><span class="token punctuation">)</span> <span class="token operator">=</span> <span class="token keyword">delete</span><span class="token punctuation">;</span>
        <span class="token operator">~</span><span class="token 
function">Index</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span>
        <span class="token keyword">static</span> Index<span class="token operator">*</span> instance<span class="token punctuation">;</span>
        <span class="token keyword">static</span> std<span class="token double-colon punctuation">::</span>mutex mtx<span class="token punctuation">;</span>
    <span class="token keyword">public</span><span class="token operator">:</span>
        <span class="token keyword">static</span> Index<span class="token operator">*</span> <span class="token function">GetInstance</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
        <span class="token punctuation">{</span>
            <span class="token keyword">if</span><span class="token punctuation">(</span>instance <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span>
            <span class="token punctuation">{</span>
                mtx<span class="token punctuation">.</span><span class="token function">lock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">// lock so that creation is safe across threads</span>
                <span class="token keyword">if</span><span class="token punctuation">(</span>instance <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span>
                <span class="token punctuation">{</span>
                    instance <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token function">Index</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
                mtx<span class="token punctuation">.</span><span class="token function">unlock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span 
class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">return</span> instance<span class="token punctuation">;</span><span class="token punctuation">}</span>DocInfo<span class="token operator">*</span> <span class="token function">GetForwardIndex</span><span class="token punctuation">(</span><span class="token keyword">uint64_t</span> doc_id<span class="token punctuation">)</span><span class="token comment">//正排索引:根据doc_id找到文档内容</span><span class="token punctuation">{</span><span class="token keyword">if</span><span class="token punctuation">(</span>doc_id <span class="token operator">>=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"doc_id out range. error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << "doc_id out range. 
error" << std::endl;</span><span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">[</span>doc_id<span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token punctuation">}</span>InvertedList_t<span class="token operator">*</span> <span class="token function">GetInveredList</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> word<span class="token punctuation">)</span><span class="token comment">//倒排索引:根据关键字获取倒排拉链</span><span class="token punctuation">{</span><span class="token keyword">auto</span> iter <span class="token operator">=</span> inverted_index<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>iter <span class="token operator">==</span> inverted_index<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"no have InvertedList_t"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << word << "no have InvertedList_t" << std::endl;</span><span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span><span 
class="token punctuation">}</span><span class="token keyword">return</span> <span class="token operator">&</span><span class="token punctuation">(</span>iter<span class="token operator">-></span>second<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//根据去标签,格式化之后的文档,构建正排索引和倒排索引 </span><span class="token keyword">bool</span> <span class="token function">BuildIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> input<span class="token punctuation">)</span><span class="token comment">//parse处理完毕的数据交给我</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>fstream <span class="token function">in</span><span class="token punctuation">(</span>input<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>in <span class="token operator">|</span> std<span class="token double-colon punctuation">::</span>ios<span class="token double-colon punctuation">::</span>binary<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span><span class="token operator">!</span>in<span class="token punctuation">.</span><span class="token function">is_open</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"sorry"</span> <span class="token operator">+</span> input <span class="token operator">+</span> <span class="token string">"error"</span><span 
class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << "sorry" << input << "error" << std::endl;</span><span class="token keyword">return</span> <span class="token boolean">false</span><span class="token punctuation">;</span><span class="token punctuation">}</span>std<span class="token double-colon punctuation">::</span>string line<span class="token punctuation">;</span><span class="token keyword">int</span> count <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span><span class="token keyword">while</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">getline</span><span class="token punctuation">(</span>in<span class="token punctuation">,</span> line<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span>DocInfo<span class="token operator">*</span> doc <span class="token operator">=</span> <span class="token function">BuildForwardIndex</span><span class="token punctuation">(</span>line<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>doc <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>WARNING<span class="token punctuation">,</span> <span class="token string">"build"</span> <span class="token operator">+</span> line <span class="token operator">+</span> <span class="token string">"error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cerr << "build" << line << "error" << std::endl;</span><span class="token keyword">continue</span><span 
class="token punctuation">;</span><span class="token punctuation">}</span><span class="token function">BuildInveredIndex</span><span class="token punctuation">(</span><span class="token operator">*</span>doc<span class="token punctuation">)</span><span class="token punctuation">;</span>count<span class="token operator">++</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>count <span class="token operator">%</span> <span class="token number">50</span> <span class="token operator">==</span> <span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>NORMAL<span class="token punctuation">,</span> <span class="token string">"当前已经建立索引文档: "</span> <span class="token operator">+</span> std<span class="token double-colon punctuation">::</span><span class="token function">to_string</span><span class="token punctuation">(</span>count<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cout << "当前已经建立索引文档:" << count << std::endl;</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">private</span><span class="token operator">:</span>DocInfo<span class="token operator">*</span> <span class="token function">BuildForwardIndex</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> line<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">//1. 
解析line, 字符串切分</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> results<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>string sep <span class="token operator">=</span> <span class="token string">"\3"</span><span class="token punctuation">;</span>ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">StringUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Cutstring</span><span class="token punctuation">(</span>line<span class="token punctuation">,</span> results<span class="token punctuation">,</span> sep<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>results<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token punctuation">,</span> <span class="token string">"build error"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//2. 
字符串进行填充到DocInfo</span>DocInfo doc<span class="token punctuation">;</span>doc<span class="token punctuation">.</span>title <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token comment">//title</span>doc<span class="token punctuation">.</span>content <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token comment">//content</span>doc<span class="token punctuation">.</span>url <span class="token operator">=</span> results<span class="token punctuation">[</span><span class="token number">2</span><span class="token punctuation">]</span><span class="token punctuation">;</span><span class="token comment">//url</span>doc<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> forward_index<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">//文档id</span><span class="token comment">//3. 
插入到vector当中</span>forward_index<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>doc<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token operator">&</span>forward_index<span class="token punctuation">.</span><span class="token function">back</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">bool</span> <span class="token function">BuildInveredIndex</span><span class="token punctuation">(</span><span class="token keyword">const</span> DocInfo<span class="token operator">&</span> doc<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">struct</span> <span class="token class-name">word_cnt</span><span class="token comment">//统计词汇出现的次数</span><span class="token punctuation">{</span><span class="token keyword">int</span> title_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">//标题出现的次数</span><span class="token keyword">int</span> content_cnt <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span> <span class="token comment">//内容出现的次数</span><span class="token punctuation">}</span><span class="token punctuation">;</span><span class="token comment">//分词---标题</span>std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span> word_cnt<span 
class="token operator">></span> word_map<span class="token punctuation">;</span><span class="token comment">//统计暂存词频率的映射表</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> title_word<span class="token punctuation">;</span>ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Split</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>title<span class="token punctuation">,</span> title_word<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//对标题词频进行统计</span><span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> s <span class="token operator">:</span> title_word<span class="token punctuation">)</span><span class="token punctuation">{</span>boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//统一转换成小写</span>word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>title_cnt<span class="token operator">++</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//分词---内容</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> content_word<span class="token punctuation">;</span>ns_util<span class="token double-colon 
punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Split</span><span class="token punctuation">(</span>doc<span class="token punctuation">.</span>content<span class="token punctuation">,</span> content_word<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//对内容词频进行统计</span><span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> s <span class="token operator">:</span> content_word<span class="token punctuation">)</span><span class="token punctuation">{</span>boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//统一转换成小写</span>word_map<span class="token punctuation">[</span>s<span class="token punctuation">]</span><span class="token punctuation">.</span>content_cnt<span class="token operator">++</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">for</span><span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> word_pair <span class="token operator">:</span> word_map<span class="token punctuation">)</span><span class="token punctuation">{</span>InvertedElem item<span class="token punctuation">;</span>item<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> doc<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>item<span class="token punctuation">.</span>word <span class="token operator">=</span> word_pair<span class="token punctuation">.</span>first<span class="token punctuation">;</span>item<span class="token 
punctuation">.</span>weight <span class="token operator">=</span> word_pair<span class="token punctuation">.</span>second<span class="token punctuation">.</span>title_cnt <span class="token operator">*</span> X <span class="token operator">+</span> word_pair<span class="token punctuation">.</span>second<span class="token punctuation">.</span>content_cnt <span class="token operator">*</span> Y<span class="token punctuation">;</span>InvertedList_t<span class="token operator">&</span> inverted_list <span class="token operator">=</span> inverted_index<span class="token punctuation">[</span>word_pair<span class="token punctuation">.</span>first<span class="token punctuation">]</span><span class="token punctuation">;</span>inverted_list<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>item<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">return</span> <span class="token boolean">true</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//正排索引使用数组,数组的下标就是文档id</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>DocInfo<span class="token operator">></span> forward_index<span class="token punctuation">;</span><span class="token comment">//正排索引</span><span class="token comment">//倒排索引一定是一个关键字和一组InvertedElem对应</span>std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token punctuation">,</span> InvertedList_t<span class="token operator">></span> inverted_index<span class="token 
punctuation">;</span><span class="token comment">//inverted index</span><span class="token punctuation">}</span><span class="token punctuation">;</span>Index<span class="token operator">*</span> Index<span class="token double-colon punctuation">::</span>instance <span class="token operator">=</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span> <span class="token comment">//initialize the singleton instance pointer</span>std<span class="token double-colon punctuation">::</span>mutex Index<span class="token double-colon punctuation">::</span>mtx<span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><h2>7. Search Engine Module</h2><p>Next we write the search module. Create a file named searcher.hpp.</p><p><strong>(1) Code skeleton of the search engine:</strong></p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"index.hpp"</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"util.hpp"</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"></span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string"></span></span><span class="token keyword">namespace</span> ns_searcher<span class="token punctuation">{</span><span class="token keyword">struct</span> <span class="token class-name">InvertedElemPrint</span><span class="token punctuation">{</span><span class="token keyword">uint64_t</span> doc_id <span class="token operator">=</span> <span class="token 
number">0</span><span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> word<span class="token punctuation">;</span><span class="token keyword">int</span> weight <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">;</span><span class="token keyword">class</span> <span class="token class-name">Searcher</span><span class="token punctuation">{</span><span class="token keyword">public</span><span class="token operator">:</span><span class="token function">Searcher</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token keyword">void</span> <span class="token function">InitSearcher</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> input<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">//1. Get or create the index object</span><span class="token comment">//2. 
Build the index through the index object</span><span class="token punctuation">}</span><span class="token comment">//query: the search keywords</span><span class="token comment">//json_string: the search results returned to the user's browser</span><span class="token keyword">void</span> <span class="token function">Search</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> query<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> json_string<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">//1. Tokenize the query (the search keywords)</span><span class="token comment">//2. Trigger: look up each token in the index</span><span class="token comment">//3. Merge and sort: collect the results and sort them by relevance in descending order</span><span class="token comment">//4. Build: construct the JSON string from the results (jsoncpp)</span><span class="token punctuation">}</span><span class="token operator">~</span><span class="token function">Searcher</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token keyword">private</span><span class="token operator">:</span>std<span class="token double-colon punctuation">::</span>string <span class="token function">GetDesc</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> html_content<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> word<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">//Find the first occurrence of word in html_content; take 50 bytes before it (or start from the beginning if there are fewer) and 100 bytes after it</span><span class="token comment">//1. 
Find the first occurrence</span><span class="token comment">//2. Compute start and end</span><span class="token comment">//3. Extract the substring</span><span class="token punctuation">}</span>ns_index<span class="token double-colon punctuation">::</span>Index<span class="token operator">*</span> index<span class="token punctuation">;</span><span class="token comment">//index used by the system for lookups</span><span class="token punctuation">}</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><h3>7.1 The InitSearcher Function</h3><p>(1) This function does the initialization, which consists of two steps:</p><ul><li>Obtain the index object.</li><li>Build the index through that object.</li></ul><pre><code class="prism language-cpp"><span class="token keyword">void</span> <span class="token function">InitSearcher</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> input<span class="token punctuation">)</span><span class="token comment">//input is the path of the data source</span><span class="token punctuation">{</span><span class="token comment">//1. Get or create the index object</span>index <span class="token operator">=</span> ns_index<span class="token double-colon punctuation">::</span><span class="token class-name">Index</span><span class="token double-colon punctuation">::</span><span class="token function">GetInstance</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token function">LOG</span><span class="token punctuation">(</span>NORMAL<span class="token punctuation">,</span> <span class="token string">"Got the singleton index successfully"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cout << "Got the singleton index successfully" << std::endl;</span><span class="token comment">//2. 
Build the index through the index object</span>index<span class="token operator">-></span><span class="token function">BuildIndex</span><span class="token punctuation">(</span>input<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token function">LOG</span><span class="token punctuation">(</span>NORMAL<span class="token punctuation">,</span> <span class="token string">"Built the forward and inverted indexes successfully"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cout << "Built the forward and inverted indexes successfully" << std::endl;</span><span class="token punctuation">}</span></code></pre><h3>7.2 The Search Function</h3><p>(1) This function implements the actual search. Given the text the user wants to look up, <strong>it proceeds as follows:</strong></p><ul><li>Tokenize the input and convert every token to lowercase.</li><li>For each token, fetch its inverted list and merge the entries of all those lists into a single list (entries with the same doc_id are combined and their weights summed).</li><li>Sort the merged list by weight in descending order.</li><li>Use the doc_id of each entry to fetch the document and build the JSON string.</li></ul><pre><code class="prism language-cpp"><span class="token comment">//query: the search keywords</span><span class="token comment">//json_string: the search results returned to the user's browser</span><span class="token keyword">void</span> <span class="token function">Search</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> query<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> json_string<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">//1. 
Tokenize the query (the search keywords)</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span> words<span class="token punctuation">;</span>ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">Split</span><span class="token punctuation">(</span>query<span class="token punctuation">,</span> words<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//2. Trigger: look up each token in the index</span><span class="token comment">//ns_index::InvertedList_t inverted_list_all;</span>std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>InvertedElemPrint<span class="token operator">></span> inverted_list_all<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span><span class="token keyword">uint64_t</span><span class="token punctuation">,</span> InvertedElemPrint<span class="token operator">></span> token_map<span class="token punctuation">;</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> s <span class="token operator">:</span> words<span class="token punctuation">)</span><span class="token punctuation">{</span>boost<span class="token double-colon punctuation">::</span><span class="token function">to_lower</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span>ns_index<span class="token double-colon punctuation">::</span>InvertedList_t<span class="token operator">*</span> inverted_list <span class="token operator">=</span> index<span class="token 
operator">-></span><span class="token function">GetInveredList</span><span class="token punctuation">(</span>s<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>inverted_list <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>WARNING<span class="token punctuation">,</span> <span class="token string">"no documents found for this keyword"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">continue</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//imperfect: the same document would appear once per matching token</span><span class="token comment">//inverted_list_all.insert(inverted_list_all.end(), inverted_list->begin(), inverted_list->end());</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> elem <span class="token operator">:</span> <span class="token operator">*</span>inverted_list<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">auto</span><span class="token operator">&</span> item <span class="token operator">=</span> token_map<span class="token punctuation">[</span>elem<span class="token punctuation">.</span>doc_id<span class="token punctuation">]</span><span class="token punctuation">;</span>item<span class="token punctuation">.</span>doc_id <span class="token operator">=</span> elem<span class="token punctuation">.</span>doc_id<span class="token punctuation">;</span>item<span class="token punctuation">.</span>weight <span class="token operator">+=</span> elem<span class="token punctuation">.</span>weight<span class="token punctuation">;</span>item<span 
class="token punctuation">.</span>word<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>elem<span class="token punctuation">.</span>word<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">const</span> <span class="token keyword">auto</span><span class="token operator">&</span> item <span class="token operator">:</span> token_map<span class="token punctuation">)</span><span class="token punctuation">{</span>inverted_list_all<span class="token punctuation">.</span><span class="token function">push_back</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">move</span><span class="token punctuation">(</span>item<span class="token punctuation">.</span>second<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//3. 
合并排序,汇总结果,按照相关性降序排序</span><span class="token comment">/*std::sort(inverted_list_all.begin(), inverted_list_all.end(), \[](const ns_index::InvertedElem& e1, const ns_index::InvertedElem& e2){return e1.weight > e2.weight;});*/</span>std<span class="token double-colon punctuation">::</span><span class="token function">sort</span><span class="token punctuation">(</span>inverted_list_all<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> inverted_list_all<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> \<span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">(</span><span class="token keyword">const</span> InvertedElemPrint<span class="token operator">&</span> e1<span class="token punctuation">,</span> <span class="token keyword">const</span> InvertedElemPrint<span class="token operator">&</span> e2<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">return</span> e1<span class="token punctuation">.</span>weight <span class="token operator">></span> e2<span class="token punctuation">.</span>weight<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//4. 
构建:根据查找的结果,构建json串---jsoncpp</span>Json<span class="token double-colon punctuation">::</span>Value root<span class="token punctuation">;</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span><span class="token operator">&</span> item <span class="token operator">:</span> inverted_list_all<span class="token punctuation">)</span><span class="token punctuation">{</span>ns_index<span class="token double-colon punctuation">::</span>DocInfo<span class="token operator">*</span> doc <span class="token operator">=</span> index<span class="token operator">-></span><span class="token function">GetForwardIndex</span><span class="token punctuation">(</span>item<span class="token punctuation">.</span>doc_id<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>doc <span class="token operator">==</span> <span class="token keyword">nullptr</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">continue</span><span class="token punctuation">;</span><span class="token punctuation">}</span>Json<span class="token double-colon punctuation">::</span>Value elem<span class="token punctuation">;</span>elem<span class="token punctuation">[</span><span class="token string">"title"</span><span class="token punctuation">]</span> <span class="token operator">=</span> doc<span class="token operator">-></span>title<span class="token punctuation">;</span>elem<span class="token punctuation">[</span><span class="token string">"desc"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token function">GetDesc</span><span class="token punctuation">(</span>doc<span class="token operator">-></span>content<span class="token punctuation">,</span> item<span class="token punctuation">.</span>word<span class="token 
punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">//only an excerpt of the content needs to be shown</span>
elem<span class="token punctuation">[</span><span class="token string">"url"</span><span class="token punctuation">]</span> <span class="token operator">=</span> doc<span class="token operator">-></span>url<span class="token punctuation">;</span>
root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>elem<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token comment">//Json::StyledWriter w;</span>
Json<span class="token double-colon punctuation">::</span>FastWriter w<span class="token punctuation">;</span>
<span class="token comment">//json_string is an output pointer (the caller passes &amp;json_string), so write through it</span>
<span class="token operator">*</span>json_string <span class="token operator">=</span> w<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>root<span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>(2) One detail in the code above deserves attention. A single word can map to several document IDs, so the document IDs reached through different keywords may collide, as in the following example:</p><table><thead><tr><th>Keyword (unique)</th><th>Document IDs</th></tr></thead><tbody><tr><td>我的 (my)</td><td>1, 2</td></tr><tr><td>手机 (phone)</td><td>1, 2</td></tr><tr><td>牌子 (brand)</td><td>1, 2</td></tr><tr><td>华为 (Huawei)</td><td>1</td></tr><tr><td>小米 (Xiaomi)</td><td>1</td></tr></tbody></table><p>Suppose the query is segmented into 我的, 手机, and 牌子. We fetch the inverted list of each word and append it to the combined list, which gives [document 1, document 2, document 1, document 2]: the same documents are repeated. The code above solves this by accumulating elements that share a doc_id in token_map (section 10 explains the fix in detail).</p><p><strong>(3) Code walkthrough:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/27b0d94234fd4de08e9877236e2963a1.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' 
data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/27b0d94234fd4de08e9877236e2963a1.png" /></p><h3>7.3 Installing and using jsoncpp</h3><p>The Search function above used jsoncpp to build its JSON string. JSON serves here as the serialization and deserialization format, so this section covers how to install and use jsoncpp.</p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ sudo yum install <span class="token operator">-</span>y jsoncpp-devel</code></pre><p><strong>(1) A short demonstration of jsoncpp:</strong></p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;iostream&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;jsoncpp/json/json.h&gt;</span></span>
<span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
Json<span class="token double-colon punctuation">::</span>Value root<span class="token punctuation">;</span>
Json<span class="token double-colon punctuation">::</span>Value item1<span class="token punctuation">;</span>
item1<span class="token punctuation">[</span><span class="token string">"key1"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"value11"</span><span class="token punctuation">;</span>
item1<span class="token punctuation">[</span><span class="token string">"key2"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"value22"</span><span class="token punctuation">;</span>
Json<span class="token double-colon punctuation">::</span>Value 
item2<span class="token punctuation">;</span>item2<span class="token punctuation">[</span><span class="token string">"key1"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"value1"</span><span class="token punctuation">;</span>item2<span class="token punctuation">[</span><span class="token string">"key2"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">"value2"</span><span class="token punctuation">;</span>root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>item1<span class="token punctuation">)</span><span class="token punctuation">;</span>root<span class="token punctuation">.</span><span class="token function">append</span><span class="token punctuation">(</span>item2<span class="token punctuation">)</span><span class="token punctuation">;</span>Json<span class="token double-colon punctuation">::</span>StyledWriter writer<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>string s <span class="token operator">=</span> writer<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>root<span class="token punctuation">)</span><span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> s <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span><span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><strong>(2)运行结果:</strong></p><pre><code class="prism language-powershell"><span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ g+<span class="token 
operator">+</span> test<span class="token punctuation">.</span><span class="token function">cpp</span> <span class="token operator">-</span>ljsoncpp
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ <span class="token punctuation">.</span><span class="token operator">/</span>a<span class="token punctuation">.</span>out
<span class="token punctuation">[</span>
   <span class="token punctuation">{</span>
      <span class="token string">"key1"</span> : <span class="token string">"value11"</span><span class="token punctuation">,</span>
      <span class="token string">"key2"</span> : <span class="token string">"value22"</span>
   <span class="token punctuation">}</span><span class="token punctuation">,</span>
   <span class="token punctuation">{</span>
      <span class="token string">"key1"</span> : <span class="token string">"value1"</span><span class="token punctuation">,</span>
      <span class="token string">"key2"</span> : <span class="token string">"value2"</span>
   <span class="token punctuation">}</span>
<span class="token punctuation">]</span>
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ </code></pre><h3>7.4 Testing the search function</h3><p>(1) The test program for the search module:</p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"searcher.hpp"</span></span>
<span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string input <span class="token operator">=</span> <span class="token string">"data/raw_html/raw.txt"</span><span class="token punctuation">;</span>
<span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
ns_searcher<span class="token double-colon punctuation">::</span>Searcher<span class="token operator">*</span> search <span class="token 
operator">=</span> <span class="token keyword">new</span> ns_searcher<span class="token double-colon punctuation">::</span><span class="token function">Searcher</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>search<span class="token operator">-></span><span class="token function">InitSearcher</span><span class="token punctuation">(</span>input<span class="token punctuation">)</span><span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>string query<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>string json_string<span class="token punctuation">;</span><span class="token keyword">while</span> <span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"请输入关键字# "</span><span class="token punctuation">;</span><span class="token comment">//std::cin >> query;</span>std<span class="token double-colon punctuation">::</span><span class="token function">getline</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>cin<span class="token punctuation">,</span> query<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//std::cout << query;</span>search<span class="token operator">-></span><span class="token function">Search</span><span class="token punctuation">(</span>query<span class="token punctuation">,</span> <span class="token operator">&</span>json_string<span class="token punctuation">)</span><span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> json_string <span class="token operator"><<</span> 
std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>Now let's run the test. The entry below is the result for a single HTML document, and its "desc" field carries the document's entire content. That is far too much to display, so we should cut out only an excerpt of the content instead.</p><pre><code class="prism language-powershell"><span class="token punctuation">{</span><span class="token string">"desc"</span> : <span class="token string">"Struct template bound_launcherHomeLibrariesPeopleFAQMoreStruct template bound_launcherboost::process::v2::bound_launcher — Utility class to bind initializers to a launcher. Synopsis// In header: <boost/process/v2/bind_launcher.hpp>template<typename Launcher, typename ... Init> struct bound_launcher {// construct/copy/destructtemplate<typename Launcher_, typename ... Init_> bound_launcher(Launcher_ &&, Init_ &&...);// public member functionstemplate<typename ExecutionContext, typename Args, typename ... Inits> auto operator()(ExecutionContext &, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type &, Args &&, Inits &&...);template<typename ExecutionContext, typename Args, typename ... Inits> auto operator()(ExecutionContext &, error_code &, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type &, Args &&, Inits &&...);template<typename Executor, typename Args, typename ... Inits> auto operator()(Executor, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type &, Args &&, Inits &&...);template<typename Executor, typename Args, typename ... 
Inits> auto operator()(Executor, error_code &, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type &, Args &&, Inits &&...);// private member functionstemplate<std::size_t ... Idx, typename ExecutionContext, typename Args,typename ... Inits> auto invoke(unspecified, ExecutionContext &, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type &, Args &&, Inits &&...);template<std::size_t ... Idx, typename ExecutionContext, typename Args,typename ... Inits> auto invoke(unspecified, ExecutionContext &, error_code &, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type &, Args &&, Inits &&...);template<std::size_t ... Idx, typename Executor, typename Args,typename ... Inits> auto invoke(unspecified, Executor, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type &, Args &&, Inits &&...);template<std::size_t ... Idx, typename Executor, typename Args,typename ... Inits> auto invoke(unspecified, Executor, error_code &, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type &, Args &&, Inits &&...);};DescriptionThis can be used when multiple processes shared some settings, e.g. Template Parameterstypename LauncherThe inner launcher to be used typename ... Initbound_launcher public construct/copy/destructtemplate<typename Launcher_, typename ... Init_> bound_launcher(Launcher_ && l, Init_ &&... init);bound_launcher public member functionstemplate<typename ExecutionContext, typename Args, typename ... 
Inits> auto operator()(ExecutionContext & context, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<typename ExecutionContext, typename Args, typename ... Inits> auto operator()(ExecutionContext & context, error_code & ec, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<typename Executor, typename Args, typename ... Inits> auto operator()(Executor exec, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<typename Executor, typename Args, typename ... Inits> auto operator()(Executor exec, error_code & ec, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);bound_launcher private member functionstemplate<std::size_t ... Idx, typename ExecutionContext, typename Args,typename ... Inits> auto invoke(unspecified, ExecutionContext & context, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<std::size_t ... Idx, typename ExecutionContext, typename Args,typename ... Inits> auto invoke(unspecified, ExecutionContext & context, error_code & ec, const typename std::enable_if< std::is_convertible< ExecutionContext &, boost::asio::execution_context & >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<std::size_t ... Idx, typename Executor, typename Args,typename ... 
Inits> auto invoke(unspecified, Executor exec, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);template<std::size_t ... Idx, typename Executor, typename Args,typename ... Inits> auto invoke(unspecified, Executor exec, error_code & ec, const typename std::enable_if< boost::asio::execution::is_executor< Executor >::value||boost::asio::is_executor< Executor >::value, filesystem::path >::type & executable, Args && args, Inits &&... inits);Copyright © 2006-2012 Julio M. Merino Vidal, Ilya Sokolov,Felipe Tanus, Jeff Flinn, Boris SchaelingCopyright © 2016 Klemens D. MorgensternDistributed under the Boost Software License, Version 1.0. (See accompanyingfile LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)"</span><span class="token punctuation">,</span>
<span class="token string">"title"</span> : <span class="token string">"Struct template bound_launcher"</span><span class="token punctuation">,</span>
<span class="token string">"url"</span> : <span class="token string">"https://www.boost.org/doc/libs/1_84_0/doc/html/boost/process/v2/bound_launcher.html"</span>
<span class="token punctuation">}</span>,</code></pre><h3>7.5 Getting a content summary</h3><p>A search result does not need to display a document's full content, so we extract a short summary around the matched keyword instead.<br /> <strong>(1) Implementation:</strong></p><pre><code class="prism language-cpp">std<span class="token double-colon punctuation">::</span>string <span class="token function">GetDesc</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> html_content<span class="token punctuation">,</span> <span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> word<span class="token punctuation">)</span>
<span class="token punctuation">{</span>
<span 
class="token comment">//找到word在html_content中首次出现,往前找50字节(如果没有从开始找),往后100字节</span><span class="token keyword">const</span> <span class="token keyword">int</span> prev_step <span class="token operator">=</span> <span class="token number">50</span><span class="token punctuation">;</span><span class="token keyword">const</span> <span class="token keyword">int</span> next_step <span class="token operator">=</span> <span class="token number">100</span><span class="token punctuation">;</span><span class="token comment">//1. 找到首次出现</span><span class="token keyword">auto</span> iter <span class="token operator">=</span> std<span class="token double-colon punctuation">::</span><span class="token function">search</span><span class="token punctuation">(</span>html_content<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> html_content<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> word<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> word<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> \<span class="token punctuation">[</span><span class="token punctuation">]</span><span class="token punctuation">(</span><span class="token keyword">int</span> x<span class="token punctuation">,</span> <span class="token keyword">int</span> y<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">return</span> <span class="token 
punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">tolower</span><span class="token punctuation">(</span>x<span class="token punctuation">)</span> <span class="token operator">==</span> std<span class="token double-colon punctuation">::</span><span class="token function">tolower</span><span class="token punctuation">(</span>y<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>iter <span class="token operator">==</span> html_content<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">return</span> <span class="token string">"None1"</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">int</span> pos <span class="token operator">=</span> std<span class="token double-colon punctuation">::</span><span class="token function">distance</span><span class="token punctuation">(</span>html_content<span class="token punctuation">.</span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">,</span> iter<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//2. 
获取start, end</span><span class="token keyword">int</span> start <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span><span class="token keyword">int</span> end <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">size</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">-</span> <span class="token number">1</span><span class="token punctuation">;</span><span class="token comment">//如果之前有50个就更新</span><span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator">></span> start <span class="token operator">+</span> prev_step<span class="token punctuation">)</span><span class="token punctuation">{</span>start <span class="token operator">=</span> pos <span class="token operator">-</span> prev_step<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//如果后面有100个就更新</span><span class="token keyword">if</span> <span class="token punctuation">(</span>pos <span class="token operator"><</span> end <span class="token operator">-</span> next_step<span class="token punctuation">)</span><span class="token punctuation">{</span>end <span class="token operator">=</span> pos <span class="token operator">+</span> next_step<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">//3. 
截取子串</span><span class="token keyword">if</span> <span class="token punctuation">(</span>start <span class="token operator">>=</span> end<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">return</span> <span class="token string">"None2"</span><span class="token punctuation">;</span><span class="token punctuation">}</span>std<span class="token double-colon punctuation">::</span>string desc <span class="token operator">=</span> html_content<span class="token punctuation">.</span><span class="token function">substr</span><span class="token punctuation">(</span>start<span class="token punctuation">,</span> end <span class="token operator">-</span> start<span class="token punctuation">)</span><span class="token punctuation">;</span>desc <span class="token operator">+=</span> <span class="token string">"..."</span><span class="token punctuation">;</span><span class="token keyword">return</span> desc<span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/cbcae01cf89b467bbb427df2177f0993.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/cbcae01cf89b467bbb427df2177f0993.png" /><br /> <strong>(2)测试结果:</strong></p><pre><code class="prism language-powershell">请输入关键字<span class="token comment"># filesystem</span><span class="token punctuation">[</span> <span class="token punctuation">{</span><span class="token string">"desc"</span> : <span class="token string">"boost::asio::execution_context & >::value, filesystem::path >::type &, Args &&, Inits &&...);templ...."</span><span class="token punctuation">.</span><span class="token 
punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">,</span><span class="token string">"title"</span> : <span class="token string">"Struct template bound_launcher"</span><span class="token punctuation">,</span><span class="token string">"url"</span> : <span class="token string">"https://www.boost.org/doc/libs/1_84_0/doc/html/boost/process/v2/bound_launcher.html"</span> <span class="token punctuation">}</span><span class="token punctuation">,</span> <span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">.</span><span class="token punctuation">]</span></code></pre><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/9078b4cda38848e6bf66aed62b344f1b.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/9078b4cda38848e6bf66aed62b344f1b.png" /></p><h2>8. 
Search server</h2><p>Next we write the networked version of the server. Create the file http_server.cpp.</p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">"searcher.hpp"</span></span>
<span class="token keyword">int</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
    <span class="token keyword">return</span> <span class="token number">0</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span></code></pre><p>We could write the network communication ourselves, and may upgrade to that later, but here we use the cpp-httplib library. The library is simple to use, with one caveat: <font color="red">it needs a fairly recent compiler; otherwise the code fails to compile or misbehaves at run time.</font></p><h3>8.1 Upgrading gcc</h3><p>The commands below install a newer gcc:</p><pre><code class="prism language-powershell"><span class="token operator">/</span><span class="token operator">/</span> install scl
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ sudo yum install centos-release-scl scl-utils-build
<span class="token operator">/</span><span class="token operator">/</span> install the newer gcc
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ sudo yum install <span class="token operator">-</span>y devtoolset-7-gcc devtoolset-7-gcc-c+<span class="token operator">+</span>
<span class="token operator">/</span><span class="token operator">/</span> enable it; note that enabling it on the command line only lasts for the current session
<span class="token namespace">[xiaomaker@VM-28-13-centos boost_searcher]</span>$ scl enable devtoolset-7 bash</code></pre><h3>8.2 Bringing in the cpp-httplib library</h3><ol><li><p>We download version 0.7.15 here, because newer versions may report errors at run time.</p></li><li><p>We download it to the local desktop and then drag it onto the lightweight cloud server; feel free to try other transfer methods as well.</p></li></ol><p>(1) cpp-httplib repository: https://gitee.com/linzhipong/cpp-httplib/tree/v0.7.15</p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/4db129f2b1b84dbbb92c9e2ba4c4d845.png" /></noscript><img decoding="async" 
class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/4db129f2b1b84dbbb92c9e2ba4c4d845.png" /></p><pre><code class="prism language-powershell">[xiaomaker@VM-28-13-centos http]$ rz -E
[xiaomaker@VM-28-13-centos http]$ ll
total 4
drwxr-xr-x 6 root root 4096 Nov 19 2020 cpp-httplib-v0.7.15
[xiaomaker@VM-28-13-centos http]$</code></pre><p><strong>(2) Now create a symbolic link to it inside our project:</strong></p><pre><code class="prism language-powershell">[xiaomaker@VM-28-13-centos boost_searcher]$ ln -s /home/xiaomaker/code_cpp/http/cpp-httplib-v0.7.15/ cpp-httplib</code></pre><h3>8.3 Testing cpp-httplib</h3><p>Before testing, note that cpp-httplib is a header-only library, so we only need to include its header. We do, however, have to link against the pthread library, because cpp-httplib uses multiple threads.<br /> <strong>(1) Test code:</strong></p><pre><code class="prism language-cpp">#include "cpp-httplib/httplib.h"

int main()
{
    httplib::Server svr;

    // Answer GET /hi with a plain-text greeting.
    // The route must start with '/', otherwise it never matches a request path.
    svr.Get("/hi", [](const httplib::Request&amp; req, httplib::Response&amp; rsp) {
        rsp.set_content("hello world!", "text/plain; charset=utf-8");
    });

    svr.listen("0.0.0.0", 8081);
    return 0;
}</code></pre><p><strong>(2) After starting it, check the listening ports on the server:</strong></p><pre><code class="prism language-powershell">[xiaomaker@VM-28-13-centos boost_searcher]$ netstat -ntlp
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:44227         0.0.0.0:*               LISTEN      1903/node
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:8081            0.0.0.0:*               LISTEN      4191/./http_server
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:631           0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -
tcp6       0      0 :::111                  :::*                    LISTEN      -
tcp6       0      0 :::22                   :::*                    LISTEN      -
tcp6       0      0 ::1:631                 :::*                    LISTEN      -
tcp6       0      0 ::1:25                  :::*                    LISTEN      -
[xiaomaker@VM-28-13-centos boost_searcher]$</code></pre><p>Before running the server, make sure the port is open on your machine; otherwise it cannot be reached from outside.</p><p><strong>(3) Result:</strong><br /> <noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/282870faee5e4c838486271763914908.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/282870faee5e4c838486271763914908.png" /></p><h3>8.4 
Setting the root directory</h3><p>A web server generally needs a root directory from which the front-end pages are served. Create a wwwroot directory for this.<br /> <strong>(1) Set the root directory in the server:</strong></p><pre><code class="prism language-cpp">#include "cpp-httplib/httplib.h"
#include &lt;string&gt;

const std::string root_path = "./wwwroot";

int main()
{
    httplib::Server svr;

    // Set the root directory that static files are served from
    svr.set_base_dir(root_path.c_str());

    svr.Get("/hi", [](const httplib::Request&amp; req, httplib::Response&amp; rsp) {
        rsp.set_content("hello world!", "text/plain; charset=utf-8");
    });

    svr.listen("0.0.0.0", 8080);
    return 0;
}</code></pre><p><strong>(2) Test result:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/5eb3fdea80e34921a34b28dc2a2111cb.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/5eb3fdea80e34921a34b28dc2a2111cb.png" /><br /> (3) This happens because the root directory is still empty. Create an index.html file under wwwroot and put some content in it:</p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/7df48a6961034bec999e34328d3871c6.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' 
data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/7df48a6961034bec999e34328d3871c6.png" /></p><p><strong>(4) Test again:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/88c703d5d74e4f65838c81cb38e2dce6.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/88c703d5d74e4f65838c81cb38e2dce6.png" /></p><h3>8.5 Writing the search server</h3><p><strong>(1) Now we write the actual server:</strong></p><pre><code class="prism language-cpp">#include "searcher.hpp"
#include "cpp-httplib/httplib.h"
#include &lt;string&gt;

const std::string input = "data/raw_html/raw.txt";
const std::string root_path = "./wwwroot";

int main()
{
    // Build the index once, at startup
    ns_searcher::Searcher search;
    search.InitSearcher(input);

    httplib::Server svr;
    svr.set_base_dir(root_path.c_str());

    // GET /s?word=... runs a search and returns the results as JSON
    svr.Get("/s", [&amp;search](const httplib::Request&amp; req, httplib::Response&amp; rsp) {
        if (!req.has_param("word"))
        {
            rsp.set_content("A search keyword is required", "text/plain; charset=utf-8");
            return;
        }
        std::string word = req.get_param_value("word");
        LOG(NORMAL, "User is searching: " + word);
        //std::cout &lt;&lt; "User is searching: " &lt;&lt; word &lt;&lt; std::endl;

        std::string json_string;
        search.Search(word, json_string);
        rsp.set_content(json_string, "application/json");
        //rsp.set_content("hello world", "text/plain; charset=utf-8");
    });

    LOG(NORMAL, "Server started successfully");
    svr.listen("0.0.0.0", 8080);
    return 0;
}</code></pre><p><strong>(2) Code walkthrough:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/72dafac951674df19b1978491a0b08bd.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/72dafac951674df19b1978491a0b08bd.png" /></p><h2>9. 
前端代码</h2><p>前端部分我们可以选学,这里我们也不谈,如果想学,可以去下面的网站:</p><ul><li><strong>HTML</strong>:编写网页结构,网页的骨骼。</li><li><strong>CSS</strong>:网页样式,网页的皮肉。</li><li><strong>Js</strong>:前后端交互,网页的灵魂。</li></ul><p>前端学习网站推荐:http://www.w3school.com.cn</p><h3>9.1 网页结构</h3><p><strong>(1)设置的网页结构是这样的:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/e4fdd4b91e224500801d6e36d95dad2f.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/e4fdd4b91e224500801d6e36d95dad2f.png" /><br /> <strong>(2)按照上面的内容,我们的html代码:</strong></p><pre><code class="prism language-html"><span class="token doctype"><span class="token punctuation"><!</span><span class="token doctype-tag">DOCTYPE</span> <span class="token name">html</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">charset</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>UTF-8<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token 
punctuation"><</span>meta</span> <span class="token attr-name">http-equiv</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>X-UA-Compatible<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>IE=edge<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">name</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>viewport<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>width=device-width, initial-scale=1.0<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>boost 搜索引擎<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token 
punctuation">"</span>container<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>search<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">type</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>text<span class="token punctuation">"</span></span> <span class="token attr-name">value</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>输入搜索关键字...<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span><span class="token punctuation">></span></span>搜索一下<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>result<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span 
class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/" /><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token 
punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token 
punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span 
class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token 
tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span></code></pre><p><strong>(3) Result:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/7f880cf74b4e42dba57b02fd73342e3a.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/7f880cf74b4e42dba57b02fd73342e3a.png" /></p><h3>9.2 Page Styling</h3><p><strong>(1) The page above looks rather plain, so let's style it with some CSS:</strong></p><pre><code class="prism language-html"><span class="token doctype"><span class="token punctuation"><!</span><span class="token doctype-tag">DOCTYPE</span> <span class="token name">html</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span 
class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">charset</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>UTF-8<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">http-equiv</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>X-UA-Compatible<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>IE=edge<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">name</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>viewport<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>width=device-width, initial-scale=1.0<span class="token punctuation">"</span></span><span 
class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>boost 搜索引擎<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>style</span><span class="token punctuation">></span></span><span class="token style"><span class="token language-css"><span class="token comment">/* Remove all default margins and padding on the page (HTML box model) */</span><span class="token selector">*</span> <span class="token punctuation">{</span><span class="token comment">/* Set the margin */</span><span class="token property">margin</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span><span class="token comment">/* Set the padding */</span><span class="token property">padding</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Make html and body take up 100% of the available height */</span><span class="token selector">html,body</span> <span class="token punctuation">{</span><span class="token property">height</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Class selector: .container */</span><span class="token selector">.container</span> <span class="token punctuation">{</span><span class="token comment">/* Set the width of the div */</span><span class="token property">width</span><span class="token punctuation">:</span> 800px<span class="token punctuation">;</span><span class="token comment">/* Center the div horizontally with auto left and right margins */</span><span class="token property">margin</span><span class="token punctuation">:</span> 0px auto<span class="token punctuation">;</span><span class="token comment">/* Set the top margin to keep a gap between the element and the top of the page */</span><span class="token 
property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Descendant selector: selects .search inside .container */</span><span class="token selector">.container .search</span> <span class="token punctuation">{</span><span class="token comment">/* Same width as the parent element */</span><span class="token property">width</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span><span class="token comment">/* Set the height to 52px */</span><span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Select the input element first, then set its properties; input is a type (tag) selector */</span><span class="token comment">/* The height set on the input does not include its border, so 50px + 2 x 1px border = 52px */</span><span class="token selector">.container .search input</span> <span class="token punctuation">{</span><span class="token comment">/* Float left */</span><span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span><span class="token property">width</span><span class="token punctuation">:</span> 600px<span class="token punctuation">;</span><span class="token property">height</span><span class="token punctuation">:</span> 50px<span class="token punctuation">;</span><span class="token comment">/* Border properties: width, style, color */</span><span class="token property">border</span><span class="token punctuation">:</span> 1px solid black<span class="token punctuation">;</span><span class="token comment">/* Remove the right border of the input box */</span><span class="token property">border-right</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span><span class="token comment">/* Left padding so the text does not sit right against the left border */</span><span class="token property">padding-left</span><span class="token punctuation">:</span> 10px<span class="token punctuation">;</span><span class="token comment">/* 
Set the color and size of the text inside the input */</span><span class="token property">color</span><span class="token punctuation">:</span> #CCC<span class="token punctuation">;</span><span class="token property">font-size</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Select the button element first, then set its properties; button is a type (tag) selector */</span><span class="token selector">.container .search button</span> <span class="token punctuation">{</span><span class="token comment">/* Float left */</span><span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span><span class="token property">width</span><span class="token punctuation">:</span> 150px<span class="token punctuation">;</span><span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span><span class="token comment">/* Set the button background color to #4e6ef2 */</span><span class="token property">background-color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span><span class="token comment">/* Set the text color of the button */</span><span class="token property">color</span><span class="token punctuation">:</span> #FFF<span class="token punctuation">;</span><span class="token comment">/* Set the font size */</span><span class="token property">font-size</span><span class="token punctuation">:</span> 19px<span class="token punctuation">;</span><span class="token property">font-family</span><span class="token punctuation">:</span> Georgia<span class="token punctuation">,</span> <span class="token string">'Times New Roman'</span><span class="token punctuation">,</span> Times<span class="token punctuation">,</span> serif<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result</span> <span class="token punctuation">{</span><span class="token property">width</span><span 
class="token punctuation">:</span> 100%<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item</span> <span class="token punctuation">{</span><span class="token property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item a</span> <span class="token punctuation">{</span><span class="token comment">/* Make it a block-level element so it takes a line of its own */</span><span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span><span class="token comment">/* Remove the underline from the a tag */</span><span class="token property">text-decoration</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span><span class="token comment">/* Set the font size of the text in the a tag */</span><span class="token property">font-size</span><span class="token punctuation">:</span> 20px<span class="token punctuation">;</span><span class="token comment">/* Set the text color */</span><span class="token property">color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item a:hover</span> <span class="token punctuation">{</span><span class="token comment">/* Effect shown while the mouse hovers over the a tag */</span><span class="token property">text-decoration</span><span class="token punctuation">:</span> underline<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item p</span> <span class="token punctuation">{</span><span class="token property">margin-top</span><span class="token punctuation">:</span> 5px<span class="token punctuation">;</span><span class="token property">font-size</span><span class="token punctuation">:</span> 16px<span class="token punctuation">;</span><span class="token 
property">font-family</span><span class="token punctuation">:</span> <span class="token string">'Lucida Sans'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Sans Regular'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Grande'</span><span class="token punctuation">,</span> <span class="token string">'Lucida Sans Unicode'</span><span class="token punctuation">,</span> Geneva<span class="token punctuation">,</span> Verdana<span class="token punctuation">,</span> sans-serif<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item i</span> <span class="token punctuation">{</span><span class="token comment">/* Make it a block-level element so it takes a line of its own */</span><span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span><span class="token comment">/* Turn off the italic style */</span><span class="token property">font-style</span><span class="token punctuation">:</span> normal<span class="token punctuation">;</span><span class="token property">color</span><span class="token punctuation">:</span> green<span class="token punctuation">;</span><span class="token punctuation">}</span></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>style</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>container<span 
class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>search<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">type</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>text<span class="token punctuation">"</span></span> <span class="token attr-name">value</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>输入搜索关键字...<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span><span class="token punctuation">></span></span>搜索一下<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>result<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span 
class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token 
attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token 
tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token 
punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>item<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>a</span> <span class="token attr-name">href</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>#<span class="token punctuation">"</span></span><span class="token punctuation">></span></span>这是标题<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>a</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>p</span><span class="token punctuation">></span></span>这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要这是摘要<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>p</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span 
class="token punctuation"><</span>i</span><span class="token punctuation">></span></span>https://search.gitee.com/?skin=rec&type=repository&q=cpp-httplib<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>i</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span></code></pre><p><strong>(2) Result:</strong></p><p><noscript><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/fddb4570eec04ecbb0809dbf82d93ace.png" /></noscript><img decoding="async" class="lazyload aligncenter" src='data:image/svg+xml,%3Csvg%20xmlns=%22http://www.w3.org/2000/svg%22%20viewBox=%220%200%20210%20140%22%3E%3C/svg%3E' data-src="https://img.maxssl.com/uploads/?url=https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/fddb4570eec04ecbb0809dbf82d93ace.png" /></p><h3>9.3 Front-End and Back-End Interaction</h3><p><strong>(1) Next we connect the front end to the back end. As before, here is the code directly:</strong></p><pre><code class="prism language-html"><span class="token comment"></span><span class="token doctype"><span class="token punctuation"><!</span><span class="token doctype-tag">DOCTYPE</span> <span class="token name">html</span><span class="token punctuation">></span></span><span class="token tag"><span 
class="token tag"><span class="token punctuation"><</span>html</span> <span class="token attr-name">lang</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>en<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">charset</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>UTF-8<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">http-equiv</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>X-UA-Compatible<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>IE=edge<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>meta</span> <span class="token attr-name">name</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>viewport<span class="token punctuation">"</span></span> <span class="token attr-name">content</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>width=device-width, initial-scale=1.0<span class="token 
punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>script</span> <span class="token attr-name">src</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>http://code.jquery.com/jquery-2.1.1.min.js<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token script"></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>script</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>title</span><span class="token punctuation">></span></span>boost 搜索引擎<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>title</span><span class="token punctuation">></span></span><span class="token comment"></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>style</span><span class="token punctuation">></span></span><span class="token style"><span class="token language-css"><span class="token selector">*</span> <span class="token punctuation">{</span><span class="token comment">/* Set the margin */</span><span class="token property">margin</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span><span class="token comment">/* Set the padding */</span><span class="token property">padding</span><span class="token punctuation">:</span> 0<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">html,body</span> <span class="token punctuation">{</span><span class="token property">height</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Centered layout; a selector that starts with a dot is called a class selector */</span><span class="token 
selector">.container</span> <span class="token punctuation">{</span><span class="token comment">/* This is the outermost container */</span><span class="token property">width</span><span class="token punctuation">:</span> 800px<span class="token punctuation">;</span><span class="token property">margin</span><span class="token punctuation">:</span> 0px auto<span class="token punctuation">;</span><span class="token property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">/* Descendant selector */</span><span class="token selector">.container .search</span> <span class="token punctuation">{</span><span class="token property">width</span><span class="token punctuation">:</span> 100%<span class="token punctuation">;</span><span class="token comment">/* 52px matches the input: 50px height plus its 1px top and bottom borders */</span><span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .search input</span> <span class="token punctuation">{</span><span class="token comment">/* Float left */</span><span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span><span class="token property">width</span><span class="token punctuation">:</span> 600px<span class="token punctuation">;</span><span class="token property">height</span><span class="token punctuation">:</span> 50px<span class="token punctuation">;</span><span class="token comment">/* Set the border */</span><span class="token property">border</span><span class="token punctuation">:</span> 1px solid black<span class="token punctuation">;</span><span class="token comment">/* Remove the right border */</span><span class="token property">border-right</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span><span class="token property">padding-left</span><span class="token 
punctuation">:</span> 10px<span class="token punctuation">;</span><span class="token property">color</span><span class="token punctuation">:</span> #ccc<span class="token punctuation">;</span><span class="token property">font-size</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .search button</span> <span class="token punctuation">{</span><span class="token comment">/* Float left */</span><span class="token property">float</span><span class="token punctuation">:</span> left<span class="token punctuation">;</span><span class="token property">width</span><span class="token punctuation">:</span> 120px<span class="token punctuation">;</span><span class="token property">height</span><span class="token punctuation">:</span> 52px<span class="token punctuation">;</span><span class="token comment">/* Set the background color */</span><span class="token property">background-color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span><span class="token comment">/* Set the text color */</span><span class="token property">color</span><span class="token punctuation">:</span> #fff<span class="token punctuation">;</span><span class="token comment">/* Set the font size */</span><span class="token property">font-size</span><span class="token punctuation">:</span> 19px<span class="token punctuation">;</span><span class="token comment">/* Set the font family */</span><span class="token property">font-family</span><span class="token punctuation">:</span> <span class="token string">'Times New Roman'</span><span class="token punctuation">,</span> Times<span class="token punctuation">,</span> serif<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result</span> <span class="token punctuation">{</span><span class="token property">width</span><span class="token punctuation">:</span> 100%<span class="token 
punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item</span> <span class="token punctuation">{</span><span class="token property">margin-top</span><span class="token punctuation">:</span> 15px<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item a</span> <span class="token punctuation">{</span><span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span><span class="token comment">/* Remove the underline */</span><span class="token property">text-decoration</span><span class="token punctuation">:</span> none<span class="token punctuation">;</span><span class="token property">font-size</span><span class="token punctuation">:</span> 20px<span class="token punctuation">;</span><span class="token property">color</span><span class="token punctuation">:</span> #4e6ef2<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item a:hover</span> <span class="token punctuation">{</span><span class="token property">text-decoration</span><span class="token punctuation">:</span> underline<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token selector">.container .result .item p</span> <span class="token punctuation">{</span><span class="token property">margin</span><span class="token punctuation">:</span> 5px<span class="token punctuation">;</span><span class="token property">font-size</span><span class="token punctuation">:</span> 16px<span class="token punctuation">;</span><span class="token property">font-family</span><span class="token punctuation">:</span> <span class="token string">'Times New Roman'</span><span class="token punctuation">,</span> Times<span class="token punctuation">,</span> serif<span class="token punctuation">;</span><span class="token 
punctuation">}</span><span class="token selector">.container .result .item i</span> <span class="token punctuation">{</span><span class="token property">display</span><span class="token punctuation">:</span> block<span class="token punctuation">;</span><span class="token comment">/* Disable italics */</span><span class="token property">font-style</span><span class="token punctuation">:</span> normal<span class="token punctuation">;</span><span class="token property">color</span><span class="token punctuation">:</span> green<span class="token punctuation">;</span><span class="token punctuation">}</span></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>style</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>head</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>container<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>search<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>input</span> <span class="token attr-name">type</span><span class="token attr-value"><span class="token punctuation 
attr-equals">=</span><span class="token punctuation">"</span>text<span class="token punctuation">"</span></span> <span class="token attr-name">value</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>Enter search keywords...<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>button</span> <span class="token special-attr"><span class="token attr-name">onclick</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span><span class="token value javascript language-javascript"><span class="token function">Search</span><span class="token punctuation">(</span><span class="token punctuation">)</span></span><span class="token punctuation">"</span></span></span><span class="token punctuation">></span></span>Search<span class="token tag"><span class="token tag"><span class="token punctuation"></</span>button</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>div</span> <span class="token attr-name">class</span><span class="token attr-value"><span class="token punctuation attr-equals">=</span><span class="token punctuation">"</span>result<span class="token punctuation">"</span></span><span class="token punctuation">></span></span><span class="token comment"></span><span class="token comment"><!-- 
Title<p>Summary text, summary text, summary text, summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i>Title<p>Summary text</p><i>https://www.bilibili.com/</i> --></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>div</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"><</span>script</span><span class="token punctuation">></span></span><span class="token script"><span class="token language-javascript"><span class="token keyword">function</span> <span class="token function">Search</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><span class="token comment">// alert("hello js");</span><span class="token comment">// 1. 
Read the query text with jQuery</span><span class="token keyword">let</span> query <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">".container .search input"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">val</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span><span class="token punctuation">(</span>query <span class="token operator">==</span> <span class="token string">''</span> <span class="token operator">||</span> query <span class="token operator">==</span> <span class="token keyword">null</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">return</span><span class="token punctuation">;</span><span class="token punctuation">}</span>console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span><span class="token string">"query = "</span> <span class="token operator">+</span> query<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">// 2. 
Send the HTTP request</span>$<span class="token punctuation">.</span><span class="token function">ajax</span><span class="token punctuation">(</span><span class="token punctuation">{</span><span class="token literal-property property">type</span><span class="token operator">:</span> <span class="token string">"GET"</span><span class="token punctuation">,</span><span class="token literal-property property">url</span><span class="token operator">:</span> <span class="token string">"/s" /> <span class="token operator">+</span> query<span class="token punctuation">,</span><span class="token function-variable function">success</span><span class="token operator">:</span> <span class="token keyword">function</span> <span class="token punctuation">(</span><span class="token parameter">data</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>console<span class="token punctuation">.</span><span class="token function">log</span><span class="token punctuation">(</span>data<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">// Build the result list dynamically</span><span class="token function">BuildHtml</span><span class="token punctuation">(</span>data<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">function</span> <span class="token function">BuildHtml</span><span class="token punctuation">(</span><span class="token parameter">data</span><span class="token punctuation">)</span> <span class="token punctuation">{</span><span class="token keyword">if</span><span class="token punctuation">(</span>data <span class="token operator">==</span> <span class="token string">''</span> <span class="token operator">||</span> data <span class="token operator">==</span> <span 
class="token keyword">null</span><span class="token punctuation">)</span><span class="token punctuation">{</span>document<span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span><span class="token string">"No results found"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">let</span> result_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">".container .result"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>result_lable<span class="token punctuation">.</span><span class="token function">empty</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">let</span> elem <span class="token keyword">of</span> data<span class="token punctuation">)</span> <span class="token punctuation">{</span><span class="token comment">// console.log(elem.title);</span><span class="token comment">// console.log(elem.url);</span><span class="token keyword">let</span> a_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"&lt;a&gt;"</span><span class="token punctuation">,</span> <span class="token punctuation">{</span><span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>title<span class="token punctuation">,</span><span class="token literal-property property">href</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>url<span class="token 
punctuation">,</span><span class="token literal-property property">target</span><span class="token operator">:</span> <span class="token string">"_blank"</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">let</span> p_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"&lt;p&gt;"</span><span class="token punctuation">,</span> <span class="token punctuation">{</span><span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>desc<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">let</span> i_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"&lt;i&gt;"</span><span class="token punctuation">,</span> <span class="token punctuation">{</span><span class="token literal-property property">text</span><span class="token operator">:</span> elem<span class="token punctuation">.</span>url<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">let</span> div_lable <span class="token operator">=</span> <span class="token function">$</span><span class="token punctuation">(</span><span class="token string">"&lt;div&gt;"</span><span class="token punctuation">,</span> <span class="token punctuation">{</span><span class="token keyword">class</span><span class="token operator">:</span> <span class="token string">"item"</span><span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>a_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token 
punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>p_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>i_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>div_lable<span class="token punctuation">)</span><span class="token punctuation">;</span>div_lable<span class="token punctuation">.</span><span class="token function">appendTo</span><span class="token punctuation">(</span>result_lable<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">}</span></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>script</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>body</span><span class="token punctuation">></span></span><span class="token tag"><span class="token tag"><span class="token punctuation"></</span>html</span><span class="token punctuation">></span></span></code></pre><p><strong>(2) The final result with everything running together:</strong></p><p><img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/a34253f8a4904887950b17c15551a1a8.png" /></p><h2>10. 
Project Supplements</h2><p>This section polishes a few details of the system.</p><h3>10.1 Deduplicating Results</h3><p>As mentioned in the search section, a query is split into several keywords, and the same document ID can appear in the results of more than one keyword. In that case we need to deduplicate, which takes three steps:</p><ul><li>Find the document IDs that appear more than once.</li><li>Add up the weights recorded for each duplicated ID.</li><li>Rebuild the result list from the merged entries, then look up the documents and build the JSON string.</li></ul><p><strong>This is the situation we ran into:</strong><br /> <img decoding="async" class="aligncenter" src="https://img.maxssl.com/uploads/?url=https://img-blog.csdnimg.cn/direct/06295ef0b3b644749084a2e85e183338.png" /></p><h3>10.2 Adding Logging</h3><p>We can add logging by creating a Log.hpp file.<br /> <strong>(1) Log implementation:</strong></p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;iostream&gt;</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;string&gt;</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;ctime&gt;</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">NORMAL</span> <span class="token expression"><span class="token number">1</span></span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">WARNING</span> <span class="token expression"><span 
class="token number">2</span></span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">DEBUG</span> <span class="token expression"><span class="token number">3</span></span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name">FATAL</span> <span class="token expression"><span class="token number">4</span></span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">define</span> <span class="token macro-name function">LOG</span><span class="token expression"><span class="token punctuation">(</span>LEVEL<span class="token punctuation">,</span> MESSAGE<span class="token punctuation">)</span> <span class="token function">log</span><span class="token punctuation">(</span>#LEVEL<span class="token punctuation">,</span> MESSAGE<span class="token punctuation">,</span> <span class="token constant">__FILE__</span><span class="token punctuation">,</span> <span class="token constant">__LINE__</span><span class="token punctuation">)</span></span></span><span class="token keyword">void</span> <span class="token function">log</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span>string level<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string message<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>string file<span class="token punctuation">,</span> <span class="token keyword">int</span> line<span class="token punctuation">)</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>cout <span class="token operator"><<</span> <span class="token string">"["</span> <span class="token 
operator"><<</span> level <span class="token operator"><<</span> <span class="token string">"]"</span><span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> <span class="token function">time</span><span class="token punctuation">(</span><span class="token keyword">nullptr</span><span class="token punctuation">)</span> <span class="token operator"><<</span> <span class="token string">"]"</span><span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> message <span class="token operator"><<</span> <span class="token string">"]"</span><span class="token operator"><<</span> <span class="token string">"["</span> <span class="token operator"><<</span> file <span class="token operator"><<</span> <span class="token string">" : "</span> <span class="token operator"><<</span> line <span class="token operator"><<</span> <span class="token string">"]"</span> <span class="token operator"><<</span> std<span class="token double-colon punctuation">::</span>endl<span class="token punctuation">;</span><span class="token punctuation">}</span></code></pre><p>Add a LOG call wherever output is needed and choose the appropriate level for each message (most of the code accompanying this article already includes logging).</p><h2>11. 
Project Extensions</h2><p>Here are a few ways to extend the project.</p><h3>11.1 Refining the Summary</h3><p>Word segmentation can drop stop words, but none of the code above does so, because filtering stop words costs noticeably more resources. We therefore treat it as an extension. jieba also ships with a stop-word list, which we can use:</p><pre><code class="prism language-cpp"><span class="token keyword">class</span> <span class="token class-name">JiebaUtil</span><span class="token punctuation">{</span><span class="token keyword">public</span><span class="token operator">:</span><span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">CutString</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> src<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span><span class="token operator">*</span> out<span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">assert</span><span class="token punctuation">(</span>out<span class="token punctuation">)</span><span class="token punctuation">;</span>ns_util<span class="token double-colon punctuation">::</span><span class="token class-name">JiebaUtil</span><span class="token double-colon punctuation">::</span><span class="token function">get_instance</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">-></span><span class="token function">CutStringHelper</span><span class="token punctuation">(</span>src<span class="token punctuation">,</span> out<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">private</span><span class="token operator">:</span><span class="token comment">/// @brief Segment src into words, then drop stop words</span><span class="token comment">/// @param 
src</span><span class="token comment">/// @param out</span><span class="token keyword">void</span> <span class="token function">CutStringHelper</span><span class="token punctuation">(</span><span class="token keyword">const</span> std<span class="token double-colon punctuation">::</span>string<span class="token operator">&</span> src<span class="token punctuation">,</span> std<span class="token double-colon punctuation">::</span>vector<span class="token operator"><</span>std<span class="token double-colon punctuation">::</span>string<span class="token operator">></span><span class="token operator">*</span> out<span class="token punctuation">)</span><span class="token punctuation">{</span>jieba<span class="token punctuation">.</span><span class="token function">CutForSearch</span><span class="token punctuation">(</span>src<span class="token punctuation">,</span> <span class="token operator">*</span>out<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token keyword">auto</span> iter <span class="token operator">=</span> out<span class="token operator">-></span><span class="token function">begin</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span> iter <span class="token operator">!=</span> out<span class="token operator">-></span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">auto</span> it <span class="token operator">=</span> stop_words<span class="token punctuation">.</span><span class="token function">find</span><span class="token punctuation">(</span><span class="token operator">*</span>iter<span class="token punctuation">)</span><span class="token 
punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>it <span class="token operator">!=</span> stop_words<span class="token punctuation">.</span><span class="token function">end</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token comment">// This word is a stop word: erase it</span><span class="token comment">// erase() returns the next valid iterator, which avoids iterator invalidation</span><span class="token comment">// std::cout << *iter << std::endl;</span>iter <span class="token operator">=</span> out<span class="token operator">-></span><span class="token function">erase</span><span class="token punctuation">(</span>iter<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">else</span><span class="token punctuation">{</span>iter<span class="token operator">++</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token punctuation">}</span><span class="token keyword">static</span> JiebaUtil<span class="token operator">*</span> <span class="token function">get_instance</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token keyword">static</span> std<span class="token double-colon punctuation">::</span>mutex mtx<span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">nullptr</span> <span class="token operator">==</span> instance<span class="token punctuation">)</span><span class="token punctuation">{</span>mtx<span class="token punctuation">.</span><span class="token function">lock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token 
keyword">if</span> <span class="token punctuation">(</span><span class="token keyword">nullptr</span> <span class="token operator">==</span> instance<span class="token punctuation">)</span><span class="token punctuation">{</span>instance <span class="token operator">=</span> <span class="token keyword">new</span> JiebaUtil<span class="token punctuation">;</span>instance<span class="token operator">-></span><span class="token function">InitJiebaUtil</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span>mtx<span class="token punctuation">.</span><span class="token function">unlock</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">return</span> instance<span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token comment">// Load the stop-word list from STOP_WORD_PATH, one word per line</span><span class="token keyword">void</span> <span class="token function">InitJiebaUtil</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">{</span>std<span class="token double-colon punctuation">::</span>ifstream <span class="token function">in</span><span class="token punctuation">(</span>STOP_WORD_PATH<span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">if</span> <span class="token punctuation">(</span>in<span class="token punctuation">.</span><span class="token function">is_open</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">==</span> <span class="token boolean">false</span><span class="token punctuation">)</span><span class="token punctuation">{</span><span class="token function">LOG</span><span class="token punctuation">(</span>FATAL<span class="token 
punctuation">,</span> <span class="token string">"failed to load stop words"</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token keyword">return</span><span class="token punctuation">;</span><span class="token punctuation">}</span>std<span class="token double-colon punctuation">::</span>string line<span class="token punctuation">;</span><span class="token keyword">while</span> <span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">getline</span><span class="token punctuation">(</span>in<span class="token punctuation">,</span> line<span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">{</span>stop_words<span class="token punctuation">.</span><span class="token function">insert</span><span class="token punctuation">(</span>std<span class="token double-colon punctuation">::</span><span class="token function">make_pair</span><span class="token punctuation">(</span>line<span class="token punctuation">,</span> <span class="token boolean">true</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span>in<span class="token punctuation">.</span><span class="token function">close</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token punctuation">}</span><span class="token keyword">private</span><span class="token operator">:</span><span class="token keyword">static</span> JiebaUtil<span class="token operator">*</span> instance<span class="token punctuation">;</span>cppjieba<span class="token double-colon punctuation">::</span>Jieba jieba<span class="token punctuation">;</span>std<span class="token double-colon punctuation">::</span>unordered_map<span class="token operator"><</span>std<span class="token double-colon 
punctuation">::</span>string<span class="token punctuation">,</span> <span class="token keyword">bool</span><span class="token operator">></span> stop_words<span class="token punctuation">;</span><span class="token function">JiebaUtil</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">:</span> <span class="token function">jieba</span><span class="token punctuation">(</span>DICT_PATH<span class="token punctuation">,</span> HMM_PATH<span class="token punctuation">,</span> USER_DICT_PATH<span class="token punctuation">,</span> IDF_PATH<span class="token punctuation">,</span> STOP_WORD_PATH<span class="token punctuation">)</span> <span class="token punctuation">{</span><span class="token punctuation">}</span><span class="token comment">// The copy constructor and copy assignment should be deleted</span><span class="token punctuation">}</span><span class="token punctuation">;</span>JiebaUtil<span class="token operator">*</span> JiebaUtil<span class="token double-colon punctuation">::</span>instance <span class="token operator">=</span> <span class="token keyword">nullptr</span><span class="token punctuation">;</span></code></pre><h3>11.2 Deploying as a Background Service</h3><p>We can run the server as a daemon process.</p><p><strong>(1) The nohup command</strong><br /> nohup runs the server so that it ignores the hangup signal (SIGHUP), so the service stays reachable after the XShell session is closed.<br /> Example: nohup ./http_server</p><p>To run the program in the background, append & to the command; the program then no longer occupies the terminal.<br /> Example: nohup ./http_server &</p><p><strong>(2) The file nohup creates:</strong><br /> After running the nohup command above, a nohup.out file is created that stores the program's log output; view it with cat.</p><p><strong>(3) setsid</strong><br /> We can also daemonize the process ourselves, as follows:</p><pre><code class="prism language-cpp"><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">pragma</span> <span class="token expression">once</span></span><span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;cstdlib&gt;</span></span><span class="token macro property"><span class="token 
directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;signal.h&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;unistd.h&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;fcntl.h&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;sys/types.h&gt;</span></span>
<span class="token macro property"><span class="token directive-hash">#</span><span class="token directive keyword">include</span> <span class="token string">&lt;sys/stat.h&gt;</span></span>

<span class="token keyword">void</span> <span class="token function">daemonize</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token punctuation">{</span>
    <span class="token keyword">int</span> fd <span class="token operator">=</span> <span class="token number">0</span><span class="token punctuation">;</span>
    <span class="token comment">// 1. Ignore SIGPIPE</span>
    <span class="token function">signal</span><span class="token punctuation">(</span>SIGPIPE<span class="token punctuation">,</span> SIG_IGN<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 2. Change the process's working directory (left disabled here)</span>
    <span class="token comment">// chdir();</span>
    <span class="token comment">// 3. Make sure this process is not a process group leader: the parent exits and the child carries on</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token function">fork</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token operator">></span> <span class="token number">0</span><span class="token punctuation">)</span>
        <span class="token function">exit</span><span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 4. Start a new, independent session</span>
    <span class="token function">setsid</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token comment">// 5. Redirect file descriptors 0, 1, and 2 to /dev/null</span>
    <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token punctuation">(</span>fd <span class="token operator">=</span> <span class="token function">open</span><span class="token punctuation">(</span><span class="token string">"/dev/null"</span><span class="token punctuation">,</span> O_RDWR<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token operator">!=</span> <span class="token operator">-</span><span class="token number">1</span><span class="token punctuation">)</span> <span class="token comment">// fd == 3</span>
    <span class="token punctuation">{</span>
        <span class="token function">dup2</span><span class="token punctuation">(</span>fd<span class="token punctuation">,</span> STDIN_FILENO<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token function">dup2</span><span class="token punctuation">(</span>fd<span class="token punctuation">,</span> STDOUT_FILENO<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token function">dup2</span><span class="token punctuation">(</span>fd<span class="token punctuation">,</span> 
             STDERR_FILENO<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 6. Close the fd that is no longer needed</span>
        <span class="token keyword">if</span> <span class="token punctuation">(</span>fd <span class="token operator">></span> STDERR_FILENO<span class="token punctuation">)</span> <span class="token function">close</span><span class="token punctuation">(</span>fd<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// Alternative to redirecting: close(0), close(1), close(2); strongly discouraged</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span></code></pre><h3>11.3 Other Extensions</h3><ul><li>The search engine orders results by weight. We could layer additional algorithms on top of this, such as paid ranking (bidding) or hot-topic statistics, to boost the weight of certain documents.</li><li>We could add user registration and login backed by a database, which would bring MySQL into the project.</li></ul></article></div></div></div></div></main></body></html>