基于xunsearch实现全文检索功能-码农梦weber

基于xunsearch实现全文检索功能

发布时间：2020/11/13 作者：天马行空阅读（1916）

Xunsearch 是免费开源的专业全文检索解决方案，旨在帮助一般开发者针对既有的海量数据，快速而方便地建立自己的全文搜索引擎。全文检索可以帮助您降低服务器搜索负荷、极大程度的提高搜索速度和用户体验。
高性能：后端是采用 C/C++ 开发多线程服务端，索引设计基于 Xapian 和 scws 中文分词。单库最多支持 40 亿条数据，在 5 亿网页大约 1.5TB 的数据中检索时间不超过 1 秒(非缓存)。
简单易用：前端是使用脚本语言编写的开发工具 (SDK)，目前仅支持 PHP 语言。API 简单清晰，开发难度极低，提供全中文的示例代码、文档、辅助脚本工具等。
全功能：除支持基础的自定义分词、字段检索、布尔搜索外，还直接支持用户急需的相关搜索、拼音搜索、搜索建议等专业功能。

一、安装Xunsearch服务端

[root@iZ2vcdckpocdmc9n3vlbywZ ~]# yum install -y bzip2
[root@iZ2vcdckpocdmc9n3vlbywZ ~]# wget http://www.xunsearch.com/download/xunsearch-full-latest.tar.bz2
[root@iZ2vcdckpocdmc9n3vlbywZ ~]# tar -xjf xunsearch-full-latest.tar.bz2
[root@iZ2vcdckpocdmc9n3vlbywZ ~]# cd xunsearch-full-1.4.14/
[root@iZ2vcdckpocdmc9n3vlbywZ ~]# sh setup.sh
[root@iZ2vcdckpocdmc9n3vlbywZ ~]# cd /home/wwwroot/xunsearch/
[root@iZ2vcdckpocdmc9n3vlbywZ xunsearch]# bin/xs-ctl.sh -b inet start

二、通过composer安装php搜索包
composer require hightman/xunsearch

三、编写项目配置文件xunseach.ini
具体规则请参考：http://www.xunsearch.com/doc/php/guide/ini.guide

; 项目名称
project.name = article
; 默认字符集
project.default_charset = utf-8
; 索引服务端配置，默认值为 8383
server.index = 127.0.0.1:8383
; 搜索服务端配置，默认值为 8384
server.search = 127.0.0.1::8384
[id]
; 字段类型
type = id
[title]
type = title
[description]
type = string
index = mixed
[content]
type = body
[sort]
type = numeric
[publish_time]
type = numeric

四、生成索引
生成索引是将你数据库中的数据按照一定规则查询出来，然后添加到xunsearch的索引中去，这需要增加一个定时的计划任务，将新增的数据不断的添加进去，才能通过api搜索出来结果。以下的基于thinkphp写的添加索引的一段代码，理解思路即可。

/**
 * 生成索引
 * @return mixed|string
 */
public function index()
{
    //每两分钟执行一次
    //*/2 * * * * /xxxxx/php /xxx/task/index.php article/index 2
    
    $minute = isset($GLOBALS["argv"][2]) ? (int)$GLOBALS["argv"][2] : 0;
    
    $where = [
            ['status','=',1],
    ];
    if ($minute) {
        $where[] = ['updated','>',time()-$minute*60];//多少分钟之前更新的
    }
    
    ArticleModel::where($where)->chunk(500,function($rows) {
        $datas = [];
        
        foreach ($rows as $row){
            $datas[] = [
                    'id' => $row->id, // 此字段为主键，是进行文档替换的唯一标识
                    'title' => $row->title,
                    'description' => $row->description,
                    'content' => strip_tags($row->content),
                    'sort' => $row->sort,
                    'publish_time' => $row->publish_time
            ];
            
        }
        $this->addIndex($datas);
        
    },'id','asc');
    
    die('success');
    
}
private function addIndex($datas){
    try
    {
        //实例化xunsearch对象
        $xs = new \XS('article');
        //使用索引缓冲区
        $xs->index->openBuffer();
        
        foreach ($datas as $data){
            // 创建文档对象
            $doc = new \XSDocument;
            $doc->setFields($data);
            // 更新到索引数据库中
            $xs->index->update($doc);
        }
        
        //告诉服务器重建索引完成
        $xs->index->closeBuffer();
    }
    catch (\XSException $e)
    {
        //echo $e;               // 直接输出异常描述
        if (defined('DEBUG'))  // 如果是 DEBUG 模式，则输出堆栈情况
            echo "\n" . $e->getTraceAsString() . "\n";
    }
    
}

五、搜索功能实现
搜索文档请参考：http://www.xunsearch.com/doc/php/guide/search.overview

public function search(){
    $datas = [];
    
    $keywords = $this->request->param('keywords');
    $page = $this->request->param('page',1);
    $limit = 10;//每页多少条
    $count = 0;//搜出来多少条
    $pageLink = '';//分页代码
    if ($keywords) {
        $xs = new \XS('article');
        $search = $xs->search; // 获取 搜索对象
        $tokenizer = new \XSTokenizerScws;   // 创建分词实例
        $words = $tokenizer->getTokens($keywords);//分词
        $search->setQuery(join(' OR ', $words))->setFuzzy(true);//设置查询条件
        
        //解决统计总条数不准确的问题：始终都用最后一页的总条数
        $search->search();
        $count1 = $search->getLastCount();
        $page1 = ceil($count1/$limit);
        $search->setLimit($limit,($page1-1)*$limit)->search(); 
        $count = $search->getLastCount();
        
        $docs = $search->setLimit($limit, $offset=(($page-1)*$limit))->setSort('publish_time')->search();
        foreach ($docs as &$doc)
        {
            $datas[] = [
                    'id'=>$doc->id,
                    'title'=>$doc->title,
                    'description'=>$doc->description, 
                    'title_h'=>$search->highlight($doc->title,false), // 高亮处理 title 字段
                    'description_h'=>$search->highlight($doc->description,false), // 高亮处理 description 字段
                    //'content'=>$doc->content,
                    'sort'=>$doc->sort,
                    'publish_time'=>$doc->publish_time,
            ];
        }
        
        if ($count>0){
            $pageLink = Bootstrap::make(null, $limit, $page, $count, false, ['path'=>'', 'query'=>['keywords'=>$keywords]])->render();
        }
    }
    
    $this->assign('keywords',$keywords);
    $this->assign('count',$count);
    $this->assign('pageLink',$pageLink);
    $this->assign('datas',$datas);
    
    return $this->fetch('Article/Article-search');
}

原文链接：https://www.netljc.com/article/detail-142

关键字： xunsearch

上一篇：10大经典排序算法之冒泡排序（PHP版）

下一篇：依赖注入（DI）与控制反转（IoC）

相关推荐

内容分享