Full-text search is one of the most common requirements, and the open-source Elasticsearch (hereafter "Elastic") is currently the go-to full-text search engine. It can store, search, and analyze huge volumes of data quickly, and is used by Stack Overflow and GitHub, among others.
As a search engine, it has two basic requirements to meet:

1. search massive amounts of data in a short time;
2. keep the service highly available.
For the first point, Elastic is built on Lucene, whose entire purpose is the fast storage, search, and analysis of massive data, so that part is taken care of. What we need to think about is keeping the service highly available.
If a host could stay up at all times, we would say its availability is 100%. Obviously that is impossible (power outages, network failures, and so on), so we build a cluster to improve availability: if one host goes down, the other hosts take over its work and the service stays up.
But then another problem appears: if a host dies and the data a user wants happens to live on that host, isn't it unreachable? Or should every host keep a full copy of every other host's data? That would waste far too much space. The answer lies in how Elastic stores data: other hosts do hold backups of the dead host's data, but not full backups — shard backups.

Elastic splits the data of an index into multiple shards (5 by default in this version; you can configure the number), and each shard is stored on a different host (node) in the cluster. When you add or remove cluster nodes, Elastic automatically migrates shards between nodes so that they stay evenly distributed. In addition, every shard has a backup (a replica) stored on another node. This means that when I store one piece of data it may be split into 5 shards, each with its own replica, so it lives in 10 places and is written twice (the primary shards plus their replicas). Elastic normally places a shard and its replica on different nodes, so unless the nodes holding both a shard and its replica go down at the same time, the data stays accessible. That is the advantage of a cluster.
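To make the shard/replica idea concrete, here is a sketch of creating an index with explicit shard and replica counts through the REST API. The index name `my_index` is made up, and this assumes a node is already running on localhost:9200:

```shell
$ curl -X PUT 'localhost:9200/my_index?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
```

With these settings the index has 5 primary shards, each with 1 replica, i.e. the 10 copies described above.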
Installing Elasticsearch
Let's start by installing a single node to get familiar with the basics.
First we need a Java environment, because Elastic is built on Lucene and Lucene is built on Java. If you haven't installed it yet, see my previous article. Once Java is ready, we can install Elastic:
```shell
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.1.zip
$ unzip elasticsearch-5.5.1.zip
$ cd elasticsearch-5.5.1/
```
Enter the unpacked directory and start Elastic:
$ ./bin/elasticsearch
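A side note: run this way, Elastic stays in the foreground. It also supports daemonizing with `-d` and recording its PID with `-p`, which is handier on a server:

```shell
$ ./bin/elasticsearch -d -p es.pid   # start in the background, write the PID to es.pid
$ kill $(cat es.pid)                 # stop the daemon later
```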
Startup may fail here: the JVM heap defaults to 2 GB, and if your machine doesn't have that much memory it will error out (my own server had 512 MB, nowhere near enough, so clever me borrowed a department server with a proper spec). I'd recommend at least 1 GB of memory.
```shell
# vim config/jvm.options
-Xms2g
-Xmx2g
```
Edit the lines above to change the JVM heap allocation: replace 2g with 512m (the bare minimum — don't ask how I know) or 1g.
If you then get "max virtual memory areas vm.max_map_count [65530] is too low", run the following command:
$ sudo sysctl -w vm.max_map_count=262144
If you see the errors below, switch to the root account and follow these steps:
```
[bohongtao@localhost bin]$ ./elasticsearch
[2018-12-02T19:17:07,504][INFO ][o.e.n.Node ] [] initializing ...
[2018-12-02T19:17:07,756][INFO ][o.e.e.NodeEnvironment ] [iLgIvRo] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [8.4gb], net total_space [13.1gb], spins? [unknown], types [rootfs]
[2018-12-02T19:17:07,757][INFO ][o.e.e.NodeEnvironment ] [iLgIvRo] heap size [1.9gb], compressed ordinary object pointers [true]
[2018-12-02T19:17:07,759][INFO ][o.e.n.Node ] node name [iLgIvRo] derived from node ID [iLgIvRoURsKpQrVTa9LKzw]; set [node.name] to override
[2018-12-02T19:17:07,760][INFO ][o.e.n.Node ] version[5.5.1], pid[4421], build[19c13d0/2017-07-18T20:44:24.823Z], OS[Linux/3.10.0-327.el7.x86_64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_65/25.65-b01]
[2018-12-02T19:17:07,760][INFO ][o.e.n.Node ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.path.home=/home/bohongtao/Downloads/elasticsearch-5.5.1]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [aggs-matrix-stats]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [ingest-common]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [lang-expression]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [lang-groovy]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [lang-mustache]
[2018-12-02T19:17:10,667][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [lang-painless]
[2018-12-02T19:17:10,668][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [parent-join]
[2018-12-02T19:17:10,668][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [percolator]
[2018-12-02T19:17:10,668][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [reindex]
[2018-12-02T19:17:10,668][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [transport-netty3]
[2018-12-02T19:17:10,668][INFO ][o.e.p.PluginsService ] [iLgIvRo] loaded module [transport-netty4]
[2018-12-02T19:17:10,669][INFO ][o.e.p.PluginsService ] [iLgIvRo] no plugins loaded
[2018-12-02T19:17:17,025][INFO ][o.e.d.DiscoveryModule ] [iLgIvRo] using discovery type [zen]
[2018-12-02T19:17:22,398][INFO ][o.e.n.Node ] initialized
[2018-12-02T19:17:22,399][INFO ][o.e.n.Node ] [iLgIvRo] starting ...
[2018-12-02T19:17:23,207][INFO ][o.e.t.TransportService ] [iLgIvRo] publish_address {192.168.213.130:9300}, bound_addresses {[::]:9300}
[2018-12-02T19:17:23,341][INFO ][o.e.b.BootstrapChecks ] [iLgIvRo] bound or publishing to a non-loopback or non-link-local address, enforcing bootstrap checks
ERROR: [1] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-12-02T19:17:23,441][INFO ][o.e.n.Node ] [iLgIvRo] stopping ...
[2018-12-02T19:17:23,608][INFO ][o.e.n.Node ] [iLgIvRo] stopped
[2018-12-02T19:17:23,608][INFO ][o.e.n.Node ] [iLgIvRo] closing ...
[2018-12-02T19:17:23,651][INFO ][o.e.n.Node ] [iLgIvRo] closed
```
[1]: max file descriptors [65535] for elasticsearch process is too low, increase to at least [65536]
Edit /etc/security/limits.conf and append the following:
```
* soft nofile 65536
* hard nofile 65536
```
Changes to this file only take effect after the user logs in again.
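After logging back in, you can verify that the new limits actually took effect before retrying the startup:

```shell
# Print the open-file limits for the current shell session
ulimit -Sn   # soft limit; should now show 65536
ulimit -Hn   # hard limit; should now show 65536
```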
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Edit /etc/sysctl.conf and append the following:
vm.max_map_count=655360
Save the file, then run:
sysctl -p
Then switch back to a non-root account and start Elastic again.
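Elastic refuses to start as root, which is why a non-root account is needed. If you don't have one handy, a common approach is a dedicated user; a sketch (the install path matches the `es.path.home` in the log above — adjust it to yours):

```shell
# Create a user for Elasticsearch and hand it the install directory
$ sudo useradd -m elastic
$ sudo chown -R elastic:elastic /home/bohongtao/Downloads/elasticsearch-5.5.1
$ sudo -u elastic /home/bohongtao/Downloads/elasticsearch-5.5.1/bin/elasticsearch
```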
If there are no errors, Elastic will be running on port 9200 (the default). Now open another terminal window and request that port to get some basic information:
```shell
[bohongtao@wtstrain ~]$ curl localhost:9200
{
  "name" : "JOWF1ou",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "CRA3fLfLRHGLOwBnp_TWZw",
  "version" : {
    "number" : "5.5.1",
    "build_hash" : "19c13d0",
    "build_date" : "2017-07-18T20:44:24.823Z",
    "build_snapshot" : false,
    "lucene_version" : "6.6.0"
  },
  "tagline" : "You Know, for Search"
}
```
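Once the node responds, the cluster health API is a quick way to check overall status (green / yellow / red). On a single default node it typically reports yellow, because replica shards have no second node to live on:

```shell
$ curl 'localhost:9200/_cluster/health?pretty'
```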
By default, Elastic only accepts connections from the local machine. For remote access, edit the config/elasticsearch.yml file in the Elastic install directory, uncomment network.host and set it to 0.0.0.0, then restart Elastic. In production, though, I'd recommend binding to a fixed IP instead.
Configuring Elasticsearch
After installation, Elasticsearch's configuration file is /etc/elasticsearch/elasticsearch.yml (with the zip install above it lives at config/elasticsearch.yml inside the install directory). Let's edit it.
Cluster name: every node in the same cluster must use the same name
cluster.name: coder-es-clusters
Node name: must be unique within the cluster
node.name: es-node-1
path.data and path.logs: set these to the data storage path and the log path respectively, and make sure Elastic has read/write permission on them
path.data: /home/bohongtao/ElasticDate
Set the bind address; 0.0.0.0 means anyone can connect
network.host: 0.0.0.0
Set the HTTP port
http.port: 9200
The list of node addresses (list the IPs of all nodes here)
discovery.zen.ping.unicast.hosts: ["", ""]
Recovery threshold (with my setting: the cluster does not begin recovering data until at least 12 nodes have joined)
gateway.recover_after_nodes: 12
Minimum number of master-eligible nodes that must be visible before a master can be elected (recommended: master-eligible nodes / 2 + 1, to avoid split brain)
discovery.zen.minimum_master_nodes: 20
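The quorum rule above is simple integer arithmetic; a quick sketch (the node counts are examples, not my cluster's):

```shell
# minimum_master_nodes = (master-eligible nodes / 2) + 1
master_eligible=3
quorum=$(( master_eligible / 2 + 1 ))
echo "$quorum"   # prints 2
```

With 3 master-eligible nodes the quorum is 2, with 5 it is 3; a setting of 20 would therefore imply roughly 39 master-eligible nodes.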
Whether this node can be elected master
node.master: true
Whether this node stores data
node.data: true
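Putting the settings above together, one node's elasticsearch.yml might look like this (the host list is left as placeholders you must fill in with your nodes' IPs):

```yaml
cluster.name: coder-es-clusters
node.name: es-node-1
path.data: /home/bohongtao/ElasticDate
network.host: 0.0.0.0
http.port: 9200
discovery.zen.ping.unicast.hosts: ["", ""]
gateway.recover_after_nodes: 12
discovery.zen.minimum_master_nodes: 20
node.master: true
node.data: true
```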
Start it up again and request localhost:9200, and each node will report its own information.