Big Data Architecture: Configuring Storage and Indexing with HBase and Solr

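HBase can send requests to Solr through a coprocessor, and Solr then keeps its index in sync with the data it receives by adding, deleting, and updating index entries. Keeping storage and indexing on separate machines is a must in a big data architecture. Many people are still unfamiliar with this approach and find the idea novel, but it is definitely a good direction, so if it is new to you, it is worth learning soon.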
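To make the idea of "storage and index kept separate, synchronized by a hook" concrete without an HBase or Solr cluster, here is a minimal in-memory sketch. RowStore, RowObserver and InvertedIndex are hypothetical names: RowStore stands in for an HBase table, the postPut/postDelete callbacks mimic a RegionObserver's hooks, and InvertedIndex plays the role of a Solr core. In a real deployment the two sides run on different machines and the hook makes a network call.

```java
import java.util.*;

// Hypothetical stand-in for a RegionObserver: called after each write to the store.
interface RowObserver {
    void postPut(String rowKey, Map<String, String> columns);
    void postDelete(String rowKey);
}

// Hypothetical stand-in for an HBase table (the "storage" side).
class RowStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    private final List<RowObserver> observers = new ArrayList<>();

    void addObserver(RowObserver o) { observers.add(o); }

    void put(String rowKey, Map<String, String> columns) {
        // Simplification: a put replaces the whole row (HBase actually merges columns).
        rows.put(rowKey, new HashMap<>(columns));
        for (RowObserver o : observers) o.postPut(rowKey, columns); // then notify the index side
    }

    void delete(String rowKey) {
        rows.remove(rowKey);
        for (RowObserver o : observers) o.postDelete(rowKey);
    }
}

// Hypothetical stand-in for a Solr core (the "index" side).
class InvertedIndex implements RowObserver {
    private final Map<String, Set<String>> postings = new HashMap<>();   // term -> row keys
    private final Map<String, Set<String>> termsByRow = new HashMap<>(); // row key -> its terms

    @Override
    public void postPut(String rowKey, Map<String, String> columns) {
        postDelete(rowKey); // a put on an existing row acts as an index update
        Set<String> terms = new HashSet<>();
        for (String value : columns.values()) {
            for (String t : value.toLowerCase().split("\\s+")) {
                if (!t.isEmpty()) terms.add(t);
            }
        }
        for (String t : terms) postings.computeIfAbsent(t, k -> new HashSet<>()).add(rowKey);
        termsByRow.put(rowKey, terms);
    }

    @Override
    public void postDelete(String rowKey) {
        Set<String> old = termsByRow.remove(rowKey);
        if (old == null) return;
        for (String t : old) {
            Set<String> rowsForTerm = postings.get(t);
            rowsForTerm.remove(rowKey);
            if (rowsForTerm.isEmpty()) postings.remove(t); // drop empty posting lists
        }
    }

    Set<String> search(String term) {
        return new HashSet<>(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }
}
```

The write path only talks to RowStore; the index is maintained purely through the observer callbacks, which is the same division of labor the coprocessor below sets up between HBase and Solr.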

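A friend left a comment on my earlier post saying that CDH can do this as well. I have not tried that yet. He also asked me for the related code, so I tidied it up a little, and it forms the main content of this article. I will try the CDH approach as soon as I can; if you know how it works, please leave me a comment.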

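Below I walk through, with explanations, the code I wrote while testing HBase and Solr performance, in which data is added to HBase and an HBase coprocessor forwards it to Solr.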

1. Writing the HBase Coprocessor

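As soon as a postPut fires, the observer immediately updates the corresponding core in Solr. It uses ConcurrentUpdateSolrServer, which is what keeps Solr indexing fast: added documents are queued and sent by background threads instead of one blocking request per Put. Because this code never issues a commit, don't forget to configure autoCommit in Solr so that the indexed documents actually get committed.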

/*
 * Copyright: 王安琪
 * Description: watches HBase and sends every postPut to Solr; this class must be
 *              registered with HBase as a coprocessor (trigger)
 * Modified: 2014-05-27
 * Change: created
 */
package solrHbase.test;

import java.io.UnsupportedEncodingException;

import ***;

public class SorlIndexCoprocessorObserver extends BaseRegionObserver {

    private static final Logger LOG = LoggerFactory
            .getLogger(SorlIndexCoprocessorObserver.class);
    private static final String solrUrl = "http://192.1.11.108:80/solr/core1";
    // queue size 10000, 20 background sender threads
    private static final SolrServer solrServer = new ConcurrentUpdateSolrServer(
            solrUrl, 10000, 20);

    /**
     * Builds the Solr index entry for each Put.
     *
     * @throws UnsupportedEncodingException
     */
    @Override
    public void postPut(final ObserverContext<RegionCoprocessorEnvironment> e,
            final Put put, final WALEdit edit, final boolean writeToWAL)
            throws UnsupportedEncodingException {
        inputSolr(put);
    }

    public void inputSolr(Put put) {
        try {
            solrServer.add(TestSolrMain.getInputDoc(put));
        } catch (Exception ex) {
            // errors are logged (with stack trace) and swallowed, so the HBase write itself is not affected
            LOG.error("Failed to send document to Solr", ex);
        }
    }
}
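The ConcurrentUpdateSolrServer(solrUrl, 10000, 20) in the observer above takes a queue size and a thread count. The sketch below is not SolrJ's implementation; it is a hypothetical AsyncUpdater class, written with plain java.util.concurrent, that only illustrates the queue-plus-worker-threads idea: the caller (postPut) just enqueues and returns, worker threads drain the queue in batches, and close() waits for the queue to empty, roughly what SolrJ's blockUntilFinished() is for. The sender callback is an assumed stand-in for the HTTP request to Solr.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustration of the queue + worker threads idea only (hypothetical, not SolrJ code).
class AsyncUpdater<T> {
    private final BlockingQueue<T> queue;
    private final ExecutorService workers;
    private final Consumer<List<T>> sender; // hypothetical "send one batch"; must be thread-safe
    private final int batchSize;
    private volatile boolean closed = false;

    AsyncUpdater(int queueSize, int threadCount, int batchSize, Consumer<List<T>> sender) {
        if (batchSize < 1) throw new IllegalArgumentException("batchSize must be >= 1");
        this.queue = new ArrayBlockingQueue<>(queueSize);
        this.sender = sender;
        this.batchSize = batchSize;
        this.workers = Executors.newFixedThreadPool(threadCount);
        for (int i = 0; i < threadCount; i++) {
            workers.execute(this::drainLoop);
        }
    }

    /** Write-path call: only enqueues, blocking when the queue is full (back-pressure). */
    void add(T item) {
        if (closed) throw new IllegalStateException("updater is closed");
        try {
            queue.put(item);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Each worker takes whatever is queued (up to batchSize items) and sends it as one batch. */
    private void drainLoop() {
        List<T> batch = new ArrayList<>(batchSize);
        try {
            while (!closed || !queue.isEmpty()) {
                T first = queue.poll(50, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                queue.drainTo(batch, batchSize - 1);
                sender.accept(new ArrayList<>(batch));
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Stops accepting items and waits until everything already queued has been sent. */
    void close() {
        closed = true;
        workers.shutdown();
        try {
            workers.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The design point this shows is that returning from add() says nothing about whether the data has reached the server, let alone been committed, which is why the autoCommit setting matters.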

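Note: getInputDoc is the heart of this HBase coprocessor: it converts the contents of an HBase Put into the values Solr needs. The original version derived each field name by slicing the raw KeyValue key as a string, String fieldName = key.substring(key.indexOf(columnFamily) + 3, key.indexOf("我在这")).trim();, where "我在这" ("here") marks an invisible garbage character that does not show up on the page. As printed, the line cannot work (key.indexOf("") returns 0, so substring throws), so the version below drops that fragile trick and reads the column qualifier directly with KeyValue.getQualifier().

public static SolrInputDocument getInputDoc(Put put) {
    SolrInputDocument doc = new SolrInputDocument();
    // the HBase row key becomes the Solr document ID
    doc.addField("test_ID", Bytes.toString(put.getRow()));
    List<KeyValue> cells = put.getFamilyMap().get(Bytes.toBytes(columnFamily));
    if (cells == null) {
        return doc; // this Put has no columns in our column family
    }
    for (KeyValue c : cells) {
        String value = Bytes.toString(c.getValue());
        if (value.isEmpty()) {
            continue;
        }
        // the column qualifier (e.g. "DeviceID") is used as the Solr field name
        String fieldName = Bytes.toString(c.getQualifier()).trim();
        doc.addField(fieldName, value);
    }
    return doc;
}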
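Why is slicing the key as a string so fragile? An HBase KeyValue key is binary: a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type code. The bytes after the qualifier therefore decode to non-printing characters, which is likely where the invisible character came from. The hypothetical KeyLayout helper below (plain JDK, big-endian like HBase's Bytes utilities) shows how the qualifier can be located exactly from the length fields; in real code, KeyValue.getQualifier() does this for you.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the KeyValue key layout:
// [row length: 2 bytes][row][family length: 1 byte][family][qualifier][timestamp: 8 bytes][type: 1 byte]
final class KeyLayout {
    private KeyLayout() {}

    /** Builds a key in the layout above (used here only to exercise the parser). */
    static byte[] buildKey(byte[] row, byte[] family, byte[] qualifier, long ts, byte type) {
        ByteBuffer buf = ByteBuffer.allocate(2 + row.length + 1 + family.length + qualifier.length + 8 + 1);
        buf.putShort((short) row.length).put(row)
           .put((byte) family.length).put(family)
           .put(qualifier).putLong(ts).put(type);
        return buf.array();
    }

    /** Finds the qualifier by arithmetic on the length fields, with no string searching. */
    static String qualifierOf(byte[] key) {
        int rowLen = ByteBuffer.wrap(key).getShort() & 0xFFFF;
        int famOffset = 2 + rowLen;
        int famLen = key[famOffset] & 0xFF;
        int qualOffset = famOffset + 1 + famLen;
        int qualLen = key.length - qualOffset - 8 - 1; // minus timestamp and type code
        return new String(key, qualOffset, qualLen, StandardCharsets.UTF_8);
    }
}
```

Because the qualifier has no length prefix of its own, its length can only be recovered by subtracting the fixed-size trailer from the total key length, exactly as qualifierOf does.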

2. Writing the Test Program's Entry Point (main)

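This code asks HBase to create a table, then submits simulated data to HBase continuously, 1,000 Puts per batch, printing a timestamp after every batch so that insert performance can be measured.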

/*
 * Copyright: 王安琪
 * Description: HBase insert performance test (HbaseInsert)
 * Modified: 2014-05-27
 * Change: created
 */
package solrHbase.test;

import hbaseInput.HbaseInsert;

import ***;

public class TestHBaseMain {

    private static Configuration config;
    private static String tableName = "angelHbase";
    private static HTable table = null;
    private static final String columnFamily = "wanganqi";

    /**
     * @param args
     */
    public static void main(String[] args) {
        config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "192.103.101.104");
        HbaseInsert.createTable(config, tableName, columnFamily);
        try {
            table = new HTable(config, Bytes.toBytes(tableName));
            // One writer thread: HTable is not thread-safe, so more threads
            // would each need their own HTable instance.
            for (int k = 0; k < 1; k++) {
                Thread t = new Thread() {
                    public void run() {
                        for (int i = 0; i < 100000; i++) {
                            // insert one batch of 1000 simulated Puts
                            HbaseInsert.inputData(table,
                                    PutCreater.createPuts(1000, columnFamily));
                            Calendar c = Calendar.getInstance();
                            // MONTH is zero-based and HOUR is 12-hour, hence +1 and HOUR_OF_DAY
                            String dateTime = c.get(Calendar.YEAR) + "-"
                                    + (c.get(Calendar.MONTH) + 1) + "-"
                                    + c.get(Calendar.DATE) + "T"
                                    + c.get(Calendar.HOUR_OF_DAY) + ":"
                                    + c.get(Calendar.MINUTE) + ":"
                                    + c.get(Calendar.SECOND) + ":"
                                    + c.get(Calendar.MILLISECOND) + "Z written: "
                                    + (i + 1) * 1000; // rows written so far
                            System.out.println(dateTime);
                        }
                    }
                };
                t.start();
            }
        } catch (IOException e1) {
            e1.printStackTrace();
        }
    }
}
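Printing a timestamp per batch gives you raw data, but a rows-per-second figure is easier to compare across runs. The hypothetical InsertBenchmark helper below is a sketch, not part of the original code: insertBatch is an assumed callback that would wrap something like HbaseInsert.inputData(table, puts). It times the batches with System.nanoTime() and formats timestamps with SimpleDateFormat, which avoids the pitfalls of building them by hand from Calendar fields (Calendar.MONTH is zero-based and Calendar.HOUR is 12-hour).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.function.IntConsumer;

// Hypothetical benchmark helper: times repeated batch inserts and reports throughput.
final class InsertBenchmark {
    private InsertBenchmark() {}

    /** Runs `batches` calls of insertBatch(batchSize) and returns rows per second. */
    static double rowsPerSecond(int batches, int batchSize, IntConsumer insertBatch) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");
        long start = System.nanoTime();
        for (int i = 0; i < batches; i++) {
            insertBatch.accept(batchSize);
            long written = (long) (i + 1) * batchSize; // rows written after this batch
            System.out.println(fmt.format(new Date()) + " written: " + written);
        }
        double seconds = Math.max((System.nanoTime() - start) / 1e9, 1e-9); // avoid dividing by zero
        return (double) batches * batchSize / seconds;
    }
}
```

Note that the printing itself costs time inside the measured loop; for large runs you may prefer to print only every N batches.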

The HBase operations are wrapped in a class of their own; here it contains only the code for creating the table and inserting data.

/*
 * Copyright: 王安琪
 * Description: HBase operations: creating a table and inserting data
 * Modified: 2014-05-27
 * Change: created
 */
package hbaseInput;

import ***;
import org.apache.hadoop.hbase.client.Put;

public class HbaseInsert {

    public static void createTable(Configuration config, String tableName,
            String columnFamily) {
        HBaseAdmin hBaseAdmin = null;
        try {
            hBaseAdmin = new HBaseAdmin(config);
            if (hBaseAdmin.tableExists(tableName)) {
                return; // table already exists, nothing to do
            }
            HTableDescriptor tableDescriptor = new HTableDescriptor(tableName);
            tableDescriptor.addFamily(new HColumnDescriptor(columnFamily));
            hBaseAdmin.createTable(tableDescriptor);
        } catch (MasterNotRunningException e) {
            e.printStackTrace();
        } catch (ZooKeeperConnectionException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // close the admin on every path, including the early return above
            if (hBaseAdmin != null) {
                try {
                    hBaseAdmin.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
    }

    public static void inputData(HTable table, ArrayList<Put> puts) {
        try {
            table.put(puts);      // send the whole batch
            table.flushCommits(); // push any writes still in the client-side buffer
            puts.clear();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
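inputData sends a whole list of Puts in one call and then clears the list, i.e. the batching happens on the client side. The hypothetical BatchBuffer class below sketches that pattern in plain Java, under the assumption that the sink callback stands in for table.put(puts) plus table.flushCommits(): items accumulate locally and are handed over in one call whenever the buffer fills up, with an explicit flush() for the remainder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical client-side write buffer: collect items, hand them over in batches.
final class BatchBuffer<T> {
    private final int capacity;
    private final Consumer<List<T>> sink; // assumed stand-in for table.put(puts) + flushCommits()
    private final List<T> pending = new ArrayList<>();
    private int flushCount = 0;

    BatchBuffer(int capacity, Consumer<List<T>> sink) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
        this.sink = sink;
    }

    void add(T item) {
        pending.add(item);
        if (pending.size() >= capacity) flush(); // buffer full: send it
    }

    /** Sends everything buffered so far, then clears the buffer (like puts.clear()). */
    void flush() {
        if (pending.isEmpty()) return;
        sink.accept(Collections.unmodifiableList(new ArrayList<>(pending)));
        pending.clear();
        flushCount++;
    }

    int flushCount() { return flushCount; }
}
```

Fewer, larger calls amortize the per-request round trip, which is why the test program builds 1,000 Puts before each inputData call.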

3. Generating Simulated Put Data

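Writing data to HBase requires constructing Put objects. Below is how I construct the simulated Puts. For the string values, I pick random words from the words.dic dictionary shipped with mmseg and join them into a sentence; that part is not shown in the code below, but it is easy, so just generate whatever data you need.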

public static Put createPut(String columnFamily) {
    String ss = getSentence();
    byte[] family = Bytes.toBytes(columnFamily);
    byte[] rowKey = Bytes.toBytes("" + Math.abs(r.nextLong()));
    Put put = new Put(rowKey);
    put.add(family, Bytes.toBytes("DeviceID"),
            Bytes.toBytes("" + Math.abs(r.nextInt())));
    // ****** (further put.add(...) columns were elided in the original)
    // store the generated sentence itself, not the literal string "ss"
    put.add(family, Bytes.toBytes("Company_mmsegsm"), Bytes.toBytes(ss));
    return put;
}
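The article does not show getSentence(). The hypothetical SentenceGenerator below is one possible stand-in, assuming the dictionary has already been loaded as a list of words (for example with Files.readAllLines on words.dic, assuming one UTF-8 word per line). It also includes a row-key helper: Math.abs(r.nextLong()) as used in createPut stays negative in the one case where nextLong() returns Long.MIN_VALUE, while masking off the sign bit always yields a non-negative value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for the getSentence() helper that the article does not show.
final class SentenceGenerator {
    private final List<String> words; // e.g. the lines of a dictionary file
    private final Random random;

    SentenceGenerator(List<String> words, long seed) {
        if (words.isEmpty()) throw new IllegalArgumentException("dictionary is empty");
        this.words = new ArrayList<>(words);
        this.random = new Random(seed); // fixed seed makes test data reproducible
    }

    /** Joins between minWords and maxWords random words (no separator, as in Chinese text). */
    String sentence(int minWords, int maxWords) {
        if (minWords < 0 || maxWords < minWords) throw new IllegalArgumentException("bad word range");
        int n = minWords + random.nextInt(maxWords - minWords + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(words.get(random.nextInt(words.size())));
        }
        return sb.toString();
    }

    /** Non-negative random row key; clearing the sign bit avoids the Math.abs(Long.MIN_VALUE) case. */
    String rowKey() {
        return Long.toString(random.nextLong() & Long.MAX_VALUE);
    }
}
```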

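Of course, before running the program above, you need to configure the fields you need in Solr. Installing and configuring HBase and Solr, and their basic usage, will be covered in later articles. Here it is enough to give the Solr fields the same names as the columns in the Puts generated by createPut (plus the test_ID field that getInputDoc fills from the row key); alternatively, you can use dynamic fields.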
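If you go the dynamic-field route, each column name has to match either an explicitly declared field or a dynamicField pattern, which in Solr may start or end with a single "*". The hypothetical FieldResolver below sketches that lookup so you can check a column list against a schema before running the test: exact names win, and among patterns the longer one is tried first, which is my understanding of Solr's documented rule. The pattern "*_mmsegsm" is only a guess suggested by the column name Company_mmsegsm, not something the article confirms.

```java
import java.util.*;

// Hypothetical check of column names against a schema's fields and dynamic-field patterns.
final class FieldResolver {
    private final Set<String> explicitFields;
    private final List<String> dynamicPatterns;

    FieldResolver(Collection<String> explicitFields, Collection<String> dynamicPatterns) {
        this.explicitFields = new HashSet<>(explicitFields);
        this.dynamicPatterns = new ArrayList<>(dynamicPatterns);
        // longer (more specific) patterns first; the sort is stable, so ties keep declaration order
        this.dynamicPatterns.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    /** Returns the matching field name or pattern, or null if the column has no home in the schema. */
    String resolve(String column) {
        if (explicitFields.contains(column)) return column;
        for (String p : dynamicPatterns) {
            if (p.startsWith("*") && column.endsWith(p.substring(1))) return p;
            if (p.endsWith("*") && column.startsWith(p.substring(0, p.length() - 1))) return p;
        }
        return null;
    }
}
```

Any column that resolves to null would be rejected by Solr when the document is added, so it is worth running the Put column names through a check like this first.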

4. Testing Solr Performance Directly

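If you don't want to test HBase and Solr together and only want to measure Solr's own performance, it is even simpler: the code snippets above can be reused with a little reassembly.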

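private static void sendConcurrentUpdateSolrServer(final String url,
        final int count) throws SolrServerException, IOException {
    // queue size 10000, 20 background sender threads
    ConcurrentUpdateSolrServer solrServer = new ConcurrentUpdateSolrServer(url, 10000, 20);
    for (int i = 0; i < count; i++) {
        solrServer.add(getInputDoc(PutCreater.createPut(columnFamily)));
    }
    // add() only queues documents: wait until the queue has been sent, then release the threads
    solrServer.blockUntilFinished();
    solrServer.shutdown();
}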
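When timing a test like sendConcurrentUpdateSolrServer, keep in mind that ConcurrentUpdateSolrServer.add() returns once the document is queued, so the clock has to stop only after the queue has drained (which is what blockUntilFinished() waits for); otherwise you mostly measure enqueue speed. The hypothetical AsyncSendTimer below demonstrates the difference with a plain ExecutorService, where sendOne is an assumed stand-in for one network send.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical timing skeleton for an asynchronous client.
final class AsyncSendTimer {
    private AsyncSendTimer() {}

    /** Returns {enqueueMillis, totalMillis} for `count` asynchronous sends on `threads` threads. */
    static long[] time(int count, int threads, Runnable sendOne) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            pool.execute(sendOne); // returns immediately, like add() on an async client
        }
        long enqueued = System.nanoTime();
        pool.shutdown(); // no new work; then wait for the backlog, cf. blockUntilFinished()
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                throw new IllegalStateException("sends did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        long done = System.nanoTime();
        return new long[] {(enqueued - start) / 1_000_000, (done - start) / 1_000_000};
    }
}
```

Only the second number (total time) is a fair basis for a documents-per-second figure.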

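I hope this helps. 规格严格-功夫到家 ("strict standards, thorough skill"). This article is once again fairly code-heavy, but code is the best language for explaining ideas. What I advocate is keeping comments to a minimum and simplifying your code as much as possible, until it is clear and easy to understand, close to pseudocode; this is also what the book Refactoring recommends.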

Original article: http://www.cnblogs.com/wgp13x/p/3927979.html