对于微博这种粉丝关注关系而言,需要使用有向图(Directed Graph)表示。因为关注是单向关联,A 关注了 B,但是 B 不一定关注 A。这种有向图的示意图如下:
如果 A 关注了 B,图中就会存在一条从 A 到 B 的带箭头的边。上图中,“刘一”关注了“周八”,“刘一”和“李四”相互关注。对于有向图而言,度又分为入度(In-degree)和出度(Out-degree)。入度表示有多少条边指向该顶点,出度表示有多少条边是以该顶点为起点。“刘一”的入度为 4(微博中表示粉丝数),出度为 2(微博中表示关注的人数)。
create table t_user(user_id int primary key, user_name varchar(50) not null);
insert into t_user values(1,'刘一');
insert into t_user values(2,'陈二');
insert into t_user values(3,'张三');
insert into t_user values(4,'李四');
insert into t_user values(5,'王五');
insert into t_user values(6,'赵六');
insert into t_user values(7,'孙七');
insert into t_user values(8,'周八');
insert into t_user values(9,'吴九');
create table t_friend(
user_id int not null,
friend_id int not null,
created_time timestamp not null,
primary key(user_id, friend_id));
insert into t_friend values(1,2, current_timestamp);
insert into t_friend values(2,1, current_timestamp);
insert into t_friend values(1,3, current_timestamp);
insert into t_friend values(3,1, current_timestamp);
insert into t_friend values(1,4, current_timestamp);
insert into t_friend values(4,1, current_timestamp);
insert into t_friend values(1,7, current_timestamp);
insert into t_friend values(7,1, current_timestamp);
insert into t_friend values(1,8, current_timestamp);
insert into t_friend values(8,1, current_timestamp);
insert into t_friend values(2,3, current_timestamp);
insert into t_friend values(3,2, current_timestamp);
insert into t_friend values(2,5, current_timestamp);
insert into t_friend values(5,2, current_timestamp);
insert into t_friend values(3,4, current_timestamp);
insert into t_friend values(4,3, current_timestamp);
insert into t_friend values(4,6, current_timestamp);
insert into t_friend values(6,4, current_timestamp);
insert into t_friend values(5,8, current_timestamp);
insert into t_friend values(8,5, current_timestamp);
insert into t_friend values(7,8, current_timestamp);
insert into t_friend values(8,7, current_timestamp);
如果是单向图结构(组织结构树),可以使用一个表进行存储。通常是为 id 增加一个父级节点 parent_id。
查看好友列表
微信中的通讯录,显示的就是我们的好友。同样,我们可以查看“王五”(user_id = 5)的好友:
select u.user_id as friend_id,u.user_name as friend_name
from t_user u
join t_friend f on(u.user_id = f.friend_id and f.user_id =5);
friend_id|friend_name|---------|-----------|2|陈二 |8|周八 |
1.
2.
3.
4.
5.
6.
7.
“王五”有两个好友,分别是“陈二”和“周八”。
查看共同好友
利用好友关系表,我们还可以获取更多关联信息。例如,以下语句可以查看“张三”和“李四”的共同好友:
withf1(friend_id)as(
select f.friend_id
from t_user u
join t_friend f on(u.user_id = f.friend_id and f.user_id =3)),f2(friend_id)as(
select f.friend_id
from t_user u
join t_friend f on(u.user_id = f.friend_id and f.user_id =4))
select u.user_id as friend_id,u.user_name as friend_name
from t_user u
join f1 on(u.user_id = f1.friend_id)
join f2 on(u.user_id = f2.friend_id);
friend_id|friend_name|---------|-----------|1|刘一 |
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
上面的语句中我们使用了通用表表达式(Common Table Expression)定义了两个临时查询结果集 f1 和 f2,分别表示“张三”的好友和“李四”的好友;然后通过连接查询返回他们的共同好友。关于通用表表达式以及各种数据库中的语法可以参考这篇文章。
withfriend(id)as(
select f.friend_id
from t_user u
join t_friend f on(u.user_id = f.friend_id and f.user_id =2)),fof(id)as(
select f.friend_id
from t_user u
join t_friend f on(u.user_id = f.friend_id)
join friend on(f.user_id = friend.id and f.friend_id !=2))
select u.user_id, u.user_name,count(*)
from t_user u
join fof on(u.user_id = fof.id)
where fof.id not in(select id from friend)
group by u.user_id, u.user_name;
user_id|user_name|count(*)|-------|---------|--------|4|李四 |2|7|孙七 |1|8|周八 |2|
在社会学中存在一个六度关系理论(Six Degrees of Separation),指地球上所有的人都可以通过六层以内的关系链和任何其他人联系起来。在社交网络中,也有一些相关的实验。例如 2011年,Facebook 以一个月内访问 的 7.21 亿活跃用户为研究对象,计算出其中任何两个独立的用户之间平均所间隔的人数为4.74。
我们以“赵六”和“孙七“为例,查找任意两个人之间的关系链:
with recursive t(id, fid, hops, path)as(
select user_id, friend_id,0,CAST(CONCAT(user_id ,',', friend_id)ASCHAR(1000))
from t_friend
where user_id =6
union all
select t.id, f.friend_id, hops+1,CONCAT(t.path,',', f.friend_id)
from t join t_friend f
on(t.fid = f.user_id)and(FIND_IN_SET(f.friend_id, t.path)=0) and hops <5)
select *
from t where t.fid =7
order by hops;
id|fid|hops|path |--|---|----|-------------|6|7|2|6,4,1,7|6|7|3|6,4,3,1,7|6|7|3|6,4,1,8,7|6|7|4|6,4,3,1,8,7|6|7|4|6,4,3,2,1,7|6|7|5|6,4,1,2,5,8,7|6|7|5|6,4,3,2,1,8,7|6|7|5|6,4,3,2,5,8,7|
with recursive t(id, fid, hops, path)as(
select user_id, friend_id,0,CAST(CONCAT(user_id ,',', friend_id)ASCHAR(1000))
from t_friend
where user_id =6
union all
select t.id, f.friend_id, hops+1,CONCAT(t.path,',', f.friend_id)
from t join t_friend f
on(t.fid = f.user_id)and(FIND_IN_SET(f.friend_id, t.path)=0) and hops <5)
select avg(hops)
from t
order by hops;avg(hops)|---------|3.5116|
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
对于 QQ 这种加权图,可以在 t_friend 表中增加一个权重字段,从而分析好友的亲密度。
粉丝关系分析
对于微博这种有向图,对应的表结构可以设计如下:
-- 粉丝
create table t_follower(
user_id int not null,
follower_id int not null,
created_time timestamp not null,
primary key(user_id, follower_id));
insert into t_follower values(1,2, current_timestamp);
insert into t_follower values(1,3, current_timestamp);
insert into t_follower values(1,4, current_timestamp);
insert into t_follower values(1,7, current_timestamp);
insert into t_follower values(2,3, current_timestamp);
insert into t_follower values(3,4, current_timestamp);
insert into t_follower values(4,1, current_timestamp);
insert into t_follower values(5,2, current_timestamp);
insert into t_follower values(5,8, current_timestamp);
insert into t_follower values(6,4, current_timestamp);
insert into t_follower values(7,8, current_timestamp);
insert into t_follower values(8,1, current_timestamp);
insert into t_follower values(8,7, current_timestamp);-- 关注
create table t_followed(
user_id int not null,
followed_id int not null,
created_time timestamp not null,
primary key(user_id, followed_id));
insert into t_followed values(1,4, current_timestamp);
insert into t_followed values(1,8, current_timestamp);
insert into t_followed values(2,1, current_timestamp);
insert into t_followed values(2,5, current_timestamp);
insert into t_followed values(3,1, current_timestamp);
insert into t_followed values(3,2, current_timestamp);
insert into t_followed values(4,1, current_timestamp);
insert into t_followed values(4,3, current_timestamp);
insert into t_followed values(4,6, current_timestamp);
insert into t_followed values(7,1, current_timestamp);
insert into t_followed values(7,8, current_timestamp);
insert into t_followed values(8,5, current_timestamp);
insert into t_followed values(8,7, current_timestamp);
select f.followed_id, u.user_name
from t_followed f
join t_user u on(u.user_id = f.followed_id)
where f.user_id =1;
followed_id|user_name|-----------|---------|4|李四 |8|周八 |
1.
2.
3.
4.
5.
6.
7.
8.
“刘一”关注了“李四”和“周八”。
共同关注
我们还可以进一步获取和“刘一”具有相同关注的人的用户:
select r.follower_id, r.user_id
from t_followed d
join t_follower r on(r.user_id = d.followed_id and r.follower_id != d.user_id)
where d.user_id =1;
follower_id|user_id|-----------|-------|7|8|
1.
2.
3.
4.
5.
6.
7.
结果显示,“刘一”和“孙七”共同关注了“周八”。
我的粉丝
我们再来看看哪些用户是“刘一”的粉丝:
select f.follower_id, u.user_name
from t_follower f
join t_user u on(u.user_id = f.follower_id)
where f.user_id =1;
follower_id|user_name|-----------|---------|2|陈二 |3|张三 |4|李四 |7|孙七 |
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
“刘一”有 4 个粉丝。
相互关注
最后,我们看看哪些用户之间互为粉丝,或者互相关注:
select r.user_id, r.follower_id
from t_follower r
join t_followed d on (r.user_id = d.user_id and r.follower_id = d.followed_id and r.user_id < r.follower_id);
user_id|follower_id|
-------|-----------|
1| 4|
7| 8|