Research Article
Open Access
Exploiting Multi-Sources Information for
Location Recommendation
Henghong Yang^{1}, Wenguo Wei^{1*}, Huimin Zhao^{1} and Guiyuan Xie^{1}
^{1}School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou, China
This work was supported in part by the Science and Technology Project of Guangzhou City (201802020019, 01806040010)
*Corresponding author:Wenguo Wei, School of Electronic and Information, Guangdong Polytechnic Normal University, Guangzhou, China; E-mail:
@
Received: July 22, 2019; Accepted: August 19, 2019; Published: August 21, 2019
Citation: Wenguo W, Yang H, Zhao H, Xie G (2019) Exploiting Multi-Sources Information for Location Recommendation. J Comp Sci Appl Inform Technol. 4(1): 1-8. DOI: 10.15226/2474-9257/4/1/00143
Abstract
In location-based social networks (LBSNs), user preference,
social influence and geographical influence are three major factors
that affect users’ check-in behaviors. However, current studies tend
to ignore the influence of point-of-interest (POI) features
on similar user groups. In this paper, we propose a new approach
named Base Item Attribute - Weighted Cosine Similarity (BIA-WCOS)
to model the social relationships of users, which considers the influence of
a location’s popularity and check-in frequency on user similarity. The
proposed Geographical - Base Location Attribute Social Relationship
(G-BLAS) framework exploits personalized social and geographical
influences for location recommendation. We conduct a comprehensive
performance evaluation of our approach using two real datasets
collected from Foursquare and Gowalla. Experimental results show
the effectiveness and advantages of our proposed approach.
Keywords: Matrix Factorization; Geographical Influence; Item Attribute; User Similarity; Location Recommendation;
Introduction
With the advancement of mobile devices and the development
of GPS technology, we have witnessed the increasing popularity
of location-based social networks (LBSNs) in recent years. In an
LBSN, users can establish social links with their friends and share
their experiences of visiting locations they find interesting,
also known as POIs, via “check-ins”, which reflect their
preferences. Users can also share their feelings by commenting
on the locations they visit. For example, Foursquare, one of the
most popular LBSNs, has over 50 million users, 105 million
places and 12 billion check-ins. Facing this huge amount of data,
personalized location recommendation systems can help
users explore new locations that they are interested in.
The most widely used approach to modeling user preference for location recommendation is the collaborative filtering (CF) technique, in which users’ check-in data are modeled as a user-location matrix, with each entry representing the frequency of a user visiting a location. A personalized location recommendation system aims to predict a user’s preference for unvisited locations from the check-in data and other contextual information, such as geographical influence [1-6], social relationships [7-10] and reviews [11, 12]. The features of POIs also influence the social relationship. For example, compared with visits to a popular location, users’ visits to the same unpopular location are more likely to reflect a similarity between them. How to model these features more effectively is the research focus of this paper.
In this paper, we propose a novel approach for determining a weighted user similarity, in which we explore the influence of a POI’s popularity and visit frequency on the social relationship according to users’ historical check-in data. We then propose a location recommendation framework that fuses multiple kinds of contextual information: user preference, social influence, the geographical influence on the social relationship, and the personalized geographical influence of locations. The results show that the proposed approach significantly improves recommendation accuracy.
The remainder of this paper is structured as follows. Section 2 reviews related work and background. Section 3 presents our proposed approach in detail. Section 4 reports and analyzes the experimental results. Finally, concluding remarks are drawn in Section 5.
Related Work
In this section, we review the related work and background
knowledge of our study. It is organized in three parts, i.e.
geographical recommendation, social recommendation and other
contextual recommendation, as detailed below.
Geographical recommendation
Geographical information is the most important factor for POIs.
Characterizing users’ preferences based on the geographical features
of locations is a widely used method in POI recommendation
systems. The authors of [1] argued that the implicit
feedback characteristics of user mobility data should be taken into account
along with the spatial information of locations, and proposed GeoMF++, a scalable
and flexible framework for recommending
locations. The authors of [2, 3] found that the geographical influence of visiting
a POI should be personalized for each user, so it is unreasonable to
use a power-law distribution to model users’ check-in behavior. In
order to prevent the fitting error caused by specific distributions,
Zhang et al. proposed to use kernel density estimation to
estimate the distance distribution between any two POIs and to
measure the geographical influence in the user check-in data.
In addition to the above-mentioned approaches, other methods have been proposed, such as the GeoMF model in [13], which fuses geographical location information into weighted matrix factorization, and the Rank-GeoFM model in [14], which ranks POIs based on pairwise ranking. These models can effectively capture the characteristics of geographical information and improve the accuracy of location recommendation.
Social recommendation
In addition to geographical influence, social network
information also plays an important role in location-based
recommendation systems. In real life, a user’s visit to a POI is
largely influenced by his or her friends. For example,
friends may go shopping together, so we can assume that
friends have similar preferences, and that incorporating
social network information into a recommendation system yields a
corresponding gain.
The authors of [7] modeled and analyzed the trajectories of user check-in locations with the hierarchical-graph-based similarity measurement (HGSM) algorithm, which measures the similarity of users’ behavior and recommends the highest-ranked users as close neighbors of the target user. The authors of [8] used a two-hop random walk algorithm to exploit explicit and implicit social relationships between users on top of the traditional matrix factorization model.
Other contextual information
Other contextual information about a user’s visit to a POI may also
influence the recommendation quality, such
as the time of the visit and the features of pictures published by
previous visitors. The authors of [15] used a topic model (LDA) to extract
the topic attributes of a POI from its tags and to determine
users’ preferences according to these attributes. The authors of [16] extended the
state-of-the-art Rank-GeoFM POI recommendation algorithm [14]
to include weather-related features. The authors of [17] proposed a
location-based recommendation algorithm that fuses temporal
information. The algorithm models the user’s check-in behavior
as a fourth-order tensor containing time periods, and combines it with
the geographical influence to recommend locations. The authors of [18] argued
that it is beneficial to analyze user preference through the pictures that
users share. Therefore, they proposed a CNN-based technique to
extract feature vectors of pictures and used them with matrix factorization
to improve the recommendation accuracy.
In summary, the existing approaches have achieved certain results in predicting a user’s preference for a location, but problems remain because the characteristics of locations are not fully utilized. In this paper, we explore the influence of location characteristics on the social relationship. Further, we integrate user preference, geographical influence and the social friendship of users into one unified framework for location recommendation.
The proposed approach
In this section, we first introduce the Matrix Factorization
technique with different feedback data, and then introduce the
model of geographical influence and social influence that we used
in this paper. Finally, we present the unified framework with all
these approaches integrated together.
The model of Matrix Factorization (MF)
There are currently two types of user-history-behavior
data for recommendation systems, i.e. explicit feedback data
and implicit feedback data. Explicit feedback directly
represents the user’s preferences (such as rating scores), while
implicit feedback does not
reflect the user’s preferences directly (such as clicks and browsing).
Here, we first introduce the matrix factorization that is suitable
for explicit feedback data [19], and then introduce the matrix
factorization that is suitable for implicit feedback data [20, 21], as
used in this paper.
Collaborative filtering (CF) is one of the most widely used approaches in location recommendation for describing user preferences on locations [22, 23]. Given m users $u\in \{u_{1},u_{2},\ldots,u_{m}\}$ and n locations $l\in \{l_{1},l_{2},\ldots,l_{n}\}$, the users’ check-in data are modeled by CF as a user-location matrix $R\in \mathbb{R}^{m\times n}$, where each entry of R represents the frequency of a user visiting a location. CF aims to map the users and locations into a latent space of dimension $k\ll \min(m,n)$, and to estimate users’ preferences on locations by the dot product of their latent vectors, as shown below:
$$\hat{R}_{ij}=U_{i}L_{j}^{T}\quad\text{(1)}$$
where $\hat{R}_{ij}$ denotes the predicted preference of user u_{i} for location l_{j}, U_{i} and L_{j} denote the i^{th} row of U and the j^{th} row of L respectively, $U\in \mathbb{R}^{m\times k}$ denotes the user matrix, and $L\in \mathbb{R}^{n\times k}$ denotes the location matrix. To reduce the generalization error, the Frobenius norms of U and L are added to the objective function as regularization terms. Thus, the weighted squared-error loss to be minimized is:
$$P=\min_{U,L}\frac{1}{2}\|W\odot(R-\hat{R})\|_{F}^{2}+\frac{\mu_{1}}{2}\|U\|_{F}^{2}+\frac{\mu_{2}}{2}\|L\|_{F}^{2}\quad\text{(2)}$$
where $W\in \mathbb{R}^{m\times n}$ is a weighting matrix that represents the confidence in R_{ij}, and $\odot$ denotes the element-wise product. If R_{ij} is larger than 0, we set W_{ij} to 1; otherwise, we set W_{ij} to 0. $\|\cdot\|_{F}^{2}$ denotes the square of a matrix’s Frobenius norm, and μ_{1} and μ_{2} are regularization parameters.
On the other hand, when we estimate user preference from implicit feedback data, the user-location matrix is modeled as a binary matrix $C\in \mathbb{R}^{m\times n}$. If u_{i} has checked in at location l_{j} at least once, C_{ij} is set to 1; otherwise it is set to 0. The weighted squared-error loss to be minimized is then:
$$P=\min_{U,L}\frac{1}{2}\|W\odot(C-\hat{R})\|_{F}^{2}+\frac{\mu_{1}}{2}\|U\|_{F}^{2}+\frac{\mu_{2}}{2}\|L\|_{F}^{2}\quad\text{(3)}$$
where W_{ij} is set as:
$$W_{ij}=\begin{cases}\eta R_{ij}+1, & R_{ij}>0\\ 1, & \text{otherwise}\end{cases}\quad\text{(4)}$$
The constant η denotes the rate of increase; in this paper we set η to 20.
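As a minimal illustrative sketch (not the authors’ implementation), the implicit-feedback loss of Eq. (3) with the confidence weights of Eq. (4) can be written in NumPy as follows. The function names, the toy check-in matrix and the latent dimension k = 2 are assumptions made for illustration; η = 20 and μ_{1} = μ_{2} = 0.03 follow the values stated in the paper.

```python
import numpy as np

def confidence_weights(R, eta=20.0):
    """Eq. (4): W_ij = eta * R_ij + 1 where R_ij > 0, and 1 otherwise."""
    return np.where(R > 0, eta * R + 1.0, 1.0)

def implicit_mf_loss(R, U, L, mu1=0.03, mu2=0.03, eta=20.0):
    """Eq. (3): weighted squared error on the binary matrix C plus L2 terms."""
    C = (R > 0).astype(float)        # binary check-in indicator
    W = confidence_weights(R, eta)   # per-entry confidence
    R_hat = U @ L.T                  # Eq. (1): predicted preferences
    err = W * (C - R_hat)            # element-wise weighting of the residual
    return (0.5 * np.sum(err ** 2)
            + 0.5 * mu1 * np.sum(U ** 2)
            + 0.5 * mu2 * np.sum(L ** 2))

# Hypothetical toy data: check-in counts for 3 users and 4 locations.
R = np.array([[2, 0, 0, 1],
              [0, 3, 0, 0],
              [1, 0, 0, 4]], dtype=float)
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((3, 2))   # user latent factors, k = 2
L = 0.1 * rng.standard_normal((4, 2))   # location latent factors
loss = implicit_mf_loss(R, U, L)
```

With all latent factors at zero, the loss reduces to half the sum of the squared weights over the visited entries, which gives a quick sanity check of the weighting.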
Geographical influence
As mentioned above, MF can effectively estimate user
preferences and the relations associated with most locations
by mapping the check-in data into a user-location rating matrix.
However, geographical influence also plays an important role in
location recommendation.
In daily life, users tend to visit locations that are geographically close to each other. For example, people usually visit nearby locations such as restaurants or shopping malls after watching a movie. Therefore, we use the geographical neighborhood characteristic of locations to improve the recommendation accuracy. In this paper we define the user’s preferences by fusing the geographical neighborhood characteristic [5] as follows:
$$\min_{U,L}\frac{1}{2}\|W\odot(C-UL^{T}G)\|_{F}^{2}+\frac{\mu_{1}}{2}\|U\|_{F}^{2}+\frac{\mu_{2}}{2}\|L\|_{F}^{2}\quad\text{(5)}$$
where $G=\alpha H+(1-\alpha)S^{T}$, $H\in \mathbb{R}^{n\times n}$ is an identity matrix, $S\in \mathbb{R}^{n\times n}$ is the matrix with entries $S_{jk}=Sim(l_{j},l_{k})/Z(l_{j})$, and $\alpha \in [0,1]$ is a weighting parameter that controls the influence of the neighboring locations. $D(l_{j})$ is the set of neighboring locations of $l_{j}$, $Sim(l_{j},l_{k})$ is the weight of the geographical influence of location $l_{k}$ on $l_{j}$, and $Z(l_{j})$ is a normalizing factor defined as $Z(l_{j})=\sum_{l_{k}\in D(l_{j})}Sim(l_{j},l_{k})$, where $Sim(l_{j},l_{k})$ is the Gaussian function:
$$Sim(l_{j},l_{k})=e^{-\frac{\|x_{j}-x_{k}\|^{2}}{\sigma^{2}}},\quad \forall l_{k}\in D(l_{j})\quad\text{(6)}$$
where x_{j} and x_{k} represent the longitude and latitude of locations l_{j} and l_{k} respectively. For the maximum distance between two locations we set a threshold of 10000, and l_{k} is not considered a neighbor of l_{j} if the distance exceeds that threshold.
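The construction of the matrix G from Eqs. (5) and (6) can be sketched as below. This is only an illustration under several assumptions that the paper does not state: coordinates are treated as planar and expressed in the same unit as the distance threshold, a location is excluded from its own neighborhood D(l_{j}), and the kernel width σ is a free parameter. The function name `geo_matrix` is hypothetical; α = 0.4 is the value reported in the experiments.

```python
import numpy as np

def geo_matrix(coords, alpha=0.4, sigma=1.0, max_dist=10000.0):
    """Build G = alpha * I + (1 - alpha) * S^T with S_jk = Sim(l_j, l_k) / Z(l_j)."""
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)            # ||x_j - x_k||^2
    sim = np.exp(-sq_dist / sigma ** 2)             # Eq. (6), Gaussian kernel
    # D(l_j): other locations within the distance threshold (assumption:
    # a location is not its own neighbor).
    neighbor = (np.sqrt(sq_dist) <= max_dist) & ~np.eye(n, dtype=bool)
    sim = np.where(neighbor, sim, 0.0)
    Z = sim.sum(axis=1, keepdims=True)              # normalizing factor Z(l_j)
    S = np.divide(sim, Z, out=np.zeros_like(sim), where=Z > 0)
    return alpha * np.eye(n) + (1.0 - alpha) * S.T

# Hypothetical toy coordinates: two nearby locations and one isolated location.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [100.0, 0.0]])
G = geo_matrix(coords, alpha=0.4, sigma=1.0, max_dist=10.0)
```

In this toy case the two nearby locations each have a single neighbor, so their normalized weights equal 1 and the corresponding off-diagonal entries of G equal 1 − α; the isolated location keeps only its diagonal entry α.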
Social relationship
In real life, people with similar
interests are more likely to form relationships, such as friendships
and emotional relationships. People often go to restaurants or other
places recommended by friends, which shows that users’
check-in behaviors are strongly affected by social relationships.
Based on these observations, the social relationship has been
modeled to improve the accuracy of location recommendation.
In this paper we assume that the social relationships between users are mutual, as shown in Figure 1. The cosine similarity [24, 25] is used to measure the similarity between users. Given two users u_{i} and u_{v}, let L be the set of locations and l_{k} a location in L. The user similarity is defined as:
Figure 1: Social network
$$\text{sim}\left(\text{i},\text{v}\right)=\frac{{{\displaystyle \sum}}_{{l}_{k}\in L}{S}_{ik}{S}_{vk}}{\sqrt{{{\displaystyle \sum}}_{{l}_{k}\in L}{S}_{ik}^{2}}\sqrt{{{\displaystyle \sum}}_{{l}_{k}\in L}{S}_{vk}^{2}}}\text{(7)}$$
where sim(i,v) denotes the similarity between u_{i} and u_{v}. S_{ik}
and S_{vk} indicate whether users u_{i} and u_{v} have checked in at location
l_{k} or not. We set S_{ik} to one if user u_{i} has checked in at location
l_{k} at least once; otherwise, we set S_{ik} to zero. The same rule
is used for S_{vk}.
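A minimal NumPy sketch of the binary cosine similarity in Eq. (7) is given below. The function name and the toy check-in matrix are assumptions for illustration; users without any check-in are given similarity 0 to avoid division by zero, which is a choice of this sketch rather than something stated in the paper.

```python
import numpy as np

def cosine_user_similarity(R):
    """Eq. (7): cosine similarity between the binary check-in vectors of all user pairs."""
    S = (R > 0).astype(float)            # S_ik = 1 if user i visited location k
    dot = S @ S.T                        # numerator: number of co-visited locations
    norms = np.sqrt(np.diag(dot))        # sqrt of each user's number of visited locations
    denom = np.outer(norms, norms)
    return np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)

# Hypothetical toy data: 3 users x 3 locations (check-in counts).
R = np.array([[5, 0, 1],
              [2, 3, 0],
              [0, 0, 0]], dtype=float)
sim = cosine_user_similarity(R)
```

Here the first two users share one of their two visited locations each, so their similarity is 1 / (√2 · √2) = 0.5, regardless of how often each of them visited it.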
Traditional similarity measures assume that every item carries the same weight, whereas in reality the more frequently two users visit the same location, the greater its impact on the similarity between them. For example, suppose users A, B and C visit location l_{k} 5, 2 and 6 times, respectively. Since a higher visit frequency indicates a stronger preference, the similarity between users A and C should be larger than the similarity between either of them and user B, because A and C have a higher degree of common visits [26]. On the other hand, the factors that lead a user to visit a location can be divided into subjective and objective factors. The subjective factor is the user’s own preference for the type of location, and the objective factor is the popularity of the location. A highly popular location keeps attracting more visitors, so a visit to it is only weakly related to the user’s preference for its type. Moreover, when location l_{k} has been visited by many people, it is difficult to find an item similar to it. Therefore, the more popular a location that two users have both visited, the weaker its influence on the similarity between them should be, and the smaller the corresponding weight. Based on these observations, we propose a novel approach that estimates user similarity based on the cosine similarity, as follows:
$${\lambda}_{k}=\left(\frac{{r}_{ik}+{r}_{vk}}{2}\right)\mathrm{ln}\left|\frac{m}{{I}_{k}}\right|\text{(8)}$$
where r_{ik} and r_{vk} denote the frequencies with which users u_{i} and u_{v} visit location l_{k} respectively, I_{k} denotes the number of users who have checked in at location l_{k}, and m denotes the total number of users. We use the ratio m/I_{k} to capture the influence of location popularity. We combine the two location characteristics into the weighting parameter λ_{k} and fuse it into the user similarity, which is modeled as follows:
$$si{m}^{new}\left(\text{i},\text{v}\right)=\frac{{{\displaystyle \sum}}_{{l}_{k}\in L}{\lambda}_{k}{S}_{ik}{S}_{vk}}{\sqrt{{{\displaystyle \sum}}_{{l}_{k}\in L}{\lambda}_{k}{S}_{ik}^{2}}\sqrt{{{\displaystyle \sum}}_{{l}_{k}\in L}{\lambda}_{k}{S}_{vk}^{2}}}\text{(9)}$$
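The weighted similarity of Eqs. (8) and (9) can be sketched for a single user pair as follows. This is an illustrative reading of the formulas, not the authors’ code: the function name and toy data are hypothetical, locations with no visitors receive weight 0, and a pair with a zero denominator is assigned similarity 0.

```python
import numpy as np

def weighted_similarity(R, i, v):
    """Eq. (9): cosine similarity of users i and v with the weights of Eq. (8)."""
    m = R.shape[0]                                # total number of users
    S = (R > 0).astype(float)                     # binary check-in indicators
    visitors = S.sum(axis=0)                      # I_k: users who visited location k
    popularity = np.zeros_like(visitors)
    visited = visitors > 0
    popularity[visited] = np.log(m / visitors[visited])   # ln(m / I_k)
    lam = 0.5 * (R[i] + R[v]) * popularity        # Eq. (8): lambda_k for this pair
    num = np.sum(lam * S[i] * S[v])
    den = np.sqrt(np.sum(lam * S[i] ** 2)) * np.sqrt(np.sum(lam * S[v] ** 2))
    return num / den if den > 0 else 0.0

# Hypothetical toy data echoing the A/B/C example: rows are users A, B, C, D.
R = np.array([[5, 1, 0],
              [2, 0, 1],
              [6, 1, 0],
              [0, 0, 3]], dtype=float)
sim_AC = weighted_similarity(R, 0, 2)
sim_AB = weighted_similarity(R, 0, 1)
```

In this toy matrix users A and C visit exactly the same locations, so their weighted similarity is 1, while A and B share only the popular first location and obtain a smaller value.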
Unified framework: G-BLAS
In this section, we introduce the integrated model of our
approach. Based on Eq. (5) and Eq. (9), the proposed
framework, which combines the three factors, is given by Eq. (10)
below:
$$P=\min_{U,L}\frac{1}{2}\|W\odot(C-UL^{T}G)\|_{F}^{2}+\frac{\mu_{1}}{2}\|U\|_{F}^{2}+\frac{\mu_{2}}{2}\|L\|_{F}^{2}+\frac{\mu_{3}}{2}\sum_{i=1}^{m}\sum_{v\in U}sim^{new}(i,v)\|u_{i}-u_{v}\|_{F}^{2}\quad\text{(10)}$$
where μ_{1} and μ_{2} are the weighting parameters that control U and L respectively, and μ_{3} denotes the weighting parameter that controls the influence of the social relationship on the recommendation. $\|U\|_{F}^{2}$ and $\|L\|_{F}^{2}$ are used as regularization terms to prevent over-fitting.
In this paper we use the gradient descent algorithm [15] to solve Eq. (10). The partial derivatives with respect to L and U are given by:
$$\frac{\partial P}{\partial L}=W\odot\left(UL^{T}G-C\right)GU+\mu_{2}l_{j}\quad\text{(11)}$$
$$\frac{\partial P}{\partial U}=W\odot\left(UL^{T}G-C\right)G^{T}L+\mu_{1}u_{i}+\mu_{3}\sum_{v\in U}sim^{new}(i,v)(u_{i}-u_{v})\quad\text{(12)}$$
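The optimization of Eq. (10) can be sketched in NumPy as follows. This is a hedged illustration rather than the authors’ implementation: the gradients are derived directly from the objective in Eq. (10) in full matrix form, so their constant factors (the squared weights and the factor arising from symmetric user pairs) may differ from Eqs. (11) and (12) as printed. The function names, the learning rate, the number of steps and the initialization are assumptions; A stands for the matrix of sim^{new}(i,v) values.

```python
import numpy as np

def objective(U, L, C, W, G, A, mu1, mu2, mu3):
    """Value of Eq. (10) for latent matrices U (m x k) and L (n x k)."""
    err = W * (C - U @ L.T @ G)
    diff = U[:, None, :] - U[None, :, :]             # pairwise u_i - u_v
    social = np.sum(A * np.sum(diff ** 2, axis=-1))  # sum_i sum_v A_iv ||u_i - u_v||^2
    return (0.5 * np.sum(err ** 2) + 0.5 * mu1 * np.sum(U ** 2)
            + 0.5 * mu2 * np.sum(L ** 2) + 0.5 * mu3 * social)

def gradients(U, L, C, W, G, A, mu1, mu2, mu3):
    """Exact gradients of the objective above with respect to U and L."""
    E = W * W * (U @ L.T @ G - C)                    # weighted residual
    A_sym = A + A.T                                  # each pair appears twice in the sum
    lap = np.diag(A_sym.sum(axis=1)) - A_sym         # graph-Laplacian form of the social term
    grad_U = E @ G.T @ L + mu1 * U + mu3 * lap @ U
    grad_L = G @ E.T @ U + mu2 * L
    return grad_U, grad_L

def fit(C, W, G, A, k=2, mu1=0.03, mu2=0.03, mu3=0.01,
        lr=1e-3, steps=200, seed=0):
    """Plain gradient descent from a small random start (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((C.shape[0], k))
    L = 0.1 * rng.standard_normal((C.shape[1], k))
    for _ in range(steps):
        grad_U, grad_L = gradients(U, L, C, W, G, A, mu1, mu2, mu3)
        U, L = U - lr * grad_U, L - lr * grad_L
    return U, L
```

A fixed step size is used only to keep the sketch short; with the large confidence weights of Eq. (4), a smaller learning rate or a line search would likely be needed for stable updates.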
Experimental Results and Analysis
In this section we first introduce the dataset, performance
metrics and parameter settings that we used in the experiments,
followed by detailed comparison of the results and analysis of the
performance of each approach.
Dataset Description
The experimental data used in this study were collected from
Foursquare and Gowalla, which are two of the most popular LBSNs.
Foursquare encourages users to share information such as their
current locations; its dataset contains 1,196,248 check-ins by 24,941
users at 28,593 POIs. Gowalla is another check-in website, on which
users can share information about places, activities,
travel routes, etc. with friends. Its dataset contains 6,941,890 check-ins
by 196,591 users at 950,327 POIs. Each check-in
record includes the userID, the locationID, the coordinates of the POI and
the check-in time. We extracted a subset of the data for the experiments;
the detailed statistics of the check-in data are
summarized in Table 1.
The Foursquare dataset used in this experiment contains 496,488 check-ins by 13,805 users at 19,587 POIs. The Gowalla dataset contains 161,553 check-ins by 5,433 users at 9,687 POIs. Because the data are sparse, they need to be pre-processed: we filtered out users with fewer than 10 check-ins and locations visited by fewer than 10 users. Finally, following the five-fold cross-validation method, we randomly split each dataset into a training set and a testing set, and the average of the test results is taken as the experimental result.
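The pre-processing and splitting steps described above can be sketched in plain Python as follows. The record layout (one `(user_id, location_id)` tuple per check-in), the function names and the single-pass filtering are assumptions of this sketch; the paper does not say whether the two filters are applied once or repeated until both conditions hold.

```python
from collections import Counter
import random

def filter_checkins(checkins, min_user_checkins=10, min_location_users=10):
    """Keep check-ins of active users at locations with enough distinct visitors."""
    user_counts = Counter(u for u, _ in checkins)               # check-ins per user
    location_users = Counter(loc for _, loc in set(checkins))   # distinct users per location
    return [(u, loc) for u, loc in checkins
            if user_counts[u] >= min_user_checkins
            and location_users[loc] >= min_location_users]

def five_fold_splits(records, n_folds=5, seed=0):
    """Yield (train, test) pairs for a random n-fold cross-validation split."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for i in range(n_folds):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test

# Hypothetical toy data with small thresholds so the effect is visible.
toy = [("a", "x")] * 3 + [("b", "x")] * 2 + [("c", "y")]
kept = filter_checkins(toy, min_user_checkins=2, min_location_users=2)
splits = list(five_fold_splits(list(range(10))))
```

Each of the five test folds is evaluated once and the metric values are then averaged, matching the averaging described in the text.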
Table 1: Statistics of the two datasets
Datasets | No. of users | No. of locations | No. of check-ins | User-location matrix density
Foursquare | 13,805 | 19,587 | 496,488 | 1.83×10^{-3}
Gowalla | 5,433 | 9,687 | 161,553 | 3.06×10^{-3}
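The pre-processing and splitting steps described above can be sketched as follows. This is an illustrative sketch rather than the authors' code: the function names and the (user, location) pair representation are assumptions, the filter is applied in a single pass (the text does not say whether it was repeated until stable), and the density is computed as check-ins divided by users × locations, which gives values close to those in Table 1.

```python
import random
from collections import Counter


def filter_checkins(checkins, min_user_checkins=10, min_location_users=10):
    """Keep check-ins of active users at sufficiently visited locations.

    checkins: list of (user_id, location_id) pairs, one pair per check-in.
    """
    # Drop users with fewer than min_user_checkins check-ins.
    per_user = Counter(user for user, _ in checkins)
    kept = [(u, loc) for u, loc in checkins if per_user[u] >= min_user_checkins]
    # Drop locations visited by fewer than min_location_users distinct users.
    visitors = Counter(loc for _, loc in set(kept))
    return [(u, loc) for u, loc in kept if visitors[loc] >= min_location_users]


def matrix_density(n_users, n_locations, n_checkins):
    """Check-ins divided by the number of cells in the user-location matrix."""
    return n_checkins / (n_users * n_locations)


def five_fold_splits(checkins, seed=0):
    """Yield (train, test) pairs; each fold serves as the test set exactly once."""
    shuffled = list(checkins)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for i in range(5):
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, folds[i]


print(matrix_density(13805, 19587, 496488))  # Foursquare counts from Table 1
print(matrix_density(5433, 9687, 161553))    # Gowalla counts from Table 1
```

Averaging a metric over the five (train, test) pairs then yields the reported experimental result.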
Performance Metrics
In this work, we use two widely used metrics, precision
and recall, to evaluate the performance of the proposed
approach. The precision and recall of the top-K location
recommendations for a target user are denoted by P@K and R@K,
respectively. P@K is the ratio of the discovered locations
to the K recommended locations, and R@K is the ratio of the
discovered locations to the set of locations that the target user
has visited in the testing data, where a discovered location is a
recommended location that the user actually visited in the testing
data. Generally, the higher the precision and recall values are,
the better the performance is. P@K and R@K are defined as follows:
$$\text{Precision@K}=\frac{1}{\left|T\right|}\sum_{{u}_{i}\in T}\frac{\left|R\left({u}_{i}\right)\cap E\left({u}_{i}\right)\right|}{K}\text{(13)}$$
$$\text{Recall@K}=\frac{1}{\left|T\right|}\sum_{{u}_{i}\in T}\frac{\left|R\left({u}_{i}\right)\cap E\left({u}_{i}\right)\right|}{\left|R\left({u}_{i}\right)\right|}\text{(14)}$$
where u_{i} denotes a user, R(u_{i}) denotes the set of locations that user u_{i} has visited in the testing data, E(u_{i}) denotes the set of K locations recommended to user u_{i}, and T denotes the set of users in the testing data. Users tend to pay more attention to highly ranked recommendations; therefore, we choose P@5, P@10, R@5, and R@10 as evaluation metrics in our experiments. We set the regularization parameters μ_{1} and μ_{2} to 0.03 and the instance weighting parameter α to 0.4. The weighting parameter μ_{3}, which controls the social relationship, is set to 0.01 by cross-validation.
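As a concrete reading of the two metrics, the following sketch averages per-user precision and recall over the test users. The function name, the dictionary-based inputs, and the toy data are assumptions made for illustration; the sketch also assumes every test user has at least one visited location in the testing data.

```python
def precision_recall_at_k(recommended, visited, k):
    """Average P@K and R@K over the users in `visited`.

    recommended: dict mapping user -> ranked list of recommended locations.
    visited:     dict mapping user -> non-empty set of locations visited
                 in the testing data.
    """
    precision_sum = 0.0
    recall_sum = 0.0
    for user, test_locations in visited.items():
        top_k = recommended.get(user, [])[:k]
        hits = len(set(top_k) & test_locations)   # discovered locations
        precision_sum += hits / k
        recall_sum += hits / len(test_locations)
    n_users = len(visited)
    return precision_sum / n_users, recall_sum / n_users


# Toy example (hypothetical data, not taken from the experiments).
recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
visited = {"u1": {"a", "c", "z"}, "u2": {"x"}}
p_at_2, r_at_2 = precision_recall_at_k(recommended, visited, k=2)
print(p_at_2, r_at_2)
```

In the toy example only the first user has a hit in the top 2, so both averages are pulled down by the second user, which mirrors how sparse test data keeps absolute P@K and R@K values low.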
Benchmarking Algorithms
1) Base Matrix Factorization [27] (BaseMF): This is the basic matrix factorization approach designed for explicit feedback datasets, which predicts a user’s preference from the check-in data alone.
2) Weighted Matrix Factorization [22] (WMF): This is the weighted matrix factorization approach designed for implicit feedback datasets. It predicts a user’s preference without considering geographical influence, the social relationship of users, or other context information.
3) USG [28]: This approach is based on the observation that the probability of a user visiting a location follows a power-law distribution over distance. A unified location recommendation framework is used to linearly combine the geographical influence and the user’s preference.
4) NCPD [29]: This approach classifies the POIs by geographical neighborhood characteristics and fuses the location and popularity of POIs to predict a user’s preference based on Non-negative Matrix Factorization (NMF).
Experimental Results
Figure 2 and Figure 3 depict the performance of the
recommendation techniques on the real datasets collected
from Foursquare and Gowalla, measured by precision and
recall. Experimental results show that our framework
G-BLAS outperforms both the baseline matrix factorization models
(i.e., BaseMF and WMF), which do not use any contextual
information, and the approaches that
utilize geographical characteristics or social relationship (i.e.,
USG and NCPD). The details are as follows.
Figure 2: The performance of the recommendation techniques on Foursquare
Figure 3: The performance of the recommendation techniques on Gowalla
Figure 2(a) and 2(b) show the performance of the
recommendation techniques on the Foursquare dataset. Taking
P@5 as an example, the WMF model outperforms the BaseMF model
by 19.23%. This shows that the WMF model can model the
user check-in data more effectively by assigning appropriate
weights, which improves the recommendation performance. However,
both WMF and BaseMF consider only the users’ check-in data and are
relatively simple models, so their precision and recall are
lower than those of the other approaches. USG fuses user preference,
geographical influence, and social relationship to predict users’
preferences for unvisited locations, and it outperforms BaseMF
and WMF by 53.85% and 29.03%, respectively. NCPD likewise
considers user preference and the geographical
influence of POIs, and it outperforms
USG by 15.14%. One possible reason is that USG assumes
a power-law relationship between distance
and check-in probability, which does not always match reality,
since not every dataset follows a specific distribution. The result
also indicates that the influence of users’ social relationships
is weaker than the geographical influence. In addition,
our approach always achieves the best
result on the evaluation metrics. In terms of P@5, the
average improvements of G-BLAS over BaseMF, WMF, USG, and
NCPD are 95.15%, 61.52%, 27.51%, and 10.87%, respectively. This
demonstrates that the proposed G-BLAS can effectively mitigate
data sparsity and significantly improve recommendation
accuracy by fusing user preference with multiple kinds of
context information.
Figure 3(a) and 3(b) show the performance of the recommendation techniques on the Gowalla dataset. Because the Gowalla check-in data are less sparse, every approach performs better on the Gowalla dataset than on the Foursquare dataset. Overall, compared with the other benchmarked approaches, our proposed approach again shows significant improvements, which validates its effectiveness.
Parameter Tuning
In our proposed approach, the parameter α denotes the weight
of the influence of geographical neighborhood characteristics on
user preference, and the impact of social relationship on location
recommendation is controlled by the parameter μ_{3}. In Figure 4
we test the effect of different values of α on recommendation
performance (setting the number of recommended locations to
5). From the results, we derive two
findings: (1) our approach achieves the best results when
α is set to 0.4; (2) if we do not consider the influence
of geographical neighborhood characteristics, that is, when
α is set to 0, the recommendation accuracy
degrades. This validates the importance of geographical influence
in the process of location recommendation.
In Figure 5 we investigate the performance of our approach under different values of μ_{3}. A larger μ_{3} indicates that the social relationship of users is more closely linked to location recommendation. As seen, μ_{3} = 0.01 is the most suitable setting on both datasets. Furthermore, when μ_{3} is larger than 1, the fluctuation of performance becomes less obvious.
Figure 4: Effects of parameter α on recommendation accuracy (μ_{3} = 0)
Figure 5: Effects of parameter μ_{3} on recommendation accuracy (α = 1)
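A one-parameter sweep of the kind used for α and μ_{3} can be sketched as below. The helper name and the toy scoring function are hypothetical: in practice `evaluate` would train the model with the candidate value and return a validation metric such as P@5, whereas here it is only a placeholder so that the sketch runs on its own.

```python
def sweep_parameter(candidates, evaluate):
    """Evaluate each candidate value and return (best_value, scores).

    candidates: iterable of parameter values to try.
    evaluate:   callable mapping a parameter value to a score
                (higher is better).
    """
    scores = {value: evaluate(value) for value in candidates}
    best_value = max(scores, key=scores.get)
    return best_value, scores


def toy_score(value):
    """Placeholder score with a single peak; not a real experimental result."""
    return -(value - 0.4) ** 2


best_alpha, alpha_scores = sweep_parameter(
    [0.0, 0.2, 0.4, 0.6, 0.8, 1.0], toy_score
)
print(best_alpha)
```

Holding the other parameter fixed while one is swept, as the figure captions indicate, keeps each curve interpretable, at the cost of not exploring interactions between the two parameters.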
Table 2: Performance comparison of the recommendation techniques on Foursquare (each row gives the performance improvements of the other models compared to the baseline in the first column)
Compared to | WMF | USG | NCPD | G-BLAS
BaseMF | 19.23% | 53.85% | 76.92% | 95.15%
WMF | - | 29.03% | 48.39% | 64.52%
USG | - | - | 15.14% | 27.51%
NCPD | - | - | - | 10.87%
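The relative improvements in Table 2 are related to one another by simple arithmetic, which the sketch below illustrates. The function names are assumptions made for this illustration. Given two models' improvements over a shared baseline, the improvement of one over the other follows from the ratio of their scores; applying this to the USG and NCPD improvements over BaseMF gives values that agree with the 29.03% and 48.39% entries relative to WMF to within rounding.

```python
def relative_improvement(score, baseline_score):
    """Fractional gain of `score` over `baseline_score` (0.25 means 25%)."""
    return (score - baseline_score) / baseline_score


def rebase_improvement(gain_a, gain_b):
    """Gain of model A over model B, given each model's gain over a shared baseline."""
    return (1.0 + gain_a) / (1.0 + gain_b) - 1.0


# Gains over BaseMF taken from the BaseMF entries of Table 2.
usg_over_wmf = rebase_improvement(0.5385, 0.1923)
ncpd_over_wmf = rebase_improvement(0.7692, 0.1923)
print(f"USG over WMF:  {usg_over_wmf:.2%}")
print(f"NCPD over WMF: {ncpd_over_wmf:.2%}")
```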
Conclusions
In this paper, we have explored the influence of location
characteristics on social relationship and proposed a novel
approach to measure the similarity between users. Furthermore,
we proposed a framework that models users’
preferences for locations more accurately by fusing the geographical
neighborhood characteristics and the social relationship of users.
The experimental results show that our approach achieves significantly
higher recommendation performance than the benchmarked
approaches, including BaseMF, WMF, USG, and NCPD.
There are two directions for future investigation: (1) how to incorporate the influence of temporal information on users’ preferences to further extend our framework, and (2) how to capture more relationships between locations to improve the recommendation accuracy.
Author Contributions
Conceptualization and methodology, HH.Y. and WG.W.; software,
HH.Y. and GY.X.; writing–original draft preparation, HH.Y.;
writing–review and editing, HM.Z. and HH.Y.; supervision, WG.W.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Lian D, Zheng K, Ge Y, Cao L, Chen E, Xie X. GeoMF++: Scalable Location Recommendation via Joint Geographical Modeling and Matrix Factorization. ACM Transactions on Information Systems. 2017;1(1):1-29. doi: 10.1145/nnnnnnn.nnnnnnn
- Yao L, Sheng QZ, Wang X, Zhang W, Qin Y. Collaborative Location Recommendation by Integrating Multi-dimensional Contextual Information. ACM Transactions on Internet Technology. 2018;18(3):1-24.
- Zhang J, Chow CY, Li Y. iGeoRec: A personalized and efficient geographical location recommendation framework. IEEE Transactions on Services Computing. 2015;8(5):701-714. doi: 10.1109/TSC.2014.2328341
- Zhao G, Qian X, Chen K. Service Rating Prediction by Exploring Social Mobile Users’ Geographic Locations. IEEE Transactions on Big Data. 2017;1-12. doi: 10.1109/TBDATA.2016.2552541
- Liu Y, Wei W, Sun AX, Miao C. Exploiting geographical neighborhood characteristics for location recommendation. Proc of the 23rd ACM Conf on Information and Knowledge Management. 2014;739-748. doi: 10.1145/2661829.2662002
- Zhang DC, Li M, Wang CD. Point of interest recommendation with social and geographical influence. 2016 IEEE International Conference on Big Data (Big Data). IEEE. 2017. doi: 10.1109/BigData.2016.7840709
- Zheng Y, Zhang L, Ma Z, Xie X, Ma WY. Recommending friends and locations based on individual location history. ACM Transactions. 2011;5(1):99-111. doi: 10.1145/1921591.1921596
- Lin C, Qixiang S, Yulin L, Xixi Q. Research on the recommendation of point-of-interest based on potential geography-social relationship perception. Journal of Suzhou University. 2017;2017(09):101-107.
- Wang S, Wang Y, Tang J, Shu K, Ranganath S, Liu H. What your images reveal: Exploiting visual contents for point-of-interest recommendation. Proceedings of the 26th International Conference on World Wide Web. 2017;391-400. doi: 10.1145/3038912.3052638
- Yihao Z, Liang L, Qinghua Z, et al. Personalized Recommendation Algorithm of Social Network Based on Global Similarity. Computer Engineering. 2018;
- Yunlei M, Chunhui Z, Dongjin Y. A rating prediction framework based on distributed representation of document and regression model. Computer Era. 2016;
- Hu GN, Dai XY, Qiu FY, Li T, Huang SJ, Chen JJ. Collaborative Filtering with Topic and Social Latent Factors Incorporating Implicit Feedback. ACM Transactions on Knowledge Discovery from Data. 2018;12(2):1-30. doi: 10.1145/3127873
- Lian DF, Zhao C, Xie X, Sun G, Chen E, Rui Y. GeoMF: Joint geographical modeling and matrix factorization for point-of-interest recommendation. Proc of the 20th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining (KDD '14). New York: ACM. 2014;831-840.
- Li X, Cong G, Li XL, Nguyen Pham TA, Krishnaswamy S. Rank-GeoFM: A Ranking Based Geographical Factorization Method for Point of Interest Recommendation. Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2015;433-442. doi: 10.1145/2766462.2767722
- Yin H, Cui B, Sun Y, Hu Z, Chen L. LCARS: A Spatial Item Recommender System. ACM Transactions on Information Systems. 2014;32(3). doi: 10.1145/2629461
- Trattner C, Oberegger A, Marinho L, Parra D. Investigating the Utility of the Weather Context for Point of Interest Recommendations. Information Technology & Tourism. 2018;19(1/4):117-150.
- Danxia L, Lerong M, Jing H. Successive point-of-interest recommendation with spatial-temporal influence in LBSN. Application Research of Computers. 36(12):
- Wang S, Wang Y, Tang J, Shu K, Ranganath S, Liu H. What your images reveal: Exploiting visual contents for point-of-interest recommendation. Proceedings of the 26th International Conference on World Wide Web. 2017;391-400. doi: 10.1145/3038912.3052638
- Wu Q, Jia C. A Matrix Decomposition Recommendation Method Fusing Social Relations. Computer Engineering. 2019;
- Yin J, Wang ZS, Li Q, Su WJ. Personalized recommendation based on large-scale implicit feedback. Journal of Software. 2014;25(9):1953-1966. doi: 10.13328/j.cnki.jos.004648
- Xing Y, Xia H, Wang H. Improved ALS online recommendation algorithm with missing data modeling. Computer Engineering. 2018;44(8):212-2117, 223.
- Jamali M, Ester M. TrustWalker: A random walk model for combining trust-based and item-based recommendation. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2009. doi: 10.1145/1557019.1557067
- Su C, Wu PF, Xie XZ, et al. Point of Interest Recommendation Based on User’s Interest and Geographic Factors. Computer Science. 2019;(4):228-234.
- Wang YC, Liu Z. Collaborative Filtering Algorithm Based on User’s Preference for Items and Attributes. Computer Science. 2018;45(11A).
- Zhang L. Research on Advanced User Similarity on User-Based Collaborative Filtering. Modern Computer. 2019;34-38.
- Luo J, Zhu W. User similarity function considering weight of items similarity. Computer Engineering & Applications. 2015;
- Koren Y. Factorization meets the neighborhood: a multifaceted collaborative filtering model. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008;426-434. doi: 10.1145/1401890.1401944
- Ye M, Yin P, Lee WC, Lee DK. Exploiting geographical influence for collaborative point-of-interest recommendation. Proc of the 34th ACM SIGIR International Conference on Research and Development in Information Retrieval. 2011;325-334. doi: 10.1145/2009916.2009962
- Hu L, Sun A, Liu Y. Your neighbors affect your ratings: on geographical neighborhood influence to rating prediction. Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. ACM, 2014;345-354. doi: 10.1145/2600428.2609593