This thing is live!
Try it here: http://fas.f4cio.com
A Frequently Asked Questions (FAQ) list is a technique commonly used today to answer a large number of natural-language user queries, many of which should lead to the same answer. In this work I used the fact that those queries can be grouped (clustered) by their answers to achieve better precision and recall when searching for answers to a new query. While moving through the user interface, users implicitly give feedback to the clustering algorithm, which makes it semi-automatic. Finally, I compared the system's performance in three modes: read-only mode (a classical searchable FAQ), non-clustering mode (an answer is added to each question if a similar one has already been answered, with no grouping), and clustering mode.
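To make the idea of grouping questions by their answers concrete, here is a minimal Python sketch. It is not the FAS implementation: the class name `ClusteredFaq`, the `attach` and `search` methods, the naive tokenizer, and the bag-of-words cosine similarity are all assumptions chosen for illustration, and the actual system may use a different similarity measure and feedback mechanism.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lowercase word tokens; a deliberately naive tokenizer (assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ClusteredFaq:
    """Hypothetical FAQ store where each answer owns a cluster of questions."""

    def __init__(self):
        # answer text -> combined term counts of every question attached to it
        self.clusters = defaultdict(Counter)

    def attach(self, question, answer):
        """Record that `question` is answered by `answer`.

        In a semi-automatic setup this would be called when a user
        accepts a suggested answer, which is the implicit feedback.
        """
        self.clusters[answer].update(tokens(question))

    def search(self, query, k=4):
        """Return up to k answers whose question clusters best match the query."""
        q = Counter(tokens(query))
        scored = [(cosine(q, terms), answer) for answer, terms in self.clusters.items()]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [answer for _, answer in scored[:k]]


faq = ClusteredFaq()
faq.attach("How do I reset my password?", "Use the reset link on the login page.")
faq.attach("I forgot my password", "Use the reset link on the login page.")
faq.attach("Where can I download the source code?", "See the download section.")
top_answers = faq.search("forgot password")
```

Because both password questions share one answer, their terms are pooled into a single cluster, so a new query only has to match any wording that earlier users chose for that answer.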
The results showed that a system based on the proposed clustering and implicit user evaluation can, if the answer already exists, answer 84% of users' questions by giving them 4 results, and can, again if the answer already exists, answer 30% of questions by giving them a single correct answer.
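One way to read figures like these is as a hit rate at k: the share of queries whose correct answer appears among the first k results. The sketch below assumes that interpretation. The function name `hit_rate_at_k` and the toy data are hypothetical and are not taken from the thesis evaluation.

```python
def hit_rate_at_k(ranked_results, correct_answers, k):
    """Fraction of queries whose correct answer is within the top k results.

    ranked_results: one ranked list of answer ids per query
    correct_answers: the known correct answer id for each query
    """
    if not ranked_results:
        return 0.0
    hits = sum(
        1
        for results, correct in zip(ranked_results, correct_answers)
        if correct in results[:k]
    )
    return hits / len(ranked_results)


# Toy data for illustration only (not the thesis results).
ranked = [
    ["a", "b", "c", "d"],  # correct answer ranked first
    ["b", "a", "c", "d"],  # correct answer ranked second
    ["c", "d", "e", "f"],  # correct answer missing
]
correct = ["a", "a", "z"]

rate_at_1 = hit_rate_at_k(ranked, correct, 1)
rate_at_4 = hit_rate_at_k(ranked, correct, 4)
```

Restricting the evaluation to queries whose answer already exists in the system, as the results above do, would mean filtering the query list before calling the function.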
The whole document can be downloaded here. Only a Serbian version is available at the moment.
Creating FAQ Based on Clustering by Users Implicit Feedback.pdf
The source code is also available, but under the license below.
FAS Source Code
This software is zero-price, open source software that can be freely distributed but can be used and/or modified only for research purposes. The software as a whole, or any of its parts, cannot be used for commercial purposes without approval from the author (but the author would like to see how it proves itself in practice, especially in a large-scale environment).
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.