7    VisuAlgo.net / /suffixtree Login 后缀树。
示例模式 ▿

>

>
go to beginning previous frame pause play next frame go to end

A Suffix Tree is a compressed tree containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix Tree provides a particularly fast implementation for many important string operations. This data structure is very related to Suffix Array data structure.


Remarks: By default, we show e-Lecture Mode for first time (or non logged-in) visitor.
Please login if you are a repeated visitor or register for an (optional) free account first.

Print this e-Lecture
X Esc
下一个 PgDn

The suffix i (or the i-th suffix) of a (usually long) text string T is a 'special case' of substring that goes from the i-th character of the string up to the last character of the string.


For example, if T = "STEVEN$", then suffix 0 of T is "STEVEN$" (0-based indexing), suffix 2 of T is "EVEN$", suffix 4 of T is "EN$", etc.


Pro-tip: Since you are not logged-in, you may be a first time visitor who are not aware of the following keyboard shortcuts to navigate this e-Lecture mode: [PageDown] to advance to the next slide, [PageUp] to go back to the previous slide, [Esc] to toggle between this e-Lecture mode and exploration mode.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

The visualization of Suffix Tree of a string T is basically a rooted tree where path label (concatenation of edge label(s)) from root to each leaf describes a suffix of T. Each leaf vertex is a suffix and the integer value written inside the leaf vertex is the suffix number.


An internal vertex will branch to more than one child vertex, therefore there are more than one suffix from the root to the leaves via this internal vertex. The path label of an internal vertex is a common prefix among those suffix(es).


Another pro-tip: We designed this visualization and this e-Lecture mode to look good on 1366x768 resolution or larger (typical modern laptop resolution in 2017). We recommend using Google Chrome to access VisuAlgo. Go to full screen mode (F11) to enjoy this setup. However, you can use zoom-in (Ctrl +) or zoom-out (Ctrl -) to calibrate this.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

The Suffix Tree above is built from string T = "GATAGACA$" that have these 9 suffixes:

iSuffix
0GATAGACA$
1ATAGACA$
2TAGACA$
3AGACA$
4GACA$
5ACA$
6CA$
7A$
8$

Now verify that the path labels of suffix 7/6/2 are "A$"/"CA$"/"TAGACA$", respectively (there are 6 other suffixes). The internal vertices with path label "A"/"GA" branch out to 4 suffixes {7, 5, 3, 1}/2 suffixes {4, 0}, respectively (we ignore the trivial internal vertex = root vertex that branches out to all 9 suffixes).

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

In order to ensure that every suffix of the input string T ends in a leaf vertex, we enforce that string T ends with a special terminating symbol '$' that is not used in the original string T and has ASCII value lower than the lowest allowable character in T (which is character 'A'). This way, edge label '$' always appear at the leftmost edge of an internal vertex of this Suffix Tree visualization.


For the Suffix Tree example above (for T = "GATAGACA$"), if we do not have terminating symbol '$', notice that suffix 7 "A" does NOT end in a leaf vertex and can complicate some operations later.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

As we have ensured that all suffixes end at a leaf vertex, there are at most n leaves/suffixes in a Suffix Tree. All internal vertices (including the root vertex if it is an internal vertex) are always branching thus there can be at most n-1 such vertices, as shown with one of the extreme test case on the right.


The maximum number of vertices in a Suffix Tree is thus = n (leaves) + (n-1) internal vertices = 2n-1 = O(n) vertices. As Suffix Tree is a tree, the maximum number of edges in a Suffix Tree is also (2n-1)-1 = O(n) edges.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

When all the characters in string T is all distinct (e.g. T = "ABCDE$"), we can have the following very short Suffix Tree with exactly n+1 vertices (+1 due to root vertex).

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

All available operations on the Suffix Tree in this visualization are listed below:

  1. Build Suffix Tree (instant) — instant-build the Suffix Tree from string T.
  2. Search — Find the vertex in Suffix Tree of a (usually longer) string T that has path label containing the (usually shorter) pattern/search string P.
  3. Longest Repeated Substring (LRS) — Find the deepest internal vertex (as that vertex shares common prefix between two (or more) suffixes of T).
  4. Longest Common Substring (LCS) — Find the deepest internal vertex that contains suffixes from two different original strings.
Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

In this visualization, we only show the fully constructed Suffix Tree without describing the details of the O(n) Suffix Tree construction algorithm — it is a bit too complicated.


We limit the input to only accept 12 UPPERCASE alphabet and the special terminating symbol '$' characters (ie.g [A-Z$]). If you do not write a terminating symbol '$' at the back of your input string, we will automatically do so. If you place a '$' in the middle of the input string, they will be ignored. And if you enter an empty input string, we will resort to the default "GATAGACA$".


For convenience, we provide a few classic test case input strings usually found in Suffix Tree/Array lectures.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

Assuming that the Suffix Tree of a (usually longer) string T (of length n) has been built, we want to find all occurrences of pattern/search string P (of length m).


To do this, we search for the vertex x in the suffix Tree of T which has path label that represents P. Once we find this vertex x, all the leaves in the subtree rooted at x are the occurrences.


Time complexity: O(m+occ) where occ is the total number of occurrences.


For example, on the Suffix Tree of T = "GATAGACA$" above, let's try finding:

  1. Search("A"), occurrences = {7, 5, 3, 1}
  2. Search("GA"), occurrences = {4, 0}
  3. P = "T", should return occurrences = {2}, but there is a silly bug that we have not killed yet
  4. P = "Z", should return occurrences = {NIL}, but there is a silly bug that we have not killed yet
Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

Assuming that the Suffix Tree of a (usually longer) string T (of length n) has been built, we can find the Longest Repeated Substring (LRS) in T by simply finding the deepest internal vertex of the Suffix Tree of T.


This is because each internal vertex of the Suffix Tree of T branches out to at least two (or more) suffixes, i.e. the path label (common prefix of these suffixes) are repeated.


The internal vertex with the deepest/longest path label is the required answer, which can be found in O(n) with a simple tree traversal.


Without further ado, try LRS(T) on the Suffix Tree of string T = "GATAGACA$" above.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

This time, we need two input strings T1 and T2 that terminate with symbol '$'/'#', respectively. We then create the generalized Suffix Tree of these two strings T1+T2. Then, we can find the Longest Common Substring (LCS) of those two strings T1 and T2 by simply finding the deepest and valid internal vertex of the generalized Suffix Tree of T1+T2.


This is because each internal vertex of the Suffix Tree of T branches out to at least two (or more) suffixes, i.e. the path label (common prefix of these suffixes) are repeated. Then, we add an additional constraint where an internal vertex is considered valid (to be considered as LCS candidate) only if it represents suffixes from both strings, i.e. not just repeated, but a common substring found in both T1 and T2.


The valid internal vertex with the deepest/longest path label is the required answer, which can be found in O(n) with a simple tree traversal.


Without further ado, try LCS(T1,T2) on the generalized Suffix Tree of string T1 = "GATAGACA$" and T2 = "CATABB#" (notice that the UI will change to generalized Suffix Tree version).

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

There are a few other things that we can do with Suffix Tree like "Finding Longest Repeated Substring without overlap", "Finding Longest Common Substring of ≥ 2 strings", etc, but we will keep that for later.


We will continue the discussion of this String-specific data structure with the more versatile to Suffix Array data structure.

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn
当操作进行时,状态面板将会有每个步骤的描述。
Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

e-Lecture: The content of this slide is hidden and only available for legitimate CS lecturer worldwide. Drop an email to visualgo.info at gmail dot com if you want to activate this CS lecturer-only feature and you are really a CS lecturer (show your University staff profile).

Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

Control the animation with the player controls! Keyboard shortcuts are:

Spacebar: play/pause/replay
Left/right arrows: step backward/step forward
-/+: decrease/increase speed
Print this e-Lecture
X Esc
上一个 PgUp
下一个 PgDn

Return to 'Exploration Mode' to start exploring!


Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.

Print this e-Lecture
X Esc
上一个 PgUp

建立后缀树。 (instant)

最长的重复子串。

========================

最长的公共子串。

>

GATAGACA$

BANANA$

MISSISSIPPI$

AAAAAAA$

执行

Build Generalized ST and Compute LCS

关于 团队 使用条款

关于

VisuAlgo在2011年由Steven Halim博士概念化,作为一个工具,帮助他的学生更好地理解数据结构和算法,让他们自己和自己的步伐学习基础。
VisuAlgo包含许多高级算法,这些算法在Steven Halim博士的书(“竞争规划”,与他的兄弟Felix Halim博士合作)和其他书中讨论。今天,一些高级算法的可视化/动画只能在VisuAlgo中找到。
虽然专门为新加坡国立大学(NUS)学生采取各种数据结构和算法类(例如CS1010,CS1020,CS2010,CS2020,CS3230和CS3230),作为在线学习的倡导者,我们希望世界各地的好奇心发现这些可视化也很有用。
VisuAlgo不是从一开始就设计为在小触摸屏(例如智能手机)上工作良好,因为需要满足许多复杂的算法可视化,需要大量的像素和点击并拖动手势进行交互。一个令人尊敬的用户体验的最低屏幕分辨率为1024x768,并且只有着陆页相对适合移动设备。
VisuAlgo是一个正在进行的项目,更复杂的可视化仍在开发中。
最令人兴奋的发展是自动问题生成器和验证器(在线测验系统),允许学生测试他们的基本数据结构和算法的知识。这些问题是通过一些规则随机生成的,学生的答案会在提交给我们的评分服务器后立即自动分级。这个在线测验系统,当它被更多的世界各地的CS教师采用,应该技术上消除许多大学的典型计算机科学考试手动基本数据结构和算法问题。通过在通过在线测验时设置小(但非零)的重量,CS教练可以(显着地)增加他/她的学生掌握这些基本问题,因为学生具有几乎无限数量的可以立即被验证的训练问题他们参加在线测验。培训模式目前包含12个可视化模块的问题。我们将很快添加剩余的8个可视化模块,以便VisuAlgo中的每个可视化模块都有在线测验组件。
另一个积极的发展分支是VisuAlgo的国际化子项目。我们要为VisuAlgo系统中出现的所有英语文本准备一个CS术语的数据库。这是一个很大的任务,需要众包。一旦系统准备就绪,我们将邀请VisuAlgo游客贡献,特别是如果你不是英语母语者。目前,我们还以各种语言写了有关VisuAlgo的公共注释:
zh, id, kr, vn, th.

团队

项目领导和顾问(2011年7月至今)
Dr Steven Halim, Senior Lecturer, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Software Engineer, Google (Mountain View)

本科生研究人员 1 (Jul 2011-Apr 2012)
Koh Zi Chun, Victor Loh Bo Huai

最后一年项目/ UROP学生 1 (Jul 2012-Dec 2013)
Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy

最后一年项目/ UROP学生 2 (Jun 2013-Apr 2014)
Rose Marie Tan Zhao Yun, Ivan Reinaldo

本科生研究人员 2 (May 2014-Jul 2014)
Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

最后一年项目/ UROP学生 3 (Jun 2014-Apr 2015)
Erin Teo Yi Ling, Wang Zi

最后一年项目/ UROP学生 4 (Jun 2016-Dec 2017)
Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir

List of translators who have contributed ≥100 translations can be found at statistics page.

致谢
这个项目是由来自NUS教学与学习发展中心(CDTL)的慷慨的教学增进赠款提供的。

使用条款

VisuAlgo是地球上的计算机科学社区免费。如果你喜欢VisuAlgo,我们对你的唯一的要求就是通过你知道的方式,比如:Facebook、Twitter、课程网页、博客评论、电子邮件等告诉其他计算机方面的学生/教师:VisuAlgo网站的神奇存在
如果您是数据结构和算法学生/教师,您可以直接将此网站用于您的课程。如果你从这个网站拍摄截图(视频),你可以使用屏幕截图(视频)在其他地方,只要你引用本网站的网址(http://visualgo.net)或出现在下面的出版物列表中作为参考。但是,您不能下载VisuAlgo(客户端)文件并将其托管在您自己的网站上,因为它是剽窃。到目前为止,我们不允许其他人分叉这个项目并创建VisuAlgo的变体。使用(客户端)的VisuAlgo的离线副本作为您的个人使用是很允许的。
请注意,VisuAlgo的在线测验组件本质上具有沉重的服务器端组件,并且没有简单的方法来在本地保存服务器端脚本和数据库。目前,公众只能使用“培训模式”来访问这些在线测验系统。目前,“测试模式”是一个更受控制的环境,用于使用这些随机生成的问题和自动验证在NUS的实际检查。其他感兴趣的CS教练应该联系史蒂文如果你想尝试这样的“测试模式”。
出版物名单
这项工作在2012年ACM ICPC世界总决赛(波兰,华沙)和IOI 2012年IOI大会(意大利Sirmione-Montichiari)的CLI讲习班上进行了简要介绍。您可以点击此链接阅读我们2012年关于这个系统的文章(它在2012年还没有被称为VisuAlgo)。
这项工作主要由我过去的学生完成。最近的最后报告是:Erin,Wang Zi,Rose,Ivan。
错误申报或请求添加新功能
VisuAlgo不是一个完成的项目。 Steven Halim博士仍在积极改进VisuAlgo。如果您在使用VisuAlgo并在我们的可视化页面/在线测验工具中发现错误,或者如果您想要求添加新功能,请联系Dr Steven Halim博士。他的联系邮箱是他的名字加谷歌邮箱后缀:StevenHalim@gmail.com。