go to beginning previous frame pause play next frame go to end

Suffix Array is a sorted array of all suffixes of a given (usually long) text string T of length n characters (n can be in order of hundred thousands characters).

Suffix Array is a simple, yet powerful data structure which is used, among others, in full text indices, data compression algorithms, and within the field of bioinformatics.

This data structure is very related to the Suffix Tree data structure. Both data structures are usually studied together.

Remarks: By default, we show e-Lecture Mode for first time (or non logged-in) visitor.
If you are an NUS student and a repeat visitor, please login.


The visualization of Suffix Array is simply a table where each row represents a suffix and each column represents the attributes of the suffixes.

The four (basic) attributes of each row i are:

  1. index i, ranging from 0 to n-1,
  2. SA[i]: the i-th lexicographically smallest suffix of T is the SA[i]-th suffix,
  3. LCP[i]: the Longest Common Prefix between the i-th and the (i-1)-th lexicographically smallest suffixes of T is LCP[i] (we will see the application of this attribute later), and
  4. Suffix T[SA[i]:] - the i-th lexicographically smallest suffix of T is from index SA[i] to the end (index n-1).

Some operations may add more attributes to each row and are explained when that operations are discussed.

Pro-tip 1: Since you are not logged-in, you may be a first time visitor (or not an NUS student) who are not aware of the following keyboard shortcuts to navigate this e-Lecture mode: [PageDown]/[PageUp] to go to the next/previous slide, respectively, (and if the drop-down box is highlighted, you can also use [→ or ↓/← or ↑] to do the same),and [Esc] to toggle between this e-Lecture mode and exploration mode.



  1. 构建后缀数组 (SA) 是基于 Karp,Miller,和 Rosenberg (1972) 的想法的 O(n log n) 后缀数组构建算法,该算法按照增长长度 (1, 2, 4, 8, ...) 对后缀的前缀进行排序。
  2. 搜索 利用后缀数组中的后缀已排序的事实,并在 O(m log n) 中调用两个二进制搜索,以找到模式字符串 P 的长度 m 的第一次和最后一次出现。
  3. 最长公共前缀 (LCP) 可以在 O(n) 中使用排列 LCP (PLCP) 定理计算两个相邻后缀(不包括第一个后缀)之间的最长公共前缀。这个算法的名字叫做 Kasai's 算法。
  4. 最长重复子串 (LRS) 是一个简单的 O(n) 算法,它找到具有最高 LCP 值的后缀。
  5. 最长公共子串 (LCS) 是一个简单的 O(n) 算法,它找到来自两个不同字符串的具有最高 LCP 值的后缀。

Pro-tip 2: We designed this visualization and this e-Lecture mode to look good on 1366x768 resolution or larger (typical modern laptop resolution in 2021). We recommend using Google Chrome to access VisuAlgo. Go to full screen mode (F11) to enjoy this setup. However, you can use zoom-in (Ctrl +) or zoom-out (Ctrl -) to calibrate this.


在这个可视化中,我们展示了基于Karp,Miller和Rosenberg(1972)的想法,通过按递增长度(1,2,4,8,...)排序后缀的前缀,即所谓的前缀倍增算法,来正确地构建后缀数组的O(n log n)方法。

我们限制输入只接受12个(由于可用的绘图空间,不能太长 - 但在后缀树的实际应用中,n可以是十万到百万个字符的顺序)大写(我们会删除您的小写输入)字母和特殊终止符'$'字符(即,[A-Z$])。如果您没有在输入字符串的末尾写一个终止符'$',我们将自动这样做。如果您在输入字符串的中间放一个'$',它们将被忽略。如果您输入一个空的输入字符串,我们将默认为"GATAGACA$"。



Pro-tip 3: Other than using the typical media UI at the bottom of the page, you can also control the animation playback using keyboard shortcuts (in Exploration Mode): Spacebar to play/pause/replay the animation, / to step the animation backwards/forwards, respectively, and -/+ to decrease/increase the animation speed, respectively.


这个前缀倍增算法在 O(log n) 迭代中运行,其中每次迭代,它比较子字符串 T[SA[i]:SA[i+k]] 与 T[SA[i+k]:SA[i+2*k]],即,首先比较两对字符,然后比较前两个字符与下一个两个字符,然后比较前四个字符与下一个四个字符,依此类推。

通过可视化最好探索此算法,看 ConstructSA("GATAGACA$") 的动作。

时间复杂度:有 O(log n) 前缀倍增迭代,每次迭代我们调用 O(n) 基数排序,因此它在 O(n log n) 中运行 - 足以处理最多 n ≤ 200K 字符的典型编程竞赛问题中涉及的长字符串。


After we construct the Suffix Array of T in O(n log n), we can search for the occurrence of Pattern string T in O(m log n) by binary searching the sorted suffixes to find the lower bound (the first occurrence of P as a prefix of any suffix of T) and the upper bound positions (thelast occurrence of P as a prefix of any suffix of T).

Time complexity: O(m log n) and it will return an interval of size k where k is the total number of occurrences.

For example, on the Suffix Array of T = "GATAGACA$" above, try these scenarios:

  1. P returns a range of rows: Search("GA"), occurrences = {4, 0}
  2. P returns one row only: Search("CA"), occurrences = {2}
  3. P is not found in T: Search("WONKA"), occurrences = {NIL}


第一阶段:计算Phi[]的值,其中Phi[SA[i]] = SA[i-1]在O(n)中。这是为了帮助算法在$O(1)时间内知道哪个后缀在后缀数组顺序中位于后缀-SA[i]之后。


第三阶段:我们计算LCP[]的值,其中LCP[i] = PLCP[SA[i]]在O(n)中。这些LCP值是我们稍后用于其他后缀数组应用的值。

时间复杂性:Kasai的算法利用了PLCP定理,其中LCP值的增加(和减少)操作的总数最多是O(n)。因此,Kasai的算法总体上运行在O(n)中。因此,O(n log n)后缀数组构造(通过前缀倍增算法)和使用这个Kasai的算法计算LCP数组的O(n)计算对于处理涉及长字符串的典型编程比赛问题(最多n ≤ 200K字符)是足够的。


After we construct the Suffix Array of T in O(n log n) and compute its LCP Array in O(n), we can find the Longest Repeated Substring (LRS) in T by simply iterating through all LCP values and reporting the largest one.

This is because each value LCP[i] the LCP Array means the longest common prefix between two lexicographically adjacent suffixes: Suffix-i and Suffix-(i-1). This corresponds to an internal vertex of the equivalent Suffix Tree of T that branches out to at least two (or more) suffixes, thus this common prefix of these adjacent suffixes are repeated.

The longest common (repeated) prefix is the required answer, which can be found in O(n) by going through the LCP array once.

Without further ado, try LRS("GATAGACA$"). We have LRS = "GA".

It is possible that T contains more than one LRS, e.g., try LRS("BANANABAN$").
We have LRS = "ANA" (actually overlap) or "BAN" (without overlap).


After we construct the generalized Suffix Array of the concatenation of both strings T1$T2# of length n = n1+n2 in O(n log n) and compute its LCP Array in O(n), we can find the Longest Repeated Substring (LRS) in T by simply iterating through all LCP values and reporting the largest one that comes from two different strings.

Without further ado, try LCS("GATAGACA$", "CATA#") on the generalized Suffix Array of string T1 = "GATAGACA$" and T2 = "CATA#". We have LCS = "ATA".


You are allowed to use/modify our implementation code for fast Suffix Array+LCP: sa_lcp.cpp | py | java | ml to solve programming contest problems that need it.

You have reached the last slide. Return to 'Exploration Mode' to start exploring!

Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.















T =


T1 =
T2 =


关于 团队 使用条款


VisuAlgo最初由副教授Steven Halim于2011年构思,旨在通过提供自学、互动式学习平台,帮助学生更深入地理解数据结构和算法。

VisuAlgo涵盖了Steven Halim博士与Felix Halim博士、Suhendry Effendy博士合著的书《竞技编程》中讨论的许多高级算法。即使过去十年,VisuAlgo仍然是可视化和动画化这些复杂算法的独家平台。






id, kr, vn, th.


Associate Professor Steven Halim, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Senior Software Engineer, Google (Mountain View)

本科生研究人员 1
CDTL TEG 1: Jul 2011-Apr 2012: Koh Zi Chun, Victor Loh Bo Huai

最后一年项目/ UROP学生 1
Jul 2012-Dec 2013: Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy
Jun 2013-Apr 2014 Rose Marie Tan Zhao Yun, Ivan Reinaldo

本科生研究人员 2
CDTL TEG 2: May 2014-Jul 2014: Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

最后一年项目/ UROP学生 2
Jun 2014-Apr 2015: Erin Teo Yi Ling, Wang Zi
Jun 2016-Dec 2017: Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir
Aug 2021-Apr 2023: Liu Guangyuan, Manas Vegi, Sha Long, Vuong Hoang Long, Ting Xiao, Lim Dewen Aloysius

本科生研究人员 3
Optiver: Aug 2023-Oct 2023: Bui Hong Duc, Oleh Naver, Tay Ngan Lin

最后一年项目/ UROP学生 3
Aug 2023-Apr 2024: Xiong Jingya, Radian Krisno, Ng Wee Han

List of translators who have contributed ≥ 100 translations can be found at statistics page.

NUS教学与学习发展中心(CDTL)授予拨款以启动这个项目。在2023/24学年,Optiver的慷慨捐赠将被用来进一步开发 VisuAlgo。








VisuAlgo并不是一个完成的项目。Steven Halim副教授仍在积极改进VisuAlgo。如果您在使用VisuAlgo时发现任何可视化页面/在线测验工具中的错误,或者您想要请求新功能,请联系Steven Halim副教授。他的联系方式是将他的名字连接起来,然后加上gmail dot com。


版本 1.2 (更新于2023年8月18日星期五)。

自2023年8月18日(星期五)起,我们不再使用 Google Analytics。因此,我们现在使用的所有 cookies 仅用于此网站的运营。即使是首次访问的用户,烦人的 cookie 同意弹窗现在也已关闭。

自2023年6月7日(星期五)起,由于 Optiver 的慷慨捐赠,全世界的任何人都可以自行创建一个 VisuAlgo 账户,以存储一些自定义设置(例如,布局模式,默认语言,播放速度等)。

此外,对于 NUS 学生,通过使用 VisuAlgo 账户(一个 NUS 官方电子邮件地址,课堂名册中的学生姓名,以及在服务器端加密的密码 - 不存储其他个人数据),您同意您的课程讲师跟踪您的电子讲义阅读和在线测验培训进度,这是顺利进行课程所必需的。您的 VisuAlgo 账户也将用于参加 NUS 官方的 VisuAlgo 在线测验,因此,将您的账户凭据传递给他人代您进行在线测验构成学术违规。课程结束后,您的用户账户将被清除,除非您选择保留您的账户(OPT-IN)。访问完整的 VisuAlgo 数据库(包含加密密码)的权限仅限于 Halim 教授本人。

对于全球其他已经给 Steven 写过信的 CS 讲师,需要一个 VisuAlgo 账户(您的(非 NUS)电子邮件地址,您可以使用任何显示名称,以及加密密码)来区分您的在线凭据与世界其他地方。您的账户将具有 CS 讲师特定的功能,即能够查看隐藏的幻灯片,这些幻灯片包含了在隐藏幻灯片之前的幻灯片中提出的问题的(有趣的)答案。您还可以访问 VisuAlgo 在线测验的 Hard 设置。您可以自由地使用这些材料来增强您的数据结构和算法课程。请注意,未来可能会有其他 CS 讲师特定的功能。

对于任何拥有 VisuAlgo 账户的人,如果您希望不再与 VisuAlgo 工具有关联,您可以自行删除您的账户。