
A Suffix Array is a sorted array of all suffixes of a given (usually long) text string T of length n characters (n can be on the order of hundreds of thousands of characters).


The Suffix Array is a simple yet powerful data structure that is used, among other places, in full-text indices, data compression algorithms, and the field of bioinformatics.


This data structure is closely related to the Suffix Tree data structure. The two are usually studied together.



🕑

The visualization of Suffix Array is simply a table where each row represents a suffix and each column represents the attributes of the suffixes.


The four (basic) attributes of each row i are:

  1. index i, ranging from 0 to n-1,
  2. SA[i]: the i-th lexicographically smallest suffix of T is the SA[i]-th suffix,
  3. LCP[i]: the Longest Common Prefix between the i-th and the (i-1)-th lexicographically smallest suffixes of T is LCP[i] (we will see the application of this attribute later), and
  4. Suffix T[SA[i]:] - the i-th lexicographically smallest suffix of T is from index SA[i] to the end (index n-1).

Some operations add more attributes to each row; these are explained when those operations are discussed.
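
As a rough illustration of the four attributes above, the sketch below builds the table rows for a short string. It is not the construction algorithm used by the visualization: the helper names (`naive_suffix_array`, `common_prefix_len`, `suffix_table`) are made up for this example, and the suffixes are sorted naively, which is only reasonable for tiny inputs.

```python
def naive_suffix_array(t):
    """Sort suffix start indices by suffix text (naive, for tiny inputs only)."""
    return sorted(range(len(t)), key=lambda i: t[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def suffix_table(t):
    """Return rows (i, SA[i], LCP[i], T[SA[i]:]); LCP[0] is taken to be 0."""
    sa = naive_suffix_array(t)
    rows = []
    for i, start in enumerate(sa):
        lcp = 0 if i == 0 else common_prefix_len(t[start:], t[sa[i - 1]:])
        rows.append((i, start, lcp, t[start:]))
    return rows


for row in suffix_table("GATAGACA$"):
    print(row)
```

Each printed tuple corresponds to one row of the table: index i, SA[i], LCP[i], and the suffix itself.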



🕑

All available operations on the Suffix Array are listed below.

  1. Construct Suffix Array (SA) is the O(n log n) Suffix Array construction algorithm based on the idea by Karp, Miller, & Rosenberg (1972) that sorts prefixes of the suffixes in increasing length (1, 2, 4, 8, ...).
  2. Search utilizes the fact that the suffixes in the Suffix Array are sorted and calls two binary searches, in O(m log n) total, to find the first and the last occurrences of pattern string P of length m.
  3. Longest Common Prefix (LCP) between each pair of adjacent suffixes (excluding the first suffix, which has no predecessor) can be computed in O(n) using the Permuted LCP (PLCP) theorem. This algorithm is known as Kasai's algorithm.
  4. Longest Repeated Substring (LRS) is a simple O(n) algorithm that finds the suffix with the highest LCP value.
  5. Longest Common Substring (LCS) is a simple O(n) algorithm that finds the suffix with the highest LCP value that comes from two different strings.


🕑

In this visualization, we show the proper O(n log n) construction of the Suffix Array based on the idea of Karp, Miller, & Rosenberg (1972) that sorts prefixes of the suffixes in increasing length (1, 2, 4, 8, ...), a.k.a. the prefix doubling algorithm.


We limit the input to at most 12 characters because of the available drawing space (in real applications of the Suffix Array, n can be on the order of hundreds of thousands to millions of characters). The accepted characters are UPPERCASE letters and the special terminating symbol '$' (i.e., [A-Z$]); lowercase input is deleted. If you do not write a terminating symbol '$' at the end of your input string, we append it automatically. Any '$' placed in the middle of the input string is ignored. If you enter an empty input string, we fall back to the default "GATAGACA$".


For convenience, we provide a few classic test case input strings usually found in Suffix Tree/Array lectures, but to showcase the strength of this visualization tool, you are encouraged to enter any string of up to 12 characters of your choice (ending with the character '$').


Note that the LCP Array column remains empty in this operation. Its values are computed separately via the Longest Common Prefix operation.



🕑

This Prefix Doubling Algorithm runs in O(log n) iterations. In each iteration, with k = 1, 2, 4, ..., it orders the suffixes by the substring T[SA[i]:SA[i]+k] paired with the next substring T[SA[i]+k:SA[i]+2*k], i.e., it first sorts by the first character paired with the second, then by the first two characters paired with the next two, then by the first four characters paired with the next four, and so on.


This algorithm is best explored via visualization, see ConstructSA("GATAGACA$") in action.


Time complexity: There are O(log n) prefix doubling iterations, and in each iteration we call an O(n) Radix Sort, so the algorithm runs in O(n log n) overall. This is good enough to handle up to n ≤ 200K characters in typical programming competition problems involving long strings.
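
The prefix doubling idea can be sketched as follows. This is a simplified illustration, not the code behind the visualization: the function name `construct_sa` is chosen for this example, and Python's built-in comparison sort stands in for the O(n) Radix Sort, so this sketch runs in O(n log² n) rather than O(n log n).

```python
def construct_sa(t):
    """Prefix doubling: sort suffixes by their first 2, 4, 8, ... characters.

    Assumption for this sketch: the built-in sort replaces Radix Sort,
    giving O(n log^2 n) time instead of O(n log n).
    """
    n = len(t)
    sa = list(range(n))
    rank = [ord(c) for c in t]  # initial rank = character code (prefix length 1)
    k = 1
    while k < n:
        def key(i):
            # Rank of the first k characters, then rank of the next k characters
            # (-1 when the suffix is too short to have a second half).
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        # Re-rank: suffixes with equal keys share the same rank.
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            break
        k *= 2
    return sa


print(construct_sa("GATAGACA$"))  # [8, 7, 5, 3, 1, 6, 4, 0, 2]
```

The early exit once all ranks are distinct is an optional shortcut; without it the loop still stops after O(log n) doublings.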

🕑

After we construct the Suffix Array of T in O(n log n), we can search for the occurrences of pattern string P in O(m log n) by binary searching the sorted suffixes to find the lower bound (the first occurrence of P as a prefix of any suffix of T) and the upper bound (the last occurrence of P as a prefix of any suffix of T).


Time complexity: O(m log n). The search returns an interval of k rows, where k is the total number of occurrences.


For example, on the Suffix Array of T = "GATAGACA$" above, try these scenarios:

  1. P returns a range of rows: Search("GA"), occurrences = {4, 0}
  2. P returns one row only: Search("CA"), occurrences = {6}
  3. P is not found in T: Search("WONKA"), occurrences = {NIL}
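
The two binary searches can be sketched as below. This is an illustrative version only: the function name `search` is an assumption of this example, and the Suffix Array is built naively with a plain sort to keep the block self-contained.

```python
def search(t, sa, p):
    """Return the start indices of all occurrences of p in t, in Suffix Array order."""
    n, m = len(sa), len(p)

    # Lower bound: first row whose m-character prefix is >= p.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] < p:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first row whose m-character prefix is > p.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if t[sa[mid]:sa[mid] + m] <= p:
            lo = mid + 1
        else:
            hi = mid

    return [sa[i] for i in range(first, lo)]


T = "GATAGACA$"
SA = sorted(range(len(T)), key=lambda i: T[i:])  # naive construction, tiny inputs only
print(search(T, SA, "GA"))     # [4, 0]
print(search(T, SA, "CA"))     # [6]
print(search(T, SA, "WONKA"))  # []
```

Each comparison looks at no more than m characters, and each binary search makes O(log n) comparisons, which gives the O(m log n) bound.
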
🕑

We can compute the Longest Common Prefix (LCP) of every two adjacent suffixes (in Suffix Array order) in O(n) time using the three phases of Kasai's algorithm. The algorithm takes advantage of the fact that if two adjacent suffixes (in Suffix Array order) have a long LCP, then most of that LCP is retained by the next suffix in positional order, i.e., the suffix obtained by removing the first character.


The first phase: Compute the values of Phi[], where Phi[SA[i]] = SA[i-1], in O(n). This lets the algorithm know in O(1) time which suffix comes immediately before Suffix-SA[i] in Suffix Array order.


The second phase: Compute the PLCP[] values between each Suffix-i in positional order and Suffix-Phi[i] (the one immediately before Suffix-i in Suffix Array order). When we advance to the next index i+1 in positional order, we remove the frontmost character of the suffix, but possibly retain most of the LCP value between Suffix-(i+1) and Suffix-Phi[i+1]. The PLCP Theorem (not proven here) shows that the running LCP value can be incremented at most n times in total, and thus decremented at most n times too, so the second phase also runs in O(n).


The third phase: Compute the values of LCP[], where LCP[i] = PLCP[SA[i]], in O(n). These LCP values are the ones used by the other Suffix Array applications later.


Time complexity: Kasai's algorithm relies on the PLCP theorem, by which the total number of increase (and decrease) operations on the LCP value is O(n), so the algorithm runs in O(n) overall. The combination of the O(n log n) Suffix Array construction (via the Prefix Doubling algorithm) and the O(n) computation of the LCP Array using Kasai's algorithm is good enough to handle up to n ≤ 200K characters in typical programming competition problems involving long strings.
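
The three phases described above can be sketched as follows. This is a minimal illustration under some assumptions: the function name `compute_lcp` is chosen for this example, LCP[0] is taken to be 0, and the Suffix Array in the usage lines is built naively to keep the block self-contained.

```python
def compute_lcp(t, sa):
    """Three-phase LCP computation via Phi[] and PLCP[] (sketch)."""
    n = len(sa)

    # Phase 1: phi[SA[i]] = SA[i-1]; the first suffix in SA order has no predecessor.
    phi = [-1] * n
    for i in range(1, n):
        phi[sa[i]] = sa[i - 1]

    # Phase 2: PLCP in positional order, reusing the previous match length.
    plcp = [0] * n
    length = 0
    for i in range(n):
        if phi[i] == -1:
            plcp[i] = 0
            length = 0  # conservative reset; happens once
            continue
        j = phi[i]
        while i + length < n and j + length < n and t[i + length] == t[j + length]:
            length += 1
        plcp[i] = length
        length = max(length - 1, 0)  # dropping the first character loses at most 1

    # Phase 3: permute PLCP back into Suffix Array order.
    return [plcp[sa[i]] for i in range(n)]


T = "GATAGACA$"
SA = sorted(range(len(T)), key=lambda i: T[i:])  # naive construction, tiny inputs only
print(compute_lcp(T, SA))  # [0, 0, 1, 1, 1, 0, 0, 2, 0]
```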

🕑

After we construct the Suffix Array of T in O(n log n) and compute its LCP Array in O(n), we can find the Longest Repeated Substring (LRS) in T by simply iterating through all LCP values and reporting the largest one.


This is because each value LCP[i] in the LCP Array is the length of the longest common prefix between two lexicographically adjacent suffixes: the suffixes in rows i and i-1, i.e., Suffix-SA[i] and Suffix-SA[i-1]. This corresponds to an internal vertex of the equivalent Suffix Tree of T that branches out to at least two suffixes, so this common prefix of the adjacent suffixes is repeated.


The longest common (repeated) prefix is the required answer, which can be found in O(n) by going through the LCP array once.


Without further ado, try LRS("GATAGACA$"). We have LRS = "GA".


It is possible that T contains more than one LRS, e.g., try LRS("BANANABAN$").
We have LRS = "ANA" (its two occurrences overlap) or "BAN" (its two occurrences do not overlap).
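
A small sketch of the LRS scan is given below. To stay self-contained it builds the Suffix Array naively and compares adjacent suffixes directly instead of reusing a precomputed LCP Array, so it does not achieve the O(n) bound; the function name `longest_repeated_substrings` is an assumption of this example.

```python
def longest_repeated_substrings(t):
    """Return all distinct longest repeated substrings of t (naive sketch)."""
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive construction
    best, answers = 0, []
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        # LCP of the two adjacent suffixes, computed directly.
        length = 0
        while a + length < len(t) and b + length < len(t) and t[a + length] == t[b + length]:
            length += 1
        if length > best:
            best, answers = length, [t[b:b + length]]
        elif length == best and length > 0 and t[b:b + length] not in answers:
            answers.append(t[b:b + length])
    return answers


print(longest_repeated_substrings("GATAGACA$"))   # ['GA']
print(longest_repeated_substrings("BANANABAN$"))  # ['ANA', 'BAN']
```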

🕑

After we construct the generalized Suffix Array of the concatenation of both strings, T1$T2#, of length n = n1+n2 in O(n log n) and compute its LCP Array in O(n), we can find the Longest Common Substring (LCS) of T1 and T2 by simply iterating through all LCP values and reporting the largest one whose two adjacent suffixes come from two different strings.


Without further ado, try LCS("GATAGACA$", "CATA#") on the generalized Suffix Array of string T1 = "GATAGACA$" and T2 = "CATA#". We have LCS = "ATA".
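
The LCS scan can be sketched in the same style. As before, this is a naive, self-contained illustration rather than the O(n) version: the function name `longest_common_substring` is an assumption of this example, and it expects T1 and T2 to end with distinct terminators ('$' and '#') that appear nowhere else.

```python
def longest_common_substring(t1, t2):
    """Longest common substring of t1 and t2 via a generalized Suffix Array (naive sketch).

    Assumes t1 ends with '$' and t2 ends with '#', and that neither terminator
    appears anywhere else, so a common prefix never runs past the end of t1.
    """
    t = t1 + t2
    n1 = len(t1)
    sa = sorted(range(len(t)), key=lambda i: t[i:])  # naive construction
    best = ""
    for i in range(1, len(sa)):
        a, b = sa[i - 1], sa[i]
        if (a < n1) == (b < n1):
            continue  # both suffixes start in the same string: skip
        length = 0
        while a + length < len(t) and b + length < len(t) and t[a + length] == t[b + length]:
            length += 1
        if length > len(best):
            best = t[b:b + length]
    return best


print(longest_common_substring("GATAGACA$", "CATA#"))  # ATA
```

Checking only adjacent rows is enough because the best pair of suffixes from different strings always has an adjacent pair between them, also from different strings, whose common prefix is at least as long.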

🕑

You are allowed to use/modify our implementation code for fast Suffix Array+LCP: sa_lcp.cpp | py | java | ml to solve programming contest problems that need it.



Note that if you notice any bug in this visualization or if you want to request for a new visualization feature, do not hesitate to drop an email to the project leader: Dr Steven Halim via his email address: stevenhalim at gmail dot com.

🕑


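About

VisuAlgo was created in 2011 by Dr Steven Halim as a tool that lets students learn the basics on their own and at their own pace, and so better understand data structures and algorithms.
VisuAlgo contains many advanced algorithms that are discussed in Dr Steven Halim's book ("Competitive Programming", co-authored with his brother Dr Felix Halim) and in other books. Today, visualizations/animations of some of these advanced algorithms can only be found in VisuAlgo.
Although this website is designed specifically for National University of Singapore (NUS) students taking various data structure and algorithm classes (e.g., CS1010, CS2040, CS3230, CS3233, CS4234), as advocates of online learning we hope that curious minds around the world will find these algorithm visualizations useful too.
VisuAlgo was not designed from the start to work well on small touch screens (e.g., smartphones), because the many complex algorithm visualizations require lots of pixels and click-and-drag gestures for interaction. For a good user experience, the minimum screen resolution should be 1024x768, and only the landing page is relatively mobile-friendly. However, we are testing a mobile version planned for release in April 2022.
VisuAlgo is an ongoing project, and more complex visualizations are still being developed.
The most exciting development is the automated question generator and verifier (the online quiz system) that allows students to test their knowledge of basic data structures and algorithms. The questions are randomly generated via some rules, and students' answers are graded instantly and automatically upon submission to our grading server. This online quiz system, once adopted by more CS instructors worldwide, should technically eliminate manual basic data structure and algorithm questions from typical Computer Science examinations in many universities. By setting a small (but non-zero) weightage on passing the online quiz, a CS instructor can (significantly) increase his/her students' mastery of these basic questions, as the students have a virtually unlimited number of training questions that can be verified instantly before they take the online quiz. The training mode currently contains questions for 12 visualization modules. We will soon add the remaining 12 visualization modules so that every visualization module in VisuAlgo has an online quiz component.
VisuAlgo supports three languages: English, Chinese, and Indonesian. Currently, we have also written public notes about VisuAlgo in various languages:
id, kr, vn, th.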

Team

Project Leader & Advisor (Jul 2011-present)
Dr Steven Halim, Senior Lecturer, School of Computing (SoC), National University of Singapore (NUS)
Dr Felix Halim, Senior Software Engineer, Google (Mountain View)

Undergraduate Student Researchers 1 (Jul 2011-Apr 2012)
Koh Zi Chun, Victor Loh Bo Huai

Final Year Project/UROP students 1 (Jul 2012-Dec 2013)
Phan Thi Quynh Trang, Peter Phandi, Albert Millardo Tjindradinata, Nguyen Hoang Duy

Final Year Project/UROP students 2 (Jun 2013-Apr 2014)
Rose Marie Tan Zhao Yun, Ivan Reinaldo

Undergraduate Student Researchers 2 (May 2014-Jul 2014)
Jonathan Irvin Gunawan, Nathan Azaria, Ian Leow Tze Wei, Nguyen Viet Dung, Nguyen Khac Tung, Steven Kester Yuwono, Cao Shengze, Mohan Jishnu

Final Year Project/UROP students 3 (Jun 2014-Apr 2015)
Erin Teo Yi Ling, Wang Zi

Final Year Project/UROP students 4 (Jun 2016-Dec 2017)
Truong Ngoc Khanh, John Kevin Tjahjadi, Gabriella Michelle, Muhammad Rais Fathin Mudzakir

Final Year Project/UROP students 5 (Aug 2021-Dec 2022)
Liu Guangyuan, Manas Vegi, Sha Long, Vuong Hoang Long

List of translators who have contributed ≥100 translations can be found at statistics page.

Acknowledgements
The operating funds for this project are generously provided by the Teaching Enhancement Grant from the NUS Centre for Development of Teaching and Learning (CDTL).

Terms of use

VisuAlgo is free of charge for Computer Science community on earth. If you like VisuAlgo, the only "payment" that we ask of you is for you to tell the existence of VisuAlgo to other Computer Science students/instructors that you know =) via Facebook/Twitter/Instagram/TikTok posts, course webpages, blog reviews, emails, etc.

If you are a data structure and algorithm student/instructor, you are allowed to use this website directly for your classes. If you take screenshots (videos) from this website, you can use them elsewhere as long as you cite the URL of this website (https://visualgo.net) and/or the list of publications below as reference. However, you are NOT allowed to download VisuAlgo (client-side) files and host them on your own website, as that is plagiarism. As of now, we do NOT allow other people to fork this project and create variants of VisuAlgo. Using an offline copy of (client-side) VisuAlgo for your personal usage is fine.

Note that VisuAlgo's online quiz component by nature has a heavy server-side component, and there is no easy way to save the server-side scripts and databases locally. Currently, the general public can only use the 'training mode' to access this online quiz system. The 'test mode' is a more controlled environment for using these randomly generated questions and automatic verification for real examinations in NUS.

List of Publications

This work has been presented briefly at the CLI Workshop at the ICPC World Finals 2012 (Poland, Warsaw) and at the IOI Conference at IOI 2012 (Sirmione-Montichiari, Italy). You can click this link to read our 2012 paper about this system (it was not yet called VisuAlgo back in 2012) and this link for the short update in 2015 (to link VisuAlgo name with the previous project).

This work is done mostly by my past students. 

Bug Reports or Request for New Features

VisuAlgo is not a finished project. Dr Steven Halim is still actively improving VisuAlgo. If you are using VisuAlgo and spot a bug in any of our visualization pages/online quiz tool, or if you want to request new features, please contact Dr Steven Halim. His email address is the concatenation of his name followed by "at gmail dot com".

Privacy policy

Version 1.1 (Updated Fri, 14 Jan 2022).

Disclosure to all visitors: We currently use Google Analytics to get an overview of our site visitors. We now give users the option to Accept or Reject this tracker.

Since Wed, 22 Dec 2021, only National University of Singapore (NUS) staff/students and approved CS lecturers outside of NUS who have written a request to Steven can log in to VisuAlgo. Anyone else in the world has to use VisuAlgo as an anonymous user who is not really trackable beyond what is tracked by Google Analytics.

For NUS students enrolled in modules that use VisuAlgo: By using a VisuAlgo account (a tuple of NUS official email address, NUS official student name as in the class roster, and a password that is encrypted on the server side; no other personal data is stored), you are giving consent for your module lecturer to keep track of your e-lecture slide reading and online quiz training progress, which is needed to run the module smoothly. Your VisuAlgo account will also be needed for taking NUS official VisuAlgo Online Quizzes, and thus passing your account credentials to another person to do the Online Quiz on your behalf constitutes an academic offense. Your user account will be purged after the conclusion of the module unless you choose to keep your account (OPT-IN). Access to the full VisuAlgo database (with encrypted passwords) is limited to Steven himself.

For other NUS students, you can self-register a VisuAlgo account by yourself (OPT-IN).

For other CS lecturers worldwide who have written to Steven, a VisuAlgo account (your (non-NUS) email address, you can use any display name, and encrypted password) is needed to distinguish your online credential versus the rest of the world. Your account will be tracked similarly as a normal NUS student account above but it will have CS lecturer specific features, namely the ability to see the hidden slides that contain (interesting) answers to the questions presented in the preceding slides before the hidden slides. You can also access Hard setting of the VisuAlgo Online Quizzes. You can freely use the material to enhance your data structures and algorithm classes. Note that there can be other CS lecturer specific features in the future.

For anyone with VisuAlgo account, you can remove your own account by yourself should you wish to no longer be associated with VisuAlgo tool.