The data for our clustering analysis are the points in
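Moreover, let $D$ be given by $c_{10} = \langle (3,2,7)\ 1\ (4,5) \rangle$, $c_{20} = \langle 3\ (7,2)\ (4,1) \rangle$, $c_{30} = \langle (4,7)\ 1 \rangle$ and $c_{40} = \langle (2,1)\ (3,5) \rangle$, that is, the career sequences of students 10, 20, 30 and 40. The new sequence $c = \langle 1\ 5 \rangle$ has support 0.5 in $D$, since $c$ is a subsequence of $c_{10}$ and $c_{40}$, that is, of two of the four sequences.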
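The support computation can be illustrated with a minimal sketch. It assumes that each career is a list of itemsets (the exams taken in one semester) and that the grouping of exams into semesters is the one that matches the statements in the text; the names `is_subsequence`, `support` and `D` are hypothetical and not part of SPAM.

```python
from typing import Dict, List, Set

Sequence = List[Set[int]]  # one set of exam codes per semester


def is_subsequence(pattern: Sequence, sequence: Sequence) -> bool:
    """Return True if every itemset of `pattern` is contained, in order,
    in distinct itemsets of `sequence` (greedy left-to-right matching)."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(pattern: Sequence, database: Dict[int, Sequence]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    matches = sum(is_subsequence(pattern, s) for s in database.values())
    return matches / len(database)


# Assumed reading of the example database D (semester grouping inferred
# from the text, not taken from the original table).
D = {
    10: [{3, 2, 7}, {1}, {4, 5}],
    20: [{3}, {7, 2}, {4, 1}],
    30: [{4, 7}, {1}],
    40: [{2, 1}, {3, 5}],
}

print(support([{1}, {5}], D))  # 0.5: contained in c10 and c40 only
```

Greedy matching is sufficient here because matching each pattern itemset at its earliest possible position never rules out a later match.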
Table 7 shows the output obtained by applying SPAM to the data in Table 6, with a minimum support of 0.5. Each line of the output file is a frequent sequence and can be interpreted as follows. The last number is the frequency of the sequence; the number -1 between two numbers indicates a change of the temporal information, and the symbol -- indicates the end of the sequence. For example, the first line of the output indicates that 4 students have taken exam 1; the eighth line indicates that 2 students have taken exams 2 and 7 in the same semester and then have taken exam 1 in a later semester. In this example, only the exam with code 1 has been taken by all students. Moreover, we can observe that the longest patterns are 3-sequences of length 2, and they correspond to the eighth and ninth lines.
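The reading rules above can be sketched as a small decoder. This is only an illustration: the exact token layout of the output file is an assumption (whitespace-separated tokens, with the separator and the end-of-sequence symbol passed as parameters), the sample lines are hypothetical and written from the description in the text rather than copied from Table 7, and `parse_pattern_line` is not a function of any SPAM implementation.

```python
from typing import List, Tuple


def parse_pattern_line(
    line: str, separator: str = "-1", end_marker: str = "--"
) -> Tuple[List[List[int]], int]:
    """Decode one output line into (itemsets, frequency).

    Assumed layout: items of one semester are adjacent, `separator`
    marks a change of semester, `end_marker` closes the sequence, and
    the last token is the frequency.
    """
    tokens = line.split()
    frequency = int(tokens[-1])
    itemsets: List[List[int]] = [[]]
    for tok in tokens[:-1]:
        if tok == end_marker:
            break  # everything after the end marker is not part of the pattern
        if tok == separator:
            itemsets.append([])  # start the next semester
        else:
            itemsets[-1].append(int(tok))
    return [s for s in itemsets if s], frequency


# Hypothetical line for "exams 2 and 7 together, then exam 1, 2 students".
print(parse_pattern_line("2 7 -1 1 -- 2"))  # ([[2, 7], [1]], 2)
```

Under these assumptions, the number of inner lists is the length of the pattern and the total number of items is its size, so the example above is a 3-sequence of length 2.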