Talks: CSI5165 Combinatorial Algorithms, Projects Fall 2009 =========================================================== Friday December 11, room: BRS 314 (Brooks building) 1:05-1:20 David Gains, "The Cerny Conjecture" 1:25-1:40 Oksana Korol, "Identifying motifs of non-coding RNA" 1:45-2:00 Sharmeen Shahabuddin, "Finding Minimum Sudoku using Heuristic Search Strategies" 2:05-2:20 Jacob Chodoriwsky, "A Two-Stage Heuristic Algorithm for the Maximum Clique Problem" 2:25-2:40 Cate Huston, "Applied Clique Finding: Discovering More About Your Twitter Network" donuts will be provided bring your own coffee/drink ************************** ABSTRACTS *************************** speaker: David Gains title: The Cerny Conjecture abstract: A synchronising or reset word for a complete, deterministic finite state automaton takes each state of the automaton to a single, fixed state. Cerny gave a sequence C_n, n=1,2,... of automata such that the automaton C_n has n states and a shortest synchronising word of length (n-1)(n-1). His famous conjecture states that every synchronising automaton with n states has a synchronising word of length no greater than (n-1)(n-1) (this bound would be tight since the automaton C_n has no reset word with less than (n-1)(n-1) letters). The Cerny conjecture is over 40 years old, but it has only been verified for some special classes of automata. This project documents a number of combinatorial algorithms associated with the Cerny conjecture. It also describes attempts to apply some novel combinatorial algorithms to the solution of problems arising from the conjecture. speaker: Oksana Korol title: Identifying motifs of non-coding RNA abstract: Searching for a unique or repeated substring in a very long string is an important computational problem in bioinformatics. It is applied to find patterns in different biological sequences, like genomic or protein sequences. Solving this problem efficiently is essential, since the length of the sequences can be as big as billions of characters. In this presentation I will briefly discuss a problem of identifying non-coding RNA motifs in a genomic sequence and introduce a method of solving this problem based on a suffix tree data structure. I will compare annotated versus non-annotated tree traversal, as well as discuss an efficient boundary function that will minimize the backtracking tree while adding as few annotations to the suffix tree as possible. speaker: Sharmeen Shahabuddin title: Finding Minimum Sudoku using Heuristic Search Strategies abstract: Sudoku is a number-based logic puzzle, where the most common form is a 9x9 grid with 3x3 regions that are to be filled so that each column, each row, and each of the nine regions contains the digits from 1 to 9 exactly once. There are algorithms to generate sudoku puzzles with a unique solution. However, as the number of hints in the puzzle is reduced, it gets more difficult to ensure that there is exactly one solution. For this project, we use two heuristic search techniques, namely simulated annealing, and tabu search, in order to generate sudoku puzzles with as few number of hints as possible. During the talk, the sudoku puzzle problem, difficulties in finding puzzles with unique solutions, and the heuristic search algorithms to find smallest puzzles with unique solutions will be discussed, with some preliminary results. speaker: Jacob Chodoriwsky title: A Two-Stage Heuristic Algorithm for the Maximum Clique Problem We explore a two-stage heuristic algorithm for the maximum clique problem. The first stage is a greedy, vertex-deletion algorithm. The second stage is a "swap-and-grow" method where vertices in a current clique are swapped out for vertices which would keep the vertex set as a clique at all times. It is a recursive search method with parameters to prune based on maximum depth and/or maximum width. It will also limit cycling by maintaining a local tabu list of vertices in each subtree of the search tree. speaker: Cate Huston title: Applied Clique Finding: Discovering More About Your Twitter Network abstract: Follower / Following networks are essentially meaningless on Twitter due to the prevalence of spam. However by creating the graphs of conversation networks it is possible to create a better picture of more meaningful connections the other users that interact / are interacted with by a given user. For power users, however, these graphs can be extremely busy, making it difficult to pick out the most important conversations and connections. One potential way to summarize the most important connections in a network is to pull out cliques completely connected sub-graphs. These cliques may represent part of a users core network, or a suggestion of new users to interact with by generating those cliques that a user is connected to. For example, user A might be in a clique with users B, C, D and users B, C, D may be in a clique with a 5th user, E. This suggests that user A might well be interested to interact with user E, as well. This may also help us determine tie strength as well, as a clique is likely an indication of a stronger tie strength than just a singly connected node.