Talks: CSI5165 Combinatorial Algorithms, Projects Fall 2009
===========================================================
Friday December 11, room: BRS 314 (Brooks building)
1:05-1:20 David Gains, "The Cerny Conjecture"
1:25-1:40 Oksana Korol, "Identifying motifs of non-coding RNA"
1:45-2:00 Sharmeen Shahabuddin, "Finding Minimum Sudoku using Heuristic Search Strategies"
2:05-2:20 Jacob Chodoriwsky, "A Two-Stage Heuristic Algorithm for the Maximum Clique Problem"
2:25-2:40 Cate Huston, "Applied Clique Finding: Discovering More About Your Twitter Network"
donuts will be provided
bring your own coffee/drink
************************** ABSTRACTS ***************************
speaker: David Gains
title: The Cerny Conjecture
abstract:
A synchronising or reset word for a complete, deterministic finite state
automaton takes each state of the automaton to a single, fixed state. Cerny
gave a sequence C_n, n=1,2,... of automata such that the automaton C_n has
n states and a shortest synchronising word of length (n-1)(n-1). His famous
conjecture states that every synchronising automaton with n states has a
synchronising word of length no greater than (n-1)(n-1) (this bound would be
tight since the automaton C_n has no reset word with less than (n-1)(n-1)
letters). The Cerny conjecture is over 40 years old, but it has only been
verified for some special classes of automata.
This project documents a number of combinatorial algorithms associated with the
Cerny conjecture. It also describes attempts to apply some novel combinatorial
algorithms to the solution of problems arising from the conjecture.
speaker: Oksana Korol
title: Identifying motifs of non-coding RNA
abstract:
Searching for a unique or repeated substring in a very long string is an
important computational problem in bioinformatics. It is applied to find
patterns in different biological sequences, like genomic or protein
sequences. Solving this problem efficiently is essential, since the length
of the sequences can be as big as billions of characters. In this
presentation I will briefly discuss a problem of identifying non-coding RNA
motifs in a genomic sequence and introduce a method of solving this problem
based on a suffix tree data structure. I will compare annotated versus
non-annotated tree traversal, as well as discuss an efficient boundary
function that will minimize the backtracking tree while adding as few
annotations to the suffix tree as possible.
speaker: Sharmeen Shahabuddin
title: Finding Minimum Sudoku using Heuristic Search Strategies
abstract:
Sudoku is a number-based logic puzzle, where the most common form
is a 9x9 grid with 3x3 regions that are to be filled so that each column,
each row, and each of the nine regions contains the digits
from 1 to 9 exactly once. There are algorithms to generate sudoku puzzles
with a unique solution. However, as the number of hints in the puzzle
is reduced, it gets more difficult to ensure that there is exactly one solution.
For this project, we use two heuristic search techniques, namely
simulated annealing, and tabu search, in order to generate sudoku puzzles
with as few number of hints as possible. During the talk, the sudoku puzzle
problem, difficulties in finding puzzles with unique solutions, and the
heuristic search algorithms to find smallest puzzles with unique solutions
will be discussed, with some preliminary results.
speaker: Jacob Chodoriwsky
title: A Two-Stage Heuristic Algorithm for the Maximum Clique Problem
We explore a two-stage heuristic algorithm for the maximum clique
problem. The first stage is a greedy, vertex-deletion algorithm. The
second stage is a "swap-and-grow" method where vertices in a current
clique are swapped out for vertices which would keep the vertex set as a
clique at all times. It is a recursive search method with parameters to
prune based on maximum depth and/or maximum width. It will also limit
cycling by maintaining a local tabu list of vertices in each subtree of
the search tree.
speaker: Cate Huston
title: Applied Clique Finding: Discovering More About Your Twitter Network
abstract:
Follower / Following networks are essentially meaningless on Twitter due to
the prevalence of spam. However by creating the graphs of conversation
networks it is possible to create a better picture of more meaningful
connections the other users that interact / are interacted with by a given
user. For power users, however, these graphs can be extremely busy, making
it difficult to pick out the most important conversations and connections.
One potential way to summarize the most important connections in a network
is to pull out cliques completely connected sub-graphs. These cliques may
represent part of a users core network, or a suggestion of new users to
interact with by generating those cliques that a user is connected to. For
example, user A might be in a clique with users B, C, D and users B, C, D
may be in a clique with a 5th user, E. This suggests that user A might well
be interested to interact with user E, as well. This may also help us
determine tie strength as well, as a clique is likely an indication of a
stronger tie strength than just a singly connected node.