
Curriculum Vitae
Name: Yu Uny Cao
Address: P O Box 251543
Los Angeles, CA 90025
Email: yu@cs.ucla.edu
Web: www.yucao.com
Phone: 310.825.1563
Objective
R&D for new search engines and Information Retrieval Systems.
Education
04.02 Ph.D. in Computer Networks, Computer Science, UCLA
Dissertation Title
Doubly Ranked Information Retrieval
and Linked Page Information Retrieval
05.93 M.S. in Computer Science, Univ. of Louisville, Kentucky,
GPA 3.975/4.00
Thesis Title
Modeling and Analysis of
Computer-Based Interpersonal Communications
05.91 B.Engr. in Electrical Engr., Zhejiang Univ., Hangzhou, China,
GPA 3.80/4.00
Awards
03.92 The AT&T Network Systems ISDN Award for 1992-1993,
AT&T Network Systems, Inc.
04.93 Master of Science Scholastic Achievement Award,
Computer Science, Univ. of Louisville.
87-91 Scholastic Achievement Awards, Zhejiang Univ.
Research Experience
10.95-06.99 Research Assistant, Computer Science, UCLA
The Transparent Virtual Mobile Information Systems
Project, funded by the Defense Advanced Research
Projects Agency ( DARPA );
Co-developed a proposal to the Digital Library II
project of the National Science Foundation in 1998.
10.94-09.95 Research Assistant, Computer Science, UCLA
The Commotion Lab for studies in Mobile Robotics
funded by the National Science Foundation;
Programmed the web interface for remote control
of mobile robots via the Internet; successfully
demonstrated the system at the Internet '95 Conference.
02.92-08.93 Research Assistant, Computer Science, U. of Louisville
and Bell South Telecommunications Research Center.
Programmed and wrote papers in simulation, AI.
Participated in the Transcontinental ISDN Project
(TRIP) 1992.
Current Research Project
My research has focused on a new generation of Web search engines as
well as Information Retrieval systems in general.
I have come up with two extensions to the current generation of ranked
information retrieval systems ( of which Web search engines are typical ).
Idea one: current systems return ranked documents. However,
there is no reason why terms should not be ranked too. We have built
just such a system, where both documents and terms are ranked,
thus `doubly ranked information retrieval'. It turns out the
ranking of the documents and the ranking of the terms are
interrelated. The matrix analytical result is simple and nice.
A prototype is implemented and a user study is performed.
Idea two: current systems return only single documents. However,
linked documents sometimes better serve users' information needs.
Current systems do not take full advantage of the ubiquitous use
of hyperlinks. We formulate the problem of Linked Page Information
Retrieval as a recognition problem, and give a matching/ranking
scheme as well as a search algorithm. The algorithm has a rich
structure that warrants deep investigation. A prototype is implemented.
Planned Research Project
I believe that an Information Retrieval system must have
the following key ingredients, each of them `good', to be successful:
1) matching/ranking schemes; 2) scientific computational methods;
3) engineering.
I am most interested in new matching/ranking schemes, want to do
more in scientific computational methods, and hope to work with
other engineers on real systems.
First, new IR systems have emerged, such as large on-line
product catalogues, a typical one of which contains information
on millions of products manufactured in scores of countries.
To make such systems useful for non-expert users, a wholesale
re-thinking is called for in both matching/ranking schemes and
scientific computational methods.
Second, if I can find an engineering team to work with, I want to
implement the two matching/ranking schemes I originated in
my PhD dissertation.
A Note on My PhD Research Journey
I chose Computer Networks as my major when I entered the PhD program,
and became a student of Prof. Leonard Kleinrock, who is often referred
to as `the Inventor of the Internet Technology' and `a Father of
the Internet' ( www.lk.cs.ucla.edu ). After a couple of years,
I decided that there was not too much left to do in moving bits and
bytes over the Internet.
I then worked out of passion on Mobile Robotics at the Commotion
Lab till '97, and gained exposure to Artificial Life, emergent
behaviors, and the Web. The research was carried out by a
passionate team of young professors, junior graduate students
and upper-division undergraduates who worked in a non-partitioned,
large room. It was the best. However, because of the funding
structure, no PhD work was possible out of the lab.
I left the lab and took a look at Active Agents, but nothing came out
of it.
I then decided to work on what I first called `information
theoretic related work', because I always believed that we should
be able to construct a `jar' into which you put information and
thoughts from time to time, and out come `knowledge pickles'.
I thought much more was to be done about moving information to
and from people's minds, and the work was closer to my heart.
I first worked on theory and algorithms for minimal, approximate
storage of a table of many rows. The graph theoretic result was
interesting but we decided that we didn't see a killer application.
In July '98, inspired by Latent Semantic Indexing and Jon Kleinberg's
work, I co-wrote with three professors a proposal to the Digital
Library Initiative II of NSF, on developing theory, algorithms
and techniques for browsing large information spaces, where I
proposed the idea of looking for `intrinsically important'
documents and terms in a collection ( this eventually became
part of my PhD work ).
In late '98, I was bitten by the dotcom bug and worked on a CD-ROM
project with a friend. The project, small but successful, led to
the creation of a company, which the two of us ran through the end
of '99. After that I made a couple more attempts, including co-drafting
a business plan for a B2B marketplace. In early 2000, I headed back
to school and restarted my research while working as a TA.
The rest is documented by my PhD dissertation.
Publications
Y. U. Cao, L. Kleinrock,
"Doubly Ranked Information Retrieval",
Manuscript.
Current ranked Information Retrieval systems return a list of
ranked documents given a user query. Information needs can be
better served if terms can also be ranked. We design just
such a system where both documents and terms are ranked in the
returned results, thus "doubly ranked information retrieval".
It turns out that the ranking of the documents and the ranking
of the terms are interrelated in a simple and intuitive way
from a matrix analytical point of view, thus we argue that
both rankings might be "intrinsic". In the returned results,
each document is presented as a list of segments that are ranked
by terms contained within, thus helping users to better satisfy
information needs, form relevance feedback and refine queries.
A prototype is implemented and a user study is performed.
Y. U. Cao, L. Kleinrock,
"Linked Page Information Retrieval",
Manuscript.
Current Information Retrieval systems do not address the case
where information needs can be better satisfied when a number
of linked documents are returned to the users. We formulate
the problem of "Linked Page Information Retrieval" as a
recognition problem, and give a matching/ranking scheme as
well as a search algorithm. Our system thus takes full
advantage of the ubiquitous use of hyperlinks. A prototype
is implemented.
Y. U. Cao, A. S. Fukunaga and A. B. Kahng,
"Cooperative Mobile Robotics: Antecedents and Directions",
Autonomous Robots, Jan.-Feb. 1997, vol.4, no.1, pp. 7-27.
There has been increased research interest in systems composed
of multiple autonomous mobile robots exhibiting cooperative
behavior. Groups of mobile robots are constructed, with an
aim to studying such issues as group architecture, resource
conflict, origin of cooperation, learning, and geometric
problems. As yet, few applications of cooperative robotics
have been reported, and supporting theory is still in its
formative stages. In this paper, we give a critical survey
of existing works and discuss open problems in this field,
emphasizing the various theoretical issues that arise in the
study of cooperative robotics. We describe the intellectual
heritages that have guided early research, as well as possible
additions to the set of existing motivations.
Y. U. Cao, T.-W. Chen, M. D. Harris, A. B. Kahng, M. A. Lewis
and A. D. Stechert,
"A Remote Robotics Laboratory on the Internet", INTERNET-95, 1995.
This paper describes ongoing work toward a remote robotics
research site which allows repeatable remote experimentation
on multiple mobile robots. Our system consists of ten small
mobile robots hosting on-board Unix workstations. The robots
provide facilities for sensing and moving obstacles, inter-
robot positioning and communications and user input. The Unix
workstations allow the user to control the robots using common
languages in a familiar environment, while also providing an
interface to mass-market peripherals ( secondary storage and
vision ), network access ( telnet, FTP, mail and HTTP ), and
robust multi-tasking. We believe that this work provides a
foundation for future efforts toward new paradigms for remote
research and user interaction with taskable hardware ( e.g.,
colonies of application specific robots). We envision
applications in such domains as agriculture, environmental
monitoring, and deep-space exploration.
Y. Cao, J. H. Graham and A. S. Elmaghraby,
"Communications Approaches for Simulation-AI Interactions",
Simulation Digest, Winter 1993, vol. 36, pp.3-16.
Although the paradigm of AI-simulation interaction is relatively
new, it is believed that this hybrid approach has a number of
advantages, in terms of making fuller use of the power and
features of both AI and simulation. This paper considers in
detail the various classes of interactions between AI and
simulation systems, and how they may be mutually beneficial.
It then considers various strategies for communication and
presents a UNIX-based implementation of interprocess
communication. Finally a case study is presented using MODSIM
as the language for a manufacturing system simulation, and CLIPS
as the language for implementing an intelligent agent which
interacts with the simulation.
A. S. Elmaghraby and Y. Cao,
"Modeling and Analysis of Computer-Based Interpersonal
Communications", 1993-1994 Annual Review of Communications,
National Engineering Consortium, vol. 47, pp.699-706.
The authors observe the emergence of a new field, computer-based
interpersonal communications. The basis of the field is provided
by the rapidly progressing computer and networking technologies
which are leading to a single network unifying telephony,
broadcasting, and data communications. The ultimate goal of computer-
based interpersonal communications is to emulate face-to-face
interpersonal communications. In order to achieve maximal effect
with limited transmission and switching capacity, modeling and
analysis of the information exchange processes is needed. A
layered model is proposed in this paper. Because of its self-
completeness, the model fits the nature of information exchange
process and lends itself to object-oriented modeling techniques.
More complex models, namely multimode and multiple layered models,
can be constructed using the basic layered model as building blocks.
Methodology in applying the layered model is discussed.
A. S. Elmaghraby and Y. Cao,
"Human Performance Modeling in Manned Weapon Systems", IEEE Intl.
Conf. on Systems, Man and Cybernetics, 1992.
The authors present a methodology developed for integrating human
performance in performance modeling of manned weapon systems.
Literature in this area is briefly reviewed to provide
necessary background knowledge. A discussion of alternative
methodologies for the performance modeling of such systems is
provided. The proposed methodology is composed of four parts:
(1) formally define the system; (2) formally define the
performance functions, or the effectiveness function;
(3) formally define the measurement; and (4) perform the
sensitivity analysis. The purpose of the methodology is to
integrate the human factor into the system from the very
beginning of the design of the system. This is achieved by
systematically considering all the interacting parts of the
system, omitting minor errors only after quantifying the errors.
A. A. Farag, Y. Cao, D. M. Rose and E. J. Delp,
"On Empirical Estimation of the Parameters of Edge Enhancement
Filters", IEEE Intl. Conf. on Systems, Man and Cybernetics, 1992.
The authors develop an empirical measure for the selection of
the Gaussian filter that is commonly used for edge enhancement.
The measure is based totally on the image at hand. Edge
enhancement by a Gaussian filter has two distinct advantages:
(1) the filter is fully described by a single parameter, the
standard deviation sigma; (2) the two-dimensional filter is
separable and can be easily implemented. The filter's spatial
support is a function of sigma. This support is normally
in the range of +/-3.5 sigma. An empirical measure is
described for the selection of the Gaussian filter's spatial
support using the power spectrum density of the input image.
Classic Fourier analysis is used to obtain a measure for the
spatial support of the Gaussian filter given a particular
image. Experimental results suggest that this measure can
be used as an aid in deciding the Gaussian filter's spatial
support needed to enhance the edges.
A. A. Farag, Y. Cao and Y.-P. Yeap,
"Integrating a Priori Information in Edge Linking Algorithms",
SPIE Conf. on Automatic Object Recognition, 1992.
This research presents an approach to integrate a priori
information into the path metric of the LINK algorithm. The
zero-crossing contours of the Laplacian of Gaussian are taken as a
gross estimate of the boundaries in the image and used to
define the swath of important information, and to provide
a distance measure for edge localization. During the linking
process, a priori information plays important roles in
(1) reducing the search space because the actual paths lie
within +/-2 sigma_f from the prototype contours
( sigma_f is the standard deviation of the Gaussian
kernel used in the edge enhancement step); (2) breaking the
ties when the search metrics give uncertain information; and
(3) selecting the set of goal nodes for the search algorithm.
Related Experience: Work Experience
02.00- Consultant, ChinaWTO.com, Santa Monica, California
Co-developed business plan, sought investors.
Edited tech/product news.
08.98-11.99 Co-founder of M123, Inc., Pasadena, CA, a multimedia
and Internet entertainment content producer
Company Overview: 10 people; CD-ROM, Web and
short-film productions; graphic design, industrial
concept design, toy design, 3D animation, video
shooting/editing, music composition and sound
engineering.
Co-developed the "Internet-Disc" business model,
assisted in developing the "Mutual Credits" model,
handled strategy, technical direction, project management,
day-to-day management and sales.
10.95- Consultant, Foreign Investor Services, Newport Beach, CA.
Consulted on China-related businesses. Translated legal
and business documents between Chinese and English.
10.95-11.96 Key programmer, Asian American Network ( AAN.NET )
Developed E-greetings software for CardMaster.com,
which is still in operation.
Related Experience: Teaching Experience
05.01-06.01 Guest Instructor, UCLA Extension, UCLA
Solaris Administration; Intro. to Unix
01.00-06.01 Teaching Associate, Computer Science, UCLA
Principles of Computer Systems. Computer Architecture.
Computer Networks. Algorithms.
01.92-08.93 Teaching Assistant, Computer Science, U. of Louisville
C Programming and UNIX. Pascal. MODSIM & SIMGRAPHICS.
Instructed independently.
Activities
Executive Vice President, Chinese Students and Scholars Association
(CSSA) at UCLA, 1995-97.
Founding President, The Overseas Alumni Association of The Mixed Class
Program of Zhejiang Univ., 1997-present.
Skills
Languages: Perl, C, C++, Java, ASP, VB, Web CGI programming, TCP/IP.
Operating Systems: UNIX, Windows.
References
Upon Request.