Model učenja robotskog zadatka zasnovan na interakciji s čovjekom

Vidaković, Josip

Dissertation

2020. urn:nbn:hr:235:132746

Sveučilište u Zagrebu
Fakultet strojarstva i brodogradnje

Cite this work

APA 6th Edition

Vidaković, J. (2020). Model učenja robotskog zadatka zasnovan na interakciji s čovjekom (Dissertation). Zagreb: Sveučilište u Zagrebu, Fakultet strojarstva i brodogradnje. Retrieved from https://urn.nsk.hr/urn:nbn:hr:235:132746

MLA 8th Edition

Vidaković, Josip. "Model učenja robotskog zadatka zasnovan na interakciji s čovjekom." Dissertation, Sveučilište u Zagrebu, Fakultet strojarstva i brodogradnje, 2020. https://urn.nsk.hr/urn:nbn:hr:235:132746

Harvard

Vidaković, J. (2020). 'Model učenja robotskog zadatka zasnovan na interakciji s čovjekom', Dissertation, Sveučilište u Zagrebu, Fakultet strojarstva i brodogradnje, cited: 11.04.2024., https://urn.nsk.hr/urn:nbn:hr:235:132746

Vancouver

Vidaković J. Model učenja robotskog zadatka zasnovan na interakciji s čovjekom [Dissertation]. Zagreb: Sveučilište u Zagrebu, Fakultet strojarstva i brodogradnje; 2020 [accessed 11.04.2024.] Available at: https://urn.nsk.hr/urn:nbn:hr:235:132746

IEEE

J. Vidaković, "Model učenja robotskog zadatka zasnovan na interakciji s čovjekom", Dissertation, Sveučilište u Zagrebu, Fakultet strojarstva i brodogradnje, Zagreb, 2020. Available at: https://urn.nsk.hr/urn:nbn:hr:235:132746

To cite this work, use this web address: https://urn.nsk.hr/urn:nbn:hr:235:132746

Details of the work

Title	Model učenja robotskog zadatka zasnovan na interakciji s čovjekom
Title (English)	Model of robot task learning based on human-robot interaction
Author	Josip Vidaković
Mentor	Bojan Jerbić (mentor)
Committee member	Mladen Crneković (committee chair)
Committee member	Petar Ćurković (committee member)
Committee member	Zdenko Kovačić (committee member)
Degree-granting institution	Sveučilište u Zagrebu Fakultet strojarstva i brodogradnje Zagreb
Date and country of defense	2020-10-06, Croatia
Scientific / artistic area, field and branch	TECHNICAL SCIENCES Mechanical Engineering
Universal Decimal Classification (UDC)	004 - Computer science and technology. Computing. Data processing 007 - Communication theory. Cybernetics. Automatic systems
Abstract	Robot learning is a broad term that can be viewed from two standpoints: the level at which learning takes place and the learning methodology. The level of learning refers to whether a skill is acquired at the level of motion (lower level) or at the level of selecting pre-segmented units of behavior based on the state of the environment (higher level). The methodology can essentially be divided into autonomous learning and learning from examples (demonstrations). This thesis develops a methodology for robot task learning at the motion level, based on acquiring knowledge from demonstrations and supported by subsequent autonomous learning. First, a method was developed for analyzing demonstrations obtained by directly guiding the robot through space while it performs the task. As part of this, a new method is proposed for generating trajectories in the robot's operational space, capable of approximating via-points and avoiding stationary obstacles. Experimental validation confirmed the validity of the developed approach for generating trajectories for new instances (configurations) of the task. Next, an autonomous learning system is proposed, based on an iterative optimization process that searches the trajectory parameter space and is directed toward accomplishing the task. The system was implemented in a simulation environment and validated on two different tasks. In the final part of the thesis, the developed methodologies for acquiring knowledge from demonstrations and for task-oriented iterative learning are combined to propose an efficient learning system. This system was validated and compared with the case where only the iterative learning method is used.
Abstract (English)	Chapter 1: Introduction. This section gives the research context of the thesis. It presents the motivation for developing learning capabilities for technical systems, especially robotic systems. The emphasis is on the need for higher flexibility and autonomy of such systems, in order to remove the barrier of highly specialized knowledge required when applying robots in industry or in service robotics. The problem of task-oriented behavior is viewed from the programming and learning perspectives. Three common approaches for accomplishing such high-level behavior are addressed: learning from demonstration, reinforcement learning and task-oriented motion planning. Current accomplishments in all three fields are referenced, and the specific advantages and disadvantages of each approach are covered. Based on this, the motivation for the research direction taken in this thesis is pointed out. The research hypothesis and goals are defined. At the end of this section, the structure of the thesis is given. Chapter 2: Learning from demonstration. At the beginning of this section, a broader overview of problems specific to learning from demonstration (LfD) is given. The correspondence problem is pointed out, together with the most frequent demonstration mechanisms. Methods for encoding demonstrated trajectories are covered, both those based on statistical modeling and those based on dynamical systems. The dynamic movement primitive (DMP) parametrization and its characteristics are covered in detail, as it is used for trajectory encoding in other parts of the thesis. Two main approaches for encoding generalizability into learning from demonstration methods are covered: task parametrization and inverse reinforcement learning. A novel methodology for the analysis of demonstrations, based on trajectories obtained by kinesthetic teaching, is proposed. 
The method uses a novel classification mechanism to determine attracting points, non-attracting points and obstacle points in the working environment of the robot. Experimental results of this methodology are presented and discussed at the end of this section. Chapter 3: Task-oriented trajectory planning. Demonstration sampling and analysis by the methodology from the previous section is performed in Cartesian space. In this section, task-oriented reproduction of trajectories in the same domain is performed. Common trajectory representations used in robotics, which can be used for planning in both configuration space and operational space, are covered together with their parametrizations. As the thesis focuses on the application of primitive motions in task-oriented programming, this section gives an overview of the application of primitive motions in task-oriented scenarios. A modified DMP representation is presented which is capable of explicitly using the information obtained by the demonstration analysis. It can encode variational information in the low-level DMP trajectory definition, and achieves this by introducing a modified time function in place of the standard exponential decay function. The methodology was originally presented in the conference paper: Task Dependent Trajectory Learning from Multiple Demonstrations Using Movement Primitives. After this, a Cartesian optimization-based path planning model is proposed, based on the following paper: Learning from Demonstration Based on a Classification of Task Parameters and Trajectory Optimization. The model encodes the information from the demonstration analysis by approximating identified via-points and avoiding identified obstacles. The path planning model is transferred into a DMP trajectory using the special DMP state representation presented earlier. The trajectory planning approach is verified on an experimental setup. 
Chapter 4: Reinforcement learning in continuous environments. As models learned from demonstrations often fail to produce a completely accurate task solution in the extrapolation phase, this section considers local trajectory improvement through self-exploration. Reinforcement learning (RL) provides the general framework for achieving this. This section therefore gives a theoretical overview of RL, which provides the basis for explaining the methodology used in the continuous-space scenario. Policy search methods are identified as the most suitable for improvements at the trajectory level with continuous parametrization. Two main approaches to policy search are covered: critic-based approaches and "black-box" optimization (BBO). Both perform learning directly in the parameter space by observing and evaluating the agent's interactions with the environment. However, BBO approaches simplify the required evaluation mechanism while having performance comparable to critic-based approaches. Possible policy representations in the trajectory domain are covered in a separate subsection. The application of a BBO algorithm together with a DMP policy parametrization is demonstrated at the end of this section. Chapter 5: Iterative learning for stochastic tasks. The BBO policy search methodology presented in the previous section implies direct interaction of the agent (robot) with the environment. As searching the parameter space of the trajectory policy in real environments is very dangerous and can lead to physical damage to both the robot and the environment, a simulation setup suitable for robot learning is introduced here. The setup is based on the ROS-based physics simulator Gazebo. Based on this, a task-oriented iterative learning setup is proposed. At its core, the setup consists of black-box optimization in the form of the evolutionary CMA-ES algorithm. 
The policy parametrization responsible for the execution of trajectories in the simulation environment is in the DMP form. The CMA-ES algorithm updates the policy weight parameters with respect to a task-oriented cost function. This closes the policy search loop, which is performed iteratively in order to achieve task learning convergence. The methodology was tested on two tasks: a peg-in-hole task and a sweeping task. Since the tasks showed high stochasticity with respect to the goal-oriented cost functions, two criteria for evaluating such learning processes were proposed: a best-current-solution metric and a current-average metric. The first keeps track of the best solution achieved in the policy search process, while the latter gives information about the overall quality of the learning process. Chapter 6: LfD as a basis for iterative learning. The iterative learning algorithm presented in the previous section was initialized by an empirical strategy which used a linear trajectory. Previous research suggested that search in large parameter spaces depends strongly on the initial conditions and that exploration is mostly local. In this section, results of the iterative learning algorithm are given for the case where it is initialized from demonstrations. The demonstration methodology followed the one presented in section two, which involved kinesthetic guidance for demonstration collection and the coordinate frame classification methodology for extracting useful via-points. The initial Cartesian DMP trajectories were constructed using the optimization-based methodology from section three. The obtained results showed that the LfD initialization strategy led to significantly better results in terms of the quality of the solutions found, as well as faster convergence to applicable solutions. Findings presented in this section are based on the following paper: Accelerating Robot Trajectory Learning for Stochastic Tasks. 
Chapter 7: Conclusion. This section summarizes the doctoral thesis and its main achievements. The main contributions are: I) a novel learning from demonstration method for the analysis of trajectory-level demonstrations, based on the classification of coordinate frames, II) an optimization-based Cartesian trajectory planning algorithm with coordinate frame approximation and obstacle avoidance capabilities, III) a simulation-based iterative learning framework for task-oriented trajectory learning, compatible with the LfD methodology. Future research will focus on finding more efficient algorithms for policy search with sparse evaluation and on testing the applicability of different policy representations. The possibilities for automatic estimation of exploration rates will be explored, as well as the automatic extraction of end-result-oriented cost/reward functions in order to remove the need for hand-crafted functions.
Keywords
Keywords (English)
Language	Croatian
URN:NBN	urn:nbn:hr:235:132746
Study programme	Name: Mechanical Engineering and Naval Architecture Type of study: university Level of study: postgraduate doctoral Academic / professional title: Doctor of Science in the field of technical sciences (dr.sc.)
Resource type	Text
File creation method	Born digital
Access rights	Open access
Terms of use
Date and time of deposit	2020-12-14 10:37:21
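
Illustrative sketch (not part of the repository record)

The abstract's Chapters 4 and 5 describe a policy search loop in which a black-box optimizer updates the weights of a DMP trajectory policy against a task-oriented cost, tracked by a best-current-solution metric and a current-average metric. The following is a minimal sketch of that idea, not the thesis implementation. It assumes a one-dimensional textbook DMP with Euler integration, a hypothetical noisy via-point cost, and a simple elite-averaging evolution strategy standing in for CMA-ES (no covariance adaptation). All function names, gains and parameters are illustrative assumptions.

```python
import math
import random


def dmp_rollout(weights, y0=0.0, goal=1.0, tau=1.0, dt=0.01,
                alpha=25.0, beta=6.25, alpha_x=4.0):
    """Integrate a 1-D discrete DMP with Euler steps; return the positions."""
    n = len(weights)
    # Basis centres placed along the exponentially decaying phase x in (0, 1].
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Heuristic basis widths (assumption): narrower where centres are dense.
    widths = [n ** 1.5 / c for c in centres]
    y, z, x = y0, 0.0, 1.0
    path = [y]
    for _ in range(int(round(tau / dt))):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        mix = sum(p * w for p, w in zip(psi, weights)) / (sum(psi) + 1e-10)
        forcing = x * (goal - y0) * mix
        # Spring-damper pulled toward the goal, shaped by the forcing term.
        z += dt / tau * (alpha * (beta * (goal - y) - z) + forcing)
        y += dt / tau * z
        x += dt / tau * (-alpha_x * x)  # standard exponential phase decay
        path.append(y)
    return path


def cost(weights, rng, via=1.3, noise=0.01):
    """Hypothetical task cost: pass a via-point mid-way, end at the goal.

    Gaussian noise mimics a stochastic task outcome (an assumption).
    """
    path = dmp_rollout(weights)
    mid = path[len(path) // 2]
    return (mid - via) ** 2 + (path[-1] - 1.0) ** 2 + rng.gauss(0.0, noise)


def policy_search(n_weights=5, iterations=30, population=12, elite=4,
                  sigma=50.0, seed=0):
    """Simplified evolution strategy over DMP weights (stand-in for CMA-ES)."""
    rng = random.Random(seed)
    mean = [0.0] * n_weights  # zero weights: plain point-to-point motion
    best = float("inf")
    best_so_far, current_average = [], []
    for _ in range(iterations):
        samples = [[m + sigma * rng.gauss(0.0, 1.0) for m in mean]
                   for _ in range(population)]
        scored = sorted(((cost(s, rng), s) for s in samples),
                        key=lambda cs: cs[0])
        costs = [c for c, _ in scored]
        best = min(best, costs[0])
        best_so_far.append(best)                          # best-current metric
        current_average.append(sum(costs) / population)   # current-average metric
        elites = [s for _, s in scored[:elite]]
        mean = [sum(col) / elite for col in zip(*elites)]
        sigma *= 0.95  # slowly narrow the exploration
    return mean, best_so_far, current_average


if __name__ == "__main__":
    _, best_hist, avg_hist = policy_search()
    print("best-current:", best_hist[-1], "current-average:", avg_hist[-1])
```

In this sketch the best-current curve can only decrease, while the current-average curve reflects how good the sampled population is as a whole, which is why the two are reported separately for noisy tasks. Replacing the zero initial `mean` with weights fitted to a demonstration would correspond to the LfD initialization discussed for Chapter 6.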