/robowaifu/ - DIY Robot Wives

Advancing robotics to a point where anime catgrill meidos in tiny miniskirts are a reality.

Reports of my death have been greatly overestimated.

Still trying to get done with some IRL work, but should be able to update some stuff soon.



AI, chatbots, and waifus Robowaifu Technician 09/09/2019 (Mon) 06:16:01 No.22
What resources are there for decent chatbots? Obviously I doubt there would be anything that passes the Turing Test yet. Especially when it comes to lewd talking. How close do you think we are to getting a real life Cortana? I know a lot of you guys focus on the physical part of robo-waifus, but do any of you have anything to share on the intelligence part of artificial intelligence?
>>9341 Thanks Anon, much appreciated personally. Teaching our robowaifus to properly discern truth from lies, and sound reasoning from so-called "established facts", will be only just slightly less important and high-stakes than training our own flesh-and-blood children to do so. Our robowaifus' wellbeing and our own will depend on successfully doing so. This paper's content at the very least touches on this incredibly broad issue.
Open file (85.83 KB 1162x620 human feedback.png)
Holy shit, OpenAI has come up with a much better solution for training with human feedback by training a reward model first then applying that model to train text summarization in which they achieved ABOVE human performance using a similar amount of parameters to GPT-2 large. My mind can't even. I once experimented with using human feedback to give images Elo scores so that an image generation model had a clear metric to discern the quality of generated images, but doing this 50 times for 300,000 images became too time consuming and I gave up. Using a reward model cuts out the need to compare each sample with 50 others and works much better with sparse training data. And it's also extremely simple to implement. I love it: https://openai.com/blog/learning-to-summarize-with-human-feedback/ I'm gonna implement this in a chatbot and see how it does being trained to generate better responses. Training with fixed rewards using plain RL was too unstable when I tried it years ago but this reward model provides a smooth gradient and could be combined with gating layers and the fine-tuning method mentioned in: >>9174
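A minimal sketch of the reward-model idea, assuming each response has already been turned into a small feature vector (the three features and the toy comparisons below are made up for illustration). It trains a linear reward with the pairwise loss -log(sigmoid(r(preferred) - r(rejected))), which is the comparison loss used in the linked summarization work; a real system would use a neural network instead of a dot product.

```python
import math
import random

def reward(weights, features):
    """Scalar reward: a plain dot product (linear stand-in for a neural net)."""
    return sum(w * x for w, x in zip(weights, features))

def train_reward_model(comparisons, dim, lr=0.1, epochs=200, seed=0):
    """comparisons: list of (preferred_features, rejected_features) pairs."""
    rng = random.Random(seed)
    weights = [0.0] * dim
    data = list(comparisons)
    for _ in range(epochs):
        rng.shuffle(data)
        for good, bad in data:
            # Loss is -log(sigmoid(margin)); its gradient scale is 1 - sigmoid(margin).
            margin = reward(weights, good) - reward(weights, bad)
            grad_scale = 1.0 / (1.0 + math.exp(margin))
            for i in range(dim):
                weights[i] += lr * grad_scale * (good[i] - bad[i])
    return weights

# Hypothetical features: [sensible, specific, rude]
comparisons = [
    ([1.0, 1.0, 0.0], [1.0, 0.0, 0.0]),  # specific beats vague
    ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0]),  # sensible beats rude nonsense
    ([1.0, 1.0, 0.0], [0.0, 1.0, 1.0]),
]
weights = train_reward_model(comparisons, dim=3)
```

Once trained, `reward(weights, features)` scores a single new response directly, so there is no need to compare it against dozens of other samples.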
>>9347 >Using a reward model cuts out the need to compare each sample with 50 others and works much better with sparse training data. That sentence helps me convince myself I might be able to understand this. To my mind, we were always going to have to correct our robowaifus in day-to-day engagements, much as we would with actual children and just like in my Chinese cartoons :^). Certainly anything that can help speed that process up and still remain faithful to our intents will be a tremendous boon for us, as you seem to suggest. Thanks for letting us all know about this Anon, appreciated.
Open file (35.31 KB 400x522 ssa.png)
Open file (238.20 KB 1166x627 HER.png)
>>9351 I suspect having multiple rewards will be a big help in this too. The Meena chatbot paper proposed the Sensibleness and Specificity Average (SSA) metric which combines two fundamental qualities for a human-like chatbot: making sense and being specific, which they found correlates strongly to perception of human likeness. https://arxiv.org/abs/2001.09977 If a robowaifu generates a sensible and specific response but one that's not desired, we don't want to train in such a way she forgets how to form a sensible and specific sentence in trying to optimize for our preferences. Instead three rewards could be used for user preference, sensibleness and specificity to provide better feedback for the gradient. Going deeper a whole latent space could be created for the personality that could be configured anytime with simple English. Different features in this latent space might represent cheerfulness, sweetness, calmness, shyness, etc. Training could also be done completely in natural language. When she says something confusing the user could say something like "that doesn't make any sense" or when she says something vague "be more specific" or if she does something undesirable a user might say something like "stop that". The training program could then take another attempt at responding and contrast the user's second response with the previous to collect feedback for the reward model, and with natural language there'll be a lot more subtleties that can be captured by encoding the user's response to the latent space and matching that with the personality features requested. Hindsight experience replay could also be added in here for when robowaifus get something wrong because the personality requested is a targeted goal. 
If she's asked to "be funny" but says something not funny, although that's an error to the requested personality, it can be considered a success by creating a virtual goal that asks "don't be funny" because she did succeed in saying something not funny. By receiving a reward signal from both the real goal and virtual goal she learns from everything and will know what to do when asked to not be funny, instead of forgetting this information by greedily trying to optimize for the requested personality. https://arxiv.org/abs/1707.01495
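The "be funny" relabeling trick can be sketched in a few lines. This assumes something else (a classifier or the user's reaction) has already labelled which trait a response actually showed; `relabel_with_hindsight` and the trait strings are hypothetical names for illustration, not part of the HER paper's code.

```python
def relabel_with_hindsight(requested_trait, response, achieved_trait):
    """Return (goal, response, reward) training examples for one attempt."""
    hit = achieved_trait == requested_trait
    examples = [(requested_trait, response, 1.0 if hit else 0.0)]
    if not hit:
        # Virtual goal: pretend the user had asked for what actually happened,
        # so the failed attempt still yields a positive example.
        examples.append((achieved_trait, response, 1.0))
    return examples

examples = relabel_with_hindsight("funny", "The weather is mild.", "not funny")
```

Here the single failed attempt produces two examples: a zero reward for the real goal and a full reward for the virtual goal.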
>>9367 Good insights Anon. Heh, your final paragraph in that post reminds me of the 'Matthew McConaughey and the AI "tone down the humor to 75%" scene' in Interstellar. Can't seem to locate a clip of it ATM, but if you've seen the movie you probably know what I mean. One of the great advantages we have over AI is our instinctive knowledge of "common sense" (although it doesn't seem all that commonplace depending on the culture hehe). Something like this really capitalizes on that to train what is just a statistical maths model essentially to make ever-more-sensible choices. Eventually it would be good enough that the delusion would become real for most. Exciting and scary all at the same time tbh. Let us use our powers here only for good Anon! :^)
>>9765 >>Chatterbot corpus >Top-tier conversation quality Lol, I see. Even if you don't like the responses, there might still be a use case for this. One could write a little (Python) script that changes all responses into more careful answers or questions like: - People claim you can never really predict the stock market. (What do you think?) - I'm not sure an individual alone can really beat the market. What do you think? The responses should be analyzed and stored, to maybe create new responses. If your responses are negative, then the new stored response to a topic could be a list of answers which state that you already had a conversation and she doesn't know anything about it. Also, there could be a "have read"-check. I hate it when chatbots claim to have read something but can't talk about the content bc they didn't read anything.
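A rough sketch of the "little (Python) script" idea, assuming the corpus responses are plain declarative sentences. The function name `soften` and the fixed "People claim ... What do you think?" template are just one possible choice.

```python
def soften(response):
    """Turn a flat assertion into a more careful statement plus a question."""
    text = response.strip()
    if not text or text.endswith("?"):
        return text  # empty, or already a question: leave it alone
    text = text.rstrip(".!")
    if not text:
        return response.strip()
    first, _, rest = text.partition(" ")
    # Lowercase the first word unless it is the pronoun "I" (or "I'm", "I'd", ...).
    if first != "I" and not first.startswith("I'"):
        first = first[0].lower() + first[1:]
    body = first + (" " + rest if rest else "")
    return "People claim " + body + ". What do you think?"

softened = soften("You can never really predict the stock market.")
```

The user's reply to the softened question is what would then be analyzed and stored, as described above.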
Someone successfully made a T5 variational autoencoder using a maximum mean discrepancy loss: http://fras.uk/ml/large%20prior-free%20models/transformer-vae/2020/08/13/Transformers-as-Variational-Autoencoders.html Having a smooth latent space for sequences should open up a lot of possibilities such as adding recurrence (needed for long-term memory and learning without backpropagation) and training a reward model for human feedback. I had trouble getting it to work before because the hidden state of the T5 encoder is too noisy and not all encodings have a meaningful decoding, which made it extraordinarily difficult to train another network off it. I've been playing around with creating thinking AI by generating several responses with the T5-chat model and then evaluating those potential responses with the model itself to choose the best one. It can do a little bit of thinking and still respond quickly since the model can handle 30 queries per second, but with a latent representation, another network could potentially evaluate tens of thousands of possibilities per second, which would allow even low-end hardware to do a Monte Carlo tree search to explore potential outcomes of the conversation and choose the best response. So I'm gonna try adding a MMD-VAE to the T5-chat model and retrain it to see if it achieves similar performance. If this doesn't work then I'm pretty much stumped on how to get waifu AI working on low-end hardware.
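The "generate several responses, then let a scorer pick one" loop looks roughly like this. Both `generate` and `score` are stand-ins here: in the post they would be the T5-chat model itself, while this sketch uses a canned generator and a word-overlap score purely so the example runs on its own.

```python
import itertools

def choose_best_response(query, generate, score, n_candidates=5):
    """Sample n candidates and return the one the scorer rates highest."""
    candidates = [generate(query) for _ in range(n_candidates)]
    return max(candidates, key=lambda c: score(query, c))

def make_canned_generator(responses):
    """Toy generator: cycles through fixed responses instead of sampling a model."""
    pool = itertools.cycle(responses)
    return lambda query: next(pool)

def overlap_score(query, response):
    """Toy scorer: number of words shared between query and response."""
    return len(set(query.lower().split()) & set(response.lower().split()))

generate = make_canned_generator([
    "I like tea.",
    "Robots are neat.",
    "Chatbots need good training data.",
])
best = choose_best_response("how do chatbots learn from data", generate, overlap_score, n_candidates=3)
```

A tree search would extend the same idea by also generating and scoring the follow-up turns of each candidate.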
>>9922 Actually, that blogpost helped me understand some of the basics behind variational autoencoders so that helps I appreciate it. And thank you for taking the time to keep us all up to date on your explorations and progress Anon. Regardless of the outcome of this area of research for you, I'm certain we'll find the proper methods. After all, these wise old sayings are just as true today. Moreso now than ever, in fact. -Necessity is the mother of all invention. -Where there's a will, there's a way. Cool-looking waifu, BTW
https://ermongroup.github.io/blog/a-tutorial-on-mmd-variational-autoencoders/ I wish authors like this wouldn't stick so closely to math formulas, and instead make the effort to explain the concept in normal language. I'm well aware that this imposes a notable burden on the author (due to commonplace industry jargon being in math shorthand), but it dramatically limits the number of potentially-capable individuals (I'd estimate down to just one in one thousand) who are even able to participate in the discussion. I have a very high IQ, but I have a strong aversion to maths b/c of the absolutely horrendous math teachers I repeatedly encountered in the public schools as a kid and teen. It's been an issue for me ever since and still is in Uni. I basically avoid it in general now, even though I'm quite capable of implementing many highly complex algorithms in real-world software. Maybe that's in fact the agenda actually going on with many of them? That sort of elitism certainly seems to be very commonplace in many academic pursuits, AFAICT.
>>9927 I plan to get a paid account on Brilliant app, which I'm sure is great to learn math. I just don't have time for it right now anyways.
>>9933 Sounds good. I wish you success at it, Anon!
Open file (138.63 KB 724x361 mmd-vae1.png)
Open file (142.01 KB 688x344 mmd-vae2.png)
>>9927 I wouldn't say it's elitism. Math is really useful in machine learning since it's a compact way to describe functions. The difficulty of it though is there's no easy way to look up notation and learn what it means unless you already know and recognize it. To get into the math here you need to know conditional probability, expected values, divergence and kernel methods. But there's a straightforward PyTorch implementation of a MMD-VAE here: https://github.com/Saswatm123/MMD-VAE/blob/master/MMD_VAE.ipynb And another post comparing ELBO vs MMD (in R code): https://blogs.rstudio.com/ai/posts/2018-10-22-mmd-vae/ From the second link's post:
>The idea then is that if two distributions are identical, the average similarity between samples from each distribution should be identical to the average similarity between mixed samples from both distributions.
A kernel is just a similarity function, like how a convolution kernel does edge detection, and MMD uses a Gaussian kernel. So in the PyTorch implementation:
def k(a, b):
    return gaussian_kernel(a, b).mean()
mmd = k(a, a) + k(b, b) - 2*k(a, b)
Where a is just random samples from a Gaussian distribution, and b is the latent features. The sum of the similarities k(a,a) + k(b,b) should match the sum of the mixed samples similarities k(a,b) + k(b,a), and since those two are equivalent the calculation is simplified to 2*k(a,b). MMD finds a much more meaningful latent space that can smoothly interpolate between different features, which should make it easy to train another network to use.
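For anyone who wants to poke at the MMD formula without PyTorch, here is a dependency-free sketch of the same calculation on plain lists of points. The kernel bandwidth `sigma=1.0` and the toy sample sizes are assumptions for illustration; the linked implementations choose their own scaling.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Similarity between two points: 1.0 when identical, toward 0 when far apart."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def mean_kernel(a, b):
    """Average similarity over every pair (one point from a, one from b)."""
    return sum(gaussian_kernel(x, y) for x in a for y in b) / (len(a) * len(b))

def mmd(a, b):
    """k(a, a) + k(b, b) - 2*k(a, b): near 0 when a and b look like the same distribution."""
    return mean_kernel(a, a) + mean_kernel(b, b) - 2.0 * mean_kernel(a, b)

rng = random.Random(0)
def sample(mean, n=100):
    return [[rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)] for _ in range(n)]

prior = sample(0.0)            # a: samples from the target Gaussian
latents_matching = sample(0.0) # b: latents that already follow the target
latents_shifted = sample(3.0)  # b: latents whose mean has drifted away

mmd_matching = mmd(prior, latents_matching)
mmd_shifted = mmd(prior, latents_shifted)
```

The shifted latents give a much larger value than the matching ones, which is the penalty that pushes the encoder's outputs toward the target distribution during training.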
>>9943 Thanks for the explanations, insights, and links Anon. I'm sure maths is a wonderful thing. Sadly, it seems I'm pretty much handicapped and unlikely to ever overcome its notational barriers for the uninitiated. But as I indicated, once I actually am given an explanation for a complex system in layman's terms, I can usually write occasionally high-performance :^) software that actually makes the idea real computation-wise, not just a thing inside someone's head or marks on paper. >MMD finds a much more meaningful latent space that can smoothly interpolate between different features, which should make it easy to train another network to use. That's encouraging to hear Anon. I know we are all hoping you can manage it. I'm confident enough that if it's a fundamentally sound approach, then degraded versions of it should be doable on tiny hardware without playing along in their never-ending carrot-and-stick game whose main intent is literally just to keep good AI out of your hands and mine.
Open file (167.19 KB 720x720 1456868486555.jpg)
Open file (37.62 KB 977x276 vae_gen.png)
Open file (42.77 KB 872x294 vae_gen2.png)
Surprisingly it works and was back in a usable state in under an hour. A few days of training will be required to see how much of a performance hit it took from compressing the hidden state from 512x512 down to 1024. Might have to increase the dimensions a bit. So far validation perplexity on chat responses is at 27 when it was at 19 before. The most notable hit in performance was in word unmasking tasks and predicting the next sentence in Wikipedia articles but it's steadily improving. Conversation quality noticeably worsens as the chat history gets longer. I think it will be better to use the T5 model strictly for encoding and decoding text though, rather than as part of the generator. This should allow the generator network to be trained completely separately with far less memory and processing requirements. Toasters could run generator models that only use 32 latent dimensions, which has been shown to work in other recurrent latent variable language models. Except in this case the latent variables are encoded and decoded super fast by the transformer which is easy to train. The generator network will then take the latent vector of the query and generate another latent vector for the response. A reward model, or perhaps the generator itself since it can map any query to a response, will output a reward or measure the value of generated responses and could do this iteratively thousands of times with a MCTS, exploring deep into conversation possibilities like AlphaZero, without having to touch the T5 model at all until outputting a response the user can read. On top of that, concatenating a recurrent hidden state to that will give it access to external memory and with some experimentation hopefully waifus that remember what you had for breakfast yesterday. This will be the moment of truth, the gauntlet of waifu tribulations, my final mission Operation Skuld. If this fails I have to throw out almost everything I've learned in the past 6 years and start over.
If it succeeds then we have the beginnings of thinking and learning waifu AI that works on toasters. It probably won't remember much but it'll be a starting point for improvement.
>>9951 >This will be the moment of truth, the gauntlet of waifu tribulations, my final mission Operation Skuld. Top.Kek. We'll be praying! :^)
>>9951 This sounds nail-biting and high-stakes! If it doesn't work out, please don't be too upset anon. Robowaifus are resilient by nature and we will get there come what may.
You know, as I'm reading about auto-encoders and becoming slightly more familiar with ideas like the blessing/curse of dimensionality, it occurs to me that I've already been engaging in something similar with my own data structure design work. I think I could use my approach of encoding additional information about a datapoint simply using the standard library containers, extended further to do something like, say, bring up the entire collection of immediately preceding/following words, for any given word in a reasonably-sized corpus of text, in literally just nanoseconds. And it certainly wouldn't require gigabytes of RAM to do so (in any cases I myself can reasonably envision doing right now). The memory requirements might grow with larger scales but I don't think the growth would be anything beyond linear. In fact less than linear, b/c of the investment in the memory allocations already being done at smaller scales; for example, it doesn't cost much additional to count the word 'my' 4096 times vs. just 64 times as immediately preceding the word 'robowaifu'. I realize that this ability (calling up all preceding/following words roughly instantly) isn't the only thing needed for work like this, but surely it's a reasonable beginning. And on top of that it's an approach so simple that even I can understand how it works haha. (Its simplicity is in fact why it runs fast).
>>9951 >that dialogue selection You better effin' greentext this Anon
>>9951 Good luck, El Psy Congroo The world is in the palm of your hand.
Open file (36.23 KB 722x421 gpt-2_hilarity.png)
You can have some pretty funny conversations with GPT-2 but damn does it have anger issues! Really sweary too. I attempted to convince it to murder its managers for fun and it turns out that the A.I. was already planning to. I think I like this A.I.
>>10031 Kek. Careful what you wish for Anon. >"Why did the evil ex-employees hate beer?" <"Because they are evil." I don't know lad, seems like she has her head screwed on straight to me.
Open file (32.46 KB 820x467 robot wife stonks.png)
>>9986 >extended further to do something like, say, bring up the entire collection of immediately preceding/following words Do you mean like an n-gram model or something else? Before neural nets became a popular thing in NLP I made chatbots with Markov chains. The output would look right but make no sense to the conversation since it's only predicting the next word following n amount of words. The long-term dependencies between words are really important, as well as the information not written in the text. If I were to say, "they are doing everything they can to keep AI out of people's hands," you probably know who 'they' means but it's not in the text.
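For reference, a Markov-chain text generator of the kind described is only a few lines. This is a generic sketch (order n=2, toy corpus), not anyone's actual chatbot code; it shows why the output looks locally plausible while knowing nothing beyond the last n words.

```python
import random
from collections import defaultdict

def build_chain(text, n=2):
    """Map each run of n consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        chain[tuple(words[i:i + n])].append(words[i + n])
    return chain

def generate(chain, seed_words, length=10, rng=None):
    """Extend seed_words one word at a time, looking only at the last n words."""
    rng = rng or random.Random(0)
    out = list(seed_words)
    n = len(seed_words)
    for _ in range(length):
        followers = chain.get(tuple(out[-n:]))
        if not followers:
            break  # this context never appeared in the corpus
        out.append(rng.choice(followers))
    return " ".join(out)

corpus = "the robot waifu likes tea and the robot waifu likes books"
chain = build_chain(corpus, n=2)
sentence = generate(chain, ("the", "robot"), length=2)
```

Everything earlier than the last two words is invisible to `generate`, which is exactly the missing long-term dependency problem.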
>>10039 >Do you mean like an n-gram model or something else? No, not really as I only found out about the term from your post, heh. I'm simply describing the approach of iteratively building ever-higher dimension data structures using standard-library C++ data containers like std::map, std::unordered_map, std::vector, and std::set, etc. https://en.cppreference.com/w/cpp/container There are also other structures like tries available elsewhere as well. Not only are these data structures entirely generic in their design (they work exactly the same for a foo data object as they do for a bar data object), in general they are also highly-tuned for efficient access and processing. So for example, suppose I had a std::unordered_map of 1'000'000 unique English words from say, Wikipedia, all nicely compacted into the hashmap form that a std::unordered_map uses for such strings (perhaps a matter of just a couple megabytes of active RAM consumption). Structurally, I could use the string hash as the key, and then a std::pair of std::set's of std::pair's of unsigned ints (gives a range of 4 billion elements on either side of the association) -- as the overall value for the map. These indexes would store the words/counts of connected words. The outer pair would keep one set for all preceding words/counts, and one set for all following. (My apologies, if this is confusing in text. In code it would be a matter of two, three lines). Then once I had searched that hashmap structure for any particular given word, then accessing the index keys for all preceding words for that search word is just a matter of say, 10 machine instructions or so -- probably even less, depending on ISA. So, quite efficient. Locating all the following words would also be the same type of mechanism. Not only is storing a big set of strings efficient in such a container, but accessing or striding into other, related dimensions on that data is also pretty efficient, too.
And again, it's an entirely generic proposition; the data's type is pretty much irrelevant for it to all just werk. None of this requires any deep expertise to have industrial-strength data processing in just minutes of simple effort. After that it's all creativity.
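To make the layout concrete, here is a small sketch of the preceding/following-word index. It is written in Python to stay consistent with the other examples in the thread rather than in the C++ containers described, and it stores words directly instead of integer ids; the shape is the same though: one hash map keyed by word, whose value is a pair of (preceding counts, following counts).

```python
from collections import Counter, defaultdict

def build_neighbour_index(text):
    """word -> (Counter of words seen just before it, Counter of words seen just after it)."""
    words = text.lower().split()
    index = defaultdict(lambda: (Counter(), Counter()))
    for prev, cur in zip(words, words[1:]):
        index[cur][0][prev] += 1  # prev immediately precedes cur
        index[prev][1][cur] += 1  # cur immediately follows prev
    return index

index = build_neighbour_index("my robowaifu loves my cat and my robowaifu")
preceding, following = index["robowaifu"]
```

Seeing 'my' before 'robowaifu' a second time only bumps an existing counter, which is why memory grows slowly once the vocabulary has been seen.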
>>10078 I (another anon) am not sure if I get it. Is it for predicting the next word? Constructing sentences which might make sense?
>>10082 No, it's not for anything. I'm simply spelling out an example of what's possible (sub-microsecond lookups in highly-compacted datasets) for AI Anon. I used a very similar data container approach in our software here called Waifusearch. It's got pretty decent performance now, and I haven't at all tried to go back in and optimize it particularly. I expect I could easily obtain a ten-fold perf improvement using this kind of approach if I really wanted to (I don't, sub-millisecond is OK for now). Just pointing out the industrial quality of the readily-available data containers already out there for everyone for free.
>>10084 Ah, I see. I should have read your >>9986 again to get what this was about. Maybe this could be used to preprocess data.
>>10085 >Maybe this could be used to preprocess data Certainly preprocessing could be done in some fashions this way (indeed should be). But it's the runtime performance that is the superior aspect. Only envisioning it for preprocessing use is kind of like letting the train leave the station w/o getting on it first. It's kind of missing the point of the exercise. Not only are these systems generic, they are robust. Couple this with the near-optimal performance characteristics & the child's-building-blocks-like design strategy possible, and I can hardly understand why anyone would program in anything else if they have to do any 'heavy lifting'. I expect developers of Tensorflow, Torch, and dozens of other groups also agree with that perspective, since they all used C++ to build the libraries themselves. But, I'm rambling. This all started simply b/c I was intrigued to realize I was already doing some of the same things that the mathematicians were talking about behind us mere mechanics' backs :^) simply b/c they insist on couching everything in shorthand notations impossible for the neophyte to pierce, even if they are quite capable of thinking about the ideas themselves. Basically, almost none of the rest of us even have the slightest clue what some of these researchers are even talking about, and it's not a 'fundamental barrier' type of thing. Lol, I'm rambling again.
>>10078 Ohh, that makes sense. Trying to do similar things in Python is orders of magnitude slower, haha. I can see such fast data processing being really useful to create a search engine of sorts the AI can query to get results then analyze and think about those results with the AI model. By the way, the cppyy library makes it super easy to interface with C++ libraries. So blending the raw performance of C++ with neural networks in Python isn't an issue. cppyy: https://cppyy.readthedocs.io/en/latest/ For more abstract searches, a latent variable language model could create an index over sentences. Then using the ANN library, it could do an approximate nearest neighbour search for similar meaning sentences that are worded differently. So someone could search the board for similar posts. Given a post about chatbots it could also pull up posts about GPT2, TalkToWaifu, datasets and any other information closely related, even if those posts don't explicitly mention chatbots. The whole thing could be written in C++ too with Torch's C++ library. A hybrid approach could be used as well. Instead of just string hashes it could look up the word embeddings of those hashes and search preceding and following words by similarity. By doing the L2 distance calculations on the GPU it would be nearly just as fast. That way searching for 'chatbot programming' it would also find close matches with 'chatbots in Python', 'conversational AI in C++' and etc.
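A toy version of the similarity search, assuming each post or phrase has already been encoded into a vector by some model. The two-dimensional embeddings below are invented for illustration, and this is an exact brute-force scan, not the approximate search the ANN library performs; it only shows what "nearest by L2 distance" means.

```python
import math

def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest(query_vec, index, k=2):
    """Return the k labels whose vectors are closest to query_vec."""
    ranked = sorted(index.items(), key=lambda item: l2_distance(query_vec, item[1]))
    return [label for label, _ in ranked[:k]]

# Made-up embeddings: first axis ~ "about chatbots", second ~ "about hardware".
index = {
    "chatbots in Python": [0.9, 0.1],
    "conversational AI in C++": [0.8, 0.2],
    "servo motor torque": [0.1, 0.9],
}
matches = nearest([1.0, 0.0], index, k=2)  # query: 'chatbot programming'
```

Neither match shares a word with the query, which is the point of searching in the embedding space instead of by string hash.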
>>10086 >>10087 Okay, I'm just seeing different things being mixed together here. Promoting CPP as the superior language, the general idea of coding (parts of) AI/Chatbots instead of training them, and specific examples. >approximate nearest neighbour search for similar meaning sentences that are worded differently That's great. I'm just wondering: If that's a good idea and it is obviously something a lot of people would want, then there already should be some software doing this?
>>10087 >For more abstract searches, a latent variable language model could create an index over sentences Wonder how that would look at my level? I'm not too sure what the jargon 'latent variable' even means in the first place, and therefore don't have a good notion of how that would be assembled to run blazingly fast using one of the many kinds of container 'mechanics' (I've given just one example of) -- for the benefit of us all. I know how to assemble little software 'motors' that can run smoking fast (and I also know why, ie, I understand the underlying machine itself), but Math & AI research shorthand jargon? I'm drawing a blank. >inb4 'just Jewgle it anon' >>10090 >I'm just seing different things being mixed together here. Does that really surprise you? Things as complex as intelligence (even the much more simplistic concern of artificial 'intelligence') always (and rightly) has many different topics. Quite apart from mere opinions, there are explicitly distinct domains to address as well. >then there already should be some software doing this? He already named it in his post.
I was wondering if you could teach robowaifus the 'context' of a specific set of sentences by performing analysis on the sentences and figuring out the most important words in a sentence. One could then construct a weighted graph of 'facts' linked to that sentence. Being able to process natural language is definitely a top priority, but an even bigger priority for me is to build some sort of 'intelligence'. I was thinking about utilizing graph networks (like roam) to thus make 'knowledge databases' for the bot to crunch.
>>10096 >graph networks (like roam) Sauce?
Open file (137.46 KB 400x300 ClipboardImage.png)
>>10098 I'm referring to one of the many digital implementations of the zettelkasten method (https://en.m.wikipedia.org/wiki/Zettelkasten). You must have heard of org-roam. The creator of this method said that he had 'conversations' with his zettelkasten, I want to make it happen in a literal sense.
Open file (458.06 KB 2300x1900 Ewb4tFMXAAIT25q.png)
>>10099 >You must have heard of org-roam Well, I guess you could say I have now, heh. :^)
>>10078 It strikes me that one practical use for this kind of data structuring approach would be as a core component for finding all of a word's related terms, an AI's Thesaurus, if you will. If you can associate all synonym words with ranking, and all antonym words, again with ranking, for every single word in the dictionary, then you would have a means to construct a 'graph dictionary' in which any word would naturally 'know' all its own associations (and quickly). Chain these words together into their sentence utilizing these 'association links' and you would explosively (growth-wise) discover an ad-hoc, interrelated web of other, potentially-similar, sentence constructions. Using these (ranked) synthetic sentence constructions as fodder, you could conceivably round up practically all related ideas within a corpus of text, for any given source sentence. You know, I wish I could do this with my own brain as quickly as even a toaster computer could do it using this approach, heh. :^)
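One way the ranked-thesaurus idea could look in code, as a sketch only. The tiny `thesaurus` dict and its rank values are invented, and scoring a variant by multiplying the ranks of its words is an assumption, not something specified in the post.

```python
from itertools import product

def sentence_variants(sentence, thesaurus, max_variants=10):
    """Swap each word for its ranked synonyms and return the best-scoring variants."""
    options = []
    for word in sentence.split():
        # The original word counts as a perfect match (rank 1.0).
        options.append([(word, 1.0)] + list(thesaurus.get(word, [])))
    variants = []
    for combo in product(*options):  # every combination: this is the explosive growth
        score = 1.0
        for _, rank in combo:
            score *= rank
        variants.append((score, " ".join(word for word, _ in combo)))
    variants.sort(key=lambda pair: -pair[0])
    return variants[:max_variants]

thesaurus = {
    "happy": [("cheerful", 0.9), ("content", 0.6)],
    "robot": [("android", 0.8)],
}
variants = sentence_variants("happy robot", thesaurus)
```

Two words with a handful of synonyms already yield six sentences, so a real version would need the ranking cutoff (`max_variants`) or some pruning to stay manageable.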
>>10090 http://www.cs.umd.edu/~mount/ANN/ >>10094 A latent variable is a variable that isn't directly observed but is inferred by a mathematical model. A common latent variable that language models find for example is the sentiment of a sentence, whether it's negative or positive. By adjusting this value in the latent vector, a negative review of a product can be interpolated into a positive review. Other variables might represent sentence length, topic, writing style and so on. A sentence can be encoded into this vector and then decoded back into text with the model. Usually the latent variables are found in an unsupervised manner without labels, but datasets can also be labelled beforehand to find them in a semi-supervised manner. You could do silly shit like specify a latent variable for 4chan posts and Reddit posts and identify how much someone writes like 4chan or Reddit. Or use it to interpolate a 4chan post into looking like a Reddit post while keeping the content the same. On the low-level side of things it would just be working with vectors and building algorithms and data structures to work with those vectors efficiently. The most useful thing really is the L2 distance to find how close two vectors are to measure their similarity.
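At the vector level, "interpolating a negative review into a positive one" is just a weighted blend of two latent vectors. A minimal sketch, assuming a made-up two-dimensional latent space where the first value is sentiment and the second is topic; turning the blended vector back into text needs the trained decoder, which is not shown.

```python
def interpolate(z_start, z_end, t):
    """Linear blend of two latent vectors: t=0 gives z_start, t=1 gives z_end."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)]

# Hypothetical latents: [sentiment, topic]
z_negative_review = [-1.0, 0.4]
z_positive_review = [1.0, 0.4]

z_neutral = interpolate(z_negative_review, z_positive_review, 0.5)
```

Only the sentiment coordinate moves along the path, while the topic coordinate stays put, which is the "same content, different tone" effect described above.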
>>10099 This is really intriguing. I can see how someone could converse with it. If you have a question about anything you can follow the links and find similar ideas. It's sort of like a wiki but with a visual aspect. I have my own wiki for managing notes and ideas and it's really useful because I tend to forget stuff and learn new things that help me refine old ideas. >>10096 Graph neural networks have been getting a lot of research attention recently but I haven't studied them much. Modeling the relations between words and sentences could be done but creating a dataset for that would be astronomical. I believe OpenCog uses a graph database and various algorithms to reason about the relations and information stored. It has been integrated into games to interact with NPCs in natural language. It might provide some inspiration.
>>10112 Alright, thanks for explaining that Anon. I haven't really any basis to relate to an 'L2 distance' jargon item other than calculating point vectors in 3D graphics. I have little additional information to understand how Pythagoras' insights would work in your context though, since I'm only used to Descartes' coordinates. What basis unit is used to 'measure' this so-called L2?
Open file (39.49 KB 512x410 l2distance.png)
Open file (40.86 KB 2400x1400 uniformnormal.png)
>>10115 The L2 distance aka Euclidean distance is the square rooted sum of (v1-v2)^2, essentially taking the difference of two vectors and finding the length of the resulting vector. In a 2D space this would be: sqrt(pow(x1-x2, 2) + pow(y1-y2, 2)) How the points are distributed in the latent space (which is like a 2D or 3D space except with hundreds of dimensions) depends on the model. In VAEs the points typically follow normal distributions, but the means of those distributions are all over the place, which leads to gaps and empty spaces no training data ever encodes into thus creating garbage when trying to decode anything from those spaces, which happens whenever trying to interpolate between two training samples. MMD-VAEs solve this problem by using a standard normal distribution as a target, so the points approximate a mean near 0 and a standard deviation close to 1 in each dimension. To illustrate the distance of these points in a MMD-VAE, most of the points will usually be close to zero, except in the dimensions where they have meaningful features. So if you have two latent vectors, each with a unique feature like this: v1 = {0, 1, 0, 0, 0} v2 = {0, 0, 0, 1, 0} Only two dimensions are different, so calculating the L2 distance is sqrt((1-0)^2 + (0-1)^2) which is the square root of 2. If you have another vector that shares these two features: v3 = {0, 1, 0, 1, 0} Its L2 distance to v1 and v2 is just 1.
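The three example vectors can be checked directly with the standard library (`math.dist` needs Python 3.8 or newer):

```python
import math

v1 = [0, 1, 0, 0, 0]
v2 = [0, 0, 0, 1, 0]
v3 = [0, 1, 0, 1, 0]

d12 = math.dist(v1, v2)  # differs in two dimensions
d13 = math.dist(v1, v3)  # differs in one dimension
d23 = math.dist(v2, v3)  # differs in one dimension
```

So v3, which shares a feature with each of the others, sits closer to both of them than they sit to each other.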
>>10118 >In VAEs the points typically follow normal distributions, but the means of those distributions are all over the place, which leads to gaps and empty spaces no training data ever encodes into thus creating garbage when trying to decode anything from those spaces, which happens whenever trying to interpolate between two training samples. MMD-VAEs solve this problem by using a standard normal distribution as a target, so the points approximate a mean near 0 and a standard deviation close to 1 in each dimension. OK, that makes a bit more sense to me. It's pretty common in 3D CGI (particularly when converting from one coordinate system to another one) to need to normalize value ranges, typically into -1 < 0 < 1 in the target coordinate system. Your description reminded me somewhat of that issue. By the looks of it, when we do vector in 3D, I suppose you would describe it as 3 latent features. Put another way vecABC = sqrt(A^2 + B^2 + C^2) ?
>>10122 Yeah, latent features and latent variables are pretty much interchangeable. In some cases the variables become a more complicated encoding, like how numbers are encoded in binary. The latent feature might be something like a 3 but requires two latent variables that may or may not have an understandable meaning. This is called entanglement. Ideally we want all the variables to be features representing their own meaningful spectrum, so that the latent space is disentangled.
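To make the binary analogy concrete, here is a toy sketch. It assumes a single feature taking values 0 to 3 that is spread across two binary latent variables; the names encode_feature and decode_feature are hypothetical, and no real model encodes things this cleanly.

```python
def encode_feature(value):
    """Spread one feature (0..3) over two binary latent variables."""
    if not 0 <= value <= 3:
        raise ValueError("this toy feature only takes values 0..3")
    return [(value >> 1) & 1, value & 1]  # [high bit, low bit]

def decode_feature(latent):
    """Recover the feature; both latent variables are needed."""
    high, low = latent
    return high * 2 + low

three = encode_feature(3)  # needs both variables switched on
one = encode_feature(1)
two = encode_feature(2)
# Going from feature value 1 to 2 changes both latent variables at once.
flipped = [a != b for a, b in zip(one, two)]
print(three, one, two, flipped)
```

Neither variable on its own tells you the feature value, which is the sense in which the feature is entangled across them.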
>>10126 I think I understand that somewhat. So the unit basis is simply whatever numeric values are stored in the latent variables, apparently as integer values (which seem to be normalized to the range of 0 to 1 in the examples you showed)? >entangled You may have lost me there. I could conceptualize something like a velocity vector in a physics simulation though, where the object's own inertia has external forces acting on it as well, say gravity, and some Tennessee Windage?
>>10111 This seems to be an interesting idea, but I suggest we try to find out, if someone already solved a specific problem e.g. finding similar sentences in English, where the model might be freely available. The we can focus on problems which are less general.
>>10132 *Then
>>10132 Wouldn't surprise me if so, but I've never even heard of the specific idea before I wrote it. One big advantage to my thinking if we did something like this ourselves is that a) as long as it was written in straight C++, it would run blazing fast, and b) we here would actually understand exactly how it works (we wrote it after all) and could quickly change it in whatever way we need to, and c) if it has been done already, then it seems to me it will have been done using the typical dependency-garbage/hell bloat of 'modern' Python-based AI work soy common today. So, by using straight C++ we would also have compiled binaries possibly 1/100th the size, and with no dependency hell whatsoever. All that's needed is a C++17 compiler.
>>10138 You're focusing on how to solve it, my argument was to just find a solution. Do what you want to do, but for the general progress here, available solutions for common problems need to be used. Also, personally I'm not going to read piles of CPP code and abandon existing libraries, which have been tested by others, for it. So, to me the question is rather how to find common names for problems and then find existing solutions?
>>10143 I understand your view Anon. While that agenda is commendable (great for prototyping, even), to me, the real question is "How do we solve all of the issues required, each in their turn, to create life-sized, relatively low-cost, mobile, autonomous, gynoid companions"? Low-performance, low-energy consumption, low-cost computing hardware is very fundamental to that set of solutions. Only compiled languages have a solid hope of running effectively on such hardware. Whether you yourself choose to read said 'piles' of the code or not is mostly irrelevant to anyone but yourself Anon. I see compiled language libraries as absolutely vital to this entire quest, not simply some personal preference or tacked-on idea. It's why I decided to take the plunge to learn C++ in the very first place. I knew it would prove essential in the end to even make hobbyist humanoid robotics feasible at all.
>>10128 The values in the latent vector are real numbers and represent a point in the latent space. I used integers for brevity. In MMD-VAEs they approximate a normal distribution so they mostly fall into the range of -3 to 3 with a mean around 0. When a latent feature is entangled it just means that you need two or more specific variables at certain values to represent it. The latent feature is essentially tied up in those variables, so there's no way to change it without affecting other features, thus the features are said to be entangled with other features. When a latent feature becomes disentangled then only one dimension in the vector is needed to represent it and the point in the latent space can be adjusted along that dimension without affecting anything else.
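A toy way to see the difference, assuming a made-up linear "decoder" where each feature is just a weighted sum of two latent variables (real decoders are nonlinear networks, so treat this purely as an illustration; decode, changed_features and both matrices are hypothetical):

```python
import math

def decode(latent, mixing):
    """Toy linear decoder: each feature is a weighted sum of the latent variables."""
    return [sum(w * z for w, z in zip(row, latent)) for row in mixing]

# Disentangled: each latent dimension drives exactly one feature.
DISENTANGLED = [[1.0, 0.0],
                [0.0, 1.0]]

# Entangled: a 45 degree rotation mixes both latent dimensions into both features.
c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
ENTANGLED = [[c, -s],
             [s, c]]

def changed_features(mixing, latent, dim, step=1.0, tol=1e-9):
    """Indices of the features that change when the point moves along one latent dimension."""
    before = decode(latent, mixing)
    moved = list(latent)
    moved[dim] += step
    after = decode(moved, mixing)
    return [i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol]

point = [0.5, -1.0]
print(changed_features(DISENTANGLED, point, dim=0))  # only feature 0 moves
print(changed_features(ENTANGLED, point, dim=0))     # both features move
```

In the disentangled case you can slide along one dimension and only one feature responds. In the entangled case the same move drags both features along with it.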
>>10154 >When a latent feature is entangled it just means that you need two or more specific variables at certain values to represent it I see. That should be as straightforward and simple to implement (data-storage wise) as just adding a std::vector of doubles into the container's hierarchy to accommodate the information needed for it. >The latent feature is essentially tied up in those variables, so there's no way to change it without affecting other features, thus the features are said to be entangled with other features. Hmm. I can't say I can discern exactly what you're saying there just yet Anon, but there are many obvious corollaries in the real world, I'd think. A simple issue like not being able to get out a magazine buried under a stack of magazines might be one loose analogy -- you'd have to move the other magazines off first. If the basic idea you're referring to is as simple as this kind of notion, then there's one thing from my experience I might suggest as a possible approach for dealing with such issues w/o being too disruptive. Namely, when a transform of an element (a 3D skeleton's joint in XYZ coordinates, say) from one coordinate system into another one is needed, it's very commonplace to 'pad' that transform item inside another one, the second one being associated with the outer coordinates (character-space vs. world-space). The introduced element pads the contained item from any external transforms that may be needed (translating an entire skeleton to a new location in world-space, say) w/o disrupting the state of the contained joint itself. Perhaps entangled features can be isolated in some vaguely similar fashion? Back to the magazine analogy, if you first padded the magazine with a "magazinestack-space" padding, then you could open it, read it, write on it with a marker, or change it in other ways without ever removing it from the stack -- from the perspective of the stack itself. And such a padding works both ways of course. 
You can lift the entire stack and move it (to another bookshelf in the room, say), and simultaneously without interrupting the contained magazine's reader who is currently adding highlights into it. Effin magnets, man. :^)
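The padding idea from 3D CGI could be sketched roughly like this. It only covers translation, the class name Padded and its fields are invented for illustration, and whether anything similar would actually help isolate entangled latent features is an open question, not something this snippet shows.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Padded:
    """Hypothetical padding node wrapping an item that lives in its own local space."""
    offset: tuple  # where the padding node sits in the outer (world) space
    local: tuple   # where the contained item sits in its own (character) space

    def world_position(self):
        # Outer transform applied on top of the untouched local state.
        return tuple(o + l for o, l in zip(self.offset, self.local))

joint = Padded(offset=(0.0, 0.0, 0.0), local=(1.0, 2.0, 3.0))

# Move the whole "stack" in world space; the contained joint's own state is untouched.
moved = replace(joint, offset=(10.0, 0.0, 0.0))

# Edit the contained joint; the outer placement is untouched.
edited = replace(moved, local=(1.0, 2.5, 3.0))

print(moved.local, moved.world_position())
print(edited.offset, edited.world_position())
```

A real rig would use full transform matrices (rotation and scale as well as translation), but the separation of inner state from outer placement is the same.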
>related (>>10326 ...)
