loreley.core.map_elites.code_embedding
Commit-level code embedding utilities that consume chunked code artifacts and talk to the OpenAI embeddings API as part of the Map-Elites pipeline.
Data structures
- ChunkEmbedding: embedding vector derived from a single FileChunk, storing the original chunk, its numeric embedding vector, and a scalar weight used during aggregation.
- FileEmbedding: aggregated embedding for one ChunkedFile, including the source file, the tuple of ChunkEmbedding instances, a file-level vector, and an overall weight.
- CommitCodeEmbedding: commit-level representation that bundles all FileEmbedding instances, the final aggregated vector, the embedding model name, and dimensions, plus a chunk_count convenience property.
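The three containers nest naturally, which a few dataclasses can illustrate. This is a minimal sketch, not the module's actual definitions: the field names `vector`, `weight`, `model`, `dimensions`, and `chunk_count` come from the descriptions above, while `chunk`, `file`, `chunks`, `files`, the `Vector` alias, and the use of frozen dataclasses are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Tuple

# Assumed alias: the real module may use a list or another sequence type.
Vector = Tuple[float, ...]


@dataclass(frozen=True)
class ChunkEmbedding:
    chunk: Any        # the original FileChunk (its type is not shown here)
    vector: Vector    # numeric embedding for this chunk
    weight: float     # scalar weight used during aggregation


@dataclass(frozen=True)
class FileEmbedding:
    file: Any                          # the source ChunkedFile (assumed field name)
    chunks: Tuple[ChunkEmbedding, ...]  # per-chunk embeddings (assumed field name)
    vector: Vector                     # file-level aggregated vector
    weight: float                      # overall weight of the file


@dataclass(frozen=True)
class CommitCodeEmbedding:
    files: Tuple[FileEmbedding, ...]   # assumed field name
    vector: Vector                     # final aggregated vector
    model: str                         # embedding model name
    dimensions: int                    # length of the embedding vectors

    @property
    def chunk_count(self) -> int:
        # Convenience property: total number of chunks across all files.
        return sum(len(f.chunks) for f in self.files)
```

Making the containers immutable is a design choice of this sketch; it keeps an embedding result safe to share between pipeline stages.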
Embedder
- CodeEmbedder: orchestrates calls to the OpenAI embeddings API and aggregation logic.
  - Configured via Settings map-elites code embedding options (MAPELITES_CODE_EMBEDDING_*) controlling model name, output dimensions, batch size, maximum chunks per commit, retry count, and retry backoff.
  - run(chunked_files) filters out empty inputs, flattens chunks into a payload, embeds them in batches with a rich progress spinner, and turns raw vectors into ChunkEmbedding, FileEmbedding, and CommitCodeEmbedding objects using weighted averaging.
  - Logs detailed progress and warnings with loguru, including mismatched response sizes, missing owners for chunks, and empty aggregation results.
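The batching, retry, and weighted-averaging steps described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the CodeEmbedder implementation: `batched`, `weighted_average`, `embed_in_batches`, and the `embed_batch` callable are hypothetical names, the linear backoff schedule is assumed, and a mismatched response size raises here for brevity where the real code logs a warning.

```python
import time
from typing import Callable, Iterator, List, Sequence, Tuple

Vector = Tuple[float, ...]


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def weighted_average(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Return sum(w_i * v_i) / sum(w_i), or () when nothing can be aggregated."""
    total = sum(weights)
    if not vectors or total <= 0:
        return ()
    dim = len(vectors[0])
    acc = [0.0] * dim
    for vec, w in zip(vectors, weights):
        if len(vec) != dim:
            raise ValueError("all vectors must share one dimension")
        for i, x in enumerate(vec):
            acc[i] += w * x
    return tuple(x / total for x in acc)


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[Vector]],  # hypothetical API wrapper
    batch_size: int,
    max_retries: int = 3,
    backoff: float = 1.0,
) -> List[Vector]:
    """Embed texts batch by batch, retrying each failed batch with linear backoff."""
    vectors: List[Vector] = []
    for batch in batched(texts, batch_size):
        for attempt in range(max_retries + 1):
            try:
                result = embed_batch(batch)
                break
            except Exception:
                if attempt == max_retries:
                    raise
                time.sleep(backoff * (attempt + 1))
        if len(result) != len(batch):
            # Simplification: this sketch fails fast instead of logging a warning.
            raise ValueError("response size does not match batch size")
        vectors.extend(result)
    return vectors
```

Applying `weighted_average` once over chunk vectors gives a file-level vector, and applying it again over file vectors with their weights gives the commit-level vector.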
Convenience API
- embed_chunked_files(chunked_files, settings=None, client=None): helper that constructs a CodeEmbedder and returns a CommitCodeEmbedding for the supplied chunked files, or None if there is nothing worth embedding.
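Because the helper accepts an optional client, a stand-in can replace the network call in tests. The sketch below assumes the embedder calls `client.embeddings.create(model=..., input=...)` and reads `response.data[i].embedding`, which is the shape of the official OpenAI Python client; whether CodeEmbedder uses exactly this call path is an assumption, and `FakeEmbeddingsClient` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Any, List


class FakeEmbeddingsClient:
    """Hypothetical offline stand-in for an OpenAI-style client."""

    def __init__(self, dim: int = 4) -> None:
        self.dim = dim
        self.calls = 0
        # Expose `client.embeddings.create(...)` by pointing at ourselves.
        self.embeddings = self

    def create(self, *, model: str, input: List[str], **kwargs: Any) -> SimpleNamespace:
        self.calls += 1
        # Honor an explicit `dimensions` argument if the caller passes one.
        dim = kwargs.get("dimensions") or self.dim
        # Deterministic fake vectors: every component is the text length.
        data = [
            SimpleNamespace(index=i, embedding=[float(len(text))] * dim)
            for i, text in enumerate(input)
        ]
        return SimpleNamespace(data=data, model=model)


# Hypothetical usage, assuming loreley is installed and `chunked_files`
# was produced by its chunking stage:
#
#   from loreley.core.map_elites.code_embedding import embed_chunked_files
#   embedding = embed_chunked_files(chunked_files, client=FakeEmbeddingsClient())
#   if embedding is None:
#       ...  # nothing worth embedding
```

Callers should check for None before reading the returned vector, since the helper returns None when there is nothing worth embedding.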