269: Fix bug when inserting previously deleted documents r=Kerollmops a=Kerollmops
This PR fixes#268.
The issue was in the `ExternalDocumentsIds` implementation in the specific case that an external document id was in the soft map marked as deleted.
The bug was due to a wrong assumption on my side about how the FST unions were returning the `IndexedValue`s, I thought the values returned in an array were in the same order as the FSTs given to the `OpBuilder` but in fact, [the `IndexedValue`'s `index` field was here to indicate from which FST the values were coming from](https://docs.rs/fst/0.4.7/fst/map/struct.IndexedValue.html).
271: Remove the roaring operation functions warnings r=Kerollmops a=Kerollmops
In this PR we are just replacing the usages of the roaring operations function by the new operators. This removes a lot of warnings.
Co-authored-by: Kerollmops <clement@meilisearch.com>
259: Run rustfmt one the whole project and add it to the CI r=curquiza a=irevoire
Since there is currently no other PR modifying the code, I think it's a good time to reformat everything and add rustfmt to the ci.
Co-authored-by: Tamo <tamo@meilisearch.com>
267: Highlighting r=Kerollmops a=irevoire
closes#262
I basically rewrote a part of the damerau-levenshtein function we were using for the highlighting to accept at most two errors from the user and stop on the third mistake.
Also, now it supports utf-8, so it should fix our issue.
Co-authored-by: Tamo <tamo@meilisearch.com>
Co-authored-by: Irevoire <irevoire@protonmail.ch>
258: Use rustls instead of openssl r=curquiza a=irevoire
I also removed all the `default-features` of reqwest since we are only using the JSON one.
Fix#255
Co-authored-by: Tamo <tamo@meilisearch.com>
246: Stop logging the no space left on device error r=curquiza a=irevoire
closes#208
@qdequele what do you think of that?
Are there any other errors we need to ignore?
As you can see in the code, once we are in `Sentry` the error has already been converted to a `String` so the only thing we can do to see if we need to send the error or not is to match the `String` against our error message.
If we have a lot of other logs we want to ignore I would suggest prefixing all the logs with something like:
```
User error: No space left on device
```
So in Sentry, we could just check if the log start by `User error:` and ignore all these errors at once
Co-authored-by: Tamo <tamo@meilisearch.com>
266: Bump LMDB to the latest version (v0.9.70) r=Kerollmops a=Kerollmops
By bumping to a new version of heed (from git, v0.12.0 unpublished yet), this PR fixes Windows disk reservation problems. This new version of heed changes the `del/put_current`, and `append` iterator methods signature by declaring them unsafe.
This PR also bumps milli itself into v0.7.0 as it is breaking due to the heed/LMDB bump.
This PR must be merged after #264.
Co-authored-by: Clément Renault <clement@meilisearch.com>
252: Fix docker run r=curquiza a=curquiza
Not the most beautiful fix since I cannot update alpine to version 3.14 without being flooded with errors.
Co-authored-by: Clémentine Urquizar <clementine@meilisearch.com>
249: Use half of the computer threads for the indexing process by default r=Kerollmops a=irevoire
closes#241
By default, we use only half of the CPU threads when indexing documents; this allows the user to use the search while indexing. Also, the machine will not appear unresponsive when indexing a large batch of documents.
On the special case where a user only has one core, we use it entirely 😄
Co-authored-by: Tamo <tamo@meilisearch.com>
It is undefined behavior to keep a reference to the database while
modifying it, we were keeping references in the database and also
feeding the heed put_current methods with keys referenced inside
the database itself.
https://github.com/Kerollmops/heed/pull/108
248: Unused borrow that must be used r=curquiza a=irevoire
I noticed #228 introduced a warning while compiling
Co-authored-by: Tamo <tamo@meilisearch.com>
228: Authentication rework r=curquiza a=MarinPostma
In an attempt to fix#201, I ended up rewriting completely the authentication system we use. This is because actix doesn't allow to wrap a single route into a middleware, so we initially put each route into it's own service to use the authentication middleware. Routes are now grouped in resources, fixing #201.
As for the authentication, I decided to take a very different approach, and ditch middleware altogether. Instead, I decided to use actix's [extractor](https://actix.rs/docs/extractors/). `Data` is now wrapped in a `GuardedData<P: Policy, T>` (where `T` is `Data`) in each route. The `Policy` trait, thanks to the `authenticate` method tell if a request is authorized to access the resources in the route. Concretely, before the server starts, it is configured with a `AuthConfig` instance that can either be `AuthConfig::NoAuth` when no auth is required at runtime, or `AuthConfig::Auth(Policies)`, where `Policies` maps the `Policy` type to it singleton instance.
In the current implementation, and this to match the legacy meilisearch behaviour, each policy implementation contains a `HashSet` of token (`Vec<u8>` for now), that represents the user it can authenticate. When starting the program, each key (identified as a user) is given a set of `Policy`, representing its roles. The later is facilitated by the `create_users` macro, like so:
```rust
create_users!(
policies,
master_key.as_bytes() => { Admin, Private, Public },
private_key.as_bytes() => { Private, Public },
public_key.as_bytes() => { Public }
);
```
This is some groundwork for later development on a full fledged authentication system for meilisearch.
fix#201
Co-authored-by: marin postma <postma.marin@protonmail.com>