nushell/crates/nu-cmd-dataframe/src/dataframe/eager/to_nu.rs
Ian Manske 6fd854ed9f
Replace ExternalStream with new ByteStream type (#12774)
# Description
This PR introduces a `ByteStream` type which is a `Read`-able stream of
bytes. Internally, it has an enum over three different byte stream
sources:
```rust
pub enum ByteStreamSource {
    Read(Box<dyn Read + Send + 'static>),
    File(File),
    Child(ChildProcess),
}
```

This is in comparison to the current `RawStream` type, which is an
`Iterator<Item = Vec<u8>>` and has to allocate for each read chunk.

Currently, `PipelineData::ExternalStream` serves a weird dual role where
it is either external command output or a wrapper around `RawStream`.
`ByteStream` makes this distinction more clear (via `ByteStreamSource`)
and replaces `PipelineData::ExternalStream` in this PR:
```rust
pub enum PipelineData {
    Empty,
    Value(Value, Option<PipelineMetadata>),
    ListStream(ListStream, Option<PipelineMetadata>),
    ByteStream(ByteStream, Option<PipelineMetadata>),
}
```

The PR is relatively large, but a decent amount of it is just repetitive
changes.

This PR fixes #7017, fixes #10763, and fixes #12369.

This PR also improves performance when piping external commands. Nushell
should, in most cases, have competitive pipeline throughput compared to,
e.g., bash.
| Command | Before (MB/s) | After (MB/s) | Bash (MB/s) |
| -------------------------------------------------- | -------------:|
------------:| -----------:|
| `throughput \| rg 'x'` | 3059 | 3744 | 3739 |
| `throughput \| nu --testbin relay o> /dev/null` | 3508 | 8087 | 8136 |

# User-Facing Changes
- This is a breaking change for the plugin communication protocol,
because the `ExternalStreamInfo` was replaced with `ByteStreamInfo`.
Plugins now only have to deal with a single input stream, as opposed to
the previous three streams: stdout, stderr, and exit code.
- The output of `describe` has been changed for external/byte streams.
- Temporary breaking change: `bytes starts-with` no longer works with
byte streams. This is to keep the PR smaller, and `bytes ends-with`
already does not work on byte streams.
- If a process core dumped, then instead of having a `Value::Error` in
the `exit_code` column of the output returned from `complete`, it now is
a `Value::Int` with the negation of the signal number.

# After Submitting
- Update docs and book as necessary
- Release notes (e.g., plugin protocol changes)
- Adapt/convert commands to work with byte streams (high priority is
`str length`, `bytes starts-with`, and maybe `bytes ends-with`).
- Refactor the `tee` code, Devyn has already done some work on this.

---------

Co-authored-by: Devyn Cairns <devyn.cairns@gmail.com>
2024-05-16 07:11:18 -07:00

137 lines
4.2 KiB
Rust

use crate::dataframe::values::{NuDataFrame, NuExpression};
use nu_engine::command_prelude::*;
#[derive(Clone)]
pub struct ToNu;
impl Command for ToNu {
fn name(&self) -> &str {
"dfr into-nu"
}
fn usage(&self) -> &str {
"Converts a dataframe or an expression into into nushell value for access and exploration."
}
fn signature(&self) -> Signature {
Signature::build(self.name())
.named(
"rows",
SyntaxShape::Number,
"number of rows to be shown",
Some('n'),
)
.switch("tail", "shows tail rows", Some('t'))
.input_output_types(vec![
(Type::Custom("expression".into()), Type::Any),
(Type::Custom("dataframe".into()), Type::table()),
])
//.input_output_type(Type::Any, Type::Any)
.category(Category::Custom("dataframe".into()))
}
fn examples(&self) -> Vec<Example> {
let rec_1 = Value::test_record(record! {
"index" => Value::test_int(0),
"a" => Value::test_int(1),
"b" => Value::test_int(2),
});
let rec_2 = Value::test_record(record! {
"index" => Value::test_int(1),
"a" => Value::test_int(3),
"b" => Value::test_int(4),
});
let rec_3 = Value::test_record(record! {
"index" => Value::test_int(2),
"a" => Value::test_int(3),
"b" => Value::test_int(4),
});
vec![
Example {
description: "Shows head rows from dataframe",
example: "[[a b]; [1 2] [3 4]] | dfr into-df | dfr into-nu",
result: Some(Value::list(vec![rec_1, rec_2], Span::test_data())),
},
Example {
description: "Shows tail rows from dataframe",
example: "[[a b]; [1 2] [5 6] [3 4]] | dfr into-df | dfr into-nu --tail --rows 1",
result: Some(Value::list(vec![rec_3], Span::test_data())),
},
Example {
description: "Convert a col expression into a nushell value",
example: "dfr col a | dfr into-nu",
result: Some(Value::test_record(record! {
"expr" => Value::test_string("column"),
"value" => Value::test_string("a"),
})),
},
]
}
fn run(
&self,
engine_state: &EngineState,
stack: &mut Stack,
call: &Call,
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let value = input.into_value(call.head)?;
if NuDataFrame::can_downcast(&value) {
dataframe_command(engine_state, stack, call, value)
} else {
expression_command(call, value)
}
}
}
fn dataframe_command(
engine_state: &EngineState,
stack: &mut Stack,
call: &Call,
input: Value,
) -> Result<PipelineData, ShellError> {
let rows: Option<usize> = call.get_flag(engine_state, stack, "rows")?;
let tail: bool = call.has_flag(engine_state, stack, "tail")?;
let df = NuDataFrame::try_from_value(input)?;
let values = if tail {
df.tail(rows, call.head)?
} else {
// if rows is specified, return those rows, otherwise return everything
if rows.is_some() {
df.head(rows, call.head)?
} else {
df.head(Some(df.height()), call.head)?
}
};
let value = Value::list(values, call.head);
Ok(PipelineData::Value(value, None))
}
fn expression_command(call: &Call, input: Value) -> Result<PipelineData, ShellError> {
let expr = NuExpression::try_from_value(input)?;
let value = expr.to_value(call.head)?;
Ok(PipelineData::Value(value, None))
}
#[cfg(test)]
mod test {
use super::super::super::expressions::ExprCol;
use super::super::super::test_dataframe::test_dataframe;
use super::*;
#[test]
fn test_examples_dataframe_input() {
test_dataframe(vec![Box::new(ToNu {})])
}
#[test]
fn test_examples_expression_input() {
test_dataframe(vec![Box::new(ToNu {}), Box::new(ExprCol {})])
}
}