nushell/crates/nu-cmd-dataframe/src/dataframe/expressions/datepart.rs
WMR b67b6f7fc5
Add a datepart expression for dfr to be used with dfr with-column (#9285)
# Description

Today the only way to extract date parts from a dfr series is the dfr
get-* set of commands. These create a new dataframe with just the
datepart in it, which is almost entirely useless. As far as I can tell
there's no way to append it as a series in the original dataframe. In
discussion with fdncred on Discord we decided the best route was to add
an expression for modifying columns created in dfr with-column. These
are the way you manipulate series within a data frame.

I'd like feedback on this approach - I think it's a fair way to handle
things. An example to test it would be:

```[[ record_time]; [ (date now)]]  | dfr into-df | dfr with-column [ ((dfr col record_time) | dfr datepart nanosecond | dfr as "ns" ), (dfr col record_time | dfr datepart second | dfr as "s"), (dfr col record_time | dfr datepart minute | dfr as "m"), (dfr col record_time | dfr datepart hour | dfr as "h") ]```

I'm also proposing we deprecate the dfr get-* commands.  I've not been able to figure out any meaningful way they could ever be useful, and this approach makes more sense by attaching the extracted date part to the row in the original dataframe as a new column.

<!--
Thank you for improving Nushell. Please, check our [contributing guide](../CONTRIBUTING.md) and talk to the core team before making major changes.

Description of your pull request goes here. **Provide examples and/or screenshots** if your changes affect the user experience.
-->

# User-Facing Changes

add in dfr datepart as an expression
<!-- List of all changes that impact the user experience here. This helps us keep track of breaking changes. -->

# Tests + Formatting
Need to add some better assertive tests.  I'm also not sure how to properly write the test_dataframe at the bottom, but will revisit as part of this PR.  Wanted to get feedback early.

<!--
Don't forget to add tests that cover your changes.

Make sure you've run and fixed any issues with these commands:

- `cargo fmt --all -- --check` to check standard code formatting (`cargo fmt --all` applies these changes)
- `cargo clippy --workspace -- -D warnings -D clippy::unwrap_used -A clippy::needless_collect -A clippy::result_large_err` to check that you're using the standard code style
- `cargo test --workspace` to check that all tests pass
- `cargo run -- crates/nu-std/tests/run.nu` to run the tests for the standard library

> **Note**
> from `nushell` you can also use the `toolkit` as follows
> ```bash
> use toolkit.nu  # or use an `env_change` hook to activate it automatically
> toolkit check pr
> ```
-->

# After Submitting
<!-- If your PR had any user-facing changes, update [the documentation](https://github.com/nushell/nushell.github.io) after the PR is merged, if necessary. This will help us keep the docs up to date. -->

---------

Co-authored-by: Robert Waugh <robert@waugh.io>
2023-05-30 09:41:18 -05:00

163 lines
6.3 KiB
Rust

use super::super::values::NuExpression;
use crate::dataframe::values::{Column, NuDataFrame};
use chrono::{DateTime, FixedOffset};
use nu_engine::CallExt;
use nu_protocol::{
ast::Call,
engine::{Command, EngineState, Stack},
Category, Example, PipelineData, ShellError, Signature, Span, Spanned, SyntaxShape, Type,
Value,
};
#[derive(Clone)]
pub struct ExprDatePart;
impl Command for ExprDatePart {
fn name(&self) -> &str {
"dfr datepart"
}
fn usage(&self) -> &str {
"Creates an expression for capturing the specified datepart in a column."
}
fn signature(&self) -> Signature {
Signature::build(self.name())
.required(
"Datepart name",
SyntaxShape::String,
"Part of the date to capture. Possible values are year, quarter, month, week, weekday, day, hour, minute, second, millisecond, microsecond, nanosecond",
)
.input_type(Type::Custom("expression".into()))
.output_type(Type::Custom("expression".into()))
.category(Category::Custom("expression".into()))
}
fn examples(&self) -> Vec<Example> {
let dt = DateTime::<FixedOffset>::parse_from_str(
"2021-12-30T01:02:03.123456789 +0000",
"%Y-%m-%dT%H:%M:%S.%9f %z",
)
.expect("date calculation should not fail in test");
vec![
Example {
description: "Creates an expression to capture the year date part",
example: r#"[["2021-12-30T01:02:03.123456789"]] | dfr into-df | dfr as-datetime "%Y-%m-%dT%H:%M:%S.%9f" | dfr with-column [(dfr col datetime | dfr datepart year | dfr as datetime_year )]"#,
result: Some(
NuDataFrame::try_from_columns(vec![
Column::new("datetime".to_string(), vec![Value::test_date(dt)]),
Column::new("datetime_year".to_string(), vec![Value::test_int(2021)]),
])
.expect("simple df for test should not fail")
.into_value(Span::test_data()),
),
},
Example {
description: "Creates an expression to capture multiple date parts",
example: r#"[["2021-12-30T01:02:03.123456789"]] | dfr into-df | dfr as-datetime "%Y-%m-%dT%H:%M:%S.%9f" |
dfr with-column [ (dfr col datetime | dfr datepart year | dfr as datetime_year ),
(dfr col datetime | dfr datepart month | dfr as datetime_month ),
(dfr col datetime | dfr datepart day | dfr as datetime_day ),
(dfr col datetime | dfr datepart hour | dfr as datetime_hour ),
(dfr col datetime | dfr datepart minute | dfr as datetime_minute ),
(dfr col datetime | dfr datepart second | dfr as datetime_second ),
(dfr col datetime | dfr datepart nanosecond | dfr as datetime_ns ) ]"#,
result: Some(
NuDataFrame::try_from_columns(vec![
Column::new("datetime".to_string(), vec![Value::test_date(dt)]),
Column::new("datetime_year".to_string(), vec![Value::test_int(2021)]),
Column::new("datetime_month".to_string(), vec![Value::test_int(12)]),
Column::new("datetime_day".to_string(), vec![Value::test_int(30)]),
Column::new("datetime_hour".to_string(), vec![Value::test_int(1)]),
Column::new("datetime_minute".to_string(), vec![Value::test_int(2)]),
Column::new("datetime_second".to_string(), vec![Value::test_int(3)]),
Column::new("datetime_ns".to_string(), vec![Value::test_int(123456789)]),
])
.expect("simple df for test should not fail")
.into_value(Span::test_data()),
),
},
]
}
fn search_terms(&self) -> Vec<&str> {
vec![
"year",
"month",
"week",
"weekday",
"quarter",
"day",
"hour",
"minute",
"second",
"millisecond",
"microsecond",
"nanosecond",
]
}
fn run(
&self,
engine_state: &EngineState,
stack: &mut Stack,
call: &Call,
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let part: Spanned<String> = call.req(engine_state, stack, 0)?;
let expr = NuExpression::try_from_pipeline(input, call.head)?;
let expr_dt = expr.into_polars().dt();
let expr = match part.item.as_str() {
"year" => expr_dt.year(),
"quarter" => expr_dt.quarter(),
"month" => expr_dt.month(),
"week" => expr_dt.week(),
"day" => expr_dt.day(),
"hour" => expr_dt.hour(),
"minute" => expr_dt.minute(),
"second" => expr_dt.second(),
"millisecond" => expr_dt.millisecond(),
"microsecond" => expr_dt.microsecond(),
"nanosecond" => expr_dt.nanosecond(),
_ => {
return Err(ShellError::UnsupportedInput(
format!("{} is not a valid datepart, expected one of year, month, day, hour, minute, second, millisecond, microsecond, nanosecond", part.item),
"value originates from here".to_string(),
call.head,
part.span,
));
}
}.into();
Ok(PipelineData::Value(
NuExpression::into_value(expr, call.head),
None,
))
}
}
#[cfg(test)]
mod test {
use super::super::super::test_dataframe::test_dataframe;
use super::*;
use crate::dataframe::eager::WithColumn;
use crate::dataframe::expressions::ExprAlias;
use crate::dataframe::expressions::ExprAsNu;
use crate::dataframe::expressions::ExprCol;
use crate::dataframe::series::AsDateTime;
#[test]
fn test_examples() {
test_dataframe(vec![
Box::new(ExprDatePart {}),
Box::new(ExprCol {}),
Box::new(ExprAsNu {}),
Box::new(AsDateTime {}),
Box::new(WithColumn {}),
Box::new(ExprAlias {}),
])
}
}